Special $ References In JavaScript's String.replace() Method
In honor of the 4th Annual Regular Expression Day, I thought I would do some further exploration of regular expressions. As it happens, I just learned something new about JavaScript regular expressions while reading JavaScript: The Good Parts by Douglas Crockford. In the book, Crockford outlines the special character sequences that are available in the replacement string of the String.replace() method. I had always known about using $N to refer to captured groups; I was not aware, however, that $&, $`, and $' were also available in the replace() method.
I don't have JavaScript: The Good Parts in front of me at the moment. However, I was able to double-check the syntax on the website of Jan Goyvaerts (co-author of the Regular Expressions Cookbook), who describes these replacement references as follows:
$& - Refers to the entire text of the current pattern match.
$` - Refers to the text to the left of the current pattern match.
$' - Refers to the text to the right of the current pattern match.
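Because $` and $' are defined relative to the position of the current match, they can be rebuilt by hand from the offset that replace() passes to a replacer function. The following is a small sketch that assumes a pattern without capturing groups, so the replacer's arguments are (match, offset, input); the variable names viaTokens and viaFunction are only illustrative.

```javascript
const text = "abc-abc";

// Replacement string using the three special references.
const viaTokens = text.replace(/b/g, "[$&|$`|$']");

// The same result built by hand. With no capturing groups in the pattern,
// the replacer receives (match, offset, input).
const viaFunction = text.replace(/b/g, function (match, offset, input) {
  const left = input.slice(0, offset);              // what $` refers to
  const right = input.slice(offset + match.length); // what $' refers to
  return "[" + match + "|" + left + "|" + right + "]";
});

console.log(viaTokens);                 // a[b|a|c-abc]c-a[b|abc-a|c]c
console.log(viaTokens === viaFunction); // true
```

Both references are taken from the original input string, not from the partially rewritten result: for the second match, $` is "abc-a" rather than text that already contains the first replacement.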
To see these references in action, I set up the following demo. In this code, the replacement serves no practical purpose; it is meant only to show the value held by each replacement reference:
<!DOCTYPE html>
<html>
<head>
<title>Using The $ In JavaScript RegEx Replace</title>
<script type="text/javascript">
// Create a test string in which we will match our pattern.
var value = "My number is 212-555-1234.";
// Create the pattern to match each group of digits in the phone number.
var pattern = new RegExp( "(\\d+)", "g" );
// Replace into the value the special "$"-based matches.
var result = value.replace(
pattern,
"|-- [$&] [$`] [$'] --|"
);
// Output the replacement result to see $ functionality.
console.log( result );
</script>
</head>
<body>
<!-- Left intentionally blank. -->
</body>
</html>
As you can see in the above code, we are matching each individual group of digits in the phone number within the source text; the "g" flag makes replace() act on every match rather than only the first. When we execute the String.replace() method, we simply replace each matched number with a string that displays each of the above $-based values.
When we run the above code, we get the following console output:
My number is |-- [212] [My number is ] [-555-1234.] --|-|-- [555] [My number is 212-] [-1234.] --|-|-- [1234] [My number is 212-555-] [.] --|.
This is a little bit hard to read, so I'm going to break out the replacement portion of each match on its own line:
|-- [212] [My number is ] [-555-1234.] --|
|-- [555] [My number is 212-] [-1234.] --|
|-- [1234] [My number is 212-555-] [.] --|.
As you can see, $& referred to the entire match (in our case, that was also the first captured group, so it could have been written as $1). The $` referred to all of the source text to the left of the current match, and the $' referred to all of the source text to the right of the current match.
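The relationship between $& and $1 can be checked directly. In this small sketch, using the same value as the demo, the two references agree while the capturing group spans the whole pattern, and they diverge once the pattern matches more than the group:

```javascript
const value = "My number is 212-555-1234.";

// The whole pattern is a single capturing group, so $& and $1 coincide.
const withWholeMatch = value.replace(/(\d+)/g, "<$&>");
const withGroupOne = value.replace(/(\d+)/g, "<$1>");
console.log(withWholeMatch);                  // My number is <212>-<555>-<1234>.
console.log(withWholeMatch === withGroupOne); // true

// Once the pattern also matches a trailing dash, the two differ:
// $& includes the dash, $1 does not.
const dashWhole = value.replace(/(\d+)-/g, "<$&>");
const dashGroup = value.replace(/(\d+)-/g, "<$1>");
console.log(dashWhole); // My number is <212-><555->1234.
console.log(dashGroup); // My number is <212><555>1234.
```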
I can understand the usefulness of $& for referring to the match without having to employ a capturing group. But, to be honest, I can't quite see how I would ever use the $` and $' references in a JavaScript regular expression replace. In any case, it's always fun to learn more about how something works, even if its value is not immediately evident.
Reader Comments
Nice one, Ben. I can see the usefulness of having the entire match available.
@Ben: They're available in Perl too.
http://www.regular-expressions.info/perl.html
At the beginning of Chapter 7, "Regular Expressions," Crockford says that the JS implementation is very close to the original Bell Labs formulations, with some reinterpretations and extensions from Perl. But I don't know whether these three references came from Bell Labs or from Perl.
@Andy,
Word up.
@WebManWalking,
Apparently there are also some other $-based references available, but they are browser extensions and are not supported by all browsers.
Regular expressions are literals in ECMAScript. Every regex function is available everywhere.
@John,
I use the literal notation every now and then, but sometimes I find it hard to swallow because I end up with a lot of weird slash patterns. It's probably just what I'm used to, but I find this:
"\\bfoo\\b"
... easier to read than this:
/\bfoo\b/
The latter just throws me off with the flippy-floppy slashes.
Just personal preference, though.
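To make the comparison concrete, here is a small sketch showing that both forms compile to the same pattern; the only difference is that the string version needs each backslash doubled so that the string literal hands a single backslash on to RegExp:

```javascript
// In a string, each regex backslash must itself be escaped...
const fromString = new RegExp("\\bfoo\\b");
// ...whereas a regex literal takes the backslashes as written.
const fromLiteral = /\bfoo\b/;

console.log(fromString.source === fromLiteral.source);                // true
console.log(fromString.test("a foo b"), fromLiteral.test("a foo b")); // true true
console.log(fromString.test("food"), fromLiteral.test("food"));       // false false
```

The doubling is easy to get wrong: new RegExp("\bfoo\b") quietly produces a pattern containing backspace characters, because \b inside a string literal is the backspace escape.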
In a sort of proof-of-concept use of $` and $', you could parse with two regular expressions to get the text between two tags.
You could run the regular expression for the start tag and take $' to get everything to the right of it (that is, after it). Then, on that result, you run the regular expression for the end tag and take $` to get everything to the left of it (before it), which leaves you with the text between the two tags.
I'm not saying it's the most practical approach, but it would allow you to find text after some marker you consider noteworthy. As for more practical uses of those two, I'm not really sure.
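The two-pass idea can be sketched as follows. Because replace() only exposes $` and $' inside a replacement string, this version computes the same two slices from the index reported by exec(). The helper name textBetween is hypothetical, and the sketch assumes non-global patterns so that exec() always searches from the start of its input.

```javascript
// Hypothetical helper: returns the text between the first start-tag match
// and the next end-tag match, or null if either tag is missing.
// Assumes non-global patterns, so exec() always searches from index 0.
function textBetween(source, startPattern, endPattern) {
  const start = startPattern.exec(source);
  if (start === null) return null;

  // Pass 1: everything to the right of the start tag (what $' would hold).
  const afterStart = source.slice(start.index + start[0].length);

  const end = endPattern.exec(afterStart);
  if (end === null) return null;

  // Pass 2: everything to the left of the end tag (what $` would hold).
  return afterStart.slice(0, end.index);
}

console.log(textBetween("Hello <b>bold text</b> world", /<b>/, /<\/b>/)); // bold text
```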
@John,
I kind of see what you're saying. I'd have to play around with it a bit to wrap my head around it.
Wow, that just made my life a whole lot easier. It makes for a great string highlighter:
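A minimal sketch of a highlighter along those lines, assuming plain-text input and a hypothetical highlight() helper that wraps each case-insensitive match in <mark> tags. Using $& in the replacement keeps each match's original casing, which a fixed replacement string would lose:

```javascript
// Hypothetical highlighter: wraps every occurrence of a search term in
// <mark> tags, using $& so each match keeps its original casing.
function highlight(text, term) {
  // An empty pattern would match between every character, so skip it.
  if (term === "") return text;
  // Escape regex metacharacters so the term is matched literally;
  // "\\$&" prefixes each matched metacharacter with a backslash.
  const escaped = term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(escaped, "gi");
  return text.replace(pattern, "<mark>$&</mark>");
}

console.log(highlight("Regex day: regex all the things", "regex"));
// <mark>Regex</mark> day: <mark>regex</mark> all the things
```

The escaping step is itself a $& replacement. The input is assumed to be plain text, so it is not HTML-escaped here.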