Creating A ColdFusion Closure-Based Wrapper To Java's Pattern Matcher For Regular Expression String Replacement
CAUTION: This is primarily a note to self.
Over the weekend, I was doing some Regular Expression pattern matching in ColdFusion and I wanted to use a function that behaved like JavaScript's String.prototype.replace() method. That is, I wanted a function that took a Regular Expression pattern and a callback and then used that callback to calculate a replacement string on a per-match basis. As it turns out, I did this very thing in an old exploration of Closures in ColdFusion 10 Beta; but, I had a heck of a hard time finding it. So, I just wanted to break the closure-based Regular Expression pattern matcher out into its own post - give it the love it deserves.
If you're not familiar with JavaScript's String.prototype.replace() method, it can take a Regular Expression pattern and a callback. When the pattern has the global (g) flag, it finds all of the pattern matches in the context String; and, for each match, it passes the matched substring, all of the captured groups, the offset of the match, and the full string into the callback. The callback then returns the String to use as the replacement for that matched substring. Ultimately, the .replace() method returns a new string in which each match has been swapped out for its individual replacement.
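For comparison, Java 9 added a built-in analogue of this callback style: Matcher.replaceAll( Function<MatchResult, String> ). The following is a minimal sketch (not from the original post, and it requires Java 9 or later) that doubles every integer in a string; the class and method names are hypothetical. Note that the string returned by the callback is still parsed for $n group references, so it goes through Matcher.quoteReplacement() first.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CallbackReplaceDemo {
    // Doubles every run of digits, computing each replacement from its own match.
    // Requires Java 9+ for Matcher.replaceAll(Function<MatchResult, String>).
    static String doubleNumbers(String input) {
        Matcher matcher = Pattern.compile("\\d+").matcher(input);
        return matcher.replaceAll(match -> {
            long doubled = Long.parseLong(match.group()) * 2;
            // The callback's result is still scanned for $n and \ escapes,
            // so quote it to make sure it is inserted literally.
            return Matcher.quoteReplacement(Long.toString(doubled));
        });
    }

    public static void main(String[] args) {
        System.out.println(doubleNumbers("3 apples and 12 pears")); // 6 apples and 24 pears
    }
}
```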
An approach like this provides much more flexibility than ColdFusion's reReplace() and reReplaceNoCase() functions because the replacement logic can change based on each individual match rather than being fixed to a single replacement string. However, unlike the JavaScript version, there is no "context" string in ColdFusion (at least not in ColdFusion 10), since the function can't be invoked as a method on the string itself. As such, my version has to accept the target string as an argument:
jreReplaceEach( targetText, patternText, callback ) :: String
... where the Callback will receive the following invocation arguments on each pattern match:
- The full match.
- Each captured group as its own argument (2...N); an optional group that did not participate in the match is passed as undefined.
- The 1-based offset of the match.
- The full target text.
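To make the argument order above concrete, here is a rough Java port of the same idea. It is a sketch, not the ColdFusion implementation: the MatchCallback interface and the ReplaceEach class are hypothetical names, and because Java cannot spread a variable number of groups into separate parameters, the captured groups arrive as an array (index 0 holds group 1), with null for any optional group that did not participate.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical callback type mirroring the argument order listed above.
// Captured groups arrive as an array (index 0 = group 1); unmatched optional groups are null.
@FunctionalInterface
interface MatchCallback {
    String apply(String fullMatch, String[] groups, int offset, String targetText);
}

final class ReplaceEach {
    static String jreReplaceEach(String targetText, String patternText, MatchCallback callback) {
        Matcher matcher = Pattern.compile(patternText).matcher(targetText);
        StringBuffer buffer = new StringBuffer();
        while (matcher.find()) {
            String[] groups = new String[matcher.groupCount()];
            for (int i = 1; i <= matcher.groupCount(); i++) {
                groups[i - 1] = matcher.group(i);
            }
            // Report the offset 1-based, matching ColdFusion conventions.
            String replacement = callback.apply(matcher.group(), groups, matcher.start() + 1, targetText);
            if (replacement == null) {
                replacement = ""; // No return value means "replace with nothing".
            }
            // Treat the callback's result as a literal, not as a $n template.
            matcher.appendReplacement(buffer, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(buffer);
        return buffer.toString();
    }

    public static void main(String[] args) {
        // Wrap each word in brackets, tagged with its 1-based offset.
        String result = jreReplaceEach("cat dog", "(\\w+)",
                (match, groups, offset, text) -> "[" + groups[0] + "@" + offset + "]");
        System.out.println(result); // [cat@1] [dog@5]
    }
}
```

The find() / appendReplacement() / appendTail() loop is the same one the ColdFusion wrapper drives; StringBuffer is used because the StringBuilder overloads of appendReplacement() only arrived in Java 9.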
To see this in action, I'm going to take a poem written mostly in lower-case characters and upper-case the first character of each line:
<cfscript>
// Create a poem with majority lower-case characters.
content = "
roses are red,
violets are blue,
ColdFusion is the bee's knees,
and so are you!
";
// Let's transform the poem into one in which the first character of each line is
// upper-cased. In this case, we are taking into account that each line may contain
// leading white-space.
// --
// NOTE: In the following pattern, I could have made the [a-z] group mandatory and
// omitted the isNull() check. However, I wanted to demonstrate that the captured
// group could be optional and would be passed-in as undefined.
replacement = jreReplaceEach(
content,
"(?m)^(\s*)([a-z])?",
function ( $0, leadingSpaces, firstCharacter ) {
if ( isNull( firstCharacter ) ) {
return( $0 );
} else {
// Keep the leading white-space so that the line's indentation is preserved.
return( leadingSpaces & ucase( firstCharacter ) );
}
}
);
// NOTE: Using "text" format so we can see the white-space.
writeDump( var = replacement, format = "text" );
// ------------------------------------------------------------------------------- //
// ------------------------------------------------------------------------------- //
/**
* I use Java's Pattern / Matcher libraries to replace matched patterns using the
* given operator function.
*
* @targetText I am the text being scanned.
* @patternText I am the Java Regular Expression pattern used to locate matches.
* @operator I am the Function or Closure used to provide the match replacements.
* @output false
*/
public string function jreReplaceEach(
required string targetText,
required string patternText,
required function operator
) {
var matcher = createObject( "java", "java.util.regex.Pattern" )
.compile( javaCast( "string", patternText ) )
.matcher( javaCast( "string", targetText ) )
;
var buffer = createObject( "java", "java.lang.StringBuffer" ).init();
// Iterate over each pattern match in the target text.
while ( matcher.find() ) {
// When preparing the arguments for the operator, we need to construct an
// argumentCollection structure in which the argument index is the numeric
// key of the argument offset. In order to simplify overlaying the pattern
// group matching over the arguments array, we're simply going to keep an
// incremented offset every time we add an argument.
var operatorArguments = {};
var operatorArgumentOffset = 1; // Will be incremented with each argument.
var groupCount = matcher.groupCount();
// NOTE: Calling .group(0) is equivalent to calling .group(), which will
// return the entire match, not just a capturing group.
for ( var i = 0 ; i <= groupCount ; i++ ) {
operatorArguments[ operatorArgumentOffset++ ] = matcher.group( javaCast( "int", i ) );
}
// Including the match offset and the original content for parity with the
// JavaScript String.replace() function on which this algorithm is based.
// --
// NOTE: We're adding 1 to the offset since ColdFusion starts offsets at 1
// whereas Java starts offsets at 0.
operatorArguments[ operatorArgumentOffset++ ] = ( matcher.start() + 1 );
operatorArguments[ operatorArgumentOffset++ ] = targetText;
var replacement = operator( argumentCollection = operatorArguments );
// In the event the operator doesn't return a value, we'll assume that the
// intention is to replace the match with nothing.
if ( isNull( replacement ) ) {
replacement = "";
}
// Since the operator is providing the replacement text based on the
// individual parts found in the match, we are going to assume that any
// embedded group reference is coincidental and should be consumed as a
// string literal.
matcher.appendReplacement(
buffer,
matcher.quoteReplacement( javaCast( "string", replacement ) )
);
}
matcher.appendTail( buffer );
return( buffer.toString() );
}
</cfscript>
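The quoteReplacement() call near the end of the loop matters more than it might look. Without it, appendReplacement() treats $ and \ in the replacement as special, so a callback that returns something like "$5" is read as a reference to capturing group 5. Here's a small Java sketch (hypothetical class and method names) contrasting the quoted and unquoted behavior on a pattern that has no groups at all:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuoteReplacementDemo {
    // Replaces every "PRICE" token with the given text, optionally quoting it first.
    static String fillPrice(String template, String price, boolean quote) {
        Matcher matcher = Pattern.compile("PRICE").matcher(template);
        StringBuffer buffer = new StringBuffer();
        while (matcher.find()) {
            String replacement = quote ? Matcher.quoteReplacement(price) : price;
            matcher.appendReplacement(buffer, replacement);
        }
        matcher.appendTail(buffer);
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(fillPrice("Total: PRICE", "$5", true)); // Total: $5
        try {
            fillPrice("Total: PRICE", "$5", false);
        } catch (IndexOutOfBoundsException e) {
            // "$5" was read as a reference to capturing group 5, which does not exist.
            System.out.println("Unquoted: " + e.getMessage());
        }
    }
}
```

In the unquoted case, Java throws an IndexOutOfBoundsException because group 5 doesn't exist; with a pattern that did have a fifth group, it would silently insert that group's text instead.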
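As you can see, we're using a multi-line Regular Expression pattern (the (?m) flag makes ^ match at the start of every line rather than only at the start of the string) in order to target individual lines of text in the poem. Then, we're replacing the first character of each line with its upper-cased counterpart. When we run the above code, "roses", "violets", and "and" come out as "Roses", "Violets", and "And", while the "ColdFusion" line, which already starts with an upper-case letter, passes through untouched.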
Works like a charm! I was able to inspect each individual Regular Expression pattern match in my ColdFusion closure and then return a customized replacement string.
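For readers who want to try the poem example outside of ColdFusion, here's a rough Java translation. It assumes Java 9 or later (for Matcher.replaceAll( Function )), uses a hypothetical PoemCapitalizer class, and keeps the leading white-space captured by group 1 so each line's indentation survives. Group 2 comes back as null on the "ColdFusion" line, which is the Java counterpart of the undefined argument checked with isNull().

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PoemCapitalizer {
    // Same pattern as the ColdFusion demo: (?m) lets ^ match at every line start,
    // group 1 keeps leading white-space, group 2 is the optional first letter.
    private static final Pattern LINE_START = Pattern.compile("(?m)^(\\s*)([a-z])?");

    static String capitalizeLines(String poem) {
        return LINE_START.matcher(poem).replaceAll(match -> {
            String firstCharacter = match.group(2);
            String replacement = (firstCharacter == null)
                    ? match.group() // No lower-case letter after the white-space: keep the match as-is.
                    : match.group(1) + firstCharacter.toUpperCase(Locale.ROOT);
            return Matcher.quoteReplacement(replacement);
        });
    }

    public static void main(String[] args) {
        String poem = "\n  roses are red,\n  violets are blue,\n"
                + "  ColdFusion is the bee's knees,\n  and so are you!\n";
        System.out.print(capitalizeLines(poem));
    }
}
```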
Working with Regular Expression patterns always makes me happy, like a pig in slop. And now, hopefully, when I need to remember how to use ColdFusion closures to leverage Java's Pattern and Matcher classes, I'll be able to find it a bit more easily.