Playing With Java Pattern's Named Capture Groups In ColdFusion
In yesterday's post on the new jreExtract() method in JRegEx.cfc, I was aliasing the Regular Expression (RegEx) capture groups with human-friendly labels as part of my demo output. This reminded me that the Java Pattern class added named capture groups in Java 7 (circa 2011); and, I've yet to ever try them out. As such, I thought it was high time to give named capture groups a try in ColdFusion.
Named capture groups give the developer the ability to reference a captured group either by the traditional numeric index (counted by opening parentheses from left to right) or by name. The name is provided in the Regular Expression pattern itself in the form of:
(?<name>X)
... where X is the pattern being captured and name is how the group can be referenced (either as a back-reference in the pattern itself, using \k<name>, or in the Matcher API). To see this in action, I'm going to try to match an email address and then capture the parts of the email address in three named capture groups: user, hash, and domain.
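Here's a minimal sketch of both uses in plain Java (so it can run outside of ColdFusion). The \k<word> back-reference finds a doubled word inside the pattern, and group( "word" ) reads that same named group through the Matcher API. The class and method names are hypothetical, just for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DoubledWordDemo {

    // (?<word>...) names the capture group; \k<word> is a back-reference to
    // whatever text that named group matched earlier in the same pattern.
    private static final Pattern DOUBLED =
        Pattern.compile("\\b(?<word>\\w+)\\s+\\k<word>\\b");

    /** Returns the first doubled word in the input, or null if there is none. */
    static String findDoubled(String input) {
        Matcher matcher = DOUBLED.matcher(input);
        return matcher.find() ? matcher.group("word") : null;
    }

    public static void main(String[] args) {
        System.out.println(findDoubled("Paris in the the spring"));
        System.out.println(findDoubled("no repeats here"));
    }
}
```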
In the following code, I'm using the verbose Regular Expression flag, (?x), to make the pattern easier to read. Note that the middle group, the mailbox hash, is optional.
<cfscript>

	// The following pattern uses a VERBOSE Regular Expression flag to allow for comments
	// and whitespace to make the pattern easier to read. In this case, we're attempting
	// to extract parts of an email address using NAMED CAPTURE GROUPS.
	pattern = "(?x)^
		(?<user> [^+@]+ )
		(
			\+ (?<hash> [^@]+ )
		)?
		@
		(?<domain> .+ )
	";

	extractEmail( "jane.doe@example.com" );
	extractEmail( "jane.doe+spam@example.com" );
	extractEmail( "j.a.n.e.d.o.e@example.com" );

	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //

	/**
	* I match the Java Regular Expression pattern against the given input and then output
	* the NAMED CAPTURE GROUPS.
	*/
	public void function extractEmail( required string input ) {

		var matcher = createObject( "java", "java.util.regex.Pattern" )
			.compile( pattern )
			.matcher( input )
		;

		while ( matcher.find() ) {

			// NOTE: With named capture groups, the Java Pattern Matcher exposes a way for
			// us to access each group by name; but, I don't see any way to use reflection
			// to get the list of named groups - only the number of captured groups.
			dump(
				label = "Input: #input#",
				var = [
					"user": matcher.group( "user" ),
					"hash": matcher.group( "hash" ),
					"domain": matcher.group( "domain" )
				]
			);

		}

	}

</cfscript>
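For comparison, here's a rough sketch of the same extraction in plain Java, outside of ColdFusion. It's a translation under a few assumptions rather than the code I actually ran: the class and method names are hypothetical, and it swaps the inline (?x) modifier for the equivalent Pattern.COMMENTS flag.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EmailExtractDemo {

    // Pattern.COMMENTS is the flag form of the inline (?x) modifier: whitespace
    // in the pattern is ignored, so the groups can be spaced out for readability.
    private static final Pattern EMAIL = Pattern.compile(
        "^ (?<user> [^+@]+ ) ( \\+ (?<hash> [^@]+ ) )? @ (?<domain> .+ )",
        Pattern.COMMENTS
    );

    /** Returns the named groups for the first match (an empty map if no match). */
    static Map<String, String> extractEmail(String input) {
        Map<String, String> parts = new LinkedHashMap<>();
        Matcher matcher = EMAIL.matcher(input);
        if (matcher.find()) {
            parts.put("user", matcher.group("user"));
            // group(String) returns null when the optional hash group did not
            // participate in the match.
            parts.put("hash", matcher.group("hash"));
            parts.put("domain", matcher.group("domain"));
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(extractEmail("jane.doe@example.com"));
        System.out.println(extractEmail("jane.doe+spam@example.com"));
        System.out.println(extractEmail("j.a.n.e.d.o.e@example.com"));
    }
}
```

Since the pattern is anchored with ^ (and no MULTILINE flag is set), there can be at most one match per input, so a single find() stands in for the while loop.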
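Java's Matcher class exposes two group() methods for accessing individual capture groups (in addition to the no-argument group(), which returns the entire match):
group( int ) - access a capture group by its ordinal index.
group( String ) - access a capture group by its name.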
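In this case, we're using the latter version to output the named groups. And, when we run this ColdFusion code, we get the following output:
As you can see, each of the named capture groups was accessible by name in the capture extraction. For the two inputs without a "+" segment, the optional hash group doesn't participate in the match, so Java's group( "hash" ) returns null.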
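Because group( int ) and group( String ) both work against the same match, a named group also keeps its numeric index. The following hypothetical Java sketch uses the same pattern shape as the email demo (minus the verbose whitespace) to show that the unnamed wrapper around the optional hash still counts toward the numbering.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupIndexDemo {

    // Same shape as the email pattern: user, an UNNAMED wrapper group around
    // the optional hash, then domain.
    static final Pattern EMAIL =
        Pattern.compile("^(?<user>[^+@]+)(\\+(?<hash>[^@]+))?@(?<domain>.+)");

    /** Returns the group at the given numeric index for the first match, or null. */
    static String byIndex(String input, int index) {
        Matcher matcher = EMAIL.matcher(input);
        return matcher.find() ? matcher.group(index) : null;
    }

    /** Returns the named group for the first match, or null. */
    static String byName(String input, String name) {
        Matcher matcher = EMAIL.matcher(input);
        return matcher.find() ? matcher.group(name) : null;
    }

    public static void main(String[] args) {
        String email = "jane.doe+spam@example.com";
        // Named groups are still numbered by counting opening parentheses from
        // the left, so the unnamed wrapper (index 2) pushes "hash" to index 3.
        System.out.println(EMAIL.matcher(email).groupCount());
        System.out.println(byIndex(email, 1) + " / " + byName(email, "user"));
        System.out.println(byIndex(email, 2));
        System.out.println(byIndex(email, 3) + " / " + byName(email, "hash"));
        System.out.println(byIndex(email, 4) + " / " + byName(email, "domain"));
    }
}
```

If the wrapper were written as a non-capturing (?:...) group instead, it would drop out of the numbering and hash would move to index 2; the names, however, would stay the same, which is part of their appeal.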
ASIDE: It might be worth noting that you can only use a given name once within a Java Regular Expression. Attempting to use the same name twice will throw a PatternSyntaxException when the pattern is compiled.
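Here's a small, hypothetical Java sketch of that rule. It also checks a second restriction from the Pattern documentation: a group name may only contain ASCII letters and digits, and must start with a letter, so an underscore or a leading digit is rejected as well.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class GroupNameRulesDemo {

    /** Returns true if the regex compiles; false if Pattern.compile() rejects it. */
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Reusing the name "part" for two groups is rejected at compile time.
        System.out.println(compiles("(?<part>\\w+)-(?<part>\\w+)"));
        // Distinct names compile fine.
        System.out.println(compiles("(?<first>\\w+)-(?<second>\\w+)"));
        // Underscores are not allowed in group names.
        System.out.println(compiles("(?<first_part>\\w+)"));
    }
}
```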
Named capture groups are capture groups that you would have already been capturing in your Java Regular Expression. So, they don't fundamentally change the way your pattern is architected. But, they do allow for some self-documentation; and, they might make the consuming code a bit easier to read. As such, I can definitely see value for named capture groups in my ColdFusion code.
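One place where that self-documentation shows up in consuming code is the replacement string: Matcher.replaceAll() accepts ${name} references to named groups (also added in Java 7). The date-reformatting use case below is my own hypothetical illustration, not something from the demo above.

```java
import java.util.regex.Pattern;

final class NamedReplacementDemo {

    private static final Pattern ISO_DATE =
        Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");

    /** Rewrites yyyy-mm-dd dates as mm/dd/yyyy using ${name} group references. */
    static String toUsDates(String input) {
        return ISO_DATE.matcher(input).replaceAll("${month}/${day}/${year}");
    }

    public static void main(String[] args) {
        System.out.println(toUsDates("Posted on 2020-03-14."));
    }
}
```

Compared to "$2/$3/$1", the named version reads correctly without having to count parentheses in the pattern.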
Want to use code from this post? Check out the license.