ColdFusion RE NoCase Functions vs. Case Insensitive RegEx Flag
Just out of curiosity, I wanted to see if there was any speed difference between using the NoCase regular expression functions built into ColdFusion (REFindNoCase() and REReplaceNoCase()) and using the standard regular expression functions (REFind() and REReplace()) with the regular expression case-insensitive flag (?i). To test this, I set up a large string and replaced out all of the words.
First, I had to set up the large test string. It has to be large because ColdFusion is so freakin awesome that everything it does on small data is insanely fast.
<!--- Set up a test string. --->
<cfsavecontent variable="strText">
Down the road, in a gym far away
A young man was heard to say,
"No matter what I do, my legs won't grow!"
He tried leg extensions, leg curls, leg presses too.
Trying to cheat, these sissy workouts he'd do!
From the corner of the gym where the big guys train,
Through a cloud of chalk and the midst of pain,
Where the big iron rides high, and threatens lives,
Where the noise is made with big forty-fives,
A deep voice bellowed as he wrapped his knees,
A very big man with legs like trees,
Laughing as he snatched another plate from the stack,
Chalked his hands and monstrous back,
Said, "Boy, stop lying and don't say you've forgotten!
Trouble with you is you ain't been SQUATTIN'!"
</cfsavecontent>
<!---
Now, we are going to repeat the string a number of
times just to make a really big string.
--->
<cfset strText = RepeatString(
REReplace( strText, "[,!'"".-]+", "", "ALL" ),
20
) />
Notice that I am repeating that string 20 times. This should make it a good size. I am also stripping out all the junk characters so that I don't have to deal with them.
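To see what that stripping step does on its own, here is a minimal sketch that runs the same pattern against a single line of the passage. The variable names strLine and strClean are just for illustration and are not part of the test above. Note that the doubled quotes ("") are how a literal double quote is written inside a ColdFusion string.

```cfml
<!--- Illustration only: strLine and strClean are hypothetical names. --->
<cfset strLine = "Said, ""Boy, stop lying and don't say you've forgotten!""" />

<!--- Same character class as above: commas, bangs, quotes, periods, dashes. --->
<cfset strClean = REReplace( strLine, "[,!'"".-]+", "", "ALL" ) />

<!--- Outputs: Said Boy stop lying and dont say youve forgotten --->
<cfoutput>#strClean#</cfoutput>
```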
Now, I need to get the words to replace. We are going to replace every word in the passage. To get all the words, we are going to treat the passage as a list and convert it to an array using several different list delimiters:
<!---
Let's set up an array of words that we want to find
and replace with case insensitivity. Let's use every
single word in the passage as a word to replace
--->
<cfset arrWords = ListToArray(
strText,
" ,'!""-#Chr( 13 )##Chr( 10 )#"
) />
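As a small sketch of how ListToArray() handles a multi-character delimiter string: each character in the delimiter string counts as its own delimiter, and empty list items are skipped by default. The sample line and the names strSample and arrSample are just for illustration. Since the passage above already had its punctuation stripped, it is really the space and line break characters doing the work in the real test.

```cfml
<!--- Illustration only: strSample and arrSample are hypothetical names. --->
<cfset strSample = "Where the noise is made with big forty-fives," />

<!--- Space, comma, and dash each act as a separate delimiter. --->
<cfset arrSample = ListToArray( strSample, " ,-" ) />

<!--- Outputs: 9 words: Where|the|noise|is|made|with|big|forty|fives --->
<cfoutput>#ArrayLen( arrSample )# words: #ArrayToList( arrSample, "|" )#</cfoutput>
```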
Now, I am going to test the speed of the REFindNoCase() and the REReplaceNoCase() methods. You will notice that in my Replace method, I am only replacing ONE match at a time. This is only done so that the replace will take longer and we will be more likely to see a difference in speed.
<!---
Now, let's get a copy of the passage for the first round
of testing. This will test the case insensitive search
using the built-in ColdFusion replace function.
--->
<cfset strTargetText = strText />
<!--- Set up the timer. --->
<cftimer label="ColdFusion REReplaceNoCase" type="outline">
<!--- Loop over the words to replace. --->
<cfloop
index="intI"
from="1"
to="#ArrayLen( arrWords )#"
step="1">
<!---
Keep looping while there is still a reference to
this word. We are only going to replace one at a
time to make it slower.
--->
<cfloop condition="REFindNoCase( '\b#UCase( arrWords[ intI ] )#\b', strTargetText )">
<!---
Replace the word with an empty string. When using
the word, convert it to upper case just to make
sure we are doing case-insensitive.
--->
<cfset strTargetText = REReplaceNoCase(
strTargetText,
"\b#UCase( arrWords[ intI ] )#\b",
"",
"ONE"
) />
</cfloop>
</cfloop>
</cftimer>
This ran in about 1,437 ms on average.
Now, let's test the REFind() and REReplace() methods. Notice that we are doing the exact same thing; the only difference is that we are using the case-insensitive flag (?i) instead of the NoCase methods:
<!---
Now, let's get a copy of the passage for the next round
of testing. This will test the case insensitive search
using the case insensitive flag with the built-in
ColdFusion case sensitive search function.
--->
<cfset strTargetText = strText />
<!--- Set up the timer. --->
<cftimer label="ColdFusion REReplace" type="outline">
<!--- Loop over the words to replace. --->
<cfloop
index="intI"
from="1"
to="#ArrayLen( arrWords )#"
step="1">
<!---
Keep looping while there is still a reference to
this word. We are only going to replace one at a
time to make it slower.
--->
<cfloop condition="REFind( '(?i)\b#UCase( arrWords[ intI ] )#\b', strTargetText )">
<!---
Replace the word with an empty string. When using
the word, convert it to upper case just to make
sure we are doing case-insensitive.
--->
<cfset strTargetText = REReplace(
strTargetText,
"(?i)\b#UCase( arrWords[ intI ] )#\b",
"",
"ONE"
) />
</cfloop>
</cfloop>
</cftimer>
This ran in about 1,344 ms on average.
So, there seemed to be a slight speed advantage to using the case-insensitive flag over the NoCase methods. However, I had to run a really inefficient test to see any difference, and the results were not always consistent. The numbers above reflect the general trend: sometimes the NoCase methods were faster, but on average they were just a bit slower.
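Because individual runs bounce around, one way to smooth things out is to repeat each measurement several times and average the totals. The following is only a sketch of that idea, not the test I ran above: it uses GetTickCount() so that the timings can be summed in code, it replaces ALL matches in one call, and the sample string, the run count, and all of the variable names are assumptions made for illustration. It also checks that both approaches produce the same output.

```cfml
<!--- Illustration only: sample text, run count, and names are hypothetical. --->
<cfset strSample = RepeatString( "Trouble with you is you aint been SQUATTIN ", 500 ) />
<cfset intRuns = 10 />
<cfset intNoCaseTotal = 0 />
<cfset intFlagTotal = 0 />

<cfloop index="intRun" from="1" to="#intRuns#" step="1">

	<!--- Time the NoCase function. --->
	<cfset intStart = GetTickCount() />
	<cfset strNoCase = REReplaceNoCase( strSample, "\bsquattin\b", "", "ALL" ) />
	<cfset intNoCaseTotal = intNoCaseTotal + ( GetTickCount() - intStart ) />

	<!--- Time the case-sensitive function with the (?i) flag. --->
	<cfset intStart = GetTickCount() />
	<cfset strFlag = REReplace( strSample, "(?i)\bsquattin\b", "", "ALL" ) />
	<cfset intFlagTotal = intFlagTotal + ( GetTickCount() - intStart ) />

</cfloop>

<!--- Average the accumulated milliseconds over all runs. --->
<cfset numNoCaseAvg = intNoCaseTotal / intRuns />
<cfset numFlagAvg = intFlagTotal / intRuns />
<cfset blnSame = ( Compare( strNoCase, strFlag ) EQ 0 ) />

<cfoutput>
	Same output: #YesNoFormat( blnSame )#<br />
	REReplaceNoCase() average: #numNoCaseAvg# ms<br />
	REReplace() with (?i) average: #numFlagAvg# ms
</cfoutput>
```

With a millisecond clock, a single replace on a string this size may well register as 0 ms, so the sample size or run count might need to go up before the averages mean anything.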
Reader Comments
Just remember two things:
Iteration testing isn't the best test of actual performance--as there's lots of things that can affect performance w/in a loop (such as garbage collection, other processes, etc.)
When you're talking about minuscule differences, don't forget about readability/complexity of code.
I know most of your tests are for curiosity's sake, but one problem we developers get into from time to time is going overboard in trying to squeeze 10ms out of a template only to end up making our code harder to read and maintain.
Dan,
I am 100% in agreement with what you are saying. If I don't see any significant difference (which I am not seeing in this example), I opt for whichever one is the most readable / maintainable. In fact, I do so much of my RegEx directly on the Java string itself that I don't have the option of REReplaceNoCase() anyway, in which case I need the (?i) flag.
So yeah, this is all just for exploration's sake. I try not to write about any implications of one thing or another for the very reason you are talking about. So much goes into affecting performance. I just state the facts of the finding, not the "what does that mean for you" type stuff.