Ask Ben: Counting Spaces In A Given String
How can I get the number of spaces in a string?
This seemingly simple problem does not have the simplest answer. I wish there were some sort of ValueCount() method in ColdFusion but, right now, I think that only applies to lists (i.e. ListValueCount()). Luckily for your particular problem, there is a mostly simple solution. Since you are looking for just spaces, we can strip out everything that is NOT a space and then just get the length of the resulting string:
<cfset intLength = Len(
    REReplace(
        "You are simply a vision in that dress!",
        "[^ ]+",
        "",
        "ALL"
    )
) />
This really only works when you are looking for single characters. If you want to search for all instances of a word, then things get a bit hairy. The easy solution is simply to keep searching the string until you cannot find any more instances:
<!--- The test value. --->
<cfset strTest = "You are the best and the most beautiful person." />
<!--- The target instance. --->
<cfset strTarget = "the" />
<!--- The instance counter. --->
<cfset intCount = 0 />
<!--- Get the initial position (string positions start at 1). --->
<cfset intPosition = Find( strTarget, strTest, 1 ) />
<!--- Keep searching till no more instances are found. --->
<cfloop condition="intPosition">
    <!--- Increment instance counter. --->
    <cfset intCount = (intCount + 1) />
    <!--- Get the next position, starting just after the last match. --->
    <cfset intPosition = Find(
        strTarget,
        strTest,
        (intPosition + Len( strTarget ))
    ) />
</cfloop>
<!--- Output the number of target instances. --->
<cfoutput>#intCount#</cfoutput>
Each time we do a search, we have to increment the counter and then start the search again after the given instance. Not the greatest solution, but it works.
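For comparison, here is a loop-free way to get the same count, offered as a minimal sketch rather than a drop-in replacement: remove every instance of the target and compare string lengths. The number of characters removed, divided by the length of the target, is the number of non-overlapping instances. It assumes strTarget is not empty (otherwise the division fails) and, like Find(), it is case-sensitive.

```cfm
<!--- The test value and the target instance. --->
<cfset strTest = "You are the best and the most beautiful person." />
<cfset strTarget = "the" />

<!--- Characters removed, divided by the target length, gives the count. --->
<cfset intCount = (
    ( Len( strTest ) - Len( Replace( strTest, strTarget, "", "ALL" ) ) ) /
    Len( strTarget )
) />

<!--- Output the number of target instances (2 for this string). --->
<cfoutput>#intCount#</cfoutput>
```

With a single space as the target, the same expression counts spaces.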
Reader Comments
Why not leverage Java?
intCount=ArrayLen(strTest.split(strTarget.replaceAll("\W","\$1")))
Erm, make that:
intCount=DecrementValue(ArrayLen(strTest.split(strTarget.replaceAll("\W","\$1"))))
Silly off-by-one error.
Okay, last try, I promise.
<cfset strTest = "You are \the\ best (and) the [most] beautiful girl.">
<cfset strTarget = "\">
<cfset newTest=Chr(1) & strTest & Chr(1)>
<cfset intCount=DecrementValue(ArrayLen(newTest.split(strTarget.replaceAll("(\W)","\\$1"))))>
<cfoutput>#intCount#</cfoutput>
Here's my simple take on it:
<cfset theString = "You are simply a vision in that dress!">
<cfset count = ListLen(theString," ") - 1>
This could be done for phrases as well:
<cfset theString = replace("Today the times are changing, the weather is changing and there is something in the air"," the ","|","all")>
<cfset count = ListLen(theString,"|") - 1>
Obviously it would return wrong results if the phrase is at the beginning or the end of the string. This can easily be fixed by prepending and appending some rubbish text to the string.
ps - Ben, those spam-fighting math equations are hard on me early in the morning ;)
Rick, Trond,
Excellent suggestions all around. As we can see, there are a number of solutions to this problem, but still, I think this would be an easy method for CF to build in, right?
Trond, good call on replacing the phrase with the "delimiter". That never even occurred to me. The only red flag I could see is that you might use a delimiter character that is already in the string (and therefore would throw off the count). This of course can be offset by using extremely rare characters or even by replacing that character out before replacing out the target phrase.
Good stuff all around. Also sorry about the math, but it keeps the SPAM out :)
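To make the delimiter concern concrete, here is a rough sketch of the "replace that character out first" idea combined with the padding trick. The test string and the "x" padding are made up for illustration. One extra caveat: ColdFusion list functions skip empty elements, so two instances sitting back-to-back (such as "thethe") would be undercounted by this approach.

```cfm
<!--- A test value that already contains the delimiter we want to use. --->
<cfset strTest = "the cat | the dog" />

<!--- Strip any existing delimiter characters so they cannot inflate the count. --->
<cfset strClean = Replace( strTest, "|", "", "ALL" ) />

<!--- Swap the target for the delimiter; pad both ends so edge matches count. --->
<cfset strMarked = "x" & Replace( strClean, "the", "|", "ALL" ) & "x" />

<!--- Two delimiters leave three list items, so the count is 3 - 1 = 2. --->
<cfset intCount = ListLen( strMarked, "|" ) - 1 />
<cfoutput>#intCount#</cfoutput>
```

Without the stripping step, the pipe already in the string would add an extra list item and the count would come out as 3.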
We can take this further...
<cfset intLen = listLen(reReplaceNoCase(strTarget, "(?:(?!test)[\S\s])+", ",", "ALL")) />
Test, tester, and retest count as one match each, testtest counts as two matches.
<cfset intLen = listLen(reReplaceNoCase(strTarget, "(?:(?!\btest\b)[\S\s])+", ",", "ALL")) />
Test counts as one match, tester, retest, and testtest do not count as matches.
<cfset intLen = listLen(reReplaceNoCase(strTarget, "\b(?:(?!test)[\S\s])+\b", ",", "ALL")) />
Test, tester, retest, and testtest count as one match each.
Or, using my reMatch() UDF (http://badassery.blogspot.com/2007/01/coldfusion-regex-support-udfs-rematch.html), the regexes become even simpler...
<cfset intLen = arrayLen(reMatchNoCase("test", strTarget, 1, "ALL")) />
Test, tester, and retest count as one match each, testtest counts as two matches.
<cfset intLen = arrayLen(reMatchNoCase("\btest\b", strTarget, 1, "ALL")) />
Test counts as one match, tester, retest, and testtest do not count as matches.
<cfset intLen = arrayLen(reMatchNoCase("\b\w*?test\w*\b", strTarget, 1, "ALL")) />
Test, tester, retest, and testtest count as one match each.
Note that I'm not familiar with using the underlying Java regex methods such as split(). I'm sure that at least my first three, non-reMatch()-based examples could be written more elegantly using the Java core. Goddamn CF7's lame regex support and available functions...
Yeah, Java's regex stuff is really cool and very powerful. It can handle most of the regular expression stuff that straight-up CFMX method calls cannot handle. I use them all the time. I find that they are also a good bit faster.
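As a rough illustration of what that can look like for this counting problem, the sketch below uses java.util.regex.Pattern and Matcher directly and counts how many times find() succeeds. The pattern and variable names are only examples, and the JavaCast() calls are a precaution to make sure Java receives plain strings.

```cfm
<cfset strTest = "You are the best and the most beautiful person." />

<!--- Compile a whole-word pattern and get a matcher for the test value. --->
<cfset objPattern = CreateObject( "java", "java.util.regex.Pattern" ).compile(
    JavaCast( "string", "\bthe\b" )
) />
<cfset objMatcher = objPattern.matcher( JavaCast( "string", strTest ) ) />

<!--- Each successful find() advances the matcher to the next match. --->
<cfset intCount = 0 />
<cfloop condition="objMatcher.find()">
    <cfset intCount = (intCount + 1) />
</cfloop>

<!--- Output the number of matches (2 for this string). --->
<cfoutput>#intCount#</cfoutput>
```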
Thanks Ben! I used this to find the first space after the midway point in a document, so that I could split it into columns of nearly equal length. I seem to end up at your blog posts more often than Adobe LiveDocs...
-Kyle
Excellent post Ben, very simple solution. As Kyle already mentioned, in general your blog is way more useful and interesting than the stuff on Adobe's website. You could lose some of the comments in the scripts though, but hey, that's just my opinion.