Understanding The TrimWhitespace() Function In Lucee CFML
The other day, when I was looking into which whitespace characters are removed by trim(), I came across a Lucee CFML function that I hadn't seen before: trimWhitespace(). The function doesn't have an in-depth description; and, looking at the Java code didn't immediately clarify the function's behavior. As such, I wanted to try it out for myself in order to see if the function might be useful to me in the future.
To start building up a mental model, I created a <cfsavecontent> buffer that combined various whitespace and non-space characters in various orders. However, whenever I went to save the file, SublimeText kept trying to trim some of the spaces (which is what I want it to do in most cases). So, instead of using whitespace directly, I used some placeholder characters:
+ → Space (Chr 32)
~ → Tab (Chr 9)
Then, I replaced these with the proper whitespace character before calling trimWhitespace():
<cfsavecontent variable="buffer">
++~++
+__+~+__~+~__~~__~++++
+++++
+~
+__++__+~+__~+~__~~__~
~++++
</cfsavecontent>
<cfscript>
	cleaned = buffer
		.replace( "+", chr( 32 ), "all" )
		.replace( "~", chr( 9 ), "all" )
		.trimWhitespace()
		.replace( chr( 9 ), "T", "all" )
		.replace( chr( 10 ), "N", "all" )
		.replace( chr( 32 ), "S", "all" )
	;
	echo( cleaned );
</cfscript>
As you can see, I have all manner of whitespace character combinations. And, when we run this Lucee CFML code, we get the following output:
N__S__T__T__N__S__S__T__T__N
After going back-and-forth between the input and the output, I think I finally understand the rules:
Any series of whitespace characters that contains a Newline is collapsed down into a single Newline character.
Any series of whitespace characters that does not contain a Newline is collapsed down into the first whitespace character in the series.
Ironically, the trimWhitespace() function doesn't actually "trim" the string (leaving Newlines on both ends in my example). Really, it's "collapsing" whitespace, not trimming it. That said, I do like the fact that it reduces multiple newlines down into a single newline. I can see that being helpful in various text-processing workflows.
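To make the two rules concrete, here is a rough sketch of a user-defined function that applies them with regular expressions. The name collapseWhitespace() is made up for illustration, and the sketch assumes that the only whitespace involved is Space, Tab, and Newline (the three characters exercised in the test above). It is based purely on the observed output, not on how Lucee actually implements trimWhitespace().

```cfml
<cfscript>

	// Hypothetical helper (not part of Lucee). It mimics the two rules inferred
	// from the output above and ASSUMES the only whitespace characters in play
	// are Space (32), Tab (9), and Newline (10).
	function collapseWhitespace( required string input ) {

		// Rule 1: a whitespace run that contains a Newline becomes one Newline.
		var result = reReplace( input, "[ \t\n]*\n[ \t\n]*", chr( 10 ), "all" );

		// Rule 2: whatever runs remain are Newline-free runs of Spaces and Tabs;
		// keep only the first character of each run (back-reference \1).
		result = reReplace( result, "([ \t])[ \t]+", "\1", "all" );

		return result;

	}

	// A small sample mixing both kinds of runs.
	sample = "a" & chr( 32 ) & chr( 9 ) & chr( 10 ) & chr( 10 ) & "b" & chr( 9 ) & chr( 32 ) & "c";

	// Compare the sketch against the built-in function.
	echo( compare( collapseWhitespace( sample ), trimWhitespace( sample ) ) == 0 );

</cfscript>
```

If the rules as stated are right, the sketch and the built-in function should agree on inputs like this one. Any mismatch would point to a whitespace character (a carriage return, for example) that the built-in handles and this sketch does not.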
Reader Comments
The Java source for Lucee's trimWhitespace() function is available at:
https://github.com/lucee/Lucee/blob/8554dddfffcdc5fdb0c4d9f298c61bc0d6c837d2/core/src/main/java/lucee/runtime/functions/string/TrimWhiteSpace.java#L9
It looks like it filters some common 7-bit ASCII whitespace characters, but not Unicode whitespace or the non-breaking space (NBSP; code point 160).
I've experienced some abuses where Unicode "thin and hair spaces" or "zero-width space/non-joiner/joiner" characters are used. (Comment form spammers attempt to bypass filters by adding some of these non-visible characters in the middle of spammy phrases.)
@James,
I love that we can see Lucee's code! Such a benefit of having it open-source. I can't tell you how many times I've wanted to see what Adobe ColdFusion is doing behind the scenes!
As a related code kata, I've created some additional trimming functions, blockTrim(), inlineTrim(), and trailingTrim(): www.bennadel.com/blog/4635-creating-blocktrim-inlinetrim-and-trailingtrim-functions-in-coldfusion.htm