Always Include Charset With fileRead() In ColdFusion
Lately, I've been having trouble with Russian spam being posted to this blog. You all don't see it because it goes through comment moderation first; but it shouldn't even be getting that far - it should be getting blocked by my automatic content evaluation. This morning, I finally started digging through my code to find the logic gap and realized that the culprit was a problematic fileRead() call. Unfortunately, this code pre-dates my understanding of character encoding, and was missing the charset argument.
When you post a comment to this blog, before anything else significant happens, I run the comment through a whole lot of Regular Expression (RegEx) pattern matching. This barrage of patterns has been built up over time in response to the spam that I see posted. I maintain these patterns in a .txt file in which each line represents an individual RegEx pattern source.
For example, a portion of this file looks like this:
viagra|cialis|sildenafil|tadalafil
printer.?repair
laptop.?battery
ugg.?(boot|shoe)
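To make the flow concrete, here is a minimal sketch of how compiled patterns like these might be applied to an incoming comment. The isSpam() function and its arguments are hypothetical names for illustration, not the actual code behind this blog; the sketch assumes that each line of the file has already been compiled into a java.util.regex.Pattern object.

```cfml
<cfscript>
	// Hypothetical helper: returns true if any of the compiled Pattern
	// objects finds a match anywhere in the given comment text.
	function isSpam( required string commentText, required array patterns ) {

		for ( var pattern in patterns ) {

			// Matcher.find() searches for the pattern anywhere in the input,
			// rather than requiring the whole input to match.
			if ( pattern.matcher( javaCast( "string", commentText ) ).find() ) {
				return( true );
			}

		}

		return( false );

	}
</cfscript>
```

With the "(?i)ugg.?(boot|shoe)" pattern from above, a comment such as "Cheap UGG boots here" would be flagged, since the match is case-insensitive and .? allows for the space.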
As I've started to get Russian spam, I've been adding Russian-language patterns to this text file. Those patterns have been working fine in my local development environment, which is a *nix-based Docker container. But once I deployed them to production - a Windows-based VPS - they stopped working.
Here's a snippet of my code that is loading the patterns from the .txt file during ColdFusion application initialization:
component {

	// ... truncated code ...

	private array function loadAndCompilePatterns( required string filepath ) {

		var patterns = fileRead( filepath )
			.listToArray( chr( 13 ) & chr( 10 ) )
			.map(
				( patternText ) => {

					var pattern = createObject( "java", "java.util.regex.Pattern" )
						.compile( "(?i)#patternText#" )
					;

					return( pattern );

				}
			)
		;

		return( patterns );

	}

}
Notice that I have no charset included with my fileRead() invocation:
fileRead( filepath )
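In my Docker container, the Russian characters worked fine. But once this ColdFusion code made its way to the Windows server, it seems that the Russian characters were no longer being decoded properly. Presumably, when no charset is provided, fileRead() falls back to a platform-dependent default, and the default on the Windows server was not UTF-8. Nothing errored; the patterns simply stopped matching the incoming spam.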
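This kind of mismatch can be reproduced without a file. As a rough sketch, the snippet below takes the UTF-8 bytes of a Russian word and decodes them twice: once as UTF-8 and once as windows-1252. The windows-1252 charset is only an assumed stand-in for whatever default the Windows server was using; I haven't confirmed the actual value. The word is built with chr() so that the example doesn't depend on the encoding of the source file itself.

```cfml
<cfscript>
	// Build the word "привет" from code points to avoid source-file encoding issues.
	word = chr( 1087 ) & chr( 1088 ) & chr( 1080 ) & chr( 1074 ) & chr( 1077 ) & chr( 1090 );

	// These are the bytes that would sit on disk in a UTF-8 encoded file.
	bytes = charsetDecode( word, "utf-8" );

	// Decoding with the matching charset recovers the original characters.
	decodedAsUtf8 = charsetEncode( bytes, "utf-8" );

	// Decoding with a mismatched charset (assumed here to be windows-1252)
	// turns each byte into its own character, producing a different string.
	decodedAsWindows1252 = charsetEncode( bytes, "windows-1252" );

	// The mis-decoded text still compiles as a RegEx pattern without error,
	// but it no longer matches the word that it was meant to catch.
	Pattern = createObject( "java", "java.util.regex.Pattern" );
	goodMatch = Pattern.compile( "(?i)#decodedAsUtf8#" ).matcher( javaCast( "string", word ) ).find();
	badMatch = Pattern.compile( "(?i)#decodedAsWindows1252#" ).matcher( javaCast( "string", word ) ).find();
</cfscript>
```

Here, goodMatch should come back true and badMatch false, which lines up with the symptom: no exception, just patterns that quietly stop catching anything.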
To fix this, I just included utf-8 in the fileRead() call:
fileRead( filepath, "utf-8" )
With this update, my ColdFusion code - on the Windows server - is able to read in the Russian characters properly and compile the Java Pattern objects, and is now successfully blocking Russian spam before it even gets to the comment moderation step.
Long story short - always include the charset argument when you are performing a fileRead() operation in ColdFusion. In fact, any time you are reading or writing text data, you should include the charset.
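As a minimal sketch of that advice, the round trip below passes the same explicit charset to both fileWrite() and fileRead(). The temp file and the sample pattern text are just for illustration.

```cfml
<cfscript>
	// Illustrative temp file; any writable path would do.
	filepath = getTempFile( getTempDirectory(), "chr" );

	// A Cyrillic sample ("спам"), built with chr() to avoid source-file encoding issues.
	patternText = chr( 1089 ) & chr( 1087 ) & chr( 1072 ) & chr( 1084 );

	// Write and read using the same explicit charset, so that the result
	// doesn't depend on whatever default the server happens to use.
	fileWrite( filepath, patternText, "utf-8" );
	roundTripped = fileRead( filepath, "utf-8" );

	// Clean up the illustrative temp file.
	fileDelete( filepath );
</cfscript>
```

Because both calls name the charset, roundTripped should hold the same four characters as patternText on any platform.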
Reader Comments
Thank you for this! I had some Japanese encoding go weird but everything worked fine locally. I also had to apply this to fileWrite() as well. Figured it had something to do with windows server.
@Tyler,
My pleasure! It's one of those super subtle bugs because it doesn't "break", per se, it just doesn't work 😆