Which ASCII Characters Does urlEncodedFormat() Escape In ColdFusion
urlEncodedFormat() is one of those functions that I've been using forever; but, when I stop and think about it, I'm not 100% sure what it actually does. I mean, I know that it prepares a value to be used in a URL; but I don't think I've ever actually read the documentation on it. And, I've definitely never experimented with it. As such, I thought I would do a little "note to self" blog post and see what actually happens when I apply urlEncodedFormat() to individual characters.
This experiment is simple - loop over each character, apply urlEncodedFormat(), and see if the resultant value is different. If so, it means that urlEncodedFormat() encoded the value.
<cfscript>
	// NOTE: Only going between 32 and 126 because urlEncodedFormat() appears to
	// encode all control characters as well as anything above 127 (inclusive).
	for ( i = 32 ; i <= 126 ; i++ ) {
		charValue = chr( i );
		escapedValue = urlEncodedFormat( charValue, "utf-8" );
		// If the two values don't match, it means that urlEncodedFormat() is
		// escaping the value. compare() returns 0 (falsy) on an exact match.
		if ( compare( charValue, escapedValue ) ) {
			writeOutput( "#i# ... #charValue# ... #escapedValue#<br />" );
		}
	}
</cfscript>
I'm only looping from 32 to 126 because urlEncodedFormat() seems to encode all control characters (most of which are 0-31) and all characters at or above 127. So, for the sake of the demo, I've limited it to the area of the basic ASCII set where things are interesting.
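If you want to double-check that assumption rather than take my word for it, here's a rough sketch that walks the characters outside of the 32-126 range and collects any that come back unchanged. The unchangedCodes variable is just a name I made up for this sketch; and, starting at 1 instead of 0 is an assumption on my part to sidestep any chr( 0 ) edge-case.

```cfml
<cfscript>
	// Sketch: look for characters OUTSIDE the 32-126 range that come back from
	// urlEncodedFormat() unchanged. Starting at 1 (not 0) is an assumption made
	// to sidestep any chr( 0 ) edge-case.
	unchangedCodes = [];
	for ( i = 1 ; i <= 255 ; i++ ) {
		// Skip the printable range that the main demo already covers.
		if ( ( i >= 32 ) && ( i <= 126 ) ) {
			continue;
		}
		charValue = chr( i );
		// compare() returns 0 when the two strings are an exact match.
		if ( ! compare( charValue, urlEncodedFormat( charValue, "utf-8" ) ) ) {
			arrayAppend( unchangedCodes, i );
		}
	}
	writeOutput( "Unchanged outside 32-126: #arrayToList( unchangedCodes )#" );
</cfscript>
```

If the list comes back empty, then every character outside of the demo range was encoded, at least on the engine you ran it on.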
When we run the above code, we get the following output:
32 ... ... %20
33 ... ! ... %21
34 ... " ... %22
35 ... # ... %23
36 ... $ ... %24
37 ... % ... %25
38 ... & ... %26
39 ... ' ... %27
40 ... ( ... %28
41 ... ) ... %29
42 ... * ... %2A
43 ... + ... %2B
44 ... , ... %2C
45 ... - ... %2D
46 ... . ... %2E
47 ... / ... %2F
58 ... : ... %3A
59 ... ; ... %3B
60 ... < ... %3C
61 ... = ... %3D
62 ... > ... %3E
63 ... ? ... %3F
64 ... @ ... %40
91 ... [ ... %5B
92 ... \ ... %5C
93 ... ] ... %5D
94 ... ^ ... %5E
95 ... _ ... %5F
96 ... ` ... %60
123 ... { ... %7B
124 ... | ... %7C
125 ... } ... %7D
126 ... ~ ... %7E
As you can see, urlEncodedFormat() escaped every non-alphanumeric character in the range. Which is, ironically, exactly what the documentation says:
Generates a URL-encoded string. For example, it replaces spaces with %20, and non-alphanumeric characters with equivalent hexadecimal escape sequences. Passes arbitrary strings within a URL (ColdFusion automatically decodes URL parameters that are passed to a page).
Ok - this all makes sense now. My mental model has been updated.
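To put the updated mental model to use, here's a minimal sketch of encoding a single query-string value. The search.cfm page, the q parameter, and the variable names are all hypothetical; the point is only that the value gets encoded, not the URL as a whole.

```cfml
<cfscript>
	// Hypothetical values, for illustration only.
	searchTerm = "fish & chips";
	// Encode the VALUE, not the whole URL, so that the "&" inside the term is
	// not mistaken for a parameter delimiter.
	searchLink = "./search.cfm?q=" & urlEncodedFormat( searchTerm, "utf-8" );
	writeOutput( searchLink );
	// Based on the output above (space = %20, & = %26), this should render:
	// ./search.cfm?q=fish%20%26%20chips
</cfscript>
```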
Reader Comments
A comparison with EncodeForURL() (CF10+) might be an interesting experiment as well.
@Sean,
Oooh, most excellent suggestion. I actually haven't played around with any of the new encoding methods. I think those are all based on OWASP standards; but, not sure. I'll take a look, thanks!
@Ben,
ColdFusion Security Resources at OWASP.org.
https://www.owasp.org/index.php/ColdFusion_Security_Resources
Out of curiosity, I used the above code and swapped out urlEncodedFormat( charValue, "utf-8" ) for encodeForUrl( charValue ), and the results were the same except for chr(32)...
@Tony,
Awesome - thanks for doing that. I'm surprised that it doesn't replace the space with a "+".
@Ben,
You know what, I was looking at it wrong. It indeed does! Oops ;)
@Tony,
Team work! High-five!
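For anyone who wants to reproduce the comparison from this thread, here's a rough sketch that runs both functions side-by-side and only reports the characters on which they disagree. It assumes ColdFusion 10+ (where encodeForUrl() is available); the legacyValue and newerValue variable names are just made up for the sketch.

```cfml
<cfscript>
	// Assumes ColdFusion 10+, where encodeForUrl() is available.
	for ( i = 32 ; i <= 126 ; i++ ) {
		charValue = chr( i );
		legacyValue = urlEncodedFormat( charValue, "utf-8" );
		newerValue = encodeForUrl( charValue );
		// Only report the characters on which the two functions disagree.
		if ( compare( legacyValue, newerValue ) ) {
			writeOutput( "#i# ... #legacyValue# ... #newerValue#<br />" );
		}
	}
</cfscript>
```

Going by the findings above, the space should be the only row that shows up; but, other engines or versions may differ.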
This whole urlEncodedFormat() exploration was brought on by the fact that I've recently run into some issues with encoding Amazon S3 object keys in pre-signed URL generation:
www.bennadel.com/blog/2656-url-encoding-amazon-s3-resource-keys-for-pre-signed-urls-in-coldfusion.htm
It looks like you have to undo some of the encoding that urlEncodedFormat() does when using Amazon S3.
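As a rough sketch of what that can look like: the code below encodes a key and then restores a handful of characters. Which characters to restore is an assumption on my part (the path separator plus the "unreserved" punctuation from RFC 3986); the linked post and the S3 docs are the place to confirm the exact set. The object key is hypothetical.

```cfml
<cfscript>
	// Hypothetical object key, for illustration only.
	objectKey = "reports/2014/my-file_v1.0.txt";
	encodedKey = urlEncodedFormat( objectKey, "utf-8" );
	// ASSUMPTION: restore the path separator and the RFC 3986 "unreserved"
	// punctuation. Confirm the exact set against the S3 signing rules.
	encodedKey = replace( encodedKey, "%2F", "/", "all" );
	encodedKey = replace( encodedKey, "%2D", "-", "all" );
	encodedKey = replace( encodedKey, "%2E", ".", "all" );
	encodedKey = replace( encodedKey, "%5F", "_", "all" );
	encodedKey = replace( encodedKey, "%7E", "~", "all" );
	writeOutput( encodedKey );
</cfscript>
```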
There's an extensive character comparison of old vs. new HTML, XML, URL and JS encoders here:
http://damonmiller.github.io/esapi4cf/tutorials/Encoding.html