OWASP Java Encoder Project Recommends Using Both URL and Attribute Encoding For HREF Attributes
In ColdFusion, whenever I'm constructing a dynamic URL, I always run the dynamic parts through the encodeForUrl()
function in order to maintain URL integrity. But, if I then use that dynamic URL to populate an anchor tag's href
attribute, I end up operating in nested contexts; and, I'm always left wondering if I should have used the encodeForHtmlAttribute()
function instead. For more insight, I went to the OWASP Java Encoder documentation; and, according to their "Common Mistakes" section, I should actually be using both encoding methods.
Here's what the OWASP (Open Worldwide Application Security Project) Java Encoder documentation says:
There will be situations where you use a URL in different contexts. The most common one would be adding it to an href or src attribute of an <a> tag. In these scenarios, you should do URL encoding, followed by HTML attribute encoding:
url = "https://site.com?data=" + urlencode(parameter)
<a href='attributeEncode(url)'>link</a>
To translate this into a ColdFusion / CFML scenario, it would look like this:
<cfscript>
	site = "https://www.bennadel.com/";
	keywords = """regular expressions""";
	// When constructing the URL, I will always encode the dynamic URI component in order
	// to escape any content that would cause a single URI component to be misinterpreted
	// as multiple components or other special characters.
	href = "https://www.google.com/search?q=#encodeForUrl( 'site:#site#' )#+#encodeForUrl( keywords )#";
</cfscript>
<cfoutput>
	<!--- Then, I would normally just include that URL into my href. --->
	<p>
		<a href="#href#">Search site</a>
	</p>
	<!---
		But, the OWASP Java Encoder project requests that the given URL be further encoded
		for an HTML attribute context.
	--->
	<p>
		<a href="#encodeForHtmlAttribute( href )#">Search site</a>
	</p>
</cfoutput>
Notice that in the latter <a> tag, I'm passing the href value through ColdFusion's encodeForHtmlAttribute() function. Doing this results in the following HTML output:
<p>
<a href="https://www.google.com/search?q=site%3Ahttps%3A%2F%2Fwww.bennadel.com%2F+%22regular+expressions%22">Search site</a>
</p>
<p>
<a href="https&#x3a;&#x2f;&#x2f;www.google.com&#x2f;search&#x3f;q&#x3d;site&#x25;3Ahttps&#x25;3A&#x25;2F&#x25;2Fwww.bennadel.com&#x25;2F&#x2b;&#x25;22regular&#x2b;expressions&#x25;22">Search site</a>
</p>
As you can see, the double-encoding leads to many more characters being escaped within the HTML source.
The OWASP Java Encoder documentation doesn't offer much insight into what vulnerability is exposed in the former approach and then mitigated in the latter approach. It only says that the dynamic values are technically operating in nested contexts (a URL inside an attribute); and therefore, should be escaped for both contexts.
I tried to get ChatGPT to explain it to me, and all it kept saying was that the browser might interpret %22 as an early termination of the quoted attribute value. However, it seems that no modern browser (that I have access to) will actually do such a thing. As such, I could not get ChatGPT to give me a reproducible XSS attack when only encodeForUrl() was applied (and which was then mitigated when encodeForHtmlAttribute() was subsequently applied).
I kept telling ChatGPT that this wasn't reproducible (with the examples it was giving me); and we just kept chatting around in circles. Honestly, this is pretty typical of my ChatGPT experience.
To be clear, I always use the encodeForHtmlAttribute() function when I'm embedding untrusted content within an attribute. For example, if I have to populate the value attribute of an input form field:
<input value="#encodeForHtmlAttribute( value )#">
Or, when I'm outputting the content of an Open Graph (OG) meta tag:
<meta property="og:title" content="#encodeForHtmlAttribute( value )#">
The usage here is critical because the given value could be anything. The value doesn't have to be malicious; simply having a value with an embedded quote character is sufficiently problematic and warrants the use of encodeForHtmlAttribute().
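As a minimal sketch of that failure mode (the sample value below is made up for illustration), consider a perfectly innocent value that happens to contain double quotes:

```cfml
<cfscript>
	// Hypothetical user-provided value; not malicious, it just contains quotes.
	value = 'Ben "Jamin" Nadel';
</cfscript>
<cfoutput>
	<!--- Unencoded: the first embedded quote terminates the attribute value early. --->
	<input value="#value#">
	<!--- Encoded: the quotes are emitted as character references, so the attribute stays intact. --->
	<input value="#encodeForHtmlAttribute( value )#">
</cfoutput>
```

In the first input, the browser ends the value right after "Ben" and treats the remainder as stray attribute text. In the second input, the whole string survives as the field's value.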
My lack of clarity only comes from the use case of a dynamic href or src attribute in which I'm explicitly composing the URL with dynamic parts. My core question is, once I use the encodeForUrl() function, is the resultant content still able to break out of the subsequent attribute context?
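One way to probe this question empirically is to feed encodeForUrl() an input packed with the characters that matter in an attribute context and then look for any that survive. This is only a rough sketch: the list of "risky" characters and the sample input are my own assumptions, and the result should be verified on whichever CFML engine and version is actually in use.

```cfml
<cfscript>
	// Assumed list of characters that could close a quoted attribute, open a tag,
	// or start a character reference.
	riskyChars = [ '"', "'", "<", ">", "&" ];
	// Hypothetical hostile input that contains every one of them.
	input = '" onmouseover="alert(1)' & "'<>&";
	encoded = encodeForUrl( input );

	// Collect any risky characters that are still present after URL-encoding.
	survivors = riskyChars.filter( function( riskyChar ) {
		return ( find( riskyChar, encoded ) > 0 );
	} );

	writeOutput( "Encoded: " & encodeForHtml( encoded ) & "<br>" );
	writeOutput( "Risky characters still present: " & survivors.len() );
</cfscript>
```

If the survivor count comes back as zero, it would suggest that the URL-encoded component, on its own, cannot terminate the attribute. Note that this only covers the parts that were passed through encodeForUrl(); any static portions of the URL (such as an & between query string parameters) are not covered by this check.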
Is this possibly a hold-over from some really old browsers? After all, the OWASP Java Encoder project has been around for a really long time (I believe that the OWASP ESAPI project, which is the predecessor to the Encoder project, came out in 2006).
What are other people doing? Are y'all double-encoding your dynamic URLs in your href attributes?
Reader Comments
One of the things that makes this topic all the more confusing is that while the OWASP Java Encoder project says you should double-encode href attributes in the "common mistakes" section, they don't actually do this in any of their examples.
If you look at their GitHub Wiki (see section 2), you'll see that they only encode the URL parameter in their "rest parameters" section.
I thought I had this in my post somewhere, but I guess not. Here are the Java Docs for the Encoder class, which explicitly list all of the characters that get encoded in each method:
https://javadoc.io/doc/org.owasp.encoder/encoder/latest/index.html