Ben Nadel at cf.Objective() 2011 (Minneapolis, MN) with: James Spoonmore

OWASP Java Encoder Project Recommends Using Both URL and Attribute Encoding For HREF Attributes


In ColdFusion, whenever I'm constructing a dynamic URL, I always run the dynamic parts through the encodeForUrl() function in order to maintain URL integrity. But, if I then use that dynamic URL to populate an anchor tag's href attribute, I end up operating in nested contexts; and, I'm always left wondering if I should have used the encodeForHtmlAttribute() function instead. For more insight, I went to the OWASP Java Encoder documentation; and, according to their "Common Mistakes" section, I should actually be using both encoding methods.

Here's what the OWASP (Open Worldwide Application Security Project) Java Encoder documentation says:

There will be situations where you use a URL in different contexts. The most common one would be adding it to an href or src attribute of an <a> tag. In these scenarios, you should do URL encoding, followed by HTML attribute encoding.:

url = "https://site.com?data=" + urlencode(parameter)
<a href='attributeEncode(url)'>link</a>

To translate this into a ColdFusion / CFML scenario, it would look like this:

<cfscript>

	site = "https://www.bennadel.com/";
	keywords = """regular expressions""";

	// When constructing the URL, I will always encode the dynamic URI component in order
	// to escape any content that would cause a single URI component to be misinterpreted
	// as multiple components or other special characters.
	href = "https://www.google.com/search?q=#encodeForUrl( 'site:#site#' )#+#encodeForUrl( keywords )#";

</cfscript>
<cfoutput>

	<!--- Then, I would normally just include that URL into my href. --->
	<p>
		<a href="#href#">Search site</a>
	</p>
	<!---
		But, the OWASP Java Encoder project requests that the given URL be further encoded
		for an HTML attribute context.
	--->
	<p>
		<a href="#encodeForHtmlAttribute( href )#">Search site</a>
	</p>

</cfoutput>

Notice that in the latter <a> tag, I'm passing the href value through ColdFusion's encodeForHtmlAttribute() function. Doing this results in the following HTML output:

<p>
	<a href="https://www.google.com/search?q=site%3Ahttps%3A%2F%2Fwww.bennadel.com%2F+%22regular+expressions%22">Search site</a>
</p>
<p>
	<a href="https&#x3a;&#x2f;&#x2f;www.google.com&#x2f;search&#x3f;q&#x3d;site&#x25;3Ahttps&#x25;3A&#x25;2F&#x25;2Fwww.bennadel.com&#x25;2F&#x2b;&#x25;22regular&#x2b;expressions&#x25;22">Search site</a>
</p>

As you can see, the double-encoding leads to many more characters being escaped within the HTML source.
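To sanity-check that the second link still resolves to the same URL, here's a minimal sketch that reverses the attribute encoding on the server. It assumes that decodeForHtml() is available in your CFML engine and that it roughly approximates the character-reference decoding that a browser performs when it parses an attribute value. I haven't verified that the two behave identically in every edge case.

```cfml
<cfscript>

	// Same URL construction as in the demo above.
	site = "https://www.bennadel.com/";
	keywords = """regular expressions""";
	href = "https://www.google.com/search?q=#encodeForUrl( 'site:#site#' )#+#encodeForUrl( keywords )#";

	// Encode for the attribute context, then decode the HTML entities again. The decode
	// step is an assumed stand-in for what the browser's HTML parser does.
	encodedHref = encodeForHtmlAttribute( href );
	decodedHref = decodeForHtml( encodedHref );

	// compare() is case-sensitive and returns 0 when the two strings are identical.
	isSameUrl = ( compare( href, decodedHref ) == 0 );

	writeOutput( "Round-trip matches: #isSameUrl#" );

</cfscript>
```

If the round-trip matches, the attribute encoding only changes how the URL is spelled in the HTML source, not the URL that the browser ends up requesting.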

The OWASP Java Encoder documentation doesn't offer much insight into what vulnerability is exposed in the former approach and then mitigated in the latter approach. It says only that the dynamic values are technically operating in nested contexts (a URL inside an attribute) and should therefore be escaped for both contexts.

I tried to get ChatGPT to explain it to me, and all it kept saying was that the browser might interpret %22 as an early termination of the quoted attribute value. However, no modern browser that I have access to actually does this. As such, I could not get ChatGPT to give me a reproducible XSS attack that works when only encodeForUrl() is applied and that is then mitigated when encodeForHtmlAttribute() is subsequently applied.

I kept telling ChatGPT that this wasn't reproducible (with the examples it was giving me); and we just kept chatting around in circles. Honestly, this is pretty typical of my ChatGPT experience.

To be clear, I always use the encodeForHtmlAttribute() function when I'm embedding untrusted content within an attribute. For example, if I have to populate the value attribute of an input form field:

<input value="#encodeForHtmlAttribute( value )#">

Or, when I'm outputting the content of an Open Graph (OG) meta tag:

<meta property="og:title" content="#encodeForHtmlAttribute( value )#">

The usage here is critical because the given value could be anything. The value doesn't have to be malicious—simply having a value with an embedded quote character is sufficiently problematic and warrants the use of encodeForHtmlAttribute().
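As a quick illustration of the embedded-quote problem, here's a minimal sketch that renders the same value with and without encoding. The value itself is a hypothetical example that I made up for this demo.

```cfml
<cfscript>

	// Hypothetical, non-malicious value that happens to contain a double-quote.
	value = "5"" floppy disk";

</cfscript>
<cfoutput>

	<!--- Unencoded: the embedded quote closes the attribute early, truncating the value. --->
	<input value="#value#">

	<!--- Encoded: the quote is emitted as a character reference and stays inside the value. --->
	<input value="#encodeForHtmlAttribute( value )#">

</cfoutput>
```

In the first input, the browser ends the value attribute at the embedded quote and treats the remaining words as stray attributes. In the second input, the full value survives.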

My lack of clarity only comes from the use case of a dynamic href or src attribute in which I'm explicitly composing the URL with dynamic parts. My core question is, once I use the encodeForUrl() function, is the resultant content still able to break out of the subsequent attribute context?
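One way to poke at this question is to check whether the characters that matter inside a quoted attribute survive a trip through encodeForUrl(). The following is a rough sketch, not a security proof. The list of characters is my own assumption about what is "risky" here, and I'd expect each one to come back percent-encoded, but I haven't verified that across every CFML engine and version.

```cfml
<cfscript>

	// Characters that are significant inside a quoted HTML attribute value (assumed list).
	riskyChars = [ """", "'", "<", ">", "&" ];

	for ( rawChar in riskyChars ) {

		// If encodeForUrl() escapes the character, the encoded form should no longer
		// contain the raw character.
		encoded = encodeForUrl( rawChar );
		survives = ( find( rawChar, encoded ) > 0 );

		writeOutput( "#encodeForHtml( rawChar )# => #encodeForHtml( encoded )# (raw character survives: #survives#)<br>" );

	}

	// The static parts of the URL are a different story: this "&" separator is never
	// passed through encodeForUrl(), so it reaches the attribute as a raw ampersand.
	href = "https://www.google.com/search?q=#encodeForUrl( 'cfml' )#&hl=#encodeForUrl( 'en' )#";
	hasRawAmpersand = ( find( "&", href ) > 0 );

	writeOutput( "Raw ampersand in URL: #hasRawAmpersand#" );

</cfscript>
```

The second half of the sketch points at the part that encodeForUrl() never sees: the delimiters that I type into the URL myself. A raw ampersand is where HTML character-reference parsing begins, so this is at least one spot where the attribute encoding does something that the URL encoding does not.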

Is this possibly a hold-over from some really old browsers? After all, the OWASP Java Encoder project has been around for a really long time (I believe that the OWASP ESAPI project, which is the predecessor to the Encoder project, came out in 2006).

What are other people doing? Are y'all double-encoding your dynamic URLs in your href attributes?


Reader Comments


One of the things that makes this topic all the more confusing is that while the OWASP Java Encoder project says you should double-encode href attributes in the "common mistakes" section, they don't actually do this in any of their examples.

If you look at their GitHub Wiki (see section 2), you'll see that they only encode the URL parameter in their "rest parameters" section.


I believe in love. I believe in compassion. I believe in human rights. I believe that we can afford to give more of these gifts to the world around us because it costs us nothing to be decent and kind and understanding. And, I want you to know that when you land on this site, you are accepted for who you are, no matter how you identify, what truths you live, or whatever kind of goofy shit makes you feel alive! Rock on with your bad self!
Ben Nadel