IMPORTANT UPDATE: XML Parsing Is WAY Faster Than ColdFusion Custom Tags

By Ben Nadel

Published 2008-09-04 in ColdFusion — Comments (16)

Earlier today, I posted about how ColdFusion custom tags executed much faster than XML parsing. To make the example more general, I was using XmlSearch() with XPath to get at the XML nodes. I did this because that way, the nature of the XML document could be more variable, just like the nature of ColdFusion custom tags. Tony Petruzzi suggested that removing the XmlSearch() would help a bit. I assumed it would, but at lunch (just now) decided to give it a go.

Here is the updated tag that parses the XML and creates a comma separated values (CSV) file. Notice that rather than using XmlSearch(), I am using the pseudo-array that ColdFusion makes available in XML documents when you refer to XML nodes by tag name:

<!--- Check to see which tag mode we are executing. --->
<cfswitch expression="#THISTAG.ExecutionMode#">

	<cfcase value="Start">

		<!--- Set the path to our output file. --->
		<cfset THISTAG.FilePath = ExpandPath( "xml_data2.csv" ) />

	</cfcase>

	<cfcase value="End">

		<!--- Parse the XML that was generated in this tag. --->
		<cfset THISTAG.XmlData = XmlParse(
			Trim( THISTAG.GeneratedContent )
			) />

		<!---
			Create a string buffer to hold intermediary data so
			we don't have to write to the file just yet.
		--->
		<cfset THISTAG.Buffer = CreateObject(
			"java",
			"java.lang.StringBuffer"
			).Init()
			/>


		<!---
			Loop over rows using the pseudo-array that ColdFusion
			provides when referencing XML nodes by name.
		--->
		<cfloop
			index="THISTAG.RowIndex"
			from="1"
			to="#ArrayLen( THISTAG.XmlData.data.row )#"
			step="1">

			<!--- Get a reference to the current row. --->
			<cfset THISTAG.XmlRow = THISTAG.XmlData.data.row[ THISTAG.RowIndex ] />

			<!---
				Loop over values using the pseudo-array that
				ColdFusion provides when referencing XML nodes
				by name.
			--->
			<cfloop
				index="THISTAG.ValueIndex"
				from="1"
				to="#ArrayLen( THISTAG.XmlRow.value )#"
				step="1">

				<!--- Get a reference to the current value. --->
				<cfset THISTAG.XmlValue = THISTAG.XmlRow.value[ THISTAG.ValueIndex ] />

				<!---
					Add value to string buffer. Add a tab after
					each value (this will leave a tab at the end
					of every line, but I am worried about speed,
					not extra characters).
				--->
				<cfset THISTAG.Buffer.Append(
					JavaCast(
						"string",
						(
							THISTAG.XmlValue.XmlText &
							Chr( 9 )
						))
					) />

			</cfloop>


			<!--- Now that we added the values, add new line. --->
			<cfset THISTAG.Buffer.Append(
				JavaCast( "string", (Chr( 13 ) & Chr( 10 )) )
				) />

		</cfloop>


		<!---
			Our string buffer should contain our CSV data. Now,
			let's write that to the output file.
		--->
		<cffile
			action="write"
			file="#THISTAG.FilePath#"
			output="#THISTAG.Buffer.ToString()#"
			/>

		<!--- Reset the content. --->
		<cfset THISTAG.GeneratedContent = "" />

	</cfcase>

</cfswitch>

The previous version of this used to run at just over 13 seconds. This new version that uses pseudo-xml-arrays runs in about 800 milliseconds!

When I first saw this result, I just assumed something was going wrong. I renamed the CSV file (xml_data2.csv) and ran it again. But sure enough, it ran in a little over 700 milliseconds and the new file (xml_data2.csv) contained all 1,000 rows of data.

Holy Cow! As it turns out, XML Parsing blows the pants off of ColdFusion custom tags when it comes to performance. Obviously, there is going to be an eventual tradeoff as the XML parsing has to be done in-memory, but for 1000 rows, this was INSANELY fast. Two things:

I am shocked at how slow XmlSearch() is! This is good information to know. It was the XmlSearch() alone that added 13 seconds to the processing time in the previous example.
I am a little surprised at how slow ColdFusion custom tags seem to be, comparatively. Over 5 seconds to do what XML parsing did in milliseconds? That's kind of whack.
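
To make the two access styles concrete, here is a minimal sketch that reads the same nodes both ways. The sample document and the variable names (xmlData, rowNodes, rowCount, firstValue) are made up for illustration; only the data/row/value structure comes from the tag above. It shows the two styles side by side and does not measure anything:

```cfm
<!--- Hypothetical sample document; element names mirror the post. --->
<cfxml variable="xmlData">
	<data>
		<row><value>A1</value><value>B1</value></row>
		<row><value>A2</value><value>B2</value></row>
	</data>
</cfxml>

<!--- XPath approach: XmlSearch() returns an array of matching nodes. --->
<cfset rowNodes = XmlSearch( xmlData, "/data/row" ) />

<!--- Pseudo-array approach: reference child nodes by tag name. --->
<cfset rowCount = ArrayLen( xmlData.data.row ) />
<cfset firstValue = xmlData.data.row[ 1 ].value[ 1 ].XmlText />

<cfoutput>
	XmlSearch rows: #ArrayLen( rowNodes )#<br />
	Pseudo-array rows: #rowCount#<br />
	First value: #firstValue#
</cfoutput>
```

The pseudo-array only works when you know the tag names ahead of time, which is the generality I gave up by dropping XmlSearch().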

So anyway, sorry for misleading people in my last post. This makes me want to try an experiment where I recode my POI stuff using XML parsing rather than Custom Tags. I wonder if that would make it wicked fast.


Short link: https://bennadel.com/1342

Reader Comments

Raymond Camden Sep 4, 2008 at 3:23 PM

362 Comments

I'm doing some testing on this.

Did you notice that you cleared generatedContent in all 3 layers of your custom tags? That isn't necessary. Only the parent needs to do this. When I removed those lines from your two child tags, the processing time dropped dramatically. Not to < 1 second, but to about 2.2 or 2.5 seconds. About twice as quick.

p.s. An off topic recommendation. To test your code, I had to rename a bunch of "snippet_N.txt" files. This was confusing. In the future, could you provide a zip with the files named right? Also - your code all had headers with <---. No !. This made them show up in the output.

Bobbie Sep 4, 2008 at 4:11 PM

2 Comments

Good post - that graphic is kind of ... NSFW, though. I hope nobody saw it over my shoulder as I have a big CRT!

Dave DuPlantis Sep 4, 2008 at 4:13 PM

5 Comments

To echo some comments from the previous post, I'd recommend looking into XSLT as an additional option for this exercise. To touch on Ray's earlier point, for more complex parsing, while XSLT might be better suited to the task, some people may be more comfortable working solely in CF, making it preferable if the speeds are comparable.

If you put together an XSL file, really all you'll need to do in CF is use the XmlTransform function to get your CSV file.

I'd recommend w3schools.com for a quick intro to XSLT and XPath ... I used that site to learn enough about XSL to move data from Oracle to text files or Word docs. Unfortunately it was in Java (as was my related POI experience) and at a previous employer, so I have no code readily available to post, but I'm sure there are others who can post some good XSL files if you wanted more examples.
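
As a rough illustration of the XmlTransform() suggestion, here is a minimal sketch. The sample XML, the stylesheet, and the variable names (xmlData, xslt, csvData) are assumptions made up for this example, not code from the post or from the commenter. It writes comma-delimited values with one line per row:

```cfm
<!--- Hypothetical sample document; element names mirror the post. --->
<cfxml variable="xmlData">
	<data>
		<row><value>A1</value><value>B1</value></row>
		<row><value>A2</value><value>B2</value></row>
	</data>
</cfxml>

<!--- Illustrative stylesheet that emits plain text rather than XML. --->
<cfsavecontent variable="xslt">
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
	<xsl:output method="text" />
	<xsl:template match="/data">
		<xsl:for-each select="row">
			<xsl:for-each select="value">
				<xsl:value-of select="." />
				<!-- Comma between values, but not after the last one. -->
				<xsl:if test="position() != last()">
					<xsl:text>,</xsl:text>
				</xsl:if>
			</xsl:for-each>
			<!-- Literal line break at the end of each row. -->
			<xsl:text>
</xsl:text>
		</xsl:for-each>
	</xsl:template>
</xsl:stylesheet>
</cfsavecontent>

<!--- Apply the stylesheet; the result is a string. --->
<cfset csvData = XmlTransform( xmlData, Trim( xslt ) ) />
```

The resulting string could then be written out with cffile, the same way the custom tag writes its buffer. Whether this is faster than the pseudo-array loop was not tested in this thread.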

Tony Petruzzi Sep 4, 2008 at 4:56 PM

44 Comments

@Ben,

Awesome man, just awesome. I had a feeling it would be faster seeing how XPATH lookups are extremely slow in any language.

It's going to blow your mind how fast your POI utility is when you rewrite it.

Ben Nadel Sep 4, 2008 at 6:45 PM

15,996 Comments

@Ray,

Hmmm, when I remove the generated content clearing lines, I am not seeing any increase in speed. Of course, I wouldn't go so far as to say my DEV server is a powerful box :) If you go back and add IN the lines again, does it slow down?

Also, yeah, the code downloading is a bit hacky on the site, I'll admit it. It actually builds the code downloads based on the code in the actual blog post (I am not uploading any separate download file). Therefore, the Snippet.txt files can't have any meaningful name or ordering - they are in the same order as the code in the post. I'll put my thinking cap on to see if I can come up with anything better.

@Bobbie,

She's actually fully clothed and wearing a tube-top, you just can't see... shame shame, where is your mind ;)

@Dave,

Yeah, XSLT is cool. I have some limited experience with it, but from what I have seen it is cool. I tried to write a tutorial for my former company, if anyone is interested:

www.bennadel.com/index.cfm?dax=blog:952.view

@Tony,

This is good news, but not sure how I want to apply it just yet. The POI system I use doesn't use XML yet, so I am not worried about the XPath performance. However, it does heavily use ColdFusion custom tags; if I take those out, I might see some good performance. We'll see what I try.

Raymond Camden Sep 4, 2008 at 9:07 PM

362 Comments

When I ran your code as is, it actually took like 12-13 seconds on my machine, which I thought was rather beefy, but I was doing quite a bit at that time. But for me, the change was even more dramatic (down 10 seconds).

Ben Nadel Sep 5, 2008 at 8:33 AM

15,996 Comments

@Ray,

That's a pretty big difference in processing time! I wonder what it could be doing? I assume it is just resetting some internal buffer for each tag. What version of CF are you running? 8 I assume (me too).

Raymond Camden Sep 5, 2008 at 9:41 AM

362 Comments

Ye, 8.0.1.

Ben Nadel Sep 5, 2008 at 10:16 AM

15,996 Comments

Hmmmm. Not sure why it would be so different.

Bobbie Sep 5, 2008 at 10:58 AM

2 Comments

Thanks for the reply, I am really learning a lot from this site!

Elliott Sprehn Sep 5, 2008 at 12:32 PM

132 Comments

@Ben

You should try using arrayNew(1) and arrayAppend and finally arrayToList() instead of that StringBuffer.

People seem to think that StringBuffer is the "right way" to build up strings, but using an array and arrayToList(buffer,"") is actually faster!

I see about a 30% performance difference for large buffers.
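
A minimal sketch of the array-based approach described in this comment. The loop count, the row text, and the variable names (parts, output) are made up for illustration, and the code makes no timing claims of its own:

```cfm
<!--- Collect the pieces in a native ColdFusion array. --->
<cfset parts = ArrayNew( 1 ) />

<cfloop index="i" from="1" to="1000" step="1">
	<cfset ArrayAppend( parts, "Row " & i & Chr( 13 ) & Chr( 10 ) ) />
</cfloop>

<!--- Join with an empty delimiter to get one string. --->
<cfset output = ArrayToList( parts, "" ) />
```

To compare it against the StringBuffer version, wrap each variant in GetTickCount() calls and run both on the same data.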

Raymond Camden Sep 5, 2008 at 12:59 PM

362 Comments

Elliott, what you say makes sense to me, but I'm not seeing any speed increases with the default 1k row query Ben's data uses. Did you dramatically increase the size?

Ben Nadel Sep 5, 2008 at 1:39 PM

15,996 Comments

@Elliott,

You make a good point - I (and maybe others) do have a bit of a love affair with the String Buffer. I guess we have been made so afraid of string concatenation that it's just fear-based decisions.

However, at the end of the day, both examples use string buffer, so the comparison between XML and ColdFusion custom tags is still valid (I believe).

Bash Sep 5, 2008 at 4:32 PM

3 Comments

@Ray and Ben,

In reference to: "Hmmmm. Not sure why it would be so different."

Could it be the environment (Mac vs. Win)?

Ben Nadel Sep 5, 2008 at 4:35 PM

15,996 Comments

@Bash,

I am on Windows Server.

Elliott Sprehn Sep 5, 2008 at 5:15 PM

132 Comments

@Ray

Yes, the really noticeable difference is in big sets.

That beats the StringBuilder by 30% on my machine. If I bump it up to 8000 instead I see a difference more like 50-100% faster in some cases.

If you look at smaller, like 1000, iterations, then I see stuff like 15-22ms for the Buffer and 7-10ms for the array.

Even if you don't see noticeable differences on your machine for small cases, why use Java objects when CF provides you with a native solution anyway? :)

You also get the benefit of this code working on BD.NET, if that matters to you.

I think the really important thing here though is that coding hoops into your apps to use StringBuilder/StringBuffer is silly. For instance Fusebox uses a StringBuffer and a FakeStringBuffer.cfc to "work around" the fact that not all systems have it, which is silly, since they could have just used an array! :P
