The other day, someone contacted me needed help to speed up one of their files. They were writing eCommerce order data to a CSV file and the page kept throwing the error:
The request has exceeded the allowable time limit Tag: cffile
This is a case study on that piece of code and how it can be optimized.
PLEASE NOTE: Because I do not have this database, I cannot run this code. That means that it might be buggy. Just follow the concepts, not the exact syntax.
Here is the original code that was throwing ColdFusion timeout exceptions:
Launch code in new window » Download code as text file »
Looking at this code, there were several things that popped out at me immediately. The first thing was all the ColdFusion CFFile tags. In this case, a CFFile Append action was being called for every single row of the target CSV file. Writing data to the file system is a very costly procedure (when compared to how fast calculations will process). This is not something that we want to constantly be doing. I would suggest minimizing the number of times that we actually go to the file system (within reason).
Instead of using so many ColdFusion CFFile tags, I would recommend using a Java StringBuffer to build the entire data file output before writing it to disk. The beauty of the StringBuffer is that is can build enormous strings without the overhead cost of string concatenation. It does this by not actually building a single string, but instead, keeping track of the many parts of the constructed string. The final string is only built once at the end when the ToString() method is called (on the StringBuffer).
The second thing that drew my attention was the repeated database calls. The way it was constructed, ColdFusion had to query the database for every single order. This could be a huge cost depending on how many orders there are. Instead of doing it this way, I would recommend creating a single query that joins the order information to the order detail information. This is a bit of a tradeoff; on one hand, you are pulling down duplicate order headers, but on the other hand, you are going to save a lot in terms of database communication costs.
In my sample code below, I am using an INNER JOIN to join the orders and the cart information. I am assuming that this is a valid relationship (one requires the other), but if that is not, you would need to change the INNER JOIN to a LEFT OUTER JOIN and update the CFLoop logic a bit. Also, please note that I am using SELECT * only because I do not know the underlying database structure. As part of the optimization process, only pull down the data columns that are required.
Taking those two red flags into account, here is my suggested optimization (again, keep in mind that I could NOT run this to validate syntax and ideas):
Launch code in new window » Download code as text file »
In the final code, we have ONE database call and ONE CFFile tag. I would be shocked if this method was not faster. Hopefully, I can get some feedback (if this advice is taken). Also, on a final note, increasing the CFSetting RequestTimeOut did solve this problem; but, I don't think this would be necessary if the code itself was optimized.
Download Code Snippet ZIP File
Comments (19) | Post Comment | Ask Ben | Permalink | Other Searches | Print Page
Defining A CSS Selector That Requires A Multi-Class Union
Testing String Equality Of Any Length Happens Instantly In ColdFusion
The string buffer makes a huge difference. Years ago I optimized some ColdFusion code as much as possible in create a csv string and the time just grew relative with the number of records. Hundreds of records took an acceptable amount of time but something like 10k records would take a few minutes, which wasn't acceptable. Luckily CFDJ had an article about the string buffer right around that same time. That cut down large csv creation from minutes down to seconds.
Posted by Daniel Roberts on May 4, 2007 at 9:14 AM
a generic function along the same lines ;)
http://cfzen.instantspot.com/blog/index.cfm/2007/4/18/queryToCsv2-util-function-updated
Posted by Aaron Longnion on May 4, 2007 at 9:30 AM
I did something similar at my previous job but I used the ArrayAppend() method. I did a blog post a while back: http://www.philduba.com/index.cfm/2006/10/20/Improving-Performance-Concatenating-Large-Strings. The string buffer method is good to. In my testing there wasn't a discernible difference between the two methods and my manager was more willing to do a pure CF use than invoke Java components. Outside of the horrible queries in the reports, this helped reduce the report executing times by 30% and sometimes more.
Posted by Phil Duba on May 4, 2007 at 9:33 AM
ArrayAppend()... very interesting, I would not have thought of that. And then, what, you just did like an ArrayToList() at the end (with no delimiter)? That's cool that there is little difference.
It would be interesting to know what the StringBuffer is actually doing underneath the covers.
Posted by Ben Nadel on May 4, 2007 at 10:55 AM
I'd put #Chr(13)##Chr(10)# as the delimiter to get it on separate lines.
Posted by Phil Duba on May 4, 2007 at 11:01 AM
Oh right :) I totally wasn't even thinking straight.
Posted by Ben Nadel on May 4, 2007 at 11:13 AM
"It would be interesting to know what the StringBuffer is actually doing underneath the covers."
What I've read back when I first started using string buffer is that CF is creating a new variable for every cfset. That means if you are doing 10k concatenations CF is actually recreating a variable 10,000 times to perform the operations. The string buffer does the concatenation on single variable without the overhead of recreating the variable every time. Ok, that doesn't totally answer your question.
Posted by Daniel Roberts on May 4, 2007 at 11:19 AM
Thanks Ben,
This runs ten times faster....
Posted by Bruce Wrighter on May 4, 2007 at 11:55 AM
@Dan,
So you think it is still just storing it in a string variable? I am interested in the data structure underneath - is it a string, is it some sort of dynamic length collection?
@Bruce,
Sweeeeet! Glad I could help in some way :)
Posted by Ben Nadel on May 4, 2007 at 12:31 PM
One additional suggestion which will speed things up even more is to write the file progressively using <cffile action="append"...>.
The benefit of this is that CF doesn't retain the entire generated output in memory and thus doesn't need to manipulate that entire string object when it writes a new line. It is significantly faster for large queries.
Adding 'fixnewline="true"' removes the need to specifiy the #chr(13)# or #chr(10)# as the new line delimiter since CF adds the line break on the append.
-J
Posted by Jon Clausen on May 7, 2007 at 10:29 AM
Sorry. I'm a dork. I was only on my first cup of coffee and didn't read the case for using the string buffer correctly. :-) Your solution is better. My bad.
-J
Posted by Jon Clausen on May 7, 2007 at 11:00 AM
@Jon,
I think there might be a case for both situations. While File writing is a costly operation, you don't want to build massive amounts of data in the server's RAM as that will choke the machine. What might be good to use is a combination of BufferedFileOutput and StringBuffer. This level of optimization goes beyond my understanding of how all the file decorators works, but I suspect it would be the best of both worlds.
Posted by Ben Nadel on May 7, 2007 at 11:06 AM
There's a CFX tag CFX_QueryWrite which does a similar thing. I have used it since CF5 days. You can find it at Adobe Exchange.
And its very fast - takes the query name and a delimiter as a parameter.
Posted by DK on May 9, 2007 at 6:37 PM
Please correct me if I'm wrong but ColdFusion already presents the string buffer capability through cfsavecontent? Thats what I have used to solve a similar problem with the exact same results: a huge increase in speed.
Posted by Dave Nellis on Jul 25, 2007 at 6:50 PM
@Dave,
I am not sure how CFSaveContent is implemented under the covers. I am not sure how it would StringBuffer. I think it is just writing to an alternate content buffer. However, you will probably see speed improvements over standard string concatenation because the alternate content buffer probably acts in similar way for optimization???
Posted by Ben Nadel on Jul 26, 2007 at 7:00 AM
Hi there,
I've used a similar (**coughs** identical) method to create a csv file, but have ran afoul of Excel's 'helpful' cell formatting - I have columns from a dataset in the form of number-hyphen-number, eg. 12-1. When opening the CSV in Excel, it decides to format this to 12-Jan.
The CSV file itself is ok and the data remains 12-1. Any pointers as to how to force Excel to regard these cells as text as opposed to dates?
Posted by Paul McCombie on Mar 12, 2008 at 11:10 AM
@Paul,
I am pretty sure Excel has a Data Import Wizard. In that, you can select the columns to import and I believe format the data. Of course, this has to be done manually since this is a feature / issue with Excel, not with the CSV. The whole point of a CSV is that is has very little formatting.
Posted by Ben Nadel on Mar 12, 2008 at 11:20 AM
Yeh, that's what I was thinking. The page was producing an Excel worksheet using MS' Open XML nonsense, which allows you to format columns/rows/cells til your hearts' content. But the bl**dy client's decided (after it's finished) that they want it in CSV.
Posted by Paul McCombie on Mar 12, 2008 at 11:27 AM
@Paul,
If that's what they want, then that's what they want. Of course, that has implications that they have to deal with.
Posted by Ben Nadel on Mar 12, 2008 at 11:32 AM