Updated: Converting A ColdFusion Query To CSV Using QueryToCSV()

Posted May 27, 2008 at 7:59 AM

Tags: ColdFusion

Last week, I posted a ColdFusion user defined function that took a ColdFusion query object and turned it into a comma separated value (CSV) string. I had made heavy use of the Java StringBuffer to build the string. This however, required many method calls and extra string concatenation that apparently was slowing things down. This is a good example of what happens when you try to optimize something without fully weighing the pros and cons of the tools in use.

Using some of the feedback that I received on my function, I am posting an updated version of it here:

 Launch code in new window » Download code as text file »

  • <cffunction
  • name="QueryToCSV"
  • access="public"
  • returntype="string"
  • output="false"
  • hint="I take a query and convert it to a comma separated value string.">
  •  
  • <!--- Define arguments. --->
  • <cfargument
  • name="Query"
  • type="query"
  • required="true"
  • hint="I am the query being converted to CSV."
  • />
  •  
  • <cfargument
  • name="Fields"
  • type="string"
  • required="true"
  • hint="I am the list of query fields to be used when creating the CSV value."
  • />
  •  
  • <cfargument
  • name="CreateHeaderRow"
  • type="boolean"
  • required="false"
  • default="true"
  • hint="I flag whether or not to create a row of header values."
  • />
  •  
  • <cfargument
  • name="Delimiter"
  • type="string"
  • required="false"
  • default=","
  • hint="I am the field delimiter in the CSV value."
  • />
  •  
  • <!--- Define the local scope. --->
  • <cfset var LOCAL = {} />
  •  
  • <!---
  • First, we want to set up a column index so that we can
  • iterate over the column names faster than if we used a
  • standard list loop on the passed-in list.
  • --->
  • <cfset LOCAL.ColumnNames = [] />
  •  
  • <!---
  • Loop over column names and index them numerically. We
  • are working with an array since I believe its loop times
  • are faster than that of a list.
  • --->
  • <cfloop
  • index="LOCAL.ColumnName"
  • list="#ARGUMENTS.Fields#"
  • delimiters=",">
  •  
  • <!--- Store the current column name. --->
  • <cfset ArrayAppend(
  • LOCAL.ColumnNames,
  • Trim( LOCAL.ColumnName )
  • ) />
  •  
  • </cfloop>
  •  
  • <!--- Store the column count. --->
  • <cfset LOCAL.ColumnCount = ArrayLen( LOCAL.ColumnNames ) />
  •  
  •  
  • <!---
  • Now that we have our index in place, let's create
  • a string buffer to help us build the CSV value more
  • effiently.
  • --->
  • <cfset LOCAL.Buffer = CreateObject( "java", "java.lang.StringBuffer" ).Init() />
  •  
  • <!--- Create a short hand for the new line characters. --->
  • <cfset LOCAL.NewLine = (Chr( 13 ) & Chr( 10 )) />
  •  
  •  
  • <!--- Check to see if we need to add a header row. --->
  • <cfif ARGUMENTS.CreateHeaderRow>
  •  
  • <!--- Create array to hold row data. --->
  • <cfset LOCAL.RowData = [] />
  •  
  • <!--- Loop over the column names. --->
  • <cfloop
  • index="LOCAL.ColumnIndex"
  • from="1"
  • to="#LOCAL.ColumnCount#"
  • step="1">
  •  
  • <!--- Add the field name to the row data. --->
  • <cfset LOCAL.RowData[ LOCAL.ColumnIndex ] = """#LOCAL.ColumnNames[ LOCAL.ColumnIndex ]#""" />
  •  
  • </cfloop>
  •  
  • <!--- Append the row data to the string buffer. --->
  • <cfset LOCAL.Buffer.Append(
  • JavaCast(
  • "string",
  • (
  • ArrayToList(
  • LOCAL.RowData,
  • ARGUMENTS.Delimiter
  • ) &
  • LOCAL.NewLine
  • ))
  • ) />
  •  
  • </cfif>
  •  
  •  
  • <!---
  • Now that we have dealt with any header value, let's
  • convert the query body to CSV. When doing this, we are
  • going to qualify each field value. This is done be
  • default since it will be much faster than actually
  • checking to see if a field needs to be qualified.
  • --->
  •  
  • <!--- Loop over the query. --->
  • <cfloop query="ARGUMENTS.Query">
  •  
  • <!--- Create array to hold row data. --->
  • <cfset LOCAL.RowData = [] />
  •  
  • <!--- Loop over the columns. --->
  • <cfloop
  • index="LOCAL.ColumnIndex"
  • from="1"
  • to="#LOCAL.ColumnCount#"
  • step="1">
  •  
  • <!--- Add the field to the row data. --->
  • <cfset LOCAL.RowData[ LOCAL.ColumnIndex ] = """#Replace( ARGUMENTS.Query[ LOCAL.ColumnNames[ LOCAL.ColumnIndex ] ][ ARGUMENTS.Query.CurrentRow ], """", """""", "all" )#""" />
  •  
  • </cfloop>
  •  
  •  
  • <!--- Append the row data to the string buffer. --->
  • <cfset LOCAL.Buffer.Append(
  • JavaCast(
  • "string",
  • (
  • ArrayToList(
  • LOCAL.RowData,
  • ARGUMENTS.Delimiter
  • ) &
  • LOCAL.NewLine
  • ))
  • ) />
  •  
  • </cfloop>
  •  
  •  
  • <!--- Return the CSV value. --->
  • <cfreturn LOCAL.Buffer.ToString() />
  • </cffunction>

Now, rather than calling StringBuffer.Append() on every field value, I am build an array of fields in a row, then I join that array using ArrayToList() and append it to the ongoing string value that is held in the StringBuffer. So, I am still using the Java StringBuffer to hold the CSV value, but I am using an intermediary ColdFusion array to build up individual line values. I am also escaping any double-quotes that are in the query values - something which I was neglecting to do before. I have to say that this did greatly simplify the amount of code that was in the method. This pleases me; I love to see more simple solutions be more effective.

Some people did tell me that the StringBuffer was completely overkill and that they, in fact, were getting better results using just array-joining for both the fields and the rows. I found this hard to believe as the StringBuffer has proven useful so many times in the past. I had trouble believing that using just array joining would be the fastest method, and so, I did a little testing of my own. I re-wrote the above user defined method to use only arrays and no StringBuffer:

 Launch code in new window » Download code as text file »

  • <cffunction
  • name="QueryToCSV2"
  • access="public"
  • returntype="string"
  • output="false"
  • hint="I take a query and convert it to a comma separated value string.">
  •  
  • <!--- Define arguments. --->
  • <cfargument
  • name="Query"
  • type="query"
  • required="true"
  • hint="I am the query being converted to CSV."
  • />
  •  
  • <cfargument
  • name="Fields"
  • type="string"
  • required="true"
  • hint="I am the list of query fields to be used when creating the CSV value."
  • />
  •  
  • <cfargument
  • name="CreateHeaderRow"
  • type="boolean"
  • required="false"
  • default="true"
  • hint="I flag whether or not to create a row of header values."
  • />
  •  
  • <cfargument
  • name="Delimiter"
  • type="string"
  • required="false"
  • default=","
  • hint="I am the field delimiter in the CSV value."
  • />
  •  
  • <!--- Define the local scope. --->
  • <cfset var LOCAL = {} />
  •  
  • <!---
  • First, we want to set up a column index so that we can
  • iterate over the column names faster than if we used a
  • standard list loop on the passed-in list.
  • --->
  • <cfset LOCAL.ColumnNames = [] />
  •  
  • <!---
  • Loop over column names and index them numerically. We
  • are going to be treating this struct almost as if it
  • were an array. The reason we are doing this is that
  • look-up times on a table are a bit faster than look
  • up times on an array (or so I have been told).
  • --->
  • <cfloop
  • index="LOCAL.ColumnName"
  • list="#ARGUMENTS.Fields#"
  • delimiters=",">
  •  
  • <!--- Store the current column name. --->
  • <cfset ArrayAppend(
  • LOCAL.ColumnNames,
  • Trim( LOCAL.ColumnName )
  • ) />
  •  
  • </cfloop>
  •  
  • <!--- Store the column count. --->
  • <cfset LOCAL.ColumnCount = ArrayLen( LOCAL.ColumnNames ) />
  •  
  •  
  • <!--- Create a short hand for the new line characters. --->
  • <cfset LOCAL.NewLine = (Chr( 13 ) & Chr( 10 )) />
  •  
  • <!--- Create an array to hold the set of row data. --->
  • <cfset LOCAL.Rows = [] />
  •  
  •  
  • <!--- Check to see if we need to add a header row. --->
  • <cfif ARGUMENTS.CreateHeaderRow>
  •  
  • <!--- Create array to hold row data. --->
  • <cfset LOCAL.RowData = [] />
  •  
  • <!--- Loop over the column names. --->
  • <cfloop
  • index="LOCAL.ColumnIndex"
  • from="1"
  • to="#LOCAL.ColumnCount#"
  • step="1">
  •  
  • <!--- Add the field name to the row data. --->
  • <cfset LOCAL.RowData[ LOCAL.ColumnIndex ] = """#LOCAL.ColumnNames[ LOCAL.ColumnIndex ]#""" />
  •  
  • </cfloop>
  •  
  • <!--- Append the row data to the string buffer. --->
  • <cfset ArrayAppend(
  • LOCAL.Rows,
  • ArrayToList( LOCAL.RowData, ARGUMENTS.Delimiter )
  • ) />
  •  
  • </cfif>
  •  
  •  
  • <!---
  • Now that we have dealt with any header value, let's
  • convert the query body to CSV. When doing this, we are
  • going to qualify each field value. This is done be
  • default since it will be much faster than actually
  • checking to see if a field needs to be qualified.
  • --->
  •  
  • <!--- Loop over the query. --->
  • <cfloop query="ARGUMENTS.Query">
  •  
  • <!--- Create array to hold row data. --->
  • <cfset LOCAL.RowData = [] />
  •  
  • <!--- Loop over the columns. --->
  • <cfloop
  • index="LOCAL.ColumnIndex"
  • from="1"
  • to="#LOCAL.ColumnCount#"
  • step="1">
  •  
  • <!--- Add the field to the row data. --->
  • <cfset LOCAL.RowData[ LOCAL.ColumnIndex ] = """#Replace( ARGUMENTS.Query[ LOCAL.ColumnNames[ LOCAL.ColumnIndex ] ][ ARGUMENTS.Query.CurrentRow ], """", """""", "all" )#""" />
  •  
  • </cfloop>
  •  
  •  
  • <!--- Append the row data to the string buffer. --->
  • <cfset ArrayAppend(
  • LOCAL.Rows,
  • ArrayToList( LOCAL.RowData, ARGUMENTS.Delimiter )
  • ) />
  •  
  • </cfloop>
  •  
  •  
  • <!---
  • Return the CSV value by joining all the rows together
  • into one string.
  • --->
  • <cfreturn ArrayToList(
  • LOCAL.Rows,
  • LOCAL.NewLine
  • ) />
  • </cffunction>

As you can see, this uses ArrayToList() not only for the individual field values, but for all of the rows. Then, I put both of these to a basic speed test on several thousand rows of data:

 Launch code in new window » Download code as text file »

  • <!--- Create a new part query. --->
  • <cfset qPart = QueryNew(
  • "id, serial_number, price",
  • "cf_sql_integer, cf_sql_varchar, cf_sql_decimal"
  • ) />
  •  
  •  
  • <!--- Create some record. --->
  • <cfloop
  • index="intI"
  • from="1"
  • to="5000"
  • step="1">
  •  
  • <!--- Add a row to the query. --->
  • <cfset QueryAddRow( qPart ) />
  •  
  • <!--- Populate row with random data. --->
  • <cfset qPart[ "id" ][ qPart.RecordCount ] = JavaCast( "int", qPart.RecordCount ) />
  • <cfset qPart[ "serial_number" ][ qPart.RecordCount ] = JavaCast( "string", CreateUUID() ) />
  • <cfset qPart[ "price" ][ qPart.RecordCount ] = JavaCast( "float", RandRange( 1, 100 ) ) />
  •  
  • </cfloop>
  •  
  •  
  • <!--- Test the StringBuffer version. --->
  • <cftimer
  • type="outline"
  • label="Java String Buffer">
  •  
  • <!--- Get the CSV value. --->
  • <cfset strOutput = QueryToCSV(
  • qPart,
  • "id, serial_number, price"
  • ) />
  •  
  • Done.
  •  
  • </cftimer>
  •  
  •  
  • <!--- Test the array-only version. --->
  • <cftimer
  • type="outline"
  • label="ArrayToList() Only">
  •  
  • <!--- Get the CSV value. --->
  • <cfset strOutput = QueryToCSV2(
  • qPart,
  • "id, serial_number, price"
  • ) />
  •  
  • Done.
  •  
  • </cftimer>

I am happy to say that both of them executed in milliseconds, so no matter which way you go, you can't really go wrong. The StringBuffer method did, however, slightly outperform the ArrayToList() method over about 10 trials:

QueryToCSV() - ArrayToList() And StringBuffer

Average Time: 295 ms

QueryToCSV2() - ArrayToList() Only

Average Time: 373 ms

As you can see, we're really only talking fractions of a second, so very little difference. That being said, as much as the StringBuffer was slightly faster, definitely go with whatever version you feel more comfortable with - they are both quite fast.

Download Code Snippet ZIP File

Comments (1)  |  Post Comment  |  Ask Ben  |  Permalink  |  Other Searches  |  Print Page




Adobe ColdFusion 8.0.1 Update - Helping Programmers To Be Signifanctly Less Girlie - Download ColdFusion 8 Update 8.0.1 Now.

Reader Comments

Hi Ben,

Nice little research!... However I am really interested to know about the performance stats of POIUtility for this same scenario...

Posted by Dav R on May 27, 2008 at 9:04 PM


Post Comment  |  Ask Ben


Home   |   Web Log   |   ColdFusion   |   Projects   |   Resume   |   Job Form   |   Search   |   Contact
Epicenter Consulting - Custom Software Solutions for Business Evolution HostMySite.com - The Leader In ColdFusion Hosting