Updated: Converting A ColdFusion Query To CSV Using QueryToCSV()
Last week, I posted a ColdFusion user defined function that took a ColdFusion query object and turned it into a comma separated value (CSV) string. I had made heavy use of the Java StringBuffer to build the string. This however, required many method calls and extra string concatenation that apparently was slowing things down. This is a good example of what happens when you try to optimize something without fully weighing the pros and cons of the tools in use.
Using some of the feedback that I received on my function, I am posting an updated version of it here:
<cffunction
name="QueryToCSV"
access="public"
returntype="string"
output="false"
hint="I take a query and convert it to a comma separated value string.">
<!--- Define arguments. --->
<cfargument
name="Query"
type="query"
required="true"
hint="I am the query being converted to CSV."
/>
<cfargument
name="Fields"
type="string"
required="true"
hint="I am the list of query fields to be used when creating the CSV value."
/>
<cfargument
name="CreateHeaderRow"
type="boolean"
required="false"
default="true"
hint="I flag whether or not to create a row of header values."
/>
<cfargument
name="Delimiter"
type="string"
required="false"
default=","
hint="I am the field delimiter in the CSV value."
/>
<!--- Define the local scope. --->
<cfset var LOCAL = {} />
<!---
First, we want to set up a column index so that we can
iterate over the column names faster than if we used a
standard list loop on the passed-in list.
--->
<cfset LOCAL.ColumnNames = [] />
<!---
Loop over column names and index them numerically. We
are working with an array since I believe its loop times
are faster than that of a list.
--->
<cfloop
index="LOCAL.ColumnName"
list="#ARGUMENTS.Fields#"
delimiters=",">
<!--- Store the current column name. --->
<cfset ArrayAppend(
LOCAL.ColumnNames,
Trim( LOCAL.ColumnName )
) />
</cfloop>
<!--- Store the column count. --->
<cfset LOCAL.ColumnCount = ArrayLen( LOCAL.ColumnNames ) />
<!---
Now that we have our index in place, let's create
a string buffer to help us build the CSV value more
effiently.
--->
<cfset LOCAL.Buffer = CreateObject( "java", "java.lang.StringBuffer" ).Init() />
<!--- Create a short hand for the new line characters. --->
<cfset LOCAL.NewLine = (Chr( 13 ) & Chr( 10 )) />
<!--- Check to see if we need to add a header row. --->
<cfif ARGUMENTS.CreateHeaderRow>
<!--- Create array to hold row data. --->
<cfset LOCAL.RowData = [] />
<!--- Loop over the column names. --->
<cfloop
index="LOCAL.ColumnIndex"
from="1"
to="#LOCAL.ColumnCount#"
step="1">
<!--- Add the field name to the row data. --->
<cfset LOCAL.RowData[ LOCAL.ColumnIndex ] = """#LOCAL.ColumnNames[ LOCAL.ColumnIndex ]#""" />
</cfloop>
<!--- Append the row data to the string buffer. --->
<cfset LOCAL.Buffer.Append(
JavaCast(
"string",
(
ArrayToList(
LOCAL.RowData,
ARGUMENTS.Delimiter
) &
LOCAL.NewLine
))
) />
</cfif>
<!---
Now that we have dealt with any header value, let's
convert the query body to CSV. When doing this, we are
going to qualify each field value. This is done be
default since it will be much faster than actually
checking to see if a field needs to be qualified.
--->
<!--- Loop over the query. --->
<cfloop query="ARGUMENTS.Query">
<!--- Create array to hold row data. --->
<cfset LOCAL.RowData = [] />
<!--- Loop over the columns. --->
<cfloop
index="LOCAL.ColumnIndex"
from="1"
to="#LOCAL.ColumnCount#"
step="1">
<!--- Add the field to the row data. --->
<cfset LOCAL.RowData[ LOCAL.ColumnIndex ] = """#Replace( ARGUMENTS.Query[ LOCAL.ColumnNames[ LOCAL.ColumnIndex ] ][ ARGUMENTS.Query.CurrentRow ], """", """""", "all" )#""" />
</cfloop>
<!--- Append the row data to the string buffer. --->
<cfset LOCAL.Buffer.Append(
JavaCast(
"string",
(
ArrayToList(
LOCAL.RowData,
ARGUMENTS.Delimiter
) &
LOCAL.NewLine
))
) />
</cfloop>
<!--- Return the CSV value. --->
<cfreturn LOCAL.Buffer.ToString() />
</cffunction>
Now, rather than calling StringBuffer.Append() on every field value, I am build an array of fields in a row, then I join that array using ArrayToList() and append it to the ongoing string value that is held in the StringBuffer. So, I am still using the Java StringBuffer to hold the CSV value, but I am using an intermediary ColdFusion array to build up individual line values. I am also escaping any double-quotes that are in the query values - something which I was neglecting to do before. I have to say that this did greatly simplify the amount of code that was in the method. This pleases me; I love to see more simple solutions be more effective.
Some people did tell me that the StringBuffer was completely overkill and that they, in fact, were getting better results using just array-joining for both the fields and the rows. I found this hard to believe as the StringBuffer has proven useful so many times in the past. I had trouble believing that using just array joining would be the fastest method, and so, I did a little testing of my own. I re-wrote the above user defined method to use only arrays and no StringBuffer:
<cffunction
name="QueryToCSV2"
access="public"
returntype="string"
output="false"
hint="I take a query and convert it to a comma separated value string.">
<!--- Define arguments. --->
<cfargument
name="Query"
type="query"
required="true"
hint="I am the query being converted to CSV."
/>
<cfargument
name="Fields"
type="string"
required="true"
hint="I am the list of query fields to be used when creating the CSV value."
/>
<cfargument
name="CreateHeaderRow"
type="boolean"
required="false"
default="true"
hint="I flag whether or not to create a row of header values."
/>
<cfargument
name="Delimiter"
type="string"
required="false"
default=","
hint="I am the field delimiter in the CSV value."
/>
<!--- Define the local scope. --->
<cfset var LOCAL = {} />
<!---
First, we want to set up a column index so that we can
iterate over the column names faster than if we used a
standard list loop on the passed-in list.
--->
<cfset LOCAL.ColumnNames = [] />
<!---
Loop over column names and index them numerically. We
are going to be treating this struct almost as if it
were an array. The reason we are doing this is that
look-up times on a table are a bit faster than look
up times on an array (or so I have been told).
--->
<cfloop
index="LOCAL.ColumnName"
list="#ARGUMENTS.Fields#"
delimiters=",">
<!--- Store the current column name. --->
<cfset ArrayAppend(
LOCAL.ColumnNames,
Trim( LOCAL.ColumnName )
) />
</cfloop>
<!--- Store the column count. --->
<cfset LOCAL.ColumnCount = ArrayLen( LOCAL.ColumnNames ) />
<!--- Create a short hand for the new line characters. --->
<cfset LOCAL.NewLine = (Chr( 13 ) & Chr( 10 )) />
<!--- Create an array to hold the set of row data. --->
<cfset LOCAL.Rows = [] />
<!--- Check to see if we need to add a header row. --->
<cfif ARGUMENTS.CreateHeaderRow>
<!--- Create array to hold row data. --->
<cfset LOCAL.RowData = [] />
<!--- Loop over the column names. --->
<cfloop
index="LOCAL.ColumnIndex"
from="1"
to="#LOCAL.ColumnCount#"
step="1">
<!--- Add the field name to the row data. --->
<cfset LOCAL.RowData[ LOCAL.ColumnIndex ] = """#LOCAL.ColumnNames[ LOCAL.ColumnIndex ]#""" />
</cfloop>
<!--- Append the row data to the string buffer. --->
<cfset ArrayAppend(
LOCAL.Rows,
ArrayToList( LOCAL.RowData, ARGUMENTS.Delimiter )
) />
</cfif>
<!---
Now that we have dealt with any header value, let's
convert the query body to CSV. When doing this, we are
going to qualify each field value. This is done be
default since it will be much faster than actually
checking to see if a field needs to be qualified.
--->
<!--- Loop over the query. --->
<cfloop query="ARGUMENTS.Query">
<!--- Create array to hold row data. --->
<cfset LOCAL.RowData = [] />
<!--- Loop over the columns. --->
<cfloop
index="LOCAL.ColumnIndex"
from="1"
to="#LOCAL.ColumnCount#"
step="1">
<!--- Add the field to the row data. --->
<cfset LOCAL.RowData[ LOCAL.ColumnIndex ] = """#Replace( ARGUMENTS.Query[ LOCAL.ColumnNames[ LOCAL.ColumnIndex ] ][ ARGUMENTS.Query.CurrentRow ], """", """""", "all" )#""" />
</cfloop>
<!--- Append the row data to the string buffer. --->
<cfset ArrayAppend(
LOCAL.Rows,
ArrayToList( LOCAL.RowData, ARGUMENTS.Delimiter )
) />
</cfloop>
<!---
Return the CSV value by joining all the rows together
into one string.
--->
<cfreturn ArrayToList(
LOCAL.Rows,
LOCAL.NewLine
) />
</cffunction>
As you can see, this uses ArrayToList() not only for the individual field values, but for all of the rows. Then, I put both of these to a basic speed test on several thousand rows of data:
<!--- Create a new part query. --->
<cfset qPart = QueryNew(
"id, serial_number, price",
"cf_sql_integer, cf_sql_varchar, cf_sql_decimal"
) />
<!--- Create some record. --->
<cfloop
index="intI"
from="1"
to="5000"
step="1">
<!--- Add a row to the query. --->
<cfset QueryAddRow( qPart ) />
<!--- Populate row with random data. --->
<cfset qPart[ "id" ][ qPart.RecordCount ] = JavaCast( "int", qPart.RecordCount ) />
<cfset qPart[ "serial_number" ][ qPart.RecordCount ] = JavaCast( "string", CreateUUID() ) />
<cfset qPart[ "price" ][ qPart.RecordCount ] = JavaCast( "float", RandRange( 1, 100 ) ) />
</cfloop>
<!--- Test the StringBuffer version. --->
<cftimer
type="outline"
label="Java String Buffer">
<!--- Get the CSV value. --->
<cfset strOutput = QueryToCSV(
qPart,
"id, serial_number, price"
) />
Done.
</cftimer>
<!--- Test the array-only version. --->
<cftimer
type="outline"
label="ArrayToList() Only">
<!--- Get the CSV value. --->
<cfset strOutput = QueryToCSV2(
qPart,
"id, serial_number, price"
) />
Done.
</cftimer>
I am happy to say that both of them executed in milliseconds, so no matter which way you go, you can't really go wrong. The StringBuffer method did, however, slightly outperform the ArrayToList() method over about 10 trials:
QueryToCSV() - ArrayToList() And StringBuffer
Average Time: 295 ms
QueryToCSV2() - ArrayToList() Only
Average Time: 373 ms
As you can see, we're really only talking fractions of a second, so very little difference. That being said, as much as the StringBuffer was slightly faster, definitely go with whatever version you feel more comfortable with - they are both quite fast.
Want to use code from this post? Check out the license.
Reader Comments
Hi Ben,
Nice little research!... However I am really interested to know about the performance stats of POIUtility for this same scenario...
This is a nice utility, thanks for creating it. However, the one thing that doesn't work for me is date fields.
@Chris,
I suppose you can either modify your SQL to format the date before you pass it to the function. Or, you can add column-type checks to the CSV method to check for date data types. I am not sure which way I would go. Personally, I think I would lean towards formatting the date in the SQL.
@Chris,
It's funny you just asked this cause I just saw this blog post:
http://www.petefreitag.com/item/687.cfm
This is what I changed to make it work the way I wanted:
<cfset LOCAL.querydata = ARGUMENTS.Query[ LOCAL.ColumnNames[ LOCAL.ColumnIndex ] ][ ARGUMENTS.Query.CurrentRow ] >
<cfif isdate(LOCAL.querydata)>
<cfset LOCAL.querydata = dateformat(local.querydata,"mmm dd, yyyy")>
</cfif>
<!--- Add the field to the row data. --->
<cfset LOCAL.RowData[ LOCAL.ColumnIndex ] = """#Replace( local.querydata, """", """""", "all" )#""" />
@Chris,
Cool, looks good.
Hello Ben,
I don't want to have the Chr(10)&chr(13) at the end of each csv file i generate with you're function. Do you you an idea for fix your QueryToCSV function ?
Regards
@Laurent,
Piece of cake - only add the new line characters if you are NOT at the end of the query loop. You'll probably need something like:
<cfif (ARGUMENTS.Query.CurrentRow NEQ ARGUMENTS.Query.RecordCount)>
.... add new line.
</cfif>
This way, when the query is in the last row of the loop, it will not add the new line.
Hey Ben,
Just a heads up that i used this function on a coldfusion 7 server and it didn't work.
After a quick look, I realised that the function uses coldfusion 8 implicit struct and array creation.
Might be worthwhile putting a note up about this for anyone else using older versions of coldfusion.
Jon
p.s I've used this function heaps of times on coldfusion 8 servers before I noticed this today. Keep up the good work!
@Jon,
Yeah, that's always a tough thing to deal with - version compatibility. Glad you like it when you're on a CF8 box. Technically, you can get this to work on a CF7 box if you swap out bits, but its a pain.
Ben,
Recently we have been converting many of our sites from Application.cfm to Application.cfc. In that file we store our component classes as an application scoped variable in the onApplicationStart function. For example:
<code>
<cfset application.remarketingGateway = createObject("component","components.remarketingGateway").init(application.dsn)>
</code>
In the above code application.remarketingGateway holds all the methods which report sales. In those whe produce a variety of reports in XML,CSV and PDF and we used to call it this way:
<code>
<cfinvoke component="components.remarketingGateway" method="currentBenchmarkReport" returnvariable="getCurrentBenchmarkReport">
<cfinvokeargument name="custID" value="#Session.Login.Company.CustID#">
<cfinvokeargument name="startDate" value="#form.fromDate#">
<cfinvokeargument name="endDate" value="#form.toDate#">
</cfinvoke>
</code>
But when we utilize the components scoped in the Application.cfc file this way:
<code>
<cfset getCurrentBenchmarkReport = application.remarketingGateway.currentBenchmarkReport(Session.Login.Company.CustID, form.fromDate, #form.toDate#)>
</code>
We get the following error:
The routine QueryToCSV has been declared twice in different templates.
We know why this happens and have reverted back to the old way, but I thought it was an interesting issue you might like to test for yourself. Feel free to contact me for further details.
@Christopher,
That's a really odd error. I can't imagine why it is doing that if you are only calling methods on cached components. Very strange.
Awesome code, would you recommend it with CF 9?
@Nikos,
I don't think there's anything here that would / should break in CF9.
awesome :)
How easy would it be to just to print out all the fields by default instead if having to specify them all?
@Nikos,
You could default the Fields argument to be the column list from the passed-in query:
<cfargument
name="Fields"
required="false"
default="#arguments.query.columnList#"
/>
I ask because I usually just want to have a stored proc get specific data for a specific csv function and don't want to have to type out all the columns like this:
<cfreturn QueryToCSV(sp,"Client name,Client ID,Trade date,Due date,Transaction,Deposit / refund CCY,Deposit / refund amount")>
many thanks bro
Which tag would you use to save this string as a CSV file? Thanks Ben This is awesome!
@Josh,
You could use either the CFFile/Write tag or the fileWrite() method (depending on what version of ColdFusion you have).
Thanks mate, I done that but I get this error:
faultCode:Server.Processing faultString:'[Table (rows 30 columns CLIENT NAME, CLIENT ID, TRADE DATE, DUE DATE, TRANSACTION, 'DEPOSIT / REFUND CCY', 'DEPOSIT / REFUND AMOUNT'): [CLIENT NAME: coldfusion.sql.QueryColumn@629fe5] [CLIENT ID: coldfusion.sql.QueryColumn@ef9676] [TRADE DATE: coldfusion.sql.QueryColumn@17f30f2] [DUE DATE: coldfusion.sql.QueryColumn@1667e62] [TRANSACTION: coldfusion.sql.QueryColumn@1f361fb] ['DEPOSIT / REFUND CCY': coldfusion.sql.QueryColumn@1a5da07] ['DEPOSIT / REFUND AMOUNT': coldfusion.sql.QueryColumn@18d2964] ] is not indexable by arguments.query.columnList' faultDetail:'null'
@Nikos,
It looks like you forgot to put the ## characters in the default attribute?
@Ben Nadel,
awesome mate :)
@Ben Nadel,
Thanks but if I do that it orders it alphabetically, not the same order from the database. I think its a colfusion setting.
@Nikos,
Yes, the column list comes out of the query in alphabetical order. If you want to custom order them, just pass them in with the method call.
I am not sure what the order of the columns in the database have to do with anything? I am confused?
Well I had the sp in the database return the columns in the order I wanted the csv to show thats all :)
@Nikos,
Ah, gotcha. You can try using the undocumented method on the query object:
Query.GetColumnNames()
I believe this returns a list of columns in the original order of the select statement... but not 100% sure.
This works really well, thanks a million!
hi Ben,
This is what i want awesome,but i and some help from you regarding date values.
i am sending date column to from query to CV but when i open CSV that date field values point to time not date values in CSV . Will you please guide me how to change that to date.
Below is what chris posted on your blog about date values.
Will you please tell me where i can add this code to work with date values coming from query.
Thanks very much for this nice awesome code.
@chris for your comment
This is what I changed to make it work the way I wanted:
<cfset LOCAL.querydata = ARGUMENTS.Query[ LOCAL.ColumnNames[ LOCAL.ColumnIndex ] ][ ARGUMENTS.Query.CurrentRow ] >
<cfif isdate(LOCAL.querydata)>
<cfset LOCAL.querydata = dateformat(local.querydata,"mmm dd, yyyy")>
</cfif>
<!--- Add the field to the row data. --->
<cfset LOCAL.RowData[ LOCAL.ColumnIndex ] = """#Replace( local.querydata, """", """""", "all" )#""" />
Hey guys,
I have recently been involved in a project which requires exporting data to CSV and originally used this UDF (it works great by the way!) but I would like to point out that in ColdFusion9 the cfspreadsheet tag can output a query to .XLS, CSV or HTML!
@CFNew,
It's been a while since I've done anything in Excel; but, I believe that if you are only seeing Time, it's actually a setting on the Excel column. Try looking at the formatting of the column cells to see if there is an option show date AND time.
@Joe,
Very cool! I still do most of my programming on ColdFusion 8 these days (I know, I know), so I haven't really explored the CFSpreadSheet tag too much. From the demos I've seen, it looks to be pretty badass.
Hi Ben,
why do you need to have so many double quotes when adding the field and field name to the row data?
-----------------------------------------
<cfset LOCAL.RowData[ LOCAL.ColumnIndex ] = """#LOCAL.ColumnNames[ LOCAL.ColumnIndex ]#""" />
<cfset LOCAL.RowData[ LOCAL.ColumnIndex ] = """#Replace( ARGUMENTS.Query[ LOCAL.ColumnNames[ LOCAL.ColumnIndex ] ][ ARGUMENTS.Query.CurrentRow ], """", """""", "all" )#""" />
-----------------------------------------
I am not getting that part. Also, I need to add " actually to my field values.
Thanks for that code really useful and sorry for digging out some old posts !
Cheers,
K.
This is exactly what I was searching for Ben, thanks.
In my case I'll always be returning a header with the full column list so I refactored this a bit so you don't have to manually type out a list of columns. https://gist.github.com/2775372
Thanks for writing this. I added the following to make the fields argument optional.
I am using this to export stored procedure data to a CSV and it works great! The only change I made was to check the CurrentRow against the RecordCount to break the loop. I was getting multiple newline chars at the end of the file.
<cfif (ARGUMENTS.Query.CurrentRow LTE ARGUMENTS.Query.RecordCount)>
<!--- Add the field to the row data. --->
<cfset LOCAL.RowData[ LOCAL.ColumnIndex ] = """#Replace( ARGUMENTS.Query[ LOCAL.ColumnNames[ LOCAL.ColumnIndex ] ][ ARGUMENTS.Query.CurrentRow ], """", """""", "all" )#""" />
</cfif>
I also have a requirement to generate this as a read-only file. I have researched adding an attribute to the file, but have came up empty. Any suggestions on how to do this?