Ben Nadel at cf.Objective() 2011 (Minneapolis, MN) with: Johnson Tai

Parsing CSV Data Using ColdFusion


As part of my exploration of writing, reading, and creating Microsoft Excel documents using ColdFusion, I have come across the need to parse comma-separated-value (CSV) data files. While this seems at first like a relatively simple task, I soon found out that it was ANYTHING but simple. It's one thing to worry about lists (for which ColdFusion is the bomb-diggity), but it's an entirely other thing to worry about lists that have field qualifiers, escaped qualifiers, escaped qualifiers that might be empty fields, and non-qualified field values all rolled into one.

I tried looking it up in Google but could not find any satisfactory algorithms (translates to: code that I could understand). Everything on CSV seems to be in Java and none of the stuff on CFLib.org seems to comply with the range of CSV values (especially qualified fields). So, in typical blood-and-guts fashion, I sat down and tried to write my own algorithm. This proved to be easy at first until I found out that my approach was highly flawed. I went through about three different implementations of the algorithm over the weekend before I came up with something that seemed to work satisfactorily.

It has to evaluate one character at a time, which probably won't scale or perform nicely. I would have liked to harness the power of CFHttp to convert CSV files to queries, but I could not get CFHttp to work on the LOCAL file system (i.e. a URL that begins with "file:"). If anyone knows of a great way to do this, please let me know. I suppose that I could have written a temporary file to a public folder, performed a CFHttp to it, and then deleted it, but that just felt a bit "hacky." However, in the end that might just prove to be the way to go.

So anyway, this is what I have come up with. It is a function that takes either a chunk of CSV data or a file path to a CSV data file (text file) and converts it to an array of arrays. It assumes that each record is separated by a return character followed optionally by a new line. I'm not sure if that is cross-system compliant, but heck, this is my first attempt:

<cffunction
	name="CSVToArray"
	access="public"
	returntype="array"
	output="false"
	hint="Takes a delimited text data file or chunk of delimited data and converts it to an array of arrays.">

	<!--- Define the arguments. --->
	<cfargument
		name="CSVData"
		type="string"
		required="false"
		default=""
		hint="This is the raw CSV data. This can be used instead of a file path."
		/>

	<cfargument
		name="CSVFilePath"
		type="string"
		required="false"
		default=""
		hint="This is the file path to a CSV data file. This can be used instead of a text data blob."
		/>

	<cfargument
		name="Delimiter"
		type="string"
		required="false"
		default=","
		hint="The character that separates fields in the CSV."
		/>

	<cfargument
		name="Qualifier"
		type="string"
		required="false"
		default=""""
		hint="The field qualifier used in conjunction with fields that have delimiters (not used as delimiters ex: 1,344,343.00 where [,] is the delimiter)."
		/>


	<!--- Define the local scope. --->
	<cfset var LOCAL = StructNew() />

	<!---
		Check to see if we are dealing with a file. If we are,
		then we will use the data from the file to overwrite
		any csv data blob that was passed in.
	--->
	<cfif (
		Len( ARGUMENTS.CSVFilePath ) AND
		FileExists( ARGUMENTS.CSVFilePath )
		)>

		<!---
			Read the data file directly into the arguments scope
			where it can override the blob data.
		--->
		<cffile
			action="READ"
			file="#ARGUMENTS.CSVFilePath#"
			variable="ARGUMENTS.CSVData"
			/>

	</cfif>


	<!---
		ASSERT: At this point, whether we got the CSV data
		passed in as a data blob or we read it in from a
		file on the server, we now have our raw CSV data in
		the ARGUMENTS.CSVData variable.
	--->


	<!---
		Make sure that we only have a one character delimiter.
		I am not going traditional ColdFusion style here and
		allowing multiple delimiters. I am trying to keep
		it simple.
	--->
	<cfif NOT Len( ARGUMENTS.Delimiter )>

		<!---
			Since no delimiter was passed in, use the default
			delimiter which is the comma.
		--->
		<cfset ARGUMENTS.Delimiter = "," />

	<cfelseif (Len( ARGUMENTS.Delimiter ) GT 1)>

		<!---
			Since multicharacter delimiter was passed, just
			grab the first character as the true delimiter.
		--->
		<cfset ARGUMENTS.Delimiter = Left(
			ARGUMENTS.Delimiter,
			1
			) />

	</cfif>


	<!---
		Make sure that we only have a one character qualifier.
		I am not going traditional ColdFusion style here and
		allowing multiple qualifiers. I am trying to keep
		it simple.
	--->
	<cfif NOT Len( ARGUMENTS.Qualifier )>

		<!---
			Since no qualifier was passed in, use the default
			qualifier which is the quote.
		--->
		<cfset ARGUMENTS.Qualifier = """" />

	<cfelseif (Len( ARGUMENTS.Qualifier ) GT 1)>

		<!---
			Since multicharacter qualifier was passed, just
			grab the first character as the true qualifier.
		--->
		<cfset ARGUMENTS.Qualifier = Left(
			ARGUMENTS.Qualifier,
			1
			) />

	</cfif>


	<!--- Create an array to handle the rows of data. --->
	<cfset LOCAL.Rows = ArrayNew( 1 ) />

	<!---
		Split the CSV data into rows of raw data. We are going
		to assume that each row is delimited by a return and
		/ or a new line character.
	--->
	<cfset LOCAL.RawRows = ARGUMENTS.CSVData.Split(
		"\r\n?"
		) />


	<!--- Loop over the raw rows to parse out the data. --->
	<cfloop
		index="LOCAL.RowIndex"
		from="1"
		to="#ArrayLen( LOCAL.RawRows )#"
		step="1">


		<!--- Create a new array for this row of data. --->
		<cfset ArrayAppend( LOCAL.Rows, ArrayNew( 1 ) ) />


		<!--- Get the raw data for this row. --->
		<cfset LOCAL.RowData = LOCAL.RawRows[ LOCAL.RowIndex ] />


		<!---
			Replace out the double qualifiers. Two qualifiers in
			a row acts as a qualifier literal (OR an empty
			field). Replace these with a single character to
			make them easier to deal with. This is risky, but I
			figure that Chr( 1000 ) is something that no one
			is going to use (or is it????).
		--->
		<cfset LOCAL.RowData = LOCAL.RowData.ReplaceAll(
			"[\#ARGUMENTS.Qualifier#]{2}",
			Chr( 1000 )
			) />

		<!--- Create a new string buffer to hold the value. --->
		<cfset LOCAL.Value = CreateObject(
			"java",
			"java.lang.StringBuffer"
			).Init()
			/>


		<!---
			Set an initial flag to determine if we are in the
			middle of building a value that is contained within
			quotes. This will alter the way we handle
			delimiters - as delimiters or just character
			literals.
		--->
		<cfset LOCAL.IsInField = false />


		<!--- Loop over all the characters in this row. --->
		<cfloop
			index="LOCAL.CharIndex"
			from="1"
			to="#LOCAL.RowData.Length()#"
			step="1">


			<!---
				Get the current character. Remember, since Java
				is zero-based, we have to subtract one from our
				index when getting the character at a
				given position.
			--->
			<cfset LOCAL.ThisChar = LOCAL.RowData.CharAt(
				JavaCast( "int", (LOCAL.CharIndex - 1))
				) />


			<!---
				Check to see what character we are dealing with.
				We are interested in special characters. If we
				are not dealing with special characters, then we
				just want to add the char data to the ongoing
				value buffer.
			--->
			<cfif (LOCAL.ThisChar EQ ARGUMENTS.Delimiter)>

				<!---
					Check to see if we are in the middle of
					building a value. If we are, then this is a
					character literal, not an actual delimiter.
					If we are NOT building a value, then this
					denotes the end of a value.
				--->
				<cfif LOCAL.IsInField>

					<!--- Append char to current value. --->
					<cfset LOCAL.Value.Append(
						LOCAL.ThisChar.ToString()
						) />


				<!---
					Check to see if we are dealing with an
					empty field. We will know this if the value
					in the field is equal to our "escaped"
					double field qualifier (see above).
				--->
				<cfelseif (
					(LOCAL.Value.Length() EQ 1) AND
					(LOCAL.Value.ToString() EQ Chr( 1000 ))
					)>

					<!---
						We are dealing with an empty field so
						just append an empty string directly to
						this row data.
					--->
					<cfset ArrayAppend(
						LOCAL.Rows[ LOCAL.RowIndex ],
						""
						) />


					<!---
						Start new value buffer for the next
						row value.
					--->
					<cfset LOCAL.Value = CreateObject(
						"java",
						"java.lang.StringBuffer"
						).Init()
						/>

				<cfelse>

					<!---
						Since we are not in the middle of
						building a value, we have reached the
						end of the field. Add the current value
						to row array and start a new value.

						Be careful that when we add the new
						value, we replace out any "escaped"
						qualifiers with an actual qualifier
						character.
					--->
					<cfset ArrayAppend(
						LOCAL.Rows[ LOCAL.RowIndex ],
						LOCAL.Value.ToString().ReplaceAll(
							"#Chr( 1000 )#{1}",
							ARGUMENTS.Qualifier
							)
						) />


					<!---
						Start new value buffer for the next
						row value.
					--->
					<cfset LOCAL.Value = CreateObject(
						"java",
						"java.lang.StringBuffer"
						).Init()
						/>

				</cfif>


			<!---
				Check to see if we are dealing with a field
				qualifier being used as a literal character.
				We just have to be careful that this is NOT
				an empty field (double qualifier).
			--->
			<cfelseif (LOCAL.ThisChar EQ ARGUMENTS.Qualifier)>

				<!---
					Toggle the field flag. This will signal that
					future characters are part of a single value
					despite any delimiters that might show up.
				--->
				<cfset LOCAL.IsInField = (NOT LOCAL.IsInField) />


			<!---
				We just have a non-special character. Add it
				to the current value buffer.
			--->
			<cfelse>

				<cfset LOCAL.Value.Append(
					LOCAL.ThisChar.ToString()
					) />

			</cfif>


			<!---
				If we have no more characters left then we can't
				ignore the current value. We need to add this
				value to the row array.
			--->
			<cfif (LOCAL.CharIndex EQ LOCAL.RowData.Length())>

				<!---
					Check to see if the current value is equal
					to the empty field. If so, then we just
					want to add an empty string to the row.
				--->
				<cfif (
					(LOCAL.Value.Length() EQ 1) AND
					(LOCAL.Value.ToString() EQ Chr( 1000 ))
					)>

					<!---
						We are dealing with an empty field.
						Just add the empty string.
					--->
					<cfset ArrayAppend(
						LOCAL.Rows[ LOCAL.RowIndex ],
						""
						) />

				<cfelse>

					<!---
						Nothing special about the value. Just
						add it to the row data.
					--->
					<cfset ArrayAppend(
						LOCAL.Rows[ LOCAL.RowIndex ],
						LOCAL.Value.ToString().ReplaceAll(
							"#Chr( 1000 )#{1}",
							ARGUMENTS.Qualifier
							)
						) />

				</cfif>

			</cfif>

		</cfloop>

	</cfloop>

	<!--- Return the row data. --->
	<cfreturn LOCAL.Rows />

</cffunction>

I have chosen to convert the CSV to an array of arrays as I was not sure that you could depend on the constant number of fields per row. Plus, I figure that going from an array to a query (after this step) would be rather easy. Plus, since Excel is not perfectly square cols vs. rows, I figure this was more in-line with where I want to go with it (including it in my ColdFusion POI Utility component).
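The array-to-query step mentioned above could look something like the sketch below. ArrayOfArraysToQuery() and the generic COL_1, COL_2, ... column names are hypothetical (they are not part of this post); short rows are padded with empty strings so that the resulting query comes out square:

```cfm
<cffunction
	name="ArrayOfArraysToQuery"
	access="public"
	returntype="query"
	output="false"
	hint="Hypothetical helper: converts an array of arrays to a query.">

	<cfargument name="Rows" type="array" required="true" />

	<cfset var LOCAL = StructNew() />

	<!--- Find the widest row so we know how many columns we need. --->
	<cfset LOCAL.ColumnCount = 0 />

	<cfloop index="LOCAL.RowIndex" from="1" to="#ArrayLen( ARGUMENTS.Rows )#" step="1">
		<cfset LOCAL.ColumnCount = Max(
			LOCAL.ColumnCount,
			ArrayLen( ARGUMENTS.Rows[ LOCAL.RowIndex ] )
			) />
	</cfloop>

	<!--- Build the generic column names (COL_1, COL_2, ...). --->
	<cfset LOCAL.ColumnNames = "" />

	<cfloop index="LOCAL.ColumnIndex" from="1" to="#LOCAL.ColumnCount#" step="1">
		<cfset LOCAL.ColumnNames = ListAppend(
			LOCAL.ColumnNames,
			"COL_#LOCAL.ColumnIndex#"
			) />
	</cfloop>

	<cfset LOCAL.Query = QueryNew( LOCAL.ColumnNames ) />

	<!--- Copy each row over, padding missing fields with "". --->
	<cfloop index="LOCAL.RowIndex" from="1" to="#ArrayLen( ARGUMENTS.Rows )#" step="1">

		<cfset LOCAL.Row = ARGUMENTS.Rows[ LOCAL.RowIndex ] />
		<cfset QueryAddRow( LOCAL.Query ) />

		<cfloop index="LOCAL.ColumnIndex" from="1" to="#LOCAL.ColumnCount#" step="1">

			<cfif (LOCAL.ColumnIndex LTE ArrayLen( LOCAL.Row ))>
				<cfset LOCAL.CellValue = LOCAL.Row[ LOCAL.ColumnIndex ] />
			<cfelse>
				<cfset LOCAL.CellValue = "" />
			</cfif>

			<cfset QuerySetCell(
				LOCAL.Query,
				"COL_#LOCAL.ColumnIndex#",
				LOCAL.CellValue,
				LOCAL.RowIndex
				) />

		</cfloop>

	</cfloop>

	<cfreturn LOCAL.Query />

</cffunction>
```

Calling ArrayOfArraysToQuery( CSVToArray( CSVData = strCSVData ) ) would then give a query with one record per CSV row. Treating the first row as column headers is deliberately left out of this sketch.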

If I create a variable containing this CSV data:

last name,first name,salary,dream salary,happiness
Jones,Mike,"$35,500.00","$73,000.00"
Hopkins,Paul,"$55,234.00","$250,000.00",3.0
Hawkings,Katie,,,
,
Smith,Betty,"$57,010.00","$60,000.00",10.0

... and pass it into the CSVToArray ColdFusion user defined function:

<!--- Convert the CSV to an array of arrays. --->
<cfset arrCSV = CSVToArray(
	CSVData = strCSVData,
	Delimiter = ",",
	Qualifier = """"
	) />

<!--- Dump out array. --->
<cfdump var="#arrCSV#" label="CSV Data" />

I get this output:

[Image: CFDump output of the array of arrays returned by CSVToArray()]

As you can see, the CSVToArray() ColdFusion function handles mixed-length records, empty field values, and qualified fields. It even handles escaped qualifiers (ex. "" becomes ") but this was not demonstrated. While this is not perfect, at least it provides me with a CSV conversion interface that I can use in my POI Utility ColdFusion component. Further down the road, I will be able to swap this out for a better implementation.
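Since the escaped-qualifier case was not demonstrated, here is a small test that could be used to check it. The variable names (strEscapedCSV, arrEscaped) are made up for this sketch; CFSaveContent is used only to avoid having to double up the quotes inside a ColdFusion string literal:

```cfm
<!---
	One record with two fields. The first field is qualified and
	contains two escaped qualifiers (""); the second is a plain value.
--->
<cfsavecontent variable="strEscapedCSV">"He said ""hi"" to me",Ben</cfsavecontent>

<!--- Parse it with the default delimiter and qualifier. --->
<cfset arrEscaped = CSVToArray(
	CSVData = strEscapedCSV
	) />

<!--- Dump out array. --->
<cfdump var="#arrEscaped#" label="Escaped Qualifiers" />
```

Tracing through the function by hand, this should produce a single row with two values: He said "hi" to me, and Ben. Keep in mind that the function relies on Chr( 1000 ) never appearing in the data, so a value that already contains that character would come back with a qualifier in its place.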


Reader Comments

79 Comments

Ben,

I haven't thought through this, so forgive me if it's a stupid question, but
did you consider using regular expressions? If so, what caused you to decide against using them?

56 Comments

@Ben,

When dealing with lists, use the GetToken() function. It won't ignore empty list elements. This will significantly speed up your function and replace the loop that you are doing. Also, Sammy hit the nail on the head with using RegEx to strip out the text between the qualifiers.

Another trick you can use to speed things up is to use GetToken() to populate the empty cells and then use ListToArray() for the conversion. It's a lot quicker than creating a Java object on each call.

Hopefully this helps you out some.

15,848 Comments

@Sammy,

I did think of regular expressions, 'cause they are cool, but I wasn't sure how to apply them. Plus I don't think my skills with them would be good enough to handle all the different options that come with CSV formatting. Take for example:

ben,was,here

That is three fields. But this:

"ben,was,here"

is one field. But this:

""ben,was,here""

is three fields; the first starts with a quote literal, and the last field ends with a quote literal. And then this:

""ben,"was,here"""

has two fields.... you get the point? It was just too much for me to wrap my head around. I am sure that regular expressions would rock somehow, I just can't figure it out.

15,848 Comments

Tony,

It's funny you mention that because my first attempt actually did use a Tokenizer. In my experience, though, it does skip empty fields:

<cfset Tokenizer = CreateObject(
"java",
"java.util.StringTokenizer"
).Init(
JavaCast( "string", "a,b,,,,c,d,e,f" ),
JavaCast( "string", "," )
) />

<cfloop condition="Tokenizer.HasMoreTokens()">
[#Tokenizer.NextToken()#]<br />
</cfloop>

... outputs:

[a]
[b]
[c]
[d]
[e]
[f]

... it skips right over the empty fields. However, in my current implementation I do add a leading space to all fields which then gets stripped out later.

I did learn some things in iteration three that I didn't know in iteration one, so I could probably go back and apply that to the String Tokenizer. In fact, maybe I will do that.

1 Comments

Comma-separated data is a good idea with ColdFusion because it removes some of the difficult queries and irregularities, while it is easy to retrieve the information at the client end.

It is being used at www.compglobe.com, where you can compose your comment and the comment will be transferred to the CSV file at the server level.
www.compglobe.com also uses the CSV format to upload phone numbers if you want to send information to the handset of the recipient to whom you want to deliver the material. www.compglobe.com has various things like a message composer and an online radio too.

2 Comments

Doing something similar, i just grabbed http://opencsv.sourceforge.net/ and then did this:

<cfparam name="filename">
<cfscript>
fileReader = createobject("java","java.io.FileReader");
fileReader.init(filename);

csvReader = createObject("java","au.com.bytecode.opencsv.CSVReader");
csvReader.init(fileReader);
</cfscript>
<cfdump var="#csvReader.readAll()#">

Java and ColdFusion play SO nice together *smile*

1 Comments

Thanks for the code. This was very helpful since I'm just learning CF. I know from other experiences that parsing CSV files can be a real pain to get working right.

2 Comments

Thanks for the code and tutorial, Ben - I was grappling with exactly the same issue relating to converting CSV with encapsulating quotes and your post was a lifesaver!!

1 Comments

This is perhaps similar to what I need to achieve (I think).

My client has a list of products. Product ID, Product Name, and Description are the column headers for the product table.

well, the description field data... is a CSV.

for example

the data in the description field is:

OD(+/-1.2mm), Wall Thickness = 5.0mm (+/- .4mm), Inside Diameter = 65.0mm, Approximate pieces per case = 4, Approximate weight per case = 32.34 lbs

But I need to take the data in that one field and create more columns to display these attributes rather than this text blob.

Am I on the right track?

15,848 Comments

@JKS,

You can use CSV parsing to get those values; however, if those are the only values in the field, you can simply treat the data as if it were a comma-delimited list. Then, you can either split the list into an array with ListToArray(), or even use things like ListGetAt() and ListLen() to loop over the elements of the list and examine each individually.
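The list-based approach described above might be sketched as follows. The variable names are made up, and the assumption that each attribute looks like "name = value" is a guess based on the sample data (the first item, "OD(+/-1.2mm)", has no "=" and is kept as a name with an empty value):

```cfm
<!--- The description value from the example above. --->
<cfset strDescription = "OD(+/-1.2mm), Wall Thickness = 5.0mm (+/- .4mm), Inside Diameter = 65.0mm, Approximate pieces per case = 4, Approximate weight per case = 32.34 lbs" />

<!--- Split the comma-delimited list into an array of attributes. --->
<cfset arrAttributes = ListToArray( strDescription, "," ) />

<cfloop index="intAttribute" from="1" to="#ArrayLen( arrAttributes )#" step="1">

	<cfset strAttribute = Trim( arrAttributes[ intAttribute ] ) />

	<!--- Split "name = value" pairs on the equals sign. --->
	<cfif Find( "=", strAttribute )>
		<cfset strName = Trim( ListFirst( strAttribute, "=" ) ) />
		<cfset strValue = Trim( ListRest( strAttribute, "=" ) ) />
	<cfelse>
		<cfset strName = strAttribute />
		<cfset strValue = "" />
	</cfif>

	<cfoutput>[#strName#] : [#strValue#]<br /></cfoutput>

</cfloop>
```

This only works while no attribute value contains a comma of its own; once that happens, qualified CSV parsing is the safer route.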

7 Comments

Ben,

AWESOME JOB!!! I can't believe this was so difficult to find. You definitely saved HOURS of time and helped meet my deadline. This works great. People like you are what make the net an awesome place for research and learning. Thanks!!

2 Comments

Hi Ben,

This is really great code that you are sharing. I am new to coldfusion coding.

I am not able to connect the dots between the array generated using this procedure and writing a query and or tying it into the POI Utility.

I imagine it is straightforward, but I don't seem to be able to work it through.

Appreciate all your posts.

3 Comments

hi ben. how would you go about creating a function that would read csv data no matter the order of columns - as long as the header fields are named to match my db fields.

i am using your wonderful code above and it works awesome! however, i need to adjust my array data each time i add columns (particularly if i rearrange the column order).

do i first read in the header row, and somehow do the matching there?

this may be extra for expert stuff!

15,848 Comments

@Mike,

I am typically not a fan of using header rows to do auto-name things (I don't usually trust clients to name things appropriately); but this seems to be something that people always are asking about. I will come up with something that makes this a bit easier to work with. I'll get back to you.

2 Comments

Ben,

Do you have any idea how to accommodate for a multilingual csv? Most (but not all) of the languages/characters pass through fine. It seems that Chinese and Russian are having the most trouble being interpreted. I'm guessing this is an issue with the charset, but I am not positive (nor am I sure as to how I would go about fixing this issue).

Thanks for all of the helpful postings!
Brian

2 Comments

^^Ben, I apologize about my above question. I realized that it was an error on MY end. I should have figured that :p Quick fix by manipulating cfcontent :)

Thanks for all of the helpful posts though!

1 Comments

Ben,

THANK YOU! (I'm not yelling, just excited). I've been using ColdFusion longer than I like to think... I never received any training... I just read Ben Forta's book... and I was off. That said, if I hadn't found your code I would have been forced to hack together some nasty bit of code that would have caused me more trouble than good.

Question:

I need to import the data from the array into a database. I know that I can loop through a List but not an array.... any good suggestions how to easily import from an array?

Thanks,
Wayne

1 Comments

Ben,

I tried using your CSV Java based code on a file that used TABS to separate the data but it didn't reliably recognize the tab delimiter.

Do I need to change the pattern that you created to have it work properly?

Thanks,
Wayne

15,848 Comments

@Wayne,

A few comments back, I actually pointed to a newer version of the CSV parsing function. It uses Regular Expressions to do the parsing, which turns out to be much faster and more flexible:

www.bennadel.com/blog/991-CSVToArray-ColdFusion-UDF-For-Parsing-CSV-Data-Files.htm

As far as moving them into a database, the UDF returns an array of arrays. To loop over them, you can use an index loop; or, if you are using CF8+, you can use an array loop.

While not directly related, I do have a post that talks about moving XML into a database. This does use a good bit of Array looping to get the job done:

www.bennadel.com/blog/1636-How-To-Move-XML-Data-Into-A-Database-Using-ColdFusion.htm

I hope some of that helps!
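For readers wondering what those two loop styles look like in practice, here is a rough sketch. The datasource name (myDSN), the table (employee), and its columns are invented for illustration; arrCSV is assumed to be the array of arrays returned by CSVToArray():

```cfm
<!--- Index loop: works in older versions of ColdFusion. --->
<cfloop index="intRow" from="1" to="#ArrayLen( arrCSV )#" step="1">

	<cfset arrRow = arrCSV[ intRow ] />

	<cfoutput>Row #intRow# has #ArrayLen( arrRow )# field(s).<br /></cfoutput>

</cfloop>


<!--- Array loop: requires ColdFusion 8 or later. --->
<cfloop index="arrRow" array="#arrCSV#">

	<!---
		Records can have mixed lengths, so skip any row that
		does not have the two fields this insert expects.
	--->
	<cfif (ArrayLen( arrRow ) GTE 2)>

		<cfquery datasource="myDSN">
			INSERT INTO employee
			(
				last_name,
				first_name
			) VALUES (
				<cfqueryparam value="#arrRow[ 1 ]#" cfsqltype="cf_sql_varchar" />,
				<cfqueryparam value="#arrRow[ 2 ]#" cfsqltype="cf_sql_varchar" />
			)
		</cfquery>

	</cfif>

</cfloop>
```

If the file has a header row, it would need to be skipped (for example, by starting the index loop at 2) before inserting anything.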

1 Comments

Ben,

I've been using your script for some time now but I'm still having trouble with any field that has a ". My file is delimited by TAB with no quotes. I have Qualifier set to "" (nothing), but when the routine sees a " it ignores any more tabs in that record (it concatenates all the rest of the fields for that one record into the field that had the " in it). Here are my parameters.

<cfset arrData = CSVToArray(
CSVData = strCSVTab,
Delimiter = "#chr(9)#",
Qualifier = ""
) />

Any help would be much appreciated.

Thanks,
Wayne Gregori

3 Comments

If you want an example of uploading a csv (using Ben's csvtoarray) and then looping through each element within an array within an array, here it is:

<!--- Assumes you have already processed your form and have the file... now get your csv file to your server --->
<cffile
action = "UPLOAD"
filefield = "myFile"
destination = "#ExpandPath('myfolder/')#"
nameconflict = "overwrite"
result = "thefile">

<cfset thepath = '#Expandpath('myfolder/')#' & '#thefile.ServerFile#'>

<cfinclude template="csvtoarray.cfm">

<cfset result = csvtoarray( CSVFilePath = thepath )>

<cfoutput>

<cfloop index="OuterLoop" from="1" to="#ArrayLen(Result)#">
<cfloop index="InnerLoop" from="1" to="#ArrayLen(Result[OuterLoop])#">
<cfoutput>
Result[#OuterLoop#][#InnerLoop#] is #Result[OuterLoop][InnerLoop]#<br>
</cfoutput>
</cfloop>
</cfloop>

</cfoutput>

1 Comments

I ran into an issue where there are line breaks in the middle of the qualified text. I thought there would be an easy way to ignore or remove those via regex before running this function but am struggling. Any ideas? Thanks!

1 Comments

Just a thought -

<!---
Split the CSV data into rows of raw data. We are going
to assume that each row is delimited by a return and
/ or a new line character.
--->
<cfset LOCAL.RawRows = ARGUMENTS.CSVData.Split(
"\r\n?"
) />

Surely that should be "\r?\n", since Windows newlines look like \r\n and Unix like \n. This way the regexp would accept both \r\n and \n, as opposed to \r\n and \r like the previous code did.
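One way to compare patterns is to split a small test string that mixes all three line-ending styles. The pattern used here, "\r\n?|\n", is an assumption that goes a step beyond the suggestion above (it appears in neither the post nor this comment); it treats \r\n, a lone \r, and a lone \n each as a single row break:

```cfm
<!---
	Build a test string with a Windows break (CR+LF), a Unix
	break (LF), and an old Mac-style break (CR).
--->
<cfset strTestData = (
	"a,b" & Chr( 13 ) & Chr( 10 ) &
	"c,d" & Chr( 10 ) &
	"e,f" & Chr( 13 ) &
	"g,h"
	) />

<!--- Split on any of the three line-ending styles. --->
<cfset arrTestRows = strTestData.Split( "\r\n?|\n" ) />

<!--- With all three breaks recognized, this should report 4 rows. --->
<cfoutput>#ArrayLen( arrTestRows )# rows found.</cfoutput>
```

Swapping in "\r\n?" or "\r?\n" for the pattern shows which of the breaks each one misses.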

4 Comments

It doesn't seem to matter which code example I try from all the sources and examples provided in your blog post; all of them seem to throw a "coldfusion.runtime.Struct cannot be cast to java.lang.String" error.

Got any suggestions of why? On a CF10 server, though it could be because of the code being built for a CF8 server...
