POST Streaming Multi-Part Form Data From ColdFusion Using Java And Formidable.js
Lately, I've been playing around with the idea of streaming form posts from a ColdFusion server such that the entire contents of the form post never have to be buffered in local memory at any one time. I was able to post file-only content from one ColdFusion server to another; but, when it came to multi-part form data (i.e., form fields), it seemed as if the target ColdFusion server was not able to handle the streaming. So, I thought I would try streaming multi-part form data from ColdFusion to a Node.js server which, typically, is more than happy to handle long-running and/or streaming connections.
Until now, I've never actually POST'd anything to a Node.js server; I've only ever dealt with GET requests that parse query strings and write responses. As such, I had to do a little searching for a form-parsing module. As luck would have it, Felix Geisendorfer recently published a Node.js module - formidable - which deals specifically with parsing incoming form POSTs as they are being streamed over the connection.
Like so many Node.js modules, formidable extends the event model, allowing callbacks to be bound to certain events that get triggered during the request parsing. For our demo purposes, we're going to be listening for three of the request-parsing events:
field - This gets triggered when a single field has been fully parsed out of the incoming request.
file - This gets triggered when a single file has been fully parsed out of the incoming request (and saved to the local file system as a TMP file).
end - This gets triggered when the incoming form POST has been fully parsed.
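If the event model is new to you, here is a minimal sketch of the pattern using Node's built-in EventEmitter. The FakeFormParser class and the shape of its parse() input are hypothetical stand-ins for illustration only; they are not part of formidable, which emits these events while it reads the real request stream.

```javascript
// Minimal sketch of the event pattern, using Node's built-in EventEmitter.
// FakeFormParser is hypothetical and is NOT part of formidable.
const { EventEmitter } = require("events");

class FakeFormParser extends EventEmitter {
    // Pretend to parse a request by replaying already-parsed parts as events.
    parse(parts) {
        for (const part of parts) {
            if (part.filename) {
                // A part with a filename is reported as a file.
                this.emit("file", part.name, { filename: part.filename });
            } else {
                // Anything else is reported as a simple field.
                this.emit("field", part.name, part.value);
            }
        }
        // Signal that every part has been handled.
        this.emit("end");
    }
}

const eventLog = [];
const parser = new FakeFormParser();

// Bind callbacks to the three events, much like the server code does.
parser.on("field", (name, value) => eventLog.push("FIELD: " + name + " = " + value));
parser.on("file", (name, file) => eventLog.push("FILE: " + name + " " + file.filename));
parser.on("end", () => eventLog.push("END"));

parser.parse([
    { name: "title", value: "The Bride" },
    { name: "upload", filename: "the_bride.txt" }
]);

console.log(eventLog.join("\n"));
```

Because emit() calls its listeners synchronously, the log is fully populated by the time parse() returns.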
Once I cloned the formidable Git repo, setting up the Node.js server was quite easy. The following code creates an HTTP request callback which looks for requests being posted to the "/upload" URL. These requests are assumed to be POST requests and are passed to a formidable instance. Field and File parsing events are then logged to the console.
Server2.js (Our Node.js Server)
// Include the necessary modules.
var sys = require( "sys" );
var http = require( "http" );
// This module will help us parse multi-part form data as it is
// being streamed in from the ColdFusion server.
var formidable = require( "./formidable" );
// ---------------------------------------------------------- //
// ---------------------------------------------------------- //
// Create an instance of the HTTP server.
var server = http.createServer(
    function( request, response ){
        // Check to see if this request is a form upload.
        if (request.url == "/upload") {
            // Create a new instance of multi-part form parser.
            var form = new formidable.IncomingForm();
            // Point to the TMP directory in the server folder - this is
            // where our files will be uploaded.
            form.uploadDir = require( "path" ).join( __dirname, "tmp" );
            // Listen for field completion.
            form.on(
                "field",
                function( name, value ){
                    // Log the field to the console.
                    console.log( "FIELD:", name, value );
                }
            );
            // Listen for the file completion.
            form.on(
                "file",
                function( name, file ){
                    // Log the file to the console.
                    console.log(
                        "FILE:",
                        name,
                        file.filename,
                        ("[" + file.length + " Bytes]")
                    );
                }
            );
            // Listen for the end of the form post (remember, we've been
            // parsing this as it was streaming in).
            form.on(
                "end",
                function(){
                    // Write the successful headers.
                    response.writeHead(
                        200,
                        {
                            "content-type": "text/plain"
                        }
                    );
                    // Write the content.
                    response.end( "Form has been uploaded!" );
                }
            );
            // Signify the new request in the console.
            console.log(
                "-----------------------------------",
                "\n+-- New Multi-Part Form Request --+",
                "\n-----------------------------------"
            );
            // Now that we have our form set-up, let's start parsing the
            // files and fields in the incoming request.
            form.parse( request );
        // This request was NOT an upload.
        } else {
            // Write the failed headers.
            response.writeHead(
                404,
                {
                    "content-type": "text/plain"
                }
            );
            // Write the content.
            response.end( "Not Found." );
        }
    }
);
// Point the server to listen to the given port for incoming
// requests.
server.listen( 8080 );
// ---------------------------------------------------------- //
// ---------------------------------------------------------- //
// Write debugging information to the console to indicate that
// the server has been configured and is up and running.
console.log( "Server is running on 8080" );
Luckily, all of the heavy lifting is encapsulated inside the formidable Node.js module. This leaves our Node.js server quite sparse when it comes to configuration.
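For a rough sense of what that heavy lifting involves, here is a toy sketch that splits a multi-part body into its parts. It is not how formidable works internally: formidable parses incrementally as chunks arrive, whereas this hypothetical parseMultipart() function assumes the whole body is already in memory as a string. The function name, the sample boundary, and the sample values are all made up for illustration.

```javascript
// Toy, fully-buffered multi-part parser. This is an illustration only and
// is NOT formidable's implementation (which parses the stream incrementally).
function parseMultipart(body, boundary) {
    const delimiter = "--" + boundary;
    const parts = [];
    // The first piece is the (empty) preamble before the first delimiter.
    const pieces = body.split(delimiter).slice(1);
    for (const piece of pieces) {
        // The closing delimiter is followed by "--"; nothing after it counts.
        if (piece.startsWith("--")) {
            break;
        }
        // Headers and value are separated by a blank line (CRLF CRLF).
        const headerEnd = piece.indexOf("\r\n\r\n");
        if (headerEnd === -1) {
            continue;
        }
        // Skip the CRLF that follows the delimiter.
        const headers = piece.slice(2, headerEnd);
        // Drop the CRLF that precedes the next delimiter.
        const value = piece.slice(headerEnd + 4, piece.length - 2);
        const name = /\bname="([^"]*)"/.exec(headers);
        const filename = /\bfilename="([^"]*)"/.exec(headers);
        parts.push({
            name: name ? name[1] : null,
            filename: filename ? filename[1] : null,
            value: value
        });
    }
    return parts;
}

// Sample body with one simple field and one file (values are made up).
const sampleBoundary = "POST------1300000000000";
const sampleBody = [
    "--" + sampleBoundary,
    'Content-Disposition: form-data; name="title"',
    "",
    "The Bride",
    "--" + sampleBoundary,
    'Content-Disposition: form-data; name="upload"; filename="the_bride.txt"',
    "Content-Type: text/plain",
    "",
    "Once upon a time",
    "--" + sampleBoundary + "--",
    ""
].join("\r\n");

const parsedParts = parseMultipart(sampleBody, sampleBoundary);
console.log(parsedParts);
```

A real streaming parser has the harder job of spotting a boundary that is split across two incoming chunks, which is exactly the kind of detail the module hides from us.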
OK, so now that we have our target Node.js server configured to parse incoming requests, let's take a look at our ColdFusion code. The following code is very similar to the code posted previously; the only difference here is that as we assemble and stream the multi-part form data, we're sleeping the upload between the field definitions and the file post. In this way, we can see (from the Node.js console) that ColdFusion is truly streaming the POST content without having to buffer it locally.
<!---
Set up the target URL for posting. In this demo, we will be
posting to a Node.js server which does have the ability to handle
multi-part form data in a streaming, non-buffered fashion.
--->
<cfset postUrl = "http://localhost:8080/upload" />
<!---
Create an instance of our Java URL - This is the object that we
will use to open the connection to the above location.
--->
<cfset targetUrl = createObject( "java", "java.net.URL" ).init(
javaCast( "string", postUrl )
) />
<!---
Now that we have our URL, let's open a connection to it. This
will give us access to the input (download) and output (upload)
streams for the target end point.
NOTE: This gives us an instance of java.net.URLConnection (or
one of its sub-classes).
--->
<cfset connection = targetUrl.openConnection() />
<!---
By default, the connection is only set to gather target content,
not to POST it. As such, we have to make sure that we turn on
output (upload) before we access the data streams.
--->
<cfset connection.setDoOutput( javaCast( "boolean", true ) ) />
<!--- Since we are uploading, we have to set the method to POST. --->
<cfset connection.setRequestMethod( javaCast( "string", "POST" ) ) />
<!---
By default, the connection will locally buffer the data until it
is ready to be posted in its entirety. We don't want to hold it
all in memory, however; as such, we need to explicitly turn data
Chunking on. This will allow the connection to flush data to the
target url without having to load it all in memory (this is
perfect for when the size of the data is not known ahead of time).
--->
<cfset connection.setChunkedStreamingMode( javaCast( "int", 10 ) ) />
<!---
When posting data, the content-type will determine how the
target server parses the incoming request. If the target server
is ColdFusion, this is especially critical as it will throw an
error if it tries to parse this POST as a collection of
name-value pairs.
In this case, we WANT it to see the form as multi-part, which
will be a collection of name-value pairs. In order to delimit
the parts of the form post, we need to create a boundary identifier.
This is how the server will know where one value ends and the
next one starts.
This needs to be a random string so as not to show up in the
form data itself (as a false boundary).
--->
<cfset fieldBoundary = ("POST------------------" & getTickCount()) />
<!---
Set the content type and include the boundary information so the
server knows how to parse the data.
--->
<cfset connection.setRequestProperty(
javaCast( "string", "Content-Type" ),
javaCast( "string", ("multipart/form-data; boundary=" & fieldBoundary) )
) />
<!---
Now that we have prepared the connection to the target URL, let's
get the output stream - this is the UPLOAD stream to which we can
write data to be posted to the target server.
--->
<cfset uploadStream = connection.getOutputStream() />
<!---
Before we send the file data, we'll send some simple
name-value pairs in plain-text format. In order to make it easier
to write strings to the upload stream, let's wrap it in a Writer.
This will allow us to write string data rather than just bytes.
--->
<cfset uploadWriter = createObject( "java", "java.io.OutputStreamWriter" ).init(
uploadStream
) />
<!---
Form data makes heavy use of the Carriage Return and New Line
characters to delimit values.
--->
<cfset crnl = (chr( 13 ) & chr( 10 )) />
<!--- A double break is also used. --->
<cfset crnl2 = (crnl & crnl) />
<!--- Delimit the field. --->
<cfset uploadWriter.write(
javaCast( "string", ("--" & fieldBoundary & crnl) )
) />
<!--- Send the title. --->
<cfset uploadWriter.write(
javaCast(
"string",
(
"Content-Disposition: form-data; name=""title""" &
crnl2 &
"The Bride" &
crnl
))
) />
<!--- Delimit the field. --->
<cfset uploadWriter.write(
javaCast( "string", ("--" & fieldBoundary & crnl) )
) />
<!--- Send the author. --->
<cfset uploadWriter.write(
javaCast(
"string",
(
"Content-Disposition: form-data; name=""author""" &
crnl2 &
"Julie Garwood" &
crnl
))
) />
<!--- Delimit the field. --->
<cfset uploadWriter.write(
javaCast( "string", ("--" & fieldBoundary & crnl) )
) />
<!--- Send the publisher. --->
<cfset uploadWriter.write(
javaCast(
"string",
(
"Content-Disposition: form-data; name=""publisher""" &
crnl2 &
"Pocket Star" &
crnl
))
) />
<!---
Now that we've written the simple name/value pairs, let's post
the actual file data as part of the incoming request. This works
very much in the same way, although we are going to stream the
local file into the post data.
Let's open a connection to a local file that we will stream to
the output a byte at a time.
NOTE: There are more efficient, buffered ways to read a file
into memory; however, this is just trying to keep it simple.
--->
<cfset fileInputStream = createObject( "java", "java.io.FileInputStream" ).init(
javaCast( "string", expandPath( "./data2.txt" ) )
) />
<!--- Delimit the field. --->
<cfset uploadWriter.write(
javaCast( "string", ("--" & fieldBoundary & crnl) )
) />
<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->
<!---
Now that we have defined the non-file fields and delimited the
file, let's FLUSH the UPLOAD stream and pause the upload. This
will allow us to look at the Node.js console to see if the POST
is truly being streamed (without local buffering).
--->
<cfset uploadWriter.flush() />
<!--- Sleep the upload for a few seconds. --->
<cfset sleep( 5000 ) />
<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->
<!--- Send the file along. --->
<cfset uploadWriter.write(
javaCast(
"string",
(
"Content-Disposition: form-data; name=""upload""; filename=""the_bride.txt""" &
crnl &
"Content-Type: text/plain" &
crnl2
))
) />
<!--- Read the first byte from the file. --->
<cfset nextByte = fileInputStream.read() />
<!---
Keep reading from the file, one byte at a time, until we hit
(-1) - the End of File marker for the input stream.
--->
<cfloop condition="(nextByte neq -1)">
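<!---
Write this byte to the output (UPLOAD) stream.
NOTE: The Writer treats this value as a character, not as a raw
byte, so this approach is only reliable for plain ASCII files.
--->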
<cfset uploadWriter.write( javaCast( "int", nextByte ) ) />
<!--- Read the next byte from the file. --->
<cfset nextByte = fileInputStream.read() />
</cfloop>
<!--- Add the new line to the field value. --->
<cfset uploadWriter.write( javaCast( "string", crnl ) ) />
<!---
Delimit the end of the post. Notice that the last delimiter has
a trailing double-dash after it.
--->
<cfset uploadWriter.write(
javaCast( "string", (crnl & "--" & fieldBoundary & "--" & crnl) )
) />
<!--- Now that we're done streaming the file, close the stream. --->
<cfset uploadWriter.close() />
<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->
<!---
At this point, we have completed the UPLOAD portion of the
request. We could be done; or we could look at the input
(download) portion of the request in order to view the response
or the error.
--->
<cfoutput>
Response:
#connection.getResponseCode()# -
#connection.getResponseMessage()#<br />
<br />
</cfoutput>
<!---
The input stream is mutually exclusive with the error stream,
although both can return data. As such, let's try to access
the input stream... and then use the error stream if there is
a problem.
--->
<cftry>
<!--- Try for the input stream. --->
<cfset downloadStream = connection.getInputStream() />
<!---
If the input stream is not available (ie. the server returned
an error response), then we'll have to use the error output
as the response stream.
--->
<cfcatch>
<!--- Use the error stream as the download. --->
<cfset downloadStream = connection.getErrorStream() />
</cfcatch>
</cftry>
<!---
At this point, we have either the natural download or the error
download. In either case, we can start reading the output in
the same manner.
--->
<cfset responseBuffer = [] />
<!--- Get the first byte. --->
<cfset nextByte = downloadStream.read() />
<!---
Keep reading from the response stream until we run out of bytes
(-1). We'll be building up the response buffer a byte at a time
and then outputting it as a single value.
--->
<cfloop condition="(nextByte neq -1)">
<!--- Add the byte AS CHAR to the response buffer. --->
<cfset arrayAppend( responseBuffer, chr( nextByte ) ) />
<!--- Get the next byte. --->
<cfset nextByte = downloadStream.read() />
</cfloop>
<!--- Close the response stream. --->
<cfset downloadStream.close() />
<!--- Output the response. --->
<cfoutput>
Response: #arrayToList( responseBuffer, "" )#
</cfoutput>
I know this is super verbose and exploding with comments, but hopefully that makes it easy to follow. The key thing to see is that after all of the simple name/value fields have been defined, we flush the upload stream and then pause the ColdFusion thread. This should allow us to see the multi-part form parsing pause on the Node.js side, confirming that ColdFusion is, in fact, streaming the content without buffering it.
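Stripped of the Java plumbing, the bytes going over the wire follow a simple pattern. As a rough sketch (in JavaScript, to match the server code), the hypothetical multipartPieces() generator below yields the body one piece at a time, which is the property that makes flushing and pausing mid-post possible. The function name, its arguments, and the sample values are assumptions for illustration. Unlike the ColdFusion code above, it does not append an extra line break after the file content.

```javascript
const CRLF = "\r\n";

// Yield a multi-part body piece by piece so that a caller could write (and
// flush) each piece to a connection without assembling the whole body first.
// NOTE: Hypothetical helper; it does not escape quotes in names or filenames.
function* multipartPieces(boundary, fields, file) {
    for (const [name, value] of Object.entries(fields)) {
        // Each part starts with "--" plus the boundary.
        yield "--" + boundary + CRLF;
        // A blank line separates the part headers from the part value.
        yield 'Content-Disposition: form-data; name="' + name + '"' + CRLF + CRLF;
        yield value + CRLF;
    }
    yield "--" + boundary + CRLF;
    yield 'Content-Disposition: form-data; name="' + file.field + '"; ' +
        'filename="' + file.filename + '"' + CRLF +
        "Content-Type: " + file.type + CRLF + CRLF;
    yield file.content + CRLF;
    // The final delimiter has a trailing "--" after the boundary.
    yield "--" + boundary + "--" + CRLF;
}

const bodyPieces = Array.from(
    multipartPieces(
        "POST-123",
        { title: "The Bride", author: "Julie Garwood" },
        {
            field: "upload",
            filename: "the_bride.txt",
            type: "text/plain",
            content: "Once upon a time"
        }
    )
);

// In a real client, each piece would be written to the request stream.
for (const piece of bodyPieces) {
    process.stdout.write(piece);
}
```

Collecting the pieces into an array here is only for the demo; a streaming client would write each piece as soon as the generator yields it.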
NOTE: You can see this in effect if you watch the above video.
As a target server, ColdFusion doesn't seem to be able to handle multi-part, streaming form data (for which a content-length value is not known ahead of time); but as an origin server, streaming large, complex form posts doesn't seem to be a problem. This can definitely come in handy when posting large binary data from ColdFusion to something like Amazon S3 which, I'm told, can handle streaming, multi-part form data.
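The reason the content length doesn't need to be known ahead of time is that the connection uses chunked transfer encoding, where each chunk of the body is prefixed with its own size. Java handles this framing internally once chunked streaming mode is turned on; the sketch below only illustrates the wire format. The frameChunk() and encodeChunked() names are hypothetical helpers.

```javascript
// Frame one chunk using HTTP/1.1 chunked transfer coding:
// <size in hex> CRLF <data> CRLF
function frameChunk(data) {
    const bytes = Buffer.from(data, "utf8");
    return Buffer.concat([
        Buffer.from(bytes.length.toString(16) + "\r\n", "ascii"),
        bytes,
        Buffer.from("\r\n", "ascii")
    ]);
}

// A zero-length chunk followed by a blank line marks the end of the body.
const LAST_CHUNK = Buffer.from("0\r\n\r\n", "ascii");

// Frame a series of pieces, then terminate the body.
function encodeChunked(pieces) {
    return Buffer.concat(pieces.map(frameChunk).concat([LAST_CHUNK]));
}

const encodedBody = encodeChunked(["The Bride", "abcdefghijklmnopqrstuvwxyz"]);
console.log(JSON.stringify(encodedBody.toString("utf8")));
```

Since every chunk announces its own size, the sender can keep writing chunks for as long as it has data, and the receiver knows the body is complete only when the zero-length chunk arrives.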
Reader Comments
Ben...am wondering if this post has some ideas that might help my latest conundrum...
I have a client who is a photographer. I built a multipart form that will allow her to upload photos (5 at a time) to her server so that her clients can view them. The issue we are running into is the immense size of the photos and the length of time it is taking to get the photos uploaded. Currently, I am using a combination of cfflush and cfimage's resizing capabilities to make things smaller, but it is still not ideal. Being a photographer, as you may well imagine, these pics are huge, high res pics. What I am discovering is that the file is being uploaded through the http stream prior to hitting my flushing/cfimage work. I came across your code in looking for ways to stream the form submission.
With my long-winded explanation out of the way, do you think that your proof of concept might be the way to go? Any thoughts would be appreciated.
~Clay
By the way...have learned a ton from your site. Thanks.
@Clay,
That's a classic problem. In the past, I've actually gone the "use FTP" route. If your client is willing to use a modern browser, there might be some things you can do with the new File API to make the experience a bit more reasonable. However, if they are large files, there's only so much you can do to get around the issue of bandwidth, which is usually not as available going up as it is coming down.
I wish I had better advice. Is the process failing? Or is it just taking too long for a good user experience?
It is just taking too long for a good user experience. Since my post, I have shown her how to batch resize in PS and I am using cffileupload to allow multiples at the same time with user feedback. I just did this for her and have not had a response yet.
@Clay,
Yeah, sometimes you *have* to ask the client to take on a little of the responsibility. There's only so much you can do to make uploading a more enjoyable experience. Good luck!