Using A File Upload Or The Request Body To Overcome The Slow Processing Of A Large Form POST In Lucee CFML 5.3.3.62
Yesterday, I demonstrated that Lucee CFML appears to incur a request-cloning cost when spawning CFThread
tags. In most circumstances, this cost is negligible. However, if the parent HTTP request is generated by a large Form POST, the cost of spawning a CFThread
tag can become shockingly expensive. While discussing this issue on the Lucee CFML Dev Forum, Joe Gooch suggested that sending the content through as a File Upload may get around the "re-processing" times. A quick test proved this to be true. This literally rocked my world! So, I wanted to share the performance differential of various Form POST techniques in Lucee CFML 5.3.3.62.
To quickly recap the findings from yesterday, if I posted a large value in a form field using a Content-Type of application/x-www-form-urlencoded
, I get two outcomes:
The initial cost of parsing the incoming HTTP request can be expensive.
The cost of spawning an asynchronous
CFThread
tag re-incurs the initial cost of the form-field parsing (as part of the PageContext cloning that Lucee performs under the covers).
To see this in action, I set up a simple ColdFusion page that does nothing but log some metadata about the request and then tries to spawn three CFThread
tags. The relative execution time of the spawning and the execution of the threads are also logged:
<cfscript>
// The large content value will be coming through in a single FORM FIELD.
param name = "form.content" type = "string" default = "";
dump( getHttpRequestData().headers );
// ------------------------------------------------------------------------------- //
// ------------------------------------------------------------------------------- //
startedAt = getTickCount();
systemOutput( " . . . . . . . . . . . . . . . . . . . . . . .", true );
systemOutput( "DEMO: As Form Field", true );
systemOutput( "Responding to FORM post", true );
systemOutput( "Field length: #numberFormat( form.content.len() )#", true );
systemOutput( "Sample: #form.content.left( 50 )#", true );
// Here, we are just looking to see how long it takes to SPAWN the following threads
// as a function of the incoming HTTP request size.
thread action = "run" {
systemOutput( "Inside 1st thread - #delta( startedAt )#", true );
}
thread action = "run" {
systemOutput( "Inside 2nd thread - #delta( startedAt )#", true );
}
thread action = "run" {
systemOutput( "Inside 3rd thread - #delta( startedAt )#", true );
}
systemOutput( "Done spawning threads - #delta( startedAt )#", true );
// ------------------------------------------------------------------------------- //
// ------------------------------------------------------------------------------- //
public string function delta( required numeric tickCount ) {
return( numberFormat( ( getTickCount() - tickCount ) ) );
}
</cfscript>
As you can see, the data payload is going to come through as the form field, form.content
. To generate test data, I took the string, "This is a long field value. "
and repeated it using the .repeatString()
method with increasing repetition-counts:
- 10,000 times.
- 20,000 times.
- 30,000 times.
- 40,000 times.
- 50,000 times.
Then, using Postman, I posted the increasingly large form-field values. And, if we then look at the Lucee Server logs, we get the following output:
As you can see, as the size of the Form field increases, so does the cost of spawning CFThread
tags. So much so that by the time we post a 1.4MB Form field, each CFThread
tag takes close to 30-seconds to spawn! This blocks the overall request, which then takes over 80-seconds to spawn three threads.
Now, what you can't see in this output is Lucee's initial cost of processing the incoming HTTP request (before my code even executes). I don't have exact numbers to back this up; but, the cost of spawning a CFThread
feels like it somewhat mirrors the initial cost of the request. Meaning, once I hit the Send
button in Postman, it takes quite a while before I even see log-output start to show up.
In fact, if I look at the Postman metadata for my last post, I can see that posting the 50,000 repetition-test in the Form field takes 1m33s to complete. And, since we know from the server logs that the code-facing portion took about 84-seconds to complete, we can deduce that the initial HTTP request processing took about 9-seconds before my code even started executing. And, keep in mind, this is in my local development environment!
Ok, now that we're up-to-speed on the performance problem of posting large Form Fields in Lucee CFML, let's get to the good stuff! Joe Gooch's suggestion of trying to post the content as File instead of a Form Field.
To test this, I slightly tweaked the test code. Instead of looking in a Form Field for content, the Form Field will now contain a File with the content. We then read the content directly from the Temp file:
<cfscript>
// The large content value will be coming through in an attached FILE.
param name = "form.contentFile" type = "string" default = "";
dump( getHttpRequestData().headers );
// ------------------------------------------------------------------------------- //
// ------------------------------------------------------------------------------- //
startedAt = getTickCount();
// When ColdFusion sees a file upload, it places it in the TEMP directory first.
// Since we're going to read the file in without using the CFFile[upload] action,
// let's at least validate that the given path is in the TEMP directory and not just
// any arbitrary path provided by the user.
if ( getDirectoryFromPath( form.contentFile ) != getTempDirectory() ) {
throw( type = "InsecurePathReference" );
}
// Read the content of the uploaded file (and then delete the file).
content = fileRead( form.contentFile );
fileDelete( form.contentFile );
systemOutput( " . . . . . . . . . . . . . . . . . . . . . . .", true );
systemOutput( "DEMO: As File Upload", true );
systemOutput( "Responding to FORM post", true );
systemOutput( "File length: #numberFormat( content.len() )#", true );
systemOutput( "Sample: #content.left( 50 )#", true );
// Here, we are just looking to see how long it takes to SPAWN the following threads
// as a function of the incoming HTTP request size.
thread action = "run" {
systemOutput( "Inside 1st thread - #delta( startedAt )#", true );
}
thread action = "run" {
systemOutput( "Inside 2nd thread - #delta( startedAt )#", true );
}
thread action = "run" {
systemOutput( "Inside 3rd thread - #delta( startedAt )#", true );
}
systemOutput( "Done spawning threads - #delta( startedAt )#", true );
// ------------------------------------------------------------------------------- //
// ------------------------------------------------------------------------------- //
public string function delta( required numeric tickCount ) {
return( numberFormat( ( getTickCount() - tickCount ) ) );
}
</cfscript>
As you can see, this time, we're using the fileRead()
function to read the posted file directly from the temp directory. Now, since I know this approach is going to be faster, I only tested the 10,000 repetition-test and the 50,000 repetition-test lengths (posting the file using Postman). And, when doing so, we get the following server logs:
As you can see, uploading the large content value as a File allows us to completely bypass the original overhead of spawning asynchronous CFThread
tags. Furthermore, when going from the 10,000 repetition-test to the 50,000 repetition-test, the relative overhead barely changes at all! This is amazing!
So, just to be clear, here's the timing of the two approaches we've seen so far:
- Form Field: 50,000 repetition-test - 84-seconds.
- File Upload: 50,000 repetition-test - 91-milliseconds.
After I shared these findings on the Lucee CFML Dev Forum, Brad Wood suggested that I might try posting the large content value as the request body with a Content-Type of something like text/plain
. The idea being that it was the form field decoding that was causing the slowness; so, anything that prevents Lucee from parsing the input may result in the same performance increase.
ASIDE: ColdFusion only parses the incoming request as "Form Data" when the Content-Type is either
application/x-www-form-urlencoded
ormultipart/form-data
. If we send a request using some other Content-Type, ColdFusion will just let the payload come through as-is.
To test this, I refactored my demo to look for the large content value directly in the getHttpRequestData()
:
<cfscript>
dump( getHttpRequestData().headers );
// ------------------------------------------------------------------------------- //
// ------------------------------------------------------------------------------- //
startedAt = getTickCount();
// The large content value will be coming through as the request BODY.
content = getHttpRequestData().content;
systemOutput( " . . . . . . . . . . . . . . . . . . . . . . .", true );
systemOutput( "DEMO: As Body Content", true );
systemOutput( "Responding to FORM post", true );
systemOutput( "Body length: #numberFormat( content.len() )#", true );
systemOutput( "Sample: #content.left( 50 )#", true );
// Here, we are just looking to see how long it takes to SPAWN the following threads
// as a function of the incoming HTTP request size.
thread action = "run" {
systemOutput( "Inside 1st thread - #delta( startedAt )#", true );
}
thread action = "run" {
systemOutput( "Inside 2nd thread - #delta( startedAt )#", true );
}
thread action = "run" {
systemOutput( "Inside 3rd thread - #delta( startedAt )#", true );
}
systemOutput( "Done spawning threads - #delta( startedAt )#", true );
// ------------------------------------------------------------------------------- //
// ------------------------------------------------------------------------------- //
public string function delta( required numeric tickCount ) {
return( numberFormat( ( getTickCount() - tickCount ) ) );
}
</cfscript>
In Postman, I am posting this as a "raw" Body value with Content-Type text/plain
. This will cause Lucee to bypass any HTTP request processing (ie, leaving the FORM
scope empty). And, I'll have to look in the getHttpRequestData().content
property for the posted value.
Again, since I know this is going to be faster, I only tested with 10,000 repetition-test and 50,000 repetition-test. And, when we run these tests, we get the following Lucee log output:
Posting the content as the HTTP Request Body is not quite as fast as the file-upload; but, it's still blazing fast when compared to the Form Field approach. Since spawning an asynchronous CFThread
tag has to clone the PageContext, it makes sense that having a large Request Body would have some impact on the overall speed, as the Request Body is reported as part of the PageContext.
So, to summarize, here is the relative cost of code execution for each approach when using the 50,000 repetition-test:
- Form Field: 50,000 repetition-test - 84-seconds.
- File Upload: 50,000 repetition-test - 91-milliseconds.
- Request Body: 50,000 repetition-test - 470-milliseconds.
I'm super excited about these results because it really clues me into a processing bottleneck that I've been seeing in my code; and, shows me a very clear path to improvement. A huge shout-out to Joe Gooch and Brad Wood with their stellar advice on how to get around the high-cost of posting large values in Form Fields in Lucee CFML 5.3.3.62. They done rocked my world!
Want to use code from this post? Check out the license.
Reader Comments
Wow. This is an amazing investigation.
But, it seems like the Lucee engineers need to try and address this issue, so that requests are optimized better. But, in the meantime, this is a great work around.
@Charles,
Yeah, it definitely seems like they can make some better choices under the hood - like, why are they re-decoding the form-fields just to spawn the threads -- seems like they could just duplicate the data. Brad Wood asked me to open a ticket about this part of it, which I plan to do this weekend.
Dude,
I haven't touched CF in a hot minute but this post is so awesomely documented I was hooked. Never mind true crime, nerds better read this. With a healthy dose of lines like "Tweaking garbage collection only gets you so far - at some point, you just have to stop using so much memory (ancient Chinese proverb). " this is classic Ben Nadel hotness right here. Keep up the good work!
@Jean,
Ha ha, good to hear from you, my friend :D I am glad this was enjoyed. What are you working on these days?
I don't know if this is the same thing, or maybe this has been semi-patched since this post, but I ran into a similar issue submitting a large text field (a Base64 encoded PDF). Simply changing the form's enctype to "multipart/form-data" seems to have resolved it (after days of hair-pulling, of course). So, it seems like form fields can also work just fine, and "application/x-www-form-urlencoded" may be the only guilty party, here.
@BJ,
That's awesome to hear! I believe this was filed as a ticket in the Lucee Jira system; so, it's definitely possible that parts of the underlying issue have been patched - depends on which version you are running. But, this is a great point to know about - thanks for sharing! 💪🙌