In the past, I have been asked several times how to create secure file downloads without tying up resources in the ColdFusion thread pool. Really, when you want to create a good, secure download, you would want to funnel the file requests through some sort of ColdFusion control logic and then stream a non-web-accessible file to the user via ColdFusion's CFContent tag. This works nicely, but the problem is that the CFContent tag ties up a ColdFusion thread for the entire duration of the download. ColdFusion shouldn't be responsible for streaming files - let IIS do that; ColdFusion should be freed up to handle the application logic and workflow.
One thing that I have suggested in the past was to copy files to a new, web-accessible location before forwarding the user directly to the file. This would put the onus of the download on the web server, not ColdFusion. It would create a somewhat non-secure download since the file is now in a public location, but we can take steps to make sure that that is a "good enough" solution.
I told people that this was a theoretical suggestion since I had never tried it out myself. But, curiosity got the best of me and I wanted to see if I could come up with a really simple solution for this problem. I think what I came up with is pretty decent. You start off with a directory of secure files that are above the web root. Then, you have a list of files that a user can download. When they click on a file link, we funnel this request through a ColdFusion action page that checks to see if the requested file is already accessible somewhere. If it isn't, it copies the file to a public folder which has a UUID directory name. Once the copy is done, the ColdFusion page forwards the user directly to the new file location so that IIS can handle the file download.
There's a bit more to it (not much more), but let's take a look at the setup first. Here is my Application.cfc file:
When the application is initialized or re-initialized, we start off by clearing out all of our temp (publicly-accessible) files. Then we create a cache of file names. This cache is keyed using the requested file name and each item in the cache has the name of the randomly-named directory that houses each publicly accessible file and the date/time stamp on which that directory was created.
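A minimal sketch of that initialization might look like the following. It is an illustration only: the application name, the `secure_files` and `temp` directory names, the `FileCache` key, and the `reset` URL flag are all hypothetical, and it assumes the private directory sits one level above the web root.

```cfm
<cfcomponent output="false" hint="Sketch of the application setup (names are assumptions).">

	<!--- Hypothetical application settings. --->
	<cfset THIS.Name = "SecureDownloadDemo" />
	<cfset THIS.ApplicationTimeout = CreateTimeSpan( 0, 1, 0, 0 ) />

	<cffunction name="OnApplicationStart" access="public" returntype="boolean" output="false">

		<!--- Assumed layout: private files above the web root, public copies in ./temp/. --->
		<cfset APPLICATION.WebRoot = GetDirectoryFromPath( GetCurrentTemplatePath() ) />
		<cfset APPLICATION.SecurePath = (APPLICATION.WebRoot & "../secure_files/") />
		<cfset APPLICATION.PublicPath = (APPLICATION.WebRoot & "temp/") />

		<!--- Clear out any public copies left over from a previous run. --->
		<cfif DirectoryExists( APPLICATION.PublicPath )>
			<cfdirectory action="delete" directory="#APPLICATION.PublicPath#" recurse="true" />
		</cfif>
		<cfdirectory action="create" directory="#APPLICATION.PublicPath#" />

		<!---
			Cache keyed by requested file name. Each entry is expected to hold
			the UUID directory name and the date/time it was created.
		--->
		<cfset APPLICATION.FileCache = StructNew() />

		<cfreturn true />
	</cffunction>

	<cffunction name="OnRequestStart" access="public" returntype="boolean" output="false">

		<!--- Hypothetical manual re-initialization flag. --->
		<cfif StructKeyExists( URL, "reset" )>
			<cfset OnApplicationStart() />
		</cfif>

		<cfreturn true />
	</cffunction>

</cfcomponent>
```

Deleting and recreating the public directory on every (re-)initialization keeps the cache and the file system in sync: an empty cache always corresponds to an empty public folder.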
Then we list out each of our secure files (these files are not web accessible at this point):
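As a rough illustration, the listing page could be as small as the sketch below. It assumes a hypothetical `APPLICATION.SecurePath` variable that holds the path to the private directory and a hypothetical `file` URL parameter; neither name comes from the original code.

```cfm
<!--- List the private directory; none of these files are web accessible. --->
<cfdirectory
	action="list"
	directory="#APPLICATION.SecurePath#"
	name="qFile"
	sort="name ASC"
	/>

<cfoutput>
	<ul>
		<cfloop query="qFile">
			<!--- Skip sub-directories; only files are offered for download. --->
			<cfif (qFile.type EQ "File")>
				<li>
					<!--- Every link goes through get_file.cfm, never to the file itself. --->
					<a href="get_file.cfm?file=#URLEncodedFormat( qFile.name )#">#HTMLEditFormat( qFile.name )#</a>
				</li>
			</cfif>
		</cfloop>
	</ul>
</cfoutput>
```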
Notice that each file link goes through the ColdFusion page, get_file.cfm. The get_file.cfm page checks to see if the requested file has been made publicly available yet. It does this by checking to see if the file is in the application cache. If it is not, then a public directory is created using a UUID name and the secure file is copied to the public directory and the name of the directory and creation date are cached in the application. If, however, the file is already publicly accessible, it checks the cache to see when it was created. In our demo (for testing purposes), if the directory was created more than two minutes ago, the public directory [that contains the file in question] is renamed to a new UUID and its new name and creation date are cached in the application. This renaming of the directory helps to keep the direct-file-access availability to a minimum while at the same time not having to re-copy the file every time it is requested.
Let's take a look at get_file.cfm:
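A minimal sketch of the copy-or-rename flow described above follows. It is not the original listing. It assumes hypothetical `APPLICATION.SecurePath`, `APPLICATION.PublicPath`, and `APPLICATION.FileCache` variables, a `file` URL parameter, a public folder reachable at `./temp/`, and a named lock that the description does not mention.

```cfm
<!--- Hypothetical URL parameter naming the requested file. --->
<cfparam name="URL.file" type="string" default="" />

<cfset fileName = URL.file />
<cfset sourceFile = (APPLICATION.SecurePath & fileName) />

<!--- Reject empty names, anything containing path separators, and unknown files. --->
<cfif (NOT Len( fileName )) OR Find( "/", fileName ) OR Find( "\", fileName ) OR (NOT FileExists( sourceFile ))>
	<cfheader statuscode="404" statustext="Not Found" />
	<cfabort />
</cfif>

<!--- Assumption: serialize work per file so two requests never copy or rename at once. --->
<cflock name="publicFile-#Hash( fileName )#" type="exclusive" timeout="120">

	<cfif NOT StructKeyExists( APPLICATION.FileCache, fileName )>

		<!--- First request: copy the file into a new UUID-named public directory. --->
		<cfset entry = StructNew() />
		<cfset entry.Directory = CreateUUID() />
		<cfset entry.DateCreated = Now() />

		<cfdirectory action="create" directory="#APPLICATION.PublicPath##entry.Directory#" />
		<cffile
			action="copy"
			source="#sourceFile#"
			destination="#APPLICATION.PublicPath##entry.Directory#/#fileName#"
			/>

		<cfset APPLICATION.FileCache[ fileName ] = entry />

	<cfelseif (DateDiff( "s", APPLICATION.FileCache[ fileName ].DateCreated, Now() ) GT 120)>

		<!--- Public copy is more than two minutes old: move it to a new UUID directory. --->
		<cfset entry = APPLICATION.FileCache[ fileName ] />
		<cfset newDirectory = CreateUUID() />

		<cfdirectory
			action="rename"
			directory="#APPLICATION.PublicPath##entry.Directory#"
			newdirectory="#APPLICATION.PublicPath##newDirectory#"
			/>

		<!--- Structs are held by reference, so this updates the cached entry. --->
		<cfset entry.Directory = newDirectory />
		<cfset entry.DateCreated = Now() />

	</cfif>

	<cfset publicDirectory = APPLICATION.FileCache[ fileName ].Directory />

</cflock>

<!--- Hand the download off to the web server. --->
<cflocation url="./temp/#publicDirectory#/#URLEncodedFormat( fileName )#" addtoken="false" />
```

The ColdFusion thread is only busy for the copy (first request) or the rename (stale directory); after the redirect, the web server serves the bytes on its own.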
The concept is fairly straightforward - we are periodically moving accessible files to new directories so that if anyone has the direct-file link bookmarked, it will only be available for a certain amount of time. Now, in the above file, the file-move is triggered by user actions; you could, however, have some sort of a scheduled task that runs every 20 minutes or something and renames old directories. I try to avoid using scheduled tasks whenever possible and I felt that since this kind of technique would really only make sense on a high-traffic site, the user-triggered event would be more than sufficient.
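For comparison, the scheduled-task variant could be sketched as a small template that walks the cache and renames anything older than the cutoff. The template name, the `APPLICATION.FileCache` and `APPLICATION.PublicPath` variables, the lock name, and the 20-minute cutoff are all assumptions made for illustration.

```cfm
<!--- rename_public_dirs.cfm: hypothetical template called by a scheduled task. --->
<cfset maxAgeInMinutes = 20 />

<!--- Work from a snapshot of the keys in case requests add entries meanwhile. --->
<cfset fileNames = StructKeyArray( APPLICATION.FileCache ) />

<cfloop index="i" from="1" to="#ArrayLen( fileNames )#">

	<cfset fileName = fileNames[ i ] />

	<!--- Assumed to be the same per-file lock used by the download page. --->
	<cflock name="publicFile-#Hash( fileName )#" type="exclusive" timeout="120">

		<cfset entry = APPLICATION.FileCache[ fileName ] />

		<cfif (DateDiff( "n", entry.DateCreated, Now() ) GTE maxAgeInMinutes)>

			<cfset newDirectory = CreateUUID() />

			<cfdirectory
				action="rename"
				directory="#APPLICATION.PublicPath##entry.Directory#"
				newdirectory="#APPLICATION.PublicPath##newDirectory#"
				/>

			<cfset entry.Directory = newDirectory />
			<cfset entry.DateCreated = Now() />

		</cfif>

	</cflock>

</cfloop>
```

A template like this could be registered through the ColdFusion Administrator or the cfschedule tag. The trade-off is that stale links expire on a fixed schedule even when nobody is requesting the file.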
At first, I was concerned that it would take too long to copy files from the private directory to a public directory, but it was surprisingly quick. In my demo, the video files were about 730 megabytes and they copied over, on my slow local development server, in about 24 seconds. This seems like a long time for a SAVE AS prompt to come up, but realize also that we are only incurring that time cost on the first file request (for that file). All subsequent file requests will be made directly to the publicly accessible, but randomly located file. Smaller files, like the JPGs, were copied over instantly.
Again, this is a "good enough" solution. It is not meant to be ultra, super secure. If someone wanted to, they could get a publicly accessible file link and email that to a million people who could all start downloading. But really, that's not what we're worried about - those people could just as easily download the file and send the file to a ton of people. Really, what this solution is for is to keep honest people honest and do as much as possible with as little effort to get people to go through standard procedures to download files.
Now, as much as this is less theoretical, since I have tried it out myself, I am still not sure that I would recommend this. I know very little about high-traffic download sites. I am sure that there is a lot more to consider and many patterns, techniques, and best practices have already been discovered for this type of problem. This, however, is what I could come up with this morning.
My only complaint is that it's not 100% secure. There is a way to secure downloads but Adobe has to make a change to cfcontent to do so. Right now, in order to cfcontent something down to the end user's client, they read the entire file into memory. There's no such thing as a buffer or anything? :P Anyway, I did put as an enhancement request for CF9 and if you want the code that I researched / demo'ed, I'd be happy to share it.
Posted by Todd Rafferty on May 14, 2008 at 9:09 AM
@Todd,
Are you saying that you don't like CFContent because it ties up a ColdFusion thread? Or because it reads the whole file into memory? I am curious as to which of these issues your enhancement addresses. Also, did you come up with a way to do it in CF8? Or was your demo theoretical?
Posted by Ben Nadel on May 14, 2008 at 9:17 AM
Well, I hate both aspects of CFContent. :) The fact that it has to load the entire file into memory instead of just buffering and streaming the buffer is just silly. Then again, that would make CF a streaming server if they changed it, wouldn't it? Probably why they didn't do it. As for cfcontent taking up a thread, for the sake of security, I'll deal with it, just buffer/stream the file appropriately to make the memory aspect of it a little more friendly. They did it for cffile's upload, they can certainly do it for downloading.
The code I have does work and does buffer/stream. I have not load tested it at all to see if it was on the right path. Said code is at home at the moment (for some reason), so I'll have to get back to you on it. If you're impatient, you can go here ( http://www.realitystorm.com/experiments/flash/streamingFLV/index.cfm ) and check this code out yourself and rewrite it to make it a little more generic (which is all I did).
Posted by Todd Rafferty on May 14, 2008 at 9:26 AM
You'll also have to cross reference Christian Cantrell's post as well:
http://weblogs.macromedia.com/cantrell/archives/2003/06/using_coldfusio.html
I had to use those two references to write the file as it wasn't 100% perfect, but those are my notes that I can share with you.
Posted by Todd Rafferty on May 14, 2008 at 9:28 AM
@Todd,
I actually have used that Christian Cantrell post before, especially in pre-MX7 days before the CFContent tag had a Variable attribute for binary content. When the CFContent added the Variable attribute, I stopped using this. But, this didn't stream anyway, this was just a way to send binary data to the client.
The Reality Storm post does a similar thing, but rather than writing the whole binary content at one time, it is periodically reading in and flushing the content.
So, I guess the benefit here is not that it frees up a thread, but rather that the thread in question is not hogging memory?
Posted by Ben Nadel on May 14, 2008 at 9:35 AM
Correct. CFContent as it stands reads the entire file into memory to send it down to the user. With the code that I had written, I was able to create a buffer of 10 megs and stream that. So, only 10 megs of continuous memory was being used for that one request. So, if my client has 200 meg PDF files (and, I could show you a few), then it's not going to shove everything into memory, it's only going to buffer 10 megs at a time. That 10 megs was arbitrary so, I could set it to anything I want.
What I would be curious about (and I doubt Adobe is going to answer this): if the 1st request requested my 200 meg 'abcdef.pdf' file and a 2nd request came in for the same pdf, does it check the memory for it first (since it's already there?) or is it going to get a new copy of the file? If it's going to the memory first, then cfcontent as it exists today is probably the more efficient way to do it, since we don't want 400 megs lingering in memory, we want 200 megs being served out to 2+ requests.
This whole cfcontent thing is a mess because I have written a document library for clients. I need absolute tracking / security and I need cfcontent to serve out that file, but not at the expense of killing the server.
Posted by Todd Rafferty on May 14, 2008 at 9:44 AM
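For readers following along, the buffer-and-flush idea discussed in this exchange can be sketched roughly as below. This is not Todd's code. The file path, file name, and 1 MB buffer size are placeholders, and reaching the servlet output stream through `GetPageContext()` relies on undocumented behavior that may differ between ColdFusion versions.

```cfm
<!--- Hypothetical private file and buffer size; both are assumptions. --->
<cfset filePath = "/path/to/private/abcdef.pdf" />
<cfset bufferSize = (1024 * 1024) />

<!--- Reach past ColdFusion's page buffer to the underlying servlet response. --->
<cfset response = GetPageContext().GetResponse().GetResponse() />
<cfset response.setContentType( "application/pdf" ) />
<cfset response.setHeader( "Content-Disposition", "attachment; filename=""abcdef.pdf""" ) />

<cfset inputStream = CreateObject( "java", "java.io.FileInputStream" ).Init( JavaCast( "string", filePath ) ) />
<cfset outputStream = response.getOutputStream() />

<!--- One reusable Java byte array; only this much file data is in memory at a time. --->
<cfset buffer = RepeatString( " ", bufferSize ).getBytes() />

<cftry>

	<cfset bytesRead = inputStream.read( buffer ) />

	<!--- read() returns -1 once the end of the file is reached. --->
	<cfloop condition="bytesRead NEQ -1">
		<cfset outputStream.write( buffer, JavaCast( "int", 0 ), JavaCast( "int", bytesRead ) ) />
		<cfset outputStream.flush() />
		<cfset bytesRead = inputStream.read( buffer ) />
	</cfloop>

	<cfset inputStream.close() />

	<cfcatch>
		<!--- Release the file handle if the client disconnects mid-download. --->
		<cfset inputStream.close() />
		<cfrethrow />
	</cfcatch>

</cftry>
```

This keeps the memory cost per request at roughly the buffer size, but the ColdFusion thread is still occupied for the whole download.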
An off topic nit pick - but important. Your onRequestStart has:
<cfset THIS.OnApplicationStart() />
Don't forget that when you call CFC methods using this.X, it acts like an "outside" call. If X were private, you would get an error. You should do
<cfset OnApplicationStart() />
instead.
Posted by Raymond Camden on May 14, 2008 at 9:54 AM
@Todd,
I would doubt that it is checking the memory for existing buffers. I think the use case of two users downloading the same file concurrently is probably fairly small, unless it's something like a Product download on a hugely popular site?
I hear what you are saying about the document library. I think that is where we all ultimately want to go with a scenario like this. I'd like to give that code-write a go. I know you have written it already, and that other site has a version, but I love to flex my brain muscles :)
Posted by Ben Nadel on May 14, 2008 at 9:55 AM
@Ray,
Yeah, that is a weird habit of mine. I love to see things scoped. What's ironic is that it works against me sometimes. If the method were private, you could call it like this:
<cfset VARIABLES.Method() />
Sure, if the access changes, then you have to change where it is accessed, but I am not so worried about that. What really bothers me, though, is that you can't use Named arguments with private methods:
<cfset VARIABLES.Method( FirstName = "Ben", LastName = "Nadel" ) />
This throws an error for some reason saying that it can't be called using named arguments and to try calling it using ordered arguments. Of course, if it were THIS instead of VARIABLES, this works just fine.
So, that irks me. But really, you are right, there is no need to scope these methods at all. I think I did it to demonstrate that they were not built-in ColdFusion methods, but it's probably something that I will drop eventually.
Posted by Ben Nadel on May 14, 2008 at 9:58 AM
@Ben : Flex those brain muscles dude, have at it. I showed you my notes, it'd be interesting to compare.
Posted by Todd Rafferty on May 14, 2008 at 10:01 AM
@Todd,
I have to believe that notes will definitely be compared :)
Posted by Ben Nadel on May 14, 2008 at 10:03 AM
The other option is to integrate the web tier with your application's security. It can be a pain, but I've written an ISAPI filter that, before sending a file, delegates a security check back to the CF app via a separate http request (you could also communicate via db or just about any other mechanism - both processes are on the same server). Keep in mind that ISAPI (or the Apache equivalent) has access to session cookies set by your CF app.
Also keep in mind that app server threads aren't as expensive as they used to be, especially if all they are doing is chunking bytes to an output stream.
Posted by Dave Ross on May 14, 2008 at 10:31 AM
@Dave Ross: I'd really like to see a blog post on that.
Posted by Raymond Camden on May 14, 2008 at 10:34 AM
@Dave : And if you're using Apache/Linux?
Posted by Todd Rafferty on May 14, 2008 at 10:37 AM
@Dave Ross: Also, the majority of my files have some kind of security restriction so they're not web accessible at all which is why cfcontent is cool, because you can put it wherever and read it from that location as long as the permissions are in place. How would Apache/IIS even go after those if it's not defined globally in some settings or setup as an alias (which defeats security)?
Posted by Todd Rafferty on May 14, 2008 at 10:40 AM
@Todd - you would use a virtual directory within the webserver that has the filter applied to it.
1) Make a virtual directory that maps to where you keep your files
2) Apply the ISAPI filter to that virtual directory
The filter would redirect someone to whatever your login page is (or just return a 403) if someone wasn't logged into the application (or your application tells the filter that they shouldn't access that particular file).
@Ray - I'll do my best. After all was said and done we ended up using cfcontent, and adding threads and hardware.
Posted by Dave Ross on May 14, 2008 at 10:51 AM
for Apache I would assume you'd write a filter
(http://httpd.apache.org/docs/2.2/filter.html)
Posted by Dave Ross on May 14, 2008 at 10:52 AM
This is a very interesting topic to me. I'm working on a document library as well. I need full security to make sure person in department a isn't looking at files they shouldn't be seeing in department b. I'm using cfcontent at this time, but I'm concerned about the memory usage too.
I like Dave Ross' idea about an ISAPI filter, but even a CF solution that keeps the memory footprint small per download would be workable. (Even at the expense of being a slightly longer download time)
I like Ben's original solution on this post, but only if a cleanup is done afterwards to remove the temp file to retain security.
Dan
Posted by Dan Sorensen on May 14, 2008 at 11:42 AM
@Dan,
The problem with clean up is that from a Server standpoint, you have no idea how long the download will take. You'd have to keep checking the file to see if it's locked.
Posted by Ben Nadel on May 14, 2008 at 11:53 AM
I'll chime in and add that we actually handle secure file downloads on another server (although not necessarily so) with a PHP script. My ColdFusion download page creates an encrypted reference to a primary key in the DB and passes it to the browser with a redirect to a PHP script with the encrypted reference in the URL.
The PHP script then parses the URL, makes a call back to the CF-powered app tier via web services which will return the authorization to download (by giving a file path outside of the webroot). The PHP script at that point reads and streams the file to the client with the appropriate HTTP headers.
The reason for this approach had to do with scaling existing infrastructure (our problem was that downloads were thread-bound in the app-tier). Move the download process to PHP (on a different server in our case) and your shortage of ColdFusion threads clears right up!
Posted by Clint on May 14, 2008 at 11:56 AM
@Clint,
That is a cool idea.
Posted by Ben Nadel on May 14, 2008 at 11:59 AM
@Clint : Creative, but don't you think it's about time we poke Adobe and have them address this issue? :)
Posted by Todd Rafferty on May 14, 2008 at 12:22 PM
If you are running on a Unix box, you can use <cfexecute> to create a Symlink (ln -s /path/to/sourcefile /path/to/tempfile), redirect the user to this file, wait a few seconds (<cfthread action="sleep" duration="5000">), then remove /path/to/tempfile.
As long as the user has started downloading the file, they will receive the full file, and in addition you don't have all the disk IO involved in copying the large file, and you also don't have to worry about other users being able to snag the file even 10 seconds later.
The way that most Unix filesystems work, files can be deleted even while there are open read handles on the file, the file is not cleaned up from the disk until the last read handle is closed - though no new read handles can access the file, for users not already reading the file it's effectively gone.
Posted by Eric on May 14, 2008 at 1:19 PM
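A rough sketch of the symlink approach from the comment above is shown here. Every path and file name is a placeholder, `ln` is assumed to live at `/bin/ln`, and paths containing spaces would break the argument string. A meta refresh followed by cfflush is used instead of cflocation so that the redirect reaches the browser before the sleep starts.

```cfm
<!--- Hypothetical paths: a private source file and a web-accessible temp directory. --->
<cfset sourceFile = "/path/to/sourcefile" />
<cfset linkName = (CreateUUID() & ".zip") />
<cfset linkPath = ("/path/to/webroot/temp/" & linkName) />

<!--- Create the symlink; no file data is copied. --->
<cfexecute name="/bin/ln" arguments="-s #sourceFile# #linkPath#" timeout="10"></cfexecute>

<!--- Push the redirect out now instead of waiting for the request to finish. --->
<cfoutput>
	<html>
	<head>
		<meta http-equiv="refresh" content="0; url=./temp/#linkName#" />
	</head>
	<body>
		<p>Your download should begin shortly.</p>
	</body>
	</html>
</cfoutput>
<cfflush />

<!--- Give the browser time to request the file, then remove the link. --->
<cfthread action="sleep" duration="5000" />
<cffile action="delete" file="#linkPath#" />
```

On Apache, the temp directory also needs to allow symlinks (for example `Options FollowSymLinks`), otherwise the web server will refuse to serve the link.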
@Eric : Yeah, I remember looking at that when I was researching all this. I even looked into something for windows as well (it's a mess on windows, not even worth it).
Posted by Todd Rafferty on May 14, 2008 at 1:24 PM
We've had this problem for a while. To enable secure downloads without having to have temporary files I wrote a c# asp.net page. The page communicates via webservice with the coldfusion application and then returns the file.
The asp.net page needs to run under IIS.
1. Create a webservice in coldfusion. The service takes information about the download request and determines if the current user in the session has access to the specified file.
2. create webservice stub in C# and compile it to a dll
3. create a .aspx page in the same directory as the webroot of the coldfusion application.
Something like
cfapp\wwwfiles\index.cfm
cfapp\wwwfiles\download.aspx
4. using the webservice stub, the aspx page adds the cfid/cftoken cookies to the request to the webservice and asks the cf app if the user is authorized.
5. The .aspx page either shows an error or returns the file.
Posted by JohnEric on May 14, 2008 at 1:52 PM
@JohnEric - Doesn't this just shift the tied up request from CF to ASP? You still end up with a thread somewhere doing nothing but reading the file and passing it to IIS which does nothing but read from your middleman and send it to the end user. The idea is to get ourselves out from the middle of the transfer work to free up server resources.
Posted by Eric on May 14, 2008 at 1:56 PM
A thread is a thread - there will always be one per ongoing download. The problem is that with CF it's tough to differentiate between threads and prioritize them effectively.
So, another option is to run CF enterprise, and have a second "content hosting" instance which has a high number of request handling threads (hundreds if not thousands). You would stream using standard java io (example was posted above), and communicate with the "application" instance via web services to establish security.
Posted by Dave Ross on May 14, 2008 at 2:33 PM
@Eric : Two questions about using cfthread to delete the file after redirecting the user:
1. how are you making sure that the cfthread continues after the user has been relocated - are you starting and sleeping the cfthread before the cflocation occurs?
2. using this technique, is the server-client connection completely cut off by the relocation? I ask as I wonder what happens to any remaining output after the cflocation? (not that there should be any, but I'm curious)
thanks!
Posted by jdbo on May 14, 2008 at 2:46 PM
@jdbo: You don't have to maintain a read handle on the file from CF, once the user follows the redirect to the file, Apache or whatever your web server is will obtain the read handle in order to deliver the content to the user.
You just have to make CF sleep for long enough that you're certain the user will have started their request for the file before you delete the symlink. My example was 5 seconds - even a very slow connection should have been able to make that round trip in this time.
Here are my many caveats: You might not be able to use <cflocation> since CF might not send the Location header until after page processing is done (ie, after you deleted the file), and this might even depend on what web server you're using. You might have to resort to a meta redirect or javascript redirect, <cfflush>, then sleep 5 seconds and delete. Further, this might be interfered with if you use mod_gzip or the like too (which might wait for more response data in order to complete a compression unit to send to the browser), in which case you might have to send a few K of random data in a HTML comment after your redirect commands.
The difficulty with this method is convincing the browser that it needs to start reading the new file while you're still inside CF's thread sleep. You can rest easy that once the browser starts to download the file, it's safe to delete the symlink.
Posted by Eric on May 14, 2008 at 3:02 PM
@Ben Nadel: Can. Of. Worms. SPORING! I bet you weren't expecting this kind of response to this topic. :)
Posted by Todd Rafferty on May 14, 2008 at 3:07 PM
@Todd,
I'm happy to learn new stuff :) That's the fun of worms in a can!
Posted by Ben Nadel on May 14, 2008 at 3:10 PM
@Todd,
Brain muscles have been flexed:
http://www.bennadel.com/index.cfm?dax=blog:1227.view
Posted by Ben Nadel on May 14, 2008 at 3:27 PM
@Eric: that helped clear things up immensely - I was making exactly the wrong assumptions (thinking that cflocation and cfthread would work together happily), and you've just saved me a lot of time, so thanks much!
Posted by jdbo on May 14, 2008 at 3:58 PM
Threads/Memory/CFContent - yes memory is a big issue but I actually ran into the threads problem with a weekly download of an updated schedule published as PDF. The PDF was very small but since the update was always published at the same time and everyone went to get it at that time the threads were consumed quickly. Even with a small file it just takes a few people with very slow connections to bog things down.
My solution is similar to yours - if file is public then I just copy it to a simple named download dir and redirect. If private then I create a temp download dir with UUID style name and copy file there and redirect. Dirs are flushed based on comparing create time stamp and a settable value. Although not completely secure, it does also allow people to use file download utilities.
A really nice solution (especially with large files) is to use Amazon S3 to store the files and set an expiring link. S3 gives you unlimited and cheap storage, delivers the files fast (scales on demand) and of course handles the link expiry.
Posted by Johan on May 14, 2008 at 6:23 PM
@Ben
Interesting approach. We needed something similar to deal with presentations, video and audio for the conferences and I found AdmitOne which seems to work pretty well.
http://www.qwerksoft.com/products/AdmitOne/
I really wish CF was smarter about how to handle these though. I'm reasonably sure that while it doesn't load the entire file into memory, it does block a CF template request while it's running.
Rails (on mongrel) does this much smarter. There's a function that returns a file handle directly back to the web server and the web server streams the file completely separate of the rails application and your code.
To my knowledge CF still ties up a CF thread with requests... or does it do what mongrel/rails does? Can someone confirm this?
Posted by Elliott Sprehn on May 14, 2008 at 9:11 PM
I just read up on the Rails feature - and it's actually not Rails but Apache and Lighttpd that look in the response for a special header - very slick.
http://john.guen.in/past/2007/4/17/send_files_faster_with_xsendfile/
This should work w/ CF on Apache too... quick someone try!
Posted by Dave Ross on May 14, 2008 at 11:13 PM
Wow, that's awesome!
I was actually thinking of another feature where rails passes an open file descriptor back to mongrel and it streams to the client. Rails isn't thread safe so it blocks the mongrel process I think, but a different framework can technically have more than one client streaming files and handling requests at the same time.
X-SendFile definitely appears to be the best way to handle this though. I wonder if IIS has a feature like that...
Posted by Elliott Sprehn on May 14, 2008 at 11:29 PM
...snip...
"Rails (on mongrel) does this much smarter. There's a function that returns a file handle directly back to the web server and the web server streams the file completely separate of the rails application and your code."
...snip...
Sounds like this should be a new feature in CF9
Posted by Kurt Bonnet on May 15, 2008 at 1:20 PM
@Kurt: The web server it's running on has to support it. I believe that web servers which already support it can be driven this way from CF - it only requires crafting a specific response header, so can probably be done in one or two lines of code.
Posted by Eric on May 15, 2008 at 1:25 PM
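To make the "one or two lines" concrete, a hedged sketch of the X-Sendfile hand-off from ColdFusion follows. It assumes Apache with mod_xsendfile installed and enabled (`XSendFile On`); the file path and file name are placeholders, and the authorization check is left as a comment.

```cfm
<!--- Hypothetical absolute path to a file outside the web root. --->
<cfset filePath = "/path/to/private/abcdef.pdf" />

<!--- ... application-level authorization checks would go here ... --->

<!--- Clear any generated output and set the response type. --->
<cfcontent type="application/pdf" reset="true" />

<!--- Suggest a file name for the browser's save dialog. --->
<cfheader name="Content-Disposition" value="attachment; filename=""abcdef.pdf""" />

<!--- Ask the web server to stream the file itself. --->
<cfheader name="X-Sendfile" value="#filePath#" />

<!--- End the request right away; no file bytes pass through ColdFusion. --->
<cfabort />
```

The ColdFusion request ends as soon as the headers are written, so the thread is free while Apache streams the file.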
@Dave - I just tried mod_xsendfile for Apache and it works perfectly. I'm so glad you found this!!! My CF script hands things off to apache for serving so fast it's sick!!! And now my CF threads are free to do real work instead of being tied up serving files, woo hoo!!! I have been looking for a solution like this for a while, I knew it had to be out there! Thanks again!
Posted by Kurt Bonnet on May 17, 2008 at 3:53 PM
Does anyone know of a solution like that mentioned mod_xsendfile for IIS?
Posted by Dan Sorensen on May 19, 2008 at 11:37 AM
I looked for a while and didn't find anything. I'd really like to roll an ISAPI filter for this - it wouldn't be that hard but I'd be concerned about streaming the file properly with my (poor) C++ coding abilities.
Posted by Dave Ross on May 29, 2008 at 8:20 AM