Ask Ben: Tracking File Downloads In ColdFusion
I am working on a project that includes a little function that I am having trouble with. I have been snooping around various CF resources, and haven't seen anything close to what I'm trying to do.
In a nutshell, I have a library application that allows people to download docs once they've logged in. This part is no trouble. What I'd like to do is to record when someone downloads a doc. My thought was to have the href link to the doc include an onclick event, which calls a function which would do an SQL insert, adding the userID, docID and date to the database.
I'm not proficient in CFCs and cfscript, so if you can give me an example I can modify, I'd be grateful.
First, let me tell you that I don't know what the best solution is. The three methods discussed below are variations on things that I have done; each has its time and place. To track the downloads for a particular file, I will either route the file download through a proxy that logs the document activity and then forwards the user to the requested file, or I will use a mouse event tracker like the one you mentioned. For this demo, I am logging the document requests to a text file, "download_log.txt," but this could just as easily be a database, an XML file, or whatever kind of data persistence model you like.
Let's start out with a download proxy. Here is the code for the proxy.cfm ColdFusion template. It takes only one URL parameter, File, which is the name of the file that the user wants to download. This demo assumes that all the files are in a known, web-accessible directory. If your files are in different directories, then you will probably have to alter the ColdFusion CFLocation tag:
<!---
We do not want any output for this page. Therefore, we
are going to set it so that only content within a set
of CFOutput tags will be written to the content buffer.
--->
<cfsetting
enablecfoutputonly="true"
/>
<!---
Param the file value in the URL. This will be the name
of a file that we are going to let the user download. This
assumes that all of our files are in the same directory
(otherwise, we would need to know more about the location
for this specific file).
--->
<cfparam
name="URL.file"
type="string"
default=""
/>
<!---
Log this download information. This logging could be to
a database, but for the purposes of this example, we are
going to log this download to a text file. When logging,
we are going to capture the following information as a
pipe-delimited list:
SESSION.User.ID -- ID of the logged-in user.
URL.File -- Name of the file being downloaded.
Now() -- Date/Time of download.
--->
<cffile
action="APPEND"
file="#ExpandPath( './download_log.txt' )#"
output="#SESSION.User.ID#|#URL.file#|#Now()#"
addnewline="true"
/>
<!---
Now that we have logged information about the file
download, we can let the user download the file. There
are two ways to go about this. We can use CFContent to
stream the file via ColdFusion or we can just use a
CFLocation to forward the user to the location of the file.
CFContent would allow us to stream non-web-accessible files
but puts processing time on the ColdFusion server.
CFLocation does not really put any work on the CF server,
but it means the files must be accessible via the web.
--->
<cflocation
url="./#URL.file#"
addtoken="false"
/>
Now that we have our proxy.cfm ColdFusion template in place, we have to go back and alter our links to point to the proxy template rather than directly to the files:
<html>
<head>
<title>File Download Proxy In ColdFusion</title>
</head>
<body>
<!---
When outputting the file links, instead of linking
directly to the file, route the user through the
proxy template. This will log the clicks and then
forward the user to the requested file.
--->
<cfoutput>
<p>
<a
href="./proxy.cfm?file=#UrlEncodedFormat( "picture_1.jpg" )#"
>Picture 1</a><br />
<a
href="./proxy.cfm?file=#UrlEncodedFormat( "picture_2.jpg" )#"
>Picture 2</a><br />
<a
href="./proxy.cfm?file=#UrlEncodedFormat( "picture_3.jpg" )#"
>Picture 3</a><br />
</p>
</cfoutput>
</body>
</html>
Notice that we are URL encoding our file names. This is the safe thing to do since we never know what kind of crazy characters there are in the file names. Also notice that we do not decode the file name on the proxy page. ColdFusion will automatically decode URL values for us (as far as I know).
After clicking around on the links, here is what our download_log.txt file looks like:
4|picture_1.jpg|{ts '2007-05-15 08:49:27'}
4|picture_2.jpg|{ts '2007-05-15 08:49:33'}
4|picture_2.jpg|{ts '2007-05-15 08:49:37'}
4|picture_3.jpg|{ts '2007-05-15 08:49:47'}
4|picture_2.jpg|{ts '2007-05-15 08:49:49'}
4|picture_1.jpg|{ts '2007-05-15 08:49:51'}
4|picture_2.jpg|{ts '2007-05-15 08:50:05'}
4|picture_2.jpg|{ts '2007-05-15 08:50:31'}
4|picture_3.jpg|{ts '2007-05-15 08:55:58'}
4|picture_1.jpg|{ts '2007-05-15 08:55:59'}
4|picture_1.jpg|{ts '2007-05-15 08:56:20'}
I am capturing the full file name, but it sounds like you have a good document ID to grab - you would use that instead as IDs are just about always better to use than file names (which might change).
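Once the log has some entries, you may want to read it back. Here is a minimal sketch, in JavaScript, that tallies downloads per file from the pipe-delimited format shown above. The function name countDownloads is hypothetical, and the sketch assumes that file names never contain a pipe character.

```javascript
// Tally downloads per file from the pipe-delimited log format shown above.
// Each line is assumed to look like: userID|fileName|{ts 'yyyy-mm-dd HH:mm:ss'}
// countDownloads is a hypothetical helper name, not part of the original demo.
function countDownloads(logText) {
  // Use a prototype-less object so that odd file names cannot collide
  // with built-in object properties.
  var counts = Object.create(null);
  var lines = logText.split(/\r?\n/);

  for (var i = 0; i < lines.length; i++) {
    var line = lines[i].trim();
    if (line === "") {
      continue; // Skip blank lines, such as the trailing newline.
    }
    var fields = line.split("|");
    if (fields.length !== 3) {
      continue; // Skip lines that do not match the expected shape.
    }
    var fileName = fields[1];
    counts[fileName] = (counts[fileName] || 0) + 1;
  }

  return counts;
}

// Usage with two lines taken from the log above.
var exampleLog =
  "4|picture_1.jpg|{ts '2007-05-15 08:49:27'}\n" +
  "4|picture_2.jpg|{ts '2007-05-15 08:49:33'}\n";
var exampleCounts = countDownloads(exampleLog);
```

If you log a document ID instead of a file name, the same tally works unchanged, since the second field is treated as an opaque key.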
Now, this example uses a CFLocation to forward the user to a given file. This has some down sides. For starters, CFLocation can only point to publicly accessible folders. This means that none of the files being downloaded can be outside of the web root. This removes several security options that we could have implemented via ColdFusion. Additionally, if someone were to right-click on the link and do "Save Target As", the file would show up as a ColdFusion CFM file, NOT as the requested file type (since the browser does not know that it will be forwarded to a different document type). This can be a deal-breaker.
The upside to using CFLocation is that it puts no stress on the ColdFusion server, as ColdFusion is not responsible for dealing with the file; once the user is forwarded to the file, it's the web server's job to stream the file.
If the security-related down side to the CFLocation method is a deal breaker (which for many, it is), you can always go the ColdFusion CFHeader and CFContent route. Using CFHeader and CFContent, ColdFusion can grab the requested file and stream it to the client with more control and security:
<!---
We do not want any output for this page. Therefore, we
are going to set it so that only content within a set
of CFOutput tags will be written to the content buffer.
--->
<cfsetting
enablecfoutputonly="true"
/>
<!---
Param the file value in the URL. This will be the name
of a file that we are going to let the user download. This
assumes that all of our files are in the same directory
(otherwise, we would need to know more about the location
for this specific file).
--->
<cfparam
name="URL.file"
type="string"
default=""
/>
<!---
Log this download information. This logging could be to
a database, but for the purposes of this example, we are
going to log this download to a text file. When logging,
we are going to capture the following information as a
pipe-delimited list:
SESSION.User.ID -- ID of the logged-in user.
URL.File -- Name of the file being downloaded.
Now() -- Date/Time of download.
--->
<cffile
action="APPEND"
file="#ExpandPath( './download_log.txt' )#"
output="#SESSION.User.ID#|#URL.file#|#Now()#"
addnewline="true"
/>
<!---
Before we send over the content, we might want to try
to narrow down the type of content being streamed. You
can use the file extension to help figure this out.
While this does NOT affect the content of the file
itself, it will help the client deal with file once
it is downloaded.
--->
<cfswitch expression="#ListLast( URL.file, '.' )#">
<!--- JPEG images (the registered mime type is image/jpeg). --->
<cfcase value="jpg,jpeg">
<cfset strMime = "image/jpeg" />
</cfcase>
<!--- Other image types where the extension matches the subtype. --->
<cfcase value="gif,png,bmp">
<cfset strMime = "image/#LCase( ListLast( URL.file, '.' ) )#" />
</cfcase>
<!--- MS Excel. --->
<cfcase value="xls">
<cfset strMime = "application/vnd.ms-excel" />
</cfcase>
<!--- MS Word. --->
<cfcase value="doc,mht,rtf">
<cfset strMime = "application/msword" />
</cfcase>
<!--- Text. --->
<cfcase value="txt">
<cfset strMime = "text/plain" />
</cfcase>
<!---
Our default value will just send the default mime
type, the octet stream, which is our way of just
saying we have no idea what the file type is.
--->
<cfdefaultcase>
<cfset strMime = "application/octet-stream" />
</cfdefaultcase>
</cfswitch>
<!---
Tell the client to try and open this file inline. This
is the best option if you expect to be getting lots of
image requests. We can also tell the client what the
suggested file name of the asset is.
--->
<cfheader
name="content-disposition"
value='inline; filename="#URL.file#"'
/>
<!---
Stream the file to the client using CFContent. By doing
this, we can grab files that are outside of the web root.
This gives us more security access options. Notice that
we have to use the Full Server Path for this file
since ColdFusion needs to know exactly where to find it.
Also, we are passing back the file's mime type which we
calculated above. This will help the client figure out
how to best deal with the resultant file.
--->
<cfcontent
type="#strMime#"
file="#ExpandPath( './#URL.file#' )#"
/>
When we use this technique, the right-click "Save Target As" still comes up as a ColdFusion file, but the file that gets downloaded can be anywhere that ColdFusion can reach. It does not have to be web-accessible which means you can implement all the security around file access that your heart desires.
As a somewhat large down side to this method, the file download is much slower since ColdFusion has to actually read in the binary data and stream it to the client. It just doesn't do it that fast, even on small files. This will probably be most noticeable on images where the image can load a bit at a time.
Ok, so how do we deal with the whole right-click issue? If letting your users right-click is a "must have," then using a file download proxy is not the way to go. In that case, I would suggest using the method you mentioned in your question: a JavaScript onclick mouse event handler. In this case, we are going to create a download logging page that will track via an onclick event:
<!---
We do not want any output for this page. Therefore, we
are going to set it so that only content within a set
of CFOutput tags will be written to the content buffer.
--->
<cfsetting
enablecfoutputonly="true"
/>
<!---
Param the file value in the URL. This will be the name
of a file that we are going to let the user download. This
assumes that all of our files are in the same directory
(otherwise, we would need to know more about the location
for this specific file).
--->
<cfparam
name="URL.file"
type="string"
default=""
/>
<!---
Log this download information. This logging could be to
a database, but for the purposes of this example, we are
going to log this download to a text file. When logging,
we are going to capture the following information as a
pipe-delimited list:
SESSION.User.ID -- ID of the logged-in user.
URL.File -- Name of the file being downloaded.
Now() -- Date/Time of download.
--->
<cffile
action="APPEND"
file="#ExpandPath( './download_log.txt' )#"
output="#SESSION.User.ID#|#URL.file#|#Now()#"
addnewline="true"
/>
<!---
Since we have logged the file and the client is not
expecting any feedback from this page, we are done.
--->
As you can see, the logging for this ColdFusion template is only slightly different from the ones above it. The actual logging to the data store is the same across all three pages. The only difference is what the ColdFusion template returns to the client. In this case, we are returning nothing in our content buffer (except for some potential white space).
Now, let's look at the HTML page that will tie into this one:
<html>
<head>
<title>File Download Logging In ColdFusion</title>
<script type="text/javascript">
function LogDownload( strFile ){
// Create an image object to ping the file logging.
// While the target CFM file will NEVER return a
// valid image file, we don't really care... we just
// want to trigger the CFM page itself.
var imgPing = new Image();
imgPing.src = (
"./log_download.cfm?file=" +
strFile
);
// Return out.
return;
}
</script>
</head>
<body>
<!---
When outputting the file links, link directly to the
file. The onclick handler pings the logging template
so that the click gets recorded while the browser
requests the file itself.
--->
<cfoutput>
<p>
<a
href="./picture_1.jpg"
onclick="LogDownload( '#UrlEncodedFormat( "picture_1.jpg" )#' );"
>Picture 1</a><br />
<a
href="./picture_2.jpg"
onclick="LogDownload( '#UrlEncodedFormat( "picture_2.jpg" )#' );"
>Picture 2</a><br />
<a
href="./picture_3.jpg"
onclick="LogDownload( '#UrlEncodedFormat( "picture_3.jpg" )#' );"
>Picture 3</a><br />
</p>
</cfoutput>
</body>
</html>
As you can see, each link is targeted directly at the file itself. This means that ColdFusion does not have to stream the file. It also means that the right-click "Save Target As" will actually result in the proper file type. When the user clicks on any of the links, it triggers a JavaScript function call to the LogDownload() method. This method creates an image object and sets its source equal to the download logger ColdFusion template. Now, the CFM template will NEVER return a valid image binary, but frankly, we don't care. It will be invalid, but since we never do anything with it, this is going to be much easier to code than using any sort of AJAX call.
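One thing to watch with the image ping is that a browser may reuse a cached response when the same URL is requested twice, in which case a repeat click might not reach the logging template. Here is a minimal sketch of a variation that encodes the file name in JavaScript and appends a timestamp to keep each URL unique. The buildLogUrl helper and the "t" parameter are assumptions of this sketch, not part of the demo above; the logging template simply ignores the extra parameter. Because the encoding happens in JavaScript here, the onclick would pass the raw file name rather than the UrlEncodedFormat() value.

```javascript
// Build the URL for the logging ping. "log_download.cfm" and the "file"
// parameter come from the demo above; buildLogUrl and the "t" parameter
// are hypothetical additions for this sketch.
function buildLogUrl(strFile, timestamp) {
  return (
    "./log_download.cfm?file=" +
    encodeURIComponent(strFile) +
    "&t=" +
    timestamp
  );
}

function LogDownload(strFile) {
  // Request the logging template through an image object. The template
  // never returns a valid image; we only want to trigger the request.
  var imgPing = new Image();
  imgPing.src = buildLogUrl(strFile, new Date().getTime());
}
```

With this version, a link would call LogDownload( 'picture_1.jpg' ) and each click would produce a distinct URL.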
So, these are three options that can be used to track user downloads. Each has its time and place for use. You have to decide which aspects of each are more important to you.
Reader Comments
We have taken kind of a hybrid approach to this problem on our internal website. A dynamic page listing files available for download, grouped by category (I will save the admin side of this functionality from the explanation, but suffice it to say that each file is listed in a db table, along with various attributes about the file and a foreign key reference to the category that it belongs in), is initially presented to the user. Clicking on a specific file takes them to a page (something like filepreview.cfm?fileid=1) where they can see additional information about the file they have selected, such as file size (from cfdirectory), description (from the db table), file name, etc. Here they can choose to actually download the file (download file button) or to go back to the main file listing. Clicking the download file button takes the user to a generic proxy page (something like downloadfile.cfm?fileid=1) which logs information about the download (user, datetime, fileid) into a db table, and then attempts to automatically push the file to the user. If the automatic download stops due to a popup blocker or for various other reasons, the page also loads with a link to manually download the file (but at this point, the attempted download is already logged).
@Rich,
I like the idea of having a preview page before you have to download the file. When you say that you provide a link to manually download the file, what is that link? Does that link directly to the file itself? Or are you still going through a proxy?
The final page (once you have confirmed on the preview page that you actually want to download the file) has a javascript that attempts to do a javascript document.url to the actual file, and then the following html of the page is "If the file does not begin to download please click here", with here being a hyperlink to the file itself.
This solution does require us to have the files all be web-accessible, so currently the security of any individual file is only as good as somebody only accessing URLs through the UI (which enforces permissions to determine which files you see in your master list).
@Rich,
I think that is good. Personally, I only feel super file security is needed if absolutely necessary. Otherwise, file handling through the web application is just a drain on resources.
I guess a possible middle ground between the two is copying the file from a secure place to a public place and then linking to it... but this makes me nervous. Having to wait for a file to copy makes me nervous.
@Ben-
For super-secret, super-large files, we do have additional functionality in place. When an authorized user requests a file designated as super-secret and super-large, we simply display a page letting them know that we will be sending them an email soon when the file is available for them. At that point, it is sent to an ftp site that the specific user has access to and a task is there to make sure that the file only lives on the ftp server for a set period of time (I believe a day currently). An asynchronous process places the file on the user's ftp space and then sends them an email with directions to pick it up. This is a very arduous process, so most of our files go through the process mentioned above.
We actually use a document ID and an auth session ID to track downloads. Each link is a form submit that writes the tracked info to a SQL table. We track time, date, user ID, and asset ID. We can then tie the asset ID to the asset table, and the user ID to the user table.
@Rich,
I really like the email with the link idea. I have never thought of that. Player.
@Ben-
One of the (few) benefits of having 99% of your code go towards a very targeted audience :)
Like I said before, our solution might not work for everyone (or even every case), but for our site (which is accessed by internal employees and employees of our clients) we have control over who accesses the site in general, client machine specifications (they can access the ftp site and the web server), and user information (we can track who did the download, and have their email address to send notifications to). If it helps save somebody else from re-inventing the wheel, so be it.
Thanks for all the feedback and tips.
When you do an onclick function, would that track it as a hit if I were to right click on the link? If so would that not provide you with an accurate result? Or is onClick only for left hand clicks?
@rich, I have used a similar method to yours when it comes to large files, although my client needed to be able to download multiple large files, so our system zips them up as it transfers them to the ftp. I find the ftp method rules, especially as it's really easy to disable an account after 24 hours.
@Simon,
Not sure if the right click will trigger a mouse event. I want to say no, but it will get trapped by onmousedown... but again, can't say without testing.
@Ben,
I just tested it and you were right, onClick doesn't fire for a right-hand click but onMouseUp does fire.
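Since onmouseup fires for any button, a handler that wants to tell the buttons apart has to inspect the event. Here is a minimal sketch, assuming the standard MouseEvent.button values (0 for the primary button, 1 for the middle button, 2 for the secondary button). Older versions of Internet Explorer reported different values, so treat this as an illustration and not as tested cross-browser code. The describeButton name is hypothetical.

```javascript
// Classify which mouse button produced an event, assuming the standard
// MouseEvent.button values: 0 = primary (usually left), 1 = middle,
// 2 = secondary (usually right). describeButton is a hypothetical helper.
function describeButton(mouseEvent) {
  switch (mouseEvent.button) {
    case 0:
      return "primary";
    case 1:
      return "middle";
    case 2:
      return "secondary";
    default:
      return "other";
  }
}
```

An onmouseup handler could call describeButton( event ) and then decide whether a secondary-button release (a likely "Save Target As") should count as a download.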
@Ben-
You've given some very good tips on how to track downloads and have given security some consideration, but I'd like to point out that because you're allowing the filename to be passed in the URL, someone could request any file in the webroot to be downloaded to them. This means that the proposed solution won't handle file security for groups or users, because anyone could conceivably get any file on the server (e.g., href="./proxy.cfm?file=#UrlEncodedFormat( "../../someone_else's_file.doc" )#").
My suggestion to your user would be to 1) depend on storing a list of files and permissions in a db table somewhere, and passing the file's ID in the URL instead of the file's name; and 2) use cfcontent to deliver the file since it can deliver assets from outside of the webroot. This way, the application logic can check the permissions in the database before allowing the download to occur.
@Tom,
You raise an excellent point. In fact, you could have passed "Application.cfc" in the URL and it would have downloaded the App file (assuming CFContent, not CFLocation). I agree; an ID is ALWAYS the way to go. IDs do not give away any information about the file behind the scenes and limit the file to one that corresponds to some sort of data cache.
If you don't have an ID, I would recommend Encrypting the file name at the very least.
Thanks for raising some excellent points.
@Tom,
With the ID method, would you be passing in the document ID and checking it against a permissions join table? My thought is that if you pass in an ID, couldn't someone just change the ID to get access to a different file?
@Simon-
You're right that just using an ID by itself wouldn't secure the file, but I was assuming a certain kind of application logic would be used to make sure that only authorized users could get at certain files. For instance, the permissions table could hold a lookup ensuring that certain files were only available to certain groups or certain users. If you modified the URL string in your browser to pass in the ID for a different file, the application wouldn't find you via the database lookup and wouldn't download the file to you. Let me know if that doesn't make sense.
@Tom,
That makes sense.
@anyone
If a user was to right click and choose save target as, would they be able to save the file? Or would this try and have them save a .cfm file? I have had instances where, when they have chosen to save target, it has prompted with the file name "a file name.cfm". Even though they could change it to the correct file extension and it would work, that isn't the way I would like it to work. Has anyone experienced that?
@Simon-
I've experienced the same problem, where the Save File dialog box presents you with a .cfm filename. The inclusion of "filename=..." in the cfheader tag, like Ben does, is supposed to specify the filename but doesn't always work. I find IE to be a particularly problematic browser for this.
I've seen one kludgy workaround for the issue: instead of sending URL parameters via a query string (e.g. "index.cfm?file=6"), send them as slash-separated path info with the desired filename right at the end (e.g. "index.cfm/file/6//filename/Business Plan.doc"). There are several code libraries out there which can help you parse the CGI.PATH_INFO variable. The server gets the variables, and the browser thinks it's downloading a file with the name of whatever comes after the last slash.
Whoops-- I shouldn't have included the double foreslash in my comment above. The proposed path info URL should be "index.cfm/file/6/filename/Business Plan.doc"
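To make the path-info idea concrete, here is a minimal sketch of the parsing step. It is written in JavaScript only to match the client-side code in the post; on the server the same steps would be applied to the CGI.PATH_INFO value in CFML. The parsePathInfo name is hypothetical, and the sketch assumes the path holds alternating name/value segments with no script name in front.

```javascript
// Parse slash-separated path info such as
// "/file/6/filename/Business Plan.doc" into name/value pairs.
// parsePathInfo is a hypothetical helper; it assumes alternating
// name/value segments and ignores empty segments (such as the
// accidental double slash).
function parsePathInfo(pathInfo) {
  var parts = pathInfo.split("/").filter(function (part) {
    return part !== "";
  });
  var params = {};

  // Walk the segments two at a time: a name followed by its value.
  for (var i = 0; i + 1 < parts.length; i += 2) {
    params[parts[i]] = decodeURIComponent(parts[i + 1]);
  }

  return params;
}

// Usage with the example path from the comment above.
var examplePathParams = parsePathInfo("/file/6/filename/Business Plan.doc");
```

The browser only looks at the text after the last slash when suggesting a file name, which is why the desired file name goes at the end.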
@Tom, I love the slash idea, that makes sense. I am going to be changing the site over to slash notation for better SEO so that would match the rest of the site as well.
SWEEET! thanks!
I would tread lightly around <cfcontent>. As Ben touches on, one has to wait around whilst CF transfers it (why CF does not pass this off to the file system, I have no idea).
That is SLOW, but it does not impact server stability.
What DOES impact server stability is that whilst the file is transferring, <cfcontent> holds a thread open. This is not a consideration for occasional small file transfers, but it's a significant one in busy environments or when large files are concerned. Unless one is careful, one can very quickly end up with <cfcontent> transfers holding open all available threads, and the CF server basically grinding to a halt.
Not so nice.
--
Adam
@Adam,
Thanks for driving that point home. I don't generally work with high traffic sites, so did not want to speak with authority... but that is exactly what I was thinking.
Adam is absolutely correct on the impact of using CFCONTENT for file downloads. Avoid it unless you absolutely have to have the security. If you can't avoid it, make sure you're using your own server/instance and that the simultaneous requests setting in the CFAdmin is set high: not 3 times the number of processors as generally advised, but 10 or more. Just yesterday I had a server lock up several times as I watched the thread count rise during a session where I suspect a group of students had been told to go and download some documents.
If you're on a shared server, then use of CFCONTENT for large downloads could be considered anti-social behaviour.
Ben, thanks a bunch for your tutorials. It's a huge help figuring out ColdFusion.
I'm using the second version of the above to change the name of existing files to something a little more web friendly. It works great on small files, but dies on large ones (~40MB). Any guesses?
@Jason,
Glad to help in some way. Streaming files can take up some resources, but from what I have seen, ColdFusion seems to handle it well. When you say "dies", how do you know? Does ColdFusion service restart?
@Ben
Doh! Turns out I had some links that were bad. 404s a plenty.
Stay kinky! ;-)
@Jason,
No worries man. I'll be keeping it kinky as long as I can :)
I am facing an issue with cflocation: if the file name is the same for 3-4 entries on the same page and the user tries to download, it displays the same file every time.
Could you please let me know the solution to this problem?
@Piyush,
In cases like that, you have to come up with a way to uniquely identify the file in the URL. Either you need to use some sort of database-generated ID, or perhaps you could use something like the hash() of the file path.
After all, the files *have* to be in different directories, if they have the same name, or they would *be* the same file. As such, you just need a way to integrate that difference into the URL.
Ben, you made a comment in the proxy.cfm that this logging could be to a database too. Could you give me an idea how to accomplish that?
Thanks
David