Redirecting Static Requests To Amazon S3 Using IIS Mod-Rewrite Or Apache Mod-Rewrite
In many of my blog posts, I have short, less-than-5-minute demo videos that I have recorded with Jing (a TechSmith product). These videos can range anywhere in size from 1 MB to 30 MB depending on the amount of movement recorded in the video (the more movement, the larger the file size). I've never liked having to stream these videos directly from my server - not only does it tie up one of the few parallel requests that a browser will make to the same domain, it also puts unnecessary load on the server itself. Since the URLs for these videos are embedded in database-driven content, I figured there wasn't much that I could do about it; but then, over the weekend, I realized that I could use Mod-Rewrite to forward these "static" requests on to an Amazon S3 server.
When I upload a Jing video to my server using Jing's embedded FTP button, the URL for the uploaded video gets hard-coded to the "http://www.bennadel.com" domain. As such, I figured the only way to stop streaming from bennadel.com would be to go through all the blog posts and actually change the stored content - a task I was feeling very apathetic about.
I realized, however, that I could achieve a 90% solution by trapping those incoming SWF file requests and returning an HTTP 302 redirect header pointing to Amazon S3. This way, the browser still has to make the initial request to my server; but, at least it is very quickly forwarded on to another domain that doesn't take up the limited number of parallel requests.
To do this, I added a simple redirect rule to my server's mod-rewrite file. I use Apache mod_rewrite locally and IIS Mod-Rewrite in production:
# If the user is requesting a JING video, forward them to
# S3 - no need to tie up the bandwidth on the local server.
RewriteRule (resources/jing/[\d_-]+\.swf)$ http://blog.bennadel.com.s3.amazonaws.com/$1 [R,L]
This simply catches requests for SWF files located in the Jing folder and redirects them to the same file hosted on Amazon S3. In this rewrite rule, the [R] stands for "Redirect" and the [L] stands for "Last." When you are performing URL rewrites, the rewritten URL continues to fall through to subsequent rules unless the [L] flag is used.
Amazon S3 is a pretty awesome service. Content delivery networks, in general, are a very powerful tool in the journey towards performant web sites. I am only just beginning to think in terms of the benefits of distributed delivery systems. If anyone has any tips or tricks, I'd love to hear them!
Want to use code from this post? Check out the license.
Reader Comments
I'd also be interested in tips and tricks . . . I'm looking into using S3 to handle some stuff for our site.
@Lola,
I think one of the things I need to do is start building the concept of different "roots" into my site. So, for example, if I had something like "imageRoot":
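Something along these lines - just a rough sketch with made-up names, not actual code from my site:

<!--- Define a configurable root for image URLs (e.g., in onApplicationStart()). --->
<cfset application.imageRoot = "http://blog.bennadel.com.s3.amazonaws.com" />

<!--- Content then references assets relative to that root. --->
<cfoutput>
	<img src="#application.imageRoot#/resources/some-image.png" />
</cfoutput>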
... then I could programmatically swap the value of the image storage. This would be especially nice when I have to switch between HTTP and HTTPS and perhaps need to start making local requests rather than distributed requests (I'm not 100% clear on the rules surrounding secure content requests).
Do you see any performance issues doing this?
@John,
The only thing I've looked at, numerically, is the Network activity in Firebug on page requests that include SWF files. The browser still has to make a request to the server for the local SWF; this request typically takes ~100-150ms to receive the HTTP redirect header. So, that is super fast compared to the seconds/minutes to download the actual SWF.
That said, I am sure there is a tiny, tiny overhead to running the regex on every request. But, honestly, I think this is so blazing fast that the overhead would not be noticeable. And, I could probably optimize the regex that I am using by adding a start-of-string boundary (^) to make non-matching URLs fail faster.
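Something like this, for example (untested; whether you need the optional leading slash depends on whether the rule lives in httpd.conf or in a .htaccess file):

# Anchored version - URLs that don't start with the Jing path fail immediately,
# instead of being scanned for "resources/jing" anywhere in the string.
RewriteRule ^/?(resources/jing/[\d_-]+\.swf)$ http://blog.bennadel.com.s3.amazonaws.com/$1 [R,L]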
@Ben,
Good stuff.
This is great stuff!
I'd love to see more articles like this regarding AWS. I've been toying with moving many of our production machines to EC2 and would love to use S3 as an on-the-fly image storage drive for CFChart, CFImage, etc... but I do worry about performance issues.
I recently set up a Linux/Apache/MySQL/CF9 server on EC2 and it's working beautifully. It will be a great playground to test out the capabilities of EC2 and how CF can take advantage.
@PMascari,
How did you accomplish this setup?
@PMascari,
The good news is that if you are using CF9, I believe you can use S3 as if it were a local file system: "s3://" for any of the tags / functions that use file paths.
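I haven't tried it myself yet, but from what I've read it's something along these lines (the bucket name and credentials are placeholders):

<!--- In Application.cfc, tell ColdFusion which AWS credentials to use for s3:// paths. --->
<cfset this.s3.accessKeyId = "YOUR_ACCESS_KEY" />
<cfset this.s3.awsSecretKey = "YOUR_SECRET_KEY" />

<!--- Then the bucket can be treated like any other file path. --->
<cfset fileWrite( "s3://your-bucket/hello.txt", "Hello from S3!" ) />
<cfoutput>#fileRead( "s3://your-bucket/hello.txt" )#</cfoutput>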
First, I must admit, it took longer than it should have because I'm pretty much a Linux newbie. Playing around with Ubuntu Desktop was the extent of my Linux experience. You can do Windows on EC2, but I went with Linux with the aim of using the AWS Free Tier.
Having accomplished that, I've racked up all of 1 cent in costs (data transfer) so far!
First, I found an Ubuntu AMI to start with, which is a base install of Linux server. I found a lot of online articles on how to go about setting up and configuring Apache and MySQL on Linux and using an AWS EBS volume as the data store. The CF install was fairly painless. I found a way to mount the EC2 instance as a drive on my system for updates and it's running great.
All the individual steps I took may be a bit much for this venue but I'd be happy to send you a more detailed listing a bit later...
@Ben,
Really?!?! Using AWS (or S3) is native to CF9?
@PMascari . . .
All the individual steps I took may be a bit much for this venue but I'd be happy to send you a more detailed listing a bit later...
please do!
@PMascari,
Yep. Same works easily in Railo.
Be warned: even though you can use it for cffile, etc., do not assume it is a local file system. Things still must travel across the wire, and S3 can be wonky with managing permissions. Other than that, it is a great feature!
@PMascari, @John,
Yes, good point - definitely not "local" in terms of speed :D I haven't had a chance to play with this feature myself - I've only read about it. But, it seems pretty awesome!
@Ben, since it appears that you're always redirecting, why not use 301 Moved Permanently ("[R=301,L]")? Then if the user hits Refresh/Reload, the browser won't even bother to hit your server again for the redirected URL. Or shouldn't, according to the 301 spec.
302 is common as all heck. Even cflocation uses it. But I don't know anyone who uses 301. I wonder if 301 has some nuisance behavior I don't know about. Of course, you don't want to do a 301 for a dynamic resource, such as a CFM, because you'd never get an opportunity to do anything different after the first redirect. But for a SWF or JPEG you want the user to get from S3, I can't imagine what the nuisance behavior would be.
Again, I'm not saying it's a good idea to use 301. But since the stated goal was to lighten the load on your server, it would seem to be something worth looking into.
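If you did want to try it, I'd guess the rule would just become something like this (untested):

# Same rule, but telling the browser the move is permanent so it can cache the redirect.
RewriteRule (resources/jing/[\d_-]+\.swf)$ http://blog.bennadel.com.s3.amazonaws.com/$1 [R=301,L]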
@Lola,
OK, here are the steps I took to get CF running on AWS.
Create your AWS account, and set up a basic Linux instance. Make it a Micro instance to be on the AWS Free Tier. Here's a good walkthrough for Ubuntu:
http://goo.gl/3pvI2
Once you've got your SSH connection set up and can connect to your instance, install Apache:
http://goo.gl/J9VK6
Install MySQL. It's a good idea to run this on an AWS EBS volume, so create one of, say, 10GB. Then follow this walk-through:
http://goo.gl/6X5tE
Now install CF9. Here's where my being a Linux newbie really shows. I wasn't sure how to download CF9 through the command line, so I installed Ubuntu Desktop in order to get to Firefox and download CF.
http://goo.gl/3Jnn5
Once done, you'll want to kill the VNC server under normal operating conditions to save RAM. You only get about 600MB in a Micro instance.
The CF install asks you where your Apache files are, which took me a while to figure out, being a Linux newbie, but is otherwise painless.
Once all is done, you need to configure your AWS security group to allow port 80, and you should have access to the CF Administrator at your instance's public DNS address.
Hope this helps.
@pmascari,
If I may, please do not put MySQL on your EC2 instances. Use Amazon RDS. It allows for scaling and you're ready to make your EC2 instances autoscale [using Amazon Auto-Scale] since they only maintain site code vs site+data. RDS also has a Multi-AZ feature allowing a master/slave setup [basically, redundancy].
Everything else is just fine though. I just wanted to point out RDS.
Ray did a good job of putting together a "setup guide":
http://www.coldfusionjedi.com/index.cfm/2010/7/15/CF901-Guide-to-Amazon-S3-support-in-ColdFusion-901
I'm also looking at using Amazon S3 for our next project.
@Ben - some thoughts on site "roots"; here's something I pulled from my system:
I have two buckets on S3 - one with gzipped assets and one without and I serve up a separate bucket name depending on the capabilities of the requesting browser. I also switch HTTP/HTTPS as needed.
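In rough strokes, the selection logic looks something like this (a simplified sketch - the bucket names and the detection details are placeholders, not my production code):

<!--- Pick the bucket based on whether the browser advertised gzip support. --->
<cfif cgi.http_accept_encoding CONTAINS "gzip">
	<cfset bucketHost = "assets-gz.example-bucket.s3.amazonaws.com" />
<cfelse>
	<cfset bucketHost = "assets.example-bucket.s3.amazonaws.com" />
</cfif>

<!--- Match the protocol of the current request so secure pages stay secure. --->
<cfif cgi.server_port_secure>
	<cfset CDNURL = "https://" & bucketHost />
<cfelse>
	<cfset CDNURL = "http://" & bucketHost />
</cfif>

<!--- Every asset reference is then prefixed with the computed root. --->
<cfoutput><img src="#CDNURL#/images/logo.png" /></cfoutput>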
Hope that gives you some ideas... every image in the system is prefixed with #CDNURL#. The beauty of this, though, is that locally you can replace it with an empty string and use hard "/" references (or some other prefix plus Apache aliases) to get things pointing to the right folder.
This is all leveraging my post here: http://www.ghidinelli.com/2009/09/02/deploying-assets-amazon-s3-ant
@John,
I am by no means an AWS expert, so thanks for your comments. I'm just now learning the tools...
I'm aware of RDS and plan on using it in the future. However, keep in mind my intentions thus far are to keep a small "playground" CF setup for myself on the free tier on EC2. I do not see a free tier of RDS. So, for now, keeping MySQL and my CF files on an attached EBS volume will do.
@WebManWalking,
Excellent point. It's never going to change - no need to mess around with "temporary" redirects.
@Brian,
That's a pretty good setup. I hadn't even thought about gzipped assets - to be honest, I don't know that much about compression at that level. But I like where you're going with that. Seems to be exactly on the right track.
@pmascari,
No worries. I'm learning on the go myself. :)
Ahh...yeah, keeping it free means no RDS. :)
@Ben - it's a little better than the right track, it's production code for 2 years. ;)
The trick to S3, if you don't step up to CloudFront, is that it's just a dumb content store. It won't do any of the things that we take for granted with Apache, so you have to do it before you upload the files. Expires headers, compression, etc., all have to be managed by you as part of the upload process to get real benefit from it.
@Brian,
Ha ha, I didn't mean to imply that it was anything less than great - my hesitation was only a product of my own inexperience :)
S3 is really awesome. The thing that took me a while to really wrap my head around was the idea that there really are no "folders." We can use "keys" that have path-delimiters in them; but, these are merely a convention. We just have keys that *look* like they make up a directory structure.
That was the biggest leap for me. But once I understood that, it helped me grapple with the difference between buckets and folders and the various limitations. The one thing that I think this makes quite nice is that it removes the file-count limitation that Windows file systems have. On a Windows machine, I am told that things simply break if you have a directory containing more than 32,000 files. On S3, however, since directories are merely conventions, I have to **assume** that there are no equivalent limitations.
NOTE: I believe there are limits to the number of files that can be located at the root of a bucket.
Really cool - I'm gonna try this on an Amazon install.