Experimenting With The Amazon Simple Storage Service (S3) API Using ColdFusion
Before I say anything, I should probably mention that since ColdFusion 9.0.1, ColdFusion has had native file support for Amazon S3 using the "s3://" protocol. That said, I wanted to experiment with the Amazon S3 REST API using ColdFusion's CFHttp functionality. I know that I'm like 5 years (at least) behind everyone else on this topic; so, this blog post won't add much to the conversation. Really, this is just here for my own reference.
Amazon Simple Storage Service (S3) is a hugely scalable data storage system. But, it is not a file system; it is a key-value store. You can have it mimic a file system by using storage keys that look like file paths; many applications, including ColdFusion's native S3 integration, present S3 as a file hierarchy. But at the end of the day, that's just a user-friendly abstraction built on top of the resource key that identifies a stored object.
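To make the key/value point concrete: in the path-style resource used throughout this post, the first segment is the bucket name and everything after it is one object key. Here is a minimal CFScript sketch that splits the two apart. parseS3Resource() is a hypothetical helper, not part of any ColdFusion or AWS API.

```cfml
<cfscript>
	// Hypothetical helper: split a path-style resource ("/bucket/key")
	// into the bucket name and the single object key.
	struct function parseS3Resource( required string resource ) {
		// Drop the leading slash of the path-style resource.
		var path = reReplace( arguments.resource, "^/", "" );

		// The bucket name ends at the first remaining slash.
		var slashIndex = find( "/", path );

		if ( slashIndex <= 1 ) {
			throw( type = "S3.InvalidResource", message = "Expected /bucket/key, got: #arguments.resource#" );
		}

		// Everything after that slash, including any further slashes, is
		// ONE key. "signed-urls/" is part of the key, not a directory.
		return {
			bucket = left( path, slashIndex - 1 ),
			key = mid( path, slashIndex + 1, len( path ) )
		};
	}

	parts = parseS3Resource( "/testing.bennadel.com/signed-urls/helena.jpg" );

	// Outputs: testing.bennadel.com | signed-urls/helena.jpg
	writeOutput( parts.bucket & " | " & parts.key );
</cfscript>
```

Unless something explicitly created it, there is no object stored at "signed-urls/" itself; that prefix only exists as part of the key string.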
The "not a file system" nature of Amazon S3 has other implications, as well, such as consistency. In some regions (but not all), S3 provides "eventual consistency." I don't have a full grasp of how "eventual" eventual consistency is; but in the US Standard region, due to the cross-country latency, Amazon does not guarantee read-after-write consistency.
Right now, I don't know whether this eventual consistency applies to every client of an application (including the one that did the writing), or only across different clients. Meaning, if I PUT an object into S3, can I (as the PUT executor) read that object from S3 immediately? I'll have to do some more reading on this.
OK, enough with the background; let's do some experimenting. For this post, all I want to do is upload an object to Amazon Simple Storage Service (S3), read it back out as a binary, and provide a pre-signed (query-string authenticated) URL that grants temporary public access to the object.
Uploading Objects To Amazon Simple Storage Service (S3)
Amazon S3 can store just about anything, with only the loosest of size constraints (a single PUT can upload up to 5 GB; multipart uploads allow objects of up to 5 TB). It simply stores bytes. Those bytes can represent text files; those bytes can also represent images. We're going to try uploading an image of the beautiful and talented Helena Bonham Carter.
All authenticated requests to the S3 REST API must include a signature: a Base64-encoded, hash-based message authentication code (HMAC-SHA1) computed over a newline-delimited "string to sign" using your secret key. As of ColdFusion 10, generating Hmac values is wicked easy and can be done with the native hmac() function; but, since I am on ColdFusion 9, I'll use my Crypto.cfc Hmac component.
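On ColdFusion 10 or later, a minimal sketch of the same signing step with the native hmac() function might look like this. signS3StringToSign() is a hypothetical name. The one catch is that hmac() returns a hex string, while S3 expects the Base64 encoding of the raw digest bytes, so the hex is decoded back to binary first. This is the legacy Signature Version 2 scheme used throughout this post; newer AWS regions accept only Signature Version 4.

```cfml
<cfscript>
	// Sketch for ColdFusion 10+ (hmac() is not available in CF9).
	string function signS3StringToSign(
		required string secretKey,
		required string stringToSign
		) {
		// hmac() returns the HMAC-SHA1 digest as a HEX string...
		var hexDigest = hmac( arguments.stringToSign, arguments.secretKey, "HmacSHA1", "utf-8" );

		// ...but S3 wants Base64 of the raw digest bytes, so decode the
		// hex back into binary before Base64-encoding it.
		return toBase64( binaryDecode( hexDigest, "hex" ) );
	}
</cfscript>
```

With that in place, the Crypto.cfc call in each template below could be replaced with `signature = signS3StringToSign( aws.secretKey, stringToSign )`.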
When uploading the file to S3, we'll send its binary value as the body of a PUT request.
<!---
Creates a structure with the secretKey and accessID so that I
don't have to have them in the blog post.
--->
<cfinclude template="credentials.cfm" />
<!---
This is the file we are going to upload. We need to read in the
binary file since we aren't posting it like a form field - we're
posting it as the BODY of the PUT request.
--->
<cfset content = fileReadBinary( expandPath( "./helena.jpg" ) ) />
<!---
When uploading the file, we are going to save it at the
following "Key". NOTE: S3 is NOT A FILE SYSTEM. It's a key/value
store. While this resource address looks like a file path, it is
a single key.
--->
<cfset resource = "/testing.bennadel.com/signed-urls/helena.jpg" />
<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->
<!---
All requests to the S3 API have to be authenticated. Here, we are
going to create the "signature" to be used in the Authorization
header of the PUT request.
--->
<!---
A timestamp is required for all authenticated requests (NOTE: This
does not apply to query-string-authentication based requests).
--->
<cfset currentTime = getHttpTimeString( now() ) />
<!---
The content type is not required; but it will be stored as meta-
data with the object if supplied.
--->
<cfset contentType = "image/jpeg" />
<!---
Set up the part of the string to sign - we are not including any
X-AMZ headers in this.
--->
<cfset stringToSignParts = [
"PUT",
"",
contentType,
currentTime,
resource
] />
<!--- Collapse the parts into a newline-delimited list. --->
<cfset stringToSign = arrayToList( stringToSignParts, chr( 10 ) ) />
<!---
The target string is then signed using Hmac-Sha1, and the result
must be encoded as Base64. For this, I am using my Crypto.cfc
component.
NOTE: If you have ColdFusion 10, the hmac() function will now
do this with a single function call.
--->
<cfset signature = new Crypto().hmacSha1(
aws.secretKey,
stringToSign,
"base64"
) />
<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->
<!---
PUT the actual binary to the S3 bucket at the given resource.
NOTE: Since we have not provided any ACL (Access Control List)
permissions, the resource will be stored as *private* by default.
--->
<cfhttp
result="put"
method="put"
url="https://s3.amazonaws.com#resource#">
<cfhttpparam
type="header"
name="Authorization"
value="AWS #aws.accessID#:#signature#"
/>
<cfhttpparam
type="header"
name="Content-Length"
value="#arrayLen( content )#"
/>
<cfhttpparam
type="header"
name="Content-Type"
value="#contentType#"
/>
<cfhttpparam
type="header"
name="Date"
value="#currentTime#"
/>
<cfhttpparam
type="body"
value="#content#"
/>
</cfhttp>
<!--- Dump out the Amazon S3 response. --->
<cfdump
var="#put#"
label="S3 Response"
/>
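The PUT above leaves the Content-MD5 line of the string to sign blank. If you want S3 to verify the upload's integrity, the Content-MD5 header carries the Base64 encoding of the raw 16-byte MD5 digest of the body (not the hex string). A minimal sketch, using java.security.MessageDigest directly so it does not depend on how hash() treats binary input; getContentMD5() is a hypothetical helper name:

```cfml
<cfscript>
	// Hypothetical helper: Base64 of the raw MD5 digest of a binary body.
	string function getContentMD5( required binary content ) {
		var digester = createObject( "java", "java.security.MessageDigest" ).getInstance( "MD5" );

		// digest() returns a byte[], which toBase64() accepts directly.
		return toBase64( digester.digest( arguments.content ) );
	}

	body = charsetDecode( "hello", "utf-8" );

	// Outputs: XUFAKrxLKna5cZ2REBfFkg==
	writeOutput( getContentMD5( body ) );
</cfscript>
```

When the header is sent, the same value has to replace the empty second element of stringToSignParts; otherwise the signatures will not match.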
By default, the object is stored with private access settings. This means that only authenticated users can view the object using the resource URL. You can pass a lot of additional settings with the PUT command, including access control permissions; but, for this blog post, I'll keep it as simple as possible.
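One example of those additional settings is a canned ACL passed via the x-amz-acl request header (for instance, public-read). Under Signature Version 2, any x-amz-* header you send must also be folded into the string to sign, as a lower-case "name:value" line placed after the Date line and before the resource. A minimal sketch; buildAclStringToSign() is a hypothetical helper name:

```cfml
<cfscript>
	// Hypothetical helper (Signature Version 2 rules): a request that
	// sends x-amz-acl must also sign it, as "x-amz-acl:value", on its own
	// line between the Date line and the resource.
	string function buildAclStringToSign(
		required string verb,
		required string contentType,
		required string httpDate,
		required string cannedAcl,
		required string resource
		) {
		// The second element (Content-MD5) is left blank in this sketch.
		var parts = [
			arguments.verb,
			"",
			arguments.contentType,
			arguments.httpDate,
			"x-amz-acl:" & arguments.cannedAcl,
			arguments.resource
		];

		return arrayToList( parts, chr( 10 ) );
	}

	stringToSign = buildAclStringToSign(
		"PUT",
		"image/jpeg",
		getHttpTimeString( now() ),
		"public-read",
		"/testing.bennadel.com/signed-urls/helena.jpg"
	);
</cfscript>
```

The PUT itself would then also need `<cfhttpparam type="header" name="x-amz-acl" value="public-read" />`; if the header and the signed line disagree, S3 should reject the request with a signature mismatch.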
Reading Objects From Amazon Simple Storage Service (S3)
Now that we've uploaded our image, let's read it back out. Like the PUT command, the GET command also has to be authenticated with the Hmac signature.
<!---
Creates a structure with the secretKey and accessID so that I
don't have to have them in the blog post.
--->
<cfinclude template="credentials.cfm" />
<!--- This is the resource that we want to read as a binary. --->
<cfset resource = "/testing.bennadel.com/signed-urls/helena.jpg" />
<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->
<!---
All requests to the S3 API have to be authenticated. Here, we are
going to create the "signature" to be used in the Authorization
header of the GET request.
--->
<!---
A timestamp is required for all authenticated requests (NOTE: This
does not apply to query-string-authentication based requests).
--->
<cfset currentTime = getHttpTimeString( now() ) />
<!--- Set up the part of the string to sign. --->
<cfset stringToSignParts = [
"GET",
"",
"",
currentTime,
resource
] />
<!--- Collapse the parts into a newline-delimited list. --->
<cfset stringToSign = arrayToList( stringToSignParts, chr( 10 ) ) />
<!---
The target string is then signed using Hmac-Sha1, and the result
must be encoded as Base64. For this, I am using my Crypto.cfc
component.
NOTE: If you have ColdFusion 10, the hmac() function will now
do this with a single function call.
--->
<cfset signature = new Crypto().hmacSha1(
aws.secretKey,
stringToSign,
"base64"
) />
<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->
<!--- Read the S3 resource AS A BINARY object. --->
<cfhttp
result="get"
method="get"
url="https://s3.amazonaws.com#resource#"
getasbinary="yes">
<cfhttpparam
type="header"
name="Authorization"
value="AWS #aws.accessID#:#signature#"
/>
<cfhttpparam
type="header"
name="Date"
value="#currentTime#"
/>
</cfhttp>
<!---
Reset the output buffer and then stream the content to the
screen as an image.
--->
<cfcontent
type="image/jpeg"
variable="#get.fileContent#"
/>
Notice that both the PUT and the GET actions required the current date to be set as part of the request headers. This date/time value needs to be within 15 minutes of Amazon S3 system time, or the request will be rejected. In addition to being current, the date/time value also has to be posted in a specific format. Luckily, ColdFusion's native getHttpTimeString() function makes this super easy as well.
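Since the PUT, the GET, and (further down) the pre-signed URL all build the same five-part, newline-delimited string, the duplication could be folded into one function. A minimal sketch; buildS3StringToSign() and its argument names are hypothetical:

```cfml
<cfscript>
	// Hypothetical helper: the Signature Version 2 string to sign. Only
	// the verb, the optional Content-MD5 / Content-Type values, and the
	// Date (or, for pre-signed URLs, the Expires epoch seconds) vary.
	string function buildS3StringToSign(
		required string verb,
		required string dateOrExpires,
		required string resource,
		string contentMD5 = "",
		string contentType = ""
		) {
		var parts = [
			arguments.verb,
			arguments.contentMD5,
			arguments.contentType,
			arguments.dateOrExpires,
			arguments.resource
		];

		// arrayToList() keeps empty elements, so omitted headers still
		// produce their (blank) lines.
		return arrayToList( parts, chr( 10 ) );
	}

	resource = "/testing.bennadel.com/signed-urls/helena.jpg";

	// getHttpTimeString() returns the RFC 1123 GMT format S3 expects in
	// the Date header, e.g. "Tue, 27 Mar 2007 19:36:42 GMT".
	currentTime = getHttpTimeString( now() );

	putStringToSign = buildS3StringToSign( "PUT", currentTime, resource, "", "image/jpeg" );
	getStringToSign = buildS3StringToSign( "GET", currentTime, resource );
</cfscript>
```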
Generating Pre-Signed Urls For Amazon Simple Storage Service (S3) Objects
Now that we've seen that we, as authenticated S3 users, can write to and read from the REST API, let's look at how to provide public URLs to our uploaded objects. Using "Query String Request Authentication," we can put our authentication signature directly into the request URL, removing the need for our end users to provide the Authorization request header.
These generated URLs are time-sensitive. That is, we define an expiration date as part of the URL definition, and that Expires value takes the place of the Date line in the string to sign. Once the URL has expired, Amazon S3 will start returning "Access Denied" responses. The expiration is defined as the number of seconds since the Unix epoch (January 1, 1970, 00:00:00 UTC). In our demo, we'll provide a URL that is valid for only 10 seconds.
<!---
Creates a structure with the secretKey and accessID so that I
don't have to have them in the blog post.
--->
<cfinclude template="credentials.cfm" />
<!---
This is the base resource that we want to provide a URL to. Since
the resource was stored with Private permissions, we'll need to
create a query-string-authentication URL that will grant people
access to the resource (for a limited amount of time).
--->
<cfset resource = "/testing.bennadel.com/signed-urls/helena.jpg" />
<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->
<!---
Using Query-String-Authentication is no different than any other
authenticated request in that an authentication signature still
needs to be provided. In this case, it will be part of the URL
that we are generating.
--->
<!---
The URL will only be valid for a certain amount of time. This
time will be determined by the number of SECONDS since Epoch.
For this demo, we'll make the URL available for 10 seconds.
--->
<cfset nowInSeconds = fix( now().getTime() / 1000 ) />
<!--- Add 10 seconds. --->
<cfset expirationInSeconds = ( nowInSeconds + 10 ) />
<!---
Prepare the parts of the signature - we are going to leave the
MD5 hash and the content type blank since the GET request won't
send those.
--->
<cfset stringToSignParts = [
"GET",
"",
"",
expirationInSeconds,
resource
] />
<!--- Collapse the parts into a newline-delimited list. --->
<cfset stringToSign = arrayToList( stringToSignParts, chr( 10 ) ) />
<!---
The target string is then signed using Hmac-Sha1, and the result
must be encoded as Base64. For this, I am using my Crypto.cfc
component.
NOTE: If you have ColdFusion 10, the hmac() function will now
do this with a single function call.
--->
<cfset signature = new Crypto().hmacSha1(
aws.secretKey,
stringToSign,
"base64"
) />
<!---
Make sure the signature is properly encoded for use in the query
string of a GET request.
--->
<cfset urlEncodedSignature = urlEncodedFormat( signature ) />
<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->
<cfoutput>
<img src="https://s3.amazonaws.com#resource#?AWSAccessKeyId=#aws.accessID#&Expires=#expirationInSeconds#&Signature=#urlEncodedSignature#" />
</cfoutput>
NOTE: I am using the .getTime() method of the underlying Java Date object, which ColdFusion does not document. You could be a bit more "proper" and use the dateDiff() function.
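A minimal sketch of the dateDiff() route, using the common idiom of converting the UTC epoch into server-local time before diffing against now(). It assumes dateConvert() applies the server's time zone rules for the date it is given; since an hour of drift would break a 10-second URL, it is worth comparing the result against getTime() on your own server. getEpochSeconds() is a hypothetical name.

```cfml
<cfscript>
	// Hypothetical helper: whole seconds since the Unix epoch.
	numeric function getEpochSeconds() {
		// Midnight, January 1, 1970 (UTC), expressed in server-local time
		// so it can be diffed against the server-local now().
		var epoch = dateConvert( "utc2Local", createDateTime( 1970, 1, 1, 0, 0, 0 ) );

		return dateDiff( "s", epoch, now() );
	}

	// Same 10-second window as the demo above.
	expirationInSeconds = getEpochSeconds() + 10;
</cfscript>
```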
After we generate this URL, we can then use it to populate an IMG "src" attribute presented to our users. In this way, we can provide "secure" content to our users without making our S3 objects public.
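Keys containing spaces or other reserved characters need extra care: the resource has to be percent-encoded identically in the signed string and in the URL the browser requests. Here is a minimal sketch for ColdFusion 10+ that wraps the whole flow. rfc3986Encode() and getPresignedUrl() are hypothetical helpers; the encoding patches java.net.URLEncoder (which does form encoding) into RFC 3986 style and then restores the "/" separators in the key.

```cfml
<cfscript>
	// Hypothetical helper: RFC 3986 percent-encoding. URLEncoder turns
	// spaces into "+", leaves "*" alone, and encodes "~", so patch those.
	string function rfc3986Encode( required string value ) {
		var encoded = createObject( "java", "java.net.URLEncoder" ).encode( arguments.value, "utf-8" );
		encoded = replace( encoded, "+", "%20", "all" );
		encoded = replace( encoded, "*", "%2A", "all" );
		return replace( encoded, "%7E", "~", "all" );
	}

	// Hypothetical helper: a query-string-authenticated GET URL. The SAME
	// encoded resource is used in the string to sign and in the URL.
	string function getPresignedUrl(
		required string accessID,
		required string secretKey,
		required string bucket,
		required string key,
		required numeric expiresInSeconds
		) {
		// Encode the key, then put its "/" separators back.
		var encodedKey = replace( rfc3986Encode( arguments.key ), "%2F", "/", "all" );
		var resource = "/" & arguments.bucket & "/" & encodedKey;

		// Seconds since epoch, computed the same way as the demo above.
		var expires = fix( now().getTime() / 1000 ) + arguments.expiresInSeconds;

		// GET, blank Content-MD5, blank Content-Type, Expires, resource.
		var parts = [ "GET", "", "", expires, resource ];
		var stringToSign = arrayToList( parts, chr( 10 ) );

		// hmac() returns hex; S3 wants Base64 of the raw digest bytes.
		var signature = toBase64( binaryDecode( hmac( stringToSign, arguments.secretKey, "HmacSHA1", "utf-8" ), "hex" ) );

		return "https://s3.amazonaws.com" & resource
			& "?AWSAccessKeyId=" & rfc3986Encode( arguments.accessID )
			& "&Expires=" & expires
			& "&Signature=" & rfc3986Encode( signature );
	}
</cfscript>
```

For example, `getPresignedUrl( aws.accessID, aws.secretKey, "testing.bennadel.com", "signed-urls/my photo.jpg", 10 )` should produce a path of "/testing.bennadel.com/signed-urls/my%20photo.jpg" in both the signature and the URL.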
This is only a taste of what the Amazon Simple Storage Service (S3) can do. There's a ton of stuff left to explore.
Reader Comments
Very cool Ben. Thanks for posting.
Thanks Ben. Would love to see more CF/S3 examples!
@Josh,
My pleasure. It was fun to learn more about this stuff.
@Chebby,
Will do - we're gonna be moving some stuff over to S3, so I am sure I'll be learning all sorts of interesting things / use cases along the way!
Hi Ben, THANKS! While not bleeding edge, it is new to me & I like learning new things every day!
I have coincidentally been beating my head against the S3 API for the last week or so. One big "gotcha" I had to work around was file names and paths containing spaces. Remember to URL Encode your request!
If you don't, the signature will be for the non-encoded value while the browser will auto-URL encode the returned presigned URL. This will result in a signature mismatch error being returned by S3.
@Richard,
Glad you like! Hopefully I'll have some more interesting stuff coming. This morning, I blogged a bit more about generating the pre-signed, query string authenticated URLs; but, then deemed that my exploration probably was not very fruitful (other than an increased understanding of the technology).
@Joe,
Oh, super interesting! I had only thought to url-encode the signature; but I think that's because the S3 docs actually have a special NOTE telling you to do so. It would have never occurred to me that url-encoding would be necessary for the file names when generating the signature. Dang! I don't have any idea how I would have even debugged that.
In the past, I know that debugging Hmac values is a wicked super pain. I remember when I was dealing with the Twilio API (I think), I wasn't converting to Hex properly and the leading "0" would always be stripped off... so it failed like 20% of the time :D Talk about frustrating! Took me like a week going back and forth with their support before I figured out what the problem was.
Thanks for the tip!
" Meaning, if I PUT an object into S3, can I (as the PUT executer) read that object from S3 immediately? "
"It depends" : http://aws.amazon.com/s3/faqs/#What_data_consistency_model_does_Amazon_S3_employ "S3 buckets in the US West (Oregon), US West (Northern California), EU (Ireland), Asia Pacific (Singapore), Asia Pacific (Tokyo), Asia Pacific (Sydney) and South America (Sao Paulo) Regions provide read-after-write consistency for PUTS of new objects and eventual consistency for overwrite PUTS and DELETES. Amazon S3 buckets in the US Standard Region provide eventual consistency."
@Tom,
Right, but I'm still not 100% sure I understand the implications of that. Meaning, let's say that I am on the East coast of the US and I PUT an object into S3. When I think about "eventual consistency," it makes me think that if someone on the West coast of the US then immediately tried to request it, it *may* not be available yet, due to the latency in distribution across data centers. But, does that also apply to me? If I (on the East coast) make a request for the object I just uploaded, will it be available immediately?
Or maybe I'm just not understanding how the data is distributed across areas.
@Ben,
read-after-write consistency means immediate visibility of new data to all clients.
Amazon is less than perfectly transparent about some things. I dunno if it's just a docs issue, or that they keep updating the service and not the docs structure or what...
@Ben & Tom
In my cursory testing uploading to US Standard, I've been able to access the files immediately after upload. My uploader performs processing on the file on upload success. So it appears that the uploader can hit the file, but perhaps not clients hitting nodes in other regions.
It remains to be seen if that process survives QA. If we don't get consistency, I may have to roll my processing over to a polling type system that keeps an eye on my temp S3 storage location.
I assume the data is distributed as with any other kind of CDN. It gets placed onto a single node immediately and then propagates through the rest of the network.
@Joe,
Yeah, exactly - you can test how you like, but maybe you just keep getting lucky :-)
Tom
@Joe, @Tom,
I was at a conference last week talking to John Mancuso, who is a "Solutions Architect at Amazon Web Services". He had mentioned the eventual consistency to me at the time. BUT, he said that the latency was only on the order of 1 second. So, if it's eventually consistent, at least "eventual" is super fast.
In my particular scenario, that could be OK, because we don't really need to read directly after write. What we do need to do is:
* Upload.
* Create a pre-signed URL.
* Send that URL to the browser.
* Have the client use an IMG tag with that URL.
So, hopefully the 1s delay (if it happens at all) will be offset by the workflow, the server-client communication, and the HTML rendering overhead.
Way out of date, but those on older versions of CF may find Barney Boisvert's Amazon S3 CFC useful:
http://www.barneyb.com/barneyblog/projects/amazon-s3-cfc/
or Joe Danziger's S3 REST CFC:
http://amazons3.riaforge.org/
@Ed,
Thanks for the links; I poked around in Barney's S3 CFC, and I've actually used Joe's in a previous project. But, I'd not really gotten my hands dirty with the nitty-gritty of how everything was put together.
Kudos Ben, I read this at just the right time. I have recently migrated two client CF sites to AWS (one Windows, one Ubuntu) using the CF 10 AMI that came out a few months ago. I need to convert the static assets to S3 next. Thanks for the code and the hmac() function. I'll take a look at your Crypto for my CF9 clients.
You are a great asset to the ColdFusion community. I've been coding out here in Seattle for years mostly under the radar admiring your blog from afar.
BTW I've been happy with CF on AWS so far and the pricing beats traditional hosting.
@Noah,
Excellent timing! And, funny you mention the hmac() stuff. I actually, just yesterday, posted a bit more about generating the signatures and the Content-MD5 hash in both ColdFusion 9 and ColdFusion 10:
www.bennadel.com/blog/2499-Generating-The-Content-MD5-Checksum-For-The-Amazon-S3-REST-API-Using-ColdFusion.htm
Small world :)
Thank you for the kind words! I'm really glad my blog has been providing value. Hopefully many more years to come!
@All,
It looks like ColdFusion can be a bit too aggressive with how it encodes URL values when using urlEncodedFormat():
www.bennadel.com/blog/2656-url-encoding-amazon-s3-resource-keys-for-pre-signed-urls-in-coldfusion.htm
Great work here! From what you know, will this code still work after AWS discontinues support of SSLv3 for securing connections to S3 buckets and only supports TLS?
@Jwarzi,
According to this post, we're in good shape. It's more a question of cfhttp than anything else. CF11 might need a small tweak in JVM settings; CF10 and older are OK.
http://www.trunkful.com/index.cfm/2014/12/8/Preventing-SSLv3-Fallback-in-ColdFusion
Thanks Ben. I used this to help me write a function that lists objects within a bucket. The native implementation in CF11 Update 5 is very, very slow when a bucket contains 10+ results. Keep up the good work :).