Hashing Byte Arrays (Binary Data) With ColdFusion Before ColdFusion 10
Between the new hmac() function and the enhanced hash() function, ColdFusion 10 makes hashing byte arrays (i.e. binary data) extremely easy. Last week, I looked at hashing an image using the hash() function. This was pretty cool, and it got me thinking about fun ways to use images. And, since ColdFusion 10 is not a full-release product yet, I wanted to take a quick look at hashing byte arrays with earlier versions of ColdFusion (pre-CF10).
NOTE: At the time of this writing, ColdFusion 10 was in public beta.
With a little research, I found a few ways to hash binary data using the underlying Java layer of ColdFusion. We could either use the MessageDigest class; or, we could use the DigestUtils class which encapsulates and simplifies the tasks commonly executed with MessageDigest. In the following code, I use the ColdFusion 10 version of hash() as my hashing control; then, I try to replicate the output using the aforementioned Java classes.
<cfscript>

    // Read in the raw Binary data of the image. This is the byte
    // array that we will be hashing in the following algorithms.
    imageBinary = fileReadBinary( expandPath( "./gina_carano.jpg" ) );

    // ------------------------------------------------------ //
    // ------------------------------------------------------ //

    // Get the hash of the byte array (that IS the image) using the
    // updated ColdFusion 10 hashing function.
    imageHash = hash( imageBinary );

    // Output the image "fingerprint".
    writeOutput( "Fingerprint: " & imageHash );

    // ------------------------------------------------------ //
    // ------------------------------------------------------ //
    writeOutput( "<br />" );
    // ------------------------------------------------------ //
    // ------------------------------------------------------ //

    // I hash a byte array using the given algorithm and return an
    // upper-case Hexadecimal string (32 characters for MD5). This
    // fills in the hash() function for pre-CF10 installs.
    //
    // NOTE: Does not support CFMX_COMPAT - uses MD5 by default.
    function hashBytes( bytes, algorithm = "MD5" ){

        // Get our instance of the digest algorithm that we'll use
        // to hash the byte array.
        var messageDigest = createObject( "java", "java.security.MessageDigest" )
            .getInstance( javaCast( "string", algorithm ) )
        ;

        // Get the digest for the given byte array. This returns the
        // digest in byte-array format.
        var digest = messageDigest.digest( bytes );

        // Now that we have our digested byte array (as another byte
        // array), we have to convert that into a HEX string. For
        // this, we'll need a HEX buffer.
        var hexBuffer = [];

        // Each integer in the byte digest needs to be converted into
        // a two-character HEX value (with possible leading zero).
        for (var byte in digest){

            // Get only the last 8-bits of the integer.
            var tail = bitAnd( 255, byte );

            // Get the hex-encoding of the byte.
            var hex = ucase( formatBaseN( tail, 16 ) );

            // In order to make sure that all of the HEX values are
            // two characters long, we have to prepend a zero for any
            // value that was originally LESS THAN 16 (0 through 15
            // encode as a single HEX character; 16 encodes as "10").
            arrayAppend(
                hexBuffer,
                (tail < 16 ? ("0" & hex) : hex)
            );

        }

        // Return the flattened character buffer.
        return( arrayToList( hexBuffer, "" ) );

    }

    // Get the hash of the byte array using our hashBytes() function
    // which dips down into the Java layer directly.
    imageHash = hashBytes( imageBinary );

    // Output the image "fingerprint".
    writeOutput( "Fingerprint: " & imageHash );

    // ------------------------------------------------------ //
    // ------------------------------------------------------ //
    writeOutput( "<br />" );
    // ------------------------------------------------------ //
    // ------------------------------------------------------ //

    // Create an instance of our DigestUtils class - this class
    // simplifies some of the operations we just saw in the
    // MessageDigest class above, turning them into simple,
    // one-line calls.
    digestUtils = createObject(
        "java",
        "org.apache.commons.codec.digest.DigestUtils"
    );

    // Get the hash of the byte array using the md5Hex() method of
    // DigestUtils. It returns lower-case HEX, so we upper-case it to
    // match the other two outputs.
    imageHash = ucase( digestUtils.md5Hex( imageBinary ) );

    // Output the image "fingerprint".
    writeOutput( "Fingerprint: " & imageHash );

</cfscript>
When we run the above code, we get the following three hash outputs:
Fingerprint: AF56CD049055F6D00C6DCFFD62C29427
Fingerprint: AF56CD049055F6D00C6DCFFD62C29427
Fingerprint: AF56CD049055F6D00C6DCFFD62C29427
All three approaches result in the same hash of the binary image data; but, as you can see, ColdFusion 10 really simplifies the hashing of binary data. Even with the DigestUtils class, ColdFusion 10 still makes hashing easier.
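For comparison, here is a rough sketch of what the hashBytes() function boils down to in plain Java, which is the layer that the ColdFusion code calls into. The class name HashBytesSketch and the use of String.format() for the zero-padding are my own choices for illustration; they are not part of the code above, which builds the hex string by hand.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, for illustration only.
class HashBytesSketch {

    // Hash the given bytes with the named algorithm and return the digest
    // as an upper-case hex string (two characters per digest byte).
    static String hashBytes(byte[] bytes, String algorithm) {
        MessageDigest messageDigest;
        try {
            messageDigest = MessageDigest.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown algorithm: " + algorithm, e);
        }

        byte[] digest = messageDigest.digest(bytes);

        StringBuilder hex = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            // Mask to the range 0-255, then zero-pad to two hex digits.
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        byte[] input = "abc".getBytes(StandardCharsets.UTF_8);
        System.out.println("MD5:     " + hashBytes(input, "MD5"));
        System.out.println("SHA-512: " + hashBytes(input, "SHA-512"));
    }
}
```

Passing a different algorithm name (such as "SHA-1", "SHA-256", or "SHA-512") should work the same way, as long as the JVM in question provides that algorithm.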
Reader Comments
@Ben:
I made this comment in your previous post, but in pre-CF10 you can also just call toString() on the binary data before calling hash() and you should get the same results.
That means pre-CF10 you can do:
imageHash = hash( toString(imageBinary) );
That said, the hashBytes() is nice work! Never know when you might want to customize the behavior more!
@Dan,
I think that holds true for String-based data; but, I'm not getting the same result with image data. When I run the CF10 hash() on the image, I get:
Fingerprint: AF56CD049055F6D00C6DCFFD62C29427
... when I run the toString() version, however, I get the following output:
Fingerprint: A14B4CF4416C593B8A9875627D924820
A different value.
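My best guess as to why: toString() has to decode the bytes as text, and arbitrary binary data does not always survive that trip. As a rough illustration in plain Java (assuming UTF-8 as the character encoding, which may not be what ColdFusion uses by default; the class name is made up), the first four bytes of a typical JPEG come back as different bytes after a decode / encode round trip. Any hash taken after the conversion is then a hash of different data.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, for illustration only.
class RoundTripSketch {

    // Decode the bytes as UTF-8 text, then encode the text back to bytes.
    // Byte sequences that are not valid UTF-8 get replaced during decoding.
    static byte[] roundTrip(byte[] original) {
        String asText = new String(original, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Typical opening bytes of a JPEG file (not valid UTF-8).
        byte[] jpegHeader = { (byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0 };
        // Plain ASCII text (always valid UTF-8).
        byte[] asciiText = "hello".getBytes(StandardCharsets.UTF_8);

        System.out.println("JPEG bytes survive:  "
            + Arrays.equals(jpegHeader, roundTrip(jpegHeader)));
        System.out.println("ASCII bytes survive: "
            + Arrays.equals(asciiText, roundTrip(asciiText)));
    }
}
```

This would also explain why the toString() approach appears to hold up for String-based data.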
Hello Ben and all.
I tried
for SHA-512 hashing
I get that dreaded:
The sha512 method was not found.
Either there are no methods with the specified method name and argument types or the sha512 method is overloaded with argument types that ColdFusion cannot decipher reliably. ColdFusion found 0 methods that match the provided arguments. If this is a Java object and you verified that the method exists, use the javacast function to reduce ambiguity.
Now I know that sha512 does indeed exist as a method, because I saw it at
http://commons.apache.org/codec/apidocs/org/apache/commons/codec/digest/DigestUtils.html
But when I perform a
<cfdump var="#digestUtils#">, I only get
What happened to the rest? LOL! I guess I have to try something else.
I have a bytearray. I want to sha-512 hash it.
Please advise
Faith Sloan
Can anyone convert this algorithm to ColdFusion code? I am trying to add my code in this post but I am not able to on this blog. I get this error
oops! Let me try again.
The algorithm sounds simple enough. But trying to implement it is killing me.
xmlPost was created by my third party guys as such:
This XML payload string was converted to a UTF-8 byte array, which was then converted to a base-64 string. The resulting base-64 string is the value of my xmlPost below.
So I do this:
Here is what I get for my strXML variable:
The final value, hasByteArray2 is not even remotely similar to payload_hash
This is my first time doing this and my understanding of hashing, byte arrays and character conversions flew out of the window decades ago.
What am I doing wrong?
Thank you
Faith Sloan
From a colleague over at stackoverflow.com
Leigh you are WONDERFUL! Thank you!
====
**UPDATE:** Based on the updated code, some of the instructions may be a bit misleading. I believe they just mean decode the `xml` and `salt` strings from their given encoding (base64 and utf-8) into *byte arrays*, not strings:

    // note: salt value has invalid characters for base64
    // assuming it is a plain utf-8 string
    saltArray = charsetDecode(salt, "utf-8");
    xmlByteArray = binaryDecode(xmlPost, "base64");

Then merge the two binary arrays (see custom function)

    mergedBytes = mergeArrays( xmlByteArray, saltArray );

Calculate the hash of the new byte array:

    messageDigest = createObject( "java", "java.security.MessageDigest" );
    messageDigest = messageDigest.getInstance( javaCast( "string", "SHA-512") );
    hashedByteArray = messageDigest.digest( javacast("byte[]", mergedBytes) );

Merge the arrays again:

    mergedBytes = mergeArrays( hashedByteArray, saltArray);

Finally convert the binary to base64 and compare:

    calculatedPayload = binaryEncode( javacast("byte[]", mergedBytes), "base64");
    // check results
    arePayloadsEqual = compare(calculatedPayload, payload_hash) eq 0;
    WriteDump("arePayloadsEqual="& arePayloadsEqual);
    WriteDump("calculatedPayload="& calculatedPayload);
    WriteDump("payload_hash="& payload_hash);

*Note*: `BinaryDecode/CharsetDecode` return java arrays. Unlike CF arrays, they are fixed-length (ie they cannot be appended to or resized). So the [handy addAll(..) trick][2] will not work here.

    // merge immutable arrays the long way
    function mergeArrays( array1, array2 ){
        var i = 0;
        var newArray = [];
        for (i = 1; i <= arrayLen(arguments.array1); i++) {
            arrayAppend(newArray, arguments.array1[i]);
        }
        for (i = 1; i <= arrayLen(arguments.array2); i++) {
            arrayAppend(newArray, arguments.array2[i]);
        }
        return newArray;
    }

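For anyone more comfortable reading Java, the same steps can be sketched directly against the Java layer. This is only an illustration of the flow described above, not a drop-in replacement: the class and method names are made up, the sample XML and salt in main() are invented values, java.util.Base64 requires Java 8 or later, and System.arraycopy() stands in for the mergeArrays() loop.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical sketch class, for illustration only.
class SaltedPayloadSketch {

    // Concatenate two byte arrays into a new, larger array.
    static byte[] mergeArrays(byte[] first, byte[] second) {
        byte[] merged = new byte[first.length + second.length];
        System.arraycopy(first, 0, merged, 0, first.length);
        System.arraycopy(second, 0, merged, first.length, second.length);
        return merged;
    }

    // Computes: base64( SHA-512( xmlBytes + saltBytes ) + saltBytes )
    static String payloadHash(String xmlPostBase64, String salt) {
        // Decode each input from its given encoding into raw bytes.
        byte[] xmlBytes = Base64.getDecoder().decode(xmlPostBase64);
        byte[] saltBytes = salt.getBytes(StandardCharsets.UTF_8);

        MessageDigest sha512;
        try {
            sha512 = MessageDigest.getInstance("SHA-512");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-512 is not available", e);
        }

        // Hash the salted payload, append the salt again, then base64-encode.
        byte[] hashed = sha512.digest(mergeArrays(xmlBytes, saltBytes));
        return Base64.getEncoder().encodeToString(mergeArrays(hashed, saltBytes));
    }

    public static void main(String[] args) {
        // Invented sample values, standing in for xmlPost and salt.
        String xmlPost = Base64.getEncoder()
            .encodeToString("<payload/>".getBytes(StandardCharsets.UTF_8));
        System.out.println(payloadHash(xmlPost, "s@lt!"));
    }
}
```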
[1]: http://commons.apache.org/codec/apidocs/org/apache/commons/codec/digest/DigestUtils.html#sha512%28byte%5B%5D%29
[2]: http://www.aliaspooryorik.com/blog/index.cfm/e/posts.details/post/merging-two-arrays-267
Hey Ben,
First off, great article. This saved me considerable time in implementing a CF9 solution.
Secondly, I'd like to share a small correction that will save a few folks some time if they use this code (mostly only if they're needing to compare hashes).
The following line:
arrayAppend( hexBuffer, (tail <= 16 ? ("0" & hex) : hex) );
Should be changed to:
arrayAppend( hexBuffer, (tail < 16 ? ("0" & hex) : hex) );
The decimal value 16, when converted to hex, is 10, so prefixing it with a zero returns 010, making the hash 33 characters long instead of 32. So you would only prefix a 0 if the tail value is less than 16 (not equal).
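To make the off-by-one concrete, here is a tiny sketch (in Java, with made-up names) that pads a single byte value both ways. The `inclusive` flag is only there to switch between the `<=` and `<` comparisons; it is not something from the original function.

```java
// Hypothetical demo class, for illustration only.
class HexPadSketch {

    // Hex-encode one byte value, prefixing "0" when the padding test passes.
    // inclusive = true  mimics the original (tail <= 16) comparison.
    // inclusive = false mimics the corrected (tail < 16) comparison.
    static String padHex(int value, boolean inclusive) {
        int tail = value & 0xFF; // keep only the last 8 bits
        String hex = Integer.toHexString(tail).toUpperCase();
        boolean needsPad = inclusive ? (tail <= 16) : (tail < 16);
        return needsPad ? ("0" + hex) : hex;
    }

    public static void main(String[] args) {
        for (int value : new int[] { 15, 16, 17 }) {
            System.out.println(
                value + " -> (<= 16) " + padHex(value, true)
                      + " | (< 16) " + padHex(value, false)
            );
        }
    }
}
```

Only the value 16 behaves differently between the two versions, which is why the bug only shows up for some inputs.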
Hope that helps,
-Dain