Ben Nadel at cf.Objective() 2014 (Bloomington, MN) with: Jeff McDowell and Jonathan Dowdle and Joel Hill and Josh Siok and Christian Ready and Steve 'Cutter' Blades and Matt Vickers

Hashing Byte Arrays (Binary Data) With ColdFusion Before ColdFusion 10

By Ben Nadel

Between the new hmac() function and the enhanced hash() function, ColdFusion 10 makes hashing byte arrays (i.e. binary data) extremely easy. Last week, I looked at hashing an image using the hash() function. This was pretty cool, and it got me thinking about fun ways to use images. And, since ColdFusion 10 is not a full-release product yet, I wanted to take a quick look at hashing byte arrays with earlier versions of ColdFusion (pre-CF10).

NOTE: At the time of this writing, ColdFusion 10 was in public beta.

With a little research, I found a few ways to hash binary data using the underlying Java layer of ColdFusion. We could either use the MessageDigest class; or, we could use the DigestUtils class which encapsulates and simplifies the tasks commonly executed with MessageDigest. In the following code, I use the ColdFusion 10 version of hash() as my hashing control; then, I try to replicate the output using the aforementioned Java classes.

<cfscript>

	// Read in the raw Binary data of the image. This is the byte
	// array that we will be hashing in the following algorithms.
	imageBinary = fileReadBinary( expandPath( "./gina_carano.jpg" ) );


	// ------------------------------------------------------ //
	// ------------------------------------------------------ //


	// Get the hash of the byte array (that IS the image) using the
	// updated ColdFusion 10 hashing function.
	imageHash = hash( imageBinary );

	// Output the image "fingerprint".
	writeOutput( "Fingerprint: " & imageHash );


	// ------------------------------------------------------ //
	// ------------------------------------------------------ //
	writeOutput( "<br />" );
	// ------------------------------------------------------ //
	// ------------------------------------------------------ //


	// I hash a byte array using the given algorithm and return an
	// upper-case Hexadecimal string (32 characters for MD5). This
	// fills in the hash() function for pre-CF10 installs.
	//
	// NOTE: Does not support CFMX_COMPAT - uses MD5 by default.
	function hashBytes( bytes, algorithm = "MD5" ){

		// Get our instance of the digest algorithm that we'll use
		// to hash the byte array.
		var messageDigest = createObject( "java", "java.security.MessageDigest" )
			.getInstance( javaCast( "string", algorithm ) )
		;

		// Get the digest for the given byte array. This returns the
		// digest in byte-array format.
		var digest = messageDigest.digest( bytes );

		// Now that we have our digested byte array (as another byte
		// array), we have to convert that into a HEX string. For
		// this, we'll need a HEX buffer.
		var hexBuffer = [];

		// Each integer in the byte digest needs to be converted into
		// a HEX character (with possible leading zero).
		for (var byte in digest){

			// Get only the last 8-bits of the integer.
			var tail = bitAnd( 255, byte );

			// Get the hex-encoding of the byte.
			var hex = ucase( formatBaseN( tail, 16 ) );

			// In order to make sure that all of the HEX characters
			// are two-digits, we have to prepend a zero for any
			// value that was originally LT 16; 15 ("F") is the
			// largest value that won't result in two HEX characters.
			arrayAppend(
				hexBuffer,
				(tail < 16 ? ("0" & hex) : hex)
			);

		}

		// Return the flattened character buffer.
		return( arrayToList( hexBuffer, "" ) );

	}


	// Get the hash of the byte array using our hashBytes() function
	// which dips down into the Java layer directly.
	imageHash = hashBytes( imageBinary );

	// Output the image "fingerprint".
	writeOutput( "Fingerprint: " & imageHash );


	// ------------------------------------------------------ //
	// ------------------------------------------------------ //
	writeOutput( "<br />" );
	// ------------------------------------------------------ //
	// ------------------------------------------------------ //


	// Create an instance of our DigestUtils class - this class
	// simplifies some of the operations we just saw in the
	// MessageDigest class above, turning them into simple,
	// one-line calls.
	digestUtils = createObject(
		"java",
		"org.apache.commons.codec.digest.DigestUtils"
	);

	// Get the hash of the byte array using the md5Hex() method of
	// DigestUtils. It returns a lower-case HEX string, so we
	// upper-case it to match the output of hash().
	imageHash = ucase( digestUtils.md5Hex( imageBinary ) );

	// Output the image "fingerprint".
	writeOutput( "Fingerprint: " & imageHash );


</cfscript>

When we run the above code, we get the following three hash outputs:

Fingerprint: AF56CD049055F6D00C6DCFFD62C29427
Fingerprint: AF56CD049055F6D00C6DCFFD62C29427
Fingerprint: AF56CD049055F6D00C6DCFFD62C29427

All three approaches result in the same hash of the binary image data; but, as you can see, ColdFusion 10 really simplifies the hashing of binary data. Even with the DigestUtils class, ColdFusion 10 still makes hashing easier.
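As a side note, the hand-written HEX loop inside hashBytes() may not be strictly necessary. The following is a minimal sketch, assuming that binaryEncode() with the "hex" encoding is available on your engine (as far as I know, it has been around since ColdFusion MX 7). The function name hashBytesCompact() is hypothetical and is not part of the code above.

```cfml
<cfscript>

	// Hypothetical variant of hashBytes(): same MessageDigest call, but
	// the digest bytes are HEX-encoded with binaryEncode() instead of a
	// hand-written loop.
	function hashBytesCompact( bytes, algorithm = "MD5" ){

		var messageDigest = createObject( "java", "java.security.MessageDigest" )
			.getInstance( javaCast( "string", algorithm ) )
		;

		// digest() returns a Java byte[], which ColdFusion treats as
		// binary data.
		var digest = messageDigest.digest( bytes );

		// binaryEncode() emits two HEX characters per byte, so no manual
		// zero-padding is needed. The ucase() call guards against
		// engines that return lower-case HEX.
		return( ucase( binaryEncode( digest, "hex" ) ) );

	}

	imageBinary = fileReadBinary( expandPath( "./gina_carano.jpg" ) );

	writeOutput( "Fingerprint: " & hashBytesCompact( imageBinary ) );

</cfscript>
```

If this works on your install, it should print the same fingerprint as the three approaches above, since only the HEX-encoding step differs.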


Reader Comments

198 Comments

@Ben:

I made this comment in your previous post, but in pre-CF10 you can also just call toString() on the binary data before calling hash() and you should get the same results.

That means pre-CF10 you can do:

imageHash = hash( toString(imageBinary) );

That said, the hashBytes() is nice work! Never know when you might want to customize the behavior more!

15,902 Comments

@Dan,

I think that holds true for String-based data; but, I'm not getting matching results with image data. When I run the CF10 hash() on the image, I get:

Fingerprint: AF56CD049055F6D00C6DCFFD62C29427

... when I run this, however:

writeOutput( "Fingerprint: " & hash( toString( imageBinary ) ) );

... I get the following output:

Fingerprint: A14B4CF4416C593B8A9875627D924820

A different value.
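A likely explanation (an assumption on my part, not something verified in this thread) is that converting arbitrary binary data to a string is lossy: byte sequences that are not valid in the target character encoding get replaced during decoding, so the string no longer maps back to the original bytes. The sketch below illustrates the idea with four bytes that commonly open a JPEG file. UTF-8 is assumed here for illustration; toString() may use a different default encoding on your server.

```cfml
<cfscript>

	// Four bytes that commonly open a JPEG file (FF D8 FF E0). They do
	// not form valid UTF-8 text.
	originalBytes = binaryDecode( "FFD8FFE0", "hex" );

	// Decode the bytes as text, then encode the text back to bytes.
	// This is roughly what happens when binary data is passed through
	// a string before being hashed.
	asText = charsetEncode( originalBytes, "utf-8" );
	roundTripBytes = charsetDecode( asText, "utf-8" );

	writeOutput( "Original: " & binaryEncode( originalBytes, "hex" ) & "<br />" );
	writeOutput( "Round-trip: " & binaryEncode( roundTripBytes, "hex" ) );

</cfscript>
```

If the two lines differ, then a hash of the string cannot be expected to match a hash of the original byte array.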

5 Comments

Hello Ben and all.

I tried

ucase( digestUtils.sha512( imageBinary ) )

for SHA-512 hashing

I get that dreaded:

The sha512 method was not found.
Either there are no methods with the specified method name and argument types or the sha512 method is overloaded with argument types that ColdFusion cannot decipher reliably. ColdFusion found 0 methods that match the provided arguments. If this is a Java object and you verified that the method exists, use the javacast function to reduce ambiguity.

Now I know that sha512 does indeed exist as a method, because I saw it at
http://commons.apache.org/codec/apidocs/org/apache/commons/codec/digest/DigestUtils.html

But when I perform a
<cfdump var="#digestUtils#">, I only get

md5(byte[]) 	byte[]
md5(java.lang.String) 	byte[]
md5Hex(byte[]) 	java.lang.String
md5Hex(java.lang.String) 	java.lang.String
sha(java.lang.String) 	byte[]
sha(byte[]) 	byte[]
shaHex(java.lang.String) 	java.lang.String
shaHex(byte[]) 	java.lang.String

What happened to the rest? LOL! I guess I have to try something else.

I have a bytearray. I want to sha-512 hash it.

a. convert the salt characters to a UTF-8 byte array. DONE!

b. convert the XML payload characters to a UTF-8 byte array. DONE

c. create a new byte array consisting of the XML payload bytes from step b, appended with the salt bytes from step a. DONE

d. perform a SHA512 hash on the concatenated byte array from step c, which results in a hashed byte array. AAARRRGGGHHH! Need Help!

e. create a new byte array consisting of the hashed bytes from step d, appended with the salt bytes from step a. STUCK at d

f. convert the result of step e to a base64-encoded string and should be the value of query string parameter "h" payload hash.  STUCK at d

Please advise
Faith Sloan

5 Comments

Can anyone convert this algorithm to ColdFusion code? I am trying to add my code in this post but I am not able to on this blog. I get this error

5 Comments

oops! Let me try again.

The algorithm sounds simple enough. But trying to implement it is killing me.

1. compute the hash string value of the XMLPost string above:  
 a. convert the base64 salt string to a UTF-8 byte array.  
 b. convert the base64 XML payload string to a UTF-8 byte array.  
 c. create a new byte array consisting of the XML payload bytes from step b, appended with the salt bytes from step a.  
 d. perform a SHA512 hash on the concatenated byte array from step c, which results in a hashed byte array.  
 e. create a new byte array consisting of the hashed bytes from step d, appended with the salt bytes from step a.  
 f. convert the result of step e to a base64-encoded string and should be the value of query string parameter "h" payload hash.  

xmlPost was created by my third party guys as such:
This XML payload string was converted to a UTF-8 byte array, which was then converted to a base-64 string. The resulting base-64 string is the value of my xmlPost below.

So I do this:

	<cfset xmlPost = urlDecode("PD94bWwgdmVyc2lvbj0iMS4wIj8%2bPEVzdG9yZVNzb0N1c3RvbWVyIHhtbG5zOnhzaT0iaHR0cDovL3d3dy53My5vcmcvMjAwMS9YTUxTY2hlbWEtaW5zdGFuY2UiIHhtbG5zOnhzZD0iaHR0cDovL3d3dy53My5vcmcvMjAwMS9YTUxTY2hlbWEiPjxDdXN0b21lcklkPjExMjk0MDwvQ3VzdG9tZXJJZD48RGVhbGVyQ29kZT5OODg4ODg8L0RlYWxlckNvZGU%2bPFBvaW50QmFsYW5jZT4yODA8L1BvaW50QmFsYW5jZT48Rmlyc3ROYW1lPkZhaXRoPC9GaXJzdE5hbWU%2bPExhc3ROYW1lPkh1dHVsYTwvTGFzdE5hbWU%2bPC9Fc3RvcmVTc29DdXN0b21lcj4%3d") />
	<cfset salt = "3dfjh674!MujErf98344@090" />
	<cfset payload_hash = urlDecode("EtLDRJfcRESFKpY4OGZZnRSN2THqT%2bEelzOuXVU06jotd2kE4yKnlYay7BqyAdcUSATRgSMaHxZa6uBqKKd9rjNkZmpoNjc0IU11akVyZjk4MzQ0QDA5MA%3d%3d") />
 
	<cfset strXML = ToString( ToBinary( xmlpost ) ) /> <!--- to get actual XML --->
 
	<!--- base64 encoding returns a byte array --->
	<cfset saltByteArray = toBase64( salt, "utf-8" )  />
	<cfset xmlpostByteArray = toBase64( xmlPost, "utf-8" ) />
	<!--- append salt to xmlpost --->
	<cfset xmlpostsaltByteArray = xmlpostByteArray & saltByteArray />
 
	<!--- now let us perform a sha512 hash on this concatenated byte array --->
	<cfscript>
	// Create an instance of our DigestUtils class
	digestUtils = createObject("java","org.apache.commons.codec.digest.DigestUtils");
	// I hash a byte array using the given algorithm and return a
	// 32-character Hexadecimal string. Home-made hash function for CF9 and earlier
	function hashBytes( bytes, algorithm = "SHA-512" ){
		// Get our instance of the digest algorithm that we'll use
		// to hash the byte array.
		var messageDigest = createObject( "java", "java.security.MessageDigest" ).getInstance( javaCast( "string", algorithm ) );
	 
		// Get the digest for the given byte array. This returns the
		// digest (i.e., hash) in byte-array format.
		var digest = messageDigest.digest( bytes );
 
		// Now that we have our digested byte array (i.e., our hash as another byte
		// array), we have to convert that into a HEX string. So, we'll need a HEX buffer.
		var hexBuffer = [];
 
		// Each integer in the byte digest needs to be converted into
		// a HEX character (with possible leading zero).
		for (byte =1 ;byte LTE ArrayLen(digest);byte = byte + 1) {
		//for ( var byte in digest){
			// Get the hex value for this byte. When converting the
			// byte, only use the right-most 8 bits (last 8 bits of the integer)
			// otherwise the sign of the byte can create oddities
 
			var tail = bitAnd( 255, byte );
			 
			// Get the hex-encoding of the byte.
			var hex = ucase( formatBaseN( tail, 16 ) );
		 
			// In order to make sure that all of the HEX characters
			// are two-digits, we have to prepend a zero for any
			// value that was originally LTE to 16 (the largest value
			// that won't result in two HEX characters).
			arrayAppend( hexBuffer, (tail <= 16 ? ("0" & hex) : hex) );
		}
 
		// Return the flattened character buffer.
		return( arrayToList( hexBuffer, "" ) );
	}
 
	// Get the hash of the byte array using our hashBytes() function
	hashByteArray = hashBytes( xmlpostsaltByteArray );
	</cfscript>
 
 
	<!--- The hashByteArray is in HEX format now. Convert to binary --->
	<!--- You must binary decode the hashed string before converting it to binary --->
	<cfset hashByteArray = toBase64( BinaryDecode( hashByteArray, 'HEX' ) ) />
 
	<!--- The final step is to append this new hashbytearray with the salt byte array --->
 
	<cfset hashByteArray = hashByteArray & saltByteArray />
 
	<!--- now convert this value to a base64 encoded string --->
 
	<cfset hashByteArray2 = toBase64( hashByteArray )/>

Here is what I get for my strXML variable:

Actual xml structure converted from base 64 to string:  
<?xml version="1.0"?><EstoreSsoCustomer xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><CustomerId>112940</CustomerId><DealerCode>N88888</DealerCode><PointBalance>280</PointBalance><FirstName>Faith</FirstName><LastName>Hutula</LastName></EstoreSsoCustomer>

The final value, hashByteArray2, is not even remotely similar to payload_hash.

This is my first time doing this and my understanding of hashing, byte arrays and character conversions flew out of the window decades ago.

What am I doing wrong?

Thank you
Faith Sloan

5 Comments

From a colleague over at stackoverflow.com
Leigh you are WONDERFUL! Thank you!
====
**UPDATE:** Based on the updated code, some of the instructions may be a bit misleading. I believe they just mean decode the `xml` and `salt` strings from their given encoding (base64 and utf-8) into *byte arrays*, not strings:

// note: salt value has invalid characters for base64
// assuming it is a plain utf-8 string
saltArray = charsetDecode(salt, "utf-8");
xmlByteArray = binaryDecode(xmlPost, "base64");

Then merge the two binary arrays (see custom function)

mergedBytes = mergeArrays( xmlByteArray, saltArray );

Calculate the hash of the new byte array:

messageDigest = createObject( "java", "java.security.MessageDigest" );
messageDigest = messageDigest.getInstance( javaCast( "string", "SHA-512") );
hashedByteArray = messageDigest.digest( javacast("byte[]", mergedBytes) );

Merge the arrays again:

mergedBytes = mergeArrays( hashedByteArray, saltArray);

Finally convert the binary to base64 and compare:

calculatedPayload = binaryEncode( javacast("byte[]", mergedBytes), "base64");

// check results
arePayloadsEqual = compare(calculatedPayload, payload_hash) eq 0;
WriteDump("arePayloadsEqual="& arePayloadsEqual);
WriteDump("calculatedPayload="& calculatedPayload);
WriteDump("payload_hash="& payload_hash);

*Note*: `BinaryDecode/CharsetDecode` return java arrays. Unlike CF arrays, they cannot be appended to or resized (i.e. they are effectively immutable from CFML). So the [handy addAll(..) trick][2] will not work here.

// merge immutable arrays the long way
function mergeArrays( array1, array2 ){
	var i = 0;
	var newArray = [];
	for (i = 1; i <= arrayLen(arguments.array1); i++) {
		arrayAppend(newArray, arguments.array1[i]);
	}
	for (i = 1; i <= arrayLen(arguments.array2); i++) {
		arrayAppend(newArray, arguments.array2[i]);
	}
	return newArray;
}

[1]: http://commons.apache.org/codec/apidocs/org/apache/commons/codec/digest/DigestUtils.html#sha512%28byte%5B%5D%29
[2]: http://www.aliaspooryorik.com/blog/index.cfm/e/posts.details/post/merging-two-arrays-267
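The mergeArrays() function above copies the bytes one at a time into a ColdFusion array, which is why the result is passed through javacast("byte[]", ...) before hashing. As an alternative, here is a minimal sketch that concatenates the two binary values with java.io.ByteArrayOutputStream. The helper name mergeBinary() and the sample strings are hypothetical, and the approach assumes that ColdFusion passes binary values to Java as byte[] (which the digest() call above already relies on).

```cfml
<cfscript>

	// Hypothetical helper (not from the answer above): concatenate two
	// binary values and return a Java byte[] rather than a ColdFusion
	// array.
	function mergeBinary( binary1, binary2 ){

		var stream = createObject( "java", "java.io.ByteArrayOutputStream" ).init();

		// write( byte[], offset, length ) appends a slice of the given
		// array to the stream.
		stream.write( binary1, javaCast( "int", 0 ), javaCast( "int", arrayLen( binary1 ) ) );
		stream.write( binary2, javaCast( "int", 0 ), javaCast( "int", arrayLen( binary2 ) ) );

		return( stream.toByteArray() );

	}

	// Sample values for illustration only.
	dataBytes = charsetDecode( "data", "utf-8" );
	saltBytes = charsetDecode( "salt", "utf-8" );

	merged = mergeBinary( dataBytes, saltBytes );

	// HEX-encode the merged bytes for display.
	writeOutput( binaryEncode( merged, "hex" ) );

</cfscript>
```

Because the result is already a byte[], it should be usable with messageDigest.digest() and binaryEncode() without the extra javacast() step.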

1 Comments

Hey Ben,

First off, great article. This saved me considerable time in implementing a CF9 solution.

Secondly, I'd like to share a small correction that will save a few folks some time if they use this code (mostly only if they're needing to compare hashes).

The following line:

arrayAppend( hexBuffer, (tail <= 16 ? ("0" & hex) : hex) );

Should be changed to:

arrayAppend( hexBuffer, (tail < 16 ? ("0" & hex) : hex) );

The decimal value 16, when converted to hex, is "10", which is already two characters. Prefixing it with a zero returns "010", making the hash 33 characters long instead of 32. So you would only prefix a 0 if the tail value is less than 16 (not less than or equal).

Hope that helps,

-Dain
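Dain's correction can be checked with a short sketch that applies both padding rules to values around the boundary. The variable names here are illustrative only.

```cfml
<cfscript>

	// Compare the original (<= 16) and corrected (< 16) padding rules
	// for values around the boundary.
	values = [ 9, 15, 16, 17 ];

	for ( value in values ){

		hex = ucase( formatBaseN( value, 16 ) );

		padOriginal = ( value <= 16 ? ( "0" & hex ) : hex );
		padCorrected = ( value < 16 ? ( "0" & hex ) : hex );

		writeOutput( value & ": " & padOriginal & " vs " & padCorrected & "<br />" );

	}

</cfscript>
```

Only the value 16 is treated differently: the original rule yields the three-character "010", while the corrected rule yields "10".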
