Transforming Binary Data At The Bit Level Using ColdFusion And BitBuffer.cfc
For the last two weeks or so, I've been digging into bit-wise operations in ColdFusion. This all started out as an exploration of Base32 encoding, which I needed for a larger project. As I was looking into Base32 encoding, it occurred to me that most of the complexity in the encoding/decoding process came from transforming one set of bits into another set of bits. If that complexity could be removed, then Base32 encoding would actually be rather easy. So, I tried to create a ColdFusion component - BitBuffer.cfc - that hides all of the complex logic behind one simple bit-transformation function.
View this project on my GitHub account.
You can think of the transformation function like the .map() function found in functional languages. I iterate over the underlying bit collection and pass each input chunk to your callback (i.e., the transformation operator). You then return the mapped value, which is used to compose a new resultant binary value:
transformBits( inputSize, outputSize, callback )
Since the input chunks and the output chunks may be different sizes, the transformBits() method requires an explicit bit-count for both the input chunk and the output (mapped) value. The BitBuffer uses the output bit-count to make sure that only the correct number of bits is extracted from the value your callback returns. And, if your callback/operator doesn't return any value, the input chunk is excluded from the transformation entirely.
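To make the chunking concrete, here is a minimal sketch of the same idea in Java (the platform ColdFusion runs on). This is not the BitBuffer.cfc source. The class name BitTransformSketch, the padTail flag, and the use of a null return to mean "no value" are all assumptions made for illustration, and bits are assumed to be read most-significant-bit first.

```java
import java.io.ByteArrayOutputStream;
import java.util.function.IntFunction;

// Hypothetical illustration only; not the BitBuffer.cfc implementation.
final class BitTransformSketch {

    // Reads inputSize bits at a time (most significant bit first), passes each
    // chunk to the callback, and appends the low outputSize bits of the value
    // the callback returns. A null return drops the chunk from the result.
    // padTail (assumed flag): when true, a trailing partial byte is zero-padded
    // on the right; when false, the leftover bits are discarded.
    static byte[] transformBits(byte[] input, int inputSize, int outputSize,
                                IntFunction<Integer> callback, boolean padTail) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int totalBits = input.length * 8;
        int pending = 0;      // output bits not yet flushed as a whole byte
        int pendingCount = 0; // number of valid bits held in pending

        for (int offset = 0; offset < totalBits; offset += inputSize) {
            // Collect the next chunk; bits past the end of the input read as 0.
            int chunk = 0;
            for (int i = 0; i < inputSize; i++) {
                int bitIndex = offset + i;
                int bit = 0;
                if (bitIndex < totalBits) {
                    bit = (input[bitIndex / 8] >> (7 - bitIndex % 8)) & 1;
                }
                chunk = (chunk << 1) | bit;
            }

            Integer mapped = callback.apply(chunk);
            if (mapped == null) {
                continue; // no value returned: exclude this chunk
            }

            // Append only the low outputSize bits of the mapped value.
            for (int i = outputSize - 1; i >= 0; i--) {
                pending = (pending << 1) | ((mapped >> i) & 1);
                if (++pendingCount == 8) {
                    out.write(pending);
                    pending = 0;
                    pendingCount = 0;
                }
            }
        }

        if (pendingCount > 0 && padTail) {
            out.write(pending << (8 - pendingCount));
        }
        return out.toByteArray();
    }
}
```

With this sketch, running the single byte for "C" through 5-bit input chunks and 8-bit output values, with a Base32 alphabet lookup as the callback, produces the bytes for "IM", which is the unpadded form of the first test case in the demo.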
To see this in action, I've re-implemented the Base32 encoding/decoding example using the BitBuffer.cfc. And, I think you'll see that the logic is significantly more straightforward.
<cfscript>

    // I Base32 encode the given string.
    public string function toBase32( required string input ) {

        var base32Bytes = javaCast( "string", "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567" ).getBytes();

        var buffer = new lib.BitBuffer( charsetDecode( input, "utf-8" ) );

        // When converting to Base32, each 5-bits of the input is used to create
        // an 8-bit value that indicates the index of the Base32-character.
        buffer.transformBits(
            5,
            8,
            function( required numeric encodingIndex ) {

                return( base32Bytes[ encodingIndex + 1 ] );

            }
        );

        var encoded = charsetEncode( buffer.toPaddedByteArray(), "utf-8" );

        // The length of the encoded value has to be divisible by 8; if it is
        // not, then we have to pad the value with "=".
        if ( len( encoded ) % 8 ) {

            encoded &= repeatString( "=", ( 8 - ( len( encoded ) % 8 ) ) );

        }

        return( encoded );

    }


    // I decode the given Base32-encoded string.
    public string function fromBase32( required string input ) {

        var base32Bytes = javaCast( "string", "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567" ).getBytes();

        var buffer = new lib.BitBuffer( charsetDecode( input, "utf-8" ) );

        // When converting from Base32, each 8 bits of the input is used to rebuild
        // the 5-bits of the original input that were encoded.
        buffer.transformBits(
            8,
            5,
            function( required numeric encodedByte ) {

                for ( var i = 1 ; i <= arrayLen( base32Bytes ) ; i++ ) {

                    if ( base32Bytes[ i ] == encodedByte ) {

                        return( i - 1 );

                    }

                }

                // If there was no match (ie, the "=" padding character), nothing
                // is returned and the chunk is excluded from the result.

            }
        );

        return( charsetEncode( buffer.toByteArray(), "utf-8" ) );

    }

</cfscript>
<!--- Reset the output buffer. --->
<cfcontent type="text/html; charset=utf-8" />
<!doctype html>
<html>
<head>
<meta charset="utf-8" />
<title>
Implementing Base32 Encoding With BitBuffer.cfc
</title>
</head>
<body>
<cfoutput>
<h1>
Implementing Base32 Encoding With BitBuffer.cfc
</h1>
<!--- Set up our known Base32-encoded values. --->
<cfset tests = {
"C" = "IM======",
"Co" = "INXQ====",
"Com" = "INXW2===",
"Come" = "INXW2ZI=",
"Come " = "INXW2ZJA",
"Come w" = "INXW2ZJAO4======",
"Come wi" = "INXW2ZJAO5UQ====",
"Come wit" = "INXW2ZJAO5UXI===",
"Come with" = "INXW2ZJAO5UXI2A=",
"Come with " = "INXW2ZJAO5UXI2BA",
"Come with m" = "INXW2ZJAO5UXI2BANU======",
"Come with me" = "INXW2ZJAO5UXI2BANVSQ====",
"Come with me " = "INXW2ZJAO5UXI2BANVSSA===",
"Come with me i" = "INXW2ZJAO5UXI2BANVSSA2I=",
"Come with me if" = "INXW2ZJAO5UXI2BANVSSA2LG",
"Come with me if " = "INXW2ZJAO5UXI2BANVSSA2LGEA======",
"Come with me if y" = "INXW2ZJAO5UXI2BANVSSA2LGEB4Q====",
"Come with me if yo" = "INXW2ZJAO5UXI2BANVSSA2LGEB4W6===",
"Come with me if you" = "INXW2ZJAO5UXI2BANVSSA2LGEB4W65I=",
"Come with me if you " = "INXW2ZJAO5UXI2BANVSSA2LGEB4W65JA",
"Come with me if you w" = "INXW2ZJAO5UXI2BANVSSA2LGEB4W65JAO4======",
"Come with me if you wa" = "INXW2ZJAO5UXI2BANVSSA2LGEB4W65JAO5QQ====",
"Come with me if you wan" = "INXW2ZJAO5UXI2BANVSSA2LGEB4W65JAO5QW4===",
"Come with me if you want" = "INXW2ZJAO5UXI2BANVSSA2LGEB4W65JAO5QW45A=",
"Come with me if you want " = "INXW2ZJAO5UXI2BANVSSA2LGEB4W65JAO5QW45BA",
"Come with me if you want t" = "INXW2ZJAO5UXI2BANVSSA2LGEB4W65JAO5QW45BAOQ======",
"Come with me if you want to" = "INXW2ZJAO5UXI2BANVSSA2LGEB4W65JAO5QW45BAORXQ====",
"Come with me if you want to " = "INXW2ZJAO5UXI2BANVSSA2LGEB4W65JAO5QW45BAORXSA===",
"Come with me if you want to l" = "INXW2ZJAO5UXI2BANVSSA2LGEB4W65JAO5QW45BAORXSA3A=",
"Come with me if you want to li" = "INXW2ZJAO5UXI2BANVSSA2LGEB4W65JAO5QW45BAORXSA3DJ",
"Come with me if you want to liv" = "INXW2ZJAO5UXI2BANVSSA2LGEB4W65JAO5QW45BAORXSA3DJOY======",
"Come with me if you want to live" = "INXW2ZJAO5UXI2BANVSSA2LGEB4W65JAO5QW45BAORXSA3DJOZSQ====",
"Come with me if you want to live." = "INXW2ZJAO5UXI2BANVSSA2LGEB4W65JAO5QW45BAORXSA3DJOZSS4===",
"#chr( 224 )##chr( 225 )##chr( 226 )##chr( 227 )##chr( 228 )##chr( 229 )##chr( 230 )#" = "YOQMHIODULB2HQ5EYOS4HJQ="
} />
<cfset testInputs = structKeyArray( tests ) />
<cfset arraySort( testInputs, "text", "asc" ) />
<cfloop index="input" array="#testInputs#">
<!--- Encode the value. --->
<cfset encodedInput = toBase32( input ) />
<!--- Decode the encoded-value. --->
<cfset decodedOutput = fromBase32( encodedInput ) />
<!--- Check to see if the process worked in both directions. --->
<cfset encodingPassed = ( encodedInput eq tests[ input ] ) />
<cfset decodingPassed = ( input eq decodedOutput ) />
<!--- Output test results for this test. --->
#( encodingPassed ? "[PASS]" : "[FAIL]" )#
#( decodingPassed ? "[PASS]" : "[FAIL]" )#
#input# » #encodedInput# » #decodedOutput#
<br />
</cfloop>
</cfoutput>
</body>
</html>
NOTE: This demo uses ColdFusion 10 closures, which is why my callback / operator function is able to reference the base32Bytes variable even after the callback has been passed out of scope.
As you can see, with the BitBuffer.cfc, the toBase32() and fromBase32() functions are fairly compact. Once the bit-based transformation is encapsulated, the rest of the logic is just character-selection.
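As a cross-check on the expected values in the test table, here is a rough standalone encoder in Java that does the bit shuffling by hand with a small accumulator. It is only a sketch: the class name Base32Sketch is hypothetical, and it hard-codes the same alphabet and "=" padding rule used in the demo above rather than going through any BitBuffer-style abstraction.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; a hand-rolled encoder for comparison.
final class Base32Sketch {

    private static final String ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";

    static String encode(byte[] input) {
        StringBuilder encoded = new StringBuilder();
        int buffer = 0;   // bits read from the input but not yet emitted
        int bitCount = 0; // how many low bits of buffer are valid

        for (byte b : input) {
            // Mask with 0xFF because Java bytes are signed.
            buffer = (buffer << 8) | (b & 0xFF);
            bitCount += 8;

            // Emit one character for every complete 5-bit chunk.
            while (bitCount >= 5) {
                bitCount -= 5;
                encoded.append(ALPHABET.charAt((buffer >> bitCount) & 0x1F));
            }
            buffer &= (1 << bitCount) - 1; // keep only the unread bits
        }

        // A final partial chunk is zero-padded on the right to 5 bits.
        if (bitCount > 0) {
            encoded.append(ALPHABET.charAt((buffer << (5 - bitCount)) & 0x1F));
        }

        // The encoded length has to be a multiple of 8; pad with "=".
        while (encoded.length() % 8 != 0) {
            encoded.append('=');
        }
        return encoded.toString();
    }

    static String encode(String text) {
        return encode(text.getBytes(StandardCharsets.UTF_8));
    }
}
```

The accumulator, the masking, and the partial-chunk handling are exactly the kind of bookkeeping that the transformBits() approach keeps out of the Base32 functions.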
My other Base32-encoding demo is still broken when it comes to high-ASCII values (i.e., values over 127). But, this BitBuffer.cfc will fix that since it takes care of converting bits to signed Java bytes. I just have to go back and update that demo, create a GitHub repository, and add tests.
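The signed-byte issue comes down to the fact that a Java byte holds -128 through 127, so an unsigned value such as 217 has to be stored as the signed value with the same bit pattern. A minimal sketch of that mapping in Java follows; the class and method names are hypothetical and are not taken from BitBuffer.cfc.

```java
// Hypothetical illustration only; names are not from BitBuffer.cfc.
final class SignedByteSketch {

    // Maps an unsigned value (0..255) to the signed Java byte that has the
    // same 8-bit pattern. Values above 127 come out negative.
    static byte toSignedByte(int unsignedValue) {
        if (unsignedValue < 0 || unsignedValue > 255) {
            throw new IllegalArgumentException("Expected 0..255, got " + unsignedValue);
        }
        return (byte) unsignedValue;
    }

    // Recovers the unsigned value (0..255) from a signed Java byte.
    static int toUnsigned(byte signedValue) {
        return signedValue & 0xFF;
    }
}
```

Under this mapping, 217 is stored as -39 (217 - 256), and masking -39 with 0xFF gives 217 back.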
Reader Comments
Hello Ben.
My question is a little off-topic but I don't know where to post it.
Please enlighten me on how to use your BitBuffer component to convert an array of bytes to a float.
I tried without BitBuffer, like this:
arr=[23,217,10,65];
ByteBuffer = createObject("java","java.nio.ByteBuffer");
myFloat= ByteBuffer.wrap(arr).getFloat();
(result should be 8.678)
But because my array contains a number bigger than 127, I get this exception:
Event handler exception.. coldfusion.runtime.java.MethodExecutionException: An exception occurred when executing method wrap.
I am convinced that this is the same "...cannot fit inside a byte" problem because if I replace 217 with a number lower than 128, everything works.
I read all of your articles on Creating Signed Java Byte Values and Transforming Binary Data At The Bit Level and, honestly, they make me dizzy (I am probably a little tired :) )
I don't work with bytes every day, and I don't think creating Java byte values is such a big problem, but I can't figure it out. And this is a one-time task for me (I hope so).
Help Please
Thank you in advance
I used an alternative solution.
I created a .NET assembly with the desired function, imported it with cfObject, and everything works. No more signed or unsigned bytes.
Headache off.
Thanks again for your articles.
I don't post very often, but I visit every day
:)