Ben Nadel at the jQuery Conference 2010 (Boston, MA) with: Brian Crescimanno

Experimenting With ByteBuffer In ColdFusion For Binary Manipulation

By Ben Nadel

The other day, when I posted about how to implement MySQL's Compress() and Uncompress() methods in ColdFusion, I was using Java's ByteArrayOutputStream to do a little byte-wise manipulation. But, after I posted that, I came across a few articles that warned against using ByteArrayOutputStream for performance reasons. Unless you don't know ahead of time how much data you are working with, these articles recommended using ByteBuffer instead. I think I've only ever used ByteBuffer once or twice in my life; so, I wanted to do a quick little exploration of how I might use ByteBuffer in ColdFusion to accomplish some common binary operations.
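To make that trade-off concrete, here is a minimal sketch in plain Java (not ColdFusion) that joins byte arrays both ways. The class and method names (BufferChoice, joinWithStream, joinWithBuffer) are made up for illustration; only the ByteArrayOutputStream and ByteBuffer calls come from the JDK. It says nothing about which approach is faster in practice; it only shows where the "known size" requirement comes from.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.Arrays;

class BufferChoice {

	// Size NOT known up front: the stream grows its internal array as needed.
	static byte[] joinWithStream( byte[]... parts ) {
		ByteArrayOutputStream stream = new ByteArrayOutputStream();
		for ( byte[] part : parts ) {
			stream.write( part, 0, part.length );
		}
		// toByteArray() hands back a copy of the bytes written so far.
		return stream.toByteArray();
	}

	// Size known up front: allocate the backing array once, then fill it.
	static byte[] joinWithBuffer( byte[]... parts ) {
		int total = 0;
		for ( byte[] part : parts ) {
			total += part.length;
		}
		ByteBuffer buffer = ByteBuffer.allocate( total );
		for ( byte[] part : parts ) {
			buffer.put( part );
		}
		return buffer.array();
	}

	public static void main( String[] args ) {
		byte[] a = { 1, 2 };
		byte[] b = { 3, 4, 5 };
		System.out.println( Arrays.toString( joinWithStream( a, b ) ) ); // [1, 2, 3, 4, 5]
		System.out.println( Arrays.toString( joinWithBuffer( a, b ) ) ); // [1, 2, 3, 4, 5]
	}
}
```

Both methods produce the same bytes; the difference is that the ByteBuffer version has to compute the total length before it can allocate anything.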

The ByteBuffer is kind of an interesting Java class. It's almost easier to think about the ByteBuffer more like a "proxy" than a "buffer". It sort of sits "above" an underlying byte-array and provides convenience methods for reading-from and writing-to the underlying byte-array.

Java's ByteBuffer is a proxy to the underlying byte[] allocation.
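The "proxy" idea can be seen directly in plain Java. This is only a sketch: the ProxyDemo class and its helper methods are hypothetical names, while wrap(), put(), get() and array() are standard java.nio.ByteBuffer methods. A buffer created with wrap() does not copy the array; writes through the buffer show up in the array, and vice versa.

```java
import java.nio.ByteBuffer;

class ProxyDemo {

	// Write one byte through a wrapping buffer, then return the ORIGINAL array.
	static byte[] writeThrough( byte[] backing, int index, byte value ) {
		ByteBuffer buffer = ByteBuffer.wrap( backing );
		buffer.put( index, value ); // Absolute put: does not move the position.
		return backing;
	}

	// Check that array() returns the very same array that was wrapped.
	static boolean sharesArray( byte[] backing ) {
		return ByteBuffer.wrap( backing ).array() == backing;
	}

	public static void main( String[] args ) {
		byte[] data = { 1, 2, 3 };
		ByteBuffer buffer = ByteBuffer.wrap( data );

		// A write through the buffer is visible in the array...
		buffer.put( 0, (byte) 9 );
		System.out.println( data[ 0 ] ); // 9

		// ...and a write to the array is visible through the buffer.
		data[ 2 ] = 42;
		System.out.println( buffer.get( 2 ) ); // 42

		System.out.println( sharesArray( data ) ); // true
	}
}
```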

Now, ColdFusion already has native functions for converting binary values into other representations like Hex, Base64, or other charset encodings. So, for this exploration of ByteBuffer, I really only wanted to look at how to do byte-wise manipulation; things like slicing, concatenating, and comparing binary values. Here are the kind of operations that I've had to use with binary values in the past:

  • Creating new binaries (usually as write buffers for other operations).
  • Concatenation.
  • Slicing.
  • Reversing.
  • Casting to and from Integers.
  • Comparing (equality check).

To play with this, I created a BinaryUtils.cfc ColdFusion component that exposes a method for each one of these operations. As you will see, each operation uses the Java ByteBuffer as the means of implementation.

component
	output = false
	hint = "I provide utility methods for working with binary values / byte-arrays."
	{

	/**
	* I initialize the utilities component.
	*
	* @output false
	*/
	public any function init() {

		// Since all the calls to ByteBuffer are static methods on the class, we can
		// create one shared instance of the ByteBuffer class and just reuse it.
		ByteBuffer = createObject( "java", "java.nio.ByteBuffer" );

		return( this );

	}


	// ---
	// PUBLIC METHODS.
	// ---


	/**
	* I concatenate the given binary values (in order), returning the resultant binary.
	* At least two arguments are required; but, you can concat more than two at a time.
	* All concatenation is done in-memory.
	*
	* @inputA I am the 1st binary value in the concatenation.
	* @inputB I am the 2nd binary value in the concatenation.
	* @output false
	*/
	public binary function binaryConcat(
		required binary inputA,
		required binary inputB
		) {

		// While the "arguments" object acts as both an array and a map (of sorts),
		// let's keep the code a bit cleaner by converting the arguments object to a true
		// ColdFusion array.
		var collection = argsToArray( arguments );
		var collectionLength = arrayLen( collection );

		// When creating the ByteBuffer, we know that we want to allocate the underlying
		// byte array with a size that is equal to the SUM of the lengths of all the
		// incoming arguments (which is greater than or equal to 2 in count).
		var buffer = ByteBuffer.allocate( javaCast( "int", sumLengths( collection ) ) );

		for ( var i = 1 ; i <= collectionLength ; i++ ) {

			buffer.put( collection[ i ] );

		}

		return( buffer.array() );

	}


	/**
	* I determine if the two binary values are equal (ie, that they have the same bytes).
	*
	* @inputA I am the first binary operand.
	* @inputB I am the second binary operand.
	* @output false
	*/
	public boolean function binaryEquals(
		required binary inputA,
		required binary inputB
		) {

		var bufferA = ByteBuffer.wrap( inputA );
		var bufferB = ByteBuffer.wrap( inputB );

		return( bufferA.equals( bufferB ) );

	}


	/**
	* I return the binary value that represents the given integer.
	*
	* @input I am the integer representation.
	* @output false
	*/
	public binary function binaryFromInt( required numeric input ) {

		var buffer = ByteBuffer.allocate( javaCast( "int", 4 ) );

		buffer.putInt( javaCast( "int", input ) );

		return( buffer.array() );

	}


	/**
	* I create a new zero-filled binary value (byte array) of the given size.
	*
	* @size I am the size of the new binary value.
	* @output false
	*/
	public binary function binaryNew( required numeric size ) {

		// When we allocate a new ByteBuffer, it will automatically fill the underlying
		// byte array with zeros.
		var buffer = ByteBuffer.allocate( javaCast( "int", size ) );

		return( buffer.array() );

	}


	/**
	* I reverse the given binary value (byte array), returning a new binary value.
	*
	* @input I am the binary value being reversed.
	* @output false
	*/
	public binary function binaryReverse( required binary input ) {

		var length = arrayLen( input );
		var buffer = ByteBuffer.allocate( javaCast( "int", length ) );

		// As we loop backwards over the input, we'll push the individual bytes onto
		// the buffer in the forwards order (essentially reversing the bytes).
		for ( var i = length ; i >= 1 ; i-- ) {

			buffer.put( input[ i ] );

		}

		return( buffer.array() );

	}


	/**
	* I slice out a portion from the given binary value, returning a new binary value.
	*
	* CAUTION: We are using a ONE-BASED INDEX since ColdFusion uses one-based indexing
	* for all array concepts.
	*
	* @input I am the binary value (byte array) being sliced.
	* @index I am the ONE-BASED index at which to start slicing.
	* @length I am the number of bytes to slice.
	* @output false
	*/
	public binary function binarySlice(
		required binary input,
		required numeric index,
		required numeric length
		) {

		var buffer = ByteBuffer.allocate( javaCast( "int", length ) );

		// Translate index from ColdFusion context (1-based) to Java context (0-based).
		index--;

		buffer.put(
			input,
			javaCast( "int", index ),
			javaCast( "int", length )
		);

		return( buffer.array() );

	}


	/**
	* I return the integer represented by the given binary value.
	*
	* @input I am the binary representation of an integer.
	* @output false
	*/
	public numeric function binaryToInt( required binary input ) {

		var buffer = ByteBuffer.allocate( javaCast( "int", 4 ) );

		buffer.put(
			input,
			javaCast( "int", 0 ),
			javaCast( "int", 4 )
		);

		return( buffer.getInt( javaCast( "int", 0 ) ) );

	}


	// ---
	// PRIVATE METHODS.
	// ---


	/**
	* I take the given arguments object (ie, the collection of arguments passed to a
	* method from the calling context) and convert it to a true Array. This really only
	* serves to make the code a bit less magical.
	*
	* @args I am the arguments collection being converted to an array.
	* @output false
	*/
	private array function argsToArray( required any args ) {

		var argsArray = [];
		var length = arrayLen( args );

		for ( var i = 1 ; i <= length ; i++ ) {

			argsArray[ i ] = args[ i ];

		}

		return( argsArray );

	}


	/**
	* I sum the lengths of the given collection of binary values.
	*
	* @binaries I am the collection of binary values.
	* @output false
	*/
	private numeric function sumLengths( required array binaries ) {

		var totalLength = 0;
		var binaryCount = arrayLen( binaries );

		for ( var i = 1 ; i <= binaryCount ; i++ ) {

			totalLength += arrayLen( binaries[ i ] );

		}

		return( totalLength );

	}

}

Because the ByteBuffer is a "proxy" to an underlying byte-array, most of these functions end by calling .array() and returning the underlying byte-array after we are done manipulating it.
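One detail worth seeing in isolation is what .array() actually hands back. The following plain-Java sketch (the ArrayAccess class and its slice helper are hypothetical, and slice uses a zero-based offset, unlike the one-based binarySlice above) suggests why the component sizes every buffer exactly: array() returns the whole backing array, zero-filled wherever nothing has been written, regardless of the buffer's current position.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class ArrayAccess {

	// Copy "length" bytes out of "input", starting at a ZERO-BASED offset.
	static byte[] slice( byte[] input, int offset, int length ) {
		ByteBuffer buffer = ByteBuffer.allocate( length );
		buffer.put( input, offset, length );
		return buffer.array();
	}

	public static void main( String[] args ) {
		ByteBuffer buffer = ByteBuffer.allocate( 4 );

		// A freshly allocated buffer is backed by a zero-filled array.
		System.out.println( Arrays.toString( buffer.array() ) ); // [0, 0, 0, 0]

		// After one write, array() still returns all 4 bytes, not just the written one.
		buffer.put( (byte) 7 );
		System.out.println( Arrays.toString( buffer.array() ) ); // [7, 0, 0, 0]

		byte[] source = { 10, 20, 30, 40, 50 };
		System.out.println( Arrays.toString( slice( source, 1, 3 ) ) ); // [20, 30, 40]
	}
}
```

If the buffer were allocated larger than the data written into it, the returned array would carry trailing zeros, so the allocation size has to match the intended result.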

To test these functions, I'm going to take a few binary values, concatenate them, reverse them, slice them apart, reverse them again, and then compare the inputs to the outputs:

<cfscript>

	utils = new BinaryUtils();

	// Let's create 3 different binary objects, each based on an integer input (in
	// which each integer will be broken up into its 4-byte, 32-bit representation).
	one = utils.binaryFromInt( 101 );
	two = utils.binaryFromInt( 202 );
	three = utils.binaryFromInt( 303 );

	// Now, let's concatenate all the binaries together (resulting in a 12-byte array).
	all = utils.binaryConcat( one, two, three );

	// ... and let's reverse the bytes in the binary.
	allReversed = utils.binaryReverse( all );

	// Now, let's slice off 4-byte lengths that represent our original numbers.
	// Remember, since they are in reverse order, THREE comes first. But, each of the
	// 4-byte integer representations is also in reverse order (low-byte first at this
	// point). So, as we slice off 4-bytes at a time, we have to reverse the individual
	// bytes to get back to the original representation of the integer (high-byte first).
	threeSliced = utils.binaryReverse( utils.binarySlice( allReversed, 1, 4 ) );
	twoSliced = utils.binaryReverse( utils.binarySlice( allReversed, 5, 4 ) );
	oneSliced = utils.binaryReverse( utils.binarySlice( allReversed, 9, 4 ) );

	// Now, let's see if the inputs and outputs match.
	writeDump( "Do inputs / outputs match: " );

	// Assert that the binary values are equal.
	writeDump( utils.binaryEquals( one, oneSliced ) );
	writeDump( utils.binaryEquals( two, twoSliced ) );
	writeDump( utils.binaryEquals( three, threeSliced ) );

	// Assert that the actual numeric values are equal.
	writeDump( utils.binaryToInt( oneSliced ) == 101 );
	writeDump( utils.binaryToInt( twoSliced ) == 202 );
	writeDump( utils.binaryToInt( threeSliced ) == 303 );

</cfscript>

When we run this code, we get the following page output:

Do inputs / outputs match: YES YES YES YES YES YES
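The reason each 4-byte slice has to be reversed again comes down to byte order: ByteBuffer writes integers big-endian (high byte first) by default. Here is a rough plain-Java sketch of the same round trip; the RoundTrip class and its method names are illustrative only and are not part of BinaryUtils.cfc.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class RoundTrip {

	// ByteBuffer defaults to big-endian, so the high byte is written first.
	static byte[] fromInt( int value ) {
		return ByteBuffer.allocate( 4 ).putInt( value ).array();
	}

	// Read the first 4 bytes as a big-endian int (throws if fewer than 4 bytes).
	static int toInt( byte[] bytes ) {
		return ByteBuffer.wrap( bytes ).getInt();
	}

	// Walk the input backwards while writing forwards.
	static byte[] reverse( byte[] input ) {
		ByteBuffer buffer = ByteBuffer.allocate( input.length );
		for ( int i = input.length - 1 ; i >= 0 ; i-- ) {
			buffer.put( input[ i ] );
		}
		return buffer.array();
	}

	// Two freshly wrapped buffers are equal when their contents match byte for byte.
	static boolean same( byte[] a, byte[] b ) {
		return ByteBuffer.wrap( a ).equals( ByteBuffer.wrap( b ) );
	}

	public static void main( String[] args ) {
		byte[] three = fromInt( 303 );
		byte[] reversed = reverse( three );

		System.out.println( Arrays.toString( three ) );    // [0, 0, 1, 47]
		System.out.println( Arrays.toString( reversed ) ); // [47, 1, 0, 0]

		byte[] restored = reverse( reversed );
		System.out.println( same( three, restored ) ); // true
		System.out.println( toInt( restored ) );       // 303
	}
}
```

Since 303 is 1 * 256 + 47, the big-endian bytes are 0, 0, 1, 47; reading the reversed bytes without flipping them back would yield a different integer.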

The ByteBuffer is pretty convenient. I don't have to do binary manipulation every day; but, when I do, this will make things quite a bit easier (no put intended). And, of course, it's just another reminder of how awesome it is to have ColdFusion built on top of Java - being able to reach down one layer and access this functionality is just the cat's pajamas.


Reader Comments

15,902 Comments

@Julian,

Hmmm, I will see if I can find them again. I had a few open in tabs, but closed them when I was done reading. Probably can find them in my history somewhere.

I think it makes sense, intuitively though, because a ByteBuffer only allocates the array once. The beauty of a ByteArrayOutputStream is that you can keep writing to it as much as you want; then, when you are done, get the underlying byte array (which it probably has to generate internally ... but just my guess).

I'm actually curious about the performance of this vs. something like:

charsetDecode( repeatString( "a", 1000 ), "utf8" ) ==> byte[]

This is how I've historically created byte arrays / binary values in ColdFusion; would love to see if the performance difference is actually meaningful.

14 Comments

Thanks, Ben. I've just done some quick and dirty testing using Lucee comparing ByteArrayOutputStream vs ByteBuffer when converting a fairly large spreadsheet java object to a binary of around 8MB, and I'm not seeing much difference in performance, but perhaps with even larger amounts of data it would be more noticeable.

But what you say does make sense. If you know the size then it's best to allocate the buffer at the outset.
