Creating A ColdFusion-Oriented HashCode With Loose Types
In the companion app for my feature flags book, each user's persisted configuration file has a version number. I want this version number to be incremented when the user updates their feature flags configuration; but, only if something within the configuration has actually changed. To this end, I need a way in ColdFusion to take two data structures and check to see if they are deep-equals. For that, I created the concept of a "FusionCode": a consistent, repeatable value that represents a complex data structure.
In Java, this is what the various "hash code" methods are for. Each Java object has an o.hashCode() method that returns an int. And then, there are various utility methods that compose this value for complex objects (ex, Objects.hashCode(o) and Arrays.hashCode(o)).
But, the ColdFusion loose type system doesn't play well with Java's hashCode() calculations. Or rather, I should say that it does play well in 95% of cases; and then, breaks in subtle ways in 5% of cases.
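For example, Java treats the following two values as different (they happen to share the hash code 3; but, Integer.equals() and Long.equals() each require a matching type):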
javaCast( "int", 3 )
javaCast( "long", 3 )
As they should (I guess)—they are two different data-types. But, in ColdFusion, there's no semantic difference between these two values. So, from the ColdFusion perspective, these two values should have the same FusionCode.
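To isolate the JVM behavior from either CFML engine, here's a minimal plain-Java sketch of the same pair of values. The class name and the looseEquals() helper are hypothetical; the helper is only one assumption about what a "ColdFusion-style" comparison of whole numbers could look like. Note that for a small value like 3 the raw hashCode() results coincide; it is the type-aware equals() that tells the two apart.

```java
import java.util.Objects;

class IntVersusLong {

    // Java's own equality check: boxed numbers are only equal when the types match.
    static boolean javaEquals(Object a, Object b) {
        return Objects.equals(a, b);
    }

    // Hypothetical loose comparison: widen both whole numbers to long first.
    static boolean looseEquals(Number a, Number b) {
        return a.longValue() == b.longValue();
    }

    public static void main(String[] args) {
        Integer asInt = Integer.valueOf(3);
        Long asLong = Long.valueOf(3L);

        System.out.println(asInt.hashCode() == asLong.hashCode()); // true
        System.out.println(javaEquals(asInt, asLong));             // false
        System.out.println(looseEquals(asInt, asLong));            // true
    }
}
```

The loose result is the one a ColdFusion developer would expect from 3 EQ 3, regardless of which Java class is holding the value.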
Of course, it's not that often that we need to explicitly create different Java types in our ColdFusion code. But, sometimes, this happens implicitly in very subtle ways. Consider the following ColdFusion structure that goes through the JSON serialization workflow:
<cfscript>
data = { value: 12.0 };
dataPrime = deserializeJson( serializeJson( data ) );
</cfscript>
In this code, data and dataPrime are structurally equivalent. But, they have different .hashCode() values. And, passing them to Objects.deepEquals(data, dataPrime) results in false. This is true for both Adobe ColdFusion and Lucee CFML.
However, if you convert that top-level reference from a Struct to an Array:
<cfscript>
data = [ 12.0 ];
dataPrime = deserializeJson( serializeJson( data ) );
</cfscript>
Then Objects.deepEquals(data, dataPrime) returns true in Lucee CFML and returns false in Adobe ColdFusion. Why? I have no idea.
The problem is, the whole HashCode calculation is a bit of a black box built on top of a very strict type system. So, when it comes to ColdFusion, I needed to make something that was clearer in its intent; and, which played a little nicer with ColdFusion's loose type system.
I created a ColdFusion component, FusionCode.cfc, which provides two methods:
getFusionCode( value )
deepEquals( valueA, valueB )
The former method generates a CRC-32 checksum of the given value by recursively walking over the given data structure and normalizing the values—using ColdFusion semantics—as it traverses. The latter method just calls getFusionCode() for each argument and then compares the two results.
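The general shape of that idea (walk, normalize, feed one checksum, compare) can be sketched in plain Java. This is not the FusionCode.cfc implementation, just a rough analogue under stated assumptions: the class and method names are hypothetical, only Map, List, Number, String, and null get special treatment, and "numeric-looking" strings are detected with the BigDecimal constructor rather than ColdFusion's isNumeric().

```java
import java.math.BigDecimal;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;
import java.util.zip.CRC32;

class LooseChecksumSketch {

    // Walk the value, feed normalized tokens into one CRC-32, return the result.
    static long fusionCode(Object value) {
        CRC32 checksum = new CRC32();
        visit(checksum, value);
        return checksum.getValue();
    }

    // Two values are "loosely equal" when their checksums match.
    static boolean deepEquals(Object a, Object b) {
        return fusionCode(a) == fusionCode(b);
    }

    private static void visit(CRC32 checksum, Object value) {
        if (value == null) {
            putString(checksum, "[NULL]");
        } else if (value instanceof Map) {
            // Upper-case and sort the keys so casing and insertion order don't matter.
            // (Keys that only differ by case overwrite each other in this sketch.)
            TreeMap<String, Object> sorted = new TreeMap<>();
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                String key = String.valueOf(entry.getKey()).toUpperCase(Locale.ROOT);
                sorted.put(key, entry.getValue());
            }
            for (Map.Entry<String, Object> entry : sorted.entrySet()) {
                putString(checksum, entry.getKey());
                visit(checksum, entry.getValue());
            }
        } else if (value instanceof List) {
            int index = 1;
            for (Object item : (List<?>) value) {
                putNumber(checksum, index++);
                visit(checksum, item);
            }
        } else if (value instanceof Number) {
            putNumber(checksum, ((Number) value).doubleValue());
        } else if (value instanceof String) {
            String text = (String) value;
            try {
                // Assumption: treat strings that parse as decimals like numbers.
                putNumber(checksum, new BigDecimal(text.trim()).doubleValue());
            } catch (NumberFormatException notNumeric) {
                putString(checksum, text);
            }
        } else {
            putString(checksum, value.toString());
        }
    }

    // Every number is recorded as the canonical string of its double value.
    private static void putNumber(CRC32 checksum, double number) {
        putString(checksum, BigDecimal.valueOf(number).toString());
    }

    private static void putString(CRC32 checksum, String text) {
        checksum.update(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        Map<String, Object> before = Map.of("version", 12.0, "tags", List.of(1, 2));
        Map<String, Object> after = Map.of("VERSION", 12, "tags", List.of("1", 2L));

        System.out.println(Objects.equals(before, after)); // false
        System.out.println(deepEquals(before, after));     // true
    }
}
```

Because the comparison is between two 32-bit checksums, a false positive is theoretically possible; the sketch (like any checksum-based equality) trades that small risk for simplicity.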
Here's a demo in which fusionCode.deepEquals() is true but Objects.deepEquals() is false, in both Lucee CFML and Adobe ColdFusion:
<cfscript>

    data = [
        version: 12.0, // <--- Messes up the .hashCode() approach in Lucee and ACF.
        users: [
            [ id: 1, name: "Jo" ],
            [ id: 2, name: "Kit" ],
            [ id: 3, name: "Sam" ]
        ],
        legacyEnabled: false,
        legacyCode: javaCast( "null", "" ) // <--- Messes up the .hashCode() approach in ACF.
    ];

    // Run data through the JSON serialization.
    dataPrime = deserializeJson( serializeJson( data ) );

    writeDump( data );
    writeDump( dataPrime );

    // -- Fusion Code Test -- //
    writeDump({
        "FusionCode deepEquals":
            new FusionCode().deepEquals( data, dataPrime )
    });

    // -- Hash Code Test -- //
    writeDump({
        "Objects deepEquals":
            createObject( "java", "java.util.Objects" ).deepEquals( data, dataPrime )
    });

</cfscript>
Again, we're taking a complex structure, data, and running it through the JSON serialization workflow to produce dataPrime. Then we're comparing these two values using the FusionCode and HashCode concepts. And when we run this in Lucee CFML and Adobe ColdFusion, we get the following (both results are in the same screenshot):
As you can see, Objects.deepEquals() sees data and dataPrime as different structures in both Adobe ColdFusion and Lucee CFML. But, my fusionCode.deepEquals() sees them as the same.
Here's my FusionCode.cfc implementation. At the root of the internal algorithm is a visitValue() method. This method inspects the given argument, uses ColdFusion's decision functions to determine which data type it is, and then defers to another visit* function that is geared towards said data type. As it performs these visitations recursively, it passes along a CRC-32 checksum instance to which it adds the normalized values.
component
    output = false
    hint = "I provide methods for generating a consistent, repeatable token for a given ColdFusion data structure (akin to Java's hashCode, but with ColdFusion looseness)."
{

    /**
    * I initialize the component.
    */
    public void function init() {
        variables.BigDecimal = createObject( "java", "java.math.BigDecimal" );
    }

    // ---
    // PUBLIC METHODS.
    // ---

    /**
    * I determine if the two values are equal based on their generated FusionCodes.
    */
    public boolean function deepEquals(
        any valueA,
        any valueB
    ) {
        return ( getFusionCode( arguments?.valueA ) == getFusionCode( arguments?.valueB ) );
    }

    /**
    * I calculate the FusionCode for the given value.
    *
    * The FusionCode algorithm creates a CRC-32 checksum and then traverses the given data
    * structure and adds each visited value to the checksum calculation. Since ColdFusion
    * is a loosely typed / dynamically typed language, the FusionCode algorithm performs
    * some ColdFusion-oriented type casting to allow slightly different value types to be
    * considered the "same" value (in the same way that a ColdFusion equality check will).
    * For example, "int" and "long" values are both recorded as the same canonical decimal
    * string. And, the string "3" and the number 3 are recorded identically. This is where
    * the FusionCode and Java's HashCode algorithm significantly diverge.
    */
    public numeric function getFusionCode( any value ) {
        var checksum = createObject( "java", "java.util.zip.CRC32" ).init();
        visitValue( checksum, arguments?.value );
        return checksum.getValue();
    }
    // ---
    // PRIVATE METHODS.
    // ---

    /**
    * I add the given Boolean value to the checksum.
    */
    private void function putBoolean(
        required any checksum,
        required boolean value
    ) {
        putString( checksum, ( value ? "[______TRUE______]" : "[______FALSE______]" ) );
    }

    /**
    * I add the given date value to the checksum.
    */
    private void function putDate(
        required any checksum,
        required date value
    ) {
        putString( checksum, dateTimeFormat( value, "iso" ) );
    }

    /**
    * I add the given number value to the checksum.
    */
    private void function putNumber(
        required any checksum,
        required numeric value
    ) {
        putString(
            checksum,
            BigDecimal
                .valueOf( javaCast( "double", value ) )
                .toString()
        );
    }

    /**
    * I add the given string value to the checksum.
    */
    private void function putString(
        required any checksum,
        required string value
    ) {
        checksum.update( charsetDecode( value, "utf-8" ) );
    }
    /**
    * I visit the given array value, recursively visiting each element.
    */
    private void function visitArray(
        required any checksum,
        required array value
    ) {
        var length = arrayLen( value );

        for ( var i = 1 ; i <= length ; i++ ) {
            putNumber( checksum, i );

            if ( arrayIsDefined( value, i ) ) {
                visitValue( checksum, value[ i ] );
            } else {
                visitValue( checksum /* , NULL */ );
            }
        }
    }

    /**
    * I visit the given binary value.
    */
    private void function visitBinary(
        required any checksum,
        required binary value
    ) {
        checksum.update( value );
    }

    /**
    * I visit the given Java value.
    */
    private void function visitJava(
        required any checksum,
        required any value
    ) {
        putNumber( checksum, value.hashCode() );
    }

    /**
    * I visit the given null value.
    */
    private void function visitNull( required any checksum ) {
        putString( checksum, "[______NULL______]" );
    }

    /**
    * I visit the given query value, recursively visiting each row.
    */
    private void function visitQuery(
        required any checksum,
        required query value
    ) {
        var columnNames = ucase( value.columnList )
            .listToArray()
            .sort( "textnocase" )
            .toList( "," )
        ;

        putString( checksum, columnNames );

        for ( var i = 1 ; i <= value.recordCount ; i++ ) {
            putNumber( checksum, i );
            visitStruct( checksum, queryGetRow( value, i ) );
        }
    }
    /**
    * I visit the given simple value.
    */
    private void function visitSimpleValue(
        required any checksum,
        required any value
    ) {
        if ( isNumeric( value ) ) {
            putNumber( checksum, value );
        } else if ( isDate( value ) ) {
            putDate( checksum, value );
        } else if ( isBoolean( value ) ) {
            putBoolean( checksum, value );
        } else {
            putString( checksum, value );
        }
    }

    /**
    * I visit the given struct value, recursively visiting each entry.
    */
    private void function visitStruct(
        required any checksum,
        required struct value
    ) {
        var keys = structKeyArray( value )
            .sort( "textnocase" )
        ;

        for ( var key in keys ) {
            putString( checksum, ucase( key ) );

            if ( structKeyExists( value, key ) ) {
                visitValue( checksum, value[ key ] );
            } else {
                visitValue( checksum /* , NULL */ );
            }
        }
    }

    /**
    * I visit the given xml value.
    */
    private void function visitXml(
        required any checksum,
        required xml value
    ) {
        putString( checksum, toString( value ) );
    }

    /**
    * I visit the given generic value, routing to a more specific visit method.
    *
    * Note: This method doesn't check for things that wouldn't otherwise be in a data
    * structure. For example, I'm not checking for things like Closures or CFC instances.
    */
    private void function visitValue(
        required any checksum,
        any value
    ) {
        if ( isNull( value ) ) {
            visitNull( checksum );
        } else if ( isArray( value ) ) {
            visitArray( checksum, value );
        } else if ( isStruct( value ) ) {
            visitStruct( checksum, value );
        } else if ( isQuery( value ) ) {
            visitQuery( checksum, value );
        } else if ( isXmlDoc( value ) ) {
            visitXml( checksum, value );
        } else if ( isBinary( value ) ) {
            visitBinary( checksum, value );
        } else if ( isSimpleValue( value ) ) {
            visitSimpleValue( checksum, value );
        } else {
            visitJava( checksum, value );
        }
    }

}
This FusionCode.cfc implementation bakes in some assumptions that may or may not be good. For example, it calls ucase() on Struct keys and Query column names. But, it doesn't call ucase() on other string values despite the fact that "hello" and "HELLO" are equivalent in ColdFusion. I think one improvement would be to turn some of these assumptions into settings that can be turned on and off.
For now, though, this should unblock some of my work in the feature flags book companion app. It should be sufficient for determining whether or not a given sub-structure has changed. And, the looseness of the type checking should work well with my ConfigValidation.cfc component, which will type-cast inputs to the necessary type during the request processing.
Update 2024-07-05
I didn't realize this, but my use of javaCast("long") was truncating decimal values. I get a little fuzzy with the lower-level numeric data types in Java since all of ColdFusion's numbers are just "numeric" (with some edge-cases in which you run into int overflow errors). I've updated my putNumber() to use BigDecimal.valueOf(double) instead of BigInteger.valueOf(long).
The BigDecimal documentation states that numbers of different "scale" will generally have different hashCode values. But, I think my use of javaCast("double") is normalizing the scale. I think.
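A small plain-Java check of both points, as a sketch rather than anything taken from FusionCode.cfc (the class name and the canonical() helper are hypothetical). It shows the truncation that a long cast causes, the scale-sensitivity of BigDecimal.equals(), and what happens when every input is widened to a double before it reaches BigDecimal.valueOf().

```java
import java.math.BigDecimal;

class NumberNormalization {

    // Hypothetical helper: widen to double, then let BigDecimal print the value.
    static String canonical(Number value) {
        return BigDecimal.valueOf(value.doubleValue()).toString();
    }

    public static void main(String[] args) {
        // Casting to long drops the fractional part.
        System.out.println((long) 12.5);                       // 12

        // Same numeric value, different scale: equals() says no, compareTo() says yes.
        BigDecimal twelve = new BigDecimal("12");
        BigDecimal twelvePointZero = new BigDecimal("12.0");
        System.out.println(twelve.equals(twelvePointZero));    // false
        System.out.println(twelve.compareTo(twelvePointZero)); // 0

        // Going through double first gives both inputs the same string.
        System.out.println(canonical(12));   // 12.0
        System.out.println(canonical(12.0)); // 12.0
        System.out.println(canonical(12.5)); // 12.5
    }
}
```

This works because BigDecimal.valueOf(double) is documented to go through Double.toString(), so an int 12 and a double 12.0 both arrive as the text "12.0". Very large or very small values come out in exponent notation, which is still consistent but worth knowing about.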
Update 2024-07-06
I misunderstood what checksum.update(int) was doing. I thought it was consuming the entire integer in the checksum mutation; but, it seems that it was only taking the lowest byte (8 bits):
update(int b): Updates the CRC-32 checksum with the specified byte (the low eight bits of the argument b).
As such, two different integers could collide and create a false equivalence if they had the same lowest byte.
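The collision is easy to reproduce in plain Java. In this sketch (hypothetical class and method names), 256 and 512 share the low byte 0x00, so update(int) can't tell them apart; feeding the decimal text of each number as bytes can.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class Crc32LowByte {

    // Feed a single int through update(int), which only reads its low eight bits.
    static long viaUpdateInt(int number) {
        CRC32 checksum = new CRC32();
        checksum.update(number);
        return checksum.getValue();
    }

    // Feed the decimal text of the number instead, so every digit contributes.
    static long viaString(int number) {
        CRC32 checksum = new CRC32();
        checksum.update(Integer.toString(number).getBytes(StandardCharsets.UTF_8));
        return checksum.getValue();
    }

    public static void main(String[] args) {
        System.out.println(viaUpdateInt(256) == viaUpdateInt(512)); // true (collision)
        System.out.println(viaString(256) == viaString(512));       // false
    }
}
```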
I've updated the putNumber() method to turn around and call the putString() method using the canonical string produced by BigDecimal:
component {

    private void function putNumber(
        required any checksum,
        required numeric value
    ) {
        putString(
            checksum,
            BigDecimal
                .valueOf( javaCast( "double", value ) )
                .toString()
        );
    }

}
This combination of the javaCast() to a double, and then piping it through the .toString() on BigDecimal seems to give a good result for numbers that are the same, but have different "scales" (ex, 12 and 12.0). According to the JavaDocs:
The toString() method provides a canonical representation of a BigDecimal.
Update 2024-07-07
I authored a follow-up blog post in which the behavior of the FusionCode.cfc can be configured. Specifically, there are two settings which can be set in the init() method; or, passed-in with each .getFusionCode() call:
caseSensitiveKeys - I determine if struct keys and column names are canonicalized using ucase(). When enabled, key and KEY will be considered different.
typeCoercion - I determine if strict decision functions should be used when inspecting a given value. When disabled, false and "no" will be considered different. As will 1 and "1".
After I was done with the current blog post, I realized that I actually needed key-case-sensitivity in my own work. As such, I went back and added some more robust behavior.
Reader Comments
Does it only work with simple values? Or can you stuff it with objects (like Java objects, CFC's, etc.) and it still do the work?
How's the speed?
@Will,
I really only designed it to work with native data structures (string, struct, array, etc). But, if all else fails, it will fall back to using the .hashCode() on whatever you give it. That said, I have no idea how that would work with CFCs.
As for speed, I'm assuming it's not great due to recursion. But, it's probably not terrible since most data structures aren't that deep. That said, I only need it for when I'm mutating data; so the use-case would be limited in scope. Meaning, it's not something that I'd be running on every request.
Have you tried objectEquals() for this? Ignore the goofy explanation in cfdocs, but I think it does the same for your code (at least in Lucee... I have not used ACF for eons)
@Andrew,
Literally never seen that function before 🤪 I'll have to play around with it; but, it might be exactly on the money. That said, Adobe ColdFusion also has one, but the description seems to be about "client side CFCs". Very confusing. Awesome tip, though! I'll circle back on what I find.
For others, here's two relevant links:
objectEquals()
objectEquals()
Well, it appears I may be the only person to ever use it 🙃
https://community.ortussolutions.com/t/boxlang-town-meeting-edition-is-going-live-now/10223/2?u=andrew_kretzer
Ha ha, you're too far ahead of the curve 😛
@Andrew,
So, I just added this to my test code (I didn't update the blog post, just did this on my dev server):
And it reports back as NO on Adobe ColdFusion and true on Lucee CFML. I don't know which parts of this are necessarily not working as one would hope; but, it seems to have something that is different than my choices (in FusionCode.cfc) and something that is different in between engines.
I just realized that javaCast( "long" ) is truncating decimal values. I didn't realize that; I had thought that long could hold decimal values. I get a little fuzzy on the low-level data types. It looks like double can hold decimals, though. I'll find a way to tweak that.
I've updated the code to use BigDecimal + javaCast("double") instead of BigInteger + javaCast("long") for normalizing numbers. Hopefully this is more accurate.
Hmmm, and now I'm wondering if I can't just normalize numbers with the javaCast() alone, and not worry at all about the BigDecimal:
Gonna play with that and see if that works better.
Ahh, ok, I can't do that. I didn't realize this at first, but the update(int) in the CRC-32 is only taking the lowest 8-bits:
Uggg, this gets more complicated. Ok, I think maybe I have to update my putNumber() method to actually stringify the value using BigDecimal. Something like:
The combination of the javaCast() and the BigDecimal.toString() seems to give the best results.
Oh man, this is such a rabbit hole! Fraught with edge-cases. What I'm realizing now is that by converting the BigDecimal to a string, I can get false-positive equivalence. Maybe not so much in this particular post (where type-coercion is acceptable); but, in a follow-up post that I'm working on, I run into a case where the string "100.1" and the BigDecimal.valueOf(100.1) then get stringified in the same way. Yargggg!!!
@All,
Ok, I took another stab at creating a more configurable version of the FusionCode.cfc:
www.bennadel.com/blog/4681-creating-a-coldfusion-oriented-hashcode-with-loose-types-part-2.htm
In Part 2, you can enable struct-key / column-name case sensitivity; and, you can disable type-coercion. Meaning, true and "true" will no longer be hashed as the same value.