Performing An In-Place Natural Sort On An Alpha-Numeric Array In Lucee CFML 5.3.6.61
In the past, I've looked at implementing a "Natural Sort order" in both JavaScipt and ColdFusion. However, my ColdFusion-based approach has always created a new array as part of its algorithm. Most of the time this doesn't matter; however, yesterday at InVision, I ran into a situation in which I needed to conditionally perform either a numeric sort or an in-place natural sort on an Array. As such, I wanted to quickly revisit the idea of a natural sort, this time using an in-place sorting algorithm in Lucee CFML 5.3.6.61.
To quickly recap what a "Natural sort order" is, it's a technique in which the numbers embedded within a string are treated "atomically". Meaning, each string of numeric characters is treated as a single numeric value. That is, each number is compared "as a number" and not as a string of "characters". This approach to sorting is more in alignment with how humans think about sorting.
In my previous ColdFusion algorithm, I implemented a natural sort by mapping one Array onto another intermediary Array; sorting the intermediary Array; and then, mapping the intermediary Array back onto a final, sorted Array. Today's algorithm uses the same basic constructs; however, rather than creating an intermediary array, I'm creating a look-up struct / index of "normalized value". Then, I'm performing an in-place sort on said normalized values.
To demonstrate, I've created an Array of values that will sort differently when using a native sort vs. a "natural sort". I then output the results of both sorting approaches:
<cfscript>
// To demonstrate the two sorting approaches, we're going to embed numbers within the
// following values that will sort differently using an Alphabetical sort (where the
// numbers are compared as Strings) and a Natural sort (where the numbers are
// compared as numbers).
reports = [
"Client 5 Data",
"Client 100 Data",
"Client 20 Data",
"Client 70 Data (22)",
"Client 70 Data (104)",
"Client 412 Data"
];
dump( label = "Native Sort", var = sortReports( reports ) );
echo( "<br />" );
dump( label = "Natural Sort", var = naturalSortReports( reports ) );
// ------------------------------------------------------------------------------- //
// ------------------------------------------------------------------------------- //
/**
* I perform an IN PLACE sort of the given collection using a text-sort.
*
* @collection I am the collection being sorted.
*/
public array function sortReports( required array collection ) {
return( collection.sort( "textNoCase" ) );
}
/**
* I perform an IN PLACE NATURAL sort of the given collection in which the embedded
* numbers are compared as numbers, not just strings of numeric characters.
*
* @collection I am the collection being sorted.
*/
public array function naturalSortReports( required array collection ) {
// The trick to performing a "natural sort" in which numbers are treated "as
// numbers", is that we're actually going to perform a "text sort"; but, we're
// going to do so with numbers that have been NORMALIZED IN LENGTH (which is what
// makes it safe to sort based on a the character data). To do this, we need to
// create an index (look-up hash) that maps our collection values onto the
// normalized values.
var natraulValueIndex = {};
for ( var item in collection ) {
naturalValueIndex[ item ] = normalizeEmbeddedNumbers( item );
}
// Now that we have our value -> normalized-value mapping, we can perform the
// IN-PLACE SORT with an operator that compares the mapped values.
collection.sort(
( a, b ) => {
var aValue = naturalValueIndex[ a ];
var bValue = naturalValueIndex[ b ];
return( aValue.compareNoCase( bValue ) );
}
);
return( collection );
}
/**
* I attempt to normalize the numbers embedded within the given value, making them all
* 12-digits long. Any embedded number that is less than 12-digits long will be front-
* padded with zeros:
*
* foo 3 bar -> becomes -> foo 000000000003 bar
* foo 3.3 bar -> becomes -> foo 000000000003.000000000003 bar
*
* @value I am the value being normalized.
*/
public string function normalizeEmbeddedNumbers( required string value ) {
var normalizedValue = value
// STEP 1: Split the value up into Numeric and Non-Numeric tokens.
.reMatch( "\d+|\D+" )
// STEP 2: Map the tokens onto normalized tokens. This means that any token
// that is numeric (a string of numeric characters) will be formatted to be
// 12-digits long (front-padded with zeros).
.map(
( token ) => {
if ( isNumeric( token ) ) {
return( numberFormat( token, "000000000000" ) );
}
return( token );
}
)
// STEP 3: Join the normalized tokens back together into a single value.
.toList( "" )
;
return( normalizedValue );
}
</cfscript>
As you can see, the first step of my naturalSortReports()
function is create a mapping - natraulValueIndex
- of the inherent values onto "normalized values". The normalized values are Strings in which the embedded numbers have all been front-padded with zeros to be 12-digits long. Then, I'm simply using the natraulValueIndex
to find the normalized values within my .sort()
in-place sort operation.
Now, if I run the following ColdFusion code in Lucee, I get the following output:
As you can see, when using the in-place natural sort on the collection, we are left with numbers that sort based on how people think about sorting.
This whole "natural sort" algorithm works because we can pass a Function / Closure to the .sort()
method in Lucee CFML that allows us to implement a custom sort operator. I just love how wonderfully enjoyable and expressive the Lucee runtime is. What a joyful language!
Want to use code from this post? Check out the license.
Reader Comments
Hey Ben - I'd suggest that the numbers after a decimal point should not be padded - otherwise 3.003 would be seen as the same as 3.3, and 3.11 would be seen as greater than 3.9...
@Seb,
That's a good point. To be honest, I go back and forth about how I feel about it because there's decimal numbers like:
3.145
But, then there are also strings of numbers that happen to contain decimal-points, like IP addresses:
127.0.0.1
These two types would / could follow different rules. In a previous post, I think I did try to handle this more gracefully, with a RegEx pattern that attempted to take more of the "number" into account. I will go back and noodle on it some more. If the goal here is to make the sort more human friendly (which is the whole point of "natural sort order"), then I agree with you that it would make sense to handle the number of granularly.