Code Kata: Parsing Strings Like "5mb" Into A Number Of Bytes In Lucee CFML 5.3.7.47
In yesterday's post about streaming an incremental ZIP file up to Amazon S3 in Lucee CFML, I had to wait until "chunks" were over 5mb (5 megabytes) in size before I could upload them. To do this, I literally calculated the number of bytes that equated to 5mb. Afterwards, I thought it would be nice if there were methods for converting between bytes and larger data-units. As a code kata, I wanted to see if I could create just such functions in Lucee CFML 5.3.7.47.
In ColdFusion, there is already a precedent for converting between two units of measurement: inputBaseN() and formatBaseN(). inputBaseN() converts a value written in a given base into decimal (base 10); and, formatBaseN() converts a given decimal (base 10) into another base. As such, when converting between bytes and other units (ex, megabytes), I wanted to use the same input / format terminology:
inputBytesN( quantity, unit ) - converts a given unit into bytes.
formatBytesN( quantity, unit ) - converts bytes into a given unit.
parseBytes( input ) - short-hand function that will parse the quantity and unit out of a string like, "5mb", and pipe them into the inputBytesN() function.
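For reference, here is a minimal sketch of the built-in base-conversion pair that this naming borrows from. The binary values are just illustrative inputs:

```cfml
<cfscript>
	// inputBaseN() reads a string written in the given base (radix) and returns
	// the equivalent decimal number.
	echo( inputBaseN( "1010", 2 ) & "<br />" ); // 10

	// formatBaseN() goes the other way: it renders a decimal number as a string
	// in the given base.
	echo( formatBaseN( 10, 2 ) & "<br />" ); // 1010
</cfscript>
```

The byte functions follow the same shape: the "input" function normalizes to a base representation (bytes), and the "format" function converts out of it.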
In the end, these functions just wrap a bunch of multiplications and divisions by 1024, which is the number of bytes in a kilobyte (and is the general multiplier needed to move between different units):
<cfscript>
echo( "<p><strong> Testing parseBytes() </strong></p>" );
echo( parseBytes( "1.305 kb" ) & "<br />" );
echo( parseBytes( "2 megabytes" ) & "<br />" );
echo( parseBytes( "3 gb" ) & "<br />" );
echo( "<br />" );
echo( "<p><strong> Testing inputBytesN() </strong></p>" );
echo( inputBytesN( 1, "bit" ) & "<br />" );
echo( inputBytesN( 1, "b" ) & "<br />" );
echo( inputBytesN( 1, "kb" ) & "<br />" );
echo( inputBytesN( 1, "mb" ) & "<br />" );
echo( inputBytesN( 1, "gb" ) & "<br />" );
echo( inputBytesN( 1, "tb" ) & "<br />" );
echo( "<br />" );
echo( "<p><strong> Testing formatBytesN() </strong></p>" );
echo( formatBytesN( 1, "bit" ) & "<br />" );
echo( formatBytesN( 1, "b" ) & "<br />" );
echo( formatBytesN( 1024, "kb" ) & "<br />" );
echo( formatBytesN( 1048576, "mb" ) & "<br />" );
echo( formatBytesN( 1073741824, "gb" ) & "<br />" );
echo( formatBytesN( 1099511627776, "tb" ) & "<br />" );
// ------------------------------------------------------------------------------- //
// ------------------------------------------------------------------------------- //
/**
* I convert the given number of bytes into the given unit. No rounding of decimals is
* performed. If you want to round the value, you must do it in the calling context.
*
* Example: formatBytesN( 1024, "kb" ) => 1
*
* @quantity I am the number of bytes to convert.
* @unit I am the unit of measurement into which we are converting.
*/
public numeric function formatBytesN(
required numeric quantity,
required string unit
) {
switch ( unit ) {
case "bit":
case "bits":
return( quantity * 8 );
break;
// CAUTION: Lowercase "b" is actually the international standard for BIT.
// However, since ColdFusion is case-insensitive, I'm going to use any case
// of "B" to mean Byte.
case "b":
case "byte":
case "bytes":
return( quantity );
break;
case "k":
case "kb":
case "kilobyte":
case "kilobytes":
return( quantity / 1024 );
break;
case "m":
case "mb":
case "megabyte":
case "megabytes":
return( quantity / 1024 / 1024 );
break;
case "g":
case "gb":
case "gigabyte":
case "gigabytes":
return( quantity / 1024 / 1024 / 1024 );
break;
case "t":
case "tb":
case "terabyte":
case "terabytes":
return( quantity / 1024 / 1024 / 1024 / 1024 );
break;
default:
throw(
type = "UnsupportedUnit",
message = "Format unit not recognized",
extendedInfo = serializeJson( arguments )
);
break;
}
}
/**
* I convert the given quantity into the equivalent number of bytes.
*
* Example: inputBytesN( 1, "kb" ) => 1024
*
* @value I am the value to convert.
* @unit I am the unit of measurement in which the quantity was defined.
*/
public numeric function inputBytesN(
required numeric value,
required string unit
) {
switch ( unit ) {
case "bit":
case "bits":
return( ceiling( value / 8 ) );
break;
// CAUTION: Lowercase "b" is actually the international standard for BIT.
// However, since ColdFusion is case-insensitive, I'm going to use any case
// of "B" to mean Byte.
case "b":
case "byte":
case "bytes":
return( value );
break;
case "k":
case "kb":
case "kilobyte":
case "kilobytes":
return( ceiling( value * 1024 ) );
break;
case "m":
case "mb":
case "megabyte":
case "megabytes":
return( ceiling( value * 1024 * 1024 ) );
break;
case "g":
case "gb":
case "gigabyte":
case "gigabytes":
return( ceiling( value * 1024 * 1024 * 1024 ) );
break;
case "t":
case "tb":
case "terabyte":
case "terabytes":
return( ceiling( value * 1024 * 1024 * 1024 * 1024 ) );
break;
default:
throw(
type = "UnsupportedUnit",
message = "Input unit not recognized",
extendedInfo = serializeJson( arguments )
);
break;
}
}
/**
* I parse the given quantity/unit string into the number of bytes. This is basically
* a short-hand for the inputBytesN() function.
*
* Example: parseBytes( "1kb" ) => 1024
*
* @input I am the string to parse and convert.
*/
public numeric function parseBytes( required string input ) {
// RegEx pattern matches leading number followed by trailing strings.
var parts = input
.lcase()
.trim()
.reMatchNoCase( "^[\d.]+|[a-z]+$" )
;
if ( parts.len() != 2 ) {
throw(
type = "UnexpectedInput",
message = "Input string must contain a quantity followed by a unit",
extendedInfo = serializeJson( arguments )
);
}
var quantity = val( parts[ 1 ] );
var unit = parts[ 2 ];
return( inputBytesN( quantity, unit ) );
}
</cfscript>
As you can see, when converting to bytes, we're really just multiplying by some variation of 1024; and, when converting from bytes, we're really just dividing by some variation of 1024. And, when we run this ColdFusion code, we get the following output:
Testing parseBytes()
1337
2097152
3221225472
Testing inputBytesN()
1
1
1024
1048576
1073741824
1099511627776
Testing formatBytesN()
8
1
1
1
1
1
This was a fun little mental exercise in ColdFusion. Though, looking at the parseBytes() function, it's hard to believe there's still no reMatchGroups() function in ColdFusion - extracting parts of a Regular Expression (RegEx) is still oddly challenging.
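One way to pull captured groups out of a match in Lucee CFML is to drop down into the Java layer. The following is a rough sketch, assuming a hypothetical helper named reMatchGroupsSketch() (it is not a built-in function) that wraps java.util.regex.Pattern. The pattern passed in at the bottom is also just an illustration:

```cfml
<cfscript>
	// HYPOTHETICAL helper, not a built-in: returns the captured groups of the
	// first match as an array, or an empty array if the pattern does not match.
	// ASSUMPTION: every group in the pattern participates in the match (an
	// optional group that did not match would come back as a Java null).
	public array function reMatchGroupsSketch(
		required string input,
		required string pattern
		) {

		var matcher = createObject( "java", "java.util.regex.Pattern" )
			.compile( javaCast( "string", pattern ) )
			.matcher( javaCast( "string", input ) )
		;
		var groups = [];

		if ( matcher.find() ) {
			// Group 0 is the entire match; captured groups start at 1.
			for ( var i = 1 ; i <= matcher.groupCount() ; i++ ) {
				groups.append( matcher.group( javaCast( "int", i ) ) );
			}
		}

		return( groups );
	}

	// Capture the quantity and the unit as two separate groups.
	parts = reMatchGroupsSketch( "1.305 kb", "^([\d.]+)\s*([a-zA-Z]+)$" );

	echo( parts[ 1 ] & " / " & parts[ 2 ] ); // 1.305 / kb
</cfscript>
```

With explicit groups, the quantity and the unit are each validated by position in the pattern, rather than relying on an alternation happening to produce exactly two matches.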
Reader Comments
Re:
Very hard to believe, especially as it's been there since CF2016 ;-). Ref: https://helpx.adobe.com/uk/coldfusion/cfml-reference/coldfusion-functions/functions-m-r/refind.html
Example:
https://trycf.com/gist/7867dc3eb485dc9047088cc9168192fd/acf2016?theme=monokai
This was the result of raising an issue with Adobe to get the feature added, and then them doing so. Ref: https://tracker.adobe.com/#/view/CF-3321666
@Adam,
Oh snap!!! I totally missed that one. Looks like it may be Adobe ColdFusion only at this point - the Lucee CFML docs have the "scope" option documented; but, when I try to run the code in the inline-editor on the docs, it throws an error that there are too many arguments.
That said, this is awesome! Thanks for pointing this out. Mental model augmented :muscle:
That trycf.com snippet I posted works in Lucee 5. It errors as you say in 4.5 though.
Would be kind of cool if formatBytes didn't require a unit. So if I pass X to it, it recognizes, oh this is greater than 1 meg but less than a gig, so show it as N megs. Oh, this is greater than a gig, so show it as N gigs. Basically, apply the best unit to it.
@Adam,
Good tip on the reFind() stuff - I played around with it this morning: www.bennadel.com/blog/3973-building-rematchgroups-using-refind-in-adobe-coldfusion-2018-and-lucee-cfml-5-3-7-47.htm
It looks to be a bit janky in Lucee CFML; I'll see if I can find any bugs filed on it.
@Raymond,
Yeah, that makes a lot of sense. I wonder if it would make sense to make the unit optional. Then, if it were there, I would use the explicit one; and, if omitted, I could make the "best guess" version.
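A minimal sketch of what that "best guess" version might look like, assuming a hypothetical helper named formatBytesBest() (it is not part of the code in the post) and assuming two decimal places of rounding is acceptable:

```cfml
<cfscript>
	// HYPOTHETICAL helper: picks the largest unit in which the quantity is
	// still at least 1 and returns the converted value with that unit appended.
	public string function formatBytesBest( required numeric quantity ) {

		var units = [ "b", "kb", "mb", "gb", "tb" ];
		var value = quantity;
		var index = 1;

		// Keep dividing by 1024 until the value drops below 1024 or we run out
		// of larger units to move into.
		while ( ( value >= 1024 ) && ( index < units.len() ) ) {
			value = ( value / 1024 );
			index++;
		}

		// ASSUMPTION: rounding to two decimal places is good enough for display.
		value = ( round( value * 100 ) / 100 );

		return( value & units[ index ] );
	}

	echo( formatBytesBest( 5 * 1024 * 1024 ) & "<br />" );
	echo( formatBytesBest( 1536 ) & "<br />" );
</cfscript>
```

An explicit unit could still take precedence when provided, with this logic only kicking in when the unit is omitted.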