Code Kata: Parsing Strings Like "5mb" Into A Number Of Bytes In Lucee CFML 5.3.7.47
In yesterday's post about streaming an incremental ZIP file up to Amazon S3 in Lucee CFML, I had to wait until "chunks" were over 5mb (5 megabytes) in size before I could upload them. To do this, I literally calculated the number of bytes that equated to 5mb. Afterwards, I thought it would be nice if there were methods for converting between bytes and larger data-units. As a code kata, I wanted to see if I could create just such functions in Lucee CFML 5.3.7.47.
In ColdFusion, there is already a precedent for converting between two units of measurement: inputBaseN() and formatBaseN(). inputBaseN() converts a value in a given base into decimal (base 10); and, formatBaseN() converts a given decimal (base 10) into another base. As such, when converting between bytes and other units (ex, megabytes), I wanted to use the same input / format terminology:
inputBytesN( quantity, unit ) - converts a given unit into bytes.
formatBytesN( quantity, unit ) - converts bytes into a given unit.
parseBytes( input ) - short-hand function that will parse the quantity and unit out of a string like, "5mb", and pipe them into the inputBytesN() function.
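For reference, here is a minimal sketch of the built-in pair that the naming borrows from. The binary values are just an illustration chosen for this example:

```cfml
<cfscript>
	// inputBaseN() parses a string written in the given radix into a base-10 number.
	echo( inputBaseN( "1010", 2 ) & "<br />" ); // 10

	// formatBaseN() renders a base-10 number as a string in the given radix.
	echo( formatBaseN( 10, 2 ) & "<br />" ); // 1010
</cfscript>
```

The "input" function moves toward the canonical representation and the "format" function moves away from it. The byte functions in this post mirror that pairing, with bytes playing the role of base 10.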
In the end, these functions just wrap a bunch of multiplications and divisions of 1024, which is the number of bytes in a kilobyte (and is the general multiplier needed to move between different units):
<cfscript>

	echo( "<p><strong> Testing parseBytes() </strong></p>" );
	echo( parseBytes( "1.305 kb" ) & "<br />" );
	echo( parseBytes( "2 megabytes" ) & "<br />" );
	echo( parseBytes( "3 gb" ) & "<br />" );
	echo( "<br />" );

	echo( "<p><strong> Testing inputBytesN() </strong></p>" );
	echo( inputBytesN( 1, "bit" ) & "<br />" );
	echo( inputBytesN( 1, "b" ) & "<br />" );
	echo( inputBytesN( 1, "kb" ) & "<br />" );
	echo( inputBytesN( 1, "mb" ) & "<br />" );
	echo( inputBytesN( 1, "gb" ) & "<br />" );
	echo( inputBytesN( 1, "tb" ) & "<br />" );
	echo( "<br />" );

	echo( "<p><strong> Testing formatBytesN() </strong></p>" );
	echo( formatBytesN( 1, "bit" ) & "<br />" );
	echo( formatBytesN( 1, "b" ) & "<br />" );
	echo( formatBytesN( 1024, "kb" ) & "<br />" );
	echo( formatBytesN( 1048576, "mb" ) & "<br />" );
	echo( formatBytesN( 1073741824, "gb" ) & "<br />" );
	echo( formatBytesN( 1099511627776, "tb" ) & "<br />" );

	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //

	/**
	* I convert the given number of bytes into the given unit. No rounding of decimals is
	* performed. If you want to round the value, you must do it in the calling context.
	*
	* Example: formatBytesN( 1024, "kb" ) => 1
	*
	* @quantity I am the number of bytes to convert.
	* @unit I am the unit of measurement into which we are converting.
	*/
	public numeric function formatBytesN(
		required numeric quantity,
		required string unit
		) {

		switch ( unit ) {
			case "bit":
			case "bits":
				return( quantity * 8 );
			break;
			// CAUTION: Lowercase "b" is actually the international standard for BIT.
			// However, since ColdFusion is case-insensitive, I'm going to use any case
			// of "B" to mean Byte.
			case "b":
			case "byte":
			case "bytes":
				return( quantity );
			break;
			case "k":
			case "kb":
			case "kilobyte":
			case "kilobytes":
				return( quantity / 1024 );
			break;
			case "m":
			case "mb":
			case "megabyte":
			case "megabytes":
				return( quantity / 1024 / 1024 );
			break;
			case "g":
			case "gb":
			case "gigabyte":
			case "gigabytes":
				return( quantity / 1024 / 1024 / 1024 );
			break;
			case "t":
			case "tb":
			case "terabyte":
			case "terabytes":
				return( quantity / 1024 / 1024 / 1024 / 1024 );
			break;
			default:
				throw(
					type = "UnsupportedUnit",
					message = "Format unit not recognized",
					extendedInfo = serializeJson( arguments )
				);
			break;
		}

	}

	/**
	* I convert the given quantity into the equivalent number of bytes. Fractional
	* results are rounded up to the next whole byte.
	*
	* Example: inputBytesN( 1, "kb" ) => 1024
	*
	* @value I am the value to convert.
	* @unit I am the unit of measurement in which the value was defined.
	*/
	public numeric function inputBytesN(
		required numeric value,
		required string unit
		) {

		switch ( unit ) {
			case "bit":
			case "bits":
				return( ceiling( value / 8 ) );
			break;
			// CAUTION: Lowercase "b" is actually the international standard for BIT.
			// However, since ColdFusion is case-insensitive, I'm going to use any case
			// of "B" to mean Byte.
			case "b":
			case "byte":
			case "bytes":
				return( value );
			break;
			case "k":
			case "kb":
			case "kilobyte":
			case "kilobytes":
				return( ceiling( value * 1024 ) );
			break;
			case "m":
			case "mb":
			case "megabyte":
			case "megabytes":
				return( ceiling( value * 1024 * 1024 ) );
			break;
			case "g":
			case "gb":
			case "gigabyte":
			case "gigabytes":
				return( ceiling( value * 1024 * 1024 * 1024 ) );
			break;
			case "t":
			case "tb":
			case "terabyte":
			case "terabytes":
				return( ceiling( value * 1024 * 1024 * 1024 * 1024 ) );
			break;
			default:
				throw(
					type = "UnsupportedUnit",
					message = "Input unit not recognized",
					extendedInfo = serializeJson( arguments )
				);
			break;
		}

	}

	/**
	* I parse the given quantity/unit string into the number of bytes. This is basically
	* a short-hand for the inputBytesN() function.
	*
	* Example: parseBytes( "1kb" ) => 1024
	*
	* @input I am the string to parse and convert.
	*/
	public numeric function parseBytes( required string input ) {

		// RegEx pattern matches leading number followed by trailing strings.
		var parts = input
			.lcase()
			.trim()
			.reMatchNoCase( "^[\d.]+|[a-z]+$" )
		;

		if ( parts.len() != 2 ) {

			throw(
				type = "UnexpectedInput",
				message = "Input string must contain a quantity followed by a unit",
				extendedInfo = serializeJson( arguments )
			);

		}

		var quantity = val( parts[ 1 ] );
		var unit = parts[ 2 ];

		return( inputBytesN( quantity, unit ) );

	}

</cfscript>
As you can see, when converting to bytes, we're really just multiplying by some variation of 1024; and, when converting from bytes, we're really just dividing by some variation of 1024. And, when we run this ColdFusion code, we get the following output:
Testing parseBytes()
1337
2097152
3221225472

Testing inputBytesN()
1
1
1024
1048576
1073741824
1099511627776

Testing formatBytesN()
8
1
1
1
1
1
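The 1337 on the first line of the parseBytes() output is worth a second look, since 1.305 kilobytes is not a whole number of bytes. A small sketch of the arithmetic, assuming the ceiling() call in inputBytesN() is there so that a fractional byte always rounds up (the rawBytes variable is just for illustration):

```cfml
<cfscript>
	// 1.305 kilobytes works out to a fractional number of bytes (roughly 1336.32).
	rawBytes = ( 1.305 * 1024 );

	// ceiling() rounds up to the next whole byte, which matches the output above.
	echo( ceiling( rawBytes ) ); // 1337
</cfscript>
```

The same rounding explains why inputBytesN( 1, "bit" ) reports 1 byte: an eighth of a byte still needs a whole byte to hold it.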
This was a fun little mental exercise in ColdFusion. Though, looking at the parseBytes() function, it's hard to believe there's still no reMatchGroups() function in ColdFusion - extracting parts of a Regular Expression (RegEx) is still oddly challenging.
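As a rough sketch of what group-based extraction could look like with what the language already offers, reFind() can return the positions and lengths of captured groups when its fourth argument is true. The function name parseQuantityAndUnit() is hypothetical and not part of this post's code; it is only here to illustrate the idea:

```cfml
<cfscript>
	/**
	* HYPOTHETICAL helper (not from the post): I split a string like "1.305 kb" into
	* its quantity and unit using captured groups.
	*
	* @input I am the string to split.
	*/
	public struct function parseQuantityAndUnit( required string input ) {

		var normalized = input.trim().lcase();

		// Group 1 captures the quantity, group 2 captures the unit. The pattern is
		// anchored at both ends so that nothing unexpected can sit in between.
		var result = reFind( "^([\d.]+)\s*([a-z]+)$", normalized, 1, true );

		// When nothing matches, the first reported position is 0.
		if ( result.pos[ 1 ] == 0 ) {

			throw(
				type = "UnexpectedInput",
				message = "Input string must contain a quantity followed by a unit"
			);

		}

		// Index 1 is the whole match; indexes 2 and 3 are the captured groups.
		return({
			quantity: val( mid( normalized, result.pos[ 2 ], result.len[ 2 ] ) ),
			unit: mid( normalized, result.pos[ 3 ], result.len[ 3 ] )
		});

	}

	dump( parseQuantityAndUnit( "1.305 kb" ) );
</cfscript>
```

One side effect of the fully anchored pattern is that it is stricter than the alternation used in parseBytes(), which only looks at the two ends of the string and so would appear to accept input with other text in the middle.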
Reader Comments
Re:
Very hard to believe, especially as it's been there since CF2016 ;-). Ref: https://helpx.adobe.com/uk/coldfusion/cfml-reference/coldfusion-functions/functions-m-r/refind.html
Example:
https://trycf.com/gist/7867dc3eb485dc9047088cc9168192fd/acf2016?theme=monokai
This was the result of raising an issue with Adobe to get the feature added, and then them doing so. Ref: https://tracker.adobe.com/#/view/CF-3321666
@Adam,
Oh snap!!! I totally missed that one. Looks like it may be Adobe ColdFusion at this point - the Lucee CFML docs have the "scope" option documented; but, when I try to run the code in the inline-editor on the docs, it throws an error that there are too many arguments.
That said, this is awesome! Thanks for pointing this out. Mental model augmented :muscle:
That trycf.com snippet I posted works in Lucee 5. It errors as you say in 4.5 though.
Would be kind of cool if formatBytes didn't require a unit. So if I pass X to it, it recognizes, oh this is greater than 1 meg but less than a gig, so show it as N megs. Oh, this is greater than a gig, so show it as N gigs. Basically, apply the best unit to it.
@Adam,
Good tip on the reFind() stuff - I played around with it this morning: www.bennadel.com/blog/3973-building-rematchgroups-using-refind-in-adobe-coldfusion-2018-and-lucee-cfml-5-3-7-47.htm
It looks to be a bit janky in Lucee CFML; I'll see if I can find any bugs filed on it.
@Raymond,
Yeah, that makes a lot of sense. I wonder if it would make sense to make the unit optional. Then, if it were there, I would use the explicit one; and, if omitted, I could make the "best guess" version.
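For what it's worth, the "best guess" version could be sketched roughly like this. The function name formatBytesAuto(), the unit labels, and the one-decimal rounding are all assumptions made for illustration; none of them come from the post's code:

```cfml
<cfscript>
	/**
	* HYPOTHETICAL helper (not from the post): I pick the largest unit that keeps the
	* value at or above 1 and return the value with that unit's label.
	*
	* @quantity I am the number of bytes to format.
	*/
	public string function formatBytesAuto( required numeric quantity ) {

		var units = [ "b", "kb", "mb", "gb", "tb" ];
		var value = quantity;
		var index = 1;

		// Keep dividing by 1024 until the value drops below 1024 or we run out of
		// larger units to move into.
		while ( ( value >= 1024 ) && ( index < units.len() ) ) {

			value = ( value / 1024 );
			index++;

		}

		// Rounding to one decimal place is an arbitrary presentation choice.
		return( ( round( value * 10 ) / 10 ) & " " & units[ index ] );

	}

	// 5 megabytes' worth of bytes.
	echo( formatBytesAuto( 5242880 ) );
</cfscript>
```

An optional unit argument on formatBytesN() could then fall back to this kind of loop only when the caller leaves the unit out.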