Code Kata: Parsing Strings Like "5mb" Into A Number Of Bytes In Lucee CFML 5.3.7.47

By Ben Nadel

Published 2021-01-24 in ColdFusion — Comments (6)

In yesterday's post about streaming an incremental ZIP file up to Amazon S3 in Lucee CFML, I had to wait until "chunks" were over 5mb (5 megabytes) in size before I could upload them. To do this, I literally calculated the number of bytes that equated to 5mb. Afterwards, I thought it would be nice if there were methods for converting between bytes and larger data-units. As a code kata, I wanted to see if I could create just such functions in Lucee CFML 5.3.7.47.

In ColdFusion, there is already a precedent for converting between two units of measurement: inputBaseN() and formatBaseN(). inputBaseN() converts a value written in a given base into decimal (base 10); and, formatBaseN() converts a given decimal (base 10) into another base. As such, when converting between bytes and other units (e.g., megabytes), I wanted to use the same input / format terminology:

inputBytesN( quantity, unit ) - converts a given unit into bytes.
formatBytesN( quantity, unit ) - converts bytes into a given unit.
parseBytes( input ) - short-hand function that will parse the quantity and unit out of a string like, "5mb", and pipe them into the inputBytesN() function.
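
For reference, here is a minimal sketch of the built-in pair that inspired this naming. The literal values are illustrative only and are not taken from the original exercise:

```cfml
<cfscript>

	// inputBaseN() reads a string written in the given base and returns the
	// decimal (base 10) number.
	echo( inputBaseN( "ff", 16 ) & "<br />" ); // 255
	echo( inputBaseN( "101", 2 ) & "<br />" ); // 5

	// formatBaseN() goes the other way: decimal (base 10) number in, string
	// written in the given base out.
	echo( formatBaseN( 255, 16 ) & "<br />" ); // ff
	echo( formatBaseN( 5, 2 ) & "<br />" ); // 101

</cfscript>
```

The byte functions listed above borrow only the "input" and "format" halves of these names: "input" moves a value into the common unit (bytes) and "format" moves it back out.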

In the end, these functions just wrap a bunch of multiplications and divisions by 1024, which is the number of bytes in a kilobyte (and is the general multiplier needed to move from one unit to the next):

<cfscript>

	echo( "<p><strong> Testing parseBytes() </strong></p>" );
	echo( parseBytes( "1.305 kb" ) & "<br />" );
	echo( parseBytes( "2 megabytes" ) & "<br />" );
	echo( parseBytes( "3 gb" ) & "<br />" );
	echo( "<br />" );

	echo( "<p><strong> Testing inputBytesN() </strong></p>" );
	echo( inputBytesN( 1, "bit" ) & "<br />" );
	echo( inputBytesN( 1, "b" ) & "<br />" );
	echo( inputBytesN( 1, "kb" ) & "<br />" );
	echo( inputBytesN( 1, "mb" ) & "<br />" );
	echo( inputBytesN( 1, "gb" ) & "<br />" );
	echo( inputBytesN( 1, "tb" ) & "<br />" );
	echo( "<br />" );

	echo( "<p><strong> Testing formatBytesN() </strong></p>" );
	echo( formatBytesN( 1, "bit" ) & "<br />" );
	echo( formatBytesN( 1, "b" ) & "<br />" );
	echo( formatBytesN( 1024, "kb" ) & "<br />" );
	echo( formatBytesN( 1048576, "mb" ) & "<br />" );
	echo( formatBytesN( 1073741824, "gb" ) & "<br />" );
	echo( formatBytesN( 1099511627776, "tb" ) & "<br />" );

	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //

	/**
	* I convert the given number of bytes into the given unit. No rounding of decimals is
	* performed. If you want to round the value, you must do it in the calling context.
	* 
	* Example: formatBytesN( 1024, "kb" ) => 1
	* 
	* @quantity I am the number of bytes to convert.
	* @unit I am the unit of measurement into which we are converting.
	*/
	public numeric function formatBytesN(
		required numeric quantity,
		required string unit
		) {

		switch ( unit ) {
			case "bit":
			case "bits":
				return( quantity * 8 );
			break;
			// CAUTION: Lowercase "b" is actually the international standard for BIT.
			// However, since ColdFusion is case-insensitive, I'm going to use any case
			// of "B" to mean Byte.
			case "b":
			case "byte":
			case "bytes":
				return( quantity );
			break;
			case "k":
			case "kb":
			case "kilobyte":
			case "kilobytes":
				return( quantity / 1024 );
			break;
			case "m":
			case "mb":
			case "megabyte":
			case "megabytes":
				return( quantity / 1024 / 1024 );
			break;
			case "g":
			case "gb":
			case "gigabyte":
			case "gigabytes":
				return( quantity / 1024 / 1024 / 1024 );
			break;
			case "t":
			case "tb":
			case "terabyte":
			case "terabytes":
				return( quantity / 1024 / 1024 / 1024 / 1024 );
			break;
			default:
				throw(
					type = "UnsupportedUnit",
					message = "Format unit not recognized",
					extendedInfo = serializeJson( arguments )
				);
			break;
		}

	}


	/**
	* I convert the given quantity into the equivalent number of bytes.
	* 
	* Example: inputBytesN( 1, "kb" ) => 1024
	* 
	* @value I am the value to convert.
	* @unit I am the unit of measurement in which the quantity was defined.
	*/
	public numeric function inputBytesN(
		required numeric value,
		required string unit
		) {

		switch ( unit ) {
			case "bit":
			case "bits":
				return( ceiling( value / 8 ) );
			break;
			// CAUTION: Lowercase "b" is actually the international standard for BIT.
			// However, since ColdFusion is case-insensitive, I'm going to use any case
			// of "B" to mean Byte.
			case "b":
			case "byte":
			case "bytes":
				return( value );
			break;
			case "k":
			case "kb":
			case "kilobyte":
			case "kilobytes":
				return( ceiling( value * 1024 ) );
			break;
			case "m":
			case "mb":
			case "megabyte":
			case "megabytes":
				return( ceiling( value * 1024 * 1024 ) );
			break;
			case "g":
			case "gb":
			case "gigabyte":
			case "gigabytes":
				return( ceiling( value * 1024 * 1024 * 1024 ) );
			break;
			case "t":
			case "tb":
			case "terabyte":
			case "terabytes":
				return( ceiling( value * 1024 * 1024 * 1024 * 1024 ) );
			break;
			default:
				throw(
					type = "UnsupportedUnit",
					message = "Input unit not recognized",
					extendedInfo = serializeJson( arguments )
				);
			break;
		}

	}


	/**
	* I parse the given quantity/unit string into the number of bytes. This is basically
	* a short-hand for the inputBytesN() function.
	* 
	* Example: parseBytes( "1kb" ) => 1024
	* 
	* @input I am the string to parse and convert.
	*/
	public numeric function parseBytes( required string input ) {

		// RegEx pattern matches leading number followed by trailing strings.
		var parts = input
			.lcase()
			.trim()
			.reMatchNoCase( "^[\d.]+|[a-z]+$" )
		;

		if ( parts.len() != 2 ) {

			throw(
				type = "UnexpectedInput",
				message = "Input string must contain a quantity followed by a unit",
				extendedInfo = serializeJson( arguments )
			);

		}

		var quantity = val( parts[ 1 ] );
		var unit = parts[ 2 ];

		return( inputBytesN( quantity, unit ) );

	}

</cfscript>

As you can see, when converting to bytes, we're really just multiplying by some variation of 1024; and, when converting from bytes, we're really just dividing by some variation of 1024. And, when we run this ColdFusion code, we get the following output:

Testing parseBytes()

1337
2097152
3221225472

Testing inputBytesN()

1
1
1024
1048576
1073741824
1099511627776

Testing formatBytesN()

8
1
1
1
1
1
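
To make the first number in that output concrete, here is a sketch of the arithmetic behind parseBytes( "1.305 kb" ), plus the 5mb threshold that motivated this kata. It inlines the math rather than calling the functions above, so it runs on its own; the variable names are hypothetical and only there for illustration:

```cfml
<cfscript>

	// "1.305 kb" splits into the quantity 1.305 and the unit "kb". The product,
	// 1.305 * 1024, is a little over 1336; and, since inputBytesN() wraps its
	// math in ceiling(), the fractional byte rounds up.
	fractionalKb = ceiling( 1.305 * 1024 );
	echo( fractionalKb & "<br />" ); // 1337

	// The 5mb chunk threshold expressed in bytes.
	fiveMb = ( 5 * 1024 * 1024 );
	echo( fiveMb & "<br />" ); // 5242880

	// Dividing by the same factors is what formatBytesN() does to go back.
	echo( ( fiveMb / 1024 / 1024 ) & "<br />" ); // 5

</cfscript>
```

Rounding up on input means a parsed quantity never under-reports the number of bytes, which is the safer direction when the value is used as a minimum size.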

This was a fun little mental exercise in ColdFusion. Though, looking at the parseBytes() function, it's hard to believe there's still no reMatchGroups() function in ColdFusion - extracting parts of a Regular Expression (RegEx) is still oddly challenging.

Short link: https://bennadel.com/3972

Reader Comments

Adam Cameron Jan 24, 2021 at 12:36 PM

Re:

it's hard to believe there's still no reMatchGroups() function in ColdFusion - extracting parts of a Regular Expression

Very hard to believe, especially as it's been there since CF2016 ;-). Ref: https://helpx.adobe.com/uk/coldfusion/cfml-reference/coldfusion-functions/functions-m-r/refind.html

Example:
https://trycf.com/gist/7867dc3eb485dc9047088cc9168192fd/acf2016?theme=monokai

This was the result of raising an issue with Adobe to get the feature added, and then them doing so. Ref: https://tracker.adobe.com/#/view/CF-3321666

Ben Nadel Jan 24, 2021 at 2:10 PM

@Adam,

Oh snap!!! I totally missed that one. Looks like it may be Adobe ColdFusion only at this point - the Lucee CFML docs have the "scope" option documented; but, when I try to run the code in the inline-editor on the docs, it throws an error that there are too many arguments.

That said, this is awesome! Thanks for pointing this out. Mental model augmented :muscle:

Adam Cameron Jan 24, 2021 at 2:14 PM

That trycf.com snippet I posted works in Lucee 5. It errors as you say in 4.5 though.

Raymond Camden Jan 25, 2021 at 2:29 PM

Would be kind of cool if formatBytes didn't require a unit. So if I pass X to it, it recognizes, oh this is greater than 1 meg but less than a gig, so show it as N megs. Oh, this is greater than a gig, so show it as N gigs. Basically, apply the best unit to it.

Ben Nadel Jan 26, 2021 at 7:11 AM

@Adam,

Good tip on the reFind() stuff - I played around with it this morning:

www.bennadel.com/blog/3973-building-rematchgroups-using-refind-in-adobe-coldfusion-2018-and-lucee-cfml-5-3-7-47.htm

It looks to be a bit janky in Lucee CFML; I'll see if I can find any bugs filed on it.

Ben Nadel Jan 26, 2021 at 7:12 AM

@Raymond,

Yeah, that makes a lot of sense. I wonder if it would make sense to make the unit optional. Then, if it were there, I would use the explicit one; and, if omitted, I could make the "best guess" version.
