Using Both STORE And DEFLATE Compression Methods With The zip CLI In Lucee CFML 5.3.6.61
A couple of months ago, I looked at using the zip CLI with the STORE or DEFLATE compression methods in Lucee CFML. The DEFLATE compression method attempts to shrink file sizes as it adds the files to an archive, whereas the STORE method just adds the files to the archive without attempting to compress them in any way. This morning, I wanted to take a quick look at how we can apply both the STORE and DEFLATE methods in the same zip command execution in Lucee CFML 5.3.6.61.
The reason I'm looking into this is that, at least in theory, compressing a file takes CPU time. And, if some files, like images, are already in a compressed file-format, it might not be worth the CPU cost to try to compress those image files while adding them to a zip archive file.
To accomplish this mixed compression in a single zip call, I'm going to use the -n / --suffixes CLI argument. This argument takes a colon-delimited list of file-extensions: files whose names end in one of those extensions are included via the STORE method; and, all other files are included via the DEFLATE method.
CAUTION: The --suffixes argument is case-sensitive. As such, a suffix of .png will not match against the input file, Image.PNG.
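Given that case-sensitivity, one possible workaround is to list both the lowercase and the uppercase form of every extension. The following is only a sketch: build_suffixes is a hypothetical helper name (it is not part of zip), and it does not cover mixed-case names such as Image.Png.

```shell
#!/bin/sh
# Hypothetical helper (not part of zip): build a colon-delimited suffix list that
# contains both the lowercase and the uppercase form of each extension given.
build_suffixes() {
    list=""
    for ext in "$@"; do
        lower=$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')
        upper=$(printf '%s' "$ext" | tr '[:lower:]' '[:upper:]')
        # Append ":" only when the list already has content.
        list="${list:+$list:}$lower:$upper"
    done
    printf '%s\n' "$list"
}

suffixes=$(build_suffixes .gif .jpeg .jpg .png)
echo "$suffixes"
# prints: .gif:.GIF:.jpeg:.JPEG:.jpg:.JPG:.png:.PNG

# The list could then be handed to zip, for example:
#   zip --recurse-paths --suffixes "$suffixes" archive.zip ./
```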
To test the outcome of this argument, I'm going to compress a directory that contains both image files and HTML files. The image files don't benefit as much from compression; at least when compared to the HTML files, which can be heavily compressed.
In the following test, I'm going to create three files:
- One using the STORE method (ie, no compression).
- One using the DEFLATE method (ie, compress everything).
- One using both STORE and DEFLATE (ie, mixed compression).
<cfscript>
// The data directory has a mixture of Images (which are already persisted using a
// compressed file-format) and large HTML files (which can be compressed).
dataDirectory = expandPath( "./data" );
// ------------------------------------------------------------------------------- //
// ------------------------------------------------------------------------------- //
// First, let's test the performance and outcome of the zip CLI when we use NO
// COMPRESSION at all. This will store the files in an archive, but will not attempt
// to save any file-size.
timer
label = "No Compression (-0)"
type = "outline"
{
archiveFilePath = expandPath( "./output/no-compression.zip" );
zipOutput = executeZipFromDirectory(
dataDirectory,
[
// Don't use any compression. This will be the fastest approach, but will
// not result in any file-size advantage.
"-0",
// Recurse the input directory.
"--recurse-paths",
// Define the OUTPUT file path for the generated zip.
archiveFilePath,
// Define the INPUT file - NOTE that this path is RELATIVE TO THE WORKING
// DIRECTORY! By using a relative directory, it allows us to generate a
// ZIP file in which the relative paths become the entries in the
// resultant archive.
"./"
]
);
echo( "File size: " & getFileSize( archiveFilePath ) );
echo( "<pre>" & zipOutput & "</pre>" );
}
// Next, let's test the default behavior of the zip CLI. This uses a compression
// setting of -6, which will attempt to compress all files.
timer
label = "Default Compression (-6)"
type = "outline"
{
archiveFilePath = expandPath( "./output/default-compression.zip" );
zipOutput = executeZipFromDirectory(
dataDirectory,
[
// Recurse the input directory.
"--recurse-paths",
// Define the OUTPUT file path for the generated zip.
archiveFilePath,
// Define the INPUT file - NOTE that this path is RELATIVE TO THE WORKING
// DIRECTORY! By using a relative directory, it allows us to generate a
// ZIP file in which the relative paths become the entries in the
// resultant archive.
"./"
]
);
echo( "File size: " & getFileSize( archiveFilePath ) );
echo( "<pre>" & zipOutput & "</pre>" );
}
// And, finally, let's test the performance and outcome of the zip CLI when we use
// the default compression, but tell the CLI to store any IMAGE FILES WITHOUT
// COMPRESSION. This will include images in the archive, but will not attempt to
// improve upon the already-compressed file-formats.
timer
label = "Mixed Compression (-6 + suffixes)"
type = "outline"
{
archiveFilePath = expandPath( "./output/mixed-compression.zip" );
// We are going to tell the zip CLI to skip compression for files with the given
// set of file-extensions. This uses a colon-delimited list of extensions.
// --
// CAUTION: Unfortunately, these suffix values are CASE-SENSITIVE.
suffixes = [ ".gif", ".jpeg", ".jpg", ".png" ].toList( ":" );
zipOutput = executeZipFromDirectory(
dataDirectory,
[
// Recurse the input directory.
"--recurse-paths",
// Define which files will be archived using the STORE method (no
// compression) instead of DEFLATE.
"--suffixes #suffixes#",
// Define the OUTPUT file path for the generated zip.
archiveFilePath,
// Define the INPUT file - NOTE that this path is RELATIVE TO THE WORKING
// DIRECTORY! By using a relative directory, it allows us to generate a
// ZIP file in which the relative paths become the entries in the
// resultant archive.
"./"
]
);
echo( "File size: " & getFileSize( archiveFilePath ) );
echo( "<pre>" & zipOutput & "</pre>" );
}
// ------------------------------------------------------------------------------- //
// ------------------------------------------------------------------------------- //
/**
* I execute the zip command-line utility from the given WORKING DIRECTORY using the
* given arguments. If error-output is returned from the utility, an error with the
* details is thrown.
*
* @workingDirectory I am the working directory from which to execute the zip command.
* @zipArguments I am the command-line arguments for zip.
*/
public string function executeZipFromDirectory(
required string workingDirectory,
required array zipArguments
) {
// The Shell Script that's going to proxy the ZIP command is expecting the
// working directory to be the first argument. As such, let's create a normalized
// set of arguments for our proxy that contains the working directory first,
// followed by the rest of the commands.
var normalizedArguments = [ workingDirectory ]
.append( "zip" )
.append( zipArguments, true )
;
execute
name = expandPath( "./execute_from_directory.sh" )
arguments = normalizedArguments.toList( " " )
variable = "local.successOutput"
errorVariable = "local.errorOutput"
timeout = 30
terminateOnTimeout = true
;
if ( len( errorOutput ?: "" ) ) {
throw(
type = "ZipFromDirectoryError",
message = "The zip command-line proxy returned error output.",
detail = "Error: #errorOutput#",
extendedInfo = "Working directory: #workingDirectory#, Command-line arguments: #serializeJson( zipArguments )#"
);
}
return( successOutput ?: "" );
}
/**
* I return a string representing the byte-size of the given file.
*
* @filepath I am the file to inspect.
*/
public string function getFileSize( required string filepath ) {
return( numberFormat( fileInfo( filepath ).size ) );
}
</cfscript>
As you can see, we're using three different approaches; and, for each approach, we're outputting the file-size of the resultant archive, the time it took to generate it, and any output returned by the zip CLI. And, when we run the above ColdFusion code, we get the following output:
As you can see, when using the default compression method (DEFLATE) with the --suffixes argument, we can apply compression to the HTML files and skip compression for the image files. This results in a slightly larger zip archive; but, may reduce load on the CPU.
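To double-check which method each entry actually received, the archive can be inspected with unzip -v, whose Method column reports "Stored" or "Defl:N" per file. The following is a rough sketch that assumes both zip and unzip are installed; the function name make_mixed_archive and the demo file names are made up for illustration. The stand-in image.png is deliberately compressible text, so a "Stored" method for it can only come from the suffix rule.

```shell
#!/bin/sh
# Hypothetical helper: archive a directory with mixed compression.
# $1 = directory to archive, $2 = absolute path of the archive to create.
make_mixed_archive() {
    ( cd "$1" && zip --quiet --recurse-paths --suffixes ".gif:.jpeg:.jpg:.png" "$2" . )
}

# Write 2000 copies of a line to a file ($1 = line, $2 = target file).
write_repeated() {
    awk -v line="$1" 'BEGIN { for ( i = 0; i < 2000; i++ ) print line }' > "$2"
}

# Only run the demo when both tools are available.
if command -v zip >/dev/null 2>&1 && command -v unzip >/dev/null 2>&1; then
    demo_dir=$(mktemp -d)
    mkdir "$demo_dir/data"
    write_repeated "<p>Hello world</p>" "$demo_dir/data/page.html"
    write_repeated "not really a png" "$demo_dir/data/image.png"
    make_mixed_archive "$demo_dir/data" "$demo_dir/mixed.zip"
    # The Method column lists the compression method used for each entry.
    unzip -v "$demo_dir/mixed.zip"
    rm -rf "$demo_dir"
fi
```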
NOTE: In this screen-shot, the mixed-compression was faster; but, that was not always the case. Sometimes, when I ran this ColdFusion code, the default compression was actually faster. But, I have to keep in mind that this is not a production environment that's serving a hundred-plus concurrent requests - it's a development environment without load. As such, it's not exactly clear how this will perform in a production environment. I will just assume that reducing CPU load is going to be a benefit more often than not.
It's also interesting to note that PNG files seem to actually benefit from some decent compression. Though, I assume that depends on the content of the PNG. My PNGs tend to include a lot of repeated colors, which I assume is exactly what compression likes to see.
Anyway, this was just a fun exploration of the zip CLI tool in Lucee CFML.
Epilogue on execute_from_directory.sh
As you may have noticed in my code, I'm using the CFExecute tag to invoke the zip CLI. However, I'm not doing it directly. Instead, I'm proxying the zip CLI through a user-defined script, execute_from_directory.sh. I have to do this because, at this time, you cannot run the CFExecute tag from a working directory. As such, I use this script to proxy other commands from a working directory:
#!/bin/sh
# In the current script invocation, the first argument needs to be the WORKING DIRECTORY
# from whence the rest of the script will be executed.
working_directory=$1
# Now that we have the working directory argument saved, SHIFT IT OFF the arguments list.
# This will leave us with a "$@" array that contains the REST of the arguments.
shift
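# Move to the target working directory; abort if it cannot be entered so that the
# command never runs from the wrong location.
cd "$working_directory" || exit 1
# Execute the REST of the command from within the new working directory.
# --
# NOTE: The "$@" is a special shell parameter that expands to the input arguments used
# to invoke the current script, with each argument kept as a separate word.
"$@"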
I'm looking forward to an upcoming release of Lucee CFML where the CFExecute tag has been updated to include a working-directory concept. It's coming soon, I believe!
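To make the proxy idea a bit more concrete, here is the same "change directory, then run the rest of the arguments" pattern written as a shell function. This is only a sketch for experimenting in a terminal: the name run_in_dir is hypothetical, and the subshell is my own addition so that the caller's working directory is left alone.

```shell
#!/bin/sh
# Hypothetical function form of the proxy script: the first argument is the
# working directory; the remaining arguments are the command to run there.
run_in_dir() {
    working_directory=$1
    shift
    # The parentheses create a subshell, so the "cd" does not affect the caller.
    # The command only runs if the directory change succeeds.
    ( cd "$working_directory" && "$@" )
}

run_in_dir / pwd
# prints: /
```

With the standalone script, the equivalent invocation would be something like ./execute_from_directory.sh / pwd.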