Using Both STORED And DEFLATED Compression Methods With ZipOutputStream In Lucee CFML 5.3.7.47
In yesterday's post about generating and incrementally streaming a Zip archive in Lucee CFML, I used the default compression method - DEFLATED - in the ZipOutputStream class. However, as I've discussed in the past, "deflating" images within a Zip archive can be a waste of CPU since most images are already compressed. As such, I wanted to quickly revisit the use of the ZipOutputStream, but try to archive images within the Zip using the STORED (i.e., uncompressed) method in Lucee CFML 5.3.7.47.
When using the DEFLATED method, all you have to do is create the ZipEntry instance and then write the binary content to it. When using the STORED method, on the other hand, it appears that you have to provide a bit more information. This wasn't well documented in the JavaDocs; but, based on trial-and-error, it seems as though we need to explicitly provide both the size and the CRC-32 (content checksum) when using the STORED method.
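Since ZipEntry and ZipOutputStream are plain Java classes, the same requirement can be sketched outside of Lucee. The following is a minimal sketch, not the demo code from this post: the class name StoredEntryDemo and the helpers addStored() and zipStored() are hypothetical names chosen for illustration. It computes the CRC-32 with java.util.zip.CRC32 and sets the method, size, and checksum on the entry before handing it to the stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class StoredEntryDemo {

    // Hypothetical helper: adds the given bytes to the archive uncompressed.
    static void addStored( ZipOutputStream zip, String name, byte[] content ) throws IOException {
        // The checksum has to be known BEFORE the entry is started.
        CRC32 checksum = new CRC32();
        checksum.update( content );

        ZipEntry entry = new ZipEntry( name );
        entry.setMethod( ZipEntry.STORED );
        entry.setSize( content.length );
        entry.setCrc( checksum.getValue() );

        zip.putNextEntry( entry );
        zip.write( content );
        zip.closeEntry();
    }

    // Hypothetical helper: builds a one-entry, in-memory archive.
    static byte[] zipStored( String name, byte[] content ) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try ( ZipOutputStream zip = new ZipOutputStream( buffer ) ) {
            addStored( zip, name, content );
        }
        return buffer.toByteArray();
    }

    public static void main( String[] args ) throws IOException {
        byte[] content = "already-compressed bytes would go here".getBytes( StandardCharsets.UTF_8 );
        byte[] archive = zipStored( "images/example.bin", content );
        System.out.println( "Archive bytes: " + archive.length );
    }
}
```

If the setSize() and setCrc() calls are left out for a STORED entry, putNextEntry() throws a java.util.zip.ZipException, which matches the error I ran into during trial-and-error.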
To try this out for myself, I revamped yesterday's demo to download the images in parallel and then write them to the Zip file using either method - DEFLATED or STORED - based on a URL query-string parameter. I've also updated the demo to keep track of how long the compression takes so we can see if there is any performance difference.
NOTE: To keep things simple, I've removed the "incrementally streaming" portion of this demo. Now, I'm just creating the Zip archive in-memory and then serving up the binary variable using the CFContent tag.
<cfscript>
// Zip compression method: STORED or DEFLATED.
param name="url.method" type="string" default="DEFLATED";
// To try out the different compression methods, I'm going to download a number of
// images from the People section on my website and then add them, in turn, to the
// ZIP output stream.
imageUrls = [
	"https://bennadel-cdn.com/images/header/photos/irl_2019_old_school_staff.jpg",
	"https://bennadel-cdn.com/images/header/photos/james_murray_connor_murphy_drew_newberry_alvin_mutisya_nick_miller_jack_neil.jpg",
	"https://bennadel-cdn.com/images/header/photos/juan_agustin_moyano_2.jpg",
	"https://bennadel-cdn.com/images/header/photos/jeremiah_lee_2.jpg",
	"https://bennadel-cdn.com/images/header/photos/wissam_abirached.jpg",
	"https://bennadel-cdn.com/images/header/photos/winnie_tong.jpg",
	"https://bennadel-cdn.com/images/header/photos/sean_roberts.jpg",
	"https://bennadel-cdn.com/images/header/photos/scott_markovits.jpg",
	"https://bennadel-cdn.com/images/header/photos/sara_dunnack_3.jpg",
	"https://bennadel-cdn.com/images/header/photos/salvatore_dagostino.jpg",
	"https://bennadel-cdn.com/images/header/photos/robbie_manalo_jessica_thorp.jpg",
	"https://bennadel-cdn.com/images/header/photos/rich_armstrong.jpg"
];
// ------------------------------------------------------------------------------- //
// ------------------------------------------------------------------------------- //
ZipEntryClass = createObject( "java", "java.util.zip.ZipEntry" );
withTempDirectory(
	( imagesDirectory ) => {
		// Download the images in parallel.
		// --
		// NOTE: How cool is it that the file-IO operations in Lucee CFML will
		// seamlessly work with remote URLs? So bad-ass!
		imageUrls.each(
			( imageUrl ) => {
				fileCopy( imageUrl, "#imagesDirectory#/#getFileFromPath( imageUrl )#" );
			},
			true // Parallel processing, kablamo!
		);
		// Let's keep track of how long the various Zip METHODS take (which will
		// only include the archiving, not the downloading portion, above).
		var startedAt = getTickCount();
		// We'll generate the Zip archive in-memory, rather than writing it to disk.
		var binaryOutputStream = javaNew( "java.io.ByteArrayOutputStream" ).init();
		var zipOutputStream = javaNew( "java.util.zip.ZipOutputStream" )
			.init( binaryOutputStream )
		;
		// Now that we've downloaded the images, let's add each one to the Zip.
		for ( var imageUrl in imageUrls ) {
			var imageFilename = getFileFromPath( imageUrl );
			var imageBinary = fileReadBinary( "#imagesDirectory#/#imageFilename#" );
			var zipEntry = javaNew( "java.util.zip.ZipEntry" )
				.init( "streaming-zip/images/#imageFilename#" )
			;
			// The default method is DEFLATED, which compresses the entry as it adds
			// it to the archive. For some files, this results in wasted CPU; and, in
			// some cases, can even result in larger files (not smaller files). If we
			// just want to include the file in the archive, uncompressed, we can
			// use the STORED method. This will include the file at its raw size.
			// --
			// NOTE: As of Java 8 (where I am running this demo), a STORED file needs
			// to also set the SIZE and CRC of the entry or we get an error.
			if ( url.method == "stored" ) {
				zipEntry.setMethod( ZipEntryClass.STORED );
				zipEntry.setSize( arrayLen( imageBinary ) );
				zipEntry.setCrc( crc32( imageBinary ) );
			}
			zipOutputStream.putNextEntry( zipEntry );
			zipOutputStream.write( imageBinary );
			zipOutputStream.closeEntry();
		}
		// Finalize the Zip content.
		zipOutputStream.close();
		binaryOutputStream.close();
		// NOTE: We're baking the DURATION right into the filename.
		zipFilename = "people-#url.method#-#( getTickCount() - startedAt )#.zip";
		// Set up the response headers. By using the CFContent tag with [variable],
		// we'll implicitly reset the output buffers and use the given binary as the
		// response payload. CFContent will also terminate the rest of the request
		// processing (with the EXCEPTION of the FINALLY block in the method that
		// set up the temp directory).
		header
			name = "content-disposition"
			value = "attachment; filename=""#zipFilename#""; filename*=UTF-8''#urlEncodedFormat( '#zipFilename#' )#"
		;
		header
			name = "content-length"
			value = binaryOutputStream.size()
		;
		content
			type = "application/zip"
			variable = binaryOutputStream.toByteArray()
		;
	}
); // END: withTempDirectory().
// ------------------------------------------------------------------------------- //
// ------------------------------------------------------------------------------- //
/**
* I compute the CRC-32 checksum for the byte array.
*
* @input I am the input being checked.
*/
public numeric function crc32( required binary input ) {
	var checksum = createObject( "java", "java.util.zip.CRC32" ).init();
	checksum.update( input );
	return( checksum.getValue() );
}
/**
* I create a Java class instance with the given class name. This is just a short-hand
* method for the createObject() call.
*
* @className I am the Java class being created.
*/
public any function javaNew( required string className ) {
	return( createObject( "java", className ) );
}
/**
* I create a temp directory for the images and then pass the directory path to the
* given callback. Any return value from the callback is returned-through to the
* calling context.
*
* @callback I am the callback to be invoked with the temp directory path.
*/
public any function withTempDirectory( required function callback ) {
	var imagesDirectory = expandPath( "./images-#createUniqueId()#" );
	directoryCreate( imagesDirectory );
	try {
		return( callback( imagesDirectory ) );
	} finally {
		directoryDelete( imagesDirectory, true ); // True = recurse.
	}
}
</cfscript>
As you can see, when I want to use the STORED method, I have to call .setMethod(), .setSize(), and .setCrc() on the ZipEntry - all three calls are required. Notice also that I am baking the method and the compression time (in milliseconds) into the archive filename. This way, we can more clearly see the difference between the two methods right from the generated archives.
And, if we run this ColdFusion template using both methods, we get the following files:
As you can see, the DEFLATED method produces a slightly smaller Zip archive file (by about 18 KB) when compared to the STORED method. This is because - despite the images already being compressed - we were still able to squeeze some size out of them. That said, if we look at the filenames, we can see that the STORED method ran about 33% faster than the DEFLATED method. Of course, the times will vary widely on each run; but, the general trend is that the STORED method is faster than the DEFLATED method since it's doing less work.
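To get a feel for the size side of this trade-off without downloading any images, here is a rough Java sketch that builds a one-entry archive with each method and compares the resulting byte counts. Everything in it is an assumption made for illustration: the class name ZipMethodComparison, the helper names, and especially the inputs, where seeded pseudo-random bytes stand in for "already compressed" content and a zero-filled array stands in for highly compressible content. It does not measure timing, since single-run timings are too noisy to be meaningful.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Random;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class ZipMethodComparison {

    // Hypothetical helper: returns the byte size of a one-entry archive that
    // was built using the given method (ZipEntry.STORED or ZipEntry.DEFLATED).
    static int archiveSize( byte[] content, int method ) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try ( ZipOutputStream zip = new ZipOutputStream( buffer ) ) {
            ZipEntry entry = new ZipEntry( "payload.bin" );
            entry.setMethod( method );
            // Only STORED entries need the size and checksum up front.
            if ( method == ZipEntry.STORED ) {
                CRC32 checksum = new CRC32();
                checksum.update( content );
                entry.setSize( content.length );
                entry.setCrc( checksum.getValue() );
            }
            zip.putNextEntry( entry );
            zip.write( content );
            zip.closeEntry();
        }
        return buffer.size();
    }

    // Stand-in for already-compressed content (an assumption, not real images).
    static byte[] noisyBytes( int length ) {
        byte[] bytes = new byte[ length ];
        new Random( 42 ).nextBytes( bytes );
        return bytes;
    }

    public static void main( String[] args ) throws IOException {
        byte[] noisy = noisyBytes( 100_000 );
        byte[] repetitive = new byte[ 100_000 ]; // All zeros, very compressible.

        System.out.println( "Noisy / STORED: " + archiveSize( noisy, ZipEntry.STORED ) );
        System.out.println( "Noisy / DEFLATED: " + archiveSize( noisy, ZipEntry.DEFLATED ) );
        System.out.println( "Repetitive / STORED: " + archiveSize( repetitive, ZipEntry.STORED ) );
        System.out.println( "Repetitive / DEFLATED: " + archiveSize( repetitive, ZipEntry.DEFLATED ) );
    }
}
```

With inputs like these, I'd expect DEFLATED to win big on the repetitive bytes and to gain essentially nothing on the noisy bytes; real JPEGs, as in my demo, tend to land somewhere near the noisy end.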
This was mostly a note-to-self since I couldn't figure out how to use the STORED method when I was putting yesterday's demo together; and, I wanted something that I could reference in the future. That said, I'm still a big fan of using the zip CLI (Command-Line Interface) in Lucee CFML since it's very fast; and will likely use the zip CLI as my primary means of zipping in the future.
Reader Comments
This is a great demo, but the thing that really hit me was a method you used, called:
I must say, I have never seen this before. The amount of times, I have done something like:
This is what I love about your blogs. Not only do I learn new techniques but I learn about new native methods:)
@Charles,
Oh man, getFileFromPath() and getDirectoryFromPath() are super helpful functions! I just wish they had a function for extracting the file-extension. Because, like your example, I still have to do that via listLast( filename, "." ).
On a side-note, I love list-functions in ColdFusion. Totally underrated.
It's weird. I knew about:
So, I am not sure why I didn't know about getFileFromPath()?
Yes. A second Boolean argument. If set to true, it would return a Struct containing:
Great idea!
Maybe I should put in a feature request!