Calling The Closure Compiler From ColdFusion And Java
Last week, Jon Dowdle and I were trying to get Google's Closure Compiler to concatenate, compact, and optimize a number of JavaScript files in the InVision code-base. The task proved more difficult than we anticipated; so, I spent a good amount of time on Friday and Saturday picking my way through the Closure Compiler tutorials, documentation, and source code. Closure is some complex stuff! But, after a few hours of piecing code together, I was finally able to invoke the Closure Compiler from a ColdFusion 10 application and produce output that seemed reasonable.
NOTE: For this demo, I am using ColdFusion 10 to define Closure as a per-application JAR file. If you don't have ColdFusion 10, you can simply place the Closure compiler JAR file in the ColdFusion class-paths; or, use the JavaLoader library to load it on demand.
The Closure compiler has an enormous number of options that can influence the way your JavaScript code is altered, optimized, and minified. You can definitely set any and all of these options manually; however, Closure provides a utility class - CompilationLevel - that sets various combinations of options depending on how aggressive you want to be in your compiling. CompilationLevel provides three levels of aggressiveness (from the documentation):
WHITESPACE_ONLY - Removes comments and extra whitespace in the input JS.
SIMPLE_OPTIMIZATIONS - Performs transformations to the input JS that do not require any changes to JS that depend on the input JS. For example, function arguments are renamed (which should not matter to code that depends on the input JS), but functions themselves are not renamed (which would otherwise require external code to change to use the renamed function names).
ADVANCED_OPTIMIZATIONS - Aggressively reduces code size by renaming function names and variables, removing code which is never called, etc.
For the first two levels - WHITESPACE_ONLY and SIMPLE_OPTIMIZATIONS - there's not that much to think about; more or less, they just work. ADVANCED_OPTIMIZATIONS, on the other hand, requires extreme mindfulness. You have to adhere to a very specific coding style within the JavaScript files that are being compiled. The Closure compiler will detect specific styles of syntax (e.g., dot-notation vs. array-notation) and use those markers when deciding which parts of your code are safe to change.
And, because ADVANCED_OPTIMIZATIONS is such an aggressive compilation, the Closure compiler will err on the side of making changes, even unsafe ones. This means that you have to "whitelist" code (for lack of a better term) that you want to remain unchanged. Some people find this level of compilation too aggressive (and problematic) and flat-out advise against it.
The primary problem with the ADVANCED_OPTIMIZATIONS level of compiling is the renaming of object identifiers and object properties. To give you a taste of what this means, the Closure compiler might compile the following JavaScript (pseudo-example):
if ( user.settings.hasOwnProperty( "isFavorite" ) ) { ... }
... down to:
if ( a.b.c( "isFavorite" ) ) { ... }
This is problematic for a number of reasons, not the least of which is that "hasOwnProperty" is a native JavaScript method, and not something specific to your JavaScript input file. To keep the compiler from breaking your code during the compilation process, Closure provides three features (See: API Tutorial 3):
- Externs
- Exports
- String Purity
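Before looking at each feature, here is a small, runnable simulation of the failure they all guard against. The `user` object and the hand-written `renamedCheck()` function are hypothetical stand-ins, not real Closure output: the object comes from outside the compiled code, so its property names are fixed, while the "compiled" function has had every name shortened except the string literal.

```javascript
// Hypothetical data that arrives from outside the compiled code (for
// example, JSON from the server); its property names cannot change.
const user = JSON.parse('{"settings": {"isFavorite": true}}');

// The original source: every name matches the real object.
function originalCheck(u) {
  return u.settings.hasOwnProperty("isFavorite");
}

// A hand-written imitation of over-aggressive renaming: "settings" became
// "b" and "hasOwnProperty" became "c". Only the string literal survived.
function renamedCheck(a) {
  return a.b.c("isFavorite");
}

console.log(originalCheck(user)); // true

let renameError = null;
try {
  renamedCheck(user); // a.b is undefined, so reading .c throws
} catch (e) {
  renameError = e;
}
console.log(renameError instanceof TypeError); // true
```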
Externs are a way to tell the Closure compiler about code that is not present in the JavaScript that you're compiling, but that the code you're compiling may depend on (the docs compare externs to header files in C++). During the compilation process, Closure examines the externs content, gathers a list of all defined symbols, and then leaves references to those symbols, within your compiled code, unchanged.
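For this post's input, a small hand-written externs file could stand in for the full jQuery source that the ColdFusion code below passes as externs. This is a minimal sketch, assuming the only jQuery members the compiled code touches are `trim()` and the `ucase` plugin; the file name and the `ucase` signature are hypothetical, since the plugin's source isn't shown. The function bodies are deliberately empty: Closure only reads these declarations to learn which names to leave alone, and never emits them.

```javascript
// jquery-name-utils.externs.js (hypothetical file name)
// Declarations only: Closure collects these names and will not rename
// references to them in the code it compiles.

/**
 * @param {*=} selector
 * @return {!Object}
 */
var jQuery = function(selector) {};

/** @const */
var $ = jQuery;

/**
 * @param {string} str
 * @return {string}
 */
jQuery.trim = function(str) {};

/**
 * Hypothetical signature for the "ucase" plugin used by name-utils.js.
 * @param {string} value
 * @return {string}
 */
jQuery.ucase = function(value) {};
```

Listing a file like this in place of `jquery-2.0.3.min.js` in the `externFiles` array should give Closure enough information for this particular input with far less to parse, though the effect on compile time is untested here.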
You can provide any extern files that you want. But, Closure provides a utility class - CommandLineRunner - that comes with a default set of common JavaScript symbols. The set of symbols is massive, although if you output just the names of the embedded files that it provides, the list doesn't seem so crazy:
- externs.zip//es3.js
- externs.zip//es5.js
- externs.zip//w3c_event.js
- externs.zip//w3c_event3.js
- externs.zip//gecko_event.js
- externs.zip//ie_event.js
- externs.zip//webkit_event.js
- externs.zip//w3c_device_sensor_event.js
- externs.zip//w3c_dom1.js
- externs.zip//w3c_dom2.js
- externs.zip//w3c_dom3.js
- externs.zip//gecko_dom.js
- externs.zip//ie_dom.js
- externs.zip//webkit_dom.js
- externs.zip//w3c_css.js
- externs.zip//gecko_css.js
- externs.zip//ie_css.js
- externs.zip//webkit_css.js
- externs.zip//google.js
- externs.zip//deprecated.js
- externs.zip//fileapi.js
- externs.zip//flash.js
- externs.zip//gears_symbols.js
- externs.zip//gears_types.js
- externs.zip//gecko_xml.js
- externs.zip//html5.js
- externs.zip//ie_vml.js
- externs.zip//iphone.js
- externs.zip//webstorage.js
- externs.zip//w3c_anim_timing.js
- externs.zip//w3c_css3d.js
- externs.zip//w3c_elementtraversal.js
- externs.zip//w3c_geolocation.js
- externs.zip//w3c_indexeddb.js
- externs.zip//w3c_navigation_timing.js
- externs.zip//w3c_range.js
- externs.zip//w3c_selectors.js
- externs.zip//w3c_xml.js
- externs.zip//window.js
- externs.zip//webkit_notifications.js
- externs.zip//webgl.js
However, if you look at the code inside these files, it adds up to about 25,000 lines of JavaScript declarations, covering everything from "console.log" to "Object.prototype.hasOwnProperty".
Exports are a way to tell the Closure compiler that your code is actually being used. If you don't export your code in some fashion, the Closure compiler will assume your entire input is "dead code" (i.e., code that is never invoked) and remove it from the compiled output. Exporting can be done by saving your code to the global scope (i.e., window) or by passing it out of scope in some other way (such as passing it into another function, creating a lower-case-c closure).
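Besides the global-object approach that the input file below uses, the same string-literal trick works for a constructor and its prototype methods. This is a minimal sketch with a hypothetical `NameFormatter` class, assuming `globalThis` is the global object (it is `window` in a browser, and is available in modern browsers and Node). External callers use the quoted aliases; the unquoted names stay free for the compiler to rename.

```javascript
// Hypothetical class used only to illustrate exporting by aliasing.
/** @constructor */
function NameFormatter(separator) {
  this.separator = separator;
}

// Internal (unquoted) method name: fair game for renaming.
NameFormatter.prototype.format = function (first, last) {
  return last + this.separator + first;
};

// Exports: quoted aliases for the constructor and the method. Closure
// does not rewrite string literals, so these names survive compilation.
globalThis["NameFormatter"] = NameFormatter;
NameFormatter.prototype["format"] = NameFormatter.prototype.format;

// External code only ever uses the exported (quoted) names.
const formatter = new globalThis["NameFormatter"](", ");
console.log(formatter["format"]("Tricia", "Smith")); // Smith, Tricia
```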
String purity is the idea that the Closure compiler will never change the value of a String literal. So, if you want an object property to avoid renaming, you have to define it with a quoted key and access it using array-notation and a string, rather than dot-notation and a symbol.
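Data that originates outside the compiled code is the most common place where this matters. The minimal sketch below (the JSON payload and `describe()` are hypothetical) reads every key with array-notation and a string literal, so ADVANCED_OPTIMIZATIONS has nothing to rename. Consistency matters too: if the same property is written with quotes in one place and with dot-notation in another, the compiler may rename only the dot-notation reference, and the two will no longer match.

```javascript
// Hypothetical payload from the server: its keys are fixed strings.
const payload = JSON.parse('{"userName": "tricia ann smith", "isFavorite": true}');

function describe(data) {
  // Safe under ADVANCED_OPTIMIZATIONS: the keys are string literals,
  // which Closure never rewrites. Writing `data.userName` instead would
  // let the compiler rename the property so it no longer matched the JSON.
  return data["userName"] + (data["isFavorite"] ? " (favorite)" : "");
}

console.log(describe(payload)); // tricia ann smith (favorite)
```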
Ok, that's a lot to take in, and I'm sure it only scratches the surface of the implications of using the ADVANCED_OPTIMIZATIONS level of compiling; let's take a look at some actual code. First, I'm going to define a JavaScript file; then, I'm going to compile it using Google's Closure compiler.
The content of the JavaScript file is not so important; what you really want to pay attention to is how much the compiled code has been changed. Here is my input JavaScript that defines a name-utilities object that will return initials given a name:
(function( global, $, undefined ) {
	"use strict";
	// ---
	// PUBLIC METHODS.
	// ---
	// I return the first and last initials of the given name.
	function getInitials( name ) {
		name = $.trim( $.ucase( name ) );
		// If no name was provided, simply return the raw value.
		if ( ! name ) {
			return( name );
		}
		// Split the string on white-space characters.
		var tokens = name.split( /\s+/i );
		var tokenCount = tokens.length;
		// If we have more than one token in the name, define the
		// initials as the first and last token characters - skip
		// any middle names that were provided.
		if ( tokenCount > 1 ) {
			return(
				tokens[ 0 ].slice( 0, 1 ) +
				tokens[ tokenCount - 1 ].slice( 0, 1 )
			);
		}
		// There is only one token - just return the first character.
		return( tokens[ 0 ].slice( 0, 1 ) );
	}
	// ---
	// PRIVATE METHODS.
	// ---
	// I return a version of the given operator that caches the result
	// based on the first argument of the input.
	function memoize( operator, context ) {
		var cache = {};
		// If no context was provided, default to the global scope.
		context = ( context || global );
		var wrapper = function( input ) {
			var cacheKey = ( "key:" + input );
			// If the input has already been processed, return the
			// previously calculated result.
			if ( cache.hasOwnProperty( cacheKey ) ) {
				// Debugging...
				if ( console && console.info ) {
					console.info( "Cache hit:", input );
				}
				return( cache[ cacheKey ] );
			}
			// Debugging...
			if ( console && console.warn ) {
				console.warn( "Cache miss:", input );
			}
			// Invoke the original operator.
			var result = operator.apply( context, arguments );
			// Store the calculated result and return it.
			return( cache[ cacheKey ] = result );
		};
		return( wrapper );
	}
	// ---
	// DEFINE API.
	// ---
	// Store the API in the global object. To make sure that the
	// Closure compiler doesn't mangle the name of the object
	// reference OR the names of the methods exposed, we have to
	// define both the root object and the method keys as String
	// values - the Closure compiler will never replace string values.
	global[ "nameUtils" ] = {
		"getInitials": memoize( getInitials )
	};
})( window, jQuery );
Notice that when I export the utility (at the bottom of the code), I define both the name of the object and the public methods using string literals. This will ensure that the Closure compiler will not mangle the values, "nameUtils" or "getInitials".
Also notice that this file has two dependencies - jQuery and a jQuery plugin, "ucase." This is important because these files have to be provided as "externs" during the compilation process or the Closure compiler will (potentially) rename those references.
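One detail the input file doesn't explain is the "key:" prefix in memoize(). A likely reason, though the post doesn't say so, is that a plain object already inherits keys such as "hasOwnProperty". The hypothetical `memoizeUnprefixed()` below drops the prefix to show the failure: caching the input "hasOwnProperty" overwrites the very method the cache check relies on. `memoizeWithMap()` is an alternative (not what the post uses) that sidesteps the issue with an ES2015 `Map`, which needs a newer browser than jQuery 2.x targets.

```javascript
// Hypothetical unprefixed variant, to show what the "key:" prefix guards against.
function memoizeUnprefixed(operator) {
  const cache = {};
  return function (input) {
    if (cache.hasOwnProperty(input)) {
      return cache[input];
    }
    return (cache[input] = operator(input));
  };
}

// Alternative sketch using a Map, which has no inherited keys to collide with.
function memoizeWithMap(operator) {
  const cache = new Map();
  return function (input) {
    if (cache.has(input)) {
      return cache.get(input);
    }
    const result = operator(input);
    cache.set(input, result);
    return result;
  };
}

const firstLetter = (s) => s.slice(0, 1).toUpperCase();

const unsafe = memoizeUnprefixed(firstLetter);
unsafe("hasOwnProperty"); // stores "H" under the own key "hasOwnProperty"...
let brokeOnSecondCall = false;
try {
  unsafe("smith"); // ...so cache.hasOwnProperty is now the string "H"
} catch (e) {
  brokeOnSecondCall = e instanceof TypeError;
}

const safe = memoizeWithMap(firstLetter);
console.log(brokeOnSecondCall); // true
console.log(safe("hasOwnProperty"), safe("smith")); // H S
```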
Ok, now let's compile the name-utils.js file using ColdFusion and its Java underpinnings. I've tried to put a lot of comments in the following code to do my best to explain what each option is doing and why I'm using it.
<cfscript>
	// Create our compiler.
	compiler = createObject( "java", "com.google.javascript.jscomp.Compiler" ).init();
	// Create our options container. When we run the compile command,
	// we have to provide it with a large number of options (most of
	// them default values) that determine how the compiler will
	// modify the given inputs.
	options = createObject( "java", "com.google.javascript.jscomp.CompilerOptions" ).init();
	// Set the compiler to generate code in a human-readable format
	// (using line-breaks and indents).
	// --
	// NOTE: This is only for this demo / blog post - you wouldn't
	// actually use this when producing code for a production server.
	options.setPrettyPrint( javaCast( "boolean", true ) );
	// Because all of the inputs are concatenated into a single file
	// (using our current configuration), we want to inject a comment
	// before each component within the output (to help debugging).
	// The delimiter can take two special values:
	// -- %name% : the name you provide for the input.
	// -- %num% : the index (base zero) of the input.
	options.setPrintInputDelimiter( javaCast( "boolean", true ) );
	options.setInputDelimiter(
		javaCast( "string", "// -- File: %name% ( Input %num% ) -- //" )
	);
	// Since there are a TON of options, we'll defer the vast majority
	// of the configuration to the compilation-level class. This will
	// allow us to set all of the appropriate option values for
	// advanced optimization.
	// --
	// NOTE: Advanced Optimization is INTENSE. It can easily mess up
	// your code if you don't learn about EXTERNS and EXPORTS.
	createObject( "java", "com.google.javascript.jscomp.CompilationLevel" )
		.ADVANCED_OPTIMIZATIONS
		.setOptionsForCompilationLevel( options )
	;
	// Strip out any references to console-based debugging statements.
	// This will strip out qualified names that equal any of the
	// following keys; or, that start with the following keys and
	// contain a subsequent ".", as in "console.log.call()".
	options.stripTypes = createObject( "java", "java.util.HashSet" ).init(
		[
			javaCast( "string", "console.log" ),
			javaCast( "string", "console.dir" ),
			javaCast( "string", "console.info" ),
			javaCast( "string", "console.warn" ),
			javaCast( "string", "console.error" )
		]
	);
	// When Closure optimizes the code, it will rename all variables
	// that it BELIEVES are local. This is extremely problematic if
	// you rely on environmental or global variables. However, you can
	// tell Closure to ignore objects and properties with given names
	// by defining them in an "externs" file (or set of files). Out of
	// the box, the command-line-runner class provides a MASSIVE set
	// of common externs (ex. alert, hasOwnProperty, console, etc.).
	// The code it outlines is ~25,000 lines long - blam!!!
	externs = createObject( "java", "com.google.javascript.jscomp.CommandLineRunner" )
		.getDefaultExterns()
	;
	// In addition to the default-externs outlined above, we need to
	// additionally provide the objects and methods from libraries
	// that our code relies upon; namely, jQuery and various plugins.
	// --
	// NOTE: Closure will NOT compile these externs - it simply uses
	// them to create an index of the objects / methods that should
	// not be renamed in the Inputs you provide.
	externFiles = [
		expandPath( "./js/jquery-2.0.3.min.js" ),
		expandPath( "./js/jquery.ucase.js" )
	];
	// When adding the file, the "filename" value is simply for use
	// in warnings and errors.
	for ( externFile in externFiles ) {
		externs.add(
			createObject( "java", "com.google.javascript.jscomp.JSSourceFile" ).fromCode(
				javaCast( "string", getFileFromPath( externFile ) ),
				javaCast( "string", fileRead( externFile ) )
			)
		);
	}
	// Create the input file that we want to compile.
	inputFile = expandPath( "./js/name-utils.js" );
	// NOTE: Again, the name here is only for use in the warnings,
	// debugging, and the INPUT DELIMITER that we defined above.
	input = createObject( "java", "com.google.javascript.jscomp.JSSourceFile" ).fromCode(
		javaCast( "string", getFileFromPath( inputFile ) ),
		javaCast( "string", fileRead( inputFile ) )
	);
	// If we pass a list of externs, we also have to pass the input
	// as a list (the compile() signatures accept either all single
	// values OR all lists, but not a mix of the two).
	// --
	// NOTE: Since the compile() method allows for a "List" data
	// type, we can pass in a ColdFusion array since ColdFusion
	// arrays are "java.util.Vector" instances, which implement the
	// List interface.
	result = compiler.compile(
		externs,
		[ input ],
		options
	);
	// Get the JavaScript source that the compiler produced.
	// --
	// NOTE: This is only human-readable because we used the
	// "pretty print" compile option above.
	writeOutput(
		"<pre>" &
		htmlEditFormat( compiler.toSource() ) &
		"</pre>"
	);
	// Output any errors that were returned during the compilation
	// process.
	for ( error in result.errors ) {
		writeDump( error.toString() );
	}
</cfscript>
<!--- Try to consume the compiled script. --->
<script type="text/javascript" src="./js/jquery-2.0.3.min.js"></script>
<script type="text/javascript" src="./js/jquery.ucase.js"></script>
<script type="text/javascript">
	// Include the code we compiled.
	<cfoutput>#compiler.toSource()#</cfoutput>
	console.log(
		"tricia ann smith ...",
		nameUtils.getInitials( "tricia ann smith" )
	);
</script>
When we run the above code, we get the following JavaScript console output:
tricia ann smith ... TS
It's a lot of code I know - but hopefully the comments help clarify things a bit. As one of the options, I turned on pretty-print for the source code. This uses line-breaks and space-indenting to make the output a bit more human-friendly (for debugging). Here's what the compiler returns:
NOTE: I have replaced spaces with tabs for my blog / gist interaction.
// -- File: name-utils.js ( Input 0 ) -- //
(function(c, d) {
	c.nameUtils = {getInitials:function(a, b) {
		var e = {};
		b = b || c;
		return function(c) {
			var f = "key:" + c;
			if(e.hasOwnProperty(f)) {
				return e[f]
			}
			var d = a.apply(b, arguments);
			return e[f] = d
		}
	}(function(a) {
		a = d.trim(d.ucase(a));
		if(!a) {
			return a
		}
		a = a.split(/\s+/i);
		var b = a.length;
		return 1 < b ? a[0].slice(0, 1) + a[b - 1].slice(0, 1) : a[0].slice(0, 1)
	})}
})(window, jQuery);
Now, imagine that code with only a few line breaks and no indentation.
NOTE: The Closure compiler will always add a few line breaks. It intentionally adds a line break every 500 characters (or thereabouts) in order to prevent certain firewalls from corrupting the file content.
This code runs, and it works well; but, it wasn't exceedingly fast and it wasn't without problems. For one, compiling this one small file takes about 3 seconds on my local development machine. I can only assume that this goes up as more files are added, though I also suspect that a good chunk of this time goes into parsing the 25,000 lines of "extern" definitions.
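One quick way to confirm that the compiled output still behaves like the original, independently of ColdFusion and the browser, is to run it against stand-ins for its two dependencies. This sketch assumes a non-browser runtime such as Node; `fakeWindow` and `fakeJQuery` are hypothetical stubs, and the `ucase` stub simply upper-cases its input because the plugin's real source isn't shown. The compiled code is copied from the output above, re-indented, with only the IIFE arguments swapped for the stubs.

```javascript
// Hypothetical stand-ins for the browser globals the compiled code expects.
const fakeWindow = {};
const fakeJQuery = {
  trim: (value) => String(value).trim(),
  ucase: (value) => String(value).toUpperCase(), // assumed plugin behavior
};

// Compiled output from above; only the IIFE arguments have changed.
(function(c, d) {
  c.nameUtils = {getInitials:function(a, b) {
    var e = {};
    b = b || c;
    return function(c) {
      var f = "key:" + c;
      if(e.hasOwnProperty(f)) {
        return e[f]
      }
      var d = a.apply(b, arguments);
      return e[f] = d
    }
  }(function(a) {
    a = d.trim(d.ucase(a));
    if(!a) {
      return a
    }
    a = a.split(/\s+/i);
    var b = a.length;
    return 1 < b ? a[0].slice(0, 1) + a[b - 1].slice(0, 1) : a[0].slice(0, 1)
  })}
})(fakeWindow, fakeJQuery);

console.log(fakeWindow.nameUtils.getInitials("tricia ann smith")); // TS
```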
The only true failure that I encountered was that I started getting the Java error, "java.lang.OutOfMemoryError: PermGen space," when my ColdFusion application timeout was very low (1 minute). Though, in all fairness, I suspect this had more to do with the way ColdFusion was loading the JAR file over and over again and less to do with the size of the AST (Abstract Syntax Tree) that the Closure compiler was maintaining in memory.
At the end of the day, I was able to invoke Google's Closure compiler from ColdFusion; however, I definitely get the sense that I have a lot more to think about when I consider whether or not to use the ADVANCED_OPTIMIZATIONS level of compiling. I'll certainly report any other interesting features or issues that I encounter with the Closure compiler.
Reader Comments
Always an educational read, Ben. Thanks! Any sense for how Google's Closure Compiler compares to the others (uglify, YUI, etc.)? Also, interested in why you'd do it in CF rather than something like Grunt. This is no criticism, just curious. I've not used Grunt yet, but intend to add it to my workflow at some point as it seems like a good idea.
@Chris,
Thanks! It was fun to learn about something new. To be fair, this is only the second JavaScript "compiler" that I've actually ever touched. And the one before - uglifyJS - I only used once or twice when learning about RequireJS. So, my experience with compilers is just this side of non-existent :)
From the little bit that I have read about the Closure compiler, however, people seem to think it's the most advanced compiler. Plus, it is supposed to have a large number of benefits that don't relate to minification / obfuscation; I think there's all kinds of type-checking and other syntax things? Not really sure.
As far as why I tried it in ColdFusion and not in something like Grunt, it's primarily due to familiarity. I know a good deal about ColdFusion and not a whole lot about command-line tools. Outside of Git, my command-line-fu is not so great.
I, too, would be interested in learning more about Grunt and other build tools. I'm sure there is a whole world of productivity that I haven't tapped into yet!
@Ben,
As always, too many things to do/learn... never enough time to do/learn them. There are so many great tools, frameworks, and technologies out there. But in order to do any of them justice, you have to spend time with them, get to know them, take them on a date or two... then decide whether you want a relationship with them or not. Eventually, I'm going to dig into your AngularJS stuff... ah, man... like a kid in a candy store!
@Chris,
Yooooo! It's so true. I read an article about MongoDB and I'm like, Oh man, I gotta look into that! Then I read an article about NodeJS and I'm like, Oh man, maybe I should look into that first! Then, someone mentions PhoneGap... and so on. ... It's exciting and exhausting!
I highly recommend looking into Meteor (http://www.meteor.com/) which exposed me to BOTH Node and MongoDB. Plus, it's a very promising reactive framework that blew my mind! I can't wait to spend more time with it. I hope it matures and doesn't become a flash in the pan.
@Chris,
I think I've watched the Meteor video a few times in the past. It is really cool looking. Maybe I'll try a little foray into the framework. But when? WHEN? :D
@All,
I tried to take all this code and wrap it up inside a ColdFusion component facade:
www.bennadel.com/blog/2518-ClosureCompiler-cfc-A-ColdFusion-Facade-For-Google-s-Closure-Compiler.htm
This should make it much easier to call Google's Closure compiler from a ColdFusion context.