Snapshotting ColdFusion Component State In Order To Find Memory Leaks In Lucee CFML 5.3.3.62
A couple of years ago, I wrote about a "Snooper" component that I created that would allow you to peek into the private scope of your ColdFusion component state in order to try and find memory leaks. That approach worked. Sort of. But, it was clumsy and hard to use. Then, the other week, while listening to the Modernize Or Die podcast, Brad Wood described a ColdBox module that he created that would snapshot component state and then compare those snapshots over time in order to find memory leaks. Brad's solution sounded much more elegant than my own; so, I wanted to try and build off of his snapshotting idea, but adapt it for a really old ColdFusion application that has no framework. Here's a breakdown of what I came up with in Lucee CFML 5.3.3.62.
The basic idea behind this approach is that, when the ColdFusion application is initialized, all of the ColdFusion components that have been loaded and cached in memory are located by recursively walking the application's object graph. These located components are then used to generate a "snapshot" of the initial state of the application. At some later point in time, you can generate a new snapshot of the application and compare it to the original snapshot in order to identify changes that may represent memory leaks.
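The snapshot-and-compare idea can be sketched on a plain struct before components enter the picture. This is a minimal, hypothetical sketch. The names snapshotState() and diffSnapshots() are illustrative only and are not part of the MemoryLeakHelper.cfc shown later. Each value is also reduced to a simple type-and-length token (or to the value itself, for simple values) rather than to a Java hash code.

```cfml
<cfscript>
	/**
	* HYPOTHETICAL: reduce each top-level value to a simple token that can be compared
	* across snapshots (type plus length, or the value itself for simple values).
	*/
	function snapshotState( required struct state ) {
		return( state.map(
			( key, value ) => {
				if ( isNull( value ) ) {
					return( "null" );
				}
				// Check binary first, since a binary value also passes isArray().
				if ( isBinary( value ) ) {
					return( "binary:" & arrayLen( value ) );
				}
				if ( isStruct( value ) ) {
					return( "struct:" & structCount( value ) );
				}
				if ( isArray( value ) ) {
					return( "array:" & arrayLen( value ) );
				}
				if ( isSimpleValue( value ) ) {
					return( "simple:" & value );
				}
				return( "other" );
			}
		) );
	}

	/**
	* HYPOTHETICAL: report keys that were removed ("[null]"), added, or whose token changed.
	*/
	function diffSnapshots( required struct oldSnapshot, required struct newSnapshot ) {
		var delta = {};
		var key = "";

		// Keys that existed before but are now gone.
		for ( key in oldSnapshot ) {
			if ( ! structKeyExists( newSnapshot, key ) ) {
				delta[ key ] = "[null]";
			}
		}

		// Keys that are new, or whose token differs. compare() is case-sensitive and
		// sidesteps CFML's loose numeric / date coercion in the == operator.
		for ( key in newSnapshot ) {
			if (
				! structKeyExists( oldSnapshot, key ) ||
				( compare( oldSnapshot[ key ], newSnapshot[ key ] ) != 0 )
			) {
				delta[ key ] = newSnapshot[ key ];
			}
		}

		return( delta );
	}

	state = {
		stringData: "bar",
		arrayData: [ "a", "b", "c" ]
	};
	before = snapshotState( state );

	// Simulate a slow leak, similar to what ThingTwo.cfc does.
	arrayAppend( state.arrayData, "d" );
	state[ "rando_key_1" ] = { tick: getTickCount() };

	delta = diffSnapshots( before, snapshotState( state ) );
	// Expect two keys: arrayData ("array:4") and rando_key_1 ("struct:1").
	writeDump( delta );
</cfscript>
```

The tokens only capture top-level lengths. A change buried inside a nested struct that doesn't alter any top-level length would go unnoticed. That is the same kind of trade-off accepted by the hashCode-plus-length identifiers used in the real helper.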
To see this in action, let's create a rather simple demo application. This ColdFusion application is going to have two ColdFusion components plus the memory leak helper component. One component will be "good"; the other component will be slowly corrupting its variables scope over time.
First, the "good" ColdFusion component, ThingOne.cfc:
component
	output = false
	hint = "I represent a NON-LEAKING component."
	{

	public void function init( required any thingTwo ) {

		variables.stringData = "bar";
		variables.arrayData = [ "a", "b", "c" ];
		variables.structData = { a: 1, b: 2, c: 3 };
		variables.javaData = createObject( "java", "java.util.regex.Pattern" );
		variables.binaryData = charsetDecode( "hello world", "utf-8" );

		// This is a NESTED COMPONENT, which we will be able to track.
		variables.componentData = thingTwo;

	}

}
As you can see, this ColdFusion component does nothing. It just stores some static data and provides no additional business logic. As part of its initialization, it accepts and stores an instance of ThingTwo.cfc. ThingTwo.cfc is the other ColdFusion component, the one that contains a memory leak:
component
	output = false
	hint = "I represent a LEAKING component."
	{

	public void function init() {

		variables.stringData = "bar";
		variables.arrayData = [ "a", "b", "c" ];
		variables.structData = { a: 1, b: 2, c: 3 };
		variables.javaData = createObject( "java", "java.util.regex.Pattern" );
		variables.binaryData = charsetDecode( "hello world", "utf-8" );

		// Start polluting the VARIABLES key-space over time.
		thread {

			for ( var i = 1 ; i < 100 ; i++ ) {

				sleep( 3000 );

				variables[ "rando_key_#i#" ] = {
					tick: getTickCount()
				};

			}

		}

	}

}
This ColdFusion component also does next to nothing. But, it does spawn a CFThread that starts to stick random keys into the variables scope over time. This is the memory leak that we want to identify.
Now, let's look at our Application.cfc ColdFusion framework component to see how these components are wired together:
component
	output = false
	hint = "I define the application settings and event handlers."
	{

	// Define the application settings.
	this.name = hash( getCurrentTemplatePath() );
	this.applicationTimeout = createTimeSpan( 0, 1, 0, 0 );

	// ---
	// PUBLIC METHODS.
	// ---

	/**
	* I initialize (or reset) the application. I get called once per application
	* life-cycle.
	*/
	public void function onApplicationStart() {

		// Initialize our model instances.
		application.thingOne = new ThingOne( new ThingTwo() );

		// The memory leak helper needs to be initialized LAST because, upon creation, it
		// will recurse through the given container, looking for ColdFusion Components to
		// target and track. Any components added to this container later on will not be
		// seen by the tracker.
		application.memoryLeakHelper = new MemoryLeakHelper( application );

	}


	/**
	* I initialize the request. I get called once per request life-cycle.
	*/
	public void function onRequestStart() {

		// If the reset flag is defined, stop the application and refresh the page.
		if ( url.keyExists( "init" ) ) {

			applicationStop();
			location( url = cgi.script_name, addToken = false );

		}

	}

}
As you can see, all of the ColdFusion components are initialized in the onApplicationStart() life-cycle method. In this application, we don't have any Dependency-Injection (DI) framework; so, we're just implementing our own Inversion-of-Control (IoC) through manual instantiation and argument passing.
The last ColdFusion component to be initialized is MemoryLeakHelper.cfc. This is the component that snapshots our application state and helps to identify memory leaks. Notice that we are passing-in the application scope. This scope represents the root container for our cached components. Upon initialization, the MemoryLeakHelper will recurse through this root container, piercing into private variables scopes as needed, locating all ColdFusion components that can be identified at that time.
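The comments inside MemoryLeakHelper.cfc explain that this "recursion" is actually driven by a queue of containers, which avoids StackOverflow errors. Below is a hypothetical, standalone sketch of that breadth-first technique, applied to nested plain structs instead of components. The name findNestedStructPaths() is illustrative only. Unlike the real helper, which skips any component name it has already seen, this sketch assumes the data contains no cycles.

```cfml
<cfscript>
	/**
	* HYPOTHETICAL: walk nested structs breadth-first using an explicit queue (rather
	* than true recursion) and collect the path to every nested struct.
	* ASSUMES acyclic data: a struct that contains itself would loop forever.
	*/
	function findNestedStructPaths( required struct root ) {
		var paths = [];
		// Each queue entry pairs a container with the path used to reach it.
		var queue = [ { path: "root", container: root } ];

		while ( arrayLen( queue ) ) {
			// SHIFT the first entry off of the FRONT of the queue.
			var entry = queue[ 1 ];
			arrayDeleteAt( queue, 1 );

			for ( var key in entry.container ) {
				// Mirror the original guard: skip keys that can't be read (e.g., NULL).
				if ( ! structKeyExists( entry.container, key ) ) {
					continue;
				}
				var value = entry.container[ key ];
				if ( ! isStruct( value ) ) {
					continue;
				}
				var childPath = ( entry.path & "." & key );
				arrayAppend( paths, childPath );
				// Explore this child later, instead of recursing into it now.
				arrayAppend( queue, { path: childPath, container: value } );
			}
		}

		return( paths );
	}

	data = {
		a: { b: { c: 1 } },
		d: "not a struct",
		e: {}
	};
	paths = findNestedStructPaths( data );
	writeDump( paths );
</cfscript>
```

Each pass pulls one container off the front of the queue and pushes its children onto the back. As a result, the call stack stays flat no matter how deeply the data is nested.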
Once the application is initialized, and the MemoryLeakHelper has located all of the cached ColdFusion components, we can then ask the MemoryLeakHelper to identify subsequent changes in the application state. For this demo, I'm doing that in the index.cfm page:
<!--- I determine how deep the CFDUMP goes. --->
<cfparam name="url.top" type="numeric" default="1" />

<cfoutput>

	<h1>
		Memory Leak Helper
	</h1>

	<p>
		<a href="./index.cfm?init=1">Reset Application</a> |
		<a href="./index.cfm?top=#( url.top + 1 )#">Dig Deeper</a> |
		<a href="./index.cfm?_=#getTickCount()#">Refresh</a>
	</p>

	<h2>
		Memory Delta
	</h2>

	<cfdump
		var="#application.memoryLeakHelper.findMemoryLeaks()#"
		top="#url.top#"
	/>

	<h2>
		Tracking Targets
	</h2>

	<cfdump
		var="#application.memoryLeakHelper.getTargets()#"
	/>

</cfoutput>
As you can see, the MemoryLeakHelper ColdFusion component has two public methods:
findMemoryLeaks() - This returns the application state delta between the original snapshot and the current snapshot.
getTargets() - This returns the collection of component names that are being targeted and tracked.
Because dumping out data can be a very intense operation, I'm using the top attribute of the CFDump tag in order to limit the burden on both the server and the browser. This top value can then be overridden through the URL scope in order to carefully and methodically dig deeper into the current application state delta.
Now, if you remember from above, the ThingTwo.cfc component is using a CFThread tag to corrupt its own variables scope over time. So, when we run this page, we should start to see new variables showing up over time:
As you can see from the Targets list, we're tracking both ThingOne.cfc and ThingTwo.cfc. However, as we refresh the page, only ThingTwo.cfc shows up in the Memory Delta; and, of its keys, only the leaking keys are listed. This is because the leaking keys are the only difference between the current snapshot and the original snapshot.
Now that we see how all of this comes together, let's look at the MemoryLeakHelper.cfc itself. This code represents my interpretation of Brad Wood's original ColdBox code; only, minus all of the ColdBox stuff (which my ColdFusion application doesn't have):
component
	output = false
	hint = "I help locate memory leaks within the application components."
	{

	/**
	* I initialize the memory leak helper with the given collection of components.
	*
	* @container I hold the components to track.
	* @output false
	*/
	public void function init( required struct container ) {

		// As part of the initialization of the memory leak helper, gather the targets
		// and snapshot the initial state of the system. This means that any lazy
		// initialization of component properties (within the given container) will show
		// up in future delta calculations; however, this will make it easier to consume
		// the deltas behind a load balancer.
		variables.targets = gatherTargets( arguments.container );
		variables.snapshots = gatherSnapshots();

	}

	// ---
	// PUBLIC METHODS.
	// ---

	/**
	* I return the list of target names being watched for state changes.
	*/
	public array function getTargets() {

		return( variables.targets.keyArray() );

	}


	/**
	* I return the collection of state-slices that have observable state change.
	*
	* NOTE: This only ever compares the current state to the ORIGINAL snapshot - the new
	* state is NOT PERSISTED for future delta calculations. Because of this, "expected"
	* properties that use lazy-initialization within the targets will show up in the
	* resultant delta.
	*/
	public struct function findMemoryLeaks() {

		var delta = variables.snapshots
			.map(
				( name, oldSnapshot ) => {

					var target = variables.targets[ name ];
					var newSnapshot = generateSnapshot( target );

					return( calculateSnapshotDelta( target, oldSnapshot, newSnapshot ) );

				}
			)
			.filter(
				( name, snapshotDelta ) => {

					return( ! snapshotDelta.isEmpty() );

				}
			)
		;

		return( delta );

	}

	// ---
	// PRIVATE METHODS.
	// ---

	/**
	* I calculate the delta between the two snapshots. If a state change has been made,
	* the delta is returned with the new state value.
	*
	* @target I am the target component that has been snapshotted.
	* @oldSnapshot I am the old state snapshot.
	* @newSnapshot I am the new state snapshot that is being compared.
	*/
	private struct function calculateSnapshotDelta(
		required any target,
		required struct oldSnapshot,
		required struct newSnapshot
		) {

		var delta = {};
		var targetVariables = getVariablesScope( target );

		// Look for keys that exist in the old snapshot but that are no longer present in
		// the new snapshot (this is likely never going to happen).
		for ( var key in oldSnapshot ) {

			if ( ! newSnapshot.keyExists( key ) ) {

				delta[ key ] = "[null]";

			}

		}

		// Look for new keys or existing keys that are different in the new snapshot.
		for ( var key in newSnapshot ) {

			// NOTE: The snapshot data uses simple value identifiers that can be safely
			// compared using the standard equality operators.
			if (
				! oldSnapshot.keyExists( key ) ||
				( oldSnapshot[ key ] != newSnapshot[ key ] )
				) {

				delta[ key ] = targetVariables[ key ];

			}

		}

		return( delta );

	}


	/**
	* I gather the initial set of snapshots for the collected targets.
	*/
	private struct function gatherSnapshots() {

		var initialSnapshots = variables.targets.map(
			( name, target ) => {

				return( generateSnapshot( target ) );

			}
		);

		return( initialSnapshots );

	}


	/**
	* I recursively inspect the given container, looking for ColdFusion components to
	* add to the targets collection so that state can be watched over time.
	*
	* @rootContainer I am the root container being inspected.
	*/
	private struct function gatherTargets( required struct rootContainer ) {

		var initialTargets = {};
		// Starting at the root container, the "recursion" will be powered by a queue of
		// containers, rather than true recursion, so that we don't create StackOverflow
		// problems (which we were seeing in production). As we examine each container,
		// new containers will be added to this queue.
		var containersToExplore = [ rootContainer ];

		// Continue pulling containers off the FRONT of the queue until we've run out of
		// new containers to inspect.
		while ( containersToExplore.isDefined( 1 ) ) {

			// SHIFT first container off of the queue.
			var container = containersToExplore.first();
			containersToExplore.deleteAt( 1 );

			// Iterate over the keys in this container - we're going to be looking for
			// keys that reference ColdFusion Components (ie, nested containers).
			for ( var key in container ) {

				// If the key is NULL for some reason, move onto the next key.
				if ( ! container.keyExists( key ) ) {

					continue;

				}

				var target = container[ key ];

				// If the key is definitely NOT a ColdFusion component, move onto the
				// next key.
				if ( ! isObject( target ) ) {

					continue;

				}

				// The isObject() function will return true for both components and Java
				// objects as well. As such, we need to go one step further to see if we
				// can get at the component metadata before we can truly determine if the
				// target is a ColdFusion component.
				try {

					var targetMetadata = getComponentMetaData( target );
					var targetName = targetMetadata.name;

				} catch ( any error ) {

					// An error indicates that either the metadata call failed; or, that
					// the results didn't contain a "name" property. In either case, this
					// isn't a value that we will know how to consume. Move onto the
					// next key.
					continue;

				}

				// If we've already inspected this target, move onto the next key.
				if ( initialTargets.keyExists( targetName ) ) {

					continue;

				}

				initialTargets[ targetName ] = target;

				// Recursively explore the target component for nested components that
				// may not have been accessible in the top-level collection of targets.
				containersToExplore.append( getVariablesScope( target ) );

			} // END: For-Loop (key in container).

		} // END: While-Loop (containersToExplore).

		return( initialTargets );

	}


	/**
	* I return a snapshot of the given target's variables scope.
	*
	* @target I am the target being inspected.
	*/
	private struct function generateSnapshot( required any target ) {

		var snapshot = getVariablesScope( target )
			.filter(
				( key, value ) => {

					// Skip types that are unlikely to be the result of a leak.
					if (
						isNull( value ) ||
						isObject( value ) ||
						isCustomFunction( value )
						) {

						return( false );

					}

					// Some of the native ColdFusion tags appear to dump debugging
					// information into the variables scope. Ignore these.
					if (
						( key == "cfquery" ) ||
						( key == "cflock" )
						) {

						return( false );

					}

					return( true );

				}
			)
			.map(
				( key, value ) => {

					return( getValueIdentifier( value ) );

				}
			)
		;

		return( snapshot );

	}


	/**
	* I get the value identifier for the given value. The result will be a "simple" value
	* that can be safely compared across snapshots.
	*
	* The current approach is based on the Java HashCode, which provides some level of
	* insight into the memory usage without putting too much burden on performance. This
	* is not a perfect approach; but, as a first pass, it should be OK.
	*
	* @value I am the value being identified.
	*/
	private string function getValueIdentifier( required any value ) {

		try {

			var valueIdentifier = value.hashCode();

		} catch ( any error ) {

			var valueIdentifier = 0;

		}

		// NOTE: A binary value tests as an "Array"; but, doesn't support the Array API.
		if ( isBinary( value ) ) {

			valueIdentifier &= ( ":" & arrayLen( value ) );

		} else if ( isStruct( value ) || isArray( value ) ) {

			valueIdentifier &= ( ":" & value.len() );

		}

		return( valueIdentifier );

	}


	/**
	* I return the variables scope for the given target.
	*/
	private struct function getVariablesScope( required any target ) {

		// Inject the spy method so that we will be able to pierce the private scope of
		// the target and observe the internal state. It doesn't matter if we inject this
		// multiple times; we're the only consumers.
		target.getVariablesScope__scope_spy = variables.getVariablesScope__scope_spy;

		return( target.getVariablesScope__scope_spy() );

	}


	/**
	* I return the VARIABLES scope in the current execution context.
	*/
	private any function getVariablesScope__scope_spy() {

		// CAUTION: This method has been INJECTED INTO A TARGETED COMPONENT and is being
		// executed in the context of that targeted component.
		return( variables );

	}

}
As you can see from the findMemoryLeaks() method, the basic idea here is that, for each old snapshot, we generate a new snapshot and then calculate a delta between the two values. This delta is then returned to the calling context.
In Brad's original code, he was calling .toString() on each value in order to snapshot it. I tried using that at first. But, it seemed to be quite slow, even in my local dev environment. I was also afraid that .toString() might increase the danger of reaching into a component's private memory-space and touching non-thread-safe values. Perhaps this fear was unfounded? In any case, I ended up using a combination of .hashCode() and object "lengths". My approach will miss deeply-nested changes; but, in terms of memory leaks, I don't think that it will matter that much.
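The reader comments at the end of this post point out that, for untrusted values, generic built-in functions like arrayLen() are safer than member methods like .len(). Below is a hypothetical sketch of getValueIdentifier() reworked along those lines; the name getSafeValueIdentifier() is illustrative only. It keeps the same hashCode()-plus-length approach, but the length half relies only on built-in functions.

```cfml
<cfscript>
	/**
	* HYPOTHETICAL variant of getValueIdentifier() that avoids member methods for the
	* length check, since a reflected value may not be a native CFML data-type.
	*/
	string function getSafeValueIdentifier( required any value ) {
		// Fall back to 0 if the underlying object can't provide a hash code.
		var valueIdentifier = 0;

		try {
			valueIdentifier = value.hashCode();
		} catch ( any error ) {
			// Keep the fallback identifier.
		}

		// Check binary FIRST: a binary value passes isArray(), but doesn't support the
		// Array member methods.
		if ( isBinary( value ) ) {
			valueIdentifier &= ( ":" & arrayLen( value ) );
		} else if ( isStruct( value ) ) {
			valueIdentifier &= ( ":" & structCount( value ) );
		} else if ( isArray( value ) ) {
			valueIdentifier &= ( ":" & arrayLen( value ) );
		}

		return( valueIdentifier );
	}

	writeDump( getSafeValueIdentifier( charsetDecode( "hello world", "utf-8" ) ) );
	writeDump( getSafeValueIdentifier( { a: 1, b: 2, c: 3 } ) );
</cfscript>
```

The part before the colon depends on whatever hashCode() returns at runtime, so only the length suffix after the colon is predictable.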
I really like this approach. It only shows you what has changed in your application state, which makes it much easier to narrow in on potential leaks. Brad's original code works for ColdBox, and mine works for manually-configured IoC. If you want to use an approach like this in your own application, you'll likely have to take the concept and tailor it to your own context. But, I hope that seeing this CFML code in Lucee 5.3.3.62 at least helps make that more of a possibility.
And, obviously, you should listen to the Modernize or Die podcast so that you can pick up cool tips and tricks like this!
CAUTION: Touching Memory You're Not Supposed To
This memory leak detector works by reaching into the private memory space of ColdFusion components. This is inherently a dangerous activity, especially since you are touching data that may or may not be thread-safe. In theory, the very act of inspecting that data (such as reading an Array's length) could deadlock the code. As such, this approach should not be taken lightly. Be judicious in how you use it.
Reader Comments
Very interesting! Going to see if I can get it working with our new web app to spot any problems before they get too big and I sob uncontrollably into my coffee. :-)
@Michael,
Very cool! I'm excited to see what your experience is. So far, I've identified one small leak in a production application. Unfortunately, it wasn't what was causing my issues (which are still a bit of a mystery).
@All,
As I was writing this code, I ran into an interesting edge-case in my Reflection-style code. When serializing the value, notice that I am using a special check for isBinary(). This is because the binary value will pass the isArray() check; but, won't provide CFML Array member methods: www.bennadel.com/blog/3753-passing-isarray-decision-function-does-not-ensure-member-methods-in-lucee-cfml-5-3-3-62.htm
The issue comes down to one of trust. By using member methods, I am trusting that the values I am working with are all native CFML data-types. But, in reflection-style code, I can't afford such trust. Really, I should have used the more generic functions like arrayLen() rather than .len() when testing untrusted values.
Ben, this could be very useful for me.
I have created a custom DI, which adds about 100 service components into the application scope. It uses an XML config file, like ColdSpring.
I think your utility should work with my system, because, if I remember correctly, component methods are set in the variables scope? Or maybe it is the this scope? I will try it out anyway.
@Charles,
Public methods live in the this scope; private methods live in the variables scope (which, I believe, also holds references to the public methods). At least, that was the case the last time I checked. My mental model on some of this stuff is really old and is based on Adobe ColdFusion, not necessarily Lucee CFML.
Ouch. Yes. The this scope makes sense for public methods.
Love it!
Just found a memory leak in my newly-threaded code, so might implement this.
@Harry,
Very cool - well, not cool re: memory leak - but, very cool that this might be able to help. Also, one thing to mention is that every app is different, so setting something like this up is easier in some contexts than in others. One of the things I will do these days, if I am building something new, is build a one-off Controller that looks at a very specific component or set of components.
For example, in my local FW/1 (Framework-One) app, I created a subsystem that starts with a .gitignore so that it doesn't get committed to the repository. And then, I just inject a snooper method and make sure nothing starts showing up:
Now, as I am developing myService, I can hit this Controller and dump out the variables scope. And, when I start to develop another service, I just swap out myService for whatever I am working on.
It's not as elegant as having a whole variable-diffing algorithm; but, sometimes, less elegant is exactly what you need ;)