Replacing Depth-First Recursion With A Breadth-First While-Loop In Lucee CFML 5.3.3.62
Recursion is a very powerful programming construct. And, I absolutely love recursion just as much as the next person (whether I'm dealing with recursive Promises in JavaScript, recursive templates in Angular, recursive components in Angular, or recursive algorithms in ColdFusion). But, recursion isn't the easiest concept to understand. It also requires are large "call stack" and isn't always fun to debug. As such, in certain circumstances, I find it helpful to replace recursion with a simple while
loop. This isn't always helpful; and it isn't always possible; but, I like it from time-to-time and I thought it might be worth sharing my approach in Lucee CFML 5.3.3.62.
Recursion, the way I understand it, is an algorithm that is defined, at least in part, in terms of itself. So, in a programming context, this is generally a Function whose result involves invoking itself. This self-referential invocation typically continues unfolding until an end-case is reached; at which point, all of the intermediary results are aggregated and returned.
To demonstrate this type of recursive programming, I tried to create a simple but real world example: taking a set of parent-child records (such as those that might be modeled in a relational database); and, using recursion to convert said collection of records into a Tree structure.
In the following example, a nodes
collection is passed into the generateTree()
function. This generateTree()
function then defines an internal closure, traverse(parentID)
, which performs a depth-first construction of the Tree structure using recursion. Which mean, the traverse()
closure calls itself in order to flesh-out the Tree:
<cfscript>
// Mock out some hierarchical data that uses a PARENT ID to store the relationship
// between values. The mock data is presented in linear form (akin to a set of
// relational database records).
nodes = [
{ id: "a1", parentID: "" },
{ id: "b1", parentID: "a1" },
{ id: "c1", parentID: "b1" },
{ id: "d1", parentID: "c1" },
{ id: "c2", parentID: "b1" },
{ id: "d2", parentID: "c2" },
{ id: "d3", parentID: "c2" },
{ id: "d4", parentID: "c2" },
{ id: "c3", parentID: "b1" },
{ id: "b2", parentID: "a1" },
{ id: "c4", parentID: "b2" },
{ id: "c5", parentID: "b2" },
{ id: "d5", parentID: "c5" },
{ id: "e1", parentID: "d5" },
{ id: "c6", parentID: "b2" },
{ id: "d6", parentID: "c6" },
{ id: "e2", parentID: "d6" },
{ id: "b3", parentID: "a1" },
{ id: "c7", parentID: "b3" }
];
echo( "<h2> Tree Generation Using Recursion </h2>" );
dump( generateTree( nodes ) );
// ------------------------------------------------------------------------------- //
// ------------------------------------------------------------------------------- //
/**
* I generate a Tree structure from the given set of hierarchical nodes. The root node
* of the resultant tree structure is returned.
*
* @nodes I am the linear collection of hierarchical data.
*/
public struct function generateTree( required array nodes ) {
// Since the hierarchical data relationships are represented through Parent IDs,
// let's create an index that allows us to quickly locate nodes by "parentID".
var parentIndex = groupBy( nodes, "parentID" );
// RECURSIVE FUNCTION / CLOSURE: I generate and return the collection of Tree
// Nodes that are children of the given parent ID.
var traverse = ( parentID ) => {
// Let's output the ID so we can see the path of traversal.
echo( "#parentID# " );
var childTreeNodes = [];
if ( parentIndex.keyExists( parentID ) ) {
// For each node associated with the given parent ID, we are going to
// create a new Tree Node and then RECURSIVELY generate the children for
// that tree node. This will perform a DEPTH-FIRST recursive traversal of
// the original node data.
for ( var node in parentIndex[ parentID ] ) {
childTreeNodes.append([
id: node.id,
children: traverse( node.id ) // <== RECURSION.
]);
}
}
return( childTreeNodes );
};
// Start the recursion with the "null parent ID". This will return the collection
// of child nodes for the given ID. Since we know that there will only be one
// root node, we can just return the first item.
return( traverse( "" ).first() );
}
// ------------------------------------------------------------------------------- //
// ------------------------------------------------------------------------------- //
/**
* I generate a look-up structure that is indexed by the given key. Each value is an
* aggregate of all values associated with the given key.
*
* @values I am the collection being indexed.
* @key I am the group-by property for each value.
*/
public struct function groupBy(
required array values,
required string key
) {
var index = {};
for ( var value in values ) {
var indexKey = value[ key ];
// NOTE: Using the Elvis Operator (null coalescing operator) to either get
// the current group; or, to create, assign, and store a new group. Then,
// appending the current value to the resultant group.
( index[ indexKey ] ?: index[ indexKey ] = [] )
.append( value )
;
}
return( index );
}
</cfscript>
As you can see, the traverse()
function constructs each node's children
property by storing the results of a recursive call to the traverse()
function. And, when we execute this ColdFusion code in Lucee CFML, we get the following output:
Notice that the IDs being output along the top indicate that this algorithm is performing a depth-first exploration of the hierarchical data.
Now, everyone is going to feel differently about this; but, for me, even with such simple code, I still have to take a pause and prepare my brain to start thinking recursively. And, just as this code creates an increasingly large call stack when it traverses down through the nodes, my brain has start drawing a recursive picture in my head so that I can understand how we arrive at the final result.
And this code has almost no logic in it! As the algorithm of each invocation becomes more complicated (such as in a more realistic scenario), it becomes harder for me to maintain a solid grasp of what exactly is going on. Such is the mental cost of recursion.
Sometimes, in order to make things a little easier for me to understand and maintain, I'll take a recursive algorithm like the one above and convert it into while loop. This approach is less elegant. But, I find this helpful as the logic around each iteration becomes more complicated; and, especially if there are a special cases around data-handling, such as when dealing with relationships that contain "circular references" that need to be handled without causing infinite recursion.
To do this, I'll replace the implicit call stack created through recursive invocation with an explicit FIFO (First In, First Out) queue that is both populated and consumed by the while
loop iteration. This creates an algorithm that is more verbose but, at least in some ways, fundamentally more simple.
The idea is to push the root record into the queue. Then, continue looping and shifting items off the front of the queue while the queue is non-empty. For each item shifted off the queue, find the children that it owns, and then push those children onto the back of the queue for exploration in a subsequent iteration of the loop.
Again, its a good deal more verbose; but, at least for me, it can create a algorithm that is a bit easier to visualize and debug:
<cfscript>
// Mock out some hierarchical data that uses a PARENT ID to store the relationship
// between values. The mock data is presented in linear form (akin to a set of
// relational database records).
nodes = [
{ id: "a1", parentID: "" },
{ id: "b1", parentID: "a1" },
{ id: "c1", parentID: "b1" },
{ id: "d1", parentID: "c1" },
{ id: "c2", parentID: "b1" },
{ id: "d2", parentID: "c2" },
{ id: "d3", parentID: "c2" },
{ id: "d4", parentID: "c2" },
{ id: "c3", parentID: "b1" },
{ id: "b2", parentID: "a1" },
{ id: "c4", parentID: "b2" },
{ id: "c5", parentID: "b2" },
{ id: "d5", parentID: "c5" },
{ id: "e1", parentID: "d5" },
{ id: "c6", parentID: "b2" },
{ id: "d6", parentID: "c6" },
{ id: "e2", parentID: "d6" },
{ id: "b3", parentID: "a1" },
{ id: "c7", parentID: "b3" }
];
echo( "<h2> Tree Generation Using A Loop </h2>" );
dump( generateTree( nodes ) );
// ------------------------------------------------------------------------------- //
// ------------------------------------------------------------------------------- //
/**
* I generate a Tree structure from the given set of hierarchical nodes. The root node
* of the resultant tree structure is returned.
*
* @nodes I am the linear collection of hierarchical data.
*/
public struct function generateTree( required array nodes ) {
// Since the hierarchical data relationships are represented through Parent IDs,
// let's create an index that allows us to quickly locate nodes by "parentID".
var parentIndex = groupBy( nodes, "parentID" );
// Since we know that there is only single node with a "null parent ID", we can
// explicitly construct the root Tree Node (which is the structure that will be
// returned from this function).
var rootTreeNode = [
id: parentIndex[ "" ].first().id,
children: []
];
// Instead of using RECURSION, we're going to use a QUEUE of tree nodes to
// explore. As new Tree Nodes get created, they will be queued-up for further
// exploration using a BREADTH-FIRST traversal.
var nodesToExplore = [ rootTreeNode ];
// Continue shifting Tree Nodes off of the exploration queue until we've
// completely traversed the entire set of data.
while ( nodesToExplore.isDefined( 1 ) ) {
// SHIFT the first item off of the exploration queue.
var treeNode = nodesToExplore.first();
nodesToExplore.deleteAt( 1 );
// Let's output the ID so we can see the path of traversal.
echo( "#treeNode.id# " );
// If there are no child nodes associated with this parent ID, move onto the
// next tree node.
if ( ! parentIndex.keyExists( treeNode.id ) ) {
continue;
}
// For each node associated with the given parent ID, we are going to create
// a new Tree Node and then push it onto the exploration queue. This new Tree
// Node will then be populated in a subsequent BREADTH-FIRST iteration of
// this while-loop.
for ( var childNode in parentIndex[ treeNode.id ] ) {
treeNode.children.append([
id: childNode.id,
children: []
]);
// Queue-up new tree node for subsequent traversal.
nodesToExplore.append( treeNode.children.last() );
}
}
return( rootTreeNode );
}
// ------------------------------------------------------------------------------- //
// ------------------------------------------------------------------------------- //
/**
* I generate a look-up structure that is indexed by the given key. Each value is an
* aggregate of all values associated with the given key.
*
* @values I am the collection being indexed.
* @key I am the group-by property for each value.
*/
public struct function groupBy(
required array values,
required string key
) {
var index = {};
for ( var value in values ) {
var indexKey = value[ key ];
// NOTE: Using the Elvis Operator (null coalescing operator) to either get
// the current group; or, to create, assign, and store a new group. Then,
// appending the current value to the resultant group.
( index[ indexKey ] ?: index[ indexKey ] = [] )
.append( value )
;
}
return( index );
}
</cfscript>
Now, if we run this ColdFusion code through Lucee CFML, we get the following output:
As you can see, the resultant Tree structure is exactly the same. But, if you look at the IDs that were echo()
'ed along the top, you'll see that this algorithm performed a breadth-first traversal of the tree (unlike the recursive algorithm which performed a depth-first traversal). This is neither a good thing nor bad thing - I'm simply pointing out the difference in implementation details.
Now you might look at this code and think that I'm insane to argue that this is in any way simpler than the recursive algorithm. But, consider the fact that each value defined within this version can be understood on its own. Once you see how the code is working, you don't need to keep a complex mental model in your head - you just think about the data structure and how it is being populated across the loop iterations.
I also find that debugging this approach is easier because, at any moment, you can dump out the current values along with the FIFO queue and see the entire state of everything. Debugging a recursive state is harder because the "state" of the data is spread across an increasingly-deep call stack.
I believe there are also memory implications to this switch as well. When using a recursive algorithm, all of the data in the intermediary call stack items needs to remain allocated until the call stack can start to be "unwound" once the recursion hits end-points and begins to start gathering results. Conversely, in the while
loop algorithm, intermediary data only needs to remain allocated during a single iteration. Once the iteration moves onto the next queued item, the previous structures (if any) can be deallocated.
CAUTION: The above is just my theory - I don't actually know how memory gets consumed and freed-up. What I can say is that while I have run into "Too Much Recursion" errors, I never run into "Too Much Looping" errors when switching to this alternate approach.
Now, I'm not trying to argue that all recursive calls should be replaced. As I said in the intro, I love recursion! And, I find that it makes some very complicated data manipulation much easier. But, there are cases where my brain isn't strong enough to handle the intensity of the recursive model. And, in those cases, I find that converting to a linear while
loop is beneficial. Your mileage may vary.
Want to use code from this post? Check out the license.
Reader Comments
@All,
After this post, fellow InVisioneer, Kevin Conway, told me that I could use either depth-first or breadth-first strategies by simply changing the way in which I add new values to the internal queue. This wasn't obvious to me, so I wanted to dig into that concept a bit further:
www.bennadel.com/blog/3751-depth-first-vs-breadth-first-tree-traversal-using-a-while-loop-in-lucee-cfml-5-3-3-62.htm
In this follow-up post, I use XML Tree traversal as the context.
I can't tell you how clever the groupBy function is? I dare anyone to say that they can understand the following part, instantly:
It took me quite a bit of debugging with TryCF.com, to work this out!
This is an example of why I find your tutorials addictive.
I can say that it would have taken me several more lines of code to write the groupBy function.
It is great to see how really succinct code is written. I will try and incorporate these kinds of techniques into my code, in future.
I am also loving Lucee's modern CFML:
Rather than:
And
Rather than:
The only difficulty I have with Lucee's version of CFML, is that I like to make my applications cross compatible with both ACF & Lucee.
So, I always end up writing my code the long winded ACF way.
Things like:
Are not part of ACF!
@Charles,
Yeah, I go back and forth on how much I care about the compatibility of the script. Since my day job is writing an application in Lucee, that's where I focus most of my energy these days. I assume most of the differences can be more-or-less easily made compatible; though, Lucee does have one or two things, like the
compress()
method for Zipping directories that I think would take a bit more elbow-grease to make work in Adobe ColdFusion.I'm glad you are liking these little hidden fun parts of my code. I do take pause when adding funky little short-hands; but, since the main point of the post wasn't the
groupBy()
method, then, I figure its just a little fun aside that shouldn't distract to much from the overall point.Happy to hear this is fun to read :D
Ben.There was one question I wanted to ask you about recursion.
Is it more efficient to pass an accumulated value, like:
Or, use a closure to cache the value, like:
@Charles,
Hmmm. I would assume that the closure is tiny bit better since you don't have to include an INT in each item on the call-stack in the recursion. But, that's just a complete guess. Ultimately, the Function calls themselves will be the limiting factor on performance as I assume they are the most expensive part of all of this - more so than allocating an INT.
Also, it's probably healthy to step-back and remember not to over-optimize things. I got started on this little rabbit-hole because I was actually running into strange performance issues with recursion over massive values. As such, it was prudent to replace those. But, there's nothing inherently wrong with recursion. If it works, it works.
@All,
On a related note, I just published an article about using a Closure to separate the recursive logic from the consumption logic:
www.bennadel.com/blog/3755-using-a-closure-to-encapsulate-depth-first-tree-traversal-in-lucee-cfml-5-3-3-62.htm
In my earlier examples, the recursion and the consumption were all tightly commingled / coupled together. But, by passing-in a Closure, we can create a reusable algorithm with very little performance overhead.