Considering A Stale-While-Revalidate Pattern To Caching In ColdFusion
In a recent episode of Syntax.fm, Scott and Wes discussed HTTP caching headers. From their discussion, I learned that there is an experimental value called stale-while-revalidate. And, while this post isn't about HTTP caching, their discussion got me thinking about different ways in which I might manage a server-side cache of data that needs to be kept in sync with a remote data source. I wanted to do some noodling on what a "stale while revalidate" workflow might look like in ColdFusion.
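For context only, here is a minimal sketch of how that HTTP value might be sent from a ColdFusion page. The numbers (60 and 300 seconds) are arbitrary assumptions for illustration, not values from the podcast episode:

```cfml
<cfscript>
	// Sketch only; the durations here are made-up values.
	// max-age=60 : the response is considered fresh for 60 seconds.
	// stale-while-revalidate=300 : for up to 300 seconds after that, a cache
	// may serve the stale copy while it revalidates in the background.
	cfheader(
		name = "Cache-Control",
		value = "max-age=60, stale-while-revalidate=300"
	);
</cfscript>
```

The rest of this post applies the same idea to a server-side, in-memory cache rather than to HTTP responses.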
The concept, "stale while revalidate", usually means that your request to read a value is served immediately from a cache. Then, a non-blocking background request is spawned to fetch the latest value from the "source of truth". This allows subsequent requests to receive a more up-to-date version of the value while keeping every read extremely fast, since all reads are fulfilled using locally-cached data.
This Is An Optimization
To be clear, this is an optimization technique. Instead of reading from a cached value, we could always fetch data from the "source of truth" when fulfilling the request. In fact, trying to read from a cached value adds a good bit of complexity. As such, I would likely only attempt to use a stale-while-revalidate pattern if I was already seeing latency issues; or, if I was expecting to have a high-throughput Read scenario and I wanted to keep the emotional cost of performing that Read low for the developers.
Cached Data May Necessitate Synchronization
If I'm going to be reading from a cache, it almost certainly means that I'm going to be sharing that cache across multiple, concurrent requests. As such, there is likely going to be some need for synchronized access to the cached data. There's no "one size fits all" here since the type and shape of the cached data will drive the synchronization requirements.
For example, Struct access in ColdFusion is already synchronized - meaning, it is thread-safe by default. Here's a quote from the Lucee CFML doc:
Note, the type "synchronized" is no longer supported and will be ignored; all struct/scopes are "thread safe" since version 4.1.
Arrays are, apparently, also synchronized by default; though, I've definitely hit an Array iteration deadlock in earlier versions of Lucee CFML. And, fun fact: I just learned while writing this post that arrayNew() allows you to pass in an argument for creating non-synchronized arrays.
That said, data-structure access isn't the only potential reason for synchronization; but, getting into all the reasons you might want to lock access is beyond the scope of this post.
Using runAsync() For Our Background Revalidate Operation
For this exploration, I'm going to use the runAsync() function to implement the background cache refresh. I honestly have next-to-no experience with runAsync() - the last time that I used it was in 2019. Normally, I just use the CFThread tag; or, run parallel iterations over Arrays.
That said, the ergonomics of runAsync() feel a bit lighter; though, I don't actually know if there's much of a technical difference between the task runner for this function and the task runner for the CFThread tag. In this exploration, I chose runAsync() specifically because it returns a value (a Future) that I can store. This makes it easier to know if I've already requested a background refresh.
ASIDE: The CFThread tag stores a reference to the spawned thread in the cfthread scope. As such, I could have implemented this experiment with a CFThread tag; but, it's just easier to use the returned value from runAsync().
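To make that aside concrete, here is a rough sketch of what a CFThread-based variant of the revalidation method might look like inside a component that keeps its cached value in variables.currentValue. This is not the approach used in this post; the thread name is a hypothetical value, and I'm assuming that the cfthread scope can be checked by name to see whether a thread was already spawned in the current request:

```cfml
/**
* Hypothetical CFThread-based variant of the background revalidation method.
*/
private void function revalidateInBackground() {
	// Assumption: a fixed thread name that is unique within the request.
	var threadName = "flagRevalidateThread";

	// The cfthread scope references the threads spawned in this request, so the
	// presence of the name means that a refresh has already been requested.
	if ( structKeyExists( cfthread, threadName ) ) {
		return;
	}

	try {
		thread name="#threadName#" action="run" {
			try {
				// !! Simulating some sort of background flag change. !!
				variables.currentValue = ! variables.currentValue;
			} catch ( any error ) {
				systemOutput( "ERROR REVALIDATING FLAG!!!!", true, true );
			}
		}
	} catch ( any error ) {
		// Swallow thread-spawning errors for now...
		systemOutput( "ERROR: thread could not be spawned.", true, true );
	}
}
```

The shape is nearly identical to the runAsync() approach; the main difference is that the "already requested" check reads from the cfthread scope instead of from a stored Future.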
To keep things simple, my cached value is going to be a Boolean. And, our background cache revalidation will simply flip the Boolean value. I'm going to scope the revalidation to the current ColdFusion Request. Meaning, only the first read of the value in a given ColdFusion Request will spawn the background check - subsequent reads of the value in the same request will just read from the cache and will trigger no other actions.
Here's my demo ColdFusion component, Flag.cfc. It only exposes a single method - getValue() - which returns the currently-cached value and then spawns the background cache revalidation:
component
	output = false
	hint = "I provide access to a flag value which is refreshed asynchronously."
	{

	/**
	* I initialize the flag value.
	*/
	public void function init() {

		variables.currentValue = false;

	}

	// ---
	// PUBLIC METHODS.
	// ---

	/**
	* I get the CURRENT flag value, which may be STALE. Accessing the flag value MAY
	* trigger a BACKGROUND FETCH to re-cache the latest value.
	*/
	public boolean function getValue() {

		revalidateInBackground();

		// NOTE: For the sake of this demo, we're going to consider this value to be a
		// thread-safe value. As such, we're not going to apply any locking around its
		// access despite it being cached in-memory and shared across requests.
		return( currentValue );

	}

	// ---
	// PRIVATE METHODS.
	// ---

	/**
	* I spawn an asynchronous thread to revalidate the flag in the background.
	*/
	private void function revalidateInBackground() {

		var futureKey = "$$flagAsyncFuture";

		// Only run the async validation once per request (an optimization).
		// --
		// NOTE ON LOCKING: Normally, when I only want to do something once, I would add a
		// double-check lock around it. However, in this case, since the scope of the
		// FETCH contention is a single request, I'm not going to worry about it. More
		// than likely, there will be no race-condition (again, this is scoped to the
		// request); and, even if there is contention in this case, the worst-case
		// scenario is that we run the asynchronous check more than once, which is fine.
		if ( request.keyExists( futureKey ) ) {

			return;

		}

		try {

			// CAUTION: We do not want to use the `.error()` method on the runAsync()
			// result because doing so will turn it from an ASYNCHRONOUS call into a
			// BLOCKING / SYNCHRONOUS call. Instead, we're going to use a try/catch block
			// inside the closure so that we can retain the asynchronicity while still
			// catching errors internally.
			request[ futureKey ] = runAsync(
				() => {

					try {

						systemOutput( "Revalidating in runAsync() closure.", true );

						// !! Simulating some sort of background flag change. !!
						variables.currentValue = ! variables.currentValue;

					} catch ( any error ) {

						systemOutput( "ERROR REVALIDATING FLAG!!!!" );

					}

				}
			);

		// Catch any errors when spawning the thread that powers the runAsync() function.
		// --
		// CAUTION: I AM NOT SURE that this is strictly necessary. I don't know how the
		// thread-pool exhaustion will manifest in the calling context. That said, I
		// usually wrap my thread-spawning code in a try/catch block since I know that the
		// CFThread tag will sometimes throw an error when no thread can be spawned.
		} catch ( any error ) {

			// Swallow thread-pool exhaustion errors for now...
			systemOutput( "ERROR: runAsync() could not obtain thread.", true, true );

		}

	}

}
As you can see, in my revalidateInBackground() method, I'm checking to see if the Future has already been stored in the request scope; and, if so, I short-circuit the method. This should ensure that - in the vast majority of cases - only a single background refresh will be performed in a given ColdFusion request.
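If the "only once" guarantee ever mattered more than it does in this demo, the double-check lock mentioned in the code comments might look something like the following sketch. The request-scoped lock, the one-second timeout, and the throwOnTimeout setting are all assumptions that I'm making for illustration; I have not load-tested this:

```cfml
/**
* Hypothetical variant of revalidateInBackground() with a double-check lock.
*/
private void function revalidateInBackground() {
	var futureKey = "$$flagAsyncFuture";

	// First check (no lock): the common case after the first read in a request.
	if ( request.keyExists( futureKey ) ) {
		return;
	}

	// Assumption: an exclusive, request-scoped lock with a short timeout. If the
	// lock cannot be obtained in time, the body is skipped, which is fine since
	// it means another thread in this request is already spawning the refresh.
	lock scope="request" type="exclusive" timeout="1" throwOnTimeout="false" {
		// Second check (inside the lock): another thread may have stored the
		// Future while we were waiting for the lock.
		if ( request.keyExists( futureKey ) ) {
			return;
		}

		request[ futureKey ] = runAsync(
			() => {
				try {
					// !! Simulating some sort of background flag change. !!
					variables.currentValue = ! variables.currentValue;
				} catch ( any error ) {
					systemOutput( "ERROR REVALIDATING FLAG!!!!", true, true );
				}
			}
		);
	}
}
```

The first check keeps the common path lock-free; the lock is only ever contended on the first read within a request.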
The error handling here is a bit of a blind spot for me. I know that I can't add the .error() handler since that will turn the runAsync() workflow into a blocking request, which is exactly what we don't want. I've attempted to cover my bases by wrapping both the thread-spawning code and the internal operator code in a try/catch.
That said, assuming that I've cached this ColdFusion component in the application scope, let's look at the console logging when we make several calls to the getValue() method within a single request:
<cfscript>

	systemOutput( "+ + + + + + + + + + + + + + + ", true );
	systemOutput( "Starting Demo Request.", true );

	systemOutput( "Checking flag...", true );
	systemOutput( "--> Flag: [ #application.flag.getValue()# ]", true );

	systemOutput( "Checking flag...", true );
	systemOutput( "--> Flag: [ #application.flag.getValue()# ]", true );

	systemOutput( "Checking flag...", true );
	systemOutput( "--> Flag: [ #application.flag.getValue()# ]", true );

	systemOutput( "Ending request.", true );
	systemOutput( "- - - - - - - - - - - - - - - ", true );

</cfscript>
Because we are caching the runAsync() Future in the request scope, only one of the preceding 3 calls should spawn a background cache revalidation. And, in fact, when we run this ColdFusion code, we get the following logs:
As you can see, for each ColdFusion request, we only see one log from within the runAsync() callback, despite the fact that we're making 3 calls to the getValue() method in each request. We can also see that the value of the flag is being flipped on each request, but not between getValue() calls. This is because it's being served from the local cache for the duration of the request (mostly).
CAUTION: Even though it is working out this way in my demo, the runAsync() callback doesn't necessarily run at the end of the current request. It just doesn't block. If other aspects of your request block (such as a database query), there's a distinct possibility that your runAsync() callback will be invoked before the end of the parent request. As such, there's a possibility that the state of the cached value will be "revalidated" within the bounds of a single request.
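One way to poke at this caveat is to simulate blocking work between two reads. In the following sketch, the sleep() call is a stand-in for something like a slow database query, and the 250ms duration is an arbitrary assumption. The two reads may or may not differ depending on when the background callback actually runs:

```cfml
<cfscript>

	// The first read returns the cached value and spawns the background refresh.
	firstRead = application.flag.getValue();

	// Simulate blocking work (such as a slow database query) in this request.
	sleep( 250 );

	// By now, the runAsync() callback has likely (but not necessarily) flipped
	// the flag, so this read may see the "revalidated" value mid-request.
	secondRead = application.flag.getValue();

	systemOutput( "First: [ #firstRead# ], Second: [ #secondRead# ]", true );

</cfscript>
```

If a request needs a consistent view of the value from start to finish, it would have to read the value once and hold on to it rather than calling getValue() repeatedly.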
Again, to be clear, I've never done this in a production scenario before. I'm just starting to noodle on what this would look like in a ColdFusion application. It's a bit challenging to think through the details without a real-world use-case. But, sometimes you gotta start with the building blocks before you can assemble the larger concepts.
Epilogue: Keeping Cached Values In-Sync
This is fundamentally a cache synchronization issue. And, there are many ways to keep a cache up-to-date. For example, we could use a ColdFusion scheduled task to periodically read from the "source of truth" and then update the local cache (simple, but slower). Or, we could use some sort of Publish and Subscribe (Pub/Sub) mechanism to be alerted to the changed values (complex, but fast).
What I like about the "stale while revalidate" approach is that it is relatively simple. In fact, it's really just one step beyond not caching at all: we could revert the workflow to not use caching and the change would be completely transparent to the calling context. As such, it seems to strike a nice balance between performance, complexity, and the latency of eventual consistency.
Reader Comments
Interesting, Ben. Thanks for sharing. I'd heard the episode yesterday and the discussion of that concept caught my attention as well. :-)
But where you've gone pretty low-level and are caching at the component level--indeed at the variable level--I had heard it and wondered how one might take advantage of the idea at the page content (or partial content) level. For either, one might consider cfml caching features.
But for page-level content caching, one might consider (or already be) fronting their cf or web server with a caching solution like squid or varnish (or even nginx).
And so I'd wondered whether and how these other caching options might support this stale-while-revalidate notion, or could be made to. But I'd not gotten beyond that "wondering". :-) So I thought I'd chime in here with this idea if anyone else might either already have more to share or might dig into it themselves.
Would seem a good extension of what you started here, though I suppose the discussion could grow enough to be worth its own post. Then again, maybe no one else will have a word to say, pro or con. :-)
@Charlie,
So, from a "page level" caching perspective, it was actually the HTTP Cache-Control header that got me thinking about this in the first place. There is a stale-while-revalidate header value that will allow the browser (as I understand it) to do this naturally. ie, to use the locally-cached version and then check for an update in the background. Of course, that's on a per-user basis, not a per-page basis since it's controlled by the client.
That said, if the request was going through something like CloudFlare, I wonder if the CDN would also follow the caching headers for how it handles point-of-presence caching?? Not sure.
I'm pretty sure that back-in-the-day, Ray Camden had a ColdFusion custom tag that would cache the thisTag.generatedContent output of the custom tag. So, that would do a sort of "run once" of the custom tag and then use the cache going forward. I wonder if there's a way to re-run the custom tag "body" in the background?
Anyway, just lots of rando thoughts going through my head now :D
@All,
After this post, I started to think about shared access of variables in native ColdFusion data types; and, I started to wonder if my mental model for iteration over shared data structures is out of date. As such, I wanted to do some experimentation in ACF + Lucee CFML:
www.bennadel.com/blog/4289-updating-my-mental-model-for-shared-array-struct-iteration-in-coldfusion.htm
From what I can see, read-only iteration over Arrays and Structs is thread safe (at least by default - you can ask ColdFusion to create non-thread-safe arrays).