Creating A Simple ColdFusion Cache With Java Soft-References
After attending CFUNITED 2010, I was very inspired to make use of more caching. Caching just seems to be the hot topic these days. Terracotta's Ehcache is rocking out under the ColdFusion 9 installation; Railo is building all kinds of multi-cache and cluster cache support into their future versions. Caching is just so hot right now! Unfortunately, I haven't been jumping on the bandwagon as quickly as I would like. And, with over 1000 "request timeout" blog errors in my inbox each morning, the time to start experimenting is certainly now.
My VPS still runs on ColdFusion 8; so unfortunately, I am not able to take advantage of all the excellent caching functionality built into ColdFusion 9. Building your own ColdFusion caching mechanism is simple enough; but, with RAM in limited supply - I only have 1GB on my VPS - I am nervous that I will be caching beyond my machine's ability. As such, I decided to take my previous caching demonstration and mix in some Java SoftReference usage such that I can cache as much as I want and the JVM will simply delete items as it becomes necessary (at least theoretically).
In Java, a SoftReference object (java.lang.ref.SoftReference) allows us to reference information in our application in such a way that the reference itself will not keep that information from being garbage collected when the JVM needs to reclaim memory. According to the Java Docs:
All soft references to softly-reachable objects are guaranteed to have been cleared before the virtual machine throws an OutOfMemoryError. Otherwise no constraints are placed upon the time at which a soft reference will be cleared or the order in which a set of such references to different objects will be cleared. Virtual machine implementations are, however, encouraged to bias against clearing recently-created or recently-used soft references.
As such, I should be able to throw as much as I want in my SoftReference cache without having to worry about running out of RAM.
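To make that contract concrete, here is a minimal Java sketch. The class name SoftReferenceDemo is made up for illustration. A value is wrapped in a SoftReference and read back with get(). A null from get() means the referent is gone. The explicit clear() call only stands in for what the garbage collector would do on its own under memory pressure, so the output is deterministic.

```java
import java.lang.ref.SoftReference;

// Hypothetical demo class; clear() simulates the GC clearing the referent.
class SoftReferenceDemo {

    // Wraps a value in a SoftReference and reports what get() returns
    // before and after the referent has been cleared.
    static String[] lifeCycle(String value) {
        SoftReference<String> ref = new SoftReference<>(value);

        // While the referent has not been cleared, get() hands back the original object.
        String before = ref.get();

        // The GC may do this by itself when memory runs low; doing it by hand
        // keeps the example deterministic.
        ref.clear();

        // Once cleared, get() returns null, which callers must treat as a cache miss.
        String after = ref.get();

        return new String[] { before, after };
    }

    public static void main(String[] args) {
        String[] result = lifeCycle("cached value");
        System.out.println("before clear: " + result[0]); // before clear: cached value
        System.out.println("after clear:  " + result[1]); // after clear:  null
    }
}
```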
Aside from its init() constructor, the SoftCache.cfc ColdFusion component that I came up with has only three methods:
- CacheData( key, data [, cacheUntil ] )
- DeleteData( key )
- GetData( key )
At first, I was also including a CacheExists( key ) method. However, I realized that such a method would create a potential race condition between the cacheExists() method call and the subsequent getData() method call, because the entry could be deleted or garbage collected in between. As such, I switched over to simply calling getData() first and then checking to see if the returned value was null (indicating that the value needed to be re-cached). This approach bypasses the need for any locking, since the worst-case scenario is that I might unnecessarily overwrite the given cache key from time to time (clearly not a critical failure).
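Here is the same get-then-check idea sketched in Java, assuming a plain ConcurrentHashMap. The GetThenCheck class and its method names are hypothetical and are not part of SoftCache.cfc. With a separate exists check, another request can delete the entry between the check and the read. Reading once and treating null as a miss closes that gap without locks. The cost is the one described above: two requests may both compute the value, and one overwrites the other.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical class illustrating "read once, treat null as a miss".
class GetThenCheck {

    // Note: ConcurrentHashMap rejects null values, so producers must not return null.
    private final Map<String, Object> cache = new ConcurrentHashMap<>();

    // Single read instead of exists-then-get: there is no window in which the
    // entry can vanish between a check and the read that relies on it.
    Object getOrCompute(String key, Supplier<Object> producer) {
        Object value = cache.get(key);
        if (value == null) {
            // Miss: build the value outside the cache, then store it. Under
            // contention, two callers may both get here; the later put() simply
            // overwrites the earlier one, which is harmless for cached data.
            value = producer.get();
            cache.put(key, value);
        }
        return value;
    }

    void deleteData(String key) {
        cache.remove(key); // no error if the key is absent
    }
}
```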
Here is the SoftCache.cfc ColdFusion component that I ended up with:
SoftCache.cfc
<cfcomponent
    output="false"
    hint="I handle soft reference caching using Java's java.lang.ref.SoftReference class.">
    <cffunction
        name="init"
        access="public"
        returntype="any"
        output="false"
        hint="I initialize the component.">

        <!---
            Create our internal cache. Cache entries will be stored
            in this struct by key-index.
        --->
        <cfset variables.cache = {} />

        <!--- Return this object reference. --->
        <cfreturn this />
    </cffunction>

    <cffunction
        name="cacheData"
        access="public"
        returntype="any"
        output="false"
        hint="I put the given item in the cache, optionally until the given date/time.">

        <!--- Define arguments. --->
        <cfargument
            name="key"
            type="string"
            required="true"
            hint="I am the data key used to index this cache entry."
            />

        <cfargument
            name="data"
            type="any"
            required="true"
            hint="I am the data item being cached."
            />

        <cfargument
            name="cacheUntil"
            type="string"
            required="false"
            default=""
            hint="I am the optional date/time after which this cache entry expires (an empty string means it never expires)."
            />

        <!--- Define the local scope. --->
        <cfset var local = {} />

        <!--- Create the cache item. --->
        <cfset local.cacheItem = {
            data = arguments.data,
            cacheUntil = arguments.cacheUntil
            } />

        <!---
            Create a soft reference cache entry such that the
            JVM's garbage collector can clear this pointer
            if it needs to free up some RAM.
        --->
        <cfset local.cacheEntry = createObject(
            "java",
            "java.lang.ref.SoftReference"
            ).init( local.cacheItem )
            />

        <!--- Store the entry in our internal cache. --->
        <cfset variables.cache[ arguments.key ] = local.cacheEntry />

        <!--- Return this object reference for chaining. --->
        <cfreturn this />
    </cffunction>

    <cffunction
        name="deleteData"
        access="public"
        returntype="any"
        output="false"
        hint="I delete the cache entry at the given key.">

        <!--- Define arguments. --->
        <cfargument
            name="key"
            type="string"
            required="true"
            hint="I am the key of the cache entry being deleted."
            />

        <!---
            Delete the key. It doesn't much matter whether it exists
            at this point, as structDelete() won't throw an error for
            non-existing keys.
        --->
        <cfset structDelete( variables.cache, arguments.key ) />

        <!--- Return this object reference for chaining. --->
        <cfreturn this />
    </cffunction>

    <cffunction
        name="getData"
        access="public"
        returntype="any"
        output="false"
        hint="I return the cached data (or null if the given cache item doesn't exist).">

        <!--- Define arguments. --->
        <cfargument
            name="key"
            type="string"
            required="true"
            hint="I am the key of the target cache entry."
            />

        <!--- Define the local scope. --->
        <cfset var local = {} />

        <!---
            Check to see if the cached item even exists in our
            internal cache.
        --->
        <cfif !structKeyExists( variables.cache, arguments.key )>

            <!--- The cache entry could not be found. --->
            <cfreturn />

        </cfif>

        <!---
            If we have gotten this far, the cache entry exists.
            However, its data may no longer exist (the soft
            reference may have been cleared by the garbage
            collector). Get the cache item into the local scope.

            NOTE: Wrap this in a Try/Catch since there is a slight
            race condition between the previous key check and this
            key reference (another request may delete the key in
            between).
        --->
        <cftry>

            <!--- Get the cache item from the cache entry. --->
            <cfset local.cacheItem = variables.cache[ arguments.key ].get() />

            <!--- Catch any errors. --->
            <cfcatch>

                <!---
                    The cache entry was deleted between the key
                    check and the get() method call. Return null.
                --->
                <cfreturn />

            </cfcatch>
        </cftry>

        <!---
            Check to see that the cache item was NOT garbage collected
            and has not expired (based on the cacheUntil date). If it
            was garbage collected, the previous get() call will have
            returned null, which leaves local.cacheItem undefined.
        --->
        <cfif (
            structKeyExists( local, "cacheItem" ) &&
            (
                !isNumericDate( local.cacheItem.cacheUntil ) ||
                (local.cacheItem.cacheUntil gte now())
            ))>

            <!--- Return the cached data. --->
            <cfreturn local.cacheItem.data />

        <cfelse>

            <!---
                The cache item was garbage collected or the
                cacheUntil date has passed. In either case, let's
                clear out the soft reference from our cache.
            --->
            <cfset structDelete( variables.cache, arguments.key ) />

            <!--- Return null. --->
            <cfreturn />

        </cfif>
    </cffunction>
</cfcomponent>
As you can see, the code here is mostly straightforward; rather than caching our data directly, we are wrapping it inside of a Java SoftReference object that will enable the JVM to garbage collect the values if necessary. This proxy requires us to jump through a few more hoops; but in the end, it should alleviate our over-caching fears.
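For readers more comfortable in Java, here is a rough, hypothetical port of SoftCache.cfc. It shows the same wrapping logic without the ColdFusion scaffolding. It is not something ColdFusion generates. It assumes java.time.Instant for cacheUntil, with null standing in for the CFC's empty-string "never expires" default.

```java
import java.lang.ref.SoftReference;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical Java analogue of SoftCache.cfc (names mirror the CFC for readability).
class SoftCache {

    // What the soft reference points at: the data plus an optional expiry (null = never).
    private static final class CacheItem {
        final Object data;
        final Instant cacheUntil;

        CacheItem(Object data, Instant cacheUntil) {
            this.data = data;
            this.cacheUntil = cacheUntil;
        }
    }

    // Keys and SoftReference wrappers are held strongly; only the CacheItem is softly reachable.
    private final Map<String, SoftReference<CacheItem>> cache = new ConcurrentHashMap<>();

    public SoftCache cacheData(String key, Object data) {
        return cacheData(key, data, null);
    }

    public SoftCache cacheData(String key, Object data, Instant cacheUntil) {
        cache.put(key, new SoftReference<>(new CacheItem(data, cacheUntil)));
        return this; // allow chaining, like the CFC
    }

    public SoftCache deleteData(String key) {
        cache.remove(key); // no error if the key is absent
        return this;
    }

    // Returns the cached data, or null if it is missing, was collected, or has expired.
    public Object getData(String key) {
        SoftReference<CacheItem> entry = cache.get(key); // single read, so no try/catch needed
        if (entry == null) {
            return null;
        }
        CacheItem item = entry.get(); // null if the GC cleared the referent
        if (item != null && (item.cacheUntil == null || !Instant.now().isAfter(item.cacheUntil))) {
            return item.data;
        }
        // Collected or expired: remove only this exact entry, so a value that another
        // caller re-cached a moment ago under the same key is left alone.
        cache.remove(key, entry);
        return null;
    }
}
```

The main design difference from the CFC is in getData(). It reads the map once instead of checking the key first, so the try/catch goes away. Also, `remove(key, entry)` is a conditional delete, so it cannot throw away a fresher entry. As in the CFC, a cleared entry's key and wrapper stay in the map until the next getData() call for that key.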
To see this SoftReference cache in action, I set up a simple test script:
<!--- Get a reference to the cacher. --->
<cfset cache = application.cache />

<!--- Check to see if we should clear this cached item. --->
<cfif structKeyExists( url, "clear" )>

    <!--- Delete the cached data. --->
    <cfset cache.deleteData( "date" ) />

</cfif>

<!--- Get the cached date. --->
<cfset cachedDate = cache.getData( "date" ) />

<!---
    Since the cached date might not be cached yet OR may have expired
    OR may have been garbage collected, let's check to see if the
    previous getData() method call returned null (removing the local
    variable reference).
--->
<cfif !structKeyExists( variables, "cachedDate" )>

    <!--- The date needs to be re-cached. Create the raw value. --->
    <cfset cachedDate = now() />

    <!--- Cache the date value. --->
    <cfset cache.cacheData(
        key = "date",
        data = cachedDate,
        cacheUntil = dateAdd( "s", 10, now() )
        ) />

</cfif>

<cfoutput>
    <p>
        Now: #timeFormat( now(), "hh:mm:ss TT" )#
    </p>
    <p>
        Cached Now: #timeFormat( cachedDate, "hh:mm:ss TT" )#
    </p>
    <p>
        <a href="#cgi.script_name#?clear=1">Clear Cache</a>
    </p>
</cfoutput>
As you can see here, this simply caches a date/time object and outputs it to the screen. As I explained above, my general approach is to try to get the data value out of the cache before checking to see if it exists. In this way, there is no critical race condition that would be able to corrupt my data - it either returns or it doesn't. And, if it doesn't, I always define it externally to the cache before caching it.
Most of the information that I will be caching consists of ColdFusion queries; as such, the question might arise: why not just use CFQuery's cachedWithin attribute? I chose to go with variable-based caching rather than query-based caching such that I could invalidate one or more cached items based on user actions. For example, I have a cached "comments" query for each blog post. This query needs to be "invalidated" every time a user posts a new comment to the given blog entry. I am sure there are easy ways to invalidate a specific query cache; but, I don't know them off-hand. As such, I chose to go with a more explicit variable cache that provided me with straightforward creation and deletion methods.
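Here is a minimal Java sketch of that explicit-invalidation idea. It assumes one cache key per blog entry. The CommentsCache class, the "comments:" key prefix, and the loader function that stands in for the comments query are all hypothetical. Posting a comment removes only that entry's key, so the next read re-runs the query while every other entry stays cached. For brevity, it holds the lists strongly. They could just as well be wrapped in SoftReference objects, as in SoftCache.cfc.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical per-entry comments cache with explicit invalidation.
class CommentsCache {

    private final Map<String, List<String>> cache = new ConcurrentHashMap<>();
    private final Function<Integer, List<String>> loader; // stands in for the "comments" query

    CommentsCache(Function<Integer, List<String>> loader) {
        this.loader = loader;
    }

    // One key per blog entry, so invalidating one entry leaves the others cached.
    private static String keyFor(int entryId) {
        return "comments:" + entryId;
    }

    // Get-then-check: a null means "not cached", so run the query and cache the result.
    List<String> getComments(int entryId) {
        String key = keyFor(entryId);
        List<String> comments = cache.get(key);
        if (comments == null) {
            comments = loader.apply(entryId);
            cache.put(key, comments);
        }
        return comments;
    }

    // Call after a new comment is saved; the next getComments() re-runs the query.
    void onCommentPosted(int entryId) {
        cache.remove(keyFor(entryId));
    }
}
```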
After implementing this on my blog yesterday, my site definitely appears to be faster. However, I am still getting a large number of request timeout errors. The majority of these seem to take place between 4am and 5am. It's generally about 800-1000 emails. Of course, this morning, it was only about 400; as such, I have to assume that something that I am doing is helping the situation a bit. Now, if only I can get it down to zero emails!
Reader Comments
Fantastic - yet another example of how CF's Java foundation can be levered to provide more power to the application layer!
In terms of time out issues could it be that your thread pool is being exhausted? IIRC the default for max simultaneous template requests is set to 50 which isn't very high and may lead to requests being queued and then timing out.
Also look at your memory usage during the times you get the errors. It may be that this also leads to requests being queued.
Final thought, I wouldn't worry too much about ram usage as the JVM should be able to handle a heck of a lot of traffic with a 512MB heap and aggressive garbage collection.
You could, in theory, roll in Ehcache for CF8 alongside your ColdFusion install and gain the benefit of overflow to disk on top of soft-reference memory caching which, if I understand it correctly, is what Ehcache uses anyway.
@Rob,
I could try to look at the memory usage, but I'm typically asleep and dreaming of ColdFusion while the majority of request timeout issues occur. I suppose I could put in some sort of monitoring... but that is something I would have to learn about first.
I'll take a look at the thread max settings. I have no idea what they are right now.
This is really, really cool! Thanks!
Just curious as to what you think is causing the timeouts? Is it just normal traffic to the site that is causing a bottleneck? It seems odd that peak traffic would be at 4-5 AM. Could this be a bot or spider that would be worth blocking from the firewall or in CF code for the overall health of the site?
The caching is a great idea and I'm sure gives a performance boost, but I'm also curious as to what is actually causing the timeouts at specific times as I see these occasionally on one of my sites as well.
@Eric,
Thanks my man :)
@John,
Good thinking about the bot. That is what I assume it is, but perhaps it's something that I should actually try to confirm. Tomorrow morning, I'll try to see if the IP addresses are all the same (or user agents are all the same). If that's not it, I think there must be something my machine is doing at 4AM. Perhaps I should check my scheduled tasks to see if anything fires at 4am.
Sooooo, are you monitoring your site at all? What's going on between 4AM-5AM? Spammer traffic? Review the web logs?
@Todd,
I'm not really monitoring anything :) I wasn't sure what exactly I should be doing. My VPS ColdFusion install doesn't allow for the built-in monitoring. I am gonna try to look at the logs in the morning and the User Agents as well, see if I can determine anything.
I meant monitoring in general, not ColdFusion monitoring. Just wondering if there's a sudden surge in spiders / spammers at 4-5 AM. If you track down your web logs, you should be able to find some free server tools out there that can crunch some numbers for you and show you the spike in traffic (sort of like WebTrends; I'll have to ask around, as I forget the name of the Apache ones).
Other thing you can do is take a look at FusionReactor or SeeFusion, both of them have trials available I believe. I'm sure if the great Ben Nadel approached either company, they would love to have you review their product. ;P
@Todd,
Hopefully the error messages will reveal a bit more information tomorrow morning. I'll be honest - debugging this kind of issue feels very foreign to me.
Ben, I sympathise - these things always seem to happen when you're anywhere but at a desk in front of a computer!
In terms of logging, the JRockit JVM gives you the ability to use Mission Control (an Eclipse plugin), which has a full data recorder that can be scheduled to grab information straight from the JVM. Running this combined with your web server logs, ColdFusion logs, and JVM logs should give you all the information you need to start diagnosing the problem.
If you think it's a load issue then you can use something like Apache's JMeter (http://jakarta.apache.org/jmeter/) to try and replicate overloading the site.
There are other tools if you're on a *Nix platform.
Finally, have you thought about logging the start and end times of requests in your Application.cfc, along with specific bits of the CGI scope? That should give you a good feel for the pages that are giving you problems.
Hi Ben,
Quick and dirty way of monitoring memory would just be to pipe the output of the free command straight into a text file.
Cron that for every 5 mins for a day and you should be able to quickly see what is going on.
Also do you have any backup scripts running at that time in the morning?
Well, I got another 400 error messages between 4am and 5am. Looks like the caching didn't do anything. Also, "Trusted Cache" was turned off; but, I turned it on yesterday as well. No change.
There was a scheduled task to run at 4AM. But, I deleted that yesterday as well. No change.
Either my machine is doing something internally at 4AM or the machine is just being hammered. I'll try looking at the Event Logs perhaps.
Maybe caching to memory is not the answer.
You could argue that caching to memory is only worth it if you can't fit the DB indexes into memory.
Have you tried caching to disk? i.e., save a copy of the whole page and chuck it out when someone requests it.
That'll remove your memory problems, and shouldn't really hurt performance much, as you're taking a massive amount of processing out of the system for each request.
To solve the comments problem, just recreate the cache for that page when you receive a comment.
Also, if you want stuff brought in, then just use jQuery to fire off an AJAX request when the page loads.
Mat
@Mat,
I've definitely been romancing the idea of file-based caching for a long time. One thing I can just never really get around is the fact that parts of the page change dynamically with each request (ex. header, job listing).
I have thought about simplifying my HTML significantly and then just loading other data elements via AJAX, as you suggested. At this point, that just feels like a lot of work. I hate the timeouts... but I'm having a bit of a timeout myself (ok, bad joke).
Maybe this weekend, I'll try to look into this deeper.
Yeah, I can see how it will be a slightly more complicated changeover.
Also, have you optimised your SQL? Assuming you're using MySQL, there are loads of good articles about taming its memory usage, which might be a better way to go before thinking about page caching.
Mat
@Mat,
I am currently using MySQL. I'll have a look at my indexing to see if I can put some better practices in place. Indexing is definitely something that I have self-taught over the last few years - plenty of room for improvement.
That said, since a bunch of stuff is being cached now, I would think that SQL issues wouldn't be the issue any more (even if they were before). So frustrating.
@Ben,
Yeah, the main thing I was thinking about was the actual memory footprint of MySQL; this can be tuned, although performance can take a hit.
Do you have any way of checking how many times your cache is filling up?
Basically, there can come a point when things are being added to the cache quicker than they time out, so you get a lot of hits on the DB even though the cache is running. There won't ever be a problem with your cache filling up with the soft refs you're using, but at the same time it won't be protecting your DB.
@Mat,
Unfortunately, the whole concept of footprint tuning (ie. changing JVM max/min heap size and the kinds of RAM different apps are allowed to take) is currently beyond my understanding. Server optimization, in general, is something I need to get better at.
As far as the cache timing out / filling up issue goes, the worst-case scenario would be that it was so ineffective it was as if it was not there at all, essentially putting me back at ground zero. However, the blog entry information all caches indefinitely (or until the RAM forces garbage collection on the soft references). I don't currently track that.
Actually, it might be interesting to log how often a soft reference actually gets cleared. That would be a nifty thing to track.
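One way to track that is sketched below in Java. The ClearedReferenceCounter class is hypothetical. Each SoftReference is created with a ReferenceQueue. After the garbage collector clears a referent, it enqueues the registered reference object, either at the same time or later. Polling the queue therefore gives a running count. There are two caveats. First, the reference object itself must stay strongly reachable (for example, held in the cache struct) or it may never be enqueued. Second, calling enqueue() by hand is only a way to simulate the collector when testing.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;

// Hypothetical tracker: counts soft references that have been cleared and enqueued.
class ClearedReferenceCounter {

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    private long clearedCount = 0;

    // Create cache entries through this method so they report to our queue.
    <T> SoftReference<T> track(T value) {
        return new SoftReference<>(value, queue);
    }

    // Non-blocking: drain whatever has been enqueued since the last call
    // and return the running total (e.g. to write to a log every few minutes).
    synchronized long drain() {
        while (queue.poll() != null) {
            clearedCount++;
        }
        return clearedCount;
    }
}
```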
@Ben,
As others have mentioned, ideally you just need to review logs and use monitoring tools as needed to identify exactly *what* the problem is, before attempting to fix it. I will point out a relatively simple observation though. You probably do not need to concern yourself with the JVM heap and permanent generation space memory allocations at this time. If you had major problems with these settings, you'd be getting Java OutOfMemory errors and likely need to restart your JRun service.
If there were issues with slow garbage collection or something, you'd likely be having problems on a more regular basis throughout the day (not just one chunk of time every 24 hours).
These do not appear to be the problems you're having. You're just seeing a lot of request timeouts. This means there are enough slow requests piling up to reach your maximum simultaneous request (CF Admin) setting, but those requests are eventually handled and more requests continue to be served (as opposed to crashing JRun entirely). You may very well be able to increase your max simultaneous requests without running into memory issues, but it's something you need to monitor. That said, the fact that this is not a problem throughout the day suggests a need to identify *what* is causing that request traffic jam at 4am.
I once saw an issue where real slow requests were always from similar IP addresses, and it turned out to be dial-up users :P It's unlikely that you have a load of dial-up users pounding your blog at 4am, but something is going on there--very likely bots of some sort. Good luck!
@Jamie,
You make some good points. I also haven't even looked at the log files yet. When I have free time, I tend to forget I even have this problem, until I see the load of error emails in my inbox :) I should have some time this weekend to really dig into this a bit more.
> I am sure there are easy ways to invalidate a specific query cache; but, I don't know them off-hand.
cachedwithin="#createTimeSpan(iif(structKeyExists(form, 'content'), DE('-7'), DE('7')),0,0,0)#"
@Nelle,
Ah, thanks! Because you couldn't always use the cachedWithin attribute in conjunction with CFQueryParam tags, I never really got into them. Thanks for the clarification.
Ben,
Did you ever get this figured out?
Randy
@Randy,
What are you referring to exactly?
Ben,
I was referring to the issue of: "And, with over 1000 'request timeout' blog errors in my inbox each morning, the time to start experimenting is certainly now."
The comments on this thread were pretty good, and I was curious if you managed to figure out what the problem was. I know the problem was probably a bot, but I am curious as to what you needed to tweak in your code or what the problem code was etc.
Randy