Storing Your ColdFusion Scheduled Tasks In The Database
As I've talked about before, I am not the biggest fan of ColdFusion scheduled tasks. Or rather, I love the idea of ColdFusion scheduled tasks; but, I don't like managing them in the ColdFusion Administrator. Furthermore, some tasks require a more complex "interval" than the options provided in the task manager. As such, I've gotten into the habit of moving my scheduled tasks from the ColdFusion Administrator into the database. Typically, I have a single scheduled task that runs about every 5 minutes and uses the database to determine which of the actual tasks need to be executed.
To explore this concept, let me start with the database table that I use to manage the scheduled tasks in a given project (each project would get its own database table). In this demo, the database table will be called "task."
id - Primary key of the table.
name - The name of the task (for explanatory usage).
description - The full description of the task (for explanatory usage).
template - The ColdFusion template that contains the actual algorithm for this particular task.
interval - The number of days between task executions. This may be fractional, such as the value returned by createTimeSpan() (createTimeSpan( 0, 6, 0, 0 ) is 0.25 days). This value can be overridden by the task algorithm if a non-standard schedule needs to be used (such as the "1st of every month").
metaData - JSON (JavaScript Object Notation) data that can be persisted between task executions.
errorLog - A JSON-formatted version of the CFCatch object that bubbled up from the task template.
dateOfLastExecution - The date of the last execution of this task.
dateOfNextExecution - The date of the next scheduled execution of this task.
dateStarted - The date the currently-executing task started. During task execution, this is a date/time value. Between executions, this is NULL.
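For reference, here is one possible definition of that table, plus a seeded row. This is a sketch under stated assumptions, not the schema from the original project: the column types and the MySQL syntax (AUTO_INCREMENT, backtick quoting) are my assumptions, and the seeded task and its template path are hypothetical. Note that "interval" is a reserved word in MySQL, so it needs quoting wherever it is not qualified by a table alias.

```cfml
<!---
	ASSUMPTION: MySQL syntax and column types; adjust for your
	database. The "testing" datasource matches the demo code.
--->
<cfquery datasource="testing">
	CREATE TABLE task (
		id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
		name VARCHAR( 100 ) NOT NULL,
		description TEXT NULL,
		template VARCHAR( 255 ) NOT NULL,
		<!--- Fractional days, e.g. 0.25 for six hours. --->
		`interval` DOUBLE NOT NULL,
		metaData TEXT NULL,
		errorLog TEXT NULL,
		dateOfLastExecution DATETIME NULL,
		dateOfNextExecution DATETIME NOT NULL,
		dateStarted DATETIME NULL
	)
</cfquery>
<!---
	Seed a hypothetical daily task that is due right away. The
	template path is relative to run.cfm, which invokes it.
--->
<cfquery datasource="testing">
	INSERT INTO task
	(
		name,
		description,
		template,
		`interval`,
		dateOfNextExecution
	) VALUES (
		<cfqueryparam value="Email Girlfriend" cfsqltype="cf_sql_varchar" />,
		<cfqueryparam value="Tell Tricia how wonderful she is." cfsqltype="cf_sql_longvarchar" />,
		<cfqueryparam value="email_girlfriend.cfm" cfsqltype="cf_sql_varchar" />,
		<cfqueryparam value="#createTimeSpan( 1, 0, 0, 0 )#" cfsqltype="cf_sql_double" />,
		<cfqueryparam value="#now()#" cfsqltype="cf_sql_timestamp" />
	)
</cfquery>
```

The metaData, errorLog, dateOfLastExecution, and dateStarted columns start out NULL, which the run page treats as "no meta data yet" and "not currently running."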
In this table, the dateOfNextExecution and dateStarted fields are really the two fields that determine if a task should be executed. Remember, we're going to be invoking each task from a centralized point of entry into our application; as such, we're going to find that most of the time, a scheduled task does not need to be run. As you will see in the code below, a task needs to be executed if 1) its date of next execution has passed and 2) it is not currently running.
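Those two conditions can be expressed as a small function. This is only a sketch: isTaskDue() is a hypothetical helper (the demo does this filtering in SQL instead), and it relies on the fact that CFQuery returns a NULL column as an empty string.

```cfml
<cfscript>
	/*
		Hypothetical helper (not part of the demo code). A task is due
		when 1) its date of next execution has passed and 2) it is not
		currently running. CFQuery returns a NULL column as an empty
		string, so a non-date dateStarted means "not running".
	*/
	function isTaskDue( dateOfNextExecution, dateStarted, asOf ) {
		var hasPassed = ( dateCompare( arguments.dateOfNextExecution, arguments.asOf ) lte 0 );
		var isRunning = isDate( arguments.dateStarted );
		return ( hasPassed and ( not isRunning ) );
	}
</cfscript>
```

For example, isTaskDue( task.dateOfNextExecution, task.dateStarted, now() ) mirrors the WHERE clause that tasks.cfm and run.cfm use.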
When I first started to approach this problem, my first instinct was to spawn each task in its own CFThread tag. This makes sense since it would allow tasks to run in parallel. However, using CFThread presents two serious drawbacks. For starters, there is a limit to the number of threads that can execute at any given time (especially in a Standard installation of ColdFusion). This means that task-based threads might have to wait a while for an available thread. Furthermore, ColdFusion cannot handle nested CFThread tags. This means that our individual task algorithms couldn't use the CFThread tag without throwing an error. Since CFThread is such a powerful tag, I didn't want to exclude its use from tasks that might very well benefit from it.
Without the CFThread tag, the next best way to run code in parallel is with the CFHTTP tag. Each CFHTTP tag blocks until its request returns, so a series of them naturally runs in serial; but, we can finagle them into running in "parallel" by making sure that the timeout is low (one second). The requested page keeps running on the server even after CFHTTP stops waiting for it, so we can trigger each task with its own CFHTTP tag without having to wait for the previous task to finish.
This is what our centralized point-of-entry does. We still have one ColdFusion scheduled task that runs every few minutes. This page - tasks.cfm - then turns around and tries to invoke each task defined within the database.
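That one "real" scheduled task can be created in the ColdFusion Administrator as usual. If you would rather script it as well, the CFSchedule tag can register it. The sketch below is an assumption-laden example: the task name and URL are placeholders, and the interval is given in seconds (300, i.e. every five minutes).

```cfml
<!---
	Hypothetical one-time setup page. The task name and URL are
	placeholders; point the URL at wherever tasks.cfm lives.
	action="update" creates the task if it doesn't exist yet.
--->
<cfschedule
	action="update"
	task="Database Task Runner"
	operation="HTTPRequest"
	url="http://www.example.com/tasks/tasks.cfm"
	startdate="#dateFormat( now(), 'mm/dd/yyyy' )#"
	starttime="12:00 AM"
	interval="300"
	requesttimeout="60"
	/>
```

Since tasks.cfm waits at most about one second per task, a modest request timeout is usually enough for it; the individual tasks get their own timeout in run.cfm.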
Tasks.cfm - Our One ColdFusion Scheduled Task
<!---
	Query for all the tasks that look like they should be
	executing - that is, tasks whose "next" execution date has
	passed AND who are not currently executing.
--->
<cfquery name="tasks" datasource="testing">
	SELECT
		t.id
	FROM
		task t
	WHERE
		t.dateOfNextExecution <= <cfqueryparam value="#now()#" cfsqltype="cf_sql_timestamp" />
	AND
		<!--- Make sure task is not currently executing. --->
		t.dateStarted IS NULL
	ORDER BY
		t.id ASC
</cfquery>
<!---
	Now that we have the tasks, we are going to examine each one
	using a separate HTTP call. This way, we don't use a CFThread and
	don't have to worry about nested thread errors. In order to do
	this, let's get the base HTTP URL.
--->
<!--- Include the port in case the site is not running on port 80. --->
<cfset baseUrl = (
	"http://" &
	cgi.server_name &
	":" & cgi.server_port &
	getDirectoryFromPath( cgi.script_name )
	) />
<!--- Loop over each task to invoke. --->
<cfloop query="tasks">
	<!---
		When running the task, we don't want to wait for the task
		to return - this way, we can try to have all the tasks
		running in parallel (as best as possible).
	--->
	<cfhttp
		method="get"
		url="#baseUrl#run.cfm"
		timeout="1"
		throwonerror="false">
		<!--- Pass the task ID through to the RUN page. --->
		<cfhttpparam
			type="url"
			name="id"
			value="#tasks.id#"
			/>
	</cfhttp>
</cfloop>
As you can see, this page simply gathers the tasks from the database and launches a CFHTTP tag for each one. It does filter the tasks using the date fields; but, as you'll see below, the run page repeats that same check, so the filtering here only saves us needless HTTP requests.
As the tasks.cfm page invokes each task, notice that it sets two important attributes on the CFHTTP tag:
- timeout = "1"
- throwonerror = "false"
The timeout attribute ensures that ColdFusion will only wait 1 second for the CFHTTP request to finish execution. The throwOnError attribute ensures that when/if the CFHTTP tag doesn't finish in one second, ColdFusion doesn't throw an error. With these two attributes in place, we can more-or-less look at the CFHTTP tags as executing in parallel rather than in serial.
Each CFHTTP examines and then potentially invokes a single scheduled task. This logic is performed in run.cfm. This "run" page takes care of setting up and then tearing down a single task. That is, it makes sure that the task should be run, assembles the task data, invokes the task in its own module (for the lowest-touch sandboxing), catches any errors, and then updates the database for the next execution.
Run.cfm - Individual Task Invocation
<!---
	Set a reasonable timeout for this task execution. This can
	always be overridden in the individual task templates.
--->
<cfsetting requesttimeout="20" />
<!--- Param the task ID. --->
<cfparam name="url.id" type="numeric" default="0" />
<!---
	Since this page can be requested either by tasks.cfm or manually,
	let's wrap the page in an exclusive lock so as to make sure that
	subsequent executions don't overlap.
	NOTE: We are not going to throw an error if the lock times-out
	since this task will just be run again later.
--->
<cflock
	name="tasks_#url.id#_#hash( getCurrentTemplatePath() )#"
	type="exclusive"
	timeout="1"
	throwontimeout="false">
	<!---
		Now that we are exclusive, query for the given task. When
		doing so, we are going to make sure that the task is not
		currently running (that its dateStarted time is NULL).
	--->
	<cfquery name="task" datasource="testing">
		SELECT
			t.id,
			t.name,
			t.description,
			t.template,
			t.interval,
			t.metaData,
			t.dateOfLastExecution,
			t.dateOfNextExecution,
			t.dateStarted
		FROM
			task t
		WHERE
			t.id = <cfqueryparam value="#url.id#" cfsqltype="cf_sql_integer" />
		AND
			t.dateOfNextExecution <= <cfqueryparam value="#now()#" cfsqltype="cf_sql_timestamp" />
		AND
			<!--- Make sure task is not currently executing. --->
			t.dateStarted IS NULL
	</cfquery>
	<!---
		Make sure that the task was found. If not, then just exit
		out as there's nothing left to do.
	--->
	<cfif !task.recordCount>
		<!--- Nothing more to do. --->
		<cfexit />
	</cfif>
	<!--- ------------------------------------------------- --->
	<!--- ------------------------------------------------- --->
	<!---
		If we've made it this far, then we have a task that needs
		to be executed. As such, let's flag it as being started.
	--->
	<cfquery name="changeTaskStatus" datasource="testing">
		UPDATE
			task
		SET
			dateStarted = <cfqueryparam value="#now()#" cfsqltype="cf_sql_timestamp" />
		WHERE
			id = <cfqueryparam value="#task.id#" cfsqltype="cf_sql_integer" />
	</cfquery>
	<!---
		When we run the task, let's wrap it in a try/catch so that we
		can log any errors that occur.
	--->
	<cftry>
		<!---
			When we execute the task, we're going to do so as a
			module. This will give the task some level of sandboxing.
			However, we also want the task algorithm to be able to
			modify the task metaData and next execution date. As
			such, let's create a task data object to pass into the
			module during execution.
		--->
		<cfset taskData = {
			id = task.id,
			name = task.name,
			description = task.description,
			template = task.template,
			interval = task.interval,
			metaData = task.metaData,
			dateOfLastExecution = task.dateOfLastExecution,
			dateOfNextExecution = task.dateOfNextExecution
			} />
		<!---
			Before executing the task, check to see if the meta data
			is valid JSON data. If so, let's implicitly deserialize
			it prior to task execution.
		--->
		<cfif isJSON( taskData.metaData )>
			<!--- Deserialize meta data. --->
			<cfset taskData.metaData = deserializeJSON( taskData.metaData ) />
		</cfif>
		<!---
			Execute the task as a module to give it a little bit of
			a sandbox to play in. We don't want it messing up the
			variables in this page. When doing this, let's pass the
			Task data object in for reference.
		--->
		<cfmodule
			template="#task.template#"
			task="#taskData#">
		<!---
			If we made it this far, then the task has executed
			completely and without error. Update the record for
			next execution.
		--->
		<cfquery name="updateTask" datasource="testing">
			UPDATE
				task
			SET
				<!---
					Move the date started into the last execution for
					debugging purposes.
				--->
				dateOfLastExecution = dateStarted,
				<!---
					Flag the task as no longer running so that it can
					be invoked next time.
				--->
				dateStarted = NULL,
				<!---
					Check to see if the date of next execution in the
					Task object is the same as the one originally
					passed-in. If so, then perform the update
					automatically. If the Task-based date of next
					execution is different, assume the task algorithm
					set it explicitly (and that we should use that one
					directly).
					NOTE: We are using dateDiff() rather than EQ
					so as to account for differently formatted dates
					and dates that are too similar to warrant a
					difference.
				--->
				<cfif !dateDiff( "n", task.dateOfNextExecution, taskData.dateOfNextExecution )>
					<!---
						Update the date of next execution by adding
						the given interval to the previously scheduled
						date of next execution (not to the current
						time). Remember, the interval in our system is
						defined as a fractional number of days. As
						such, we can simply use date-math to make the
						addition. If the task fell several intervals
						behind, the new date may still be in the past,
						in which case the task runs again on the next
						pass.
					--->
					dateOfNextExecution = <cfqueryparam value="#(task.dateOfNextExecution + task.interval)#" cfsqltype="cf_sql_timestamp" />,
				<cfelse>
					<!---
						Use the date provided by the task (which was
						presumed to have been updated by the task
						algorithm).
					--->
					dateOfNextExecution = <cfqueryparam value="#taskData.dateOfNextExecution#" cfsqltype="cf_sql_timestamp" />,
				</cfif>
				<!---
					Store any meta data that has been put into the
					task object. Since the database can only hold
					string data, we'll convert the value to JSON.
				--->
				metaData = <cfqueryparam value="#serializeJSON( taskData.metaData )#" cfsqltype="cf_sql_longvarchar" />,
				<!--- Clear out any old error-log. --->
				errorLog = <cfqueryparam value="" cfsqltype="cf_sql_longvarchar" />
			WHERE
				id = <cfqueryparam value="#task.id#" cfsqltype="cf_sql_integer" />
		</cfquery>
		<!--- Catch any errors that bubbled up from the task. --->
		<cfcatch>
			<!---
				Log error in database. For this demo, we will be
				logging the CFCatch object as JSON to the text field.
				NOTE: dateStarted is not cleared here, so the task
				will still look "running" and won't be picked up
				again until its dateStarted value is reset.
			--->
			<cfquery name="logError" datasource="testing">
				UPDATE
					task
				SET
					errorLog = <cfqueryparam value="#serializeJSON( cfcatch )#" cfsqltype="cf_sql_longvarchar" />
				WHERE
					id = <cfqueryparam value="#task.id#" cfsqltype="cf_sql_integer" />
			</cfquery>
			<!--- --------------------------------- --->
			<!--- --------------------------------- --->
			<!---
				At this point, you probably want to shoot
				out an email to someone so as to alert them
				that an unexpected TASK error has occurred.
				Or, you might create that as a task in and of
				itself (a task that checks the errorLog
				fields of other tasks).
			--->
			<!--- --------------------------------- --->
			<!--- --------------------------------- --->
		</cfcatch>
	</cftry>
</cflock>
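The named CFLock above only serializes requests on a single ColdFusion instance, and the SELECT and the "flag it as being started" UPDATE are still two separate statements. One way to close that gap, along the lines of the cftransaction idea raised in the comments below, is to claim the task inside a database transaction with a row lock. This is a sketch, not the original code: it assumes a database that supports SELECT ... FOR UPDATE (for example, MySQL with InnoDB tables), and it would replace the task SELECT and the changeTaskStatus UPDATE while the rest of run.cfm stays the same.

```cfml
<!---
	ASSUMPTION: the database supports row locks via "FOR UPDATE".
	A concurrent request that tries to lock the same row waits
	until this transaction commits, and then no longer matches the
	row because dateStarted is no longer NULL.
--->
<cftransaction>
	<cfquery name="task" datasource="testing">
		SELECT
			t.id,
			t.name,
			t.description,
			t.template,
			t.interval,
			t.metaData,
			t.dateOfLastExecution,
			t.dateOfNextExecution,
			t.dateStarted
		FROM
			task t
		WHERE
			t.id = <cfqueryparam value="#url.id#" cfsqltype="cf_sql_integer" />
		AND
			t.dateOfNextExecution <= <cfqueryparam value="#now()#" cfsqltype="cf_sql_timestamp" />
		AND
			t.dateStarted IS NULL
		FOR UPDATE
	</cfquery>
	<!--- Flag the task as started while we still hold the row lock. --->
	<cfif task.recordCount>
		<cfquery name="changeTaskStatus" datasource="testing">
			UPDATE
				task
			SET
				dateStarted = <cfqueryparam value="#now()#" cfsqltype="cf_sql_timestamp" />
			WHERE
				id = <cfqueryparam value="#task.id#" cfsqltype="cf_sql_integer" />
		</cfquery>
	</cfif>
</cftransaction>
<!--- Only the request that claimed the task goes on to run it. --->
<cfif !task.recordCount>
	<cfexit />
</cfif>
```

The task itself still runs after the transaction commits, so the row lock is held only for the brief claim, not for the whole task execution.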
When it comes time to actually invoke the given task, the run page uses the ColdFusion CFModule tag to execute the given CFM template. This gives the task execution a little bit of a sandbox and protects the "run" page from any variables that might be created, manipulated, or destroyed during task execution.
For convenience, the run page assembles a taskData object to pass into the module during execution. This provides the deserialized metaData persisted by the previous execution. It also provides the task algorithm with a way to override the dateOfNextExecution. By default, the run page will simply calculate the dateOfNextExecution by adding the given interval; however, if the task needs to be more flexible or complex with its execution dates, it can explicitly override the dateOfNextExecution property of the taskData object. If the run page detects that this has been done, it will use the provided date of execution rather than calculating one using the interval.
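For example, the "1st of every month" schedule mentioned earlier could be handled by a task that computes its own next date. The helper below is hypothetical (it is not part of the original code); it uses dateAdd() and createDateTime(), and it only reads the year and month of the shifted date, so month-end dates such as January 31st are not a problem.

```cfml
<cfscript>
	/*
		Hypothetical helper: returns the 1st of the month after
		fromDate, at the given hour. dateAdd() may clamp the day
		(Jan 31 + 1 month lands at the end of February), but only
		the year and month of the shifted date are used.
	*/
	function firstOfNextMonth( fromDate, atHour ) {
		var nextMonth = dateAdd( "m", 1, arguments.fromDate );
		return createDateTime( year( nextMonth ), month( nextMonth ), 1, arguments.atHour, 0, 0 );
	}
</cfscript>
```

With the helper defined in (or included into) a task template, the override is a single assignment, such as <cfset attributes.task.dateOfNextExecution = firstOfNextMonth( now(), 6 ) />. Because that date differs from the one the run page passed in, the run page stores it instead of adding the interval.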
Since the actual task template (CFM) is being executed through the CFModule tag, the taskData is made available in the Attributes scope of the task algorithm. This can be used as-is; or, the reference can be copied into the Variables scope. Since ColdFusion passes structs by reference, both names point to the same struct, so any changes made through either one are visible to the run page once the module returns.
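The by-reference behavior is easy to check in isolation. This sketch uses two hypothetical functions instead of a custom tag, purely to keep it self-contained: changing a key through a second reference is visible to the caller, while assigning a brand-new struct to that reference is not. That is why a task template should modify keys such as task.metaData rather than replace the task struct itself.

```cfml
<cfscript>
	// Hypothetical functions, used only to show how struct references behave.
	// Modifying a key through the argument changes the caller's struct.
	function touchKey( data ) {
		arguments.data.touched = true;
	}
	// Assigning a new struct only re-points the local reference;
	// the caller's struct is left as it was.
	function replaceStruct( data ) {
		arguments.data = { touched = true };
	}

	taskData = { touched = false };
	replaceStruct( taskData );
	// taskData.touched is still false at this point.
	touchKey( taskData );
	// taskData.touched is now true.
</cfscript>
```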
To see an example task, complete with metaData, take a look at the following code. This code would periodically email a girlfriend to let her know how wonderful she is. Notice that the metaData is used to make sure that a single flattering "attribute" is not used twice in a row.
email_girlfriend.cfm - An Example Scheduled Task
<!---
	This page gets invoked as a Module / custom tag for
	sandboxing. Copy the Task object reference to the variables
	scope for convenience.
--->
<cfset task = attributes.task />
<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->
<!--- Param the task meta-data. Make sure that it is a struct. --->
<cfif !isStruct( task.metaData )>
	<!---
		Create the meta data. This data will be persisted in the
		database automatically.
	--->
	<cfset task.metaData = {
		name = "Tricia",
		email = "ben+tricia@bennadel.com",
		lastAttribute = "",
		attributes = [
			"beautiful",
			"gorgeous",
			"sexy",
			"stunning",
			"amazing",
			"wonderful"
			]
		} />
</cfif>
<!--- Select a random attribute to use in the mailing. --->
<cfset attribute = task.metaData.attributes[
	randRange( 1, arrayLen( task.metaData.attributes ) )
	] />
<!---
	Make sure that we are using a different attribute than last
	time - we don't want to sound repetitive! The length check
	keeps the loop from spinning forever if the list ever holds
	just one attribute.
--->
<cfloop condition="(arrayLen( task.metaData.attributes ) gt 1) and (attribute eq task.metaData.lastAttribute)">
	<!--- Try to select another attribute. --->
	<cfset attribute = task.metaData.attributes[
		randRange( 1, arrayLen( task.metaData.attributes ) )
		] />
</cfloop>
<!---
	Now that we have our attribute, store it for next time so
	as to make sure we don't repeat it directly.
--->
<cfset task.metaData.lastAttribute = attribute />
<!--- Send out the email. --->
<cfmail
	to="#task.metaData.email#"
	from="ben@bennadel.com"
	subject="Hey baby - was just thinking about you."
	type="html">
	<p>
		Hey baby, I was just thinking about you and about how
		<strong>#attribute#</strong> you are. I wanted to tell
		you that yesterday, but sometimes I am afraid that I
		just gush over you too much.
	</p>
	<p>
		Anyway, can't wait to see you later!
	</p>
</cfmail>
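The retry loop above re-rolls until it gets a new attribute, which is fine for a list of six words. As an alternative, the hypothetical helper below (not part of the original task) filters out the last pick first and then chooses once, so it never loops; with a one-item list it simply returns that item again.

```cfml
<cfscript>
	/*
		Hypothetical helper: pick a random option that differs from
		lastPick. If every option equals lastPick (e.g. a one-item
		list), fall back to the full list. Assumes options is not
		empty.
	*/
	function pickDifferent( options, lastPick ) {
		var candidates = [];
		var i = 0;
		for ( i = 1 ; i lte arrayLen( arguments.options ) ; i++ ) {
			if ( arguments.options[ i ] neq arguments.lastPick ) {
				arrayAppend( candidates, arguments.options[ i ] );
			}
		}
		if ( not arrayLen( candidates ) ) {
			candidates = arguments.options;
		}
		return candidates[ randRange( 1, arrayLen( candidates ) ) ];
	}
</cfscript>
```

In the task template, the selection and the loop would then collapse to <cfset attribute = pickDifferent( task.metaData.attributes, task.metaData.lastAttribute ) />.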
As you can see, other than getting a reference to the Task data object, the task algorithm doesn't have to know anything about the scheduled execution. Unless it needs to explicitly override the "task.dateOfNextExecution" property, this task algorithm can run just as if it were any other page in the application.
Scheduled tasks are a really powerful feature of any application. And, ColdFusion scheduled tasks are pretty awesome; but, sometimes, it's really nice to keep scheduled task information in a database. Not only does this approach allow relevant meta data to be persisted between task executions, it also allows for more complex intervals to be used. And, if you're concerned about giving out access to the ColdFusion administrator, this approach requires nothing more than database access for a single project.
Reader Comments
Ben,
I do something pretty similar, actually, with my Scheduler.cfc component. It doesn't run tasks in parallel, but it does ensure that tasks can't run more than once at the same time, and it provides pretty good information about exceptions and such.
http://www.bryantwebconsulting.com/blog/index.cfm/2009/2/26/Schedulercfc-10
Nice to see that I am not the only one who likes this approach.
At one place I worked, this was pretty much exactly what we used to kick off CF tasks. We could exercise Linux-like control over the tasks themselves and also maintain a nice log of when everything had been run just in case something odd happened.
For the vast majority of situations, a setup like this works fine. In the rare case where you need to ensure that a task runs every time as scheduled, you'll have to set up some kind of handler for DST. The system we had allowed users to set up reports and have them delivered on a user-specified schedule, and there were always some reports that were set to run between 2 and 3 AM.
The monthly tasks were the biggest problems because you'd have another month until they ran again (if you didn't catch that they didn't run), or you'd get a full month of data followed by an hour of data, depending on whether it was spring or fall. Fun times.
What we did was to manually run a script to move those tasks forward an hour; once the time change had passed, we'd move them back to the original time.
I really like this solution. I haven't had time to read it all in depth, have just perused a little of it, and I am not all that knowledgeable about administrative stuff, including scheduled tasks, but this looks like some good stuff here. Thanks!
@Steve,
Very cool stuff. I like the idea of moving the features into a ColdFusion component. When I think page requests, I still tend to think in terms of CFMs. But, as I move into more and more CFC work, it definitely seems more natural.
Adam Lehman hinted on Twitter that they might unveil some more CF10 details including new scheduled task updates at RIACon. Sorry I can't make it their inaugural year :(
@Dave,
It took me a minute to realize what you were talking about when it came to daylight savings time :) Yeah, I can see that being a problem when you are running things early in the AM (as I think many of us do for reports and whatnot).
The thing I really liked about this approach was that the task itself could dynamically change the date of next-execution. This would allow the task to take into account all kinds of things that might pop-up.
@Anna,
Thanks Anna Banana :)
@Ben,
You might want to combine Tasks.cfm and Run.cfm so that you can put cftransactions around the select where t.dateStarted IS NULL and the subsequent updates that set t.dateStarted. This prevents a manual execution of Tasks.cfm from getting into a race condition with the scheduled execution.
It often doesn't occur to folks to put cftransactions around the selects that determine updates, but they're kind of an ideal way to keep something from being done twice.
P.S.: Ben, since you mentioned Adam Lehman talking about new features of CF10, it reminded me of a new feature idea I had yesterday. Do you know if we're past the cutoff for suggestions?
@Ben,
That's a good observation, so I'll clarify for others who might not have run into that problem ... in the US, when clocks "spring forward" from 2 AM to 3 AM, time on the server will change from 1:59 AM to 3:00 AM - no times in between will "happen".
The nice thing about Ben's solution is that it runs tasks whose execution time has passed, so there aren't really any problems with missing a task in the spring. The system we used checked every minute and queued only tasks whose execution time was on that minute, so on that system, a task that should run at 2:30 AM was skipped, while on Ben's, it would run the next time the scheduler task ran.
When clocks "fall back" to 2 AM, time on the server runs from 2:00 AM to 2:59 AM like normal ... but then goes back to 2:00 AM and repeats that path through 2:59 AM, finally reaching 3:00 AM and continuing as normal.
In our system, that meant we ran daily/weekly/monthly reports twice, because every time from 2:00 AM to 2:59 AM occurred twice on that day. In Ben's system, hourly tasks would be skipped once:
1:15 AM - runs as normal
2:15 AM (before time change) - runs as normal
2:15 AM (after time change) - does not run because the next execution time is 3:15 AM
3:15 AM - runs as normal
It would probably be better to run a script that would manually adjust these tasks ... there's not really an easy way to address this window, and while you could write a function to adjust the tasks automatically, it'd be pretty complex, and it only applies to one hour per year.
The ColdFusion scheduler never gave me the type of control I wanted over schedule and I was always worrying about time-outs. (I have a script that downloads lots of content on a scheduled basis.) I thought about a database, but then I'm still back to hoping that ColdFusion runs the task and doesn't time out and it still requires something to trigger the scripts (ie, CFHTTP).
Using Windows, I finally settled for CRON:
http://www.nncron.ru/
http://www.z-cron.com/scheduler.html
CURL:
http://curl.haxx.se/
and batch files that perform GET or POST requests to ColdFusion scripts hosted on various servers. This allows me to keep all of my tasks on a separate server and I don't have to use any ColdFusion threads to trigger and/or perform the page request. I can also run as many simultaneous page requests as I want.
A sample BAT file looks like this:
CURL -o c:\logs\Sendrenewals.htm http://website.com/index.cfm -d "Task=Sendrenewals"
(and the script only responds to a couple different IP addresses.)
The CRON Task is maintained in a text file and looks like this (each day at 6:30am):
30 6 * * * ~c:\tasks\Sendrenewals.bat
If I wanted it to run every year, on June 7th at 17:45, I'd do this:
45 17 7 6 * ~c:\tasks\Sendrenewals.bat
This is each minute:
* * * * * ~c:\tasks\Sendrenewals.bat
How about the first Monday of each month, at 9am?:
0 9 1-7 * 1 ~c:\tasks\Sendrenewals.bat
I can also easily migrate this solution (CRON, CURL & BAT files) to any Windows computer by copying the sub-directory & support files. For notifications when a task has run, I either configure it to happen via the ColdFusion script I'm running or use the BLAT commandline program.
http://www.blat.net/
Just an aside - I was wondering about run.cfm...
It's not being called as a custom tag, so I'm thinking you should be <cfabort>'ing rather than <cfexit>'ing ?
(Unless I've missed one of your many excellent other posts on cfabort and onRequestEnd of course...)
The problem I have always had with trying to prevent simultaneous executions of tasks is that there are too many ways for ColdFusion to fail without ever actually catching an error. The DateStarted field never gets reset to null and the error doesn't actually get logged. Then you have tasks just sitting there not getting executed again.
Any thoughts?
I have separate tables for tasks and actions (each time any task gets run) and use variables to track things in process as well (an advantage of using a persistent-scoped CFC).
This solves a lot of concurrency and error-trapping problems. It also allows for nice reporting on run times and errors and such.
@WebManWalking,
I know this is a rather old post, but I'm trying to make my way through a back-log of comments :) I think the reason I didn't want to combine Tasks and Run was because I was worried that tasks would take too long to execute and would end up pushing each other off -- the tasks might have to run in Serial rather than Parallel.
So then, I figured I might just add CFThread to the situation to add parallel processing back in place. But, if I did that, then I was worried that the individual scheduled tasks might not be able to launch their own threads (since CFThread tags cannot contain other CFThread tags).
All in all, breaking them apart seemed to allow for the best mimic of the native scheduled task concept in which each task would be hit using a HTTP request from the ColdFusion Administrator (or wherever scheduled tasks are launched from).
@Dave,
I have since learned that all the daylight saving time (DST) stuff is a nightmare :) Talk about a complex world!!
@James,
I've had a number of people tell me they much prefer external CRON scenarios like the one you are talking about. I, personally, have never set up a CRON job on a computer. That always seemed like some old-school Kung Fu that I never learned in school (like running "grep" commands and what not). People rave about it, though.
@Geoff,
The docs are not great about it. In the description, it says it is used with Custom Tags. But, if you look farther down on the Usage, you'll see:
"If this tag is encountered outside the context of a custom tag, for example in the base page or an included page, it executes in the same way as cfabort."
You can include the "method" attribute, but I believe it gets ignored unless you are in the context of Custom Tags.
@Tyler,
There's a lot that can go wrong, from bugs to simply running out of memory, etc.. I wish I had a magic solution to fixing it, but I don't. Put a Try/Catch around everything and try to log / email any uncaught errors.
Perhaps you can create a different Task that monitors the other tasks to make sure they don't sit in "active" state for too long. Perhaps it could send an email if a given task has been active longer than its Interval.
@Steve,
Yeah, logging values to the database is key. I like the idea of tracking each execution - I had not thought of that.
@Ben,
First off, thank you for your site. It seems as though I often find the answers to my CF problems, or enough of an answer to point me in the right direction, whenever I need it. This task scheduler alone provided most of what I needed to quickly replace my old one that suddenly stopped working after applying a CF8 patch.
I have a task that loops through a directory to process files, which uses <cflocation> to return to the same CFM page upon error to continue processing the remaining files. When this task is executed and hits the <cflocation>, the task stays in a "running" status because it does not complete the subsequent task update query. I tried setting the <cflocation> target to the URL that the task scheduler used to initially execute the task, but ended up with the same result. I already added a monitoring task to alert me when this occurs and could add a reset query to that task, but that seems like too much of a band-aid to me. Any suggestions?