Creating A Health Probe Using Netlify Cron Functions, Postmark, And ColdFusion
A few weekends ago, shortly after upgrading my database to MySQL 8.0.28, my blog went offline for about 8-hours. I believe the issue was related to a bug in how ColdFusion caches per-application datasources. After several days of trial-and-error, I think I finally figured out how to safely create a per-application datasource while working around the bug. And, once the fires were all put out, I started to think about that 8-hour offline window; and, how I might operationalize the monitoring of my site. I had recently heard that Netlify released cron / scheduled functions. So, I wanted to see if I could create a health probe for my ColdFusion site using Netlify scheduled functions and the Postmark SMTP service.
View this code in my BenNadel.com Healthcheck project on GitHub.
Currently in Beta, the Netlify Scheduled Functions have to be enabled in Labs before they can be utilized. But, once enabled, a scheduled function is just like a normal Serverless Function with two major differences:
A scheduled function cannot be invoked from a public URL (except while in local development). Attempting to invoke a scheduled function using a URL will result in a 403 Forbidden error.
A scheduled function is automatically invoked on a repetitive basis using a "cron expression". This cron expression can be defined in the scheduled function itself or in the netlify.toml file of the Netlify site.
To start exploring this idea, the first thing I had to do was create a health-check end-point for my ColdFusion blog. Normally, I wouldn't make database access part of a health-check. But, since a failure to connect to the database was the cause of my aforementioned outage, I decided to include a minimal database interaction as part of the request logic.
Here's the "health" portion of the health-check:
<cfscript>

	request.template.type = "blank";

	results = queryExecute( "SELECT ( 1 ) AS value ;" );

	if ( results.value != 1 ) {

		throw(
			type = "BenNadel.Probe.UnexpectedValue",
			message = "The health-check database query returned an unexpected value."
		);

	}

</cfscript>
By executing the queryExecute() function with nothing other than the SQL statement, I am verifying that:
The datasources are defined.
The default datasource is defined (and exists within the datasources).
The ColdFusion server can connect to the database using the datasource configuration and can successfully execute the SQL.
If this runs successfully, the health-check will return a 200 OK status. If it fails for any reason, the site will return a 500 Server Error status. From a machine's standpoint, the content of the health-check is irrelevant. The only thing that matters is the HTTP Status Code. If the health-check returns a 200 OK, it indicates that the site is healthy. If it returns anything else, it indicates that the site is unhealthy.
ASIDE: If something terrible happens in the request routing, it's possible that a false-positive could be generated even though the site was down (or unreachable). This would be a case where checking the response content of the health-check would be valuable. Then, not only would you be assured of a 200 OK response, you'd also know that you're hitting the right application. But, that's beyond the scope of this exploration.
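The content check mentioned in this aside could be sketched as a small predicate. This is only an illustration: the marker string "probe:ok" and the function name isHealthyResponse() are hypothetical, and the health-check shown above does not emit any such marker.

```javascript
// ASSUMPTION: a hypothetical marker that the health-check page would have to print.
var EXPECTED_MARKER = "probe:ok";

/**
* I report whether a probe response looks like it came from the right application:
* the status must be exactly 200 and the body must contain the expected marker.
*/
function isHealthyResponse( statusCode, body ) {

	return(
		( statusCode === 200 ) &&
		( typeof body === "string" ) &&
		body.includes( EXPECTED_MARKER )
	);

}
```

With a check like this, a 200 OK served by the wrong thing (a default web server page, for example) would be treated as unhealthy.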
Once I had the health-check deployed to my site, I went about creating a new Netlify site to host the scheduled function that would ping my health-check. First, I created a non-cron version of the function just to make sure that it was hitting the health-check and registering the response. After all, I haven't touched Netlify in a while and I needed to refamiliarize myself with how it worked.
The following serverless Function uses Axios to ping my health-check and then return some status-text letting me know if my ColdFusion blog was up and running:
// Import node modules.
var axios = require( "axios" );

// ----------------------------------------------------------------------------------- //
// ----------------------------------------------------------------------------------- //

/**
* I provide the Netlify serverless function logic.
*/
exports.handler = async function( event, context ) {

	try {

		var apiResponse = await axios({
			method: "get",
			url: "https://www.bennadel.com/index.cfm?event=probe.healthcheck",
			timeout: 3000
		});

		var statusText = "All systems are OK.";

	} catch ( error ) {

		var statusText = ( error.response )
			? `Probe responded with status code [${ error.response.status }].`
			: `Unexpected error [${ error.message }].`
		;

		// If the response exists on the error object, it means that the request was made
		// and the origin server responded with a status code. As such, let's only log the
		// errors if there was a problem with the actual request itself (ex, an Axios
		// configuration problem).
		if ( ! error.response ) {

			console.error( error );

		}

	}

	return({
		statusCode: 200,
		headers: {
			"Content-Type": "application/x-json; charset=utf-8"
		},
		body: JSON.stringify({
			statusText: statusText
		})
	});

};
With this serverless Function in place, I started tweaking my health-check to return non-200 status codes so that I could confirm that this Axios try/catch logic was working properly. And, once I felt confident that everything was running as expected, I created a scheduled version of this Function that would act as my cron job / health probe.
Of course, with the scheduled version, just returning some status-text isn't going to solve my "operational readiness" problem. Instead, the Function needs to alert me to the downtime. And for that, I'm going to make an API call to Postmark and send myself an email.
PARTY LIKE IT'S 1999: Sometimes, you just gotta send yourself an email. Of course, if we were to leverage something like AWS SNS (Simple Notification Service) or Twilio, we could also send text-messages. But, I wanted to keep it simple! And, I don't want to be woken up in the middle of the night - I'm not that keen on keeping my site online.
To do this, I created a Postmark server, which just allocates an isolated API key for a specific use-case (ie, this health-check). And then, I made that API key available as an environment variable in my Netlify deploy, POSTMARK_SERVER_TOKEN. Once this was in place, I created my scheduled Function:
// Using the dotenv package allows us to have local-versions of our ENV variables in a
// .env file while still using different build-time ENV variables in production.
require( "dotenv" ).config();

// ----------------------------------------------------------------------------------- //
// ----------------------------------------------------------------------------------- //

// Import node modules.
var axios = require( "axios" );

// ----------------------------------------------------------------------------------- //
// ----------------------------------------------------------------------------------- //

/**
* I provide the Netlify serverless function logic.
*/
exports.handler = async function( event, context ) {

	try {

		var apiResponse = await axios({
			method: "get",
			url: "https://www.bennadel.com/index.cfm?event=probe.healthcheck",
			timeout: 3000
		});

	} catch ( error ) {

		if ( error.response ) {

			console.log( `Healthcheck returned with non-200 status code [${ error.response.status }].` );
			// CAUTION: Even though we don't care about the underlying API call to
			// Postmark, we still have to AWAIT the call otherwise the Postmark API call
			// will timeout. I assume this has to do with the Function being torn-down
			// once the handler() returns.
			await sendEmail( error.response.status );

		} else {

			console.log( "Probe could not make outbound request." );
			console.log( error.message );

		}

	}

	return({
		statusCode: 200,
		body: ""
	});

};

// ----------------------------------------------------------------------------------- //
// ----------------------------------------------------------------------------------- //

/**
* I send an alert email to the boss.
*/
async function sendEmail( statusCode ) {

	try {

		var apiResponse = await axios({
			method: "post",
			headers: {
				"Accept": "application/json",
				"Content-Type": "application/json",
				"X-Postmark-Server-Token": process.env.POSTMARK_SERVER_TOKEN
			},
			url: "https://api.postmarkapp.com/email",
			data: {
				From: "ben@bennadel.com",
				To: "ben@bennadel.com",
				Subject: "!! BenNadel.com Down !! - Healthcheck failed with non-200 status code.",
				HtmlBody: `
					<h1>
						BenNadel.com Not Responding
					</h1>
					<p>
						The healthcheck has responded with a non-200 status code
						[${ statusCode }]. You best check the site to see if it is up.
					</p>
				`,
				MessageStream: "outbound"
			},
			timeout: 5000
		});

	} catch ( error ) {

		console.log( "Failed to send alert email." );
		console.log( error );

	}

}
Here, I'm using Axios to ping my ColdFusion health-check end-point. And, if it returns anything other than a 200 OK status code, I then turn around and again use Axios to send an alert email to my new Postmark server letting me know that my site is unreachable.
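One thing worth noting about the Function above: the alert email is only sent when the origin server responds with an error status. If the request times out or never connects, error.response is undefined and the failure is only logged. As a variation (not part of the original Function), here is a minimal sketch of a helper that describes both cases so that either could be alerted on. The name describeProbeFailure() and the shape of its return value are hypothetical.

```javascript
/**
* I inspect an Axios-style error and return a short description of the failure. Axios
* attaches a "response" property when the server replied with an error status; when no
* reply arrived (timeout, DNS failure, refused connection), "response" is undefined.
*/
function describeProbeFailure( error ) {

	if ( error.response ) {

		return({
			kind: "bad-status",
			detail: `Healthcheck returned status code [${ error.response.status }].`
		});

	}

	return({
		kind: "no-response",
		detail: `Healthcheck could not be reached [${ error.message }].`
	});

}
```

Whether an unreachable origin should trigger an email is a judgment call. A network hiccup on the Netlify side would also land in the "no-response" branch, so alerting on it trades missed outages for the occasional false alarm.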
ASIDE: At first, I wasn't including an await in front of my sendEmail() call since I didn't actually care about the response. And, locally, this worked fine. However, once I deployed this Function, my Postmark API call was always timing-out. I suspect this has something to do with the Function being "torn down" after the handler returns. Once I added the await sendEmail(), then everything worked perfectly. I guess everything has to finish processing before the handler is allowed to return.
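The ordering problem described in this aside can be reproduced in isolation. The following sketch doesn't touch Netlify or Postmark at all; fakeSendEmail() is a hypothetical stand-in that simply resolves after a short timer. It only illustrates that, without await, the handler returns while the "email" is still pending, which is the moment at which a serverless runtime may freeze or tear down the process.

```javascript
/**
* ASSUMPTION: a hypothetical stand-in for sendEmail(). It records its completion in the
* given log after a short delay, the way a real HTTP call would finish "later".
*/
function fakeSendEmail( log ) {

	return new Promise( ( resolve ) => {

		setTimeout(
			() => {
				log.push( "email sent" );
				resolve();
			},
			20
		);

	});

}

// Fire-and-forget: the email is still in flight when the handler returns.
async function handlerWithoutAwait( log ) {

	fakeSendEmail( log );
	log.push( "handler returned" );

}

// Awaited: the handler doesn't return until the email has completed.
async function handlerWithAwait( log ) {

	await fakeSendEmail( log );
	log.push( "handler returned" );

}
```

Running each handler with an empty array shows the difference: the awaited version records "email sent" before "handler returned", while the fire-and-forget version has recorded only "handler returned" at the moment it resolves.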
On its own, there's nothing about this Netlify Function that indicates that it is a scheduled Function. For that, I am using the netlify.toml file - the site's configuration file - to denote the Function as a Scheduled Function:
# Settings in the [build] context are global and are applied to all contexts unless
# otherwise overridden by more specific contexts.
[build]
	# Directory to change to before starting a build. This is where we will look for
	# package.json/.nvmrc/etc. If not set, defaults to the root directory.
	# base = ""

	# Directory that contains the deploy-ready HTML files and assets generated by the
	# build. This is relative to the base directory if one has been set, or the root
	# directory if a base has not been set.
	publish = "public/"

	# Default build command.
	command = "echo 'default context'"

[functions]
	# Directory with serverless functions, including background and cron functions, to
	# deploy. This is relative to the base directory if one has been set, or the root
	# directory if a base hasn't been set.
	directory = "netlify/functions/"

[functions."probe-cron"]
	# https://crontab.guru/every-10-minutes
	schedule = "*/10 * * * *"
Here, I'm configuring the probe-cron Serverless Function to be invoked every 10-minutes using the schedule attribute.
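To make the */10 step syntax concrete, here is a small sketch that expands a minute-field step expression into the minutes of the hour on which it fires. The helper expandMinuteStep() is hypothetical and handles only the */N form; it is not how Netlify parses cron expressions.

```javascript
/**
* I expand a cron minute field of the form "*\/N" into the list of minutes (0-59) on
* which it matches. Any other form is rejected; this is an illustration, not a parser.
*/
function expandMinuteStep( field ) {

	var match = /^\*\/(\d+)$/.exec( field );

	if ( ! match || ( Number( match[ 1 ] ) < 1 ) ) {

		throw( new Error( `Unsupported minute field [${ field }].` ) );

	}

	var step = Number( match[ 1 ] );
	var minutes = [];

	// A step expression starts at the beginning of the range (minute 0) and advances
	// by the step size until the end of the range (minute 59).
	for ( var minute = 0 ; minute < 60 ; minute += step ) {

		minutes.push( minute );

	}

	return( minutes );

}
```

For the schedule above, expandMinuteStep( "*/10" ) yields the six minutes 0, 10, 20, 30, 40, and 50, which is where the "every 10 minutes" reading comes from.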
Once I deployed this to Netlify, I went into my ColdFusion health-probe, artificially started returning a 500 Server Error, and just like that, emails started showing up in my Inbox.
Amazing! Sure, 10-minute granularity isn't the best; and, an email isn't guaranteed to get my attention; but, it's just a blog - it's not a critical piece of infrastructure. I'd rather suffer 10-minutes of downtime than 8-hours of downtime. So, in the grand scheme of things it's a big step forward.
The Netlify Free Tier For Functions
Right now, this Scheduled Function health-probe isn't costing me a penny. The free tier of Netlify is very generous. You get 125,000 executions and 100 hours of runtime per month. If I hit my ColdFusion health-check end-point every 10-minutes, that's only 144 requests a day or about 4,320 requests per month - well below the limits.
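The arithmetic above is easy to double-check with a couple of small helpers. Both function names are hypothetical, and the runtime estimate assumes a deliberately pessimistic 8 seconds per run (the 3-second probe timeout plus the 5-second email timeout used in the Function); real runs should be much shorter.

```javascript
/**
* I calculate how many times a Function runs in a month when it is invoked once every
* N minutes. The month length defaults to 30 days.
*/
function invocationsPerMonth( intervalMinutes, daysInMonth = 30 ) {

	var invocationsPerDay = ( ( 24 * 60 ) / intervalMinutes );

	return( invocationsPerDay * daysInMonth );

}

/**
* I estimate the total runtime, in hours, for the given number of invocations when each
* one takes the given number of seconds.
*/
function worstCaseRuntimeHours( invocations, secondsPerRun ) {

	return( ( invocations * secondsPerRun ) / 3600 );

}
```

With a 10-minute interval, invocationsPerMonth( 10 ) comes out to 4,320, and even at 8 seconds per run that is roughly 9.6 hours of runtime for the month.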