Monitoring Circuit Breaker State In ColdFusion
In my last exploratory post, I extracted state management from the Circuit Breaker in order to create a more generic Circuit Breaker and a polymorphic state implementation. In this post, I wanted to look at how we might subsequently monitor that extracted state. While the Circuit Breaker itself will rethrow errors that can be logged by your application framework, there is certainly value in being able to tap into the state changes of the Circuit Breaker. For example, if a Circuit Breaker is tripped (ie, moved to the Opened state), you might want to send a Slack notification to your Operations team.
The most straightforward way to monitor the Circuit Breaker State changes would be to pass some sort of logging component into the Circuit Breaker State and then have the Circuit Breaker State alert the logging component when meaningful events take place. In my state implementation, the meaningful events would be:
- Circuit opened.
- Circuit closed.
- Request fulfilled successfully.
- Request failed.
We can then craft a Monitor interface that logs these four events:
component
output = false
hint = "I provide a default No-Op (no operation) monitor and interface for Circuit Breaker monitoring."
{
/**
* I log the changing of the Circuit Breaker state to Closed.
*
* @output false
*/
public void function logClosed() {
// ...
}
/**
* I log a failure to fulfill a request in the Circuit Breaker.
*
* CAUTION: The error is being passed-in for DECISION PURPOSES ONLY. You should not
* be logging this error here. The Circuit Breaker will rethrow the error as part of
* the request processing. Let the normal page flow log the error.
*
* @error I am the error passed to the failure tracker.
* @output false
*/
public void function logFailure( required any error ) {
// ...
}
/**
* I log the changing of the Circuit Breaker state to Opened.
*
* @output false
*/
public void function logOpened() {
// ...
}
/**
* I log a successful request fulfillment in the Circuit Breaker.
*
* @output false
*/
public void function logSuccess() {
// ...
}
}
At this point, we don't know what the monitor will actually do with this event data; and we don't care. From the viewpoint of the Circuit Breaker State, the monitoring is an optional black box. But, of course, the Circuit Breaker State does need to know how to interact with that black box. As such, we have to update our Circuit Breaker State to accept and consume the monitor component.
In the following code, all we're doing - building on top of the previous post - is adding a few log-method calls during the existing state management. Since the monitor is an optional part of the workflow, I've encapsulated the existence-check behind a few private methods. This way, the state workflow can safely consume those private methods, making the workflow easier to read and to reason about.
component
output = false
hint = "I provide the NON-SYNCHRONIZED state for a Circuit Breaker instance."
{
/**
* I initialize the Circuit Breaker State strategy. This state component is meant
* to help drive the control flow of a Circuit Breaker.
*
* @failedRequestThreshold I am the number of requests that can fail before the circuit is opened.
* @activeRequestThreshold I am the number of parallel requests that can be concurrently active before the circuit is opened.
* @openStateTimeout I am the time (in milliseconds) that the circuit will remain open until the target is tested for health.
* @monitor I am the optional state change monitor.
* @output false
*/
public any function init(
numeric failedRequestThreshold = 10,
numeric activeRequestThreshold = 10,
numeric openStateTimeout = ( 60 * 1000 ),
any monitor = ""
) {
// Store the properties.
variables.failedRequestThreshold = arguments.failedRequestThreshold;
variables.activeRequestThreshold = arguments.activeRequestThreshold;
variables.openStateTimeout = arguments.openStateTimeout;
variables.monitor = arguments.monitor;
// NOTE: There is no "half-open" state. The half-open pseudo-state will be
// entered into by a single request in which a full state change isn't necessary.
states = {
CLOSED: "CLOSED",
OPENED: "OPENED"
};
// Default to a closed (ie, flowing) state.
state = states.CLOSED;
// Initialize the counters.
activeRequestCount = 0;
failedRequestCount = 0;
// Initialize the timers - each of these store UTC millisecond values.
checkTargetHealthAtTick = 0;
lastFailedRequestAtTick = 0;
return( this );
}
// ---
// PUBLIC METHODS.
// ---
/**
* I determine if a health check can be initiated against the target.
*
* @output false
*/
public boolean function canPerformHealthCheck() {
return( ! isAtCapacity() && ! isWaitingForTargetToRecover() );
}
/**
* I return a summary of the state of the Circuit Breaker. This can be used for logging
* and debugging purposes.
*
* @output false
*/
public string function getSummary() {
return(
( isOpened() ? "State: OPENED, " : "State: CLOSED, " ) &
"Active request count: [#activeRequestCount#], " &
"Failed request count: [#failedRequestCount#]."
);
}
/**
* I determine if the Circuit Breaker is currently closed and can accept requests.
*
* @output false
*/
public boolean function isClosed() {
return( state != states.OPENED );
}
/**
* I determine if the Circuit Breaker is opened and cannot currently accept any requests.
*
* @output false
*/
public boolean function isOpened() {
return( state == states.OPENED );
}
/**
* I reset the Circuit Breaker State, rolling back all counters and timers to a
* healthy state.
*
* @output false
*/
public void function reset() {
// Revert to a closed (ie, flowing) state.
state = states.CLOSED;
// Reset the counters.
activeRequestCount = 0;
failedRequestCount = 0;
// Reset the timers.
checkTargetHealthAtTick = 0;
lastFailedRequestAtTick = 0;
// Even though we are not sure if the original state (pre-reset) was Opened,
// let's log this reset as a state-change.
logClosed();
}
/**
* I track a failed action in the Circuit Breaker.
*
* NOTE: The associated error is being passed into the failure method in case any
* additional logic needs to be implemented based on error type.
*
* @error I am the error that was thrown during the request execution.
* @output false
*/
public void function trackRequestFailure( required any error ) {
activeRequestCount--;
// Check to see if the current failure count is still relevant. Since we are
// tracking errors in a rolling window, it might be time to reset the count
// before we track the current failure.
if ( isClosed() && isNewErrorWindow() ) {
failedRequestCount = 0;
}
failedRequestCount++;
lastFailedRequestAtTick = getTickCount();
logFailure( error );
// Check to see if the current failure exceeded the allowable failure rate for
// the Circuit Breaker. If so, we'll have to trip it open.
if ( isClosed() && isFailing() ) {
state = states.OPENED;
checkTargetHealthAtTick = ( getTickCount() + openStateTimeout );
logOpened();
}
}
/**
* I track the start of an action in the Circuit Breaker. Every "start" should be
* followed by either a completion in "success" or in "failure".
*
* @output false
*/
public void function trackRequestStart() {
// If a request is being initiated while the circuit is tripped open, it must be
// a health check. Since the ability to accept a health check is, in part, driven
// by the open-state timeout, in order to prevent parallel requests from also
// initiating a health check request, let's bump out the timer. This will also
// implicitly "reset" the timeout, for all intents and purposes, if the health
// check fails.
if ( isOpened() ) {
checkTargetHealthAtTick = ( getTickCount() + openStateTimeout );
}
activeRequestCount++;
// If the current request just exhausted the request pool, open the circuit so
// no more requests can be initiated.
if ( isClosed() && isAtCapacity() ) {
state = states.OPENED;
// NOTE: Since this "trip" is based on capacity and not on error rate, there
// is no need to adjust the health-timer. We want the circuit to re-close as
// pending requests complete.
logOpened();
}
}
/**
* I track a successful action in the Circuit Breaker.
*
* @output false
*/
public void function trackRequestSuccess() {
activeRequestCount--;
logSuccess();
// Any successful request that returns while the Circuit Breaker is open will
// move the circuit back into a closed, flowing state. This may be the "health
// check" request; or, it may be a previously long-running request that finally
// returned some time after the circuit was tripped open; or, it may be an
// "at capacity" request that has completed, releasing a slot in the request
// pool. At this point, there is no differentiating between the various types
// of successful returns.
if ( isOpened() && ! isAtCapacity() ) {
state = states.CLOSED;
// Reset failure tracking.
failedRequestCount = 0;
lastFailedRequestAtTick = 0;
checkTargetHealthAtTick = 0;
logClosed();
}
}
// ---
// PRIVATE METHODS.
// ---
/**
* I determine if the Circuit Breaker has exhausted its request pool and should no
* longer accept any requests until pending requests have completed.
*
* @output false
*/
private boolean function isAtCapacity() {
return( activeRequestCount >= activeRequestThreshold );
}
/**
* I determine if the Circuit Breaker is failing based on the failed request threshold.
*
* @output false
*/
private boolean function isFailing() {
return( failedRequestCount >= failedRequestThreshold );
}
/**
* I determine if a new error-tracking window should be initiated. Errors are tracked
* in a rolling window so that infrequent errors don't eventually trip the Circuit
* Breaker unnecessarily.
*
* @output false
*/
private boolean function isNewErrorWindow() {
return( lastFailedRequestAtTick < ( getTickCount() - openStateTimeout ) );
}
/**
* I determine if the OPEN Circuit Breaker is currently waiting before attempting to
* check the health of the target (ie, whether or not it is yet appropriate to check
* the health of the target).
*
* @output false
*/
private boolean function isWaitingForTargetToRecover() {
return( checkTargetHealthAtTick > getTickCount() );
}
/**
* I log the changing of the state from Opened to Closed.
*
* @output false
*/
private void function logClosed() {
if ( ! isSimpleValue( monitor ) ) {
monitor.logClosed();
}
}
/**
* I log a request failure.
*
* @output false
*/
private void function logFailure( required any error ) {
if ( ! isSimpleValue( monitor ) ) {
monitor.logFailure( error );
}
}
/**
* I log the changing of the state from Closed to Opened.
*
* @output false
*/
private void function logOpened() {
if ( ! isSimpleValue( monitor ) ) {
monitor.logOpened();
}
}
/**
* I log a request success.
*
* @output false
*/
private void function logSuccess() {
if ( ! isSimpleValue( monitor ) ) {
monitor.logSuccess();
}
}
}
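As an aside, the isSimpleValue() guards could be avoided by leaning on the fact that the base monitor is already a no-op. The following is only a sketch of that alternative (a Null Object approach), not what the code above does. It assumes the base component is saved as CircuitBreakerMonitor.cfc, and it shows just the parts of the state component that would change:

```cfml
component
	output = false
	hint = "SKETCH ONLY: a trimmed-down state that always has a monitor to call."
	{

	public any function init( any monitor = "" ) {

		// ASSUMPTION: CircuitBreakerMonitor.cfc is the no-op base component shown
		// earlier. If no monitor is provided, fall back to it so that the rest of
		// the component never has to check for existence.
		if ( isSimpleValue( arguments.monitor ) ) {

			variables.monitor = new CircuitBreakerMonitor();

		} else {

			variables.monitor = arguments.monitor;

		}

		return( this );

	}

	// With a guaranteed monitor, a tracking method can call it directly. The
	// counters and timers are omitted from this sketch.
	public void function trackRequestSuccess() {

		monitor.logSuccess();

	}

}
```

The trade-off is that the state component now depends on the base monitor component by name, whereas the guard-based version above has no such dependency.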
The beautiful thing about having extracted the state management from the Circuit Breaker itself is that we don't have to change the Circuit Breaker at all. Since the monitoring of the state didn't require any breaking changes on the state API, the Circuit Breaker doesn't have to know anything about the changes to the underlying state implementation.
To test this, we can now create a simple monitor that logs the Circuit Breaker State events to an in-memory array:
component
extends = "CircuitBreakerMonitor"
output = false
hint = "I log Circuit Breaker events to an in-memory event log."
{
/**
* I initialize the in-memory monitor.
*
* @output false
*/
public any function init() {
events = [];
return( this );
}
// ---
// PUBLIC METHODS.
// ---
/**
* I return the recorded Circuit Breaker events.
*
* @output false
*/
public array function getEvents() {
return( events );
}
/**
* I log the changing of the Circuit Breaker state to Closed.
*
* @output false
*/
public void function logClosed() {
arrayAppend( events, "Circuit breaker moved to CLOSED state." );
}
/**
* I log a failure to fulfill a request in the Circuit Breaker.
*
* @error I am the error passed to the failure tracker.
* @output false
*/
public void function logFailure( required any error ) {
arrayAppend( events, "Circuit breaker experienced failure." );
}
/**
* I log the changing of the Circuit Breaker state to Opened.
*
* @output false
*/
public void function logOpened() {
arrayAppend( events, "Circuit breaker moved to OPENED state." );
}
/**
* I log a successful request fulfillment in the Circuit Breaker.
*
* @output false
*/
public void function logSuccess() {
arrayAppend( events, "Circuit breaker experienced success." );
}
}
And remember, to test this, we don't even need to create a Circuit Breaker - we can test the state directly:
<cfscript>
// Create an instance of our in-memory monitor. This will keep track of the Circuit
// Breaker events in an in-memory event log.
monitor = new InMemoryMonitor();
// Pass the in-memory monitor instance to our Circuit Breaker state.
state = new CircuitBreakerState(
failedRequestThreshold = 2,
activeRequestThreshold = 2,
openStateTimeout = 1000,
monitor = monitor
);
// ------------------------------------------------------------------------------- //
// ------------------------------------------------------------------------------- //
state.trackRequestStart(); // Pending: 1.
state.trackRequestSuccess(); // Pending: 0.
state.trackRequestStart(); // Pending: 1.
state.trackRequestSuccess(); // Pending: 0.
state.trackRequestStart(); // Pending: 1.
state.trackRequestFailure( "Error 1" ); // Pending: 0.
state.trackRequestStart(); // Pending: 1.
state.trackRequestStart(); // Pending: 2 -- hit active threshold.
state.trackRequestSuccess(); // Pending: 1.
state.trackRequestSuccess(); // Pending: 0.
state.trackRequestStart(); // Pending: 1.
state.trackRequestFailure( "Error 2" ); // Pending: 0.
state.trackRequestStart(); // Pending: 1.
state.trackRequestFailure( "Error 3" ); // Pending: 0 -- hit error threshold.
state.trackRequestStart(); // Pending: 1.
state.trackRequestSuccess(); // Pending: 0.
// Log-out the events monitored by the Circuit Breaker state.
writeDump( var = monitor.getEvents(), format = "text" );
</cfscript>
In this test code, we're tracking requests in such a way that we exceed both the active request threshold and the failed request threshold. And, when we run the above code, we get the following page output:
array - Top 12 of 12 rows
1) Circuit breaker experienced success.
2) Circuit breaker experienced success.
3) Circuit breaker experienced failure.
4) Circuit breaker moved to OPENED state.
5) Circuit breaker experienced success.
6) Circuit breaker moved to CLOSED state.
7) Circuit breaker experienced success.
8) Circuit breaker experienced failure.
9) Circuit breaker experienced failure.
10) Circuit breaker moved to OPENED state.
11) Circuit breaker experienced success.
12) Circuit breaker moved to CLOSED state.
As you can see, our monitor was able to tap into the changes in the Circuit Breaker State. And, most importantly, the monitor was able to see exactly when the Circuit Breaker was tripped open, indicating poor health of the remote system (that the Circuit Breaker is proxying). This would enable your engineering team to react quickly to any upstream failures.
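If only the state transitions are of interest, the event log can be narrowed down after the fact. Here's a small sketch of that idea. The events array is a hand-written stand-in for monitor.getEvents(), and the filter assumes the "moved to" wording used by the in-memory monitor:

```cfml
<cfscript>

	// Stand-in for monitor.getEvents(), using a few of the messages logged above.
	events = [
		"Circuit breaker experienced failure.",
		"Circuit breaker moved to OPENED state.",
		"Circuit breaker experienced success.",
		"Circuit breaker moved to CLOSED state."
	];

	// Keep only the state transitions, dropping the per-request success and
	// failure messages.
	stateChanges = arrayFilter(
		events,
		function( required string eventMessage ) {

			return( findNoCase( "moved to", eventMessage ) > 0 );

		}
	);

	writeDump( var = stateChanges, format = "text" );

</cfscript>
```

With the stand-in data, this leaves just the two "moved to" messages, in their original order.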
Circuit Breakers build in protection from upstream failures, allowing your code to fail fast and possibly respond with fallback values. But, such protection is only half the battle; unhealthy systems still need to be investigated, debugged, and fixed. Monitoring the state changes within a Circuit Breaker can allow engineering teams to see the realtime health of system integrations and respond quickly when help is needed.
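To make the Slack idea from the top of the post a bit more concrete, here's a rough sketch of what such a monitor might look like. None of this is part of the code above: the component (imagine it saved as SlackMonitor.cfc) and its webhookUrl argument are hypothetical, it assumes a Slack "incoming webhook" that accepts a JSON body with a "text" key, and it uses the script-based cfhttp() syntax, which requires a CFML engine that supports tags-as-functions in script:

```cfml
component
	extends = "CircuitBreakerMonitor"
	output = false
	hint = "SKETCH ONLY: I post a Slack message when the Circuit Breaker opens."
	{

	public any function init( required string webhookUrl ) {

		// ASSUMPTION: This is an incoming webhook URL created for your workspace.
		variables.webhookUrl = arguments.webhookUrl;

		return( this );

	}

	// Only the OPENED event is overridden. The other events fall through to the
	// no-op methods in the base monitor.
	public void function logOpened() {

		// Bracket-notation keeps the key lower-case when serialized.
		var payload = {};
		payload[ "text" ] = "Circuit breaker moved to OPENED state.";

		try {

			cfhttp( method = "POST", url = webhookUrl, timeout = 5, result = "local.response" ) {
				cfhttpparam( type = "header", name = "Content-Type", value = "application/json" );
				cfhttpparam( type = "body", value = serializeJson( payload ) );
			}

		} catch ( any error ) {

			// Swallow notification errors so that monitoring never interferes
			// with the request being tracked.

		}

	}

}
```

An instance of this could then be passed to the state as the monitor argument, just like the in-memory monitor. Since the HTTP call here is synchronous, it would add latency to the request that trips the circuit, so a real implementation would probably want to hand the notification off to a background thread or a queue.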
Reader Comments
@All,
I finally took all of my noodling on the concept of Circuit Breakers and turned it into a GitHub project. While it's not the end of the journey, this forced me to clean it up and add unit tests:
www.bennadel.com/blog/3190-coldfusion-circuit-breaker-project-on-github.htm
Now, I'll have a more directed way to continue evolving my understanding of the concept.