Thinking About Fallback Values In Circuit Breakers In ColdFusion
In my previous noodling on Circuit Breakers in ColdFusion, I talked about the ability for a Circuit Breaker to throw special types of errors - errors that the calling context could catch and respond to. But, the more I thought about this, the less I could think of a reason as to why the calling context would want to differentiate between a "CircuitBreakerOpen" error and [for example] a "ConnectionTimeout" error. If you go through the trouble of providing a fallback value, it would seem to make the most sense to simply provide that fallback value for all errors. As such, I wanted to revisit the Circuit Breaker again, this time building the concept of a fallback directly into the action marshaling.
In my current approach to Circuit Breakers in ColdFusion, the breaker can be tripped-open if the target component or closure appears to be unhealthy. An unhealthy target is one that throws too many errors in a fixed period of time; or, one that fails to respond in a timely manner, holding too many requests open. In either case, the circuit breaker will start to "fail fast", throwing "CircuitBreakerOpen" errors for all subsequent requests that need to be marshaled.
At first, I thought that the calling context would catch and respond to these types of errors:
<cfscript>

    // Create our Circuit Breaker and the gateway whose actions we'll proxy.
    breaker = new CircuitBreaker();
    testGateway = new TestGateway();

    // In my previous pass on the Circuit Breaker, I had imagined the calling context
    // could try-catch errors and then return a fallback value if it so desired.
    try {

        result = breaker.executeMethod( testGateway, "makeBadCall", [ "Meh" ] );

    } catch ( CircuitBreakerOpen error ) {

        // Define a fallback specifically for the case in which the Circuit Breaker
        // had been tripped open.
        result = "Some fallback value";

    } catch ( any error ) {

        // Or, a fallback for any other type of error.
        result = "Some generic fallback value";

    }

    writeOutput( result );

</cfscript>
As you can see in this approach, the calling context has the ability to respond specifically to "CircuitBreakerOpen" errors. But, as I stated above, I can't really think of a good reason as to why the calling context would want to differentiate. As such, I went back and I updated the Circuit Breaker execution methods to accept an optional argument for a fallback value that the Circuit Breaker would return in the case of an error. The fallback value can either be a static value or a function / closure. In the case of a function or closure, the function will only be invoked (and its result returned) if an error occurs.
<cfscript>

    // Create our Circuit Breaker and the gateway whose actions we'll proxy.
    breaker = new CircuitBreaker();
    testGateway = new TestGateway();

    // Now, you can pass the fallback value in as an optional argument. This fallback
    // value will then be used for any error that occurs during the action marshaling.
    // The fallback value can be a static value (of almost any kind):
    result = breaker.executeMethod( testGateway, "makeBadCall", [ "Meh" ], "Static fallback value." );

    writeOutput( "#result# <br />" );

    // ... or it can be a Function / Closure that will be evaluated if and only if an
    // error occurs during action marshaling:
    result = breaker.executeMethod(
        testGateway,
        "makeBadCall",
        [ "Meh" ],
        function() {

            return( "Fallback value from evaluated closure." );

        }
    );

    writeOutput( "#result# <br />" );

    // ------------------------------------------------------------------------------- //
    // ------------------------------------------------------------------------------- //

    // The ".executeMethod()" works with a method name but the ".execute()" method
    // works with Closures and Functions for marshaled invocation:
    result = breaker.execute(
        function() {

            return( testGateway.makeBadCall( "Meh" ) );

        },
        "Static fallback value."
    );

    writeOutput( "#result# <br />" );

    // And of course, the fallback value for .execute() can be either a static value
    // or a Function / Closure.
    result = breaker.execute(
        function() {

            return( testGateway.makeBadCall( "Meh" ) );

        },
        function() {

            return( "Fallback value from evaluated closure." );

        }
    );

    writeOutput( "#result# <br />" );

</cfscript>
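To make the "if and only if an error occurs" behavior concrete, here is a minimal sketch that counts how often a fallback closure is actually invoked. It assumes the CircuitBreaker.cfc from this post is on the component path; the "stats" struct, the "countingFallback" closure, and the "DemoFailure" error type are hypothetical names used only for illustration.

```cfml
<cfscript>

    breaker = new CircuitBreaker();

    // Hypothetical counter (illustration only). A struct is used so that the
    // closure below mutates shared state by reference.
    stats = { fallbackCalls = 0 };

    // Hypothetical fallback closure that records each time it is invoked.
    countingFallback = function() {

        stats.fallbackCalls++;
        return( "Fallback value." );

    };

    // The action succeeds, so the fallback closure should never be invoked.
    result = breaker.execute(
        function() {

            return( "Live value." );

        },
        countingFallback
    );

    writeOutput( "#result# (fallback calls: #stats.fallbackCalls#) <br />" );

    // The action throws, so the fallback closure should be invoked once and its
    // return value handed back to the calling context.
    result = breaker.execute(
        function() {

            throw( type = "DemoFailure", message = "Simulated failure." );

        },
        countingFallback
    );

    writeOutput( "#result# (fallback calls: #stats.fallbackCalls#) <br />" );

</cfscript>
```

If the fallback is expensive to produce (a cache read, for example), passing it as a closure rather than as a static value means that the cost is only paid on the failure path.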
Because the fallback value is optional, I ended up creating two different execution methods to simplify the logic - one that takes a target component and a method name; and, one that takes a function or closure:
.execute( closure [, fallback ] )
.executeMethod( component, methodName [, methodArguments [, fallback ] ] )
Internal to the Circuit Breaker, there are two different reasons that an error can be propagated. Either the Circuit Breaker is open and the marshaled request needed to "fail fast"; or the target itself threw an error. In order to keep the internal logic cleaner, I didn't want to build the fallback concept directly into that internal workflow. Instead, I moved the request marshaling into a private method - run() - and kept the top-level execute() methods as fallback-aware entry points:
<cfscript>

    /**
    * I marshal the given action inside the Circuit Breaker.
    *
    * @target I am the function or closure to be invoked.
    * @fallback I am the value to be evaluated if the action fails to complete successfully.
    * @output false
    */
    public any function execute(
        required any target,
        any fallback
        ) {

        try {

            return( run( target ) );

        } catch ( any error ) {

            // If a fallback has been provided, return the fallback instead of letting
            // the error propagate to the calling context.
            if ( structKeyExists( arguments, "fallback" ) ) {

                return( evaluateFallback( fallback ) );

            }

            rethrow;

        }

    }

    /**
    * I marshal the given action inside the Circuit Breaker.
    *
    * @target I am the component receiving the message.
    * @methodName I am the message being sent to the target.
    * @methodArguments I am the message arguments being sent to the target.
    * @fallback I am the value to be evaluated if the action fails to complete successfully.
    * @output false
    */
    public any function executeMethod(
        required any target,
        required string methodName,
        any methodArguments = [],
        any fallback
        ) {

        try {

            return( run( target, methodName, methodArguments ) );

        } catch ( any error ) {

            // If a fallback has been provided, return the fallback instead of letting
            // the error propagate to the calling context.
            if ( structKeyExists( arguments, "fallback" ) ) {

                return( evaluateFallback( fallback ) );

            }

            rethrow;

        }

    }

</cfscript>
As you can see, the execute() methods do nothing more than initiate the request marshaling and handle the fallback value (if it is provided). This creates a convenient way to consume the Circuit Breaker while the "guts" of the Circuit Breaker don't need to know anything about the concept of a fallback.
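Since the fallback argument is optional, it may help to see both paths side by side. The following is a small sketch, not part of the original demo: "failingAction" and the "DemoFailure" error type are hypothetical, and it assumes the CircuitBreaker.cfc from this post is available.

```cfml
<cfscript>

    breaker = new CircuitBreaker();

    // Hypothetical action that always fails (illustration only).
    failingAction = function() {

        throw( type = "DemoFailure", message = "Simulated upstream failure." );

    };

    // With a fallback, the error is absorbed by execute() and the fallback value
    // is returned to the calling context.
    result = breaker.execute( failingAction, "Fallback value." );

    writeOutput( "#result# <br />" );

    // Without a fallback, execute() rethrows the underlying error, so the calling
    // context can still catch it by type if it wants to.
    try {

        breaker.execute( failingAction );

    } catch ( DemoFailure error ) {

        writeOutput( "Caught: #error.message# <br />" );

    }

</cfscript>
```

This means the older try-catch style of consumption keeps working; the fallback argument is just a shortcut for the common case.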
Altogether, this is the latest CircuitBreaker.cfc code:
component
    output = false
    hint = "I marshal the invocation of actions (closures or method calls), providing circuit-breaker protection."
    {

    /**
    * I initialize the Circuit Breaker with the given options.
    *
    * @failedRequestThreshold I am the number of requests that can fail before the circuit is opened.
    * @activeRequestThreshold I am the number of parallel requests that can be concurrently active before the circuit is opened.
    * @openStateTimeout I am the time (in milliseconds) that the circuit will remain open until the target is tested.
    * @output false
    */
    public any function init(
        numeric failedRequestThreshold = 10,
        numeric activeRequestThreshold = 10,
        numeric openStateTimeout = ( 60 * 1000 )
        ) {

        // Store the properties.
        variables.failedRequestThreshold = arguments.failedRequestThreshold;
        variables.activeRequestThreshold = arguments.activeRequestThreshold;
        variables.openStateTimeout = arguments.openStateTimeout;

        // NOTE: There is no "half-open" state. The half-open pseudo state will be entered
        // by a single request in which a true state change isn't necessary.
        states = {
            CLOSED: "CLOSED",
            OPENED: "OPENED"
        };

        // Default to a closed (ie, flowing) state.
        state = states.CLOSED;

        // Reset the counters.
        activeRequestCount = 0;
        failedRequestCount = 0;

        // Reset the timers - each of these stores a UTC millisecond value.
        checkTargetHealthAtTick = 0;
        lastFailedRequestAtTick = 0;

        // All access to the shared state of the circuit breaker will be synchronized
        // using this lock name. Errors are aggregated across all requests to execute a
        // target function or method.
        lockName = "CircuitBreaker-#createUUID()#";

        return( this );

    }

    // ---
    // PUBLIC METHODS.
    // ---

    /**
    * I marshal the given action inside the Circuit Breaker.
    *
    * @target I am the function or closure to be invoked.
    * @fallback I am the value to be evaluated if the action fails to complete successfully.
    * @output false
    */
    public any function execute(
        required any target,
        any fallback
        ) {

        try {

            return( run( target ) );

        } catch ( any error ) {

            // If a fallback has been provided, return the fallback instead of letting
            // the error propagate to the calling context.
            if ( structKeyExists( arguments, "fallback" ) ) {

                return( evaluateFallback( fallback ) );

            }

            rethrow;

        }

    }

    /**
    * I marshal the given action inside the Circuit Breaker.
    *
    * @target I am the component receiving the message.
    * @methodName I am the message being sent to the target.
    * @methodArguments I am the message arguments being sent to the target.
    * @fallback I am the value to be evaluated if the action fails to complete successfully.
    * @output false
    */
    public any function executeMethod(
        required any target,
        required string methodName,
        any methodArguments = [],
        any fallback
        ) {

        try {

            return( run( target, methodName, methodArguments ) );

        } catch ( any error ) {

            // If a fallback has been provided, return the fallback instead of letting
            // the error propagate to the calling context.
            if ( structKeyExists( arguments, "fallback" ) ) {

                return( evaluateFallback( fallback ) );

            }

            rethrow;

        }

    }

    /**
    * I determine if the Circuit Breaker is in a closed state.
    *
    * @output false
    */
    public boolean function isClosed() {

        return( state == states.CLOSED );

    }

    /**
    * I determine if the Circuit Breaker is in an open state.
    *
    * NOTE: The half-open state is considered open for our purposes.
    *
    * @output false
    */
    public boolean function isOpen() {

        return( state != states.CLOSED );

    }

    // ---
    // PRIVATE METHODS.
    // ---

    /**
    * I evaluate the given fallback input to produce an output. If the fallback is a
    * function or closure, it will be invoked; otherwise, it will be returned as-is.
    *
    * @fallback I am the fallback producer being evaluated.
    * @output false
    */
    private any function evaluateFallback( required any fallback ) {

        if ( isCustomFunction( fallback ) || isClosure( fallback ) ) {

            return( fallback() );

        } else {

            return( fallback );

        }

    }

    /**
    * I proxy the execution / invocation of the given action.
    *
    * @target I am the function or component being executed.
    * @methodName I am the message being sent to the target (if it's a component).
    * @methodArguments I am the message arguments being sent to the target (if it's a component).
    * @output false
    */
    private any function run(
        required any target,
        string methodName,
        any methodArguments
        ) {

        // CAUTION: All reading-from and writing-to the shared state of the circuit
        // breaker is being SYNCHRONIZED with exclusive locking. While this does incur
        // some overhead, no heavy processing should be done inside the locks. As such
        // the duration of any lock should be negligible. Each request has two locks:
        // one before the target execution to test state and one after the target
        // execution to clean up state.
        lock
            name = lockName
            type = "exclusive"
            timeout = 1
            throwOnTimeout = true
            {

            var currentTick = getTickCount();

            // If we have reached the threshold of active requests, we are going to
            // consider the circuit to be at capacity (and unable to accept any new
            // request traffic).
            var isCircuitAtCapacity = ( activeRequestCount == activeRequestThreshold );

            // If the circuit breaker is currently closed (ie, flowing), check to see if
            // we're about to go over the active request threshold.
            if ( ( state == states.CLOSED ) && isCircuitAtCapacity ) {

                // There are too many concurrent requests still awaiting a response from
                // the target; this likely means that the target is having trouble
                // responding and has become unhealthy - trip the breaker open.
                state = states.OPENED;

                // Keep the breaker open until some time in the future (giving the target
                // a chance to return to a healthy state).
                checkTargetHealthAtTick = ( currentTick + openStateTimeout );

            }

            // If the circuit breaker is currently open (ie, not flowing), then we either
            // want to fail-fast or perform a single test on the target to see if we can
            // close (ie, allow flow on) the circuit.
            if ( state == states.OPENED ) {

                // If we're still in the timeout for the open circuit, we'll consider the
                // target to still be in a state of recovery.
                var isTargetRecovering = ( currentTick < checkTargetHealthAtTick );

                // Check to see if we need to fail fast.
                if ( isCircuitAtCapacity || isTargetRecovering ) {

                    throw(
                        type = "CircuitBreakerOpen",
                        message = "Target invocation failing fast due to open circuit breaker.",
                        detail = "The circuit is open and therefore the requested action could not be executed.",
                        extendedInfo = "Active request count: [#activeRequestCount#], Failed request count: [#failedRequestCount#], Testing health in [#( checkTargetHealthAtTick - currentTick )#]."
                    );

                }

                // If we made it this far, the circuit breaker is open; but, we want to
                // allow a single test (the current request) to be run against the target
                // in order to see if the target has reached a healthy state (at which
                // point the circuit can be closed once again). To make sure that no
                // parallel requests try to perform the same test, push out the timeout.
                // --
                // NOTE: This is an implied HALF-OPEN state.
                checkTargetHealthAtTick = ( currentTick + openStateTimeout );

            }

            activeRequestCount++;

        } // END: Lock.

        try {

            // Try to execute the requested action.
            var result = ( isClosure( target ) || isCustomFunction( target ) )
                ? target()
                : invoke( target, methodName, methodArguments )
            ;

            lock
                name = lockName
                type = "exclusive"
                timeout = 1
                throwOnTimeout = true
                {

                activeRequestCount--; // Circuit breaker is no longer at-capacity.

                // If we made it this far, it means that the target method invocation has
                // completed successfully. As such, we can clean up any opened state.
                if ( state == states.OPENED ) {

                    state = states.CLOSED;
                    failedRequestCount = 0;

                }

            } // END: Lock.

            // The target method may not return a defined value, even in a successful
            // invocation. As such, we have to check to see if the result exists before
            // we try to return the result upstream.
            if ( structKeyExists( local, "result" ) ) {

                return( result );

            } else {

                return; // void.

            }

        // Catch any errors thrown by target invocation.
        } catch ( any error ) {

            lock
                name = lockName
                type = "exclusive"
                timeout = 1
                throwOnTimeout = true
                {

                activeRequestCount--; // Circuit breaker is no longer at-capacity.

                var currentTick = getTickCount();

                // If the previous error occurred in the distant past (ie, a time greater
                // than the open-state timeout), reset the error count before we record
                // the current failure.
                if ( lastFailedRequestAtTick < ( currentTick - openStateTimeout ) ) {

                    failedRequestCount = 0;

                }

                lastFailedRequestAtTick = currentTick;

                // If we made it here, the invocation of the target method failed (ie,
                // threw an error); as such, we need to check to see if this failure
                // pushed us past the failure capacity of the circuit breaker.
                if ( ++failedRequestCount > failedRequestThreshold ) {

                    // Too many requests against the target have failed. The target is
                    // likely in an unhealthy state. Trip the circuit open.
                    state = states.OPENED;

                    // Keep the breaker open until some time in the future (giving the
                    // target a chance to return to a healthy state).
                    checkTargetHealthAtTick = ( currentTick + openStateTimeout );

                }

            } // END: Lock.

            rethrow;

        }

    }

}
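To watch the breaker trip open and start failing fast, a rough sketch like the following could be used. The threshold and timeout values are arbitrary choices for the demo, and "failingAction" and "DemoFailure" are hypothetical names; only the init() arguments, execute(), and isOpen() come from the component above.

```cfml
<cfscript>

    // Use a low failure threshold so that the breaker trips quickly. These
    // values are illustrative only.
    breaker = new CircuitBreaker(
        failedRequestThreshold = 2,
        activeRequestThreshold = 10,
        openStateTimeout = ( 5 * 1000 )
    );

    // Hypothetical action that always fails (illustration only).
    failingAction = function() {

        throw( type = "DemoFailure", message = "Simulated upstream failure." );

    };

    // Each call returns the fallback value. With a threshold of 2, the third
    // failure pushes the failure count past the threshold and should trip the
    // breaker open; the fourth call should then fail fast without invoking the
    // target at all (the fallback is still returned).
    for ( i = 1 ; i <= 4 ; i++ ) {

        result = breaker.execute( failingAction, "Fallback value." );

        writeOutput( "Call #i#: #result# (open: #breaker.isOpen()#) <br />" );

    }

</cfscript>
```

From the calling context's point of view, all four calls look the same: it gets the fallback value regardless of whether the target threw the error or the breaker failed fast.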
From the various Circuit Breaker implementations that I've looked at online, the complexity can range from super simple to mind-bogglingly complex. I hope to keep my experimentation on the simple side. I think building the fallback value into the Circuit Breaker itself simplifies consumption without increasing the complexity of the internal code.
Reader Comments
@All,
After this, I read up on some approaches to Circuit Breakers in which the state management is broken out into its own component. I wanted to take a look at how that might work with my implementation:
www.bennadel.com/blog/3171-extracting-state-management-from-a-circuit-breaker-in-coldfusion.htm
It's actually kind of nice - it forces the handling of requests to follow a generic run-book, allowing the state management implementation to be polymorphic. This also makes things easier to test since you can test the state management directly (making the test surface area of the Circuit Breaker itself quite small).