Gathering Data In Parallel Inside An Asynchronous Generator-Based Workflow In JavaScript
In my post yesterday, I explored the idea of using ES6 Generators and the "yield" operator as a means of writing asynchronous code that looks and feels like it runs synchronously. This is great because it truly simplifies the syntactic boilerplate around Promises. But, when you start doing this, it's easy to lose sight of the fact that JavaScript's non-blocking nature is one of the reasons that we use it. As such, I wanted to do a quick follow-up post on how to execute parallel data access inside an asynchronous Generator-based workflow.
Imagine that we have an ES6 generator function (that we're wrapping in a promise-based workflow runner) that looks like this:
// I get the relational data for the user with the given Id.
function* getRelationalDataGenerator( id ) {

	var user = yield( getUser( id ) );
	var friends = yield( getFriends( user.id ) );
	var enemies = yield( getEnemies( user.id ) );

	return({
		user,
		friends,
		enemies
	});

}
Notice that we're getting the "user" object first and then using the "user.id" value to make the next two calls. This is super clean, easy to read, and returns the data that we're looking for. However, it fails to leverage the fact that the Friends collection and the Enemies collection can be queried in parallel; while they both rely on "user", neither of the collection queries relies on the other. So while this asynchronous code executes like it's synchronous, it misses out on core JavaScript behavior.
To fix this, all we need to do is initiate the collection queries outside the context of a yield operator. This way, the asynchronous actions will do what they do best - run in parallel. Then, in order to join the parallel processing back to the synchronous workflow, so to speak, all we have to do is yield the values. This will automatically pause the generator, at each yield, until the corresponding in-flight request is resolved.
// I get the relational data for the user with the given Id.
function* getRelationalDataGenerator( id ) {

	// Before we do anything, we have to get the user and ensure existence.
	var user = yield( getUser( id ) );

	// Once we have the user, we can start to collect some of the subsequent data in
	// parallel (to take advantage of JavaScript's non-blocking nature). To do this, all
	// we have to do is initiate several asynchronous data-requests outside the context
	// of a "yield" statement. These will naturally run in parallel.
	var thread = {
		friends: getFriends( user.id ),
		enemies: getEnemies( user.id )
	};

	// Once we have the parallel data requests running in ... parallel, all we have to
	// do is yield the values.
	var friends = yield( thread.friends );
	var enemies = yield( thread.enemies );

	return({
		user,
		friends,
		enemies
	});

}
As you can see, I'm taking the collection queries for Friends and Enemies and I'm initiating them outside the context of "yield". This allows the queries to run in parallel. Each of these queries results in a Promise, which I'm aggregating in a plain-old JavaScript object called "thread". I chose the name "thread" because I felt that it nicely indicated the intent - that this data access was happening "outside" the synchronized generator workflow. Then, in order to "join" the collection queries back to the generator workflow, I'm using yield to pluck data out of the thread.
In a situation like this, we don't need to use a construct like Promise.all(); yield is essentially doing that for us. Since our yield operators pause on each promise, and our workflow proxy binds to the resolution of said promises, the generator won't move on to the next step until each promise is resolved. As such, the series of yield operations becomes functionally equivalent to a Promise.all() resolution.
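To see that equivalence in isolation, consider two promises that are already in flight before either one is consumed (the data values here are made up for illustration):

```javascript
// Both promises are created - and therefore already running - before
// either one is consumed.
var friendsPromise = Promise.resolve( [ "Joanna" ] );
var enemiesPromise = Promise.resolve( [ "Matt" ] );

// Consuming the in-flight promises one at a time (as the series of yield
// operations does) ...
friendsPromise.then(
	function( friends ) {

		return( enemiesPromise.then(
			function( enemies ) {

				console.log( "Sequential:", friends, enemies );

			}
		) );

	}
);

// ... resolves with the same values as a single Promise.all() call, and in
// the same overall time, since neither approach restarts the work.
Promise.all( [ friendsPromise, enemiesPromise ] ).then(
	function( results ) {

		console.log( "Promise.all:", results[ 0 ], results[ 1 ] );

	}
);
```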
That said, not using Promise.all() is just a personal choice. I'm trying to find a syntax that strikes a balance between brevity and clarity. If your JavaScript engine supports destructuring, you might like to pluck values right out of a Promise.all() resolution:
var [ friends, enemies ] = yield(
	Promise.all([
		getFriends( user.id ),
		getEnemies( user.id )
	])
);
While this code is certainly more concise, I feel like it requires more cognitive load. I tend to like having things a little more spelled-out thanks to my unfrozen caveman brain. But, again, just my personal preference.
Anyway, bringing this whole demo together, here's the parallel data access inside the context of an asynchronous generator-based workflow:
// I get the relational data for the user with the given Id.
function* getRelationalDataGenerator( id ) {

	// Before we do anything, we have to get the user and ensure existence.
	var user = yield( getUser( id ) );

	// Once we have the user, we can start to collect some of the subsequent data in
	// parallel (to take advantage of JavaScript's non-blocking nature). To do this, all
	// we have to do is initiate several asynchronous data-requests outside the context
	// of a "yield" statement. These will naturally run in parallel.
	var thread = {
		friends: getFriends( user.id ),
		enemies: getEnemies( user.id )
	};

	// Once we have the parallel data requests running in ... parallel, all we have to
	// do is yield the values.
	var friends = yield( thread.friends );
	var enemies = yield( thread.enemies );

	return({
		user,
		friends,
		enemies
	});

}

// Invoke the generator as a promise-based workflow.
createPromiseWorkflow( getRelationalDataGenerator )
	.call( null, 4 )
	.then(
		function handleResolve( data ) {

			console.log( "Resolve:" );
			console.log( data );

		},
		function handleReject( error ) {

			console.log( "Reject:" );
			console.log( error );

		}
	)
;

// ----------------------------------------------------------------------------------- //
// ----------------------------------------------------------------------------------- //

// I get the user with the given id.
function getUser( id ) {

	var promise = Promise.resolve({
		id: id,
		name: "Sarah"
	});

	return( promise );

}

// I get the friends for the given user.
function getFriends( userId ) {

	var promise = Promise.resolve([
		{
			id: 201,
			name: "Joanna"
		}
	]);

	return( promise );

}

// I get the enemies for the given user.
function getEnemies( userId ) {

	var promise = Promise.resolve([
		{
			id: 301,
			name: "Matt"
		}
	]);

	return( promise );

}

// ----------------------------------------------------------------------------------- //
// ----------------------------------------------------------------------------------- //

// On its own, a Generator Function produces a generator object, which can be stepped
// through as an iterator; it doesn't have any implicit promise functionality.
// However, if a generator happens to yield promises during iteration, we can wrap
// that generator in a proxy and let the proxy pipe yielded values back into the next
// iteration of the generator. In this manner, the proxy can manage an internal
// promise chain that ultimately manifests as a single promise returned by the proxy.
function createPromiseWorkflow( generatorFunction ) {

	// Return the proxy that is now lexically-bound to the generator function.
	return( iterationProxy );

	// I proxy the generator and "reduce" its iteration values down to a single value,
	// represented by a promise. Returns a promise.
	function iterationProxy() {

		// When we call the generator function, the body of the generator is NOT
		// executed. Instead, an iterator is returned that can iterate over the
		// segments of the generator body, delineated by yield statements.
		var iterator = generatorFunction.apply( this, arguments );

		// function* () {
		//     var a = yield( getA() ); // (1)
		//     var b = yield( getB() ); // (2)
		//     return( [ a, b ] );      // (3)
		// }

		// When we initiate the iteration, we need to catch any errors that may occur
		// before the first "yield". Such an error will short-circuit the process and
		// result in a rejected promise.
		try {

			// When we call .next() here, we are kicking off the iteration of the
			// generator produced by our generator function. The function will start
			// executing and run until it hits the first "yield" statement (1), which
			// will return, as its result, the value supplied to the "yield" statement.
			// The .next() result will look like this:
			// --
			// {
			//     done: false,
			//     value: getA() // Passed to "yield"; may or may not be a Promise.
			// }
			// --
			// We then pipe this result back into the next iteration of the generator.
			return( pipeResultBackIntoGenerator( iterator.next() ) );

		} catch ( error ) {

			return( Promise.reject( error ) );

		}

		// I take the given iterator result, extract the value, and pipe it back into
		// the next iteration. Returns a promise.
		// --
		// NOTE: This function calls itself recursively, building up a promise-chain
		// that represents each generator iteration step.
		function pipeResultBackIntoGenerator( iteratorResult ) {

			if ( iteratorResult.done ) {

				// If the generator is done iterating through its function body, we can
				// return one final promise of the value that was returned from the
				// generator function (3). The iteratorResult would look like this:
				// --
				// {
				//     done: true,
				//     value: [ a, b ]
				// }
				// --
				// So, our return() statement here really is doing this:
				// --
				// return( Promise.resolve( [ a, b ] ) ); // (3)
				return( Promise.resolve( iteratorResult.value ) );

			}

			// If the generator is NOT DONE iterating through its function body, we need
			// to bridge the gap between the yields. We can do this by turning each step
			// into a promise that can build on itself recursively.
			var intermediaryPromise = Promise
				// Normalize the value returned by the iterator in order to ensure that
				// it's a promise (so that we know it is "thenable").
				.resolve( iteratorResult.value )
				.then(
					function handleResolve( value ) {

						// Once the promise has returned with a value, we need to
						// pipe that value back into the generator function, which is
						// currently paused on a "yield" statement. When we call
						// .next( value ) here, we are replacing the currently-paused
						// "yield" with the given "value", and resuming the iteration.
						// Essentially, this pre-yielded statement:
						// --
						// var a = yield( getA() ); // (1)
						// --
						// ... becomes this after we call .next( value ):
						// --
						// var a = value; // (1)
						// --
						// At this point, the generator function continues its execution
						// until the next yield; or until it hits a return (implicit
						// or explicit).
						return( pipeResultBackIntoGenerator( iterator.next( value ) ) );

						// CAUTION: If iterator.next() throws an error that is not
						// handled by the generator, it will cause an exception inside
						// this resolution handler, which will cause the promise to be
						// rejected.

					},
					function handleReject( reason ) {

						// If the promise value from the previous step results in a
						// rejection, we need to pipe that rejection back into the
						// generator where the generator may or may not be able to handle
						// it gracefully. When we call iterator.throw(), we resume the
						// generator function with an error. If the generator function
						// doesn't catch this error, it will bubble up right here and
						// cause an error inside of the handleReject() function (which
						// will lead to a rejected promise). However, if the generator
						// function catches the error and returns a value, that value
						// will be wrapped in an iterator result and piped back into the
						// generator.
						return( pipeResultBackIntoGenerator( iterator.throw( reason ) ) );

					}
				)
			;

			return( intermediaryPromise );

		}

	}

}
When we run this code in the terminal, we get the following output:
Resolve:
{
user: { id: 4, name: 'Sarah' },
friends: [ { id: 201, name: 'Joanna' } ],
enemies: [ { id: 301, name: 'Matt' } ]
}
As you can see, even though the Friends and Enemies collections were gathered in parallel, the promises were all "re-joined" to the generator workflow through the subsequent yield operations.
When you're used to dealing with Promises, it can be very enticing to see the Promise boilerplate hidden behind a series of yield operations, inside an ES6 generator, managed by an asynchronous task runner. But, it's also easy to lose sight of the fact that Promises give us complete control over the order of operations. Luckily, we don't have to sacrifice one for the other; we can still run data access in parallel by initiating it outside the context of a yield operation.
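To convince yourself that the waiting really overlaps, here's a small timing sketch (the delay() helper and the 50ms figures are made up for illustration); the two requests complete in roughly max( 50, 50 ) = 50ms rather than 50 + 50 = 100ms:

```javascript
// I resolve with the given value after the given number of milliseconds.
function delay( ms, value ) {

	return( new Promise(
		function( resolve ) {

			setTimeout( resolve, ms, value );

		}
	) );

}

var startedAt = Date.now();

// Both requests are initiated immediately, so they wait side-by-side.
var thread = {
	friends: delay( 50, [ "Joanna" ] ),
	enemies: delay( 50, [ "Matt" ] )
};

// Consuming them sequentially doesn't add any waiting - each one is
// already running by the time we subscribe to it.
thread.friends
	.then(
		function( friends ) {

			return( thread.enemies );

		}
	)
	.then(
		function( enemies ) {

			console.log( "Resolved in ~" + ( Date.now() - startedAt ) + "ms" );

		}
	)
;
```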
Reader Comments
@All,
One of my co-workers, Scott Rippey, pointed-out something very astute about the way I'm gathering the data. It may allow errors to fail silently. Consider the code:
var thread = {
	friends: getFriends( user.id ),
	enemies: getEnemies( user.id )
};
Since the next yield() is on the "friends" collection, any error in the friends query should lead to a rejection, which should in turn lead to an Error being thrown.
However, if *both* the friends and enemies queries had errors, the *second error* would not get caught by the friends-yield. As such, it might fail silently.
Using Promise.all([ .... ]) would serve to catch both errors. That said, I would have to do some testing to see what actually happens when both error.
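As a quick sketch of that behavior (the error messages here are made up): when both queries reject, Promise.all() subscribes to both promises, so neither rejection goes unobserved - the aggregate simply rejects with whichever error arrives first:

```javascript
var friendsPromise = Promise.reject( new Error( "Friends query failed." ) );
var enemiesPromise = Promise.reject( new Error( "Enemies query failed." ) );

// Because Promise.all() attaches a handler to BOTH promises, the second
// rejection is observed (and discarded) rather than left dangling the way
// it would be if the generator died before yielding it.
Promise.all( [ friendsPromise, enemiesPromise ] ).catch(
	function handleReject( error ) {

		console.log( "Caught:", error.message );

	}
);
```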
@All,
THAT SAID, a Promise.all() approach would still only return the first error that occurred. So, even if we had something like:
Promise.all([ getFriends(), getEnemies() ]);
... only the getFriends() error would be reported (assuming the enemies error took longer to show).
Of course, some of this just might be the risk you take when running a number of things in parallel.
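For surfacing every failure rather than just the first, one option is a small settle-style helper - the settleAll() name here is made up (newer runtimes ship Promise.allSettled() for the same job) - that waits for each promise and records its outcome instead of short-circuiting:

```javascript
// I wait for every promise to settle (made-up helper name), resolving with
// an array of outcome objects instead of rejecting on the first failure.
function settleAll( promises ) {

	return( Promise.all(
		promises.map(
			function( promise ) {

				return( Promise.resolve( promise ).then(
					function( value ) {

						return( { status: "fulfilled", value: value } );

					},
					function( reason ) {

						return( { status: "rejected", reason: reason } );

					}
				) );

			}
		)
	) );

}

// Both errors can now be reported, not just the first one.
settleAll( [
	Promise.reject( new Error( "Friends query failed." ) ),
	Promise.reject( new Error( "Enemies query failed." ) )
] ).then(
	function( outcomes ) {

		outcomes.forEach(
			function( outcome ) {

				console.log( outcome.status + ":", outcome.reason.message );

			}
		);

	}
);
```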
Hey Ben,
The poem from your previous post is just hilarious :)
Here is what Douglas Crockford tells about generators in general: https://youtu.be/PSGEjv3Tqo0?t=5m48s - I cannot agree with him more.
At first I was fascinated by controlling async code with co and generators, but I pretty quickly realized it is very limited. It is only suitable for very basic async computations that produce single values. There is no easy and natural way either to compose computations or to deal with errors - especially retrying.
What is offered by Rx(JS) is way, way superior - plus, it is a cross-language paradigm.
@All,
I tried to do some noodling on the problem of error handling for parallel promises:
www.bennadel.com/blog/3125-on-the-difficult-problem-of-logging-errors-in-parallel-promises-in-javascript.htm
It seems to be a non-trivial problem. I was able to figure *something* out with the Q-library. But, definitely am open to better approaches. It's not just a "generator" problem -- it's a "promise" problem.
@Artur,
Thanks for the link, I'll definitely check it out. I'm always in the mood for some Crockford goodness.
I just recently started getting into RxJS for Angular 2. It's cool stuff, though not the easiest thing to pick up. Then again, neither were promises :D I'm still not exactly sure how / when is the right time to use event streams. Definitely cool stuff, though.
Just want to mention that all references to parallel should likely be changed to concurrently as parallel suggests true multi-threading or multiprocess, while JavaScript is single threaded. Only a small point but worth being aware of.
@Mark,
Yet concurrently means "at the same time", which is NOT happening; and parallel means "going in the same direction", which is.
@Nick,
In software it's more accurate to interpret "parallel" as being actually "at the same time" (e.g. utilising multiple processes) and "concurrency" as being a "context switch" mechanism.
Concurrency is what's happening in JS so that's how it should be described in this article. Otherwise it can become misleading.
@Mark, @Nick
Since I am not formally versed in the "computer sciencey" aspects of the terminology, I doubt I would be able to consistently use the correct term no matter which direction I use :) Like interchanging "function" and "method" or "object" and "class". Yes, they all mean different things - and if I was smarter, I would get it right; but, I'm not quite there yet.