Using The "X-Amzn-Trace-Id" Header For Request Tracing Through Amazon's Load Balancers
At InVision, David Bainbridge and I have been working hard to figure out why some users are getting randomly logged out of one of our client-side applications. Part of what makes this issue so challenging to debug is that many services touch the requests coming out of this application. And, even though we are using request tracing headers in our distributed system, we are struggling to connect the dots as those requests pass through Amazon's load balancers. Yesterday, however, David discovered that Amazon's load balancers will record (and modify) the HTTP header, X-Amzn-Trace-Id, within their request logs. I think this may really help us!
As requests pass through the InVision infrastructure, they are supposed to include (and propagate) the following request tracing HTTP headers:
- Request-ID: a unique ID for every request.
- Request-Source: the originator of the request.
- Calling-Service: the previous hop in the request's network chain.
CAVEAT: I say "supposed to" because this was an evolving architectural decision. As such, not all services - especially older services - uphold these tracing requirements.
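The propagation rules above can be sketched as a small helper that a service would call before making its own outbound requests. This is a minimal sketch, not InVision's actual implementation: `nextHopHeaders` is a hypothetical name, it assumes inbound header keys are lower-cased (as Node's `http` module presents them), and it borrows a `uid<timestamp>` ID scheme that a real system would likely replace with something collision-resistant, such as a UUID.

```javascript
// Hypothetical helper: given the tracing headers a service received (keys
// lower-cased), returns the tracing headers it should send on its own
// outbound requests.
function nextHopHeaders(inboundHeaders, serviceName) {
	return {
		// Reuse the inbound Request-ID so that every hop shares one ID; only
		// mint a new one (naive "uid<timestamp>" scheme) at the edge.
		"Request-ID": inboundHeaders["request-id"] || `uid${ Date.now() }`,
		// The originator never changes once set; if absent, this service is it.
		"Request-Source": inboundHeaders["request-source"] || serviceName,
		// From the next service's point of view, the previous hop is this service.
		"Calling-Service": serviceName
	};
}

// An API service receiving a request that originated in the SPA:
nextHopHeaders(
	{ "request-id": "uid123", "request-source": "MySPA", "calling-service": "MySPA" },
	"api"
);
// returns { "Request-ID": "uid123", "Request-Source": "MySPA", "Calling-Service": "api" }
```

Note that only Calling-Service is rewritten at each hop; the other two values are carried through unchanged, which is what makes them useful for stitching logs together.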
Using HTTP headers for request tracing is great when you are logging information inside one of your own application / domain services. But, Amazon doesn't know anything about these architectural decisions. As such, none of the Amazon ALB (Application Load Balancer) logs that we aggregate contain the Request-ID, Request-Source, or Calling-Service headers.
Which brings us back to the HTTP header, X-Amzn-Trace-Id. According to Amazon's documentation, as requests come into the load balancer, Amazon will look for the X-Amzn-Trace-Id header. If it doesn't exist, Amazon will inject the tracing ID (and log it). And, if the HTTP header does exist, Amazon will modify it with the tracing ID (and log it).
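Because the header value is a semicolon-delimited list of `key=value` fields, reading and writing it takes only a few lines. A minimal sketch; `parseTraceHeader` and `formatTraceHeader` are hypothetical names, and the `Root=...` value in the example is a made-up placeholder in the general shape of an AWS trace ID, not real load balancer output (the exact layout Amazon produces may differ).

```javascript
// Hypothetical helpers for an X-Amzn-Trace-Id value: "Key1=value1;Key2=value2;..."
function parseTraceHeader(value) {
	const fields = new Map(); // a Map preserves field order
	for (const part of (value || "").split(";")) {
		const trimmed = part.trim();
		if (!trimmed) continue; // tolerate empty segments, e.g. a trailing ";"
		// Split on the first "=" only, so that values may themselves contain "=".
		const eq = trimmed.indexOf("=");
		const key = eq === -1 ? trimmed : trimmed.slice(0, eq);
		const val = eq === -1 ? "" : trimmed.slice(eq + 1);
		fields.set(key, val);
	}
	return fields;
}

function formatTraceHeader(fields) {
	return Array.from(fields, ([key, val]) => `${ key }=${ val }`).join(";");
}

// Example value carrying both a (placeholder) Root field and our own fields:
const fields = parseTraceHeader(
	"Root=1-00000000-000000000000000000000000;Request-ID=uid123;Calling-Service=MySPA"
);
fields.get("Request-ID"); // "uid123"
fields.get("Calling-Service"); // "MySPA"
```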
To test this, I popped into one of our AngularJS applications, grabbed the $http service out of the root injector, and made an AJAX (Asynchronous JavaScript and XML) request to a non-existent end-point. The request replicated our tracing headers in the X-Amzn-Trace-Id header:
// Running this in my Chrome dev tools:
var requestID = `uid${ Date.now() }`;
var callingService = "MySPA";

angular.element( document )
	.injector()
	.get( "$http" )
	.get(
		"/this/endpoint/doesnt/exist",
		{
			headers: {
				"Request-ID": requestID,
				"Calling-Service": callingService,
				// As requests pass through Amazon load balancers, the "X-Amzn-Trace-Id"
				// HTTP header is either injected (if it doesn't exist) or it is modified
				// (where the existing value is treated as a semicolon-delimited list).
				"X-Amzn-Trace-Id": `Request-ID=${ requestID };Calling-Service=${ callingService }`
			}
		}
	)
;
Notice that my outbound X-Amzn-Trace-Id header duplicates both the Request-ID and the Calling-Service values, formatted as a semicolon-delimited list. This results in a 404 Not Found response; but, when we look in Loggly (our log aggregation service), we see the request passing through our load balancer.
As you can see, even though this Amazon load balancer log item doesn't show our HTTP tracing headers, it does show the X-Amzn-Trace-Id header, which now contains a copy of our HTTP tracing headers!
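To reproduce the same experiment without AngularJS, the headers can be sent with the browser's standard `fetch()` API. A minimal sketch; `buildTracingHeaders` and `probe` are hypothetical helpers, and the endpoint is the same deliberately non-existent path, so a 404 is the expected response.

```javascript
// Hypothetical helper: sends our own tracing headers and mirrors them into
// X-Amzn-Trace-Id so that they also show up in the load balancer's logs.
function buildTracingHeaders(requestID, callingService) {
	return {
		"Request-ID": requestID,
		"Calling-Service": callingService,
		// Semicolon-delimited "key=value" fields, same format as the AngularJS test.
		"X-Amzn-Trace-Id": `Request-ID=${ requestID };Calling-Service=${ callingService }`
	};
}

// Meant for the browser dev tools on the application's own origin; a 404 is
// expected, and the interesting part is what lands in the load balancer logs.
async function probe() {
	const headers = buildTracingHeaders(`uid${ Date.now() }`, "MySPA");
	const response = await fetch("/this/endpoint/doesnt/exist", { headers });
	console.log(response.status, headers["Request-ID"]);
}
```

Calling `probe()` from the console logs the status alongside the generated Request-ID, which is the value to search for in the load balancer logs.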
I don't know if this tracing will help us get to the bottom of our problem. But, at the very least, it will help us correlate client-side requests with the traffic that goes through our load balancers. And, after weeks of debugging this problem, every little bit of information is more than welcome!
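That correlation can be as simple as pulling our Request-ID back out of the recorded X-Amzn-Trace-Id value and indexing the load balancer entries by it. A minimal sketch, assuming the log entries have been exported as objects with a hypothetical `traceId` property holding that header value; the real field names will depend on how the ALB logs are parsed in the log aggregation service.

```javascript
// Extracts our Request-ID from an X-Amzn-Trace-Id value, or null if absent.
function extractRequestID(traceId) {
	// Match "Request-ID=<value>" at the start of the value or right after a ";".
	const match = /(?:^|;)\s*Request-ID=([^;]*)/.exec(traceId || "");
	return match ? match[1] : null;
}

// Indexes load balancer entries (hypothetical shape: { status, traceId, ... })
// by Request-ID, so that a client-side Request-ID can be looked up directly.
function indexByRequestID(entries) {
	const index = new Map();
	for (const entry of entries) {
		const requestID = extractRequestID(entry.traceId);
		if (requestID === null) continue; // request didn't carry our tracing fields
		if (!index.has(requestID)) index.set(requestID, []);
		index.get(requestID).push(entry);
	}
	return index;
}
```

Grouping into arrays (rather than keeping one entry per ID) matters here, since a retried or redirected request can pass through the load balancer more than once with the same Request-ID.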
Reader Comments
@All,
Continuing on in this investigation, now that I have the Request-ID in the Amazon ALB logs, I want to get it into the nginx access logs as well. I can do this with a custom formatter: www.bennadel.com/blog/4055-including-tracing-headers-in-nginx-1-18-0-access-logs-using-custom-formatting.htm
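nginx exposes each request header as an `$http_<name>` variable, where the name is the header lower-cased with dashes replaced by underscores; so Request-ID becomes `$http_request_id`. As a hedged illustration of that mapping (the helper names, the format name `tracing`, and the chosen fields are assumptions, not the configuration from the linked post), a `log_format` directive can be generated in JavaScript:

```javascript
// Hypothetical helper: converts an HTTP header name into the nginx variable
// that holds it ($http_ + lower-cased name, dashes replaced by underscores).
function nginxHeaderVariable(headerName) {
	return "$http_" + headerName.toLowerCase().replace(/-/g, "_");
}

// Builds a log_format directive that appends the given headers, each quoted,
// after a few standard fields. The nginx "$..." variables are plain text here.
function buildLogFormat(formatName, headerNames) {
	const tracingFields = headerNames
		.map((name) => `"${ nginxHeaderVariable(name) }"`)
		.join(" ");
	return `log_format ${ formatName } '$remote_addr [$time_local] "$request" $status ${ tracingFields }';`;
}

buildLogFormat("tracing", ["Request-ID", "Calling-Service", "X-Amzn-Trace-Id"]);
// log_format tracing '$remote_addr [$time_local] "$request" $status "$http_request_id" "$http_calling_service" "$http_x_amzn_trace_id"';
```

The generated line would go in nginx's `http` block and be referenced by name from an `access_log` directive, e.g. `access_log /var/log/nginx/access.log tracing;`.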
In our stack, each ColdFusion container runs both a Lucee CFML server and an nginx server that acts as a reverse proxy. The ColdFusion code already has request tracing (added for this investigation). So now, I want to see if the requests are dying in the nginx → ColdFusion network hop.
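One way to check for requests dying in that hop is to compare the Request-IDs each layer has logged: an ID that appears in the nginx access log but never in the ColdFusion logs either never reached ColdFusion or was never logged there. A minimal sketch, assuming both log sources have already been reduced to plain arrays of Request-ID strings (how those are exported from the logging service is out of scope, and the IDs below are made up); requests from older services that don't send a Request-ID won't show up at all.

```javascript
// Returns the Request-IDs seen by the upstream layer (e.g. nginx) that never
// appear in the downstream layer (e.g. ColdFusion).
function missingDownstream(upstreamIDs, downstreamIDs) {
	const seenDownstream = new Set(downstreamIDs);
	// A Set on the upstream side drops duplicates (retries, repeated log lines).
	return [...new Set(upstreamIDs)].filter((id) => !seenDownstream.has(id));
}

const nginxIDs = ["uid1", "uid2", "uid3", "uid2"];
const coldFusionIDs = ["uid1", "uid3"];
missingDownstream(nginxIDs, coldFusionIDs); // ["uid2"]
```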