HTTP 503 and 504 Errors

503s and 504s show up when a Linkerd proxy is trying to make so many requests to a workload that it gets overwhelmed.

When the workload next to a proxy makes a request, the proxy adds it to an internal dispatch queue. When things are going smoothly, the request is pulled from the queue and dispatched almost immediately. If the queue gets too long, though (which can generally happen only if the called service is slow to respond), the proxy will go into load-shedding, where any new request gets an immediate 503. The proxy can only get out of load-shedding when the queue shrinks.

Failfast also plays a role here: if the proxy puts a service into failfast while there are requests in the dispatch queue, all the requests in the dispatch queue get an immediate 504 before the proxy goes into load-shedding.

To get out of failfast, some endpoints for the service have to become available.

To get out of load-shedding, the dispatch queue has to start emptying, which implies that the service has to get more capacity to process requests or that the incoming request rate has to drop.