
Handling Rate-Limited Endpoints

By default, when a backend implements rate limiting and returns HTTP 429 or gRPC RESOURCE_EXHAUSTED, the proxy treats these as successful responses for load balancing purposes. Because such responses are typically very fast, Linkerd’s EWMA load balancer may actually send more traffic to the rate-limited endpoints. This creates a feedback loop in which clients see a high rate of 429 or RESOURCE_EXHAUSTED responses.
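The feedback loop can be reproduced with a toy model. The sketch below is illustrative only and is not the proxy's implementation. It assumes two hypothetical endpoints: a healthy one that answers in 100ms and a rate-limited one that answers every request with a 429 in 10ms. Each endpoint keeps a latency EWMA with an assumed smoothing factor of 0.3, and the balancer always picks the lower score. The proxy's actual balancer is more involved; this keeps only the latency average.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Endpoint:
    name: str
    latency_s: float      # how long this endpoint takes to respond
    rate_limited: bool    # True if every response is an HTTP 429
    ewma: float = 0.0     # moving average of observed latency
    hits: int = 0         # requests routed to this endpoint


def observe(ep: Endpoint, latency_s: float, alpha: float = 0.3) -> None:
    # Plain EWMA: only latency matters; the response status is ignored.
    ep.ewma = alpha * latency_s + (1 - alpha) * ep.ewma


def pick(endpoints: List[Endpoint]) -> Endpoint:
    # With two endpoints, power-of-two-choices always compares both,
    # so the endpoint with the lower score wins.
    return min(endpoints, key=lambda e: e.ewma)


def simulate(n_requests: int = 1000) -> Tuple[Endpoint, Endpoint, int]:
    healthy = Endpoint("healthy", latency_s=0.100, rate_limited=False)
    limited = Endpoint("limited", latency_s=0.010, rate_limited=True)
    rejected = 0
    for _ in range(n_requests):
        ep = pick([healthy, limited])
        ep.hits += 1
        if ep.rate_limited:
            rejected += 1  # the client receives a 429
        observe(ep, ep.latency_s)
    return healthy, limited, rejected


healthy, limited, rejected = simulate()
print(healthy.hits, limited.hits, rejected)  # 1 999 999
```

After one request to each endpoint, the fast 429s keep the rate-limited endpoint's score below the healthy one's. It then receives every remaining request.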

Linkerd has two experimental features, the Load Biaser and the Unified Circuit Breaker, that help route traffic away from endpoints that are currently rate limited.

Linkerd Production Tip

This page contains best-effort instructions by the open source community. Production users with mission-critical applications should familiarize themselves with Linkerd production resources and/or connect with a commercial Linkerd provider.

Warning

Rate Limit Aware Load Balancing is an experimental, opt-in feature.

Load Biaser

Linkerd can be configured to use a more sophisticated version of the EWMA load balancing algorithm which takes rate-limit responses (HTTP 429 or gRPC RESOURCE_EXHAUSTED) into account. This algorithm is called the Load Biaser because it biases traffic away from endpoints which have returned rate-limit responses recently.

The Load Biaser works exactly like EWMA, except that when it receives a rate-limited response, it substitutes a fixed penalty value for the response’s actual latency (unless the actual latency is higher). For example, if the penalty is configured to be 5s and the Load Biaser receives a 429 response after 10ms, it treats the latency of that response as 5s for load balancing purposes.

In this way, the load balancer will not favor endpoints which return rate-limited responses quickly.
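The penalty substitution can be added to the same kind of toy model. This is a minimal sketch that assumes the documented 5s default penalty, an EWMA smoothing factor of 0.3, and two hypothetical endpoints. It treats gRPC status code 8 (RESOURCE_EXHAUSTED) the same as HTTP 429. It is not the proxy's code.

```python
from dataclasses import dataclass
from typing import Optional

GRPC_RESOURCE_EXHAUSTED = 8  # gRPC status code for RESOURCE_EXHAUSTED


@dataclass
class Endpoint:
    name: str
    latency_s: float
    http_status: int
    ewma: float = 0.0
    hits: int = 0


def is_rate_limited(http_status: int, grpc_status: Optional[int] = None) -> bool:
    return http_status == 429 or grpc_status == GRPC_RESOURCE_EXHAUSTED


def biased_latency(latency_s: float, rate_limited: bool, penalty_s: float = 5.0) -> float:
    # Rate-limited responses are recorded as at least `penalty_s` seconds,
    # so a fast 429 can no longer make an endpoint look attractive.
    return max(latency_s, penalty_s) if rate_limited else latency_s


def observe(ep: Endpoint, alpha: float = 0.3) -> None:
    sample = biased_latency(ep.latency_s, is_rate_limited(ep.http_status))
    ep.ewma = alpha * sample + (1 - alpha) * ep.ewma


healthy = Endpoint("healthy", latency_s=0.100, http_status=200)
limited = Endpoint("limited", latency_s=0.010, http_status=429)
for _ in range(1000):
    ep = min([healthy, limited], key=lambda e: e.ewma)
    ep.hits += 1
    observe(ep)

print(healthy.hits, limited.hits)   # 999 1
print(biased_latency(0.010, True))  # 5.0
```

This toy model has no time-based decay, so after its first penalized response the rate-limited endpoint is never chosen again. A production balancer would normally let stale scores age out so that the endpoint is eventually probed again.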

The penalty value can be further refined if the server sets the Retry-After HTTP response header or the grpc-retry-pushback-ms gRPC trailer. If one of these values is present and is higher than the configured penalty, it is used in place of the penalty. This lets servers exert more pushback than the configured penalty provides.
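The rule above can be sketched as a small function. Several details are assumptions: the precedence between the two hints when both are present, the lower-cased header names, and the choice to handle only the delta-seconds form of Retry-After (HTTP-date values are ignored). Negative or unparseable values are treated as absent.

```python
from typing import Mapping, Optional


def pushback_s(headers: Mapping[str, str]) -> Optional[float]:
    """Server-requested pushback in seconds, or None if absent or unparseable.

    Assumes header/trailer names are already lower-cased, as in HTTP/2.
    """
    ms = headers.get("grpc-retry-pushback-ms")
    if ms is not None and ms.strip().isdecimal():
        return int(ms) / 1000.0
    retry_after = headers.get("retry-after")
    if retry_after is not None and retry_after.strip().isdecimal():
        # Only the delta-seconds form is handled; HTTP-date values are ignored.
        return float(int(retry_after))
    return None


def penalty_for(headers: Mapping[str, str], penalty_s: float = 5.0) -> float:
    # Use the server's hint only when it asks for more than the configured penalty.
    hinted = pushback_s(headers)
    return hinted if hinted is not None and hinted > penalty_s else penalty_s


print(penalty_for({"retry-after": "30"}))                # 30.0
print(penalty_for({"retry-after": "2"}))                 # 5.0
print(penalty_for({"grpc-retry-pushback-ms": "12000"}))  # 12.0
```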

To enable Linkerd to use the Load Biaser for a Service, set the following annotation on the Service resource:

| Annotation | Type | Default | Notes |
|---|---|---|---|
| balancer.alpha.linkerd.io/penalize-failures | bool | false | Enables the Load Biaser for this Service |

The Load Biaser can be further configured with these annotations on the Service resource:

| Annotation | Type | Default | Notes |
|---|---|---|---|
| balancer.alpha.linkerd.io/load-biaser-penalty | duration | 5s | The latency value to inject for rate-limited responses and failures |
| balancer.alpha.linkerd.io/load-biaser-max-retry-after | duration | 300s | The maximum allowed value of a Retry-After header |
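As a sketch of how the Load Biaser annotations could be applied, the script below builds a Service manifest as JSON. Only the annotation keys come from the tables above. The Service name, namespace, port, `app` selector label, and the tuned values 2s and 60s are hypothetical. Kubernetes annotation values must be strings, so the flag is written as "true".

```python
import json
from typing import Optional


def load_biased_service(name: str, namespace: str, port: int,
                        penalty: Optional[str] = None,
                        max_retry_after: Optional[str] = None) -> dict:
    # Annotation values must be strings, hence "true" rather than True.
    annotations = {"balancer.alpha.linkerd.io/penalize-failures": "true"}
    if penalty is not None:
        annotations["balancer.alpha.linkerd.io/load-biaser-penalty"] = penalty
    if max_retry_after is not None:
        annotations["balancer.alpha.linkerd.io/load-biaser-max-retry-after"] = max_retry_after
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace, "annotations": annotations},
        "spec": {
            "selector": {"app": name},  # assumed pod label
            "ports": [{"port": port, "targetPort": port}],
        },
    }


if __name__ == "__main__":
    svc = load_biased_service("payments", "shop", 8080, penalty="2s", max_retry_after="60s")
    print(json.dumps(svc, indent=2))
```

Because kubectl accepts JSON manifests, the output can be piped to `kubectl apply -f -`. For a Service that already exists, `kubectl annotate service payments -n shop balancer.alpha.linkerd.io/penalize-failures=true` sets the same flag.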

Unified Circuit Breaker

Linkerd can be configured to use a more sophisticated version of consecutive-failures failure accrual, called Unified failure accrual.

Unified failure accrual can be configured with a success rate threshold. If the percentage of successful responses within a fixed time window drops below this threshold, the circuit breaker trips, temporarily cutting off traffic to the endpoint and giving it time to recover. Critically, rate-limited responses count as failures in this success rate calculation.

Unified failure accrual also trips if it encounters a configured number of consecutive failures, just like consecutive-failures accrual.
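The two trip conditions can be modeled roughly as follows. This is a hypothetical sketch, not the proxy's implementation. The defaults mirror the documented ones (0.8 threshold, 10s window, 5 minimum requests, 7 consecutive failures). It assumes a sliding window and assumes that 5xx responses count as failures alongside 429s. Recovery after a trip is out of scope here.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Tuple


@dataclass
class UnifiedAccrual:
    success_rate_threshold: float = 0.8
    window_s: float = 10.0
    min_requests: int = 5
    max_consecutive_failures: int = 7
    _samples: Deque[Tuple[float, bool]] = field(default_factory=deque)
    _consecutive_failures: int = 0

    def record(self, now: float, status: int) -> bool:
        """Record one response at time `now`; return True if the breaker should trip."""
        # Assumed classification: 5xx and 429 (rate limited) are both failures.
        success = status < 500 and status != 429
        self._samples.append((now, success))
        # Keep only samples from the last `window_s` seconds.
        while self._samples and self._samples[0][0] <= now - self.window_s:
            self._samples.popleft()
        self._consecutive_failures = 0 if success else self._consecutive_failures + 1

        # Condition 1: too many failures in a row.
        if self._consecutive_failures >= self.max_consecutive_failures:
            return True
        # Condition 2: success rate below threshold, given enough requests.
        n = len(self._samples)
        if n >= self.min_requests:
            rate = sum(ok for _, ok in self._samples) / n
            if rate < self.success_rate_threshold:
                return True
        return False


breaker = UnifiedAccrual()
for t, status in enumerate([200, 200, 200, 429, 429]):
    print(t, status, breaker.record(float(t), status))
```

In this run, two 429s after three successes bring the window's success rate to 0.6. That is below 0.8, so the fifth response trips the breaker even though only two failures occurred in a row.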

To enable the Unified failure accrual circuit breaker on a Service, set the following annotation to "unified" on the Service resource:

| Annotation | Type | Default | Notes |
|---|---|---|---|
| balancer.linkerd.io/failure-accrual | string | None | The failure-accrual mode. Set to unified to enable Unified failure accrual |
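A minimal sketch of a Service manifest that turns on Unified failure accrual might look like the script below. The annotation keys are taken from this page. The Service name, namespace, port, `app` selector label, and the tuned threshold of 0.9 are hypothetical, and all annotation values are passed as strings.

```python
import json
from typing import Dict, Optional


def unified_accrual_service(name: str, namespace: str, port: int,
                            tuning: Optional[Dict[str, str]] = None) -> dict:
    annotations = {"balancer.linkerd.io/failure-accrual": "unified"}
    # Optional tuning annotations; values must be strings such as "0.9" or "30s".
    annotations.update(tuning or {})
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace, "annotations": annotations},
        "spec": {
            "selector": {"app": name},  # assumed pod label
            "ports": [{"port": port, "targetPort": port}],
        },
    }


if __name__ == "__main__":
    svc = unified_accrual_service(
        "payments", "shop", 8080,
        tuning={"balancer.alpha.linkerd.io/failure-accrual-success-rate-threshold": "0.9"},
    )
    print(json.dumps(svc, indent=2))
```

The output can be piped to `kubectl apply -f -`. To change only an existing Service, run `kubectl annotate service payments -n shop balancer.linkerd.io/failure-accrual=unified`.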

Unified failure accrual can be further configured with these annotations on the Service resource:

| Annotation | Type | Default | Notes |
|---|---|---|---|
| balancer.alpha.linkerd.io/failure-accrual-success-rate-threshold | number between 0 and 1 | 0.8 | The success rate threshold at which to trip the breaker |
| balancer.alpha.linkerd.io/failure-accrual-success-rate-window | duration | 10s | The window over which the success rate is calculated |
| balancer.alpha.linkerd.io/failure-accrual-success-rate-min-requests | number | 5 | Only trip if there are at least this many requests in the window |
| balancer.linkerd.io/failure-accrual-consecutive-max-failures | number | 7 | Trip if we encounter this many consecutive failures |
| balancer.linkerd.io/failure-accrual-consecutive-min-penalty | duration | 1s | The minimum duration for which to cut off traffic |
| balancer.linkerd.io/failure-accrual-consecutive-max-penalty | duration | 1m | The maximum duration for which to cut off traffic |
| balancer.linkerd.io/failure-accrual-consecutive-jitter-ratio | number between 0.0 and 100.0 | 0.5 | The amount of randomness to inject into the backoff |
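The min-penalty, max-penalty, and jitter-ratio settings describe a jittered backoff. The function below is one plausible reading, not the proxy's exact formula. It assumes the base delay starts at the minimum penalty and doubles on each successive trip, that jitter adds a random fraction of up to jitter_ratio times the base, and that the result is clamped to the maximum penalty. The defaults mirror the table: 1s, 1m (60s), and 0.5.

```python
import random
from typing import Optional


def probe_delay_s(trips: int, min_s: float = 1.0, max_s: float = 60.0,
                  jitter_ratio: float = 0.5,
                  rng: Optional[random.Random] = None) -> float:
    """Seconds to keep an endpoint out of rotation after `trips` prior trips (0-based).

    Assumed model: the base doubles per trip starting at min_s, jitter adds up to
    jitter_ratio * base on top, and the result is clamped to max_s.
    """
    rng = rng or random.Random()
    base = min(min_s * 2.0 ** min(trips, 63), max_s)  # exponent capped to avoid overflow
    jitter = base * jitter_ratio * rng.random()
    return min(base + jitter, max_s)


rng = random.Random(42)
print([round(probe_delay_s(n, rng=rng), 2) for n in range(8)])
```

Jitter spreads out the moments at which many clients re-probe a recovering endpoint, so they do not all send traffic to it at once.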

See the reference documentation for details on failure accrual configuration.