Retries
Linkerd can be configured to automatically retry requests when it receives a failed response instead of immediately returning that failure to the client. This is valuable tool for improving success rate in the face of transient failures.
Retries are a client-side behavior, and are therefore performed by the outbound side of the Linkerd proxy.1 If retries are configured on an HTTPRoute or GRPCRoute with multiple backends, each retry of a request can potentially get sent to a different backend. If a request has a body larger than 64KiB then it will not be retried.
Configuring Retries
Retries are configured by a set of annotations which can be set on a Kubernetes Service resource or on a HTTPRoute or GRPCRoute which has a Service as a parent. Client proxies will then retry failed requests to that Service or route. If any retry configuration annotations are present on a route resource, they override all retry configuration annotations on the parent Service.
retry.linkerd.io/http
: A comma separated list of HTTP response codes which should be retried. Each element of the list may bexxx
to retry a single response code (for example,"504"
– remember, annotation values must be strings!);xxx-yyy
to retry a range of response codes (for example,500-504
);gateway-error
to retry response codes 502-504; or5xx
to retry all 5XX response codes. This annotation is not valid on GRPCRoute resources.
retry.linkerd.io/grpc
: A comma seperated list of gRPC status codes which should be retried. Each element of the list may becancelled
deadline-exceeded
internal
resource-exhausted
unavailable
This annotation is not valid on HTTPRoute resources.
retry.linkerd.io/limit
: The maximum number of times a request can be retried. If unspecified, the default is1
.retry.linkerd.io/timeout
: A retry timeout after which a request is cancelled and retried (if the retry limit has not yet been reached). If unspecified, no retry timeout is applied. Units must be specified in this value e.g.5s
or200ms
.
Examples
kind: HTTPRoute
apiVersion: gateway.networking.k8s.io/v1beta1
metadata:
name: schlep-default
namespace: schlep
annotations:
retry.linkerd.io/http: 5xx
retry.linkerd.io/limit: "2"
retry.linkerd.io/timeout: 300ms
spec:
parentRefs:
- name: schlep
kind: Service
group: core
port: 80
rules:
- matches:
- path:
type: PathPrefix
value: "/"
kind: GRPCRoute
apiVersion: gateway.networking.k8s.io/v1alpha2
metadata:
name: schlep-default
namespace: schlep
annotations:
retry.linkerd.io/grpc: internal
retry.linkerd.io/limit: "2"
retry.linkerd.io/timeout: 400ms
spec:
parentRefs:
- name: schlep
kind: Service
group: core
port: 8080
rules:
- matches:
- method:
type: Exact
service: schlep.Schlep
method: Get
-
The part of the proxy which handles connections from within the pod to the rest of the cluster. ↩︎