Injecting Faults
It is easy to inject failures into applications by using the HTTPRoute resource to redirect a percentage of traffic to a specific backend. This backend is completely flexible and can return whatever responses you want - 500s, timeouts or even crazy payloads.
The books demo is a great way to show off this behavior. The overall topology looks like:
In this guide, you will split some of the requests from webapp
to books
.
Most requests will end up at the correct books
destination, however some of
them will be redirected to a faulty backend. This backend will return 500s for
every request and inject faults into the webapp
service. No code changes are
required and as this method is configuration driven, it is a process that can be
added to integration tests and CI pipelines. If you are really living the chaos
engineering lifestyle, fault injection could even be used in production.
Prerequisites
To use this guide, you’ll need a Kubernetes cluster running:
- Linkerd and Linkerd-Viz. If you haven’t installed these yet, follow the Installing Linkerd Guide.
Setup the service
First, add the books sample application to your cluster:
kubectl create ns booksapp && \
linkerd inject https://run.linkerd.io/booksapp.yml | \
kubectl -n booksapp apply -f -
As this manifest is used as a demo elsewhere, it has been configured with an error rate. To show how fault injection works, the error rate needs to be removed so that there is a reliable baseline. To increase success rate for booksapp to 100%, run:
kubectl -n booksapp patch deploy authors \
--type='json' \
-p='[{"op":"remove", "path":"/spec/template/spec/containers/0/env/2"}]'
After a little while, the stats will show 100% success rate. You can verify this by running:
linkerd viz -n booksapp stat-inbound deploy
The output will end up looking at little like:
NAME SERVER ROUTE TYPE SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99
authors [default]:4191 [default] 100.00% 0.20 0ms 1ms 1ms
authors [default]:7001 [default] 100.00% 3.00 2ms 36ms 43ms
books [default]:4191 [default] 100.00% 0.23 4ms 4ms 4ms
books [default]:7002 [default] 100.00% 3.60 2ms 2ms 2ms
traffic [default]:4191 [default] 100.00% 0.22 0ms 3ms 1ms
webapp [default]:4191 [default] 100.00% 0.72 4ms 5ms 1ms
webapp [default]:7000 [default] 100.00% 3.25 2ms 2ms 65ms
Create the faulty backend
Injecting faults into booksapp requires a service that is configured to return errors. To do this, you can start NGINX and configure it to return 500s by running:
cat <<EOF | linkerd inject - | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
name: error-injector
namespace: booksapp
data:
nginx.conf: |-
events {}
http {
server {
listen 8080;
location / {
return 500;
}
}
}
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: error-injector
namespace: booksapp
labels:
app: error-injector
spec:
selector:
matchLabels:
app: error-injector
replicas: 1
template:
metadata:
labels:
app: error-injector
spec:
containers:
- name: nginx
image: nginx:alpine
volumeMounts:
- name: nginx-config
mountPath: /etc/nginx/nginx.conf
subPath: nginx.conf
volumes:
- name: nginx-config
configMap:
name: error-injector
---
apiVersion: v1
kind: Service
metadata:
name: error-injector
namespace: booksapp
spec:
ports:
- name: service
port: 8080
selector:
app: error-injector
EOF
Inject faults
With booksapp and NGINX running, it is now time to partially split the traffic
between an existing backend, books
, and the newly created
error-injector
. This is done by adding an
HTTPRoute configuration to your cluster:
cat <<EOF | kubectl apply -f -
apiVersion: policy.linkerd.io/v1beta2
kind: HTTPRoute
metadata:
name: error-split
namespace: booksapp
spec:
parentRefs:
- name: books
kind: Service
group: core
port: 7002
rules:
- backendRefs:
- name: books
port: 7002
weight: 90
- name: error-injector
port: 8080
weight: 10
EOF
Note
Two versions of the HTTPRoute resource may be used with Linkerd:
- The upstream version provided by the Gateway API, with the
gateway.networking.k8s.io
API group - A Linkerd-specific CRD provided by Linkerd, with the
policy.linkerd.io
API group
The two HTTPRoute resource definitions are similar, but the Linkerd version implements experimental features not yet available with the upstream Gateway API resource definition. See the HTTPRoute reference documentation for details.
When Linkerd sees traffic going to the books
service, it will send 9/10
requests to the original service and 1/10 to the error injector. You can see
what this looks like by running stat-outbound
:
linkerd viz stat-outbound -n booksapp deploy/webapp
NAME SERVICE ROUTE TYPE BACKEND SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TIMEOUTS RETRIES
webapp authors:7001 [default] 98.44% 4.28 25ms 47ms 50ms 0.00% 0.00%
└────────────────────► authors:7001 98.44% 4.28 15ms 42ms 48ms 0.00%
webapp books:7002 error-split HTTPRoute 87.76% 7.22 26ms 49ms 333ms 0.00% 0.00%
├────────────────────► books:7002 100.00% 6.33 14ms 42ms 83ms 0.00%
└────────────────────► error-injector:8080 0.00% 0.88 12ms 24ms 25ms 0.00%
We can see here that 0.88 requests per second are being sent to the error injector and that the overall success rate is 87.76%.
Cleanup
To remove everything in this guide from your cluster, run:
kubectl delete ns booksapp