Using the Debug Sidecar
Debugging a service mesh can be hard. When something just isn’t working, is the problem with the proxy? With the application? With the client? With the underlying network? Sometimes, nothing beats looking at raw network data.
In cases where you need network-level visibility into packets entering and
leaving your application, Linkerd provides a debug sidecar with some helpful
tooling. Similar to how proxy sidecar
injection works, you add a debug sidecar to
a pod by setting the config.linkerd.io/enable-debug-sidecar: "true"
annotation
at pod creation time. For convenience, the linkerd inject
command provides an
--enable-debug-sidecar
option that does this annotation for you.
(Note that the set of containers in a Kubernetes pod is not mutable, so simply adding this annotation to a pre-existing pod will not work. It must be present at pod creation time.)
Linkerd Production Tip
The debug sidecar image contains
tshark
, tcpdump
,
lsof
, and iproute2
. Once installed, it starts automatically logging all
incoming and outgoing traffic with tshark
, which can then be viewed with
kubectl logs
. Alternatively, you can use kubectl exec
to access the
container and run commands directly.
For instance, if you’ve gone through the Linkerd Getting Started guide and installed the emojivoto application, and wish to debug traffic to the voting service, you could run:
kubectl -n emojivoto get deploy/voting -o yaml \
| linkerd inject --enable-debug-sidecar - \
| kubectl apply -f -
to deploy the debug sidecar container to all pods in the voting service. (Note that there’s only one pod in this deployment, which will be recreated to do this–see the note about pod mutability above.)
You can confirm that the debug container is running by listing
all the containers in pods with the voting-svc
label:
kubectl get pods -n emojivoto -l app=voting-svc \
-o jsonpath='{.items[*].spec.containers[*].name}'
Then, you can watch live tshark output from the logs by simply running:
kubectl -n emojivoto logs deploy/voting linkerd-debug -f
If that’s not enough, you can exec to the container and run your own commands in the context of the network. For example, if you want to inspect the HTTP headers of the requests, you could run something like this:
kubectl -n emojivoto exec -it \
$(kubectl -n emojivoto get pod -l app=voting-svc \
-o jsonpath='{.items[0].metadata.name}') \
-c linkerd-debug -- tshark -i any -f "tcp" -V -Y "http.request"
A real-world error message written by the proxy that the debug sidecar is
effective in troubleshooting is a Connection Refused
error like this one:
ERR! [<time>] proxy={server=in listen=0.0.0.0:4143 remote=some.svc:50416}
linkerd2_proxy::app::errors unexpected error: error trying to connect:
Connection refused (os error 111) (address: 127.0.0.1:8080)
In this case, the tshark
command can be modified to listen for
traffic between the specific ports mentioned in the error, like this:
kubectl -n emojivoto exec -it \
$(kubectl -n emojivoto get pod -l app=voting-svc \
-o jsonpath='{.items[0].metadata.name}') \
-c linkerd-debug -- tshark -i any -f "tcp" -V \
-Y "(tcp.srcport == 4143 and tcp.dstport == 50416) or tcp.port == 8080"
Be aware that there is a similar error with the message Connection reset by peer
. This error is usually benign, if you do not see correlated errors or
messages in your application log output. In this scenario, the debug
container may not help to troubleshoot the error message.
ERR! [<time>] proxy={server=in listen=0.0.0.0:4143 remote=some.svc:35314}
linkerd2_proxy::app::errors unexpected error: connection error:
Connection reset by peer (os error 104)
Of course, these examples only work if you have the ability to exec
into
arbitrary containers in the Kubernetes cluster. See linkerd tap
for an alternative to this approach.