Distributed tracing with Linkerd
Using distributed tracing in practice can be complex. In this guide, we’ve assembled our recommendation for the best way to use distributed tracing with Linkerd.
First, let’s understand what exactly “distributed tracing support” looks like in Linkerd. It’s actually quite simple: when a Linkerd data plane proxy sees a tracing header in b3 format in a proxied HTTP request (see below for why this particular format), Linkerd will emit a trace span for that request. This span will include information about the exact amount of time spent in the Linkerd proxy, and, in the future, potentially other information as well.
And that’s it. As you can see, Linkerd’s role in distributed tracing is actually quite simple. The complexity lies in everything else that must be in place in order to make this feature of Linkerd useful.
What else is required? To use Linkerd’s new distributed tracing feature, you’ll need several additional components in your system:
- An ingress layer that kicks off the trace on particular requests.
- A client library for your application. (Your application code must propagate trace headers, and ideally emit its own spans as well.)
- A trace collector to collect span data and turn them into traces.
- A trace backend to store the trace data and allow the user to view/query it.
Let’s dive into how distributed tracing works with a working example. Then we’ll describe each of the components in more detail and explain how to use them in your own application.
First, make sure you’ve installed Linkerd version 2.6 or later.
$ linkerd version
Client version: stable-2.6
Server version: stable-2.6
Start by cloning the reference architecture repository:
git clone git@github.com:adleong/emojivoto.git && \
cd emojivoto
Next, install Jaeger and the OpenCensus collector. It’s important to inject these components with Linkerd so that they can receive spans from the Linkerd proxy over a secure connection.
linkerd inject tracing.yml | kubectl apply -f -
Finally, install the NGINX ingress controller and the Emojivoto application itself. Since we inject these components with Linkerd, we will be able to see the Linkerd proxy itself in the resulting traces.
linkerd inject emojivoto.yml | kubectl apply -f - && \
linkerd inject ingress.yml | kubectl apply -f -
With all of that in place, we can finally use the Jaeger dashboard to explore traces flowing through the system.
kubectl -n tracing port-forward deploy/jaeger 16686 &
open http://localhost:16686
Congrats! You have a functioning distributed tracing system with Linkerd.
The architecture in this guide has four components: NGINX for ingress, OpenCensus for the client library, OpenCensus for the trace collector, and Jaeger for the backend. We’ll describe each of these components in more detail. Of course, each of these components is swappable; we’ve detailed the requirements for substituting a different option for each component below.
Ingress: NGINX

The ingress is an especially important component for distributed tracing because it creates the root span of each trace and is responsible for deciding if that trace should be sampled or not. Having the ingress make all sampling decisions ensures that either an entire trace is sampled or none of it is, and avoids creating “partial traces”.
Distributed tracing systems all rely on services to propagate metadata about the current trace from requests that they receive to requests that they send. This metadata, called the trace context, is usually encoded in one or more request headers. There are many different trace context header formats and while we hope that the ecosystem will eventually converge on open standards like W3C tracecontext, we only use the b3 format today. Being one of the earliest widely used formats, it has the widest support, especially among ingresses like NGINX.
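For reference, a trace context encoded in b3’s multi-header form looks something like the following; the ID values here are purely illustrative, not from a real trace:

```
X-B3-TraceId: 80f198ee56343ba864fe8b2a57d3eff7
X-B3-SpanId: e457b5a2e4d86bd1
X-B3-ParentSpanId: 05e3ac9a4f6e3b90
X-B3-Sampled: 1
```

The X-B3-Sampled header carries the sampling decision made at the ingress; every hop must forward these headers for the trace to stay intact.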
This reference architecture includes a simple NGINX config that samples 50% of traces and emits trace data to the collector (using the Zipkin protocol). Any ingress controller can be used here in place of NGINX as long as it:
- Supports probabilistic sampling
- Encodes trace context in the b3 format
- Emits spans in a protocol supported by the OpenCensus collector
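As one possible sketch of what this looks like in practice: the ingress-nginx controller exposes tracing settings through its ConfigMap. The collector host and port below are placeholders for your own collector service, not values from the reference architecture:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-configuration
  namespace: ingress-nginx
data:
  # Enable OpenTracing support in the controller.
  enable-opentracing: "true"
  # Placeholder address of a collector that accepts the Zipkin protocol.
  zipkin-collector-host: "my-collector.tracing"
  zipkin-collector-port: "9411"
  # Probabilistic sampling: trace 50% of requests, as in the
  # reference architecture.
  zipkin-sample-rate: "0.5"
```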
Client library: OpenCensus
While it is possible for services to propagate trace headers manually, it’s usually much easier to use a library which does three things:
- Propagates the trace context from incoming request headers to outgoing request headers
- Modifies the trace context (i.e. starts a new span)
- Transmits this data to a trace collector
We recommend using OpenCensus in your service, configured with the b3 propagation format and the OpenCensus agent exporter.
The OpenCensus agent exporter will export trace data to the OpenCensus collector over a gRPC API. The details of how to configure OpenCensus vary from language to language, but there are guides for many popular languages. You can also see an end-to-end example of this in Go in our example application, Emojivoto.
You may notice that the OpenCensus project is in maintenance mode and will become part of OpenTelemetry. Unfortunately, OpenTelemetry is not yet production ready and so OpenCensus remains our recommendation for the moment.
Collector: OpenCensus

The OpenCensus collector receives trace data from the OpenCensus agent exporter and potentially does translation and filtering before sending that data to Jaeger. Having the OpenCensus exporter send to the OpenCensus collector gives us a lot of flexibility: we can switch to any backend that OpenCensus supports without needing to interrupt the application.
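A collector configuration along these lines might look like the following sketch; the receiver address and Jaeger endpoint are assumptions for illustration, not values taken from the reference architecture:

```yaml
receivers:
  opencensus:
    # Where the collector accepts spans over gRPC from the OpenCensus
    # agent exporter (and from the Linkerd proxies).
    address: "0.0.0.0:55678"
exporters:
  jaeger:
    # Swapping backends means changing only this exporter section;
    # the application keeps exporting to the collector unchanged.
    collector_endpoint: "http://jaeger.tracing:14268/api/traces"
```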
Backend: Jaeger

Jaeger is one of the most widely used tracing backends and for good reason: it is easy to use and does a great job of visualizing traces. However, any backend supported by OpenCensus can be used instead.
Linkerd

If your application is injected with Linkerd, the Linkerd proxy will participate in the traces and will also emit trace data to the OpenCensus collector. This enriches the trace data and allows you to see exactly how much time requests are spending in the proxy and on the wire. To enable Linkerd’s participation:
- Set the config.linkerd.io/trace-collector annotation on the namespace or pod specs that you want to participate in traces. This should be set to the address of the OpenCensus collector service; in our reference architecture, this is the collector service installed in the tracing namespace by tracing.yml.
- Set the config.alpha.linkerd.io/trace-collector-service-account annotation on the namespace or pod specs that you want to participate in traces. This should be set to the name of the collector’s service account and is used to ensure secure communication between the proxy and the collector. It can be omitted if the collector is running as the default service account, which is the case in the reference architecture, so we omit it here.
- Ensure that the pods you want to emit spans from are injected with the Linkerd proxy.
- Ensure the OpenCensus collector is injected with the Linkerd proxy.
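Putting the first step into practice, annotating a namespace might look like the following sketch; the collector address is a placeholder following the service.namespace:port convention, not the actual address from the reference architecture:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: emojivoto
  annotations:
    # Placeholder: replace with the address of your OpenCensus
    # collector service.
    config.linkerd.io/trace-collector: my-collector.tracing:55678
```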
While Linkerd can only actively participate in traces that use the b3 propagation format (as in the reference architecture above), Linkerd will always forward unknown request headers transparently, which means it will never interfere with traces that use other propagation formats. We would also love to expand Linkerd’s support for more propagation formats. Please open an issue (or pull request!) if this interests you.
Hopefully this guide makes it easier for you to understand the different moving parts of distributed tracing and to get started instrumenting your own application. While this isn’t the only way to get distributed tracing working for your application, we hope it represents a good starting point for your exploration.