Distributed tracing with Linkerd

Using distributed tracing in practice can be complex, for a high level explanation of what you get and how it is done, we've assembled a list of myths.

This guide will walk you through configuring and enabling tracing for emojivoto. Jump to the end for some recommendations on the best way to make use of distributed tracing with Linkerd.

To use distributed tracing, you'll need to:

  • Add a collector which receives spans from your application and Linkerd.
  • Add a tracing backend to explore traces.
  • Modify your application to emit spans.
  • Configure Linkerd's proxies to emit spans.

In the case of emojivoto, once all these steps are complete there will be a topology that looks like:

Topology
Topology

Prerequisites

  • To use this guide, you'll need to have Linkerd installed on your cluster. Follow the Installing Linkerd Guide if you haven't already done this.

Install Trace Collector & Jaeger

The first step of getting distributed tracing setup is installing a collector onto your cluster. This component consists of “receivers” that consume spans emitted from the mesh and your applications as well as “exporters” that convert spans and forward them to a backend.

Next, we will need Jaeger in your cluster. The all inone configuration will store the traces, make them searchable and provide visualization of all the data being emitted.

Linkerd now has add-ons which enables users to install extra components that integrate well with Linkerd. Tracing is such one add-on which includes OpenCensus Collector and Jaeger

To add the Tracing Add-On to your cluster, run:

First, we would need a configuration file where we enable the Tracing Add-On.

cat >> config.yaml << EOF
tracing:
  enabled: true
EOF

This configuration file can also be used to apply Add-On configuration (not just specific to tracing Add-On). More information on the configuration fields allowed, can be found here

Now, the above configuration can be applied using the --addon-config file with CLI or through values.yaml with Helm.

linkerd upgrade --addon-config config.yaml | kubectl apply -f -

You will now have a linker-collector and linkerd-jaeger deployments in the linkerd namespace that are running as part of the mesh. Collector has been configured to:

  • Receive spans from OpenCensus clients
  • Export spans to a Jaeger backend

The collector is extremely configurable and can use the receiver or exporter of your choice.

Jaeger itself is made up of many components. The all-in-one image bundles all these components into a single container to make demos and showing tracing off a little bit easier.

Before moving onto the next step, make sure everything is up and running with kubectl:

kubectl -n linkerd rollout status deploy/linkerd-collector
kubectl -n linkerd rollout status deploy/linkerd-jaeger

Install Emojivoto

Add emojivoto to your cluster with:

kubectl apply -f https://run.linkerd.io/emojivoto.yml

It is possible to use linkerd inject to add the proxy to emojivoto as outlined in getting started. Alternatively, annotations can do the same thing. You can patch these onto the running application with:

kubectl -n emojivoto patch -f https://run.linkerd.io/emojivoto.yml -p '
spec:
  template:
    metadata:
      annotations:
        linkerd.io/inject: enabled
        config.linkerd.io/trace-collector: linkerd-collector.linkerd:55678
        config.alpha.linkerd.io/trace-collector-service-account: linkerd-collector
'

Before moving onto the next step, make sure everything is up and running with kubectl:

kubectl -n emojivoto rollout status deploy/web

Modify the application

Unlike most features of a service mesh, distributed tracing requires modifying the source of your application. Tracing needs some way to tie incoming requests to your application together with outgoing requests to dependent services. To do this, some headers are added to each request that contain a unique ID for the trace. Linkerd uses the b3 propagation format to tie these things together.

We've already modified emojivoto to instrument its requests with this information, this commit shows how this was done. For most programming languages, it simply requires the addition of a client library to take care of this. Emojivoto uses the OpenCensus client, but others can be used.

To enable tracing in emojivoto, run:

kubectl -n emojivoto set env --all deploy OC_AGENT_HOST=linkerd-collector.linkerd:55678

This command will add an environment variable that enables the applications to propagate context and emit spans.

Explore Jaeger

With vote-bot starting traces for every request, spans should now be showing up in Jaeger. To get to the UI, start a port forward and send your browser to http://localhost:16686.

kubectl -n linkerd port-forward svc/linkerd-jaeger 16686
Jaeger
Jaeger

You can search for any service in the dropdown and click Find Traces. vote-bot is a great way to get started.

Search
Search

Clicking on a specific trace will provide all the details, you'll be able to see the spans for every proxy!

Search
Search

There sure are a lot of linkerd-proxy spans in that output. Internally, the proxy has a server and client side. When a request goes through the proxy, it is received by the server and then issued by the client. For a single request that goes between two meshed pods, there will be a total of 4 spans. Two will be on the source side as the request traverses that proxy and two will be on the destination side as the request is received by the remote proxy.

Additionally, As the proxy adds application meta-data as trace attributes, Users can directly jump into related resources traces directly from the linkerd-web dashboard by clicking the Jaeger icon in the Metrics Table, as shown below

Linkerd-Jaeger
Linkerd-Jaeger

Cleanup

To cleanup, remove the tracing components along with emojivoto by running:

kubectl delete ns tracing emojivoto

Troubleshooting

I don't see any spans for the proxies

The Linkerd proxy uses the b3 propagation format. Some client libraries, such as Jaeger, use different formats by default. You'll want to configure your client library to use the b3 format to have the proxies participate in traces.

I don't see any traces

Instead of requiring complex client configuration to ensure spans are encrypted in transit, Linkerd relies on its mTLS implementation. This means that it is required the collector is part of the mesh. If you are using a service account other than default for the collector, the proxies must be configured to use this as well with the config.alpha.linkerd.io/trace-collector-service-account annotation.

Recommendations

Ingress

The ingress is an especially important component for distributed tracing because it creates the root span of each trace and is responsible for deciding if that trace should be sampled or not. Having the ingress make all sampling decisions ensures that either an entire trace is sampled or none of it is, and avoids creating “partial traces”.

Distributed tracing systems all rely on services to propagate metadata about the current trace from requests that they receive to requests that they send. This metadata, called the trace context, is usually encoded in one or more request headers. There are many different trace context header formats and while we hope that the ecosystem will eventually converge on open standards like W3C tracecontext, we only use the b3 format today. Being one of the earliest widely used formats, it has the widest support, especially among ingresses like Nginx.

This reference architecture includes a simple Nginx config that samples 50% of traces and emits trace data to the collector (using the Zipkin protocol). Any ingress controller can be used here in place of Nginx as long as it:

  • Supports probabilistic sampling
  • Encodes trace context in the b3 format
  • Emits spans in a protocol supported by the OpenCensus collector

If using helm to install ingress-nginx, you can configure tracing by using:

controller:
  config:
    enable-opentracing: "true"
    zipkin-collector-host: linkerd-collector.linkerd

Client Library

While it is possible for services to manually propagate trace propagation headers, it's usually much easier to use a library which does three things:

  • Propagates the trace context from incoming request headers to outgoing request headers
  • Modifies the trace context (i.e. starts a new span)
  • Transmits this data to a trace collector

We recommend using OpenCensus in your service and configuring it with:

The OpenCensus agent exporter will export trace data to the OpenCensus collector over a gRPC API. The details of how to configure OpenCensus will vary language by language, but there are guides for many popular languages. You can also see an end-to-end example of this in Go with our example application, Emojivoto.

You may notice that the OpenCensus project is in maintenance mode and will become part of OpenTelemetry. Unfortunately, OpenTelemetry is not yet production ready and so OpenCensus remains our recommendation for the moment.

It is possible to use many other tracing client libraries as well. Just make sure the b3 propagation format is being used and the client library can export its spans in a format the collector has been configured to receive.

Collector: OpenCensus

The OpenCensus collector receives trace data from the OpenCensus agent exporter and potentially does translation and filtering before sending that data to Jaeger. Having the OpenCensus exporter send to the OpenCensus collector gives us a lot of flexibility: we can switch to any backend that OpenCensus supports without needing to interrupt the application.

Backend: Jaeger

Jaeger is one of the most widely used tracing backends and for good reason: it is easy to use and does a great job of visualizing traces. However, any backend supported by OpenCensus can be used instead.

Linkerd

If your application is injected with Linkerd, the Linkerd proxy will participate in the traces and will also emit trace data to the OpenCensus collector. This enriches the trace data and allows you to see exactly how much time requests are spending in the proxy and on the wire. To enable Linkerd's participation:

  • Set the config.linkerd.io/trace-collector annotation on the namespace or pod specs that you want to participate in traces. This should be set to the address of the OpenCensus collector service.
  • Set the config.alpha.linkerd.io/trace-collector-service-account annotation on the namespace of pod specs that you want to participate in traces. This should be set to the name of the service account of the collector and is used to ensure secure communication between the proxy and the collector. This can be omitted if the collector is running as the default service account.
  • Ensure the OpenCensus collector is injected with the Linkerd proxy.

While Linkerd can only actively participate in traces that use the b3 propagation format, Linkerd will always forward unknown request headers transparently, which means it will never interfere with traces that use other propagation formats.