Distributed tracing and instrumentation

As the number and complexity of services increases, uniform observability across the data center becomes more critical. Linkerd’s tracing and metrics instrumentation is designed to be aggregated, providing broad and granular insight into the health of all services. Linkerd’s role as a service mesh makes it the ideal data source for observability information, particularly in a polyglot environment.

Distributed tracing

As requests pass through multiple services, identifying performance bottlenecks with traditional debugging techniques becomes increasingly difficult. Distributed tracing provides a holistic view of a request as it transits multiple services, making latency issues immediately identifiable.

With linkerd, distributed tracing comes for free. Simply configure linkerd to export tracing data to a backend trace aggregator, such as Zipkin. This will expose latency, retry, and failure information for each hop in a request.
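
As a minimal sketch (assuming a Zipkin backend; the collector host and sample rate below are illustrative values rather than defaults), the tracer block in the linkerd config file might look like this:

    tracers:
    - kind: io.l5d.zipkin
      host: zipkin-collector.example.com  # hypothetical Zipkin collector address
      port: 9410                          # Zipkin's Scribe collector port
      sampleRate: 0.02                    # sample roughly 2% of requests

A low sample rate keeps tracing overhead small in high-traffic environments; it can be raised temporarily when debugging a specific issue.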

Metrics

Linkerd provides detailed histograms of communication latency and payload sizes, as well as success rates and load-balancing statistics, in both human-readable and machine-parsable formats. This means that even polyglot applications can have a consistent, global view of application performance. There are hundreds of counters, gauges, and histograms available, including:

  • latency (avg, min, max, p99, p999, p9999)
  • request volume
  • payload sizes
  • success, retry, and failure counts
  • failure classification
  • heap and GC performance
  • load balancing statistics
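
These metrics are exposed on linkerd’s admin interface. As a rough sketch (assuming the default admin port), the admin block of the linkerd config controls where that interface listens, and the machine-parsable TwitterServer JSON output is typically available at /admin/metrics.json:

    admin:
      port: 9990  # default admin port; metrics are served from this interface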

While linkerd provides an open telemetry plugin interface for integration with any metrics aggregator, it includes three export formats out of the box: TwitterServer, Prometheus, and statsd.
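
As a rough sketch, enabling the Prometheus and statsd exporters might look like the following in the linkerd config (the statsd hostname and prefix are illustrative, the statsd telemeter is experimental and must be enabled explicitly, and parameter names should be verified against the Telemetry documentation for your linkerd version):

    telemetry:
    - kind: io.l5d.prometheus        # serves metrics at /admin/metrics/prometheus
    - kind: io.l5d.statsd
      experimental: true             # statsd support is experimental in linkerd
      prefix: linkerd                # illustrative prefix for emitted metric names
      hostname: statsd.example.com   # hypothetical statsd server
      port: 8125                     # conventional statsd UDP port

In this sketch, TwitterServer-format metrics remain available from the admin interface described above.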

Further reading

If you’re ready to start using distributed tracing in your setup, see the Tracers section of the linkerd config.

For configuring a telemeter such as StatsD, see the Telemetry section of the linkerd config.

For a guide on setting up an end-to-end distributed tracing pipeline, see Distributed Tracing for Polyglot Microservices on the Buoyant Blog.

For more detail about metrics instrumentation, see the Metrics section of the getting started guide.

For configuring your metrics endpoint, see the Admin section of the linkerd config.

For a guide on setting up an end-to-end monitoring pipeline on Kubernetes, see A Service Mesh for Kubernetes, Part I: Top-Line Service Metrics. For DC/OS, see linkerd on DC/OS for Service Discovery and Visibility. Both of these leverage our out-of-the-box linkerd+prometheus+grafana setup, linkerd-viz.