Configuring Proxy Concurrency

The Linkerd data plane’s proxies are multithreaded, and are capable of running a variable number of worker threads so that their resource usage matches the application workload.

In a vacuum, of course, proxies will exhibit the best throughput and lowest latency when allowed to use as many CPU cores as possible. However, in practice, there are other considerations to take into account.

A real-world deployment is not a load test where clients and servers perform no other work beyond saturating the proxy with requests. In the service mesh model, proxy instances are deployed as sidecars to application containers, and each proxy only handles traffic to and from the pod it is injected into. This means that throughput and latency are limited by the application workload. If an application container instance can only handle so many requests per second, it may not actually matter that the proxy could handle more. In fact, giving the proxy more CPU cores than it needs to keep up with the application may harm overall performance, as the application may have to compete with the proxy for finite system resources.

Therefore, it is more important for individual proxies to handle their traffic efficiently than to configure all proxies to handle the maximum possible load. The primary method of tuning proxy resource usage is limiting the number of worker threads used by the proxy to forward traffic. There are multiple methods for doing this.

Using the `proxy-cpu-limit` Annotation

The simplest way to configure the proxy’s thread pool is with the config.linkerd.io/proxy-cpu-limit annotation. This annotation configures the proxy injector to set an environment variable on the proxy container that controls the number of CPU cores the proxy will use.

When installing Linkerd using the linkerd install CLI command, the --proxy-cpu-limit flag sets this annotation globally for all proxies injected by the Linkerd installation. For example:

linkerd install --proxy-cpu-limit 2 | kubectl apply -f -

For more fine-grained configuration, the annotation may be added to any injectable Kubernetes resource, such as a namespace, pod, or deployment.

For example, the following will configure any proxies in the my-deployment deployment to use two CPU cores:

kind: Deployment
apiVersion: apps/v1
metadata:
  name: my-deployment
  # ...
spec:
  template:
    metadata:
      annotations:
        config.linkerd.io/proxy-cpu-limit: '2'
  # ...

Note

Unlike Kubernetes CPU limits and requests, which can be expressed in milliCPUs, the proxy-cpu-limit annotation should be expressed in whole numbers of CPU cores. Fractional values will be rounded up to the nearest whole number.
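The rounding rule in the note above can be illustrated with a small helper. This is a hypothetical sketch that models the documented behavior (fractional values round up to a whole number of cores); it is not Linkerd's actual parser, and it deliberately does not accept milliCPU strings such as "500m", since the annotation is meant to hold whole cores.

```python
import math


def proxy_cores_from_annotation(value: str) -> int:
    """Hypothetical helper: round a proxy-cpu-limit value up to whole cores.

    float() raises ValueError for milliCPU strings like "500m", which this
    sketch intentionally does not support.
    """
    cores = float(value)
    if cores <= 0:
        raise ValueError(f"proxy-cpu-limit must be positive, got {value!r}")
    # Fractional core counts are rounded up, per the note above.
    return math.ceil(cores)
```

For instance, an annotation value of '1.5' would yield two cores under this model, while '2' stays at two.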

Using Kubernetes CPU Limits and Requests

Kubernetes provides CPU limits and CPU requests to configure the resources assigned to any pod or container. These may also be used to configure the Linkerd proxy’s CPU usage. However, depending on how the kubelet is configured, using Kubernetes resource limits rather than the proxy-cpu-limit annotation may not be ideal.

The kubelet uses one of two mechanisms for enforcing pod CPU limits, selected by the --cpu-manager-policy kubelet option. With the default CPU manager policy, none, the kubelet uses CFS quotas to enforce CPU limits. This means that the Linux kernel is configured to limit how much CPU time the threads belonging to a given process may be scheduled for in each scheduling period. Alternatively, the CPU manager policy may be set to static. In this case, the kubelet will use Linux cgroup cpusets to restrict containers that meet certain criteria to a dedicated set of CPU cores.
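To make the CFS quota mechanism concrete, the arithmetic below converts a CPU limit into a per-period quota and divides it among busy worker threads. This is a sketch assuming the kubelet's default 100 ms CFS period and the usual conversion quota = milliCPU × period / 1000; the function names are hypothetical and only illustrate the calculation.

```python
# Assumed default CFS scheduling period used by the kubelet (100 ms), in microseconds.
CFS_PERIOD_US = 100_000


def milli_cpu_to_quota(milli_cpu: int, period_us: int = CFS_PERIOD_US) -> int:
    """CPU time (microseconds) a container may use per CFS period."""
    return milli_cpu * period_us // 1000


def per_thread_share(milli_cpu: int, worker_threads: int,
                     period_us: int = CFS_PERIOD_US) -> float:
    """Average CPU time (microseconds) each worker thread gets per period
    when all threads are busy and the quota is shared evenly."""
    return milli_cpu_to_quota(milli_cpu, period_us) / worker_threads


# A 2-CPU limit (2000m) allows 200 ms of CPU time per 100 ms period.
# If the proxy runs 8 worker threads (one per core on an 8-core node),
# each busy thread averages only 25 ms of CPU time per period.
print(milli_cpu_to_quota(2000), per_thread_share(2000, 8))
```

Running two threads instead of eight under the same quota would give each thread the full 100 ms period, which is the efficiency argument made in the next paragraph.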

When the environment variable configured by the proxy-cpu-limit annotation is unset, the proxy will run a number of worker threads equal to the number of CPU cores available. This means that with the default none CPU manager policy, the proxy may spawn one worker thread per core on the node, even if its CPU limit is much lower, and the Linux kernel will then limit how often those threads are scheduled. This is less efficient than simply running fewer worker threads, as proxy-cpu-limit does: more time is spent on context switches, and each worker thread will run less frequently, potentially impacting latency.
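The thread-count decision described above can be modeled as follows. This is a minimal sketch, not the proxy's real code (the proxy is not written in Python); the environment variable name LINKERD2_PROXY_CORES is an assumption, since this page does not name it. On Linux, os.sched_getaffinity reflects a cpuset restriction, but a CFS quota does not change it, which is why the default falls back to every core on the node under the none policy.

```python
import os
from typing import Mapping, Optional

# Assumption: name of the variable set by the proxy injector.
CORES_ENV_VAR = "LINKERD2_PROXY_CORES"


def available_cores() -> int:
    """Cores this process may run on. On Linux, the affinity mask shrinks
    under a cgroup cpuset but is unaffected by CFS quotas."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def worker_threads(env: Mapping[str, str]) -> int:
    """Use the configured core count if present, else every available core."""
    raw: Optional[str] = env.get(CORES_ENV_VAR)
    if raw:
        return max(1, int(raw))
    return available_cores()


print(worker_threads(os.environ))
```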

On the other hand, with the static policy, cgroup cpusets limit the number of CPU cores available to the process. In essence, it will appear to the proxy that the system has fewer CPU cores than it actually does. This results in behavior similar to that of the proxy-cpu-limit annotation.

However, this mechanism is only used when all of the following criteria are met:

The kubelet must be configured with the static CPU manager policy.
The pod must be in the Guaranteed QoS class. This means that all containers in the pod must have both a limit and a request for memory and CPU, and the limit for each must have the same value as the request.
The container's CPU limit and CPU request must be integers greater than or equal to 1.
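The three criteria above can be checked mechanically. The sketch below uses a hypothetical Container type rather than the Kubernetes API objects, and it simplifies memory comparison to string equality (so "1Gi" and "1024Mi" would wrongly count as different); it only mirrors the rules listed above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


def cpu_to_milli(quantity: str) -> int:
    """Parse a CPU quantity such as "2", "1.5" or "500m" into milliCPUs."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


@dataclass
class Container:  # hypothetical stand-in for a pod's container spec
    name: str
    requests: Dict[str, str] = field(default_factory=dict)
    limits: Dict[str, str] = field(default_factory=dict)


def is_guaranteed(containers: List[Container]) -> bool:
    """Every container sets CPU and memory requests equal to its limits."""
    for c in containers:
        for res in ("cpu", "memory"):
            if res not in c.requests or res not in c.limits:
                return False
        if cpu_to_milli(c.requests["cpu"]) != cpu_to_milli(c.limits["cpu"]):
            return False
        if c.requests["memory"] != c.limits["memory"]:  # simplified comparison
            return False
    return True


def gets_cpuset(policy: str, containers: List[Container], name: str) -> bool:
    """True if the named container meets all three criteria listed above."""
    if policy != "static" or not is_guaranteed(containers):
        return False
    target = next((c for c in containers if c.name == name), None)
    if target is None:
        return False
    milli = cpu_to_milli(target.limits["cpu"])
    return milli >= 1000 and milli % 1000 == 0
```

For example, a proxy container with a CPU request and limit of "2" in an otherwise Guaranteed pod qualifies under the static policy, while "1500m" does not.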

If you’re not sure whether these criteria will all be met, it’s best to use the proxy-cpu-limit annotation in addition to any Kubernetes CPU limits and requests.

Using Helm

When installing with Helm, users must take care to set the global.proxy.cores Helm value in addition to global.proxy.cpu.limit unless all of the criteria for cgroup-based CPU limits described above are met.
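One way to keep the two Helm values consistent is to derive global.proxy.cores from the CPU limit. The helper below is a hypothetical sketch that builds --set flags following the rule above and rounds fractional limits up to whole cores; the value names are taken from this page and may differ in other chart versions.

```python
import math
from typing import List


def cpu_limit_to_cores(cpu_limit: str) -> int:
    """Round a CPU limit such as "1500m" or "2" up to whole cores."""
    if cpu_limit.endswith("m"):
        cores = int(cpu_limit[:-1]) / 1000
    else:
        cores = float(cpu_limit)
    return max(1, math.ceil(cores))


def proxy_helm_set_flags(cpu_limit: str, cpuset_criteria_met: bool) -> List[str]:
    """Build hypothetical helm --set flags for the proxy CPU settings."""
    flags = ["--set", f"global.proxy.cpu.limit={cpu_limit}"]
    if not cpuset_criteria_met:
        # Without cgroup cpusets, also cap the proxy's worker threads.
        flags += ["--set", f"global.proxy.cores={cpu_limit_to_cores(cpu_limit)}"]
    return flags


print(" ".join(proxy_helm_set_flags("1500m", cpuset_criteria_met=False)))
```

If you are unsure whether the criteria hold, passing cpuset_criteria_met=False is the safer choice, matching the advice to set both values.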
