How do you operate modern, cloud-native applications at scale? What problems arise in practice, and how are they addressed? What is *actually* required to run a cloud-native, microservices-based application under high-volume and unpredictable workloads, without introducing friction to feature releases or product changes?
For all the talk about microservices, it turns out that very few people can actually answer these questions. The rapid rise of exciting new technologies like Docker, Mesos, Kubernetes, and gRPC easily makes armchair architects of us all. But actual high-traffic, production usage? By our reckoning, the number of companies that have actually solved the problems of running microservices at scale is a handful at best.
Twitter is one of those companies. And while it’s certainly had its share of public outages, it operates one of the highest-scale microservice applications in the world, comprising hundreds of services, tens of thousands of nodes, and millions of RPS per service. Shockingly enough, it turns out that this is not easy to do. The problems that arise are not obvious. The failure modes are surprising, hard to predict, and sometimes even hard to describe. It can be done, but it takes years of thought and effort to make everything run well in practice.
When Oliver and I left Twitter in the not-too-distant past, our goal was to take these years of operational knowledge and turn them into something that the rest of the world could use. Happily, a tremendous amount of that knowledge was already encoded in an open-source project called Finagle, the high-throughput RPC library that powers Twitter’s microservice architecture.
Finagle is Twitter’s core library for managing the communication between services. Practically every online service at Twitter is built on Finagle, and it powers millions upon millions of RPCs every second. And it’s not just Twitter: Finagle also powers the infrastructure at Pinterest, SoundCloud, Strava, StumbleUpon, and many other companies.
Linkerd is our open-source *service mesh* for cloud-native applications. It’s built directly on Finagle, and is designed to give you all the operational benefits of Twitter’s microservice-based, orchestrated architecture—those many lessons learned over many years—in a way that’s self-contained, has minimal dependencies, and can be dropped into existing applications with a minimum of change.
If you’re building a microservice and want the benefits of Finagle (including intelligent, adaptive load balancing, abstractions over service discovery, and inter-service traffic routing), you can use Linkerd to add these features without changing your application code. Plus, fancy dashboards!
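To make “adaptive load balancing” a little more concrete, here is a minimal sketch of one common approach, often called “power of two choices”: pick two replicas at random and send the request to whichever has fewer requests in flight. This is an illustration only, not code from Finagle or Linkerd; the class name `LeastLoadedBalancer`, its methods, and the addresses in the usage note are all made up for this example.

```python
import random


class LeastLoadedBalancer:
    """Toy 'power of two choices' balancer (hypothetical, for illustration).

    Samples two replicas at random and picks the one with fewer
    in-flight requests.
    """

    def __init__(self, replicas, rng=None):
        if not replicas:
            raise ValueError("at least one replica is required")
        # In-flight request count per replica address.
        self.outstanding = {addr: 0 for addr in replicas}
        self.rng = rng or random.Random()

    def acquire(self):
        """Choose a replica and count one more in-flight request against it."""
        addrs = list(self.outstanding)
        # Two distinct candidates, or just one if only one replica exists.
        candidates = self.rng.sample(addrs, k=min(2, len(addrs)))
        chosen = min(candidates, key=lambda addr: self.outstanding[addr])
        self.outstanding[chosen] += 1
        return chosen

    def release(self, addr):
        """Mark one request to `addr` as finished."""
        self.outstanding[addr] -= 1
```

A caller would wrap each request in `addr = balancer.acquire()` and `balancer.release(addr)`, for example with `balancer = LeastLoadedBalancer(["10.0.0.1:8080", "10.0.0.2:8080"])`. Because the choice depends on live in-flight counts, a replica that slows down accumulates outstanding requests and is picked less often, without the application having to know anything about it.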
Linkerd isn’t complete yet, but in the spirit of “release early and release often”, we think it’s time to get this baby out into the wild.
So if this piques your interest, start with linkerd.io for docs and downloads. And if you’re interested in contributing, head straight to the Linkerd GitHub repo. We’re strong believers in open source (Finagle itself has been open source since almost the beginning), and we’re excited to build a community around this.
We have a long roadmap ahead of us, and a huge list of exciting features we’re looking forward to adding to Linkerd. Come join us!
(If you’re wondering about the name: we like to think of Linkerd as a “dynamic linker” for cloud-native apps. Just as the dynamic linker in an OS takes the name of a library and a function, and does the work necessary to *invoke* that function, so too Linkerd takes the name of a service and an endpoint, and does the work necessary to make that call happen—safely, securely and reliably. See Marius’s talk at FinagleCon for more about this model.)
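As a rough illustration of the “dynamic linker” idea, the sketch below takes a service name and an endpoint, resolves the name to concrete addresses, and makes the call, falling back to another replica if one is unreachable. Everything here is an assumption made for the example: the `REGISTRY` table stands in for a real service discovery backend, and `resolve`, `call`, and the `send` callback are hypothetical names, not Linkerd’s API.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for service discovery: logical name -> live addresses.
REGISTRY: Dict[str, List[str]] = {
    "users": ["10.0.1.5:8080", "10.0.1.6:8080"],
    "billing": ["10.0.2.9:9090"],
}


def resolve(service: str) -> List[str]:
    """Turn a logical service name into concrete addresses."""
    addrs = REGISTRY.get(service, [])
    if not addrs:
        raise LookupError(f"no known replicas for service {service!r}")
    return addrs


def call(service: str, endpoint: str, send: Callable[[str, str], str]) -> str:
    """Resolve `service` by name, then try each replica until one answers.

    `send(addr, endpoint)` performs the actual request. Retrying on another
    replica like this is only safe for idempotent requests.
    """
    last_error = None
    for addr in resolve(service):
        try:
            return send(addr, endpoint)
        except ConnectionError as err:
            last_error = err  # remember the failure and try the next replica
    raise ConnectionError(f"all replicas of {service!r} failed") from last_error
```

The application only ever says `call("users", "/profile/42", send)`. Which machine answers, and what happens when one of them is down, is decided at call time by the layer in between, much as a dynamic linker decides at run time where a named function actually lives.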