Microservices, distributed systems, and containers are becoming the norm. As a result, orchestration software such as K8s and Mesos is becoming more and more relevant to the computing ecosystem in today's datacenter. These systems form the foundation of our operational environment. We build one such system, DC/OS, on top of Mesos. Back in mid-2015, the company realized that it needed a networking layer on par with those of its competitors in the market. As a small team of two people, we were tasked with building and maintaining a set of distributed systems to support DC/OS's network subsystem. We decided to turn to Erlang because of its healthy ecosystem and focus on reliability. Not only did we build the individual services, we also built a shared control plane, Lashup. In this talk, we'll cover how we leveraged off-the-shelf tools available to the Erlang community, such as CRDTs, property checking, Distributed Erlang, and datastores, to build a system that provides membership, failure detection, and multicast, and that acts as an eventually consistent datastore. We'll also talk about how we used this library to build a load balancer, a distributed DNS server, and an overlay while maintaining a high standard of reliability. A persistent theme throughout the talk will be the implicit business value that Erlang delivers with minimal effort, from deployment to supportability.
Talk objectives:
* Discuss the benefits of decentralized, eventually consistent systems vs. centralized, strongly consistent distributed systems for container orchestration.
* Discuss property checking and the testing of practical software inside the data center.
* Discuss the operational excellence of Erlang in practical systems. Erlang is often described as a magical system that is easy to operate and run at scale. There are many myths about what can be done, but there is also practical knowledge that should be applied in real life.
* Evangelize the libraries and control plane that we've built to enable the rapid evolution of our Erlang stack.
Target audience:
The primary audience is distributed systems practitioners who operate in a highly distributed world and need to build distributed systems. A subset of these folks are particularly interested in cutting-edge research around gossip, multicast, and CRDTs as an alternative to tightly coupled, strongly consistent systems. This system is a practical, production application of such techniques, and it is presented as a sort of experience report.
Secondarily, this talk covers container orchestration and the programming challenges that lie within. These are rapidly evolving, complex systems that must show the highest reliability in production. Folks who are interested in testing and building systems for operational excellence will hear how we attacked these facets of the challenge.
Slides
Sargun Dhillon has a background in operations and distributed systems, specifically in the area of infrastructure. He has years of experience building large and small datacenter orchestration software used internally and externally. He has built much of Mesosphere's container SDN stack in Erlang and C. Previously, he worked at companies such as Basho, Microsoft, and Yammer. Given this background, he spends much of his time thinking about how to make massively distributed, scalable systems friendlier to run.
GitHub: sargun
Twitter: @sargun