In the previous post I talked a bit about Docker and the main benefits you can get from running your applications as isolated, loosely coupled containers. We then saw how to “dockerize” a small Python web service and how to run this container in AWS, first manually and then using Elastic Beanstalk to quickly deploy changes to it. That worked well as an introduction to Docker, but in real life a single container running on a host will not cut it. You will need a set of related containers running together and collaborating, each of which can be deployed independently. This also means that you need a way to know which container is running what, and where. In this post I want to talk a bit about service discovery. In particular, I'm going to show how you can use Consul, running as a container, to achieve this goal in a robust and scalable way.
Consul came out of HashiCorp, the same company behind popular tools like Vagrant and Packer. They are pretty good at creating DevOps-friendly tools, so I make time to play around with anything they come up with. Consul has several components that provide different functionality, but in a nutshell it is a distributed, highly available tool for service discovery. Clients can register new services with Consul, specifying a name and additional information in the form of tags, and can then query Consul for services that match their criteria using either HTTP or DNS. We'll see an example later on.
In addition to specifying the services they want to register, clients can also specify any number of health checks. A health check can be made against your application (e.g., the REST endpoint is listening for connections on port X) or against the node itself (e.g., CPU utilization is above 90%). Consul uses these health checks to know which nodes to exclude when a client queries for a specific service.
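To make this concrete, here's a rough sketch of registering a script-based check with a locally running agent over HTTP (the check ID, name, port and command below are all made up for illustration):

```shell
# Register a script-based health check with the local Consul agent.
# Consul runs the script on the given interval and marks the check
# as failing if it exits with a non-zero status.
curl -X PUT -d '{
  "ID": "web-check",
  "Name": "web service listening on port 5000",
  "Script": "curl -s http://localhost:5000/ > /dev/null",
  "Interval": "10s"
}' http://localhost:8500/v1/agent/check/register
```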
Finally, Consul also provides a highly scalable and fault tolerant Key/Value store, which your services can use for anything they want: dynamic configuration, feature flags, etc.
So how does it work? The main thing you need is a Consul agent running as a server. This Consul server is responsible for storing data and replicating it to other servers. You can have a fully functioning Consul system with just one server, but that is usually a bad idea for a production deployment: that server becomes a single point of failure, and you cannot discover your services if it goes down. The Consul documentation recommends running a cluster of 3 or 5 Consul servers to avoid data loss. More than that and the replication starts to suffer from progressively increasing overhead. In addition to running as a server, an agent can also run in client mode. These agents have far fewer responsibilities than servers and are pretty much stateless.
Usually, nodes wanting to register services running on them with Consul do so by registering them with their local running Consul agent. However, you can also register external services so you don’t need to run a Consul agent on every node that is hosting your services.
Queries can be made against any type of Consul agent, whether running as a server or as a client. Unlike servers, you can have thousands or tens of thousands of Consul clients without any significant impact on performance or network overhead.
I would strongly suggest taking a look at its documentation to get a more detailed explanation of how all of this works.
Running a single node cluster
And now, the fun part! Let's see how we can bootstrap a Consul cluster using Docker containers. We'll first run a cluster consisting of a single server to see how it works. We'll use the amazing image built by Jeff Lindsay:
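Something along these lines should do it (the port mappings here follow the progrium/consul image's documentation at the time; they may differ for newer versions):

```shell
# Start a single Consul server, mapping the RPC (8400), HTTP API (8500)
# and DNS (53/udp, exposed as 8600) ports to the Docker host.
docker run -p 8400:8400 -p 8500:8500 -p 8600:53/udp \
    -h node1 progrium/consul -server -bootstrap
```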
In the startup output, note that -server -bootstrap tells Consul to start this agent in server mode and not to wait for any other instances to join. Consul actually warns you about this when you start the server: “Bootstrap mode enabled! Do not enable unless necessary”.
We can now query Consul through its REST API. Since I’m running boot2docker I need to get the VM IP first:
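Roughly (the exported variable name is just a convention I'll keep using below):

```shell
# Get the boot2docker VM's IP, then ask Consul for the cluster's nodes.
export DOCKER_IP=$(boot2docker ip)
curl $DOCKER_IP:8500/v1/catalog/nodes
```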
You get a JSON response specifying the nodes that are currently part of the Consul cluster, which in our case so far is just one. You can also go to http://192.168.59.103:8500/ (replace the IP by whatever your Docker host IP is) in your browser to see a nice UI with information about the currently registered services and nodes.
Let's now add a new service. We usually want to register all the services under our control, but what about external ones? We seldom build a system without any third-party services, and it would certainly be nice to treat both types equally from a service discovery point of view. We'll start by adding an external service, following the example given in the documentation:
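The documentation's example registers a node called “google” providing a “search” service; something like:

```shell
# Register an external service: node "google" offering "search" on port 80.
# $DOCKER_IP is assumed to hold your Docker host's IP.
curl -X PUT -d '{
  "Datacenter": "dc1",
  "Node": "google",
  "Address": "www.google.com",
  "Service": {"Service": "search", "Port": 80}
}' http://$DOCKER_IP:8500/v1/catalog/register
```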
Here we registered the “google” node as offering the “search” service. But what if Google is down for some reason (can that happen?)? We can register multiple providers of the search service:
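For example, a second (hypothetical) provider registered under the same service name:

```shell
# A second node providing the same "search" service.
curl -X PUT -d '{
  "Datacenter": "dc1",
  "Node": "bing",
  "Address": "www.bing.com",
  "Service": {"Service": "search", "Port": 80}
}' http://$DOCKER_IP:8500/v1/catalog/register
```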
We can now query Consul through its HTTP API to see all the services that are currently registered with it:
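A plain GET against the catalog does the trick (again assuming $DOCKER_IP holds the Docker host's IP):

```shell
# List the names of all registered services.
curl $DOCKER_IP:8500/v1/catalog/services
```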
We can see that the “search” service we added before is registered. Note that this listing only shows service names, with no mention of the two specific nodes providing search. If we want more information about a particular service, we can also get that:
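For instance, asking the catalog for every node that provides search:

```shell
# List every node providing the "search" service.
curl $DOCKER_IP:8500/v1/catalog/service/search
```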
We can also use the DNS interface to query for services:
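Using dig against the DNS port we mapped earlier (8600):

```shell
# Services are resolvable under the .service.consul domain.
dig @$DOCKER_IP -p 8600 search.service.consul
```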
Running a Consul cluster
OK, so we were able to run a single Consul agent in server mode and register an external service. But, as I mentioned before, this is usually a very bad idea for availability reasons. So let's see how we can run a cluster with 3 servers, all of them running locally in different Docker containers.
We’ll start the first node similarly to the way we did it before:
Note that instead of passing the -bootstrap flag we are passing -bootstrap-expect 3, which tells Consul to wait until 3 servers have joined before actually starting the cluster.
In order to join the cluster, a node only needs to know the location of one node that is already part of it. So to join the second node we need the IP of the first one (the only node we know of so far). We can get this IP using docker inspect and looking for the IPAddress field, or just export it to an environment variable with:
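For example (the variable name is arbitrary):

```shell
# Extract node1's internal IP so the other agents can join through it.
JOIN_IP=$(docker inspect -f '{{.NetworkSettings.IPAddress}}' node1)
```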
We can now start our 2 remaining servers and join them with the first one:
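Assuming the first node's IP was exported as JOIN_IP, something like:

```shell
# Two more servers, joining the cluster through node1's IP.
docker run -d --name node2 -h node2 progrium/consul -server -join $JOIN_IP
docker run -d --name node3 -h node3 progrium/consul -server -join $JOIN_IP
```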
After doing that, the first node's logs show what is happening: after the second node joins, Consul reports that it cannot yet start the cluster; once the third node joins, it bootstraps the cluster, elects a leader and marks the three nodes as healthy.
So now we have our three-server cluster up and running. Note, however, that we did not specify any port mappings for any of the three nodes, which means we have no way of accessing the cluster from outside. Luckily this is not a problem: with the cluster running, we can join any number of nodes in client mode and interact with the cluster through those clients. Let's join the first client node with:
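Assuming the same JOIN_IP variable as before, something like:

```shell
# A client agent: no -server flag, and we publish the HTTP API port
# so we can reach the cluster from outside.
docker run -d --name node4 -h node4 -p 8500:8500 progrium/consul -join $JOIN_IP
```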
Note that we didn’t pass the
-server parameter this time and we added the port
We can now interact with the cluster through our client node. We could, for instance, use the REST API to see all the nodes that are currently part of the cluster:
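The same query as in the single-server case, now answered by the client ($DOCKER_IP is assumed to hold the Docker host's IP):

```shell
# List the nodes currently in the cluster, via the client agent.
curl $DOCKER_IP:8500/v1/catalog/nodes
```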
It is important to understand that we only need to know the address of one of the nodes (either server or client) to join. Until now we have used the environment variable containing the IP of node1, but we could just as easily add a new node using the IP of node4, for instance, which is a client:
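Something like this (NODE4_IP is just an illustrative name):

```shell
# Get the client's internal IP and join a fifth node through it,
# publishing its HTTP API on host port 8501.
NODE4_IP=$(docker inspect -f '{{.NetworkSettings.IPAddress}}' node4)
docker run -d --name node5 -h node5 -p 8501:8500 progrium/consul -join $NODE4_IP
```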
Similarly, we can send our queries to any node in the cluster and the answer will be always the same thanks to Consul’s replication algorithms. Here we’ll use port 8501, which is the port exposed by the last client we joined:
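For instance:

```shell
# The same catalog query, answered by the client published on 8501.
curl $DOCKER_IP:8501/v1/catalog/services
```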
This, combined with the fact that we can have thousands of clients in the cluster without any performance impact, makes Consul an extremely scalable and highly available service discovery solution.
In addition to its service discovery and health check capabilities, Consul offers a key/value store for whatever you may need. We can easily access it through the REST API. We'll keep using the five-node cluster we got running before. First, let's make sure there is nothing currently saved there:
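With curl's -i flag we can see the response status:

```shell
# We expect a 404 here, since nothing was stored under key1 yet.
curl -i $DOCKER_IP:8500/v1/kv/key1
```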
We got back a 404 because the key doesn’t exist yet, great! Let’s now add a value for key1 and query again:
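Roughly:

```shell
# Store "value1" under key1, then read it back.
curl -X PUT -d 'value1' $DOCKER_IP:8500/v1/kv/key1
curl $DOCKER_IP:8500/v1/kv/key1
```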
Note that the Value field is base64 encoded. According to the documentation, this is to allow non-UTF-8 characters.
Before, we saw that we could query any node in the cluster for registered services or a list of nodes and the answer would be the same. It's no surprise that this also applies to the key/value store: we can add a key on one node and query for it from any other. In our example, we could use $DOCKER_IP:8501/v1/kv/key1 (changing the port to 8501 to query a different node than the one we used for the PUT) and we would get exactly the same answer.
In the last post we saw an overview of Docker and its benefits. These are really easy to see when you consider a service running in a single container, but when you start to throw in hundreds or thousands of containers, things get a bit more complicated. One of the first things you need is to know where each container lives and what services it offers. You also need some basic form of health check to make sure you don't send requests to containers that are either unable to reply or not there anymore. Consul, a highly scalable and efficient service discovery tool, solves these problems in a very elegant way. Of course, there are many other alternatives out there with different capabilities, like etcd or SkyDNS. I haven't had a chance to play around with those yet, so I don't have an informed opinion about them.
One thing we haven't talked about yet is how you would go about registering your containers. By this I don't mean the Consul-specific mechanics of registering services but a more general question: who is responsible for doing this? Should the container know how to register itself with the cluster? Should the operator running the container do it? Someone else? All of these approaches have pros and cons. In the next post I'll discuss these options, as well as show a really amazing tool from Jeff Lindsay that makes dealing with container registration incredibly easy and transparent.