Posts on Development the way it should be

AWS CloudFormation gotchas - Security groups

Tue, 30 Jan 2018 00:00:00 +0000

If you are working with AWS and keeping your infrastructure as code (Hint: you really should) then you’ve probably come across CloudFormation or Terraform at some point. If you are using the former option, then there’s a small gotcha related to security groups that might cause some unexpected behaviour if you are not aware of it (for me it caused a production incident…).

The use case

Imagine you have the following small template defining a security group that allows incoming HTTPS traffic from a specific IP range and a load balancer that will use that security group:

AWSTemplateFormatVersion: '2010-09-09'
Description: My awesome template
Outputs:
  SecurityGroup:
    Value: !Ref SecurityGroup
Resources:
  SecurityGroup:
    Properties:
      GroupDescription: Security group used for the test
      SecurityGroupEgress:
        - CidrIp: 0.0.0.0/0
          IpProtocol: '-1'
      SecurityGroupIngress:
        - CidrIp: 10.2.2.0/24
          FromPort: '443'
          IpProtocol: tcp
          ToPort: '443'
      VpcId: >
    Type: 'AWS::EC2::SecurityGroup'

  LoadBalancer:
    Type: AWS::ElasticLoadBalancingV2::LoadBalancer
    Properties:
      Scheme: internal
      Subnets: >
      SecurityGroups:
        - !Ref SecurityGroup

You create your CloudFormation stack and everything works as expected. Both resources are created and the security group is associated with the load balancer to make sure that only HTTPS traffic from that specific IP range is accepted. After a short test, you start routing all traffic to your new shiny load balancer.

The problem

A few weeks later you come back to this simple template to add a new resource and you try to remember what that specific IP range meant. Was it the subnet of one of your clients? If so, which one? Who do I need to notify in case the port or range needs to change? After some searching through your archived emails you finally find where that range came from. Since you are a good engineer and you don’t want anyone else to waste 20 minutes of their life trying to understand why that rule is the way it is, you decide “I should probably add a description to my rule now that AWS supports it”.

So you go ahead and add that:

SecurityGroup:
  Properties:
    GroupDescription: Security group used for the test
    SecurityGroupEgress:
      - CidrIp: 0.0.0.0/0
        IpProtocol: '-1'
    SecurityGroupIngress:
      - CidrIp: 10.2.2.0/24
        FromPort: '443'
        IpProtocol: tcp
        ToPort: '443'
        Description: "This IP range belongs to the subnet of Team X"
    VpcId: >
  Type: 'AWS::EC2::SecurityGroup'

Quite happy with yourself, you run your usual aws cloudformation update-stack and switch to something else. Immediately after firing the stack update you start getting alarms (because of course you monitor your infrastructure) telling you that no traffic is getting through your load balancer. After a few minutes the alarms go away and you start getting traffic again, but it was enough for your clients to notice that all their requests were timing out. What the hell happened!?

You know the problem is related to your change in the security group but how can adding a description possibly cause that? You quickly go to your CloudFormation console and look at the events of the stack, nothing wrong there. You go to the CloudFormation documentation to see if you missed something, but nowhere does it say anything special about adding a description to one of the rules. In fact, it explicitly says that changes to the SecurityGroupIngress require “No interruption”.

You start to think that maybe it had nothing to do with your update and that it was an extremely unlucky coincidence. But in a last attempt to find anything weird you log in to your AWS Console and find your security group there. It looks correct but just for fun you click the “Edit” button. Of course you would never update stuff like that manually (infra as code, remember?) but you are desperate at this point. And in there (only in there) you see this:

NOTE: Any edits made on existing rules will result in the edited rule being deleted and a new rule created with the new details. This will cause traffic that depends on that rule to be dropped for a very brief period of time until the new rule can be created.

What? How? Why? You would understand if that was the case when changing the port or the IP range, but adding a description to it? Really? To rub salt in the wound, you then find out that AWS has an operation in their API to do exactly this, aptly called UpdateSecurityGroupRuleDescriptionsIngress. So this feels like pure laziness. Instead of checking what actually changed in the definition of the rule and calling the proper API when only the description changed, CloudFormation decides it’s easier to fully recreate the rule if anything changes (with the corresponding traffic drop).
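To make the complaint concrete, here is a minimal Python sketch of the kind of check I would have expected. It is purely hypothetical: plan_rule_update and IDENTITY_KEYS are names I made up, and this is not how CloudFormation is actually implemented. Rules are plain dicts using the same keys as the template above.

```python
# Hypothetical sketch, NOT CloudFormation's real logic.
# The keys that decide which traffic a rule matches.
IDENTITY_KEYS = ("CidrIp", "IpProtocol", "FromPort", "ToPort")


def plan_rule_update(old_rule, new_rule):
    """Return the least disruptive action that turns old_rule into new_rule."""
    same_identity = all(old_rule.get(k) == new_rule.get(k) for k in IDENTITY_KEYS)
    if not same_identity:
        # Protocol, port or range changed: this really is a different rule.
        return "recreate"
    if old_rule.get("Description") != new_rule.get("Description"):
        # Only the description differs, so an in-place update would be enough
        # (UpdateSecurityGroupRuleDescriptionsIngress in the EC2 API).
        return "update-description"
    return "no-op"


old = {"CidrIp": "10.2.2.0/24", "IpProtocol": "tcp", "FromPort": "443", "ToPort": "443"}
new = dict(old, Description="This IP range belongs to the subnet of Team X")

print(plan_rule_update(old, new))                        # update-description
print(plan_rule_update(old, dict(old, ToPort="8443")))   # recreate
```

Only the "recreate" branch needs to drop traffic; the description-only case never has to touch the rule itself.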

You decide to test this for yourself by creating a new test template with only the security group definition, initially without description:

AWSTemplateFormatVersion: '2010-09-09'
Description: Security group description update test
Outputs:
  TestSecurityGroup:
    Value: !Ref TestSecurityGroup
Resources:
  TestSecurityGroup:
    Properties:
      GroupDescription: Security group used for the test
      SecurityGroupEgress:
        - CidrIp: 0.0.0.0/0
          IpProtocol: '-1'
      SecurityGroupIngress:
        - CidrIp: 10.2.2.0/24
          FromPort: '443'
          IpProtocol: tcp
          ToPort: '443'
      VpcId: >
    Type: 'AWS::EC2::SecurityGroup'

After the stack is created you check your security group from the AWS cli:

$> aws ec2 describe-security-groups --group-ids  | jq '.SecurityGroups[0].IpPermissions'

[
  {
    "PrefixListIds": [],
    "FromPort": 443,
    "IpRanges": [
      {
        "CidrIp": "10.2.2.0/24"
      }
    ],
    "ToPort": 443,
    "IpProtocol": "tcp",
    "UserIdGroupPairs": [],
    "Ipv6Ranges": []
  }
]

Now, as you did before, you add a nice description to your ingress rule and do an update-stack. And here’s where it gets really interesting. As soon as you trigger the stack update you run your describe-security-groups command again and you see this as the output:

$> aws ec2 describe-security-groups --group-ids  | jq '.SecurityGroups[0].IpPermissions'

[]

That’s right, no ingress rules for your security group. Which of course means no traffic can get through. You try again after your stack has finished updating and you see your rule there again, this time with the description.

Like I said before, given that the functionality to update only a rule description is present on their API, this feels like a pretty serious bug in CloudFormation to me. But even if it wasn’t, I would definitely expect to see some sort of warning on CloudFormation docs (not in some obscure part of the AWS UI).

The solution

So, given this limitation in CloudFormation, how do you work around it? How do you add descriptions to your existing rules?

The most straightforward approach is simply to have a planned maintenance window for your service (or services) and just do the update. As far as I could see in my tests it usually takes less than a minute for the new rule to be created and put in place.

If that’s not acceptable for your use case then it gets a bit more tricky. Here are some of the things I tried that didn’t work.

Using the AWS cli, you can update the description using the API I mentioned previously (https://docs.aws.amazon.com/cli/latest/reference/ec2/update-security-group-rule-descriptions-ingress.html). That works and doesn’t incur any downtime, but now your CloudFormation template doesn’t really represent the current state of your infrastructure. You might think that adding the same description you used in the cli to your template would work, but it doesn’t. Since the latest version of the template that CloudFormation knows about doesn’t contain any description, the next update will recreate the rule just as it did the first time.

A second potential approach was to just add a second ingress rule to the template with the same IP range and port that includes our awesome description. By doing this I was expecting CloudFormation to leave the old rule untouched while creating a new one and, on a second update after that, to remove the first rule (the one without description). Unfortunately this doesn’t work because CloudFormation seems to use the (IP, port) pair as a way to identify each rule. That means 2 things for our example: the first update will not create a new rule and, even worse, the second update to remove the old rule will actually leave your security group without any ingress rules.

The only approach that has worked for me so far is a slight variation of the previous one:

- Add a new rule that is more permissive than the original one so that it allows traffic from the same IP range (0.0.0.0/0 for instance if you are not worried about opening access to everyone for a few minutes) and update your stack.
- At this point you should have 2 ingress rules in your security group. Now you can add your description to the original rule and do an update. This will recreate the rule but that should be fine because we have our second rule still there that should allow traffic from the same source.
- Finally, delete the second rule. Now you should be back to having only your original IP range with the description included.
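The three updates can be sketched in a few lines of Python. This is only a model of the ingress list in the template at each step, not something that talks to AWS, and the function names are made up for the example. It assumes the temporary rule is a superset of the original range, so any rule left untouched by an update keeps traffic flowing.

```python
def workaround_updates(rule, description, temporary_cidr="0.0.0.0/0"):
    """Return the SecurityGroupIngress list for each of the three stack updates."""
    temporary = dict(rule, CidrIp=temporary_cidr)
    described = dict(rule, Description=description)
    return [
        [rule, temporary],       # update 1: add the wider, temporary rule
        [described, temporary],  # update 2: original rule is recreated with a description
        [described],             # update 3: drop the temporary rule
    ]


def keeps_traffic_flowing(states):
    """True if every update leaves at least one existing rule untouched."""
    return all(
        any(rule in after for rule in before)
        for before, after in zip(states, states[1:])
    )


rule = {"CidrIp": "10.2.2.0/24", "IpProtocol": "tcp", "FromPort": "443", "ToPort": "443"}
description = "This IP range belongs to the subnet of Team X"

safe = [[rule]] + workaround_updates(rule, description)
naive = [[rule], [dict(rule, Description=description)]]

print(keeps_traffic_flowing(safe))   # True
print(keeps_traffic_flowing(naive))  # False
```

The naive single update fails the check because the only rule in the group is the one being recreated, which is exactly the window where the describe-security-groups output came back empty.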

Having to do 3 stack updates to add a simple description to an ingress rule is less than ideal but I haven’t found a better way to do it without incurring some downtime for your clients.

What about Terraform?

If you are using Terraform then things should work as expected. Terraform will do the update without incurring any packet loss. From the output of terraform plan you can actually see that it is creating a new rule with your description and deleting the old one, but it does so in a way that you always have at least 1 of them present. Kudos to HashiCorp for that!

Why you should follow the robustness principle in your APIs

Sat, 25 Mar 2017 00:00:00 +0000

Microservices are all the rage right now. Everyone is taking their big monoliths and decomposing them into smaller services with exposed APIs. If you are doing this right then your services should be completely decoupled and independently releasable. Yet the way some APIs are designed makes this extremely hard to accomplish, if not impossible. Let’s take a look at the problem and how to solve it.

Postel’s Law

Postel’s Law, also known as the robustness principle, states that you should be conservative in what you send and liberal in what you accept from others. Although this was proposed initially for the specification of the TCP protocol, it has a very important place in the design and evolution of APIs and we’ll see why with an example.

Imagine we have 2 small, independent services owned by different teams: the user service and the address service. The address service exposes a small API to store users’ addresses, which initially only consists of the street address. The user service can POST a request with a JSON body like {"street_address": "1234 Fake St."} and the address service responds with a 201 - Created.

So far so good. However, we quickly realize that street address alone is not enough information so we want to start sending in city and postal code as well. Unfortunately, the address service does not follow the robustness principle we mentioned before: if it receives any field on the request that it doesn’t recognize (anything other than street_address at the moment) it immediately fails, returning a 400 - Bad Request.

The user service team is done with their part of the work needed to start sending the new fields but they cannot release it until the address service is updated to take in those fields. We effectively introduced a major coupling point between the 2 services that adds the need to sync their releases. This is even worse if we have multiple environments with potentially different versions of our services deployed and/or multiple clients of the address service.

Of course, one approach to mitigate this would be for the user service to have this new functionality behind a feature flag. Then they can do their work and release to any environment with the flag disabled. Once the address service is updated the new feature can be enabled. Not a big problem, but each feature flag does introduce some extra complexity to the code. What happens the next time we want to add another field? Will we have one flag for each field that we want to send to the address service?

Now imagine that both services are running with the latest changes, the feature flag is enabled and we are storing the data we wanted. It’s Saturday 11 PM and consumers start calling in complaining that their address data is completely messed up. They are getting data from other people, which is not only a terrible user experience but a potential privacy disaster. It turns out that the latest release of the address service with the addition of the 2 new fields also included a nasty bug that caused the data to be assigned to the wrong users. The address team decides to roll back the release immediately to the latest known functioning version (the one that had only street_address on the API).

There are 2 possible scenarios at this point. Scenario 1 is that they roll back their service without realizing that the user service is still sending the new fields. This causes all new requests after the rollback to fail with a 400, effectively losing all new data. Scenario 2 involves someone contacting the user team for them to disable the feature flag (if it’s still there) before doing the rollback. The result is equally bad in either case. You introduce unnecessary dependencies between teams and unexpected bugs.

This whole thing could have been avoided if the address service had adhered to the robustness principle and simply ignored all unknown fields. The user service could then have started sending the new fields in all environments without worrying about when the address service was ready. Similarly, in the production incident scenario, the address service could have safely rolled back without the user service team even having to know about it. No coupling, no synchronization between teams, no hassle.

Note that you can still enforce the presence of mandatory fields. The only thing we’ve done is to make sure we just ignore the fields we don’t care about.

This is very easy to do if you are using Jackson for instance, either globally by configuring your object mapper like:

new ObjectMapper()
  .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)

or on a class by class basis, with a simple annotation:

@JsonIgnoreProperties(ignoreUnknown = true)
public class Address {
}

Other languages and serialization/deserialization frameworks should provide similar options. If you are using JSON schema to validate your incoming requests you can set additionalProperties to true.
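For a feel of what the same idea looks like without any framework, here is a minimal Python sketch using only the standard json module. The names parse_address, REQUIRED_FIELDS and KNOWN_FIELDS are made up for this example, and the city value is just sample data. It models the old version of the address service (the one that only knows street_address) receiving a request with a newer field.

```python
import json

# What this (old) version of the address service understands.
KNOWN_FIELDS = {"street_address"}
# Mandatory fields are still enforced.
REQUIRED_FIELDS = {"street_address"}


def parse_address(body):
    """Parse a request body, dropping unknown fields but enforcing required ones."""
    payload = json.loads(body)
    missing = REQUIRED_FIELDS - set(payload)
    if missing:
        # This is the case that deserves a 400 - Bad Request.
        raise ValueError(f"missing required fields: {sorted(missing)}")
    # Unknown fields are silently ignored instead of failing the request.
    return {key: value for key, value in payload.items() if key in KNOWN_FIELDS}


request = '{"street_address": "1234 Fake St.", "city": "Springfield"}'
print(parse_address(request))  # {'street_address': '1234 Fake St.'}
```

The request with the extra city field goes through untouched, while a request without street_address would still be rejected.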

API versioning

Of course this doesn’t mean that your service should accept completely invalid input and try to make sense of it. As they say, keep an open mind but not so open that your brain falls out.

Adding new optional fields to an API, like we saw in the previous example, is a classic example of a change that is perfectly backwards compatible. Your service should be able to handle both the presence and the absence of those fields. Other times, however, there’s a fundamental change in the structure of your API. In these cases it’s a lot harder, if not impossible, to keep backwards compatibility.

This is one of the main reasons why it’s a good idea to have versioning of your APIs from day one. You never know how your service will need to evolve in the future so it’s better to start with the assumption that you will need versioning at some point. You have several alternatives to handle API versioning (URL vs Content-Type header for instance). Which one you choose will depend on your particular needs and constraints.
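As a rough illustration of those two alternatives, here is a small Python sketch that extracts the version either from the URL path or from a vendor media type. The /v2/ path layout and the application/vnd.example.address.v2+json media type are assumptions made up for this example, not a standard you have to follow.

```python
import re


def version_from_url(path):
    """Read the version from a path such as /v2/addresses."""
    match = re.match(r"^/v(\d+)/", path)
    return int(match.group(1)) if match else None


def version_from_media_type(content_type):
    """Read the version from a media type such as application/vnd.example.address.v2+json."""
    match = re.search(r"\.v(\d+)\+json", content_type)
    return int(match.group(1)) if match else None


print(version_from_url("/v2/addresses"))                                   # 2
print(version_from_media_type("application/vnd.example.address.v2+json"))  # 2
print(version_from_url("/addresses"))                                      # None
```

Whichever option you pick, a request with no version should map to a well-defined default rather than an error, so that existing clients keep working.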

Also important for consumers of APIs

So far we’ve been assuming that this applies only to providers of an API in what they accept from their clients. But it is equally important for consumers as well.

When you consume an external API, most of the time you are not interested in every single field of the response. By making sure that you ignore any field you don’t have an interest in, you are not only writing less code but you are also making your service more resilient to changes in that API.

We can go back to the user and address service example from before, but this time looking at things from the point of view of the user service. In the same way that the address service provides an endpoint to POST data to, it also provides an endpoint where clients can GET address details. In the response the address service includes the street address, city, latitude and longitude.

The user service is only interested in the street address and city; it has no use for latitude and longitude. But for some reason it still maps every field of the response. With time the address service adds more fields to the response because other consumers need those. Every new addition involves extra work for the user service team, even when they still only care about the original street address and city.

To make things worse, the address service doesn’t really sync with its clients before doing backwards compatible changes (and why should they?). So the user service only realizes that they have to adapt to a new response after things start failing on their side.
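A consumer that only maps what it needs avoids this entirely. Here is a minimal sketch in Python; UserAddress and from_response are hypothetical names, and the response body (including the country field standing in for a later addition) is sample data made up for the example.

```python
import json
from dataclasses import dataclass


@dataclass
class UserAddress:
    """The only two fields the user service actually cares about."""
    street_address: str
    city: str

    @classmethod
    def from_response(cls, body):
        data = json.loads(body)
        # Pick the fields we need and ignore everything else in the response.
        return cls(street_address=data["street_address"], city=data["city"])


response = json.dumps({
    "street_address": "1234 Fake St.",
    "city": "Springfield",
    "latitude": 0.0,
    "longitude": 0.0,
    "country": "added later by the address service",
})

print(UserAddress.from_response(response))
```

New fields in the response have no effect on this code, so the address service can keep evolving without the user service team noticing.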

Is it ever worth it?

So why would you want to be really conservative in what you accept and fail all requests that have any additional field? I haven’t found a real-life use case where the benefits from doing so outweigh the problems we discussed before. I suspect most services that actually do this are doing it because they just haven’t thought about it and they use the default behaviour of their language/tool. In the case of Jackson this default behaviour is to fail the deserialization when extra fields are present.

Some people argue that making the request fail early can make it explicit to the consumers of their API that they are doing something wrong. I don’t really find that argument compelling enough. This sort of issue should be detected through testing, either consumer-driven contracts or integration/E2E tests.

Conclusion

Postel’s law, or the robustness principle, is essential in the evolution of APIs. No one can accurately anticipate how your requirements are going to change or how many consumers, and of what types, you are going to have. By making sure that you are lenient and ignore the fields you don’t really care about for specific requests you will be decoupling your service from your consumers. This is fundamental in a microservice environment where the dependencies between services should be kept to a minimum, especially when considering releases.

Single interface to parse and update JSON/YAML from your terminal

Tue, 24 May 2016 00:00:00 +0000

If you’ve ever had to parse JSON from your terminal you probably know about jq. It’s basically sed for JSON and it works wonderfully well. If you’ve had to parse YAML from your terminal, however, the problem becomes a bit harder. You can either go for some super obscure 15-line sed and awk combination that has the advantage of being pure bash, or go with a higher-level language (Ruby or Python come to mind) to actually do the parsing and output the result to stdout. In this post I’ll show jyparser, a simple tool (packaged as a nice Docker image) that allows you to use a jq-like syntax to parse and also update JSON and YAML files from your terminal using exactly the same commands.

The problem

So imagine you have your app and different JSON files for the different environments your app will be deployed to, with each file containing things like the environment name, the build version currently deployed, etc. Maybe something like:

~ cat my_app.json

{
  "app_name" : "awesome app",
  "build_version" : 1,
  "tags" : ["myTeam", "myCompany"]
}

Now as part of your deployment process you want to read the build_version variable from the JSON file, increase it by 1 and then update the original JSON with the new value.

This would not be super hard to do with plain jq:

~ version=$(cat my_app.json | jq '.build_version')
~ echo $version
1

~ new_version=$((version+1))
~ echo $new_version
2

~ cat my_app.json | jq --arg value $new_version '.build_version |= $value'
{
  "app_name": "awesome app",
  "build_version": "2",
  "tags": [
    "myTeam",
    "myCompany"
  ]
}

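It’s not too hard but it’s not straightforward either, especially the update part. You have to know about jq’s update operator (|=) and how to pass shell variables in using --arg. Note also that --arg always passes its value as a string, which is why build_version comes out as "2" rather than 2 in the output above; --argjson would keep it a number.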
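For comparison, this is roughly what the higher-level language route looks like for the same bump, as a minimal Python sketch. The function name bump_build_version is made up for the example, and this is not what jyparser does internally.

```python
import json


def bump_build_version(document):
    """Increase build_version by 1 and return the updated JSON text."""
    data = json.loads(document)
    data["build_version"] += 1  # stays a number, no string conversion involved
    return json.dumps(data, indent=2)


my_app = '{"app_name": "awesome app", "build_version": 1, "tags": ["myTeam", "myCompany"]}'
print(bump_build_version(my_app))
```

It works, but now your deployment script depends on a separate program just to change one value, which is the kind of thing I wanted to avoid.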

Now imagine you decide to switch to YAML instead of JSON because either you started using a different tool that only accepts YAML or the same tool accepts both and you prefer it over JSON.

~ cat my_app.yml

app_name: awesome app
build_version: 1
tags:
- myTeam
- myCompany

You still want to accomplish the same thing, bump the build_version of your YAML. But your previous deployment bash script with your fancy jq query obviously doesn’t work anymore. Now you need to figure out how you’re going to parse and update that YAML, which like I mentioned in the beginning is not trivial (or at least I didn’t find a nice and easy way to do it).

Wouldn’t it be nice if you could somehow say: cat my_app.{yml, json} | get .build_version to read the value you are interested in and cat my_app.{yml, json} | set .build_version to update it? That is, use exactly the same command regardless of where the input is coming from (JSON or YAML). Enter jyparser.

jyparser

jyparser stands for JSON/YAML Parser (I know, not very original but I always sucked at names) and it was created specifically for the use case I described above. Getting a single value from a JSON or YAML, doing something with it (if needed) and then setting a new value for it on the original input. Of course reading/updating entire objects/arrays in JSON or entire hashes/lists in YAML is also supported.

At its heart jyparser is a simple wrapper around jq and two Python one-liners that convert from JSON to YAML and vice versa. It will detect the input’s type and, in the case of YAML, convert to JSON before applying jq and then convert the result back to YAML. Note that since YAML is actually a superset of JSON this will only work for those YAML files that can be correctly converted to JSON.
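The detect, convert, apply and convert back flow can be sketched like this in Python. To be clear, this is an illustration of the idea and not jyparser’s actual code: detect_format and transform are made-up names, a plain Python function stands in for the jq filter, and the YAML branch assumes PyYAML is installed.

```python
import json


def detect_format(text):
    """Treat anything that parses as JSON as JSON, everything else as YAML."""
    try:
        json.loads(text)
        return "json"
    except ValueError:
        return "yaml"


def transform(text, jq_like):
    """Apply jq_like (a function on parsed data) and answer in the input's format."""
    if detect_format(text) == "json":
        return json.dumps(jq_like(json.loads(text)), indent=2)
    import yaml  # PyYAML, only needed when the input is YAML
    return yaml.safe_dump(jq_like(yaml.safe_load(text)), default_flow_style=False)


print(detect_format('{"build_version": 1}'))  # json
print(detect_format("build_version: 1"))      # yaml
print(transform('{"build_version": 1}', lambda doc: doc["build_version"]))  # 1
```

Because the filter always runs on the parsed data, the caller never has to care which of the two formats came in.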

You can see the code here and the docker image here.

Usage

Let’s look at some examples of how you would usually use the tool.

The image’s entry point accepts 2 operations: get and set. It can take its inputs from stdin or read from a file if this is passed as the first parameter.

Read

The get command takes an arbitrary jq filter. If the result is a simple value (number, string or boolean) then that value is returned. Otherwise, the resulting JSON or YAML is returned (depending on what the input was).

Given the following JSON file:

~ cat test.json

{"menu": {
  "id": "file",
  "value": "File",
  "popup": {
    "menuitem": [
      {"value": "New", "onclick": "CreateNewDoc()"},
      {"value": "Open", "onclick": "OpenDoc()"},
      {"value": "Close", "onclick": "CloseDoc()"}
    ]
  }
}}

If you wanted to get the value of the id property you could use:

~ cat test.json | docker run -i --rm jlordiales/jyparser get .menu.id

"file"

The JSON is passed via stdin, which is useful if you get that from something like curl. If you have an actual file that you want to use as input then you can pass it directly as the first parameter to the script:

~ docker run -i --rm -v `pwd`:/jyparser:ro jlordiales/jyparser test.json get ".menu.id"

"file"

The example above mounts the current dir with the file into /jyparser (which is the default WORKDIR for the docker image) and then uses that file as input.

Exactly the same command works for YAML as well. Given the equivalent YAML file:

~ cat test.yml
menu:
  id: file
  value: File
  popup:
    menuitem:
    - onclick: CreateNewDoc()
      value: New
    - onclick: OpenDoc()
      value: Open
    - onclick: CloseDoc()
      value: Close

We can get the id property with:

~ cat test.yml | docker run -i --rm jlordiales/jyparser get .menu.id

"file"

If the result from running the jq filter is not a simple value, then the corresponding JSON or YAML is returned:

~ cat test.json | docker run -i --rm jlordiales/jyparser get ".menu.popup.menuitem[1]"

{
  "value": "Open",
  "onclick": "OpenDoc()"
}

~ cat test.yml | docker run -i --rm jlordiales/jyparser get ".menu.popup.menuitem[1]"

onclick: OpenDoc()
value: Open

The jq filter that is passed as parameter is sent as is to the tool, so you are not limited to simple filters. Anything that is valid for jq is valid for jyparser as well.

Update

Similarly to the get operation, there’s a set one. This operation takes 2 parameters: a jq filter to select a specific element of the input and a new value to update that element to. The result is the original input with the value updated.

~ cat test.json | docker run -i --rm jlordiales/jyparser set ".menu.id" \"new_id\"
{
  "menu": {
    "id": "new_id",
    "value": "File",
    "popup": {
      "menuitem": [
        {
          "value": "New",
          "onclick": "CreateNewDoc()"
        },
        {
          "value": "Open",
          "onclick": "OpenDoc()"
        },
        {
          "value": "Close",
          "onclick": "CloseDoc()"
        }
      ]
    }
  }
}

Important: given the way bash scripts handle quotes on parameters passed to them, if the new value you want to set for the property is a string you need to explicitly escape the quotes as in the example. Otherwise, jq will complain that the value is not valid (rightfully so). This is not needed for numbers or booleans. So the following works as expected:

~ cat test.json | docker run -i --rm jlordiales/jyparser set ".menu.id" 15
{
  "menu": {
    "id": 15,
    "value": "File",
    "popup": {
      "menuitem": [
        {
          "value": "New",
          "onclick": "CreateNewDoc()"
        },
        {
          "value": "Open",
          "onclick": "OpenDoc()"
        },
        {
          "value": "Close",
          "onclick": "CloseDoc()"
        }
      ]
    }
  }
}
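The reason behind the quoting rule is that the new value has to be valid JSON by the time it reaches jq, and the shell removes unescaped quotes before the script ever sees them. A tiny Python sketch makes the difference visible; is_valid_json_value is a made-up helper, and the strings below are what the script would receive after the shell has processed each argument.

```python
import json


def is_valid_json_value(raw):
    """True if raw, exactly as received by the script, parses as JSON."""
    try:
        json.loads(raw)
        return True
    except ValueError:
        return False


print(is_valid_json_value('15'))        # True:  numbers need no quotes
print(is_valid_json_value('true'))      # True:  neither do booleans
print(is_valid_json_value('"new_id"'))  # True:  what \"new_id\" turns into
print(is_valid_json_value('new_id'))    # False: what "new_id" turns into
```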

This way of updating the JSON is arguably a lot easier to read and use than the jq version we saw at the beginning. It’s just set and, best of all, the same works for YAML:

~ cat test.yml | docker run -i --rm jlordiales/jyparser set ".menu.id" \"new_id\"

menu:
  id: new_id
  popup:
    menuitem:
    - onclick: CreateNewDoc()
      value: New
    - onclick: OpenDoc()
      value: Open
    - onclick: CloseDoc()
      value: Close
  value: File

As with the get operation, set can take the input both from stdin and a file if passed as first argument.

Conclusion

If you are doing regular parsing/updating of JSON and/or YAML and you don’t want to have hugely complex combinations of jq with sed and awk but instead have a simple interface to work with both types then give jyparser a try. It was created for a very specific use case but it might be able to adapt to yours as well.

jyparser was heavily inspired by y2j, so make sure to check it out as well.

Orchestrating your containers with CoreOS, an introduction

Sun, 12 Jul 2015 00:00:00 +0000

Most docker tutorials that you’ll find out there (the ones in this blog included) will assume that you have a single host running all your containers or a few hosts but where you are manually managing them. While this is nice and simple to explain the basic concepts, it is probably not the way you want to run your applications in production. In most cases you will have a cluster of servers all running different containers that need to talk to each other and know how to function properly, even when some of those servers suddenly go offline.

This is the area of orchestration and scheduling of containers, a topic that is extremely hot these days, particularly with big players in the industry working on new projects to abstract away most of the complexities inherent to running distributed containers. Amazon recently opened up their EC2 Container Service, Google has Kubernetes and Mesosphere is becoming pretty popular with the underlying Apache Mesos project.

One additional project that has been gaining a lot of attention in this area is CoreOS. In this post I’m going to try to explore CoreOS and give a basic overview of the problem that it tries to solve, how it works and how to work effectively with it.

Introduction

The idea behind CoreOS is the same as with any other cluster management system. You stop thinking about your individual servers and how they work together. Instead you think about your data center (a cluster of individual servers). In other words, you no longer say “run this container in server 1 and this other container in server 2” but “run these 2 containers in my data center” and let the cluster manager take care of where and how to do that.

This also means that if one or more of your individual servers die the cluster manager will take the containers that were running in those servers and distribute them across the remaining healthy nodes.
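To make the idea a bit more tangible, here is a toy Python sketch of what a cluster manager does conceptually. It is only an illustration under very simplified assumptions: it is not how Fleet (or any real scheduler) decides placement, and the function names and container names are made up.

```python
def schedule(containers, nodes):
    """Spread containers over nodes round-robin. Returns {node: [containers]}."""
    placement = {node: [] for node in nodes}
    for index, container in enumerate(containers):
        placement[nodes[index % len(nodes)]].append(container)
    return placement


def reschedule_after_failure(placement, dead_node):
    """Move the dead node's containers to the least loaded surviving nodes."""
    survivors = [node for node in placement if node != dead_node]
    new_placement = {node: list(placement[node]) for node in survivors}
    for container in placement[dead_node]:
        target = min(survivors, key=lambda node: len(new_placement[node]))
        new_placement[target].append(container)
    return new_placement


placement = schedule(["web", "db", "cache"], ["core-01", "core-02", "core-03"])
print(placement)
# {'core-01': ['web'], 'core-02': ['db'], 'core-03': ['cache']}

print(reschedule_after_failure(placement, "core-02"))
# {'core-01': ['web', 'db'], 'core-03': ['cache']}
```

You only say which containers should run; where they end up, and where they move when a server dies, is the cluster manager’s problem.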

Following this philosophy, CoreOS is an open source lightweight operating system that comes together with a set of simple tools.
The main building block behind CoreOS is Docker. Since CoreOS doesn’t come with a package manager, everything you want to run on it has to run as a container. It should be noted that while CoreOS fully supports Docker, they are also working on their own container runtime called rkt.

To start and manage all these containers CoreOS uses Fleet. Fleet is based on systemd and extends it in order to work at the cluster level. In other words, while systemd works as a single machine init system, fleet works as a cluster init system.

To coordinate all different nodes and let Fleet know where to run your containers, CoreOS provides etcd, a distributed key/value store with a strong consistency and partition tolerance model. Etcd uses the Raft consensus algorithm to handle the communication between the different nodes. This is the same algorithm used by Consul by the way.

I will go into some detail about each of these tools and how they can work together with Docker. But first, let’s set up our CoreOS cluster running on Vagrant.

Bootstrapping a CoreOS cluster

The CoreOS documentation has very comprehensive guides to run CoreOS on anything from bare metal hardware to cloud providers to virtualization platforms. You can follow the step by step guide here to run a basic cluster locally on Vagrant.

The short version is that you can clone this repo and then do some minimal configuration. The relevant files for this part are the config.rb and user-data ones.

The Vagrantfile is pretty generic and reads all the configuration it needs from config.rb, so no need to change anything there. This latter file looks something like the following:

# Size of the CoreOS cluster created by Vagrant
$num_instances=6

# Official CoreOS channel from which updates should be downloaded
$update_channel='stable'

# Customize VMs
$vm_memory = 2048
$vm_cpus = 2

# Enable port forwarding from guest(s) to host machine, syntax is: { 80 => 8080 }, auto correction is enabled by default.
# 4001 is the default etcd port, we need this if we want to run fleetctl locally on the host
$forwarded_ports = {4001 => 4001}

The file is pretty self-explanatory. You see that we define the size of our cluster (6 instances) and we give it some extra memory and cpus to run on. Lastly we forward port 4001 which is the default port used by etcd. We’ll see why we want to do this in a bit.

Then we have the user-data file:

#cloud-config
coreos:
  etcd2:
    #generate a new token for each unique cluster from https://discovery.etcd.io/new
    discovery: https://discovery.etcd.io/>
    # multi-region and multi-cloud deployments need to use $public_ipv4
    advertise-client-urls: http://$public_ipv4:2379
    initial-advertise-peer-urls: http://$private_ipv4:2380
    # listen on both the official ports and the legacy ports
    # legacy ports can be omitted if your application doesn't depend on them
    listen-client-urls: http://0.0.0.0:2379,http://0.0.0.0:4001
    listen-peer-urls: http://$private_ipv4:2380,http://$private_ipv4:7001
  fleet:
    public-ip: $public_ipv4
  flannel:
    interface: $public_ipv4
  units:
    - name: etcd2.service
      command: start
    - name: fleet.service
      command: start
    - name: docker-tcp.socket
      command: start
      enable: true
      content: |
        [Unit]
        Description=Docker Socket for the API

        [Socket]
        ListenStream=2375
        Service=docker.service
        BindIPv6Only=both

        [Install]
        WantedBy=sockets.target

This is the only file you need to modify before starting the cluster. Go to https://discovery.etcd.io/new in your browser and copy the URL that you get as a response there. Now go to user-data and paste that URL where it says discovery: https://discovery.etcd.io/.

You can now do a vagrant up and wait while your cluster gets created. When it’s done you should be able to run vagrant status and see the 6 nodes running.

Trying out etcd

Now that you have a CoreOS cluster up and running, we can start to play around with the different tools that are shipped with it. Let’s start with etcd, the distributed key/value store.

You’ll need 2 open terminals for this (or tabs, or splits or whatever you use). We’ll ssh into core-01 in one of them (with vagrant ssh core-01) and core-02 in the other (vagrant ssh core-02). Which nodes you ssh into is irrelevant, as long as they are different.

CoreOS comes with a tool to read and write from etcd, called etcdctl. But etcd also exposes an HTTP API that is really intuitive and easy to use. In fact, etcdctl is just a facade in front of this API. We’ll see how to use both here.

Let’s start by writing a value. From core-01 run etcdctl set /key1 value1. This command adds a new key/value pair to etcd where the key is key1 and the value is value1. Now from the second node, you can read the value with etcdctl get /key1. You should see value1 as a response.

Note how etcd replicated the value that you wrote on the first node to the second one almost instantaneously. In fact, it replicated the value to all nodes in the cluster not just the two you are ssh’ed into. This is the power of a distributed store.

If you wanted to use the HTTP API you could have accomplished the same thing using curl instead of etcdctl. We can write a second key/value pair in this way. From core-01 you can do curl -L -X PUT http://127.0.0.1:4001/v2/keys/key2 -d value="value2". Now to read the value from core-02 you can do a curl -L http://127.0.0.1:4001/v2/keys/key2. Admittedly, the etcdctl tool simplifies things a little bit but both options are there to choose from.
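If you would rather script this than type curl commands, the same two requests can be built with Python’s standard library. This is only a sketch: the helper names are mine, and it assumes etcd is listening on 127.0.0.1:4001 as in the commands above. The optional ttl form field is the v2 API counterpart of the --ttl flag that etcdctl accepts.

```python
import urllib.parse
import urllib.request
from typing import Optional

ETCD_URL = "http://127.0.0.1:4001"  # assumption: the client port used in this post


def build_set_request(key: str, value: str, ttl: Optional[int] = None) -> urllib.request.Request:
    """Build (but do not send) the PUT request that writes a key."""
    form = {"value": value}
    if ttl is not None:
        form["ttl"] = str(ttl)
    body = urllib.parse.urlencode(form).encode("ascii")
    url = f"{ETCD_URL}/v2/keys/{key.lstrip('/')}"
    return urllib.request.Request(url, data=body, method="PUT")


def build_get_request(key: str) -> urllib.request.Request:
    """Build the GET request that reads a key."""
    return urllib.request.Request(f"{ETCD_URL}/v2/keys/{key.lstrip('/')}", method="GET")


if __name__ == "__main__":
    request = build_set_request("/key2", "value2")
    print(request.get_method(), request.full_url, request.data)
    # To actually send it from one of the cluster nodes:
    # with urllib.request.urlopen(request) as response:
    #     print(response.read().decode())
```

Keeping the request construction separate from the network call makes it easy to inspect what would be sent before pointing it at a real cluster.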

Another really interesting thing that etcd provides is a TTL (time to live) value for each entry. This is quite useful when we use etcd for things like service discovery, where we don’t want to be reading stale values. To use it you simply pass the --ttl parameter when you set a value. To see this in action go back to core-01 and run etcdctl set /key3 value3 --ttl 15. This will add the new key with a TTL of 15 seconds. If you go to core-02 now and run etcdctl get /key3 you should see its value (provided it took you less than 15 seconds to do that). Now wait for a while and run the same get again. The key is gone!
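To illustrate the expiry behaviour, here is a toy in-memory store with per-key TTLs. It is not how etcd is implemented (there is no replication and entries are only dropped lazily on read); the class name and the injectable clock are my own choices so that the example can be run without waiting 15 real seconds.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLStore:
    """Toy key/value store with per-key expiry (a model, not etcd itself)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # key -> (value, absolute expiry time, or None for "never expires")
        self._data: Dict[str, Tuple[str, Optional[float]]] = {}

    def set(self, key: str, value: str, ttl: Optional[float] = None) -> None:
        expires_at = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expires_at)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazily drop the expired entry
            return None
        return value


if __name__ == "__main__":
    now = [0.0]  # fake clock that we advance by hand
    store = TTLStore(clock=lambda: now[0])
    store.set("/key3", "value3", ttl=15)
    print(store.get("/key3"))
    now[0] = 16.0
    print(store.get("/key3"))
```

In a service discovery setup, a service would typically keep re-setting its own key before the TTL runs out, so the entry disappears on its own shortly after the service dies.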

Finally, if you want to list all the keys currently stored within etcd you can use the etcdctl ls command. This will print the keys available at the root level. Alternatively, if you want to print keys at any level you can pass the --recursive flag (as in etcdctl ls --recursive).

Etcd provides some other cool features (like atomic test and set updates, directories and event notifications) that are well documented if you run etcdctl help.

Starting your first Fleet unit

As I mentioned in the introduction, CoreOS comes with a cluster manager called Fleet. You’ll use the fleetctl tool to interact with the cluster. To see this in action ssh into one of the nodes and do a fleetctl list-machines like:

$ vagrant ssh core-01

core-01$ fleetctl list-machines

MACHINE         IP              METADATA
0b9fd6f8...     172.17.8.104    -
128a2e32...     172.17.8.103    -
2addf739...     172.17.8.101    -
3f608471...     172.17.8.106    -
73c0b7fc...     172.17.8.105    -
eabc97ed...     172.17.8.102    -

We can see that our 6 CoreOS nodes are automatically recognized by Fleet as being part of the same cluster.

Like I said before, we now only care about our cluster and not our individual nodes. These nodes are completely ephemeral and we should assume they can come and go without prior notice. For this reason, it doesn’t matter from which node we run the previous fleetctl command. We could’ve ssh’ed into “core-06” and the result would’ve been exactly the same.

Using fleetctl from your host

Being able to run fleetctl from within any node is great but even better is to be able to run it from outside the cluster as well. For now our cluster is running locally on Vagrant but the same setup could be running in AWS and we would probably like to control it from our laptop without having to ssh into individual instances first.

Luckily, we can do this easily using the --tunnel flag. From your laptop run:

$ fleetctl --tunnel 127.0.0.1:2222 list-machines

MACHINE         IP              METADATA
0b9fd6f8...     172.17.8.104    -
128a2e32...     172.17.8.103    -
2addf739...     172.17.8.101    -
3f608471...     172.17.8.106    -
73c0b7fc...     172.17.8.105    -
eabc97ed...     172.17.8.102    -

This basically tunnels all communication with your cluster over SSH using the IP and port specified. Port 2222 is the default port that Vagrant uses to SSH into your VM (you can see this by running vagrant ssh-config).
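If you don’t want to hard-code the port, the tunnel address can be derived from the output of vagrant ssh-config. The sketch below parses that output in Python. The sample text is illustrative only (your ports will likely differ per node), and the function names are hypothetical helpers of mine.

```python
from typing import Dict

# Illustrative output only; run `vagrant ssh-config core-01` to see your own.
SAMPLE_SSH_CONFIG = """\
Host core-01
  HostName 127.0.0.1
  User core
  Port 2222
"""


def parse_ssh_config(text: str) -> Dict[str, str]:
    """Parse the 'Key value' lines of a single-host ssh config into a dict."""
    options: Dict[str, str] = {}
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) == 2:
            options[parts[0]] = parts[1]
    return options


def tunnel_address(text: str) -> str:
    """Return the 'host:port' string that fleetctl --tunnel expects."""
    options = parse_ssh_config(text)
    return f"{options['HostName']}:{options['Port']}"


if __name__ == "__main__":
    print(tunnel_address(SAMPLE_SSH_CONFIG))
```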

If you get a message saying something like Failed initializing SSH client: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain, make sure that your Vagrant insecure ssh key is added to your ssh-agent by running ssh-add ~/.vagrant.d/insecure_private_key

To make fleetctl commands a bit less verbose we can actually put the tunnel configuration into an environment variable:

$ export FLEETCTL_TUNNEL=127.0.0.1:2222

Then we’ll be able to run Fleet just as we would if we were inside one of our nodes:

$ fleetctl list-machines

MACHINE         IP              METADATA
0b9fd6f8...     172.17.8.104    -
128a2e32...     172.17.8.103    -
2addf739...     172.17.8.101    -
3f608471...     172.17.8.106    -
73c0b7fc...     172.17.8.105    -
eabc97ed...     172.17.8.102    -

Running Fleet units

Having our nodes up and running with Fleet is great but it is not doing anything useful by itself. We want to start telling our cluster to run some services for us. This is where Fleet Units come into play.

As I mentioned before, Fleet can be seen as systemd working at the cluster level instead of at the individual machine level. As such, in order to run anything with Fleet you need to submit regular systemd unit files combined with some Fleet specific properties.

A unit file defines what process you want to run and gives Fleet some hints to help it determine how and where that process should be executed. To get started, let’s see what the unit file to run our good old python service would look like:

$ cat python-test.service

[Unit]
Description=Python service
Requires=docker.service
After=docker.service

[Service]
TimeoutStartSec=0
Restart=on-failure

ExecStartPre=-/usr/bin/docker kill python-service
ExecStartPre=-/usr/bin/docker rm python-service
ExecStartPre=/usr/bin/docker pull jlordiales/python-micro-service

ExecStart=/usr/bin/docker run --name python-service -P jlordiales/python-micro-service

ExecStop=/usr/bin/docker stop python-service

Let’s go through the unit file and see what each section is doing. The first line simply sets a description for our unit, which is helpful when looking at all the units that are currently running. The following 2 lines, Requires and After, specify ordering dependencies between units (the full documentation can be seen here). Since we are running a docker container we need the docker process to be started first. This dependency also means that if the docker unit is stopped this python-test.service unit will also be stopped.

We then have the [Service] section, which effectively describes how our service should run. We first disable systemd’s start-up timeout (with TimeoutStartSec=0), so that the unit is not considered failed if pulling the image takes a while. Next, we ask systemd to restart our container whenever it exits unexpectedly (exit code different from 0). This is extremely useful if we want to have a self-healing cluster and we’ll see how this works in a moment.

Finally, the Exec* commands tell systemd how to run our container. The ExecStartPre commands are run before our container is started and are basically there to set up the environment to ensure that our main process can run smoothly. In our example, we make sure that no container with the same name is running by doing a docker kill and docker rm. Note that these 2 lines are prefixed with a - before the command to run. This is very important because by default systemd will execute the commands in the order they are specified and will stop as soon as one of them returns a non-zero exit code. By prefixing the command with - systemd will ignore the exit code and continue executing the next one. We need to do that for docker kill and docker rm because those commands will fail if there is no container named python-service.
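The effect of the - prefix can be modelled in a few lines of Python. This is a simplified sketch of the behaviour described above, not systemd’s actual code: run_start_pre and fake_docker are names I made up, and fake_docker just pretends that kill and rm fail because there is no container yet.

```python
from typing import Callable, List, Tuple


def run_start_pre(commands: List[str], execute: Callable[[str], int]) -> Tuple[bool, List[str]]:
    """Run commands in order, mimicking how ExecStartPre failures are treated.

    A leading '-' means "ignore a non-zero exit code". Returns whether the
    whole sequence succeeded and which commands were actually executed.
    """
    executed: List[str] = []
    for entry in commands:
        ignore_failure = entry.startswith("-")
        command = entry[1:] if ignore_failure else entry
        executed.append(command)
        exit_code = execute(command)
        if exit_code != 0 and not ignore_failure:
            return False, executed  # abort: later commands never run
    return True, executed


def fake_docker(command: str) -> int:
    """Hypothetical stand-in for docker: kill and rm fail, everything else works."""
    return 1 if (" kill " in command or " rm " in command) else 0


if __name__ == "__main__":
    start_pre = [
        "-/usr/bin/docker kill python-service",
        "-/usr/bin/docker rm python-service",
        "/usr/bin/docker pull jlordiales/python-micro-service",
    ]
    print(run_start_pre(start_pre, fake_docker))
    # Without the '-' prefix the very first failure aborts the sequence.
    print(run_start_pre([c.lstrip("-") for c in start_pre], fake_docker))
```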

The 2 remaining lines are pretty self-explanatory. ExecStart is the command that will start the main process for our unit. In our case we run our container as we usually would, specifying a name and the -P flag to publish its exposed ports. One important thing to notice here is that we don’t pass the -d flag to docker (to run in detached mode). If we did that the unit would run for a few seconds and then exit, because the container would not be started as a child of the unit’s PID, which basically means that from the unit’s point of view there is nothing to run. The ExecStop command in the last line will do a docker stop whenever we tell systemd to stop our unit.

So now that we have our unit file, how do we run it? Well, first we need to load the unit into our cluster because so far this is only a text file that we have edited in our local environment (outside of any of the CoreOS hosts). We can do this with fleetctl submit python-test.service. To see that the unit was actually submitted we can do a fleetctl list-unit-files, which should give us the list of all units that our cluster knows about. You can even look at the contents of the unit with fleetctl cat python-test.service.

With the unit file submitted we can now do:

$ fleetctl start python-test.service

Unit python-test.service launched on 0b9fd6f8.../172.17.8.104

In this case, Fleet decided that the node 172.17.8.104 was good enough to run our container. If we want to see all the currently running units we can do that with:

$ fleetctl list-units

UNIT                    MACHINE                         ACTIVE  SUB
python-test.service     0b9fd6f8.../172.17.8.104        active  running

We can also check the status of any given unit:

$ fleetctl status python-test.service

● python-test.service - Python service
   Loaded: loaded (/run/fleet/units/python-test.service; linked-runtime; vendor preset: disabled)
   Active: active (running) since Sat 2015-07-04 1022 ; 50s ago
  Process: 1726 ExecStartPre=/usr/bin/docker pull jlordiales/python-micro-service (code=exited, status=0/SUCCESS)
  Process: 1719 ExecStartPre=/usr/bin/docker rm python-service (code=exited, status=1/FAILURE)
  Process: 1655 ExecStartPre=/usr/bin/docker kill python-service (code=exited, status=1/FAILURE)
 Main PID: 1776 (docker)
   Memory: 8.3M
   CGroup: /system.slice/python-test.service
           └─1776 /usr/bin/docker run --name python-service -P jlordiales/python-micro-service

Jul 04 1007 core-04 docker[1726]: 595ded12b855: Pulling fs layer
Jul 04 1009 core-04 docker[1726]: 595ded12b855: Download complete
Jul 04 1009 core-04 docker[1726]: 7e0b582bc16d: Pulling metadata
Jul 04 1010 core-04 docker[1726]: 7e0b582bc16d: Pulling fs layer
Jul 04 1022 core-04 docker[1726]: 7e0b582bc16d: Download complete
Jul 04 1022 core-04 docker[1726]: 7e0b582bc16d: Download complete
Jul 04 1022 core-04 docker[1726]: Status: Downloaded newer image for jlordiales/python-micro-service:latest
Jul 04 1022 core-04 systemd[1]: Started Python service.
Jul 04 1022 core-04 docker[1776]: * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
Jul 04 1022 core-04 docker[1776]: * Restarting with stat

There we can see that our process is active and running. We can also see the exit code of the 3 ExecStartPre instructions we discussed before. Finally, we can see some of the output from each process. To make sure that our container is running and responding where Fleet says it is we can get the Machine Id from the output of fleetctl list-units that we saw before (0b9fd6f8 in our case) and ssh directly into it with:

$ fleetctl list-units

UNIT                    MACHINE                         ACTIVE  SUB
python-test.service     0b9fd6f8.../172.17.8.104        active  running

$ fleetctl ssh 0b9fd6f8

core-03$ docker ps

CONTAINER ID        IMAGE                                    COMMAND             CREATED             STATUS              PORTS                     NAMES
bd6681b7eef3        jlordiales/python-micro-service:latest   "python app.py"     19 minutes ago      Up 19 minutes       0.0.0.0:32768->5000/tcp   python-service

core-03$ curl localhost:32768

Hello World from bd6681b7eef3

Self-healing nodes

When I was describing the unit file for the python unit, I briefly showed a property called Restart=on-failure, which means that systemd will automatically restart the process if it exits with an exit code different from 0. Let’s see if this really works in our example. We’ll ssh again into the node running our container and kill it to see what happens:

$ fleetctl ssh 0b9fd6f8

core-03$ docker ps

CONTAINER ID        IMAGE                                    COMMAND             CREATED             STATUS              PORTS                     NAMES
bd6681b7eef3        jlordiales/python-micro-service:latest   "python app.py"     29 minutes ago      Up 29 minutes       0.0.0.0:32768->5000/tcp   python-service

core-03$ docker kill python-service

core-03$ docker ps

CONTAINER ID        IMAGE                                    COMMAND             CREATED             STATUS              PORTS                     NAMES
1da6c1b5cafc        jlordiales/python-micro-service:latest   "python app.py"     45 seconds ago      Up 44 seconds       0.0.0.0:32769->5000/tcp   python-service

Awesome! We killed the first running container and within seconds systemd started a new one for us. If however we use docker stop instead of docker kill (therefore stopping the container gracefully) systemd won’t try to restart it.
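The restart decision itself can be modelled in a few lines. This is a deliberately simplified sketch: real systemd also takes signals, timeouts and the watchdog into account, while this model only looks at the exit code. The function names are mine, and the exit code 137 in the usage example is what I would expect docker run to return after a docker kill (128 plus the SIGKILL signal number), so treat it as an assumption.

```python
from typing import Iterable


def should_restart(policy: str, exit_code: int) -> bool:
    """Simplified view of the Restart= setting, based on the exit code only."""
    if policy == "no":
        return False
    if policy == "always":
        return True
    if policy == "on-failure":
        return exit_code != 0
    if policy == "on-success":
        return exit_code == 0
    raise ValueError(f"policy not covered by this sketch: {policy!r}")


def count_starts(exit_codes: Iterable[int], policy: str) -> int:
    """How many times the process gets started for a sequence of exit codes."""
    starts = 0
    for code in exit_codes:
        starts += 1
        if not should_restart(policy, code):
            break
    return starts


if __name__ == "__main__":
    # Killed twice (assumed exit code 137), then a clean exit.
    print(count_starts([137, 137, 0], "on-failure"))
    print(count_starts([137, 137, 0], "no"))
```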

That is great if the process is killed for some reason but what happens if the entire node disappears all of a sudden? I won’t show it here but you can easily simulate this by doing a vagrant halt on the VM where your unit was placed. Fleet will detect that the node is dead and re-distribute all the units that were running on that node across the rest of the cluster.

High availability services

One of the main benefits of using Fleet to manage our units is that it becomes really easy to run a highly available service with multiple instances running on different nodes. This, combined with the self-healing property we discussed in the previous section, gives you a lot of power to do pretty cool stuff.

This replication of any given service across your nodes is enabled by something called Template unit files. This basically means that you can write a regular unit file like the one we wrote before and use this as a template to instantiate new units. The only difference is in the name of the unit file, which should now follow the pattern <name>@.<suffix>. For example, our previous python-test.service should be renamed to python-test@.service.

Let’s rename our unit file and see how we can start as many instances of our python container as we want. But first, remove the unit file we loaded before with fleetctl destroy python-test.service. Now we can rename our unit and submit it to our cluster in the same way as we did for the first one:

$ mv python-test.service python-test@.service
$ fleetctl submit python-test@.service

With our template loaded in the cluster we can now start instances of that template using the name and some suffix after the @. For instance:

$ fleetctl start python-test@1 python-test@2 python-test@random

$ fleetctl list-units

UNIT                            MACHINE                         ACTIVE  SUB
python-test@1.service           0b9fd6f8.../172.17.8.104        active  running
python-test@2.service           128a2e32.../172.17.8.103        active  running
python-test@random.service      3f608471.../172.17.8.106        active  running

Here we can see that we started 3 instances of our python container. We can also use the shell’s brace expansion to start multiple instances:

$ fleetctl start python-test@{3..5}

$ fleetctl list-units

UNIT                            MACHINE                         ACTIVE  SUB
python-test@1.service           0b9fd6f8.../172.17.8.104        active  running
python-test@2.service           128a2e32.../172.17.8.103        active  running
python-test@3.service           73c0b7fc.../172.17.8.105        active  running
python-test@4.service           eabc97ed.../172.17.8.102        active  running
python-test@5.service           2addf739.../172.17.8.101        active  running
python-test@random.service      3f608471.../172.17.8.106        active  running
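Note that the {3..5} part is expanded by the shell before fleetctl ever sees it, so fleetctl simply receives three instance names. The naming rule itself is easy to express in code. The following Python sketch shows how a template name plus an instance identifier maps to the unit names in the listing above (the function names are my own, they are not part of Fleet):

```python
from typing import Iterable, List


def instance_unit_name(template: str, instance: str) -> str:
    """Turn 'python-test@.service' plus '1' into 'python-test@1.service'."""
    prefix, separator, suffix = template.partition("@")
    if not separator or not suffix.startswith("."):
        raise ValueError(f"{template!r} is not a template unit name")
    return f"{prefix}@{instance}{suffix}"


def instance_unit_names(template: str, instances: Iterable[object]) -> List[str]:
    """Build the unit names for several instances of the same template."""
    return [instance_unit_name(template, str(i)) for i in instances]


if __name__ == "__main__":
    print(instance_unit_names("python-test@.service", range(3, 6)))
    print(instance_unit_name("python-test@.service", "random"))
```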

Telling Fleet where to run your containers

By default Fleet makes no guarantees as to where in the cluster your units will run. In the last example from the previous section we saw that we started 6 different instances of our python unit and it just so happens that Fleet decided to run one on each node.

So what do we do if we have dependencies between our different units? Imagine for instance that you have 2 different containers, one running your application and one running a monitoring agent for that application. In that case you want to keep those 2 running on the same node always. Similarly, if you want to run multiple instances of your service to scale horizontally you want those to run on different nodes.

To do this, Fleet provides a set of fleet-specific options that allow you to control how the scheduling engine of Fleet will work. We’ll see how these work with some examples. But first we’ll need another unit file:

$ cat hello-world@.service

[Unit]
Description=MyApp
After=docker.service
Requires=docker.service

[Service]
TimeoutStartSec=0
ExecStartPre=-/usr/bin/docker kill busybox1
ExecStartPre=-/usr/bin/docker rm busybox1
ExecStartPre=/usr/bin/docker pull busybox
ExecStart=/usr/bin/docker run --name busybox1 busybox /bin/sh -c "while true; do echo Hello World; sleep 1; done"
ExecStop=/usr/bin/docker stop busybox1

This simply runs a container that will keep printing a Hello World message to stdout.

Running units together

Now imagine that we want to run this hello-world unit together with our python-test one but we want to make sure that these 2 always run together in the same node. We can use the MachineOf Fleet attribute to achieve this.

$ cat python-test@.service

[Unit]
Description=Python service
Requires=docker.service
After=docker.service

[Service]
TimeoutStartSec=0
Restart=on-failure

ExecStartPre=-/usr/bin/docker kill python-service
ExecStartPre=-/usr/bin/docker rm python-service
ExecStartPre=/usr/bin/docker pull jlordiales/python-micro-service

ExecStart=/usr/bin/docker run --name python-service -P jlordiales/python-micro-service

ExecStop=/usr/bin/docker stop python-service

[X-Fleet]
MachineOf=hello-world@%i.service

We added the [X-Fleet] section to our unit file specifying that our unit should only be placed wherever there’s also a hello-world unit running. Let’s see what happens when we submit these 2 units into our cluster:

$ fleetctl submit python-test@.service hello-world@.service
$ fleetctl start python-test@1 hello-world@1

$ fleetctl list-units

UNIT                    MACHINE                         ACTIVE  SUB
hello-world@1.service   0b9fd6f8.../172.17.8.104        active  running
python-test@1.service   0b9fd6f8.../172.17.8.104        active  running

As we expected, the 2 units were scheduled on the same node. The same thing would happen if we start multiple instances of each unit at the same time:

$ fleetctl start python-test@{2..4} hello-world@{2..4}

$ fleetctl list-units

UNIT                    MACHINE                         ACTIVE  SUB
hello-world@1.service   0b9fd6f8.../172.17.8.104        active  running
hello-world@2.service   128a2e32.../172.17.8.103        active  running
hello-world@3.service   2addf739.../172.17.8.101        active  running
hello-world@4.service   3f608471.../172.17.8.106        active  running
python-test@1.service   0b9fd6f8.../172.17.8.104        active  running
python-test@2.service   128a2e32.../172.17.8.103        active  running
python-test@3.service   2addf739.../172.17.8.101        active  running
python-test@4.service   3f608471.../172.17.8.106        active  running

This dependency between units also means that if the unit we depend on (hello-world in our example) is destroyed then all the units that were dependent on that one (python-test in our example) will also be destroyed. We can see this if we do:

$ fleetctl destroy hello-world@4

$ fleetctl list-units
UNIT                    MACHINE                         ACTIVE  SUB
hello-world@1.service   0b9fd6f8.../172.17.8.104        active  running
hello-world@2.service   128a2e32.../172.17.8.103        active  running
hello-world@3.service   2addf739.../172.17.8.101        active  running
python-test@1.service   0b9fd6f8.../172.17.8.104        active  running
python-test@2.service   128a2e32.../172.17.8.103        active  running
python-test@3.service   2addf739.../172.17.8.101        active  running

We removed hello-world@4 and Fleet automatically removed python-test@4 as well.
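The %i in MachineOf=hello-world@%i.service is what pairs each python-test instance with the hello-world instance that has the same identifier. As a rough illustration, this Python sketch performs that substitution for a given unit name. It only handles %i and ignores every other systemd specifier, and the function names are hypothetical.

```python
def instance_of(unit_name: str) -> str:
    """Return the instance part of a name like 'prefix@instance.suffix'."""
    prefix, separator, rest = unit_name.partition("@")
    if not separator:
        raise ValueError(f"{unit_name!r} is not an instance of a template unit")
    instance, dot, _suffix = rest.rpartition(".")
    return instance if dot else rest


def expand_instance_specifier(value: str, unit_name: str) -> str:
    """Replace every %i in value with the instance identifier of unit_name."""
    return value.replace("%i", instance_of(unit_name))


if __name__ == "__main__":
    for name in ("python-test@1.service", "python-test@4.service"):
        print(name, "->", expand_instance_specifier("hello-world@%i.service", name))
```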

Running units away from each other

We saw how to run multiple units guaranteeing that they will always be placed on the same node. How about the opposite scenario, running 2 or more units making sure that they are never put on the same node? We can use Fleet’s Conflicts option to achieve this. Let’s change our python-test@.service unit file to use this new option:

$ cat python-test@.service

[Unit]
Description=Python service
Requires=docker.service
After=docker.service

[Service]
TimeoutStartSec=0
Restart=on-failure

ExecStartPre=-/usr/bin/docker kill python-service
ExecStartPre=-/usr/bin/docker rm python-service
ExecStartPre=/usr/bin/docker pull jlordiales/python-micro-service

ExecStart=/usr/bin/docker run --name python-service -P jlordiales/python-micro-service

ExecStop=/usr/bin/docker stop python-service

[X-Fleet]
Conflicts=hello-world@%i.service

If we submit this new unit file and run multiple instances of our 2 services we’ll see that Fleet will place them on different nodes. If Fleet cannot find a distribution that satisfies the constraints specified in the unit files then it will simply refuse to schedule them.
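To get a feeling for how the two constraints interact, here is a toy scheduler in Python. It is emphatically not Fleet’s real algorithm: it is a greedy model I wrote for illustration, the Unit class and schedule function are hypothetical, and it does not handle chains of MachineOf dependencies. It does show the three outcomes discussed above: co-location, separation, and refusing to schedule.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch
from typing import Dict, List, Optional


@dataclass
class Unit:
    name: str
    machine_of: Optional[str] = None                    # models MachineOf=
    conflicts: List[str] = field(default_factory=list)  # models Conflicts= (glob patterns)


def _clash(a: Unit, b: Unit) -> bool:
    """True if either unit declares a conflict with the other."""
    return any(fnmatch(b.name, p) for p in a.conflicts) or any(
        fnmatch(a.name, p) for p in b.conflicts
    )


def schedule(units: List[Unit], machines: List[str]) -> Dict[str, Optional[str]]:
    """Greedy toy scheduler: maps unit name -> machine (None if unschedulable)."""
    placed: Dict[str, List[Unit]] = {m: [] for m in machines}
    result: Dict[str, Optional[str]] = {}

    def fits(unit: Unit, machine: str) -> bool:
        return not any(_clash(unit, other) for other in placed[machine])

    # Place unconstrained units first so MachineOf targets already have a home.
    for unit in sorted(units, key=lambda u: u.machine_of is not None):
        if unit.machine_of is None:
            candidates = [m for m in machines if fits(unit, m)]
            # Prefer the least loaded machine to spread units out.
            choice = min(candidates, key=lambda m: len(placed[m]), default=None)
        else:
            target = result.get(unit.machine_of)
            choice = target if target is not None and fits(unit, target) else None
        result[unit.name] = choice
        if choice is not None:
            placed[choice].append(unit)
    return result


if __name__ == "__main__":
    together = [
        Unit("hello-world@1.service"),
        Unit("python-test@1.service", machine_of="hello-world@1.service"),
    ]
    apart = [
        Unit("hello-world@1.service"),
        Unit("python-test@1.service", conflicts=["hello-world@1.service"]),
    ]
    print(schedule(together, ["m1", "m2", "m3"]))
    print(schedule(apart, ["m1", "m2", "m3"]))
    print(schedule(apart, ["m1"]))  # only one machine: the second unit cannot be placed
```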

Conclusion

Container orchestration and scheduling is an exciting and relatively new area that is under heavy development by different players. CoreOS presents an easy and lightweight approach using etcd, Fleet and Docker as its backbone. In this post we saw how easy it is to create a local CoreOS cluster with Vagrant and run highly available and self-healing services with the help of Fleet.

By combining a few simple configuration values, we can ensure that our services are distributed across different regions and availability zones. This, combined with the fact that we can run CoreOS on pretty much any cloud provider or hardware, enables us to have very complex architectures with pretty much no manual intervention.

Accessing docker containers on localhost when using Boot2Docker

Thu, 02 Apr 2015 00:00:00 +0000

If you have been following my posts on Docker then you know by now that I usually run on OSX with Boot2Docker. It is definitely a really useful tool if you are not on a native Linux kernel and it makes using Docker on Mac and Windows almost as easy and transparent as if you were on Linux. That is, until you need to expose one or more ports from your containers and then you want to access those from your host. If you are on Linux then you can simply go to localhost and the port number and that’s it. If you are using boot2docker however, you need to remember that your docker host is actually the boot2docker VM and not your laptop, so you first need to know what that VM’s IP is. In this very short post I want to describe a way in which you can access your containers on localhost even if you are using boot2docker.

The important thing to know is that boot2docker is a Virtual Machine that runs on VirtualBox. And as with any other VM you can forward ports between your host and guest operating system. That means that if we can get VirtualBox to forward whatever port we expose from our containers, from our host OS to the boot2docker VM, then those ports will be accessible from our localhost.

So how can we do that? I don’t really know if this works on Windows (I assume it does) but on Mac you can use the VBoxManage command line tool to control the different VMs that VirtualBox manages. So let’s imagine that our Nginx container exposes port 80 and then we map that to port 8080 on the VM when we do a docker run -p 8080:80. Normally you would be able to access this by going to http://$DOCKER_IP:8080. But with VBoxManage you can do: VBoxManage controlvm boot2docker-vm natpf1 "nginx,tcp,127.0.0.1,8080,,8080". This basically means: “Take the boot2docker-vm and create a new NAT rule called nginx that will forward all requests on the localhost (127.0.0.1) port 8080 to port 8080 on the VM”. Now we can access our container on http://localhost:8080. Simple as that! Best of all, you can do this while the VM and your containers are running and it doesn’t require you to restart anything. When you are done and you want to delete the NAT rule you can just do VBoxManage controlvm boot2docker-vm natpf1 delete "nginx".
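If you end up creating several of these rules, it helps to generate the commands instead of typing them. The following Python sketch builds the rule string and the VBoxManage argument lists shown above. The helper names are mine, and the sketch only prints the commands; uncomment the subprocess call if you actually want to run them on a machine with VirtualBox installed.

```python
from typing import List


def nat_rule(name: str, host_port: int, guest_port: int,
             protocol: str = "tcp", host_ip: str = "127.0.0.1",
             guest_ip: str = "") -> str:
    """Build a 'name,protocol,host ip,host port,guest ip,guest port' rule."""
    return f"{name},{protocol},{host_ip},{host_port},{guest_ip},{guest_port}"


def add_rule_command(rule: str, vm: str = "boot2docker-vm") -> List[str]:
    """Argument list that creates the NAT rule on the given VM."""
    return ["VBoxManage", "controlvm", vm, "natpf1", rule]


def delete_rule_command(name: str, vm: str = "boot2docker-vm") -> List[str]:
    """Argument list that deletes a previously created NAT rule."""
    return ["VBoxManage", "controlvm", vm, "natpf1", "delete", name]


if __name__ == "__main__":
    command = add_rule_command(nat_rule("nginx", 8080, 8080))
    print(" ".join(command))
    # import subprocess
    # subprocess.run(command, check=True)
```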

But why would you do this?

Arguably, typing $DOCKER_IP instead of localhost makes little to no difference. In fact, if you count the time it takes you to do the VBoxManage stuff then it is probably slower. In the case when you are just playing around with different containers and want to test things locally I agree that this makes no sense. But sometimes it can be quite useful.

Imagine for instance that you are developing a webapp. You run all the backend services that your app needs as docker containers and then you just run the frontend part in your laptop, pointing to these containers. If you are developing on Mac then you would have to point your webapp to $DOCKER_IP but if then you move to Ubuntu for instance, you would need to change all places where you were previously using $DOCKER_IP to use localhost instead. In this scenario, creating a little script that runs the containers and then uses VBoxManage to forward the exposed ports can give you better portability between different platforms.

In any case, whether you find a good use case for it or not it is still good to know that you have that option if you ever need it. The rest is up to you!

Cheers!

Consul Template for transparent load balancing of containers

Wed, 01 Apr 2015 00:00:00 +0000

In the previous post we talked about Registrator and how, combined with a service discovery backend like Consul, it allows us to have transparent discovery for our containers while still keeping their portability. One thing we didn’t talk about though is how are we supposed to access those services registered in Consul from our consumer applications, which could be running as containers themselves.

As an example, imagine we have a service exposing a REST API. To provide horizontal scalability we decide to run 3 instances of that service, all registered in Consul. Each container will be listening on a random port assigned by Docker, so how do we know where to connect to from our consumers? We can use Consul’s own DNS capabilities, as we saw in the last post, but even though Consul offers the possibility of asking for SRV records (which include the port information as well as the IP) most client libraries in modern programming languages don’t care about this information and only use the IP address, leaving the task of specifying the port to the developer. We could always use Consul’s REST API to query for the services we are interested in and parse the IP and port from there. But this approach seems rather complex and it would couple our consumer app to Consul’s specific API.

In this post I want to explore one possible approach to solve this problem in a portable and transparent way, both from the point of view of our services and from the point of view of our consumers. It is certainly not the only possible approach nor the best but it is something that I have seen working quite successfully in the past.

Introduction

Let’s think about our current problem again. We have 2 or more containers that expose a REST API and we want to consume that API from another application. We are using Consul as a service discovery mechanism and Registrator to transparently register our containers there. We know that we can get the IP of our service by using Consul’s DNS interface but we don’t know which port on that IP to use. For the purposes of this post, our service container will be the Python service that we have been using so far (available in the Docker hub as jlordiales/python-micro-service). In turn, our consumer will simply be the curl command line tool.

It would be great if there was a proxy running on a well known port that we could send requests to. That proxy would then pass the request to the correct service and transmit the response back to us. This sounds a lot like something that Nginx or HAProxy could do. But now we have just moved the problem one step further. That is, how does our proxy know which port our containers are running on? Luckily for us, the guys from HashiCorp have developed a little standalone tool to do just this: Consul Template.

Consul Template

From the project’s Github repo:

This project provides a convenient way to populate values from Consul into the filesystem using the consul-template daemon. The daemon consul-template queries a Consul instance and updates any number of specified templates on the filesystem. As an added bonus, consul-template can optionally run arbitrary commands when the update process completes.

We’ll see how this works with a simple example. First, we’ll run our Consul cluster. For simplicity we’ll run just one node but exactly the same would apply on a multi-node setup.

$ docker run -p 8400:8400 -p 8500:8500 -p 8600:53/udp \
-h consul --name consul \
progrium/consul -server -advertise $DOCKER_IP -bootstrap

Notice that we are advertising the $DOCKER_IP as Consul’s IP. The reason for that is that Registrator will always register new containers as accessible in Consul’s advertise IP. We discussed this in the previous post. Also, as a reminder, the DOCKER_IP variable is simply boot2docker’s IP (export DOCKER_IP=$(boot2docker ip 2> /dev/null)). If you are running on native Linux then that would be localhost.

Now that we have Consul running, we’ll do the same for Registrator:

$ docker run -d \
-v /var/run/docker.sock:/tmp/docker.sock \
--name registrator -h registrator \
gliderlabs/registrator:latest consul://$DOCKER_IP:8500

And finally our Python service. As we said before, let’s imagine we want to run 3 instances of it:

$ docker run -d -P --name node1 -h node1 jlordiales/python-micro-service:latest
$ docker run -d -P --name node2 -h node2 jlordiales/python-micro-service:latest
$ docker run -d -P --name node3 -h node3 jlordiales/python-micro-service:latest

We can query consul to make sure that our new containers are running:

$ curl $DOCKER_IP:8500/v1/catalog/service/python-micro-service

[
  {
    "Address": "192.168.59.103",
    "Node": "node1",
    "ServiceAddress": "",
    "ServiceID": "registrator5000",
    "ServiceName": "python-micro-service",
    "ServicePort": 49162,
    "ServiceTags": null
  },
  {
    "Address": "192.168.59.103",
    "Node": "node1",
    "ServiceAddress": "",
    "ServiceID": "registrator5000",
    "ServiceName": "python-micro-service",
    "ServicePort": 49163,
    "ServiceTags": null
  },
  {
    "Address": "192.168.59.103",
    "Node": "node1",
    "ServiceAddress": "",
    "ServiceID": "registrator5000",
    "ServiceName": "python-micro-service",
    "ServicePort": 49164,
    "ServiceTags": null
  }
]
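Every entry in that response carries the two things a consumer needs: the Address and the ServicePort. As a minimal sketch (the helper name endpoints is hypothetical, and the sample payload is just a trimmed copy of the output above), this is how a Python client might turn the catalog response into a list of host:port pairs:

```python
import json

# Trimmed copy of the catalog response shown above (only the fields we use).
SAMPLE_RESPONSE = """
[
  {"Address": "192.168.59.103", "ServiceName": "python-micro-service", "ServicePort": 49162},
  {"Address": "192.168.59.103", "ServiceName": "python-micro-service", "ServicePort": 49163},
  {"Address": "192.168.59.103", "ServiceName": "python-micro-service", "ServicePort": 49164}
]
"""


def endpoints(catalog_json):
    """Return "address:port" strings for every instance in a catalog response."""
    return [
        "{}:{}".format(entry["Address"], entry["ServicePort"])
        for entry in json.loads(catalog_json)
    ]


print(endpoints(SAMPLE_RESPONSE))
```

This is exactly the lookup we would like to avoid doing by hand in every consumer, which is where Consul Template comes in.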

Now for the fun part. We’ll install Consul Template and see what happens when we run it against our current setup. We can get the latest release from here for whatever architecture we are running on. In my case I’m running on a Mac so:

$ wget https://github.com/hashicorp/consul-template/releases/download/v0.7.0/consul-template_0.7.0_darwin_amd64.tar.gz -O /tmp/consul-template.tar.gz
$ tar -xvzf /tmp/consul-template.tar.gz -C /tmp --strip-components=1

Next, we’ll write a simple template and run consul-template to parse it and generate the result. You can read all about the templates syntax and provided functions at the project’s documentation:

$ echo '{{range service "python-micro-service"}}\nserver {{.Address}}:{{.Port}}{{end}}' > /tmp/consul.ctmpl
$ /tmp/consul-template -consul $DOCKER_IP:8500 -template /tmp/consul.ctmpl:/tmp/consul.result -dry -once

> /tmp/consul.result

server 192.168.59.103:49162
server 192.168.59.103:49163
server 192.168.59.103:49164

By specifying the -dry parameter we tell consul-template to send the output to stdout instead of the file specified on the command (/tmp/consul.result in this case). The -once parameter tells Consul Template to query Consul and generate the output just once. If we don’t do this then the app will keep running in the foreground polling Consul at regular intervals (which is what we would want in a typical scenario). You can see that the result includes the 3 instances of the service with their respective ports.
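Conceptually, the range block is just a loop over the service instances that Consul returns. The following Python sketch mimics that rendering step with plain string formatting. It is not how Consul Template itself is implemented, and the Instance type and render function are names made up for illustration:

```python
from collections import namedtuple

# Hypothetical stand-in for one entry returned by: service "python-micro-service"
Instance = namedtuple("Instance", ["address", "port"])


def render(instances):
    """Emit one "server address:port" line per instance, like the template above."""
    return "".join(
        "\nserver {}:{}".format(inst.address, inst.port) for inst in instances
    )


instances = [
    Instance("192.168.59.103", 49162),
    Instance("192.168.59.103", 49163),
    Instance("192.168.59.103", 49164),
]
print(render(instances))
```

When no instances are registered the loop body never runs and the rendered result is empty.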

To see what happens when we change the information registered in Consul, we are going to run consul-template again but this time we won’t specify the -once parameter in order to leave the daemon running:

$ /tmp/consul-template -consul $DOCKER_IP:8500 -template /tmp/consul.ctmpl:/tmp/consul.result -dry

With that running, we’ll go to a new terminal and stop one of the running python containers:

$ docker stop node3

You should almost instantly see the refreshed output in the terminal running consul-template that now only shows 2 entries. Conversely, if we run a new container:

$ docker run -d -P --name node4 -h node4 jlordiales/python-micro-service:latest

The consul-template output gets updated again with the new service.

Combining Consul Template and a reverse proxy

So we saw that we can use Consul Template to parse a template file and produce a new file with the information read from Consul. How can we use this from our consumer applications in order to have transparent service location and load balancing? Well, one option is to front our services with Nginx or HAProxy, creating the config files for these with Consul Template. We’ll see how this would work for Nginx. All the files that I’ll describe in the following section can be found in this repo if you just want to clone from it.

I’ll first show the Dockerfile that we’ll use for the Nginx image and then explain each section of it:

FROM nginx:latest

ENTRYPOINT ["/bin/start.sh"]
EXPOSE 80
VOLUME /templates
ENV CONSUL_URL consul:8500

ADD start.sh /bin/start.sh
RUN rm -v /etc/nginx/conf.d/*.conf

ADD https://github.com/hashicorp/consul-template/releases/download/v0.7.0/consul-template_0.7.0_linux_amd64.tar.gz /usr/bin/
RUN tar -C /usr/local/bin --strip-components 1 -zxf /usr/bin/consul-template_0.7.0_linux_amd64.tar.gz

We are basing our image on the official Nginx image, available here. This gives us a ready to use, default Nginx installation. Then we say that the entrypoint will be /bin/start.sh (we’ll see that one in a bit) and that our container will expose port 80, where Nginx will be listening for new connections. Next we define a volume /templates, which is where we will mount our template files from the host. This way we can reuse the same image for different services and templates. In the following step we define an environment variable with the location of our Consul cluster. By default, it will try to resolve to consul:8500 which would be the behavior if we have Consul running as a container in the same host and we link it to this Nginx container (with the alias consul, of course). But this environment variable can also be overridden when we run the container if we want to point somewhere else. We then add the start up script (which is used as the entrypoint to our containers) and remove all default configurations from Nginx. On the last 2 lines we download Consul Template (0.7.0, the latest version at the time of writing this) and we put it in /usr/local/bin.

The start.sh script is very simple:

#!/bin/bash
service nginx start
consul-template -consul=$CONSUL_URL -template="/templates/service.ctmpl:/etc/nginx/conf.d/service.conf:service nginx reload"

We just start the nginx service and then leave consul-template running in the foreground. Here we use the CONSUL_URL environment variable that we defined before. Consul Template expects to find a service.ctmpl file in /templates. This is the template that we would mount as a volume from our host. The result is then placed in /etc/nginx/conf.d/service.conf where Nginx will be able to read it. Finally, every time the template is re-rendered we do a service nginx reload in order to read the new configuration.
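The value passed to -template in the script packs three things into one colon-separated string: the source template, the destination file and the command to run after rendering. As a rough illustration of that layout (parse_template_spec is a hypothetical helper, not part of Consul Template, and it assumes the two paths contain no colons), the string can be taken apart like this in Python:

```python
def parse_template_spec(spec):
    """Split "source:destination[:command]" into its three parts."""
    # maxsplit=2 keeps any colons inside the command itself untouched.
    parts = spec.split(":", 2)
    if len(parts) < 2:
        raise ValueError("expected at least source:destination")
    source, destination = parts[0], parts[1]
    command = parts[2] if len(parts) == 3 else None
    return source, destination, command


spec = "/templates/service.ctmpl:/etc/nginx/conf.d/service.conf:service nginx reload"
print(parse_template_spec(spec))
```

The command part is optional, which is why the earlier dry run only passed a source and a destination.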

Time to see this in action. If you still have the Consul, Registrator and Python containers running from the first part of this post then you only need to run the Nginx container (otherwise start them again).

The only thing you’ll need is a template file like the following. Save it as /tmp/service.ctmpl for convenience:

upstream python-service {
  least_conn;
  {{range service "python-micro-service"}}server {{.Address}}:{{.Port}} max_fails=3 fail_timeout=60 weight=1;
  {{else}}server 127.0.0.1:65535; # force a 502{{end}}
}

server {
  listen 80 default_server;

  charset utf-8;

  location ~ ^/python-micro-service/(.*)$ {
    proxy_pass http://python-service/$1$is_args$args;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
  }
}
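The location block is what strips the /python-micro-service/ prefix before the request reaches the backend: whatever the regular expression captures becomes $1, and $is_args$args re-attaches the query string if there is one. Here is a small Python sketch of that rewrite, assuming Python’s re module behaves like nginx’s regex engine for this simple pattern (upstream_url is a made-up helper name):

```python
import re

# Same pattern as the location block in the template above.
LOCATION = re.compile(r"^/python-micro-service/(.*)$")


def upstream_url(path, query=""):
    """Mimic proxy_pass http://python-service/$1$is_args$args for one request."""
    match = LOCATION.match(path)
    if match is None:
        return None  # this location does not apply to the request
    is_args = "?" if query else ""  # $is_args is "?" only when arguments exist
    return "http://python-service/" + match.group(1) + is_args + query


print(upstream_url("/python-micro-service/"))
print(upstream_url("/python-micro-service/users/42", "verbose=1"))
```

So a request for /python-micro-service/ ends up as a request for / on one of the upstream servers.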

Now run the nginx container with:

$ docker run -p 8080:80 -d --name nginx --volume /tmp/service.ctmpl:/templates/service.ctmpl --link consul:consul jlordiales/nginx-consul

We can curl the service multiple times:

$ curl $DOCKER_IP:8080/python-micro-service/
$ curl $DOCKER_IP:8080/python-micro-service/
$ curl $DOCKER_IP:8080/python-micro-service/
$ curl $DOCKER_IP:8080/python-micro-service/

You should see a “Hello World from nodeX” where X alternates between 1, 2 and 3. We are effectively load balancing between the 3 nodes. But there’s something even cooler that you can try. Leave this running on a terminal:

$ while true; do curl $DOCKER_IP:8080/python-micro-service/; echo -----; sleep 1; done;

That will keep calling nginx every second, which in turn will send the request to one of the 3 nodes. Now from another terminal, stop node1 with:

$ docker stop node1

If you check the terminal running the while loop you’ll notice that the requests are now going to node2 and node3 only. You can play around with this (starting and stopping nodes) to see the configuration updated almost instantaneously and nginx adjusting which nodes it sends requests to. And, more importantly, all of this while keeping our service containers and our nginx container completely ignorant about the fact that we are using Consul as a service discovery mechanism!
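To get an intuition for why sequential curl calls rotate through the nodes, here is a toy model of a least-connections balancer in Python. It is only a sketch under simplifying assumptions: nginx’s real least_conn implementation also takes server weights and failure counters into account, and the class and method names below are invented for this example.

```python
class LeastConnBalancer:
    """Pick the backend with the fewest in-flight requests; rotate on ties."""

    def __init__(self, backends):
        self.active = {backend: 0 for backend in backends}
        self.order = list(backends)

    def acquire(self):
        # min() returns the first minimal element, so ties are broken by the
        # position in self.order, which is rotated after every pick.
        backend = min(self.order, key=lambda b: self.active[b])
        self.active[backend] += 1
        self.order.remove(backend)
        self.order.append(backend)
        return backend

    def release(self, backend):
        self.active[backend] -= 1

    def remove(self, backend):
        # Roughly what a config reload does when an instance leaves Consul.
        self.active.pop(backend, None)
        if backend in self.order:
            self.order.remove(backend)


balancer = LeastConnBalancer(["node1", "node2", "node3"])
picks = []
for _ in range(6):
    node = balancer.acquire()
    picks.append(node)
    balancer.release(node)  # each curl finishes before the next one starts
print(picks)
```

Because every request completes before the next one arrives, all backends are always tied at zero connections and the tie-breaking rotation decides. Calling balancer.remove("node1") models what happens after docker stop node1: only node2 and node3 remain eligible.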

Conclusion

This post completes the subject of transparent service discovery in Docker. We saw how we can use a reverse proxy sitting in front of our containers, accessible through a well known port. The proxy, in turn, is kept updated with the information available in our Consul cluster thanks to a small and handy tool called Consul Template.

Combined with Registrator and Consul this gives us extreme flexibility and portability. Of course, as with almost everything else, there are other alternatives and approaches. I would be glad to hear other people’s experiences around this area.

Automatic container registration with Consul and Registrator

Tue, 03 Feb 2015 00:00:00 +0000

In the previous post we talked about Consul and how it can help us towards a highly available and efficient service discovery. We saw how to run a Consul cluster, register services, query through its HTTP API as well as its DNS interface and use the distributed key/value store. One thing we missed though was how to register the different services we run as docker containers with the cluster. In this post I’m going to talk about Registrator, an amazing tool that we can run as a docker container whose responsibility is to make sure that new containers are registered and deregistered automatically from our service discovery tool.

Introduction

We’ve seen how to run a Consul cluster and we’ve also seen how to register services in that cluster. With this in place we could, in principle, start running other Docker containers with our services and register those containers with Consul. However, who should be responsible for registering those new containers?

You could let each container know how to register itself. There are some problems with this approach. First, you give up one of the main benefits of using containers: portability. If the logic of how the container needs to join the cluster is inside of it then suddenly you can not run that same container if you decide to use a different service discovery mechanism or if you decide to use no service discovery at all. Another potential issue is that containers are supposed to do just one thing and do that well. The container that runs your user service should not care about how that service will be discovered by others. The last problem is that you will not always be in control of all the containers you use. One of the strong points of Docker is the huge amount of already dockerized applications and services available in their registry. Those containers will have no idea about your Consul cluster.

Registrator

To solve these problems, meet Registrator. It is designed to be run as an independent Docker container. It will sit there quietly, watching for new containers that are started on the same host where it is currently running, extracting information from them and then registering those containers with your service discovery solution. It will also watch for containers that are stopped (or simply die) and will deregister them. Additionally, it supports pluggable service discovery mechanisms so you are not restricted to any particular solution.
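The idea can be pictured as a small event loop: a start event leads to a registration, a stop or die event leads to a deregistration. The Python sketch below only illustrates that idea. The event dictionaries, the InMemoryRegistry class and handle_event are all hypothetical, and none of it reflects Registrator’s actual code:

```python
class InMemoryRegistry:
    """Hypothetical stand-in for a service discovery backend such as Consul."""

    def __init__(self):
        self.services = {}

    def register(self, service_id, name, port):
        self.services[service_id] = {"name": name, "port": port}

    def deregister(self, service_id):
        self.services.pop(service_id, None)


def handle_event(registry, event):
    """React to one container lifecycle event (a simplified, made-up format)."""
    if event["status"] == "start":
        registry.register(event["id"], event["image"], event["port"])
    elif event["status"] in ("stop", "die"):
        registry.deregister(event["id"])


registry = InMemoryRegistry()
events = [
    {"status": "start", "id": "c1", "image": "python-micro-service", "port": 49154},
    {"status": "start", "id": "c2", "image": "python-micro-service", "port": 49155},
    {"status": "die", "id": "c1"},
]
for event in events:
    handle_event(registry, event)
print(registry.services)
```

Swapping InMemoryRegistry for another class with the same two methods is the sketch’s equivalent of the pluggable backends mentioned above.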

Let’s quickly see how we can run Registrator together with our Consul cluster.

Setting up our hosts

So far we have always run our Consul cluster and all our services in just one host (the boot2docker VM). In this post I’ll try to simulate a more “production-like” environment where we might have several hosts, each running one or more docker containers with our services and each running a Consul agent.

In order to do this, we’ll use Vagrant to create 3 CoreOS VMs running locally. The Vagrantfile will look like this:

# Vagrantfile configuration format version used below
VAGRANTFILE_API_VERSION = "2"

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.box = "yungsang/coreos"
  config.vm.network "private_network", type: "dhcp"

  number_of_instances = 3
  (1..number_of_instances).each do |instance_number|
    config.vm.define "host-#{instance_number}" do |host|
      host.vm.hostname = "host-#{instance_number}"
    end
  end
end

After you save the Vagrantfile you can start the 3 VMs with vagrant up. It might take a while the first time while it downloads the CoreOS image. At this point you should be able to see the 3 VMs running:

$ vagrant status

Current machine states:

host-1             running (virtualbox)
host-2             running (virtualbox)
host-3             running (virtualbox)

This environment represents multiple VMs. The VMs are all listed
above with their current state. For more information about a specific
VM, run vagrant status NAME.

We’ll now ssh into the first host and check that docker is installed and running (which happens by default when you use the CoreOS image):

$ vagrant ssh host-1

host-1$ docker info
Containers: 0
Images: 0
Storage Driver: btrfs
Execution Driver: native-0.2
Kernel Version: 3.17.2
Operating System: CoreOS 494.5.0

Similarly for the second host:

$ vagrant ssh host-2

host-2$ docker info
Containers: 0
Images: 0
Storage Driver: btrfs
Execution Driver: native-0.2
Kernel Version: 3.17.2
Operating System: CoreOS 494.5.0

And the third:

$ vagrant ssh host-3

host-3$ docker info
Containers: 0
Images: 0
Storage Driver: btrfs
Execution Driver: native-0.2
Kernel Version: 3.17.2
Operating System: CoreOS 494.5.0

Now, before we start running all our different containers I wanted to show what our hosts look like from a networking point of view. Notice that we specified a “private_network” interface for our VMs in our Vagrantfile. This basically means that our VMs will be able to communicate with each other as if they were inside the same local network. We can see this if we check the network configuration on each one:

host-1$ ifconfig enp0s8 | grep 'inet ' | awk '{ print $2 }'

172.28.128.3

host-2$ ifconfig enp0s8 | grep 'inet ' | awk '{ print $2 }'

172.28.128.4

host-3$ ifconfig enp0s8 | grep 'inet ' | awk '{ print $2 }'

172.28.128.5

Each VM has other network adapters, but for now we’ll focus on this particular one. We can see that all 3 machines are part of the 172.28.128.0/24 network. On a production setup the different machines are probably not going to be on the same private network but we can still achieve this using virtual networks most of the time (VPC on AWS for instance). This is usually a very good idea because the public facing IP should be firewalled but we don’t need that while we communicate between our internal services.
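If you want to double check that a set of addresses really belongs to the same network, Python’s standard ipaddress module can do the arithmetic. A quick sketch using the addresses from this post (the in_private_network helper is just an illustrative name):

```python
import ipaddress

# The private network that Vagrant created for our three VMs.
network = ipaddress.ip_network("172.28.128.0/24")

hosts = {
    "host-1": "172.28.128.3",
    "host-2": "172.28.128.4",
    "host-3": "172.28.128.5",
}


def in_private_network(ip):
    """True when the address falls inside the private network above."""
    return ipaddress.ip_address(ip) in network


for name, ip in hosts.items():
    print(name, ip, in_private_network(ip))
```

An address on a different interface, such as the default docker bridge address 172.17.42.1, would fail the same check.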

Starting the Consul cluster

The first thing we’ll do is to start our Consul cluster. We are going to use a 3 node cluster, similarly to how we did it in our previous post. I’ll show the full docker run commands here, but don’t run those yet. I’ll show a more concise form later on:

host-1$ docker run -d -h node1 -v /mnt:/data \
-p 172.28.128.3:8300:8300 \
-p 172.28.128.3:8301:8301 \
-p 172.28.128.3:8301:8301/udp \
-p 172.28.128.3:8302:8302 \
-p 172.28.128.3:8302:8302/udp \
-p 172.28.128.3:8400:8400 \
-p 172.28.128.3:8500:8500 \
-p 172.17.42.1:53:53/udp \
progrium/consul -server -advertise 172.28.128.3 -bootstrap-expect 3

In this docker run command, we are binding all Consul’s internal ports to the private IP address of our first host, except for the DNS port (53) which is exposed only on the docker0 interface (172.17.42.1 by default). The reason why we use the docker bridge interface for the DNS server is that we want all the containers running on the same host to query this DNS interface, but we don’t need anyone from outside doing the same. Since each host will be running a Consul agent, each container can query its own host. We also added the -advertise flag to tell Consul that it should use the host’s IP instead of the docker container’s IP.

On the second host, we’d run the same thing, but passing a -join to the first node’s IP:

host-2$ docker run -d -h node2 -v /mnt:/data \
-p 172.28.128.4:8300:8300 \
-p 172.28.128.4:8301:8301 \
-p 172.28.128.4:8301:8301/udp \
-p 172.28.128.4:8302:8302 \
-p 172.28.128.4:8302:8302/udp \
-p 172.28.128.4:8400:8400 \
-p 172.28.128.4:8500:8500 \
-p 172.17.42.1:53:53/udp \
progrium/consul -server -advertise 172.28.128.4 -join 172.28.128.3

Same for the third one:

host-3$ docker run -d -h node3 -v /mnt:/data \
-p 172.28.128.5:8300:8300 \
-p 172.28.128.5:8301:8301 \
-p 172.28.128.5:8301:8301/udp \
-p 172.28.128.5:8302:8302 \
-p 172.28.128.5:8302:8302/udp \
-p 172.28.128.5:8400:8400 \
-p 172.28.128.5:8500:8500 \
-p 172.17.42.1:53:53/udp \
progrium/consul -server -advertise 172.28.128.5 -join 172.28.128.3

Since the docker run command for each host can be quite large and error prone to type in manually, the progrium/consul image comes with a convenient command to generate this for you. You can try this on any of the 3 hosts:

$ docker run --rm progrium/consul cmd:run 172.28.128.3 -d -v /mnt:/data

eval docker run --name consul -h $HOSTNAME \
-p 172.28.128.3:8300:8300 \
-p 172.28.128.3:8301:8301 \
-p 172.28.128.3:8301:8301/udp \
-p 172.28.128.3:8302:8302 \
-p 172.28.128.3:8302:8302/udp \
-p 172.28.128.3:8400:8400 \
-p 172.28.128.3:8500:8500 \
-p 172.17.42.1:53:53/udp \
-d -v /mnt:/data progrium/consul -server -advertise 172.28.128.3 -bootstrap-expect 3

Note that this is the exact command we ran on our first host to bootstrap the cluster. You can also try the following:

$ docker run --rm progrium/consul cmd:run 172.28.128.4:172.28.128.3 -d -v /mnt:/data

eval docker run --name consul -h $HOSTNAME \
-p 172.28.128.4:8300:8300 \
-p 172.28.128.4:8301:8301 \
-p 172.28.128.4:8301:8301/udp \
-p 172.28.128.4:8302:8302 \
-p 172.28.128.4:8302:8302/udp \
-p 172.28.128.4:8400:8400 \
-p 172.28.128.4:8500:8500 \
-p 172.17.42.1:53:53/udp \
-d -v /mnt:/data progrium/consul -server -advertise 172.28.128.4 -join 172.28.128.3

Here we passed 2 IPs to the cmd:run command, first the node’s own address (the one that will be used for the -advertise) and the second the IP of one of the nodes that is already in the cluster (the IP in the -join part). Note also that by specifying a second IP the cmd:run command now removed the -bootstrap-expect parameter, which makes sense because otherwise each node would start a different cluster.
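The rules behind those generated commands are simple enough to write down: publish Consul’s ports on the advertised IP, publish DNS only on the docker bridge, and choose between -bootstrap-expect and -join depending on whether a second IP was given. The Python sketch below mirrors those rules as described in this post. It is not the script shipped in the progrium/consul image, it leaves out the -h, -d and -v arguments for brevity, and consul_run_args with its parameters is a hypothetical name:

```python
CONSUL_PORTS = ["8300", "8301", "8301/udp", "8302", "8302/udp", "8400", "8500"]


def consul_run_args(advertise_ip, join_ip=None, bridge_ip="172.17.42.1", expect=3):
    """Build the docker run arguments for one Consul server container."""
    args = ["docker", "run", "--name", "consul"]
    for port in CONSUL_PORTS:
        number = port.split("/")[0]
        # Format: host_ip:host_port:container_port[/protocol]
        args += ["-p", "{}:{}:{}".format(advertise_ip, number, port)]
    # DNS is only published on the docker bridge interface.
    args += ["-p", "{}:53:53/udp".format(bridge_ip)]
    args += ["progrium/consul", "-server", "-advertise", advertise_ip]
    if join_ip is None:
        # First node: wait for the expected number of servers before electing.
        args += ["-bootstrap-expect", str(expect)]
    else:
        # Any other node: join a server that is already part of the cluster.
        args += ["-join", join_ip]
    return args


print(" ".join(consul_run_args("172.28.128.3")))
print(" ".join(consul_run_args("172.28.128.4", join_ip="172.28.128.3")))
```

Returning a list of arguments instead of a single string keeps the sketch easy to hand to something like subprocess.run without worrying about shell quoting.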

We can use the 2 forms of the “cmd:run” command above to bootstrap our cluster with a lot less typing. First, stop and remove all running containers on each host with the following command:

$ docker rm -f $(docker ps -aq)

Now, on the first host:

host-1$ $(docker run --rm progrium/consul cmd:run 172.28.128.3 -d -v /mnt:/data)

For the second node:

host-2$ $(docker run --rm progrium/consul cmd:run 172.28.128.4:172.28.128.3 -d -v /mnt:/data)

And the third node:

host-3$ $(docker run --rm progrium/consul cmd:run 172.28.128.5:172.28.128.3 -d -v /mnt:/data)

If you take a look at the logs in host-1 with docker logs consul you would see both nodes joining and finally Consul starting the cluster and setting the 3 nodes as healthy.

Working with Registrator

Now that we have our Consul cluster up and running we can start the registrator container with:

host-1$  export HOST_IP=$(ifconfig enp0s8 | grep 'inet ' | awk '{ print $2  }')
host-1$  docker run -d \
-v /var/run/docker.sock:/tmp/docker.sock \
--name registrator -h registrator \
progrium/registrator:latest consul://$HOST_IP:8500

Notice that we are mounting our “/var/run/docker.sock” file to the container. This file is a Unix socket, where the docker daemon listens for events. This is actually how the docker client (the docker command that you usually use) and the docker daemon communicate, through a REST API accessible from this socket. If you want to learn more about how you can interact with the docker daemon through this socket take a look here. The important thing to know is that by listening on the same socket as the Docker daemon, Registrator is able to know everything that happens with Docker on that host.

If you check the logs of the “registrator” container you’ll see a bunch of stuff and a message in the end indicating that it is waiting for new events. You should run the same commands on the other 2 hosts to start Registrator on those.

To summarize what we have done so far, we have 3 different hosts each running a Consul agent and a registrator container. The registrator instance on each host watches for changes in docker containers for that host and talks to the local Consul agent.

Starting our containers

Let’s see what happens when we run our python service from the first post in this Docker series. You can do this following the step by step guide on that post, getting the code from this repo and building the docker image yourself or using the image that is already on the public registry jlordiales/python-micro-service. I will go with the latter option here. We’ll first run our python container on host-1:

host-1$ docker run -d --name service1 -P jlordiales/python-micro-service

Let’s see what happened in our registrator container:

host-1$ docker logs registrator

2015/02/02 1826 registrator: added: a8dc2b849d99 registrator5000

Registrator saw that a new container (service1) was started, exposing port 5000 and it registered it with our Consul cluster. We’ll query our cluster now to see if the service was really added there:

host-1$ curl 172.28.128.3:8500/v1/catalog/services

{
  "consul":[],
  "consul-53":["udp"],
  "consul-8300":[],
  "consul-8301":["udp"],
  "consul-8302":["udp"],
  "consul-8400":[],
  "consul-8500":[],
  "python-micro-service":[]
}

There it is! Let’s get some more details about it:

host-1$ curl 172.28.128.3:8500/v1/catalog/service/python-micro-service

[
  {
    "Node":"host-1",
    "Address":"172.28.128.3",
    "ServiceID":"registrator5000",
    "ServiceName":"python-micro-service",
    "ServiceTags":null,
    "ServicePort":49154
  }
]

One important thing to notice here, as it has caused a lot of frustration to people before: Registrator used the IP of the host as the service IP rather than the IP address of the container. The reason for that is explained in this pull request to update the FAQ (which should be merged IMHO).

In a nutshell, registrator will always use the IP you specified when you run your consul agent with the -advertise flag. At first, this seems wrong, but it is usually what you want. A service in a Docker based production cluster typically has 3 IP addresses. The service itself is running in a Docker container, which has an IP address assigned by Docker. The host that it’s running on will have 3 IP addresses: one for the Docker network, an internal private IP address for all hosts in the cluster, and a public address on the Internet. Unless you’ve bridged your docker networks, the IP address of the service container is not accessible from other hosts in the cluster. Instead you use the “-P” or “-p” option to Docker to map the service port onto the host. You then advertise a Host IP as the service IP. The public IP address should be firewalled, so you want the internal private IP to be advertised.

Going back to the output of our last curl, we get the private IP of our “host-1” which is where our docker container is running with an exposed port (49154 in this case). With that information we could call our service from any other node in any host, as long as they are able to reach “host-1” through its private IP that is.

So what would happen now if we run a second “python-micro-service” container from our second host?

host-2$ docker run -d --name service2 -P jlordiales/python-micro-service

As we saw in the last post, whenever we have a Consul cluster running we can query any node (client or server) and the response should always be the same. Since we are running our containers in host-1 and host-2, let’s query the Consul node on host-3:

host-3$ curl 172.28.128.5:8500/v1/catalog/service/python-micro-service

[
  {
    "Node":"host-1",
    "Address":"172.28.128.3",
    "ServiceID":"registrator5000",
    "ServiceName":"python-micro-service",
    "ServiceTags":null,
    "ServicePort":49154
  },
  {
    "Node":"host-2",
    "Address":"172.28.128.4",
    "ServiceID":"registrator5000",
    "ServiceName":"python-micro-service",
    "ServiceTags":null,
    "ServicePort":49153
  }
]

We now have two containers offering the same service. Using this information we could call either one from host-3:

host-3$ curl 172.28.128.3:49154

Hello World from a8dc2b849d99

host-3$ curl 172.28.128.4:49153

Hello World from c9ca6addfdb0

Integrating our containers with Consul’s DNS

Let’s try one more thing: using Consul’s DNS interface from a different container to ping our service. We’ll run a simple busybox container in host-3:

host-3$ docker run --dns 172.17.42.1 --dns 8.8.8.8 --dns-search service.consul \
--rm --name ping_test -it busybox

The “--dns” parameter allows us to use a custom DNS server for our container. By default the container will use the same DNS servers as its host. In our case we want it to use the docker bridge interface (172.17.42.1) first and then, if it can not find the host there, go to Google’s DNS (8.8.8.8). Finally, the “--dns-search” option makes it easier to query for our services. For instance, instead of querying for “python-micro-service.service.consul” we can just query for “python-micro-service”. Let’s try to ping our service from the new busybox container:

$ ping -qc 1 python-micro-service

PING python-micro-service (172.28.128.4): 56 data bytes

--- python-micro-service ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 2.391/2.391/2.391 ms

It effectively resolved our service name to one of the hosts where it is currently running. If we keep running the same “ping” command multiple times we will eventually see it resolve the hostname to 172.28.128.3, which is the other host where our service is running. As explained in the documentation, Consul will load balance between all nodes running the same service as long as they are healthy.
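The short name works because of the search domain we passed when starting the busybox container: the resolver appends it to unqualified names before asking the DNS server. The Python sketch below shows the idea in a deliberately simplified form. Real resolvers also use an “ndots” threshold to decide in which order to try the candidates, and candidate_names is a made-up helper:

```python
def candidate_names(name, search_domains):
    """List the fully qualified names tried for a given lookup.

    Simplification: an unqualified name is tried with every search domain
    first and as-is last; a name ending in "." is never expanded.
    """
    if name.endswith("."):
        return [name]  # already fully qualified, the search list is skipped
    expanded = ["{}.{}.".format(name, domain) for domain in search_domains]
    return expanded + [name + "."]


print(candidate_names("python-micro-service", ["service.consul"]))
```

The first candidate, python-micro-service.service.consul, is the name that Consul’s DNS interface knows how to answer.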

Of course, if we stop a running container Registrator will notice it and also remove the service from Consul. We can see that if we stop the container running in host-1:

host-1$ docker stop service1

And then query again from host-3 like we did before (you can do the same from host-1, it doesn’t matter):

host-3$ curl 172.28.128.5:8500/v1/catalog/service/python-micro-service

[
  {
    "Node":"host-2",
    "Address":"172.28.128.4",
    "ServiceID":"registrator5000",
    "ServiceName":"python-micro-service",
    "ServiceTags":null,
    "ServicePort":49153
  }
]

Conclusion

In this post we have seen an approach that allows us to have our containers registered with the service discovery solution of our choice without the need to couple both. Instead, an intermediary tool called Registrator manages this for all the containers running on a particular host.

We used Vagrant to create 3 different virtual hosts all under the same private network. We started our 3-node Consul cluster, one consul container running on each host. We did the same thing for Registrator, one container running on each host pointing to its local consul container. Then we ran the container with our python endpoint. This container had no idea about Consul or Registrator. We used exactly the same docker run command that we would’ve used if we were running that container alone. And yet Registrator was notified about this new container and automatically registered it with the correct IP and port information on Consul. Moreover, when we ran another container in another host from the same docker image, Consul saw that it was the same service and started to load balance between them. When we stopped our container, Registrator also saw that and automatically deregistered it from the cluster.

This is amazing because we can keep our containers completely ignorant about how they will be discovered or any other piece of infrastructure information. We can keep them portable and we move the logic of registration to a separate component running in a separate container.

The capability to run multiple containers of the same service and have Consul automatically load balance between them, together with its health checks and its DNS interface, allows us to deploy and run really complex configurations of services in an extremely transparent and simplified way.

Where are my containers? Dockerized service discovery with Consul

Fri, 23 Jan 2015 00:00:00 +0000

In the previous post I talked a bit about Docker and the main benefits you can get from running your applications as isolated, loosely coupled containers. We then saw how to “dockerize” a small python web service and how to run this container in AWS, first manually and then using Elastic Beanstalk to quickly deploy changes to it. This was really good from an introduction to Docker point of view but in real life one single container running on a host will not cut it. You will need a set of related containers running together and collaborating, each with the ability to be deployed independently. This also means that you need a way to know which container is running what and where. In this post I wanted to talk a bit about service discovery. Particularly, I’m going to show how you can use Consul running as a container to achieve this goal in a robust and scalable way.

Consul

Consul came out of HashiCorp, the same company behind popular tools like Vagrant and Packer. They are pretty good at creating DevOps friendly tools so I take some time to play around with anything they come up with. Consul has several components that provide different functionalities but in a nutshell it is a highly distributed and highly available tool for service discovery. Clients can register new services with Consul, specifying a name and additional information in the form of tags and then query Consul for services that match their criteria using either HTTP or DNS. We’ll see an example later on.

In addition to clients specifying the services they want to register they can also specify any number of health checks. The health check can be made against your application (e.g., the REST endpoint is listening to connections on port X) or on the physical node itself (e.g., the CPU utilization is above 90%). Consul will use these health checks to know which nodes it should exclude when a client queries for a specific service.
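As a toy illustration of that last point, the following Python sketch drops every instance that has at least one failing check. The status strings mirror Consul’s passing, warning and critical states, but the data layout and the exact filtering rule are assumptions made for this example, not Consul’s implementation:

```python
def usable_instances(instances):
    """Keep the instances that have no check in the "critical" state."""
    return [
        inst for inst in instances
        if "critical" not in inst["checks"].values()
    ]


instances = [
    {"node": "host-1", "checks": {"http": "passing", "cpu": "passing"}},
    {"node": "host-2", "checks": {"http": "critical", "cpu": "passing"}},
    {"node": "host-3", "checks": {}},  # no checks defined at all
]
print([inst["node"] for inst in usable_instances(instances)])
```

In this sketch an instance without any checks is treated as usable, since nothing has reported it as failing.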

Finally, Consul also provides a highly scalable and fault tolerant Key/Value store, which your services can use for anything they want: dynamic configuration, feature flags, etc.

So how does it work? The main thing you need is a Consul agent running as a server. This Consul server is responsible for storing data and replicating it to other servers. You can have a fully functioning Consul system with just 1 server but that is usually a bad idea for a production deployment. Your server becomes your single point of failure and you can not discover your services if that server goes down. The Consul documentation recommends setting up a cluster with 3 or 5 Consul servers running to avoid data loss. More than that and the communication starts to suffer from progressively increasing overhead. In addition to running as a server, an agent can also run in client mode. These agents have far fewer responsibilities than servers and are pretty much stateless components.
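The preference for 3 or 5 servers follows from majority arithmetic: the servers need a majority (a quorum) to agree before a change is accepted, so what matters is how many of them can be lost while a majority is still reachable. A few lines of Python make the numbers concrete (the function names are just for illustration):

```python
def quorum(servers):
    """Smallest majority of a cluster with the given number of servers."""
    return servers // 2 + 1


def failure_tolerance(servers):
    """How many servers can be lost while a majority is still reachable."""
    return servers - quorum(servers)


for n in (1, 2, 3, 4, 5):
    print(n, "servers: quorum", quorum(n), "tolerates", failure_tolerance(n), "failures")
```

Going from 3 to 4 servers does not buy any extra failure tolerance, which is why odd cluster sizes are the usual choice.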

Usually, nodes wanting to register services running on them with Consul do so by registering them with their local running Consul agent. However, you can also register external services so you don’t need to run a Consul agent on every node that is hosting your services.
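To make the registration step a bit more tangible, here is a sketch of the kind of JSON document a node could send to its local agent. The field names follow Consul’s agent service registration endpoint (/v1/agent/service/register) as I understand it, while the helper name and the tag and port values are made up for this example. The sketch only builds the payload and does not talk to Consul:

```python
import json


def registration_payload(name, port, tags=None, service_id=None):
    """Build the JSON body describing one service instance."""
    payload = {"Name": name, "Port": port, "Tags": list(tags or [])}
    if service_id is not None:
        # Optional: lets several instances of the same service coexist.
        payload["ID"] = service_id
    return json.dumps(payload, sort_keys=True)


print(registration_payload("python-micro-service", 5000, tags=["v1"]))
```

Sending that body to the agent’s HTTP API is left out on purpose, since the rest of this series delegates registration to other tools.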

Queries can be made against any type of Consul agent, whether it is running as a server or as a client. Unlike servers, you can have thousands or tens of thousands of Consul clients without any significant impact on performance or network overhead.
I would strongly suggest taking a look at its documentation to get a more detailed explanation of how all of this works.

Running a single node cluster

And now, the fun part! Let’s see how we can bootstrap a Consul cluster using Docker containers. We’ll first run a Consul cluster consisting of a single server to see how it works. We’ll use the amazing image built by Jeff Lindsay:

$ docker run -p 8400:8400 -p 8500:8500 -p 8600:53/udp \
-h node1 progrium/consul -server -bootstrap

You should see something like:

==> WARNING: bootstrap mode enabled! do not enable unless necessary
==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
         Node name: 'node1'
        Datacenter: 'dc1'
            Server: true (bootstrap: true)
       Client Addr: 0.0.0.0 (HTTP: 8500, DNS: 53, RPC: 8400)
      Cluster Addr: 172.17.0.66 (LAN: 8301, WAN: 8302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false

==> Log data will now stream in as it occurs:

    2014/12/04 1930 [INFO] serf: EventMemberJoin: node1 172.17.0.66
    2014/12/04 1930 [INFO] serf: EventMemberJoin: node1.dc1 172.17.0.66
    2014/12/04 1930 [INFO] raft: Node at 172.17.0.66:8300 [Follower] entering Follower state
    2014/12/04 1930 [INFO] consul: adding server node1 (Addr: 172.17.0.66:8300) (DC: dc1)
    2014/12/04 1930 [INFO] consul: adding server node1.dc1 (Addr: 172.17.0.66:8300) (DC: dc1)
    2014/12/04 1930 [ERR] agent: failed to sync remote state: No cluster leader
    2014/12/04 1931 [WARN] raft: Heartbeat timeout reached, starting election
    2014/12/04 1931 [INFO] raft: Node at 172.17.0.66:8300 [Candidate] entering Candidate state
    2014/12/04 1931 [INFO] raft: Election won. Tally: 1
    2014/12/04 1931 [INFO] raft: Node at 172.17.0.66:8300 [Leader] entering Leader state
    2014/12/04 1931 [INFO] consul: cluster leadership acquired
    2014/12/04 1931 [INFO] consul: New leader elected: node1
    2014/12/04 1931 [INFO] raft: Disabling EnableSingleNode (bootstrap)
    2014/12/04 1931 [INFO] consul: member 'node1' joined, marking health alive
    2014/12/04 1933 [INFO] agent: Synced service 'consul'

The -server and -bootstrap flags tell Consul to start this agent in server mode and not wait for any other instances to join. Notice how Consul actually warns you about this when you start the server: bootstrap mode enabled! do not enable unless necessary.

We can now query Consul through its REST API. Since I’m running boot2docker I need to get the VM IP first:

$ export DOCKER_IP=$(boot2docker ip)
$ curl $DOCKER_IP:8500/v1/catalog/nodes

[{"Node":"node1","Address":"172.17.0.66"}]

You get a JSON response listing the nodes that are currently part of the Consul cluster, which in our case so far is just one. You can also go to http://192.168.59.103:8500/ (replace the IP with whatever your Docker host IP is) in your browser to see a nice UI with information about the currently registered services and nodes.
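If you would rather consume that response from code than from curl, a minimal Python 3 sketch could look like the following. It uses only the standard library; the function names are my own, and the base URL in the docstring is just the boot2docker default mentioned above.

```python
import json
from urllib.request import urlopen


def node_addresses(payload):
    """Map node name -> address from a /v1/catalog/nodes response body."""
    return {entry["Node"]: entry["Address"] for entry in json.loads(payload)}


def fetch_nodes(base_url):
    """Query a Consul agent, e.g. base_url='http://192.168.59.103:8500'."""
    with urlopen(base_url + "/v1/catalog/nodes") as response:
        return node_addresses(response.read().decode("utf-8"))


# The response shown above, parsed without touching the network.
sample = '[{"Node":"node1","Address":"172.17.0.66"}]'
print(node_addresses(sample))
```

Calling fetch_nodes with your own Docker host address should give the same mapping for a live agent.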

Let’s now add a new service. We usually want to register all the services that are under our control. But what about the external ones? It is rare that we don’t use any third-party services, and it would certainly be nice to treat both types equally from a service discovery point of view. We’ll start by adding an external service, following the example given in the documentation:

$ curl -X PUT -d \
'{"Datacenter": "dc1", "Node": "google", "Address": "www.google.com", "Service": {"Service": "search", "Port": 80}}' \
http://$DOCKER_IP:8500/v1/catalog/register

Here we registered the “google” node as offering the “search” service. But what if google is down for some reason? (can that happen?). We can register multiple search services:

$ curl -X PUT -d \
'{"Datacenter": "dc1", "Node": "bing", "Address": "www.bing.com", "Service": {"Service": "search", "Port": 80}}' \
http://$DOCKER_IP:8500/v1/catalog/register
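The two curl calls above could also be scripted. The Python 3 sketch below builds the same JSON bodies and, if you uncomment the last lines, sends them with a PUT. The helper names are hypothetical and the agent address is the boot2docker default, so adjust both to your setup.

```python
import json
from urllib.request import Request, urlopen


def external_service(node, address, service, port, datacenter="dc1"):
    """Build the body for PUT /v1/catalog/register, as in the curl calls above."""
    return {
        "Datacenter": datacenter,
        "Node": node,
        "Address": address,
        "Service": {"Service": service, "Port": port},
    }


def register(base_url, definition):
    """Send the registration to a Consul agent and return the HTTP status code."""
    request = Request(
        base_url + "/v1/catalog/register",
        data=json.dumps(definition).encode("utf-8"),
        method="PUT",
    )
    with urlopen(request) as response:
        return response.status


providers = [
    external_service("google", "www.google.com", "search", 80),
    external_service("bing", "www.bing.com", "search", 80),
]

# To actually register them (needs a reachable agent):
# for definition in providers:
#     register("http://192.168.59.103:8500", definition)
```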

We can now query Consul through its HTTP API to see all the services that are currently registered with it:

$ curl $DOCKER_IP:8500/v1/catalog/services

{"consul":[],"search":[]}

We can see that the “search” service that we added before is registered. Note that this listing doesn’t mention the 2 specific nodes (google and bing) that provide it. If we want to get more information about any particular service we can also do that:

$ curl $DOCKER_IP:8500/v1/catalog/service/search

[
  {"Node":"google","Address":"www.google.com","ServiceID":"search","ServiceName":"search","ServiceTags":null,"ServicePort":80},
  {"Node":"bing","Address":"www.bing.com","ServiceID":"search","ServiceName":"search","ServiceTags":null,"ServicePort":80}
]
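A client would typically turn that list into something it can connect to. Here is a small Python 3 sketch that extracts (host, port) pairs from the response above and picks one at random; the random choice is just a naive stand-in for whatever load-balancing policy you prefer, and the function names are my own.

```python
import json
import random


def endpoints(payload):
    """Turn a /v1/catalog/service/<name> response into (host, port) pairs."""
    return [(entry["Address"], entry["ServicePort"]) for entry in json.loads(payload)]


def pick_endpoint(payload, rng=random):
    """Naive client-side load balancing: choose one provider at random."""
    return rng.choice(endpoints(payload))


sample = """[
  {"Node":"google","Address":"www.google.com","ServiceID":"search","ServiceName":"search","ServiceTags":null,"ServicePort":80},
  {"Node":"bing","Address":"www.bing.com","ServiceID":"search","ServiceName":"search","ServiceTags":null,"ServicePort":80}
]"""
print(endpoints(sample))
```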

We can also use the DNS interface to query for services:

$ dig @$DOCKER_IP -p 8600 search.service.consul.

; <<>> DiG 9.8.3-P1 <<>> @192.168.59.103 -p 8600 search.service.consul.
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 1330
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 6, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;search.service.consul.         IN      A

;; ANSWER SECTION:
search.service.consul.  0       IN      CNAME   www.google.com.
www.google.com.         255     IN      A       173.194.42.243
www.google.com.         255     IN      A       173.194.42.242
www.google.com.         255     IN      A       173.194.42.241
search.service.consul.  0       IN      CNAME   www.bing.com.
any.edge.bing.com.      375     IN      A       204.79.197.200

;; Query time: 133 msec
;; SERVER: 192.168.59.103#8600(192.168.59.103)
;; WHEN: Thu Jan 22 1728 2015
;; MSG SIZE  rcvd: 258

Running a Consul cluster

OK, so we were able to run a single Consul agent in server mode and register an external service. But, as I mentioned before, this is usually a very bad idea for availability reasons. So let’s see how we could run a cluster with 3 servers, all of them running locally in different Docker containers.

We’ll start the first node similarly to the way we did it before:

$ docker run --name node1 -h node1 progrium/consul -server -bootstrap-expect 3

==> WARNING: Expect Mode enabled, expecting 3 servers
==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
         Node name: 'node1'
        Datacenter: 'dc1'
            Server: true (bootstrap: false)
       Client Addr: 0.0.0.0 (HTTP: 8500, DNS: 53, RPC: 8400)
      Cluster Addr: 172.17.0.75 (LAN: 8301, WAN: 8302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false

Note here that instead of passing the -bootstrap flag we are passing -bootstrap-expect 3, which tells Consul that it should wait until 3 servers have joined before actually starting the cluster. In order to join the cluster a node only needs to know the location of 1 node that is already part of it. So to join the second node we will need the IP of the first one (the only node we know of so far). We can get this IP using docker inspect and looking for the IPAddress field. Or you can just store it in a shell variable with:

$ JOIN_IP="$(docker inspect -f '{{.NetworkSettings.IPAddress}}' node1)"

We can now start our 2 remaining servers and join them with the first one:

$ docker run -d --name node2 -h node2 progrium/consul -server -join $JOIN_IP
$ docker run -d --name node3 -h node3 progrium/consul -server -join $JOIN_IP

After doing that you should see something like this on the node1 logs:

2014/12/06 0854 [INFO] serf: EventMemberJoin: node2 172.17.0.76
2014/12/06 0854 [INFO] consul: adding server node2 (Addr: 172.17.0.76:8300) (DC: dc1)
2014/12/06 0858 [ERR] agent: failed to sync remote state: No cluster leader
2014/12/06 0815 [INFO] serf: EventMemberJoin: node3 172.17.0.77
2014/12/06 0815 [INFO] consul: adding server node3 (Addr: 172.17.0.77:8300) (DC: dc1)
2014/12/06 0815 [INFO] consul: Attempting bootstrap with nodes: [172.17.0.75:8300 172.17.0.76:8300 172.17.0.77:8300]
2014/12/06 0816 [WARN] raft: Heartbeat timeout reached, starting election
2014/12/06 0816 [INFO] raft: Node at 172.17.0.75:8300 [Candidate] entering Candidate state
2014/12/06 0816 [WARN] raft: Remote peer 172.17.0.77:8300 does not have local node 172.17.0.75:8300 as a peer
2014/12/06 0816 [INFO] raft: Election won. Tally: 2
2014/12/06 0816 [INFO] raft: Node at 172.17.0.75:8300 [Leader] entering Leader state
2014/12/06 0816 [INFO] consul: cluster leadership acquired
2014/12/06 0816 [INFO] raft: pipelining replication to peer 172.17.0.77:8300
2014/12/06 0816 [INFO] consul: New leader elected: node1
2014/12/06 0816 [WARN] raft: Remote peer 172.17.0.76:8300 does not have local node 172.17.0.75:8300 as a peer
2014/12/06 0816 [INFO] raft: pipelining replication to peer 172.17.0.76:8300
2014/12/06 0816 [INFO] consul: member 'node3' joined, marking health alive
2014/12/06 0816 [INFO] consul: member 'node1' joined, marking health alive
2014/12/06 0816 [INFO] consul: member 'node2' joined, marking health alive
2014/12/06 0818 [INFO] agent: Synced service 'consul'

Basically, after the second node joins, Consul tells us that it cannot start the cluster yet. But after the third node joins, it tries to bootstrap the cluster, elects a leader node and marks the 3 nodes as healthy.

So now we have our 3-server cluster up and running. Note, however, that we did not specify any port mapping information on any of the three nodes. This means that we have no way of accessing the cluster from outside. Luckily this is not a problem, because with our cluster running we can now join any number of nodes in client mode and interact with the cluster through those clients. Let’s join the first client node with:

$ docker run -d -p 8400:8400 -p 8500:8500 -p 8600:53/udp \
--name node4 -h node4 progrium/consul -join $JOIN_IP

Note that we didn’t pass the -server parameter this time and we added the port mapping information. We can now interact with the cluster through our client node. We could, for instance, use the REST API to see all the nodes that are currently part of the cluster:

$ curl $DOCKER_IP:8500/v1/catalog/nodes

[
  {"Node":"node1","Address":"172.17.0.7"},
  {"Node":"node2","Address":"172.17.0.8"},
  {"Node":"node3","Address":"172.17.0.9"},
  {"Node":"node4","Address":"172.17.0.10"}
]

It is important to understand that we only need to know the address of 1 of the nodes (either server or client) to join. Until now we have used the JOIN_IP variable which contains the IP of node1 but we could just as easily add a new node using the IP of node4 for instance, which is a client:

$ docker run -d -p 8401:8400 -p 8501:8500 -p 8601:53/udp \
--name node5 -h node5 progrium/consul -join 172.17.0.10

Similarly, we can send our queries to any node in the cluster and the answer will always be the same thanks to Consul’s replication algorithms. Here we’ll use port 8501, which is the port exposed by the last client we joined:

$ curl $DOCKER_IP:8501/v1/catalog/nodes

[
  {"Node":"node1","Address":"172.17.0.7"},
  {"Node":"node2","Address":"172.17.0.8"},
  {"Node":"node3","Address":"172.17.0.9"},
  {"Node":"node4","Address":"172.17.0.10"},
  {"Node":"node5","Address":"172.17.0.11"}
]

This, combined with the fact that we can have thousands of clients in the cluster without any significant performance impact, makes Consul a highly available service discovery solution.

Key/Value store

In addition to its service discovery and health check capabilities, Consul offers a key/value store for whatever you may need. We can easily access it through its REST API. We’ll keep using the 5-node cluster we got running before. First, let’s make sure that there is nothing currently saved there:

$ curl -v  $DOCKER_IP:8500/v1/kv/key1

* About to connect() to 192.168.59.103 port 8500 (#0)
*   Trying 192.168.59.103...
* Adding handle: conn: 0x7fa72b811a00
* Adding handle: send: 0
* Adding handle: recv: 0
* Curl_addHandleToPipeline: length: 1
* - Conn 0 (0x7fa72b811a00) send_pipe: 1, recv_pipe: 0
* Connected to 192.168.59.103 (192.168.59.103) port 8500 (#0)
> GET /v1/kv/key1 HTTP/1.1
> User-Agent: curl/7.30.0
> Host: 192.168.59.103:8500
> Accept: */*
>
< HTTP/1.1 404 Not Found
< X-Consul-Index: 50
< X-Consul-Knownleader: true
< X-Consul-Lastcontact: 0
< Date: Tue, 20 Jan 2015 0607 GMT
< Content-Length: 0
< Content-Type: text/plain; charset=utf-8
<
* Connection #0 to host 192.168.59.103 left intact

We got back a 404 because the key doesn’t exist yet, great! Let’s now add a value for key1 and query again:

$ curl -X PUT -d 'test' http://$DOCKER_IP:8500/v1/kv/key1

$ curl -v  $DOCKER_IP:8500/v1/kv/key1

* About to connect() to 192.168.59.103 port 8500 (#0)
*   Trying 192.168.59.103...
* Adding handle: conn: 0x7fb9a3817e00
* Adding handle: send: 0
* Adding handle: recv: 0
* Curl_addHandleToPipeline: length: 1
* - Conn 0 (0x7fb9a3817e00) send_pipe: 1, recv_pipe: 0
* Connected to 192.168.59.103 (192.168.59.103) port 8500 (#0)
> GET /v1/kv/key1 HTTP/1.1
> User-Agent: curl/7.30.0
> Host: 192.168.59.103:8500
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: application/json
< X-Consul-Index: 55
< X-Consul-Knownleader: true
< X-Consul-Lastcontact: 0
< Date: Tue, 20 Jan 2015 0631 GMT
< Content-Length: 93
<
* Connection #0 to host 192.168.59.103 left intact
[{"CreateIndex":50,"ModifyIndex":55,"LockIndex":0,"Key":"key1","Flags":0,"Value":"dGVzdA=="}]

Note that the Value field is base64 encoded. According to the documentation this is to allow non UTF-8 characters.
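Decoding the value on the client side takes a couple of lines. The Python 3 sketch below parses the response shown above; the function name is my own, and decoding the bytes as UTF-8 is an assumption that only holds if you stored text in the first place.

```python
import base64
import json


def decode_kv(payload):
    """Map key -> decoded value from a /v1/kv/<key> response body."""
    result = {}
    for entry in json.loads(payload):
        raw = entry["Value"]
        # Defensive: treat a null Value as "no value" instead of failing.
        if raw is None:
            result[entry["Key"]] = None
        else:
            # Assumes the stored bytes are UTF-8 text.
            result[entry["Key"]] = base64.b64decode(raw).decode("utf-8")
    return result


sample = '[{"CreateIndex":50,"ModifyIndex":55,"LockIndex":0,"Key":"key1","Flags":0,"Value":"dGVzdA=="}]'
print(decode_kv(sample))  # {'key1': 'test'}
```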

Earlier we saw that we could query any node in the cluster for registered services or a list of nodes and the answer would be the same. It’s no surprise that this also applies to the key/value store. We can add a key and query for one from any node. In our example, we could use curl -v $DOCKER_IP:8501/v1/kv/key1 (changing the port to 8501 to query a different node than the one we used for the PUT) and we would get exactly the same answer from Consul.

Conclusion

In the last post we saw an overview of Docker and its benefits. These are really easy to see when you consider a service running in a single container. But when you start to throw in hundreds or thousands of containers, things start to get a bit more complicated. One of the first things you need is to know where each container lives and what services it offers. You also need some basic form of health check to be sure that you don’t try to send requests to containers that are either not able to reply or are not there anymore. Consul, a highly scalable and efficient service discovery tool, solves these problems in a very elegant way. Of course, there are many other alternatives out there with different capabilities, like etcd or SkyDNS. I haven’t had a chance to play around with those yet so I don’t have an informed opinion about them.

One thing that we haven’t talked about yet is how you would go about registering your containers. By this I don’t mean the Consul-specific way of registering services but rather a more general question: who is responsible for doing this? Should the container know how to register itself with the cluster? Should the operator running the container do this? Someone else? All these approaches have pros and cons. In the next post I’ll discuss these options and show a really amazing tool from Jeff Lindsay that makes it incredibly easy and transparent to deal with container registration.

Running Docker in AWS with Elastic Beanstalk

Sun, 07 Dec 2014 00:00:00 +0000

By now I would imagine that Docker needs no introduction, given that it is one of the hottest technologies, and indeed buzzwords, in the industry today. But just in case, we’ll go over the basics. We’ll also see how you can quickly run a Docker container in AWS and how you can easily deploy your changes to it.

Introduction

The official documentation defines Docker as “an open platform for developing, shipping, and running applications”. What does that really mean? In simple terms it means that instead of thinking about your application as only the code that you write and then somehow gets deployed into some server in the “cloud”, you can start thinking about your application and those things that it needs to run as a single isolated container that you can just throw at any server and it will work, regardless of what that server already had installed or not.

When you hear about isolation, the first thing that probably comes to mind is virtual machines. The problem with VMs is that they are usually a bit heavyweight. Even on a pretty decent laptop it usually takes a few minutes for a VM to start. A Docker container, on the other hand, starts in a matter of seconds. In a very simplified view of the world, you can see Docker containers as lightweight VMs (although in reality they are much, much more than that). Each container can have its own OS userland (it shares the kernel with the host), its own files, its own processes and so on. Also, unlike VMs, where you can probably run just a few of them on a regular piece of hardware, you can easily run dozens of Docker containers on your laptop.

The benefits of using containers for your applications are varied and people are still finding new and innovative ways to put them to good use. Perhaps one of the main benefits is related to deployment and portability. You know the dreaded “It works on my machine” phrase, don’t you? Imagine the following workflow:

1. You develop and test on your local box, running your application and all the services it requires in their own isolated environment.
2. Once you are happy with your code you push your changes, and the same set of containers that you were running locally now runs in a testing environment.
3. Once you validate your changes in this testing environment you push the same container to be run in Production.

Because of the underlying principles of Docker, regardless of the differences between your local box, the testing environment and the production environment, your containers (and therefore your application) will run in exactly the same way.

What happens if you decide to move to a different provider for your production environment? Say you were running in AWS and suddenly your whole company migrates to OpenStack. As long as the new server is able to run Docker containers, it doesn’t matter. Your containers will still run in exactly the same way as before.

Another huge benefit that I see in Docker is the fact that not only your code runs in a container but also the infrastructure that your code needs, usually in different containers. This, combined with the fact that the community has already created thousands of Docker images for all sorts of popular applications (publicly available in Docker Hub), means that you can save yourself a lot of trouble. Say your application needs to use Redis as its backing store. Do you install Redis locally on your box to test while you develop and then make sure that the same version of Redis with the same configuration is installed in each and every new environment? Or do you get an official Redis image and run that same image in every environment with just one command?

A really simple Docker app

Due to the way that Docker works, it needs a Linux kernel to run on. This is obviously not a problem if you are actually using a Linux distribution, but it can be an issue if you are on Windows or Mac. The official documentation covers Docker installation on a lot of different platforms. Personally, since I mostly use a Mac these days, I really recommend boot2docker. It provides a very tiny VM (literally: it is based on Tiny Core Linux) where you can run Docker from your Mac terminal (almost) as if you were running it locally.

Enough introductions, let’s see an example. We are going to develop a really simple REST service and run it inside a Docker container. We are going to use Python with Flask for this, simply because it’s really easy to get up and running in no time, but the language and framework of choice are not really important here. We could have used Java with Dropwizard or Ruby with Sinatra and the result would be the same. If you don’t want to write the code you can just clone the app from here.

So our application will consist of one app.py file that looks like the following:

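import socket

from flask import Flask

app = Flask(__name__)


@app.route('/')
def hello():
    # The hostname shows which machine (or container) served the request.
    return 'Hello World from %s' % socket.gethostname()


if __name__ == "__main__":
    # Bind to all interfaces so the app is reachable from outside a container.
    # Flask listens on port 5000 by default.
    app.run(host="0.0.0.0", debug=True)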

In order to get the dependencies we need for our app (just flask in this case) and comply with the 12 factor app we’ll also create a requirements.txt file with just one line:

flask

With these 2 files we can already run our application with:

$ pip install -r requirements.txt
$ python app.py

The application will start a server listening on port 5000. If you run curl localhost:5000 from a different shell you should see a hello world message as a response.

Let’s now “dockerize” our application. The easiest way to do this is to write a Dockerfile with the steps we want to take. I won’t go into a lot of detail about Dockerfiles but you can read about them here. Instead, I’m going to show you the Dockerfile that we’ll use and describe what each step is doing:

FROM python:2.7
EXPOSE 5000
ADD . /code
WORKDIR /code
RUN pip install -r requirements.txt
CMD ["python", "app.py"]

On the first line we are saying that our new image will use the official python image as a base image. The 2.7 part is called a tag and in this case represents the version of Python that we want installed in our image. Next we tell Docker that the container will expose port 5000 to the external world (which is the port where our endpoint will be listening). We’ll see how this is useful in a moment. The third instruction tells Docker to copy the current directory (where the Dockerfile is placed) to /code inside the new image. Then, in the following step, we tell Docker that we want to cd to that /code directory so that all commands we run after that are executed from within that path. Next we run pip install to install our dependencies into the image and finally we tell Docker that the default command to run should be python app.py.

Now that we have our Dockerfile, we can build an image from it using the docker build command: docker build -t python_service . (the trailing dot is the build context, i.e. the current directory). This step can take a while the first time you run it because it will need to download the python base image first (which is currently around 850 MB, a bit too much if you ask me). When the command finishes you should be able to see your shiny new image after running docker images.

So far you have an image, but not a running container. To run this new image you’ll have to do a docker run --rm --name service1 -P python_service. The --rm parameter tells Docker to delete the container after it has stopped running, which is useful for cases where you are creating lots of different containers to do quick tests because it will save you quite a bit of disk space. Next, we give our container a name. This is completely optional and if we don’t specify a name then Docker will assign the container one by default. The -P parameter means that we want to map every port exposed by the container (in our case just port 5000) to a port on our host. Since we don’t specify any particular port on the host, Docker will randomly assign one. Another alternative, if we want to explicitly tell Docker which port to use, would be to pass the parameter -p $HOST_PORT:$CONTAINER_PORT. But it is usually a good idea not to do that because we might want to run multiple instances of the same container and they all have to map to different ports. So it’s usually better to let Docker decide.

After running the docker run command the container will start and you’ll see that it will run our app which is going to be waiting for connections to it. If you now run docker ps on a different shell you’ll see something like this:

CONTAINER ID        IMAGE                     COMMAND             CREATED             STATUS              PORTS                     NAMES
86b331209845        python_service:latest   "python app.py"     2 days ago          Up 12 hours         0.0.0.0:49162->5000/tcp   python1

Pay special attention to the PORTS part. In your Dockerfile you told Docker that you were exposing port 5000. But remember that when you ran the container with docker run you specified the -P flag, which meant that the ports exposed by the container would be mapped to random high ports on the host. In our case, the 49162->5000/tcp part is telling us that port 49162 on the host maps to port 5000 on the container.

So how do we curl our service now? If we were running Docker natively on a Linux machine we could just do a curl localhost:49162. But since I’m running on boot2docker, I need to get the IP of the boot2docker VM first. This can be easily done with boot2docker ip. Unless you’ve done something to change the default values this IP will be 192.168.59.103. So if we now run curl 192.168.59.103:49162 we get back our awesome hello world message from our running container.
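If you script this step, the mapping has to be read back from Docker, since the host port is assigned at random. Here is a minimal Python 3 sketch that parses a PORTS entry like the one above and builds the URL to curl. The function names are my own and the regular expression only covers the IPv4 format shown in this post.

```python
import re

# Matches entries such as "0.0.0.0:49162->5000/tcp" from the PORTS column.
PORT_MAPPING = re.compile(
    r"(?P<host_ip>[\d.]+):(?P<host_port>\d+)->(?P<container_port>\d+)/(?P<proto>\w+)"
)


def host_port(ports_column, container_port):
    """Return the host port mapped to `container_port`, or None if unmapped."""
    for match in PORT_MAPPING.finditer(ports_column):
        if int(match.group("container_port")) == container_port:
            return int(match.group("host_port"))
    return None


def service_url(docker_host_ip, ports_column, container_port=5000):
    """Build the URL to curl, given the Docker host IP (e.g. from boot2docker ip)."""
    port = host_port(ports_column, container_port)
    if port is None:
        raise ValueError("container port %d is not published" % container_port)
    return "http://%s:%d/" % (docker_host_ip, port)


print(service_url("192.168.59.103", "0.0.0.0:49162->5000/tcp"))
```

On the command line, docker port service1 5000 prints the same mapping directly, so the parsing is only needed when all you have is the docker ps output.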

Running manually in AWS

By now you have an independent Docker image with your Python service which you can easily run and query with a simple curl. But so far we have only run the container locally, which doesn’t seem like a big improvement over just running the application directly. And indeed it wouldn’t make a lot of sense to do all this work if we were only going to run things locally. So let’s see what happens when we want to run the same container somewhere else, like AWS for instance, given that they give us a free year with their free tier. This AWS server could be a QA environment, for example.

Before we go into AWS we are going to use the Docker registry to push the image we created before, which is based off the python image and has our app and dependencies all bundled together. The documentation is pretty clear about how you work with the registry so I’ll skip that part (hint: you should use the docker push command).

Now that our image is on the public Docker registry, let’s create a micro instance on EC2. Create a new account if you don’t have one yet, go to your EC2 dashboard, then Instances, and click on Launch instance. Choose the Amazon Linux AMI that shows up on the first step, select t2.micro for the instance type, and leave all the default options on the next steps until Step 6, where you’ll create a security group for your instance specifying which ports you are going to leave open to the internet. In our case we want port 22 to be able to ssh into our instance. We also want to add a new custom rule to open the high ports that Docker usually uses. So we’ll create a new TCP rule with a Port Range of 30000-50000. Finally, review your instance and Launch!

Once the instance is running, ssh into it using its public IP, install Docker with sudo yum install -y docker and start it with sudo service docker start. Now that we have Docker installed in our new instance we can use it in exactly the same way that we did locally. Since we pushed our image into the Docker registry before, we can do a sudo docker pull your_user/repo_name and then run our container similarly to how we did before with sudo docker run --rm --name service1 -p 45000:5000 your_image_name. Our container is now running and the host (the EC2 instance in this case) will map its port 45000 to the container’s port 5000 thanks to the -p argument. You can now go to your local shell, run curl your_ec2_public_ip:45000 and… get a “Hello World” message back.

Let’s take a moment to think about what just happened. We first created a Docker image and ran it locally to test our app. We then pushed that image into the public registry, pulled it from a freshly created EC2 instance and ran it in exactly the same way. Since all the dependencies were already included in the image, we didn’t need to do anything on the server apart from installing Docker. Same application, same Docker image, same version, same behavior and two completely different environments. What would happen now if we wanted to run the application on a different server? Maybe an OpenStack server, or DigitalOcean, or an old laptop at home that is not doing anything else. It would be exactly the same! The only thing that we need in every case is a Linux kernel and the Docker daemon running.

This is certainly great progress but what would happen if we want to make some changes to our code and then redeploy the container again? We would have to push the image again into the public repository, ssh into our instance, stop the running container, pull the new image and finally run it again. Wouldn’t it be great if Amazon could handle all of this for us? Meet Elastic Beanstalk!
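To make the amount of manual work concrete, the small Python 3 sketch below just prints those steps as shell commands, in order. The image name is the same placeholder used earlier, the container name and ports match the docker run example above, and the docker build step is my own addition, since a code change has to be baked into a new image before it can be pushed.

```python
def redeploy_commands(image, container="service1", host_port=45000, container_port=5000):
    """Return the manual redeploy steps as shell commands, in order.

    The first list runs on your machine; the second runs on the EC2
    instance after you ssh into it.
    """
    local = [
        "docker build -t %s ." % image,  # assumed step: rebuild with the new code
        "docker push %s" % image,
    ]
    remote = [
        "sudo docker stop %s" % container,
        "sudo docker pull %s" % image,
        "sudo docker run --rm --name %s -p %d:%d %s"
        % (container, host_port, container_port, image),
    ]
    return local, remote


local, remote = redeploy_commands("your_user/repo_name")
for command in local + remote:
    print(command)
```

Five commands across two machines for every change is exactly the kind of routine that begs to be automated.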

Using Elastic Beanstalk

Elastic Beanstalk is a service provided by Amazon to quickly deploy and manage your applications. You don’t need to worry about creating instances or things like load balancing and auto scaling. It can be used to run Web Applications in a variety of languages and, of course, Docker containers.

We are going to use Elastic Beanstalk to run our app and see how we can deploy new versions of it. We’ll first need the eb tool, which can be installed using brew with brew install aws-elasticbeanstalk. Next we’ll need to initialize a git repository with the app code (git init .) and commit what we have so far (git add . followed by git commit -m "Initial commit"; a plain git commit -a would skip the new, still untracked files). Now we can run eb init to configure Elastic Beanstalk. This is a one-time process that tells EB how to deploy our application. The steps are pretty self-explanatory. Just make sure that you select “64bit Amazon Linux 2014.09 v1.0.10 running Docker” when you are asked for the solution stack.

Once that is done you can run eb start. It will ask you if you want to deploy the last commit of your app, to which you’ll respond yes. You can then follow the logs shown on the console to get an idea of what eb is doing. This will include creating an S3 bucket, an Elastic IP address and a security group for your instance, and finally launching your instance. In the end, you will see a message indicating that your application was deployed and is ready to be accessed. Something like: “Application is available at …” and some URL. Go to that URL and see the glorious Hello World message once again.

Now, let’s change the message in our app to say something different. Edit the app.py file and change the return line to return 'Hello Docker World from %s' % socket.gethostname(). Save and commit your changes to git. Now we’ll use eb to deploy our app once again. Run eb push and Elastic Beanstalk will deploy the new version of your app. When that finishes, go back to the same URL as before and see the updated message.

Interestingly, if you go to your AWS Dashboard and then to Elastic Beanstalk you will be able to see, among other things, all the different versions of your app that you have ever deployed, so rollbacks are trivial. And this is all thanks to the fact that EB created a new Docker image every time you deployed a new version. When you are done you can do an eb delete to clean up all the resources that were created.

And that is all there is to it. Your changes get easily deployed with just one command and, as an added bonus, you get to keep a record of all the different versions of your app for easy rollback.

Conclusion

Docker is revolutionizing the way we think about development and deployment, especially at a time when more and more applications are being architected as small, loosely coupled services. This is still a pretty new area with lots of exciting tools under heavy development and increasing support from big players in the industry. Amazon, for instance, announced its EC2 Container Service at the last re:Invent conference to offer easy support for Docker containers, treating them as first-class citizens inside the AWS ecosystem. Similarly, at DockerCon Europe 2014 Docker announced, among other things, support for Docker Hub in the enterprise and a set of alpha tools to support easy Docker host provisioning, clustering and orchestration. Besides Docker, other container runtime technologies are making their way onto the scene. Where all these technologies and tools will end up no one knows, but one thing is for sure: containers are here to stay and they will have a bigger and bigger impact in the future. So start playing around with them!

Lambdas and Functional interfaces in Java 8

Tue, 18 Nov 2014 00:00:00 +0000

In the previous post we saw an overview of what functional programming is and how the new features of Java 8 allow developers to write their applications using a more functional style. One of the main points in this new version of the language was the introduction of lambdas. Together with lambdas came the use of functional interfaces and method references. This post will explore these features in more detail, showing when to use them, the restrictions around them and how you can use them to make your code more readable and concise.

Lambdas

First things first, what is a lambda (or lambda expression)? A lambda is an anonymous method: it doesn’t have a name, but it has a list of parameters, a body, a return type and potentially a list of exceptions that it can throw. Unlike regular class methods, lambdas are not actually associated with any class. They can also be assigned to variables or passed as arguments to other methods. The name lambda expression comes from the lambda calculus, a branch of mathematics.

We saw a lambda expression in the previous post, in an example that uses the File class to list CSV files:

File[] csvFiles = new File(".")
                    .listFiles(pathname -> pathname.getAbsolutePath().endsWith("csv"));

Here, we are passing a lambda expression to the listFiles method that takes one input parameter and returns a boolean value. I also mentioned that you can assign lambdas to variables, so the previous code is functionally equivalent to:

FileFilter csvFilter = pathname -> pathname.getAbsolutePath().endsWith("csv");
File[] csvFiles = new File(".").listFiles(csvFilter);

How did we use to do that before Java 8? Like this:

File[] csvFiles = new File(".").listFiles(new FileFilter() {
    @Override
    public boolean accept(File pathname) {
      return pathname.getAbsolutePath().endsWith("csv");
    }
});

You have to admit that the snippet using a lambda expression looks much more concise and cleaner. In the last snippet we have to create an anonymous class with an accept method (with all the verbosity that it implies). In the first one, we just need to specify our logic.

This raises an interesting question: if the listFiles method takes a parameter of type FileFilter (which is an interface), how come we can pass a lambda instead? We can do this because the FileFilter interface is a functional interface.

Functional interfaces

In a nutshell, a functional interface is an interface that specifies exactly one abstract method. So the FileFilter interface we saw before is specified as:

@FunctionalInterface
public interface FileFilter {
  boolean accept(File pathname);
}

Another example is the Runnable interface:

@FunctionalInterface
public interface Runnable {
  public abstract void run();
}

You can see that both of these interfaces have a @FunctionalInterface annotation. So what does that do? First it informs people who look at that interface that it is intended to be a functional interface and that they can use lambdas and method references wherever they are expected. Second, it works as a compile-time check to make sure that the interface is indeed functional. If you add this annotation to your interface and it is not in fact functional then you will get a nice compile error letting you know this. It is worth noting that the annotation is not actually required but it is usually a good idea to have it there for the reasons I mentioned before.

There’s one small caveat here. We briefly discussed default methods in the previous post, which are methods whose implementation code can be written in an interface. Default methods do not count for the “exactly one abstract method” rule of functional interfaces so you can effectively have a functional interface with one abstract method and one or more default methods.
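
As a rough sketch of that caveat, the following uses a made-up Validator interface (the name and its methods are assumptions for illustration, not part of the JDK). It has one abstract method and one default method, so the @FunctionalInterface check still passes and a lambda can implement it:

```java
public class FunctionalInterfaceSketch {

  // Hypothetical interface for illustration: exactly one abstract method
  // (validate) plus a default method, so it still counts as functional.
  @FunctionalInterface
  interface Validator {
    boolean validate(String input);

    // Default methods carry their own implementation and do not count
    // toward the "exactly one abstract method" rule.
    default Validator not() {
      return input -> !validate(input);
    }
  }

  public static void main(String[] args) {
    Validator isEmpty = input -> input.isEmpty();
    Validator hasText = isEmpty.not();

    System.out.println(isEmpty.validate(""));    // true
    System.out.println(hasText.validate(""));    // false
    System.out.println(hasText.validate("csv")); // true
  }
}
```

Adding a second abstract method to Validator would make the compiler reject the annotation, which is the compile-time check described above.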

Some useful functional interfaces

Now that we know what a functional interface is and how it can be used, let’s look at some pretty useful interfaces provided by Java in its java.util.function package.

Predicate

The predicate interface defines a simple test method that takes an object and returns a boolean. It looks something like:

@FunctionalInterface
public interface Predicate<T> {
    boolean test(T t);
}

This is pretty useful for things like filtering. You could have a generic method to filter a list (this is just an example, you don’t need to write this logic and we’ll see why when we go into streams):

public static <T> List<T> filter(List<T> list, Predicate<T> predicate) {
  List<T> result = new ArrayList<>();
  for (T elem : list) {
    if (predicate.test(elem)) {
      result.add(elem);
    }
  }
  return result;
}

And then create a predicate for your particular object. For example, a predicate that, given a User, returns true if the user’s age is greater than or equal to 18:

public enum Sex {
  MALE, FEMALE
}

public class User {
  private final int age;
  private final String name;
  private final Sex sex;

  public User(int age, String name, Sex sex) {
    this.age = age;
    this.name = name;
    this.sex = sex;
  }

  public int getAge() {
    return age;
  }

  public String getName() {
    return name;
  }

  public boolean isMale() {
    return Sex.MALE.equals(sex);
  }
}

Predicate<User> predicate = user -> user.getAge() >= 18;
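
To see the two pieces working together, here is a minimal sketch that calls the filter method with a lambda. It uses a plain list of ages instead of User objects to stay short, and the FilterUsage class name is just a placeholder:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class FilterUsage {

  // Same generic filter as shown earlier, repeated so the sketch compiles on its own.
  static <T> List<T> filter(List<T> list, Predicate<T> predicate) {
    List<T> result = new ArrayList<>();
    for (T elem : list) {
      if (predicate.test(elem)) {
        result.add(elem);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<Integer> ages = Arrays.asList(12, 18, 35, 17);

    // The lambda is converted to a Predicate<Integer> by the compiler.
    List<Integer> adults = filter(ages, age -> age >= 18);

    System.out.println(adults); // [18, 35]
  }
}
```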

A pretty useful feature of predicates is that they can be composed to form more complex ones. For instance, what do you do if suddenly you want a new User predicate that returns true for all users who are younger than 18? Do you create a new predicate like the previous one, changing the >= to <? Luckily, you don’t have to, because the Predicate interface provides 3 methods to compose predicates: and, or and negate. So the previous example could be written as:

Predicate<User> older = user -> user.getAge() >= 18;
Predicate<User> younger = older.negate();

Similarly, if we want a predicate that returns true for all male users aged 18 or older, we could write it as:

Predicate<User> older = user -> user.getAge() >= 18;
Predicate<User> adultMales = older.and(User::isMale);
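
The snippets above cover negate and and. The sketch below puts all three composition methods side by side, including or. The User class here is a trimmed-down stand-in (an assumption made so the example compiles on its own), not the full class from earlier:

```java
import java.util.function.Predicate;

public class PredicateComposition {

  // Trimmed-down stand-in for the User class above, just for this sketch.
  static class User {
    final int age;
    final boolean male;

    User(int age, boolean male) {
      this.age = age;
      this.male = male;
    }

    int getAge() { return age; }
    boolean isMale() { return male; }
  }

  static final Predicate<User> ADULT = user -> user.getAge() >= 18;
  static final Predicate<User> MALE = User::isMale;

  // and: both conditions must hold
  static final Predicate<User> ADULT_MALE = ADULT.and(MALE);
  // or: at least one condition must hold
  static final Predicate<User> ADULT_OR_MALE = ADULT.or(MALE);
  // negate: flips the result of the original predicate
  static final Predicate<User> MINOR = ADULT.negate();

  public static void main(String[] args) {
    User boy = new User(12, true);
    User woman = new User(30, false);

    System.out.println(ADULT_MALE.test(boy));      // false
    System.out.println(ADULT_OR_MALE.test(boy));   // true
    System.out.println(MINOR.test(boy));           // true
    System.out.println(ADULT_MALE.test(woman));    // false
    System.out.println(ADULT_OR_MALE.test(woman)); // true
    System.out.println(MINOR.test(woman));         // false
  }
}
```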

That last example shows that we can use method references where a Predicate is expected. In fact, we can use a method reference wherever a functional interface is expected. We quickly saw method references in the previous post but we’ll discuss more about them later on.

Function

The java.util.function.Function interface is defined as:

@FunctionalInterface
public interface Function<T, R> {
  R apply(T t);
}

What this basically does is take an input of type T and transform it somehow to return an object of type R. Note that the Predicate interface can be seen as a special case of a Function where R is always a boolean value. Following our User examples, imagine we want a function that, given a User instance, returns the length of that user’s name. We could write this function like this:

Function<User,Integer> nameLength = user -> user.getName().length();

Like predicates, the Function interface also has some useful methods to compose several functions. The two methods offered are compose and andThen. The difference between them is subtle but important. To understand this better, imagine we have the following 2 functions:

Function<Integer,Integer> sumOne = number -> number + 1;
Function<Integer,Integer> duplicate = number -> number * 2;

We can then create 2 new functions in the following way:

Function<Integer, Integer> composed = sumOne.compose(duplicate);
Function<Integer, Integer> andThen = sumOne.andThen(duplicate);

System.out.println(composed.apply(2));
System.out.println(andThen.apply(2));

The composed function will first apply duplicate and then apply sumOne on the result. In other words, composing sumOne with duplicate will result in sumOne(duplicate(x)) and the first System.out will print 5. The andThen function will do exactly the opposite, it will first apply sumOne and then apply duplicate on the result. In this case the second System.out will print 6.
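
The two functions above share the same input and output type, but that is not a requirement. As a small sketch (the constant names are placeholders), the output type of the first function only has to match the input type of the next one:

```java
import java.util.function.Function;

public class FunctionChaining {

  static final Function<String, String> TRIM = String::trim;
  static final Function<String, Integer> LENGTH = String::length;

  // andThen reads left to right: trim first, then measure.
  static final Function<String, Integer> TRIMMED_LENGTH = TRIM.andThen(LENGTH);

  // compose reads right to left: the argument (TRIM) runs first.
  static final Function<String, Integer> SAME_WITH_COMPOSE = LENGTH.compose(TRIM);

  public static void main(String[] args) {
    System.out.println(TRIMMED_LENGTH.apply("  java 8  "));    // 6
    System.out.println(SAME_WITH_COMPOSE.apply("  java 8  ")); // 6
  }
}
```

Both chains do the same work here, so picking between compose and andThen is mostly about which reading order feels more natural at the call site.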

Consumer

The java.util.function.Consumer interface defines an accept method that takes a parameter of type T and returns no value. In other words:

@FunctionalInterface
public interface Consumer<T> {
    void accept(T t);
}

This interface is useful when you want to access an element and perform some operation on it. For instance, starting with Java 8, lists have a forEach method that takes a Consumer and applies it to each element in the list.

So imagine that you want to print to System.out each element on a list. You could do that in the following way:

List<String> users = Arrays.asList("java","8","rocks");
users.forEach(elem -> System.out.println(elem));

The implementation of the forEach method is actually quite straightforward:

default void forEach(Consumer<? super T> action) {
  for (T t : this) {
    action.accept(t);
  }
}
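
Consumers can also be stored in variables and chained. Consumer offers an andThen method (not covered above) that runs one consumer after another on the same element. A minimal sketch, where collectUpperCase is a made-up helper name:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

public class ConsumerSketch {

  // Prints every word and also records its upper-case form.
  static List<String> collectUpperCase(List<String> words) {
    List<String> seen = new ArrayList<>();

    Consumer<String> print = System.out::println;
    Consumer<String> remember = word -> seen.add(word.toUpperCase());

    // andThen runs both consumers, in order, for each element.
    words.forEach(print.andThen(remember));
    return seen;
  }

  public static void main(String[] args) {
    List<String> result = collectUpperCase(Arrays.asList("java", "8", "rocks"));
    System.out.println(result); // [JAVA, 8, ROCKS]
  }
}
```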

Primitive functional interfaces

We saw a few generic and quite useful functional interfaces provided by the language: Predicate, Function and Consumer. They work well for most cases where you want to use these interfaces with your own classes. But what happens when you need something like this for primitive types such as int, double or boolean?

In Java, each primitive type has a corresponding wrapper class. So int has an Integer class and boolean has a Boolean. Additionally, Java can handle conversions between these types for you automatically. This concept, known as autoboxing/unboxing is what allows you to write code like this:

List<Integer> numbers = new ArrayList<>();
for (int i = 0; i < 10; i++) {
  numbers.add(i);
}

This lets developers write less code because they don’t need to worry about explicitly converting one type to the other. However, there is a performance cost involved. It is probably not a big deal if you do it occasionally here and there, but when you are boxing or unboxing on every iteration over a big list you will see a difference.

Going back to our functional interfaces, say you want to define a predicate that takes an int and returns a boolean telling us whether the number is odd or not. You cannot define a Predicate<int> because int is a primitive and type parameters must be reference types, but you could do something like this:

Predicate<Integer> isOdd = i -> i % 2 != 0; // != 0 also handles negative odd numbers
isOdd.test(15);

What happens when you call this predicate with an int is that this parameter gets autoboxed into an Integer. Again, this might not really be an issue if you are not using this Predicate in critical areas of your application or inside big loops.

If you don’t want your parameters boxed automatically for you and want to really use primitive types instead, Java 8 provides primitive specializations of its functional interfaces. In our example, we could use the IntPredicate interface, whose test method only takes int parameters:

@FunctionalInterface
public interface IntPredicate {
  boolean test(int value);
}

Therefore, our previous example could be rewritten as:

IntPredicate isOdd = i -> i % 2 != 0; // != 0 also handles negative odd numbers
isOdd.test(15);

Now, the parameter to the test method is treated as a primitive int all the way, avoiding boxing and unboxing operations.

These primitive specializations extend to other types with similar names. So you are going to find DoublePredicate, IntFunction, LongConsumer and so on.
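
As a quick tour, the sketch below uses a handful of these specializations from java.util.function. Note that the method names change with the primitive involved (applyAsInt instead of apply, for instance); the constant names are placeholders:

```java
import java.util.function.IntFunction;
import java.util.function.IntPredicate;
import java.util.function.IntUnaryOperator;
import java.util.function.ToIntFunction;

public class PrimitiveSpecializations {

  // int -> boolean, no boxing of the argument
  static final IntPredicate IS_EVEN = value -> value % 2 == 0;
  // int -> int
  static final IntUnaryOperator SQUARE = value -> value * value;
  // int -> object
  static final IntFunction<String> LABEL = value -> "#" + value;
  // object -> int
  static final ToIntFunction<String> LENGTH = String::length;

  public static void main(String[] args) {
    // Primitive predicates keep the and/or/negate composition methods.
    System.out.println(IS_EVEN.negate().test(15));   // true
    System.out.println(SQUARE.applyAsInt(7));        // 49
    System.out.println(LABEL.apply(3));              // #3
    System.out.println(LENGTH.applyAsInt("rocks"));  // 5
  }
}
```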

Method references

Lambda expressions are undoubtedly a great construct to make your code more compact. However, sometimes all you do in your lambda is call a single method, potentially passing some parameters to it. In these cases you can often replace your lambda expression with a method reference.

Method references are compact ways to create lambda expressions for methods that already have a name. For instance, in the previous section we saw an example of the forEach method:

List<String> users = Arrays.asList("java","8","rocks");
users.forEach(elem -> System.out.println(elem));

Here, our lambda expression is only calling the System.out.println method. Therefore, we could rewrite it like this:

List<String> users = Arrays.asList("java","8","rocks");
users.forEach(System.out::println);
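
System.out::println refers to a method on one particular object, but method references come in a few other flavors too. The sketch below shows the four kinds next to the lambda each one replaces; the constant names are placeholders chosen for this example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

public class MethodReferenceKinds {

  // Static method: same as text -> Integer.parseInt(text)
  static final Function<String, Integer> PARSE = Integer::parseInt;

  // Instance method of an arbitrary object: same as text -> text.toUpperCase()
  static final Function<String, String> UPPER = String::toUpperCase;

  // Instance method of a particular object: same as suffix -> "java".concat(suffix)
  static final Function<String, String> PREFIX = "java"::concat;

  // Constructor: same as () -> new ArrayList<>()
  static final Supplier<List<String>> NEW_LIST = ArrayList::new;

  public static void main(String[] args) {
    System.out.println(PARSE.apply("8"));      // 8
    System.out.println(UPPER.apply("rocks"));  // ROCKS
    System.out.println(PREFIX.apply("8"));     // java8
    System.out.println(NEW_LIST.get().size()); // 0
  }
}
```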

Conclusion

Lambdas are one of the main additions to Java 8. And while you can still write code the way you used to (using anonymous classes), chances are that you will start to see more and more lambdas in other people’s code. So you should at least know they exist and how they can be used effectively.

Functional interfaces may look like a small addition to the language, but the fact that you can use a lambda expression or method reference wherever a functional interface is expected is a huge deal. It is not only that you remove a lot of boilerplate code, but also that by doing so you make your code easier to read and maintain. Having this concept applied to a lot of the existing language interfaces will also help a lot.

Take advantage of the interfaces defined for you in java.util.function. They are abstractions that come up quite frequently in practice and are very powerful given the way you can combine them. If you need to use them for primitive types like int or double remember that you have the option to use primitive specializations of these interfaces to avoid the performance cost of autoboxing.