One year monitoring a small startup with Prometheus

I was invited to the first edition of DevOps Gathering to talk about the lessons learned after one year of monitoring ShuttleCloud with Prometheus.

Video of the presentation:

Using Ansible at ShuttleCloud

[this is a post I wrote for ShuttleCloud’s blog]

It’s no secret that ShuttleCloud uses Ansible for managing its infrastructure, so we think it would be useful to explain how we use it.

Infrastructure

At ShuttleCloud, we treat automation as a first-order component of software engineering. We aim to have everything automated in production: application deployment, monitoring, provisioning, etc. Ansible is a key component in this process.

Our infrastructure is almost entirely in Amazon Web Services (AWS), although we are in the process of moving to Google Compute Engine (GCE). The stack is distributed across several regions in the US and Europe. Because we have SLAs of up to 99.7%, we need every component, service and database operating in a high-availability manner.

We maintain a dedicated private network (VPC) for most of our clients (GMail, Contacts+, Comcast, etc). This adds complexity because we have to handle more separate infrastructure domains in the US regions in AWS. If we were able to share VPCs between clients, things would certainly be easier.

The migration platform is designed using a microservices architecture. We have a few of them:

  • autheps: Takes care of requesting and refreshing OAuth2 tokens.
  • capabilities: Registry of supported providers and operations.
  • migration-api: API exposing our migration service.
  • migration-robot: Takes care of moving data between providers.
  • security-locker: Storage of sensitive data.
  • stats: Gathering and dispatching of KPI data.

These microservices are mostly written in Python, but there is some Ruby too.

The software stack is very diverse. Between development and operations we have to manage:

apache, celery, corosync, couchdb, django, dnsmasq, haproxy, mysql, nginx, openvpn, pacemaker, postgresql, prometheus, rabbitmq, selenium, strongswan.

In total, we manage more than 200 instances, all of them powered by Ubuntu LTS.

Naming Hosts and Dynamic Inventory

Naming Hosts

All our instances have a human-friendly, fully qualified domain name: a string like auth01.gmail.aws that is reachable from inside the internal network. These names are automatically generated based on tags associated with the corresponding instances.

Dynamic Inventory

Although Ansible provides dynamic inventories for both ec2 and gce, we developed our own in order to have more freedom in grouping hosts. This has proved to be very useful, especially during the migration from AWS to GCE, when you can have machines from both clouds inside the same group.

This custom inventory script also allows us to include a host in multiple groups by specifying comma-separated values in a tag (we named it _ansiblegroups). This seems to be a popularly requested feature (1, 2, 3).

The script also automatically creates some dynamic groups that are useful for specifying group_vars.

We use two main dimensions for selecting hosts: project and role. Project can have values like gmail, comcast or twc, while role’s values can be haproxy or couchdb. The inventory script takes care of generating composite groups in the form gmail_haproxy or twc_couchdb, so we have the freedom of targeting the haproxies in gmail (a sketch of the inventory script itself follows the examples below):

$ ansible gmail_haproxy -a 'cat /proc/loadavg'

or setting variables in any of the following group vars files:

group_vars/gmail.yml
group_vars/gmail_haproxy.yml
group_vars/haproxy.yml
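
As promised above, here is a minimal sketch of what such a custom inventory script could look like. The instance data and tag values are illustrative stand-ins for the real cloud API calls; only the grouping logic (project, role, composite project_role and the _ansiblegroups tag) follows the description above.

#!/usr/bin/env python
# Sketch of a custom dynamic inventory. A real script would query the
# cloud APIs; here the instance list is hardcoded for illustration.
import json

INSTANCES = [
    {"name": "lb01.gmail.aws",
     "tags": {"project": "gmail", "role": "haproxy",
              "_ansiblegroups": "frontend,edge"}},
    {"name": "db01.twc.aws",
     "tags": {"project": "twc", "role": "couchdb"}},
]

def build_inventory(instances):
    groups = {}
    for inst in instances:
        project = inst["tags"]["project"]
        role = inst["tags"]["role"]
        # Plain and composite groups: gmail, haproxy, gmail_haproxy.
        for group in (project, role, "%s_%s" % (project, role)):
            groups.setdefault(group, []).append(inst["name"])
        # Extra groups from the comma-separated _ansiblegroups tag.
        for group in inst["tags"].get("_ansiblegroups", "").split(","):
            if group:
                groups.setdefault(group, []).append(inst["name"])
    return groups

if __name__ == "__main__":
    print(json.dumps(build_inventory(INSTANCES), indent=2))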

Limitations of Built-in EC2 Inventory

If the composite groups didn’t exist, selection could still be achieved by intersecting groups with the & operator:

$ ansible --limit 'gmail:&haproxy' all -a 'cat /proc/loadavg'

or, with the built-in ec2.py (assuming that a Role tag exists and that project gmail corresponds to the VPC vpc-badbeeff):

$ ansible -i ec2.py --limit 'vpc_id_vpc-badbeeff:&tag_Role_haproxy' all -a 'cat /proc/loadavg'

but, in contrast, there is no evident alternative for setting variables for hosts that are in both the vpc_id_vpc-badbeeff and tag_Role_haproxy groups.

Playbooks and Roles

We use roles for everything and separate them into provisioning roles (for example couchdb) and deployment roles (for example migration-api).

  • Provisioning roles take care of leaving the machine ready to run the required application.
  • Deployment roles take care of deployment and rollback actions.

Dependencies between Roles

Right now, deployment roles have an explicit dependency on provisioning roles. For example, role migration-api depends on role django-server and role django-server depends on roles apache and django.

This model is useful because you can apply migration-api to a raw instance and it will be adequately provisioned. However, it has the drawback that, once the machine is provisioned, running the provisioning role each time you want to deploy a new version might be a waste of time.
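
For illustration, this is roughly how such a dependency chain is declared in Ansible, using the role names from the text; a minimal sketch (the variables each role would take are omitted):

# roles/migration-api/meta/main.yml
dependencies:
  - role: django-server

# roles/django-server/meta/main.yml
dependencies:
  - role: apache
  - role: django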

Using Tags

Tags are being slowly added to the tree. It’s better not to abuse them and to keep a good organization; otherwise you might end up forgetting which tags to select and when.

Custom Modules

We have developed some custom modules in order to manage CouchDB users and replications (there is a PR submitted for couchdb_user).

CouchDB exposes a REST API, giving us several options for managing users:

  • use the command module combined with curl calls.
  • use the uri module.
  • write a custom module.

Writing a custom module can be daunting at first, but it pays off. The flexibility that Python offers is not comparable to what you get from leveraging the command or shell modules.

When it’s not trivial to guarantee idempotency with current modules, a custom one is the way to go.
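
To give an idea of the shape such a module takes, here is a hypothetical sketch of a couchdb_user module; it is not the code from the submitted PR, and the parameters are illustrative. What matters is the idempotency pattern: query the current state first, change only when needed, and report changed accordingly.

#!/usr/bin/python
# Hypothetical couchdb_user sketch: create or remove a CouchDB user
# document idempotently via the REST API.
import json

from ansible.module_utils.basic import AnsibleModule
from ansible.module_utils.urls import fetch_url

def main():
    module = AnsibleModule(
        argument_spec=dict(
            name=dict(required=True),
            password=dict(required=True, no_log=True),
            url=dict(default='http://localhost:5984'),
            state=dict(default='present', choices=['present', 'absent']),
        ),
        supports_check_mode=True,
    )
    doc_url = '%s/_users/org.couchdb.user:%s' % (
        module.params['url'], module.params['name'])

    # Query the current state first; this is the key to idempotency.
    resp, info = fetch_url(module, doc_url)
    exists = (info['status'] == 200)

    if module.params['state'] == 'absent':
        if not exists:
            module.exit_json(changed=False)
        if not module.check_mode:
            doc = json.loads(resp.read())
            fetch_url(module, '%s?rev=%s' % (doc_url, doc['_rev']),
                      method='DELETE')
        module.exit_json(changed=True)

    if exists:
        # A real module would also detect password/role changes here.
        module.exit_json(changed=False)
    if not module.check_mode:
        body = json.dumps(dict(name=module.params['name'], type='user',
                               roles=[], password=module.params['password']))
        fetch_url(module, doc_url, data=body, method='PUT',
                  headers={'Content-Type': 'application/json'})
    module.exit_json(changed=True)

if __name__ == '__main__':
    main()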

Managing Secrets

We use git-crypt for managing secrets. It’s a general-purpose tool for selectively encrypting parts of a git repo. We had been using it long before ansible-vault became popular, and it is working well.

With git-crypt, all sensitive data is kept inside a top-level directory called secrets/, and with proper configuration you tell git-crypt to encrypt everything inside it. By combining the magic inventory_dir variable and the password lookup you can express:

lookup('password', inventory_dir + '/secrets/' + project + '/haproxy-admin.key')
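
For reference, the “proper configuration” boils down to a .gitattributes rule; this one-liner is the standard git-crypt mechanism, with the directory name matching our layout:

secrets/** filter=git-crypt diff=git-crypt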

Scripting Common Invocations

With time, your Ansible tree gains new features, modules and playbooks. Your inventory grows both in machines and group names. It becomes more difficult to memorize the correct parameters you have to pass to ansible-playbook, especially if you don’t run it every day. In some cases it may be necessary to run more than one playbook to achieve some operational goal.

A simple solution to this is to store the appropriate parameters inside scripts that will call ansible-playbook or ansible for you. For example:

$ cat bin/stop-gmail-migration-robots
#!/bin/sh
ansible-playbook --limit gmail -t stop email-robots.yml

Yes, it is as useful as it is simple.

Using callback plugins

One handy feature of Ansible is its callback plugins. You just drop a python file containing a class called CallbackModule in the ./callback_plugins directory and the magic happens.

In that class, you can add code for Ansible to run whenever certain events take place. Examples of these events are:

  • playbook_on_stats: Called before printing the RECAP (source).
  • runner_on_failed: Called when a task fails (source).
  • runner_on_ok: Called when a task succeeds (source).

You can find more of them by looking at the examples or searching for call_callback_module in the source code.

We have created our own callback plugins for logging and enforcing usage policy.

Logging

Javier had the great idea of using callback plugins to log playbook executions and set up everything needed to make it work.

Each time a playbook is run, a plugin gathers relevant data such as:

  • which playbook was run.
  • against which hosts.
  • who ran it.
  • from where it was run.
  • what was the final result.
  • which revision of the Ansible tree was active.
  • an optional message (requested on the terminal).

The plugin then sends all of this to a REST service, which takes care of writing everything into a database that you can query later for things like who deployed migration-api in the gmail project between last Tuesday and Wednesday.
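
A minimal sketch of what such a logging plugin could look like, written against the Ansible 1.x callback API described above; the endpoint URL and payload fields are hypothetical:

# callback_plugins/log_runs.py
import getpass
import json
import socket
import urllib2

class CallbackModule(object):
    def __init__(self):
        self.failed = False

    def runner_on_failed(self, host, res, ignore_errors=False):
        self.failed = True

    def playbook_on_stats(self, stats):
        # Called just before the RECAP: report the final result.
        record = {
            'user': getpass.getuser(),
            'host': socket.gethostname(),
            'result': 'failed' if self.failed else 'ok',
        }
        req = urllib2.Request('http://logger.internal/runs',  # hypothetical
                              json.dumps(record),
                              {'Content-Type': 'application/json'})
        urllib2.urlopen(req)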

Enforcing Usage Policy

Another useful application of callback plugins is enforcing that Ansible is run from an up-to-date repo. Before starting a playbook, a git fetch can be run behind the scenes and the current branch validated against a list of rules.

You can use this as you wish, for example to enforce that HEAD is on top of origin/master and that the tree is clean.
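
A sketch of how such a check can be wired into a callback plugin, under the example rules just mentioned (fresh fetch, clean tree, HEAD not behind origin/master):

# callback_plugins/enforce_repo.py
import subprocess
import sys

def _git(*args):
    return subprocess.check_output(('git',) + args).strip()

class CallbackModule(object):
    def playbook_on_start(self):
        # Refresh remote refs behind the scenes.
        subprocess.check_call(['git', 'fetch', '--quiet'])
        if _git('status', '--porcelain'):
            sys.exit('refusing to run: working tree is not clean')
        # Any commit in origin/master missing from HEAD means we are behind.
        if _git('rev-list', 'HEAD..origin/master'):
            sys.exit('refusing to run: HEAD is behind origin/master')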

Conclusions

Overall we are very satisfied with the tool. Everybody on the team is using it on a daily basis, and both Dev and Ops people commit stuff to the same repo.

We would like to further investigate:

  • Splitting the tree in two repos: provisioning and deployment. Using the provisioning repo to generate images with the help of Packer, and letting the deployment stuff just assume that everything is in place.
  • Using security group names to generate inventory groups (like ec2.py does). This would improve compatibility between AWS and GCE instances, as the latter only has one set of values (tags) for both tagging and firewalling instances.
  • Considering vault for storing sensitive data.
  • Replacing scripts for common invocations with python scripts using Ansible as a library.

Autorestoring previous pane zoom in tmux

Excerpt from a message I sent to the tmux-users mailing list:

I just began to use the recently added zoom feature and came across an annoying thing.

You have panes A and B. A is zoomed; if you switch to pane B, then A is forcibly unzoomed. This is expected, of course, otherwise B would remain invisible.

If I switch back to pane A, I expect to get A zoomed again, because it wasn’t me who unzoomed it.

I submitted a patch but upstream disliked it. I admit that a secretly zoomed pane somehow violates the rule of least surprise.

See it in action:

To left or not to left GNOME

I’m just another user who loves GNOME and suffers the blessing of its developers.

I’ve had the Close Button on the left since it was set as the default in Ubuntu 10.04, and I liked it; the button stayed on the left until GNOME 3.8.

GNOME 3.10 introduced Client Side Decoration (CSD), i.e., the application can now paint the window border and buttons. Quoting from the site linked above:

A disadvantage of CSD is the inconsistency that brings between Apps that support them (mostly GNOME Apps) and Apps that don’t (3rd party Apps, like Firefox). However this is mostly in theory, because in practice, you won’t really be bothered from it.

A new widget called GtkHeaderBar was added in the process, and it was decided that GtkHeaderBar would forcibly put the Close Button on the right. Then bug 706708 was filed, of course.

A fix was committed a month after the bug was filed and it entered GTK+ 3.10.3. Now I can set the placement of the Close Button again, so let’s create a ~/.config/gtk-3.0/gtk.css with the following content:

GtkWindow {
  -GtkWindow-decoration-button-layout: "close:";
}

and see what happens.

Clocks

Gnome Clocks

This is gnome-clocks, honoring the setting.

Nautilus

Nautilus

And this is nautilus, not honoring the setting. It turns out that it isn’t using a GtkHeaderBar after all. You can see in the source code that a separator and a close button are manually added to the end of the top bar.

Gnome Tweak Tool

Gnome Tweak Tool

And this is gnome-tweak-tool, honoring the setting, except that it has two GtkHeaderBars; the one on the left is not displaying the Close Button.

Inconsistency they said?

Looking up encrypted passwords in ansible

Ansible 1.2 is out the door. Go and check the changelog to see how many new features and fixes this version brings, my favorites being the new {{ }} syntax for variable substitution and the support for roles. This version also includes a patch I submitted that adds encryption support to the password lookup plugin.

In case you weren’t aware, ansible 1.1 gained support for generating random passwords as a lookup plugin: a useful trick that allows generating a random password at any point of a playbook without losing idempotence. An example of its use (taken from the official docs):

---
- hosts: all
  tasks:
    # create a mysql user with a random password
    - mysql_user: name={{ client }}
                  password="{{ lookup('password', 'credentials/' + client + '/' + tier + '/' + role + '/mysqlpassword length=15') }}"
                  priv={{ client }}_{{ tier }}_{{ role }}.*:ALL

but there are some modules, most notably user, that expect an encrypted password. For such modules the password lookup was unusable because it always returned plaintext.

With ansible 1.2 you can pass the encrypt parameter to the password lookup to get an encrypted password instead of a plain one. In this mode the salt is saved along with the password itself to ensure the same hash is returned each time the lookup is requested. An example:

---
- hosts: all
  tasks:
    # create an user with a random password
    - user: name=guestuser
            uid=5000
            password={{ item }}
      with_password: credentials/{{ hostname }}/userpassword encrypt=sha256_crypt

I expect the main use case for this feature to be feeding the user module when defining users. If this is the case, you should use one of the standard Unix schemes of passlib; I personally recommend sha256_crypt or sha512_crypt.

A trivial contribution to linux (the kernel)

Old and trivial, just fixing a build error, but anyway it made me proud at the time.

Years ago I acquired a TV tuner. When I bought it I was aware both that it wasn’t working in Linux and that a driver was being written for it.

Having neither enough knowledge nor free time for anything besides testing the driver, and needing the device to work, I began to pull, compile and see-if-it-works on a daily basis. One day the build was failing; I fixed it and submitted the patch, and the patch got merged by the maintainer. Two years later it entered mainline.

In the end I couldn’t get the device fully working and gave it to a friend who had a supported OS, so all I was left with from that tuner was another fun experience of contributing to free software.

Python gets a new ignored context manager

I was reading the What makes Python Awesome? presentation and saw the following construction in slide 22:

with ignore(OSError):
    os.remove(somefile)

This construction is more concise, without being less readable, than the typical try ... except ... pass. I had never seen that ignore before and got curious about where it is defined. It isn’t a Python keyword, nor part of the contextlib module.
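
For comparison, the traditional spelling of the same operation:

import os

# somefile is defined elsewhere, as in the slide's example.
try:
    os.remove(somefile)
except OSError:
    pass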

After some search I found that an ignored (note the trailing d) context manager was recently added to the upcoming python 3.4.

Managing prebuilt OS images with Ansible

Prebuilt OS images

Prebuilt OS images are usually available for virtualization environments, for example see lists for OpenVZ, Vagrant, EC2, VirtualBox (also here) or Proxmox. As you can guess, some machinery is needed for building and maintaining them all. There is veewee for Vagrant, template creation guides for OpenVZ, dab for Proxmox, and so on.

The chroot connection

Ansible recently got support for executing tasks chrooted inside a local directory. The implementation was straightforward given Ansible’s agentless nature. chroot was added as a new connection type, and that had two nice side effects:

  1. chroot directories can be added to the inventory like any other hostname.
  2. any existing playbook will potentially run chrooted just by setting the chroot connection type:

     - include: tasks/foo.yml ansible_connection=chroot
    

Putting things together

So you can now bootstrap your distro as usual, run a playbook chrooted into it and then archive the directory. That’s exactly what these example playbooks do.

$ git clone https://github.com/mmoya/ansible-playbooks.git
$ cd ansible-playbooks/image-creation
$ sudo ansible-playbook image-creation-stage1.yml
$ sudo ansible-playbook image-creation-stage2.yml

When the cow stops mooing you will have these tar.gz files in /var/tmp/built-images:

  • squeeze-raw-amd64.tar.gz: this is a bare Debian bootstrap.
  • debian-6.0-64.tar.gz: this is just squeeze-raw-amd64.tar.gz plus kernel, openssh-server and other customizations.
  • 32-bit versions of the above.

As you can guess from the inventory, it’s trivial to add another debootstrap’able distribution.

[images-stage1]
/var/tmp/squeeze-raw-amd64          suite=squeeze       arch=amd64
/var/tmp/squeeze-raw-i386           suite=squeeze       arch=i386

[images-stage2]
/var/tmp/debian-6.0-64              baseimage=squeeze-raw-amd64.tar.gz
/var/tmp/debian-6.0-32              baseimage=squeeze-raw-i386.tar.gz

Two stages

I’ve identified two image categories:

  1. bare image: distro bootstrap with zero or minimal configuration.
  2. custom image: bare image with needed customization.

Think of bare images as one per distro (squeeze, wheezy, precise, fedora18, etc…) and custom images as the ones typically published (debian-6.0-lamp-server, precise-nagios-server, fedora18-jboss, etc…). One bare image is used as the base for building multiple custom images. Each stage builds one of these categories; this way it’s easier to cache the distro bootstrap for later reuse.

Missing stuff

I’m missing two things from the example:

  1. Make the playbook work with non-Debian distros (febootstrap vs. debootstrap, file location differences, etc…)
  2. Specify extra software to install, ideally in the inventory (see 2290). Thinking of something like:

    [images-stage2]
    /var/tmp/debian-6.0-drupal baseimage=squeeze-raw-amd64.tar.gz install=lamp,drupal
    

Why bother?

There are some advantages of using Ansible instead of a custom script:

  • You can reuse existing playbooks with little or zero effort.
  • You can separate data from code, making things easier to audit.
  • … and the other things from Ansible you get for free:
    • a powerful templating system (more flexible customization of files inside the image).
    • parallel execution (build images faster).
    • idempotent changes (run again and again from any point in the process).
    • more and more modules available.
  • Integrate image maintenance in your workflow if you’re already using Ansible for managing your servers.

Installing ruby/rbenv with Ansible

Ansible is a relatively new kid in the town of Configuration Management. If you don’t know it already, go see this introductory video: System Provisioning with Ansible.

Coming from a Puppet background, I was impressed by how state is defined in Ansible with simple YAML files, without necessarily sacrificing power. I actually find both reading and writing playbooks very pleasant (and that is not to mention the cow). Being able to use your current ssh infrastructure is a bonus point: you can get going instantly.

I need to install ruby+rbenv on some Debian servers, so I wrote a playbook for it. It’s basically a translation of this script.

Playbook for installing ruby/rbenv

Note: Building ruby takes its time, be patient.
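
In case the embedded gist doesn’t load, here is a sketch of its general shape (not the gist verbatim; package names, paths and the ruby version are illustrative):

---
- hosts: rubyservers
  tasks:
    - name: install build dependencies
      apt: pkg={{ item }} state=present
      with_items:
        - git-core
        - build-essential
        - libssl-dev
        - zlib1g-dev

    - name: clone rbenv
      git: repo=git://github.com/sstephenson/rbenv.git
           dest=/usr/local/rbenv

    - name: clone ruby-build (provides the rbenv install command)
      git: repo=git://github.com/sstephenson/ruby-build.git
           dest=/usr/local/rbenv/plugins/ruby-build

    - name: build ruby (the slow part)
      command: /usr/local/rbenv/bin/rbenv install 1.9.3-p429
               creates=/usr/local/rbenv/versions/1.9.3-p429
      environment:
        RBENV_ROOT: /usr/local/rbenv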

Xen guests isolation and ebtables concurrency

Concurrency is evil

When using bridging for Xen networking and your guest machines (domUs in Xen parlance) are fully managed by third parties, some sort of isolation is especially needed. A rogue admin can change the IP and/or MAC address(es) assigned to their domU and potentially cause an IP address conflict.

Xen provides a script called vif-bridge that takes care of adding a domU’s virtual interfaces to dom0’s bridge, bringing them up, and adding iptables rules that allow datagrams whose source is one of the assigned IP address(es) to come in through the domU’s virtual interfaces.

Those iptables rules might not be enough. They don’t enforce usage of the assigned MAC addresses and could interfere with the currently deployed firewall. Another point, in my opinion, is that these address policies belong to the link layer (the bridge decision) instead of the network layer (see PacketFlow), so I prefer to have them enforced with ebtables.

I then picked one of the existing ebtables-based vif-bridge scripts and adapted it to only allow the flow of assigned IP/MAC pairs and ARP requests/replies.

After deploying the adapted vif-bridge, domU creation began to fail randomly. Some debug code added at the beginning of the script threw some bizarre errors:

+ ebtables -F veth2250_IN
ebtables v2.0.9-2:communication.c:388:--BUG--:
Couldn't update kernel counters
++ sigerr

+ ebtables -N veth639a_IN
+ ebtables -P veth639a_IN DROP
Chain 'veth639a_IN' doesn't exist.
++ sigerr

+ ebtables -A veth639_OUT -p arp --arp-ip-dst 10.99.143.100 -j ACCEPT
+ ebtables -A veth639_OUT -p arp --arp-ip-dst 10.99.144.100 -j ACCEPT
The kernel doesn't support a certain ebtables extension, consider recompiling your kernel or insmod the extension.
++ sigerr

As you can see, those ebtables errors are triggered by correct, trivial calls. To make it worse, chain, interface and rule names varied from one error to another. Searching for help on “Couldn’t update kernel counters” or “communication.c:388:--BUG--:” didn’t help at all.

While debugging, I learned that an instance of vif-bridge is run by Xen for each defined network interface, and they are all run in parallel. All my domUs have two virtual network interfaces defined.

At that point I had no clue about the problem’s cause. I decided to upgrade ebtables to discard the usual “make sure you’re running the latest version” support advice (squeeze’s version is 2.0.9.2, upstream is 2.0.10). With the new version I began to see this new error in the logs:

+ ebtables -A FORWARD -o veth2450 -p ip4 -d 00:16:3d:1c:26:4a --ip-dst 10.49.216.50 -j ACCEPT
Unable to update the kernel. Two possible causes:
1. Multiple ebtables programs were executing simultaneously. The ebtables
   userspace tool doesn't by default support multiple ebtables programs running
   concurrently. The ebtables option --concurrent or a tool like flock can be
   used to support concurrent scripts that update the ebtables kernel tables.
2. The kernel doesn't support a certain ebtables extension, consider
   recompiling your kernel or insmod the extension.

After reading this I immediately understood what was happening. That error description couldn’t be clearer, and I thank the upstream author for it. I had never considered any concurrency problem in ebtables, not even after seeing random, illogical errors generated by trivial rules.

--concurrent is only available starting with 2.0.10, so I took the flock way; the fixed script is here.
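
The gist of the flock approach: wrap every ebtables invocation in a lock so that concurrent vif-bridge instances serialize their kernel updates. A minimal sketch (the lock file path is illustrative):

#!/bin/sh
# Shadow the ebtables command with a flock-serialized wrapper. Note the
# full path inside the function to avoid calling the function recursively.
ebtables() {
    flock /var/lock/ebtables.lock /sbin/ebtables "$@"
}

# The rest of the script keeps its usual calls, e.g.:
ebtables -N "${vif}_IN"
ebtables -P "${vif}_IN" DROP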

Later I found the problem description in ebtables’ basic examples page:

Updating the ebtables kernel tables is a two-phase process. First, the userspace program sends the new table to the kernel and receives the packet counters for the rules in the old table. In a second phase, the userspace program uses these counter values to determine the initial counter values of the new table, which is already active in the kernel. These values are sent to the kernel which adds these values to the kernel’s counter values. Due to this two-phase process, it is possible to confuse the ebtables userspace tool when more than one instance is run concurrently. Note that even in a one-phase process it would be possible to confuse the tool.

It might be very difficult to reproduce the errors shown above unless your domUs have more than one network interface and your vif-bridge script has more than a few ebtables rules.

Summarizing. If you:

  • are calling ebtables from your Xen scripts.
  • have an ebtables prior to 2.0.10 (like the one in Debian squeeze or Ubuntu precise).
  • are facing seemingly random ebtables errors.
  • are not being helped by logs or $SEARCHENGINE.

the chances are high that your scripts are running ebtables concurrently. Just fix them.

es_CU locale landing in GNU libc

I was just reading the GNU libc 2.15 release announcement and found this nice surprise:

New locales: bho_IN, unm_US, es_CU, ta_LK

A quick search got me to the merge request. The locale was merged last December 22nd (commit) and it seems it is being maintained by UCI.

Googling for es_CU glibc didn’t give any results other than the few merge request messages (thread) and the bug report itself.

Besides, I wasn’t able to find an announcement in the gutl-l archive (the mailing list of the national FLOSS community); I searched the archive from last October up to now. It would be nice to have had an announcement about such an achievement posted to gutl-l in the first place.

I hold no hope of seeing a project page and source code repository published. Nova, the UCI flagship product, has had a very hard time getting enough resources and permissions to publish its repositories.

In any case, I’m very happy about the merge.

PS: It catches my attention that the contact phones are prefixed with +45 (Denmark) instead of +53 (Cuba). A typo?

III Taller Internacional de Software Libre (part 2)

If you haven’t read part 1, I think you should.

In the afternoon the opening was a panel with people from Infomed, INFOSOC, UCI, Joven Clubs, MINED and MES, talking about the steps Cuba has taken toward Software Libre inclusion. (I didn’t forget the link for UCI; it actually doesn’t have one.)

Some steps I can remember:

  • Creation of a national group which will manage the migration.
  • Creation of the Software Libre Cuban web site. It will live at http://www.softwarelibre.cu/ (archived). It isn’t available yet.
  • Presentation of a LiveCD of Nova Linux, a distro cooked at UCI, based on Gentoo.
  • Presentation of CubaForge, a Cuban forge based on GForge.

Finally, Astorazegui, the technical director of the Aduana General de la República, explained that because the Aduana belongs to an international organization, they should migrate as soon as they can; it seems that the use of Software Libre is a rule of that organization. He commented on the experiences of the migration.

A potential Achilles’ heel could be the fact that INFOSOC is already talking about migration while I haven’t heard any official word of a sustainable plan for teaching people the culture of Software Libre and GNU/Linux. Yes, we have a lot to learn; it has been several years under the umbrella of Microsoft products.

The more trained people the country has, the better we can afford the migration, of course. So I think it’s necessary to have a plan for massive teaching running as of today. It would be wise to write a letter to INFOSOC suggesting the creation of a “Plan de Maestros Emergentes de la Cultura de Software Libre y de GNU/Linux”.

See Strategy for use of Free (Libre) Software in Cuba (in Spanish).

PS: While writing this post I received a message posted by Ali to linux-l claiming that INFOSOC held a meeting with coordinators of some local LUGs. It’s wonderful to know that INFOSOC is taking the community into account.

III Taller Internacional de Software Libre (part 1)

Last Friday I attended the III Taller Internacional de Software Libre, one of the events of the Informática 2005 fair, held in Havana.

I was immediately impressed by the size of the assigned room; I think there were well over 300 people. Last year, the Taller was in a little room with about 60 attendees.

The first talk was given by an executive of Telecom Italia, basically publicizing his company and talking about the big changes the Telecom Italia network is undergoing. I wonder what that talk had to do with Software Libre.

In addition, his presentation had tons of slides. While the executive was talking, there was “accidentally” a short blackout in the room (oops), and he wisely said:

…“It’s a polite way to suggest I should stop, I’ll finish in two minutes.”

Then followed a talk by Roberto del Puerto, director of the Oficina de Informatización de la Sociedad (INFOSOC). He talked about advances, goals and plans towards Software Libre inclusion in Cuba. It’s worth noting that he didn’t forget the shiny Windows XP on his laptop, because he claims that:

…“The first PC INFOSOC should migrate is my laptop”.

And then, the most exciting thing: a speech by Djalma Valois. He is an advisor at the Instituto Nacional de Tecnologia da Informação (ITI). For those of you who don’t know what ITI is, it’s the agency of the Brazilian Government that is heading the migration to Software Libre in Brazil.

He was not in Havana to attend the fair; rather, the organizing committee contacted him and invited him so we could hear about the experiences Brazil has had with the migration.

The speech was very nice. Although prepared in a hurry, it was basically a “get the facts of Brazil’s migration”, which was precisely what it was supposed to be. I really like it when Brazilians (last year I heard Sergio Amadeu) talk about Software Libre; they never forget to emphasize the social impact associated with the sharing of knowledge as the main advantage of the migration, over the economic and technical advantages.

It was also enjoyable that the organizing committee scheduled an hour of questions and answers with Valois. Someone in the audience gave his best impressions of, and gratitude for, Kurumin. Someone from PDVSA asked Valois for advice about official support. There were questions about costs, business models and so on.

And then, lunch.

To be continued…