I hate YAML too!

An analysis of why everybody uses YAML and hates it at the same time

With the recent rage of YAML hating, I want to chip in: I hate YAML too! Now that I have that out of my system, let's analyze why. I will focus mainly on my domain of interest: configuration management and orchestration. More and more tools use YAML as their input language: the TOSCA orchestration standard from OASIS, Cloudify, Ansible, OpenStack Heat, CloudFormation, … Each of these uses YAML as a means to input a model into the tool. A different type of use case is tools and services that use YAML as a configuration file format. Let's analyze both.

Text-based inputs are superior to graphical systems because they are much easier to automate, version, compare and so on. Although the input is text-based, that does not excuse you from designing it, just as you would design a graphical user interface. Designing user interfaces is hard: you need to think about how users work and optimize the language for it. A language developed for a specific purpose is typically called a domain-specific language (DSL).

In computer science a language consists of two distinct elements: a syntax (what it looks like) and semantics (what it means). The semantics are often specific to a system. (However, it does not hurt to look at the previous 50 years and not repeat mistakes: yes, I am looking at you, TOSCA, but that is for another article.) Once you have the semantics, you design a syntax to support the end user. With a syntax in hand, you need to define it formally and write a lexer and parser to turn source text into something we call an abstract syntax tree (AST). This AST is then validated and evaluated, or compiled, depending on the language.
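To make that pipeline concrete, here is a minimal sketch of a lexer and recursive-descent parser for a made-up toy DSL (the grammar and names are purely illustrative):

```python
import re

# A toy DSL, e.g.:  service "web" { port = 80 }
# Tokenize with a regex, then descend through the grammar to build an AST
# (here represented as a plain dict).
TOKEN_RE = re.compile(r'"[^"]*"|\{|\}|=|[A-Za-z_][A-Za-z0-9_]*|\d+')

def parse(source):
    tokens = TOKEN_RE.findall(source)
    pos = 0

    def next_token():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expect(expected):
        tok = next_token()
        if tok != expected:
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")

    # Grammar: block := NAME STRING '{' (NAME '=' value)* '}'
    kind = next_token()
    name = next_token().strip('"')
    expect('{')
    attrs = {}
    while tokens[pos] != '}':
        key = next_token()
        expect('=')
        val = next_token()
        attrs[key] = int(val) if val.isdigit() else val.strip('"')
    expect('}')
    return {"type": kind, "name": name, "attributes": attrs}

ast = parse('service "web" { port = 80 root = "/srv/www" }')
```

Even for a grammar this small, you already have to handle tokenization, error reporting and precedence yourself, which is exactly why generators like ANTLR exist.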

The lexer and parser are where the trouble starts. Not many developers like designing a syntax for a language, let alone writing it down in (E)BNF form. Granted, it is not that easy, but in my opinion it is not something you can skip.

YAML, however, lets you skip the design step. You define lists and maps and load them into your tool; these maps and lists effectively are your AST. This is great for prototyping and getting the semantics right. Once you reach that point, though, a good DSL is really important to support your users. There is a reason many constructs in programming languages are called syntactic sugar: it makes the language less bitter! Syntactic sugar optimizes a language for the tasks users have to do often.
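This is what "the maps and lists are your AST" looks like in practice. With PyYAML (assuming the third-party pyyaml package is installed), a model loads straight into plain Python containers with no lexer or parser to write:

```python
import yaml  # PyYAML; install with: pip install pyyaml

# An illustrative model: a YAML document describing two services.
document = """
services:
  - name: web
    port: 80
  - name: db
    port: 5432
"""

# safe_load returns nested dicts and lists -- your ready-made AST.
model = yaml.safe_load(document)
```

For a prototype this is all you need; the semantics of `model` are entirely up to your tool.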

A good starting point for designing and writing your own DSL is ANTLR and the book “Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages”, written by the ANTLR author. ANTLR does not generate the fastest parsers, but it is rather user friendly.

Configuration languages are a totally different category: there it is the other way around. The YAML syntax is reasonably suited to inputting configuration data, which often consists only of lists, tables and maps. That is exactly what YAML is good at. However, YAML only gives you an AST: you still have to validate the input. An even better, less error-prone alternative is TOML.
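Once YAML (or TOML) hands you plain dicts and lists, validation is still your job. A minimal hand-rolled sketch, assuming a hypothetical config with a required "listen" port and an optional list of "backends":

```python
def validate(config):
    """Return a list of error messages; an empty list means the config is valid."""
    errors = []
    port = config.get("listen")
    if not isinstance(port, int) or not (1 <= port <= 65535):
        errors.append("'listen' must be a port number between 1 and 65535")
    for i, backend in enumerate(config.get("backends", [])):
        if not isinstance(backend, str):
            errors.append(f"backend #{i} must be a string, got {type(backend).__name__}")
    return errors

# YAML's implicit typing is exactly why this matters:
# `listen: "80"` loads as a string, not an integer.
assert validate({"listen": 80, "backends": ["10.0.0.1:8080"]}) == []
assert validate({"listen": "80"}) != []
```

In a real tool you would likely reach for a schema library instead of hand-written checks, but the principle is the same: the parser accepting the document tells you nothing about whether the data makes sense.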

To conclude, YAML is nothing more than Yet Another Markup Language. It is OK to use it as a configuration language, provided you validate the data. When the configuration language becomes more complex and offers templating, type coercion, … use a better language or create your own, with its own syntax, and validate the input. For programming and modelling languages, YAML can serve for a prototype; whenever it becomes more serious, create a DSL. And remember: a good engineer is a lazy engineer, but not too lazy!

Tuur!

Friday at 01:50 our second son Tuur was born. He arrived 6 weeks early, at 34 weeks, but already measured 49 cm and weighed 2.8 kg. Although it is a preterm birth, everything currently follows the best-case scenario the doctors laid out upfront. That scenario does include wires and tubes and a stay in an incubator. Gradually he has been losing the wires and tubes, and today he moved to a normal bed in the NICU.


In 2005 I really liked the album “Tourist” by Athlete. One of the best-known songs on that album is “Wires”, which is about the preterm birth of the singer's daughter. That gave the lyrics some meaning, but it never really got to me. Until now. Every word in this song is spot on about how it feels to have your newborn child arrive too early.

OpenStack neutron CLI

The OpenStack neutron CLI allows you to control almost all aspects of neutron that you can control with the REST API. For the more advanced command-line operations, it is not always clear how to structure the arguments. Examples include clearing an attribute, setting a list of key-value pairs, … You need this to set routes on a router, host routes on a subnet, the gateway of a subnet, … It took me quite some time to figure this out, so this might be helpful as a reference.

In complex network deployments you need to set routes on subnets (these get distributed to the VMs through DHCP) or additional routes on routers, in the host_routes and routes properties respectively. Each route consists of a CIDR destination address and a next hop. The syntax for this is the following:
neutron router-update router1 --routes type=dict list=true destination=0.0.0.0/0,nexthop=10.0.0.1 destination=10.100.128.0/20,nexthop=10.100.2.254

This command sets a default route and a route to another router connected to router1. The “magic” here is that you need to specify that it is a list of dictionaries. The CLI tool transforms this into the following JSON:
"routes": [{"nexthop": "10.0.0.1", "destination": "0.0.0.0/0"}, {"nexthop": "10.100.2.254", "destination": "10.100.128.0/20"}]

If you want to clear these routes, you need to use the following command:
neutron router-update router1 --routes action=clear

You can use the action=clear syntax to clear other attributes as well, such as the gateway of a subnet.

sshd crypto configuration on CentOS 7

It is possible to restrict the crypto that SSH uses, both on the server side and on the client side. I control virtually all SSH clients that have access to the servers I manage, so I have the freedom to use more restrictive crypto settings than the defaults.

Mozilla has an excellent guide on their wiki. The servers I manage run CentOS 7, which includes OpenSSH 6.3; the Mozilla guidelines are either for a very recent release or for the older CentOS 6. On GitHub, the user stribika published a list of ciphers that are considered secure and hard for the NSA to break. The main difference between these two lists is the removal of all EC (elliptic curve) based functions from the Mozilla list.

This brings me to the following configuration for my CentOS 7 machines:

# Supported HostKey algorithms
HostKey /etc/ssh/ssh_host_rsa_key

## Algorithms based on the Mozilla guidelines and
## https://stribika.github.io/2015/01/04/secure-secure-shell.html [1]

# Mozilla guidelines
# KexAlgorithms ecdh-sha2-nistp521,ecdh-sha2-nistp384,ecdh-sha2-nistp256,diffie-hellman-group-exchange-sha256
# NIST EC algorithms removed [1]
KexAlgorithms diffie-hellman-group-exchange-sha256

# Combination of Mozilla and [1] (look at the gcm ciphers for better scp performance)
Ciphers aes256-ctr,aes192-ctr,aes128-ctr

# The Mozilla list, because it is more restrictive
MACs hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-512,hmac-sha2-256,umac-128@openssh.com

# KeyRegenerationInterval is halved from the default as a precaution (optional). 1800 seconds is 30 minutes.
KeyRegenerationInterval 1800

# Password based logins are disabled - only public key based logins are allowed.
AuthenticationMethods publickey

On CentOS 7 the only KexAlgorithm left is diffie-hellman-group-exchange-sha256. To make sure the available moduli are large enough, stribika recommends removing all moduli smaller than 2000 bits with the following commands:

awk '$5 > 2000' /etc/ssh/moduli > "${HOME}/moduli"
wc -l "${HOME}/moduli" # make sure there is something left
mv "${HOME}/moduli" /etc/ssh/moduli

If no moduli are left, generate new ones with (this can take a long time!):

ssh-keygen -G "${HOME}/moduli" -b 4096
ssh-keygen -T /etc/ssh/moduli -f "${HOME}/moduli"
rm "${HOME}/moduli"

I tested this configuration from ssh clients running Fedora 21, CentOS 7 and CentOS 6, Ubuntu 12.04 and Ubuntu 14.04.

Configuration management camp 2015

On Monday and Tuesday, the 2nd and 3rd of February, cfgmgmtcamp takes place in Gent. Although I have been doing research in this area since 2008, I have never attended the event (it is free and only one hour by train from where I live).

This year is different. I am not only attending, I will also present Impera and explain why we developed it. Impera is the tool I built as part of my PhD. It has been available on GitHub for more than two years, but I never publicized it. Now is different.

It is available as an open source tool, including many configuration modules, on GitHub. Some preliminary documentation is available on Read the Docs. In the coming weeks we will release more documentation and configuration modules.

At the same time I am working with two colleagues and our lab (DistriNet) to create a university spin-off, Impera, that will focus on cloud management. The tool that I will present on Monday is part of what we will offer. More on that later.

So, if you want a sneak preview for Monday you can look at the tutorial in the documentation. If you have any comments or questions, please let us know!

OpenStack Fundamentals training

Next month we are organizing a course on OpenStack. This course gives an introduction to private clouds and to the OpenStack architecture and components. The full-day training concludes with specific deployment architectures and the supporting technologies that OpenStack requires.

For more information and registration go to the event page of our research lab: https://distrinet.cs.kuleuven.be/events/2014/OpenStackFundamentals.jsp

What’s next?

In June I obtained my PhD, and now it is time for the next step. Together with other partners I am working on a venture; more details will follow later. Currently I am finishing up my research at DistriNet and the projects I was involved in.

In the meantime I am available for freelance work and consulting. I have built up expertise in the design and management (deployment and monitoring) of complex distributed systems in hybrid and multi-cloud environments. The technologies I specialize in:

  • OpenStack, Ceph and Open vSwitch
  • Puppet and other tools
  • Monitoring and logging: Metrics, Collectd, Graphite, OpenTSDB, Nagios, Logstash, Kibana, …
  • JBoss Application Server
  • Ubuntu, Fedora and CentOS
  • Cassandra, MongoDB and HBase
  • Python3 development
  • Redmine and automated git hosting
  • Alfresco

My LinkedIn profile.