An analysis of why everybody uses YAML and hates it at the same time
With the recent wave of YAML hating, I want to chip in: I hate YAML too! Now that I have that out of my system, let's analyze why. I will focus mainly on my domain of interest: configuration management and orchestration. More and more tools in this space use YAML as their input language, for example the TOSCA orchestration standard from OASIS, Cloudify, Ansible, OpenStack HEAT, CloudFormation, … Each of these uses YAML as a means to input a model into the tool. A different use case is tools and services that use YAML as a configuration file format. Let's analyze both.
Text-based inputs are superior to graphical systems because they are much easier to automate, version, compare and so on. However, a text-based input does not excuse you from designing it, just as you would design a graphical user interface. Designing user interfaces is hard: you need to think about how users will use the language and optimize it for those tasks. Such a language, developed for a specific purpose, is typically called a domain-specific language (DSL).
In computer science, a language consists of two distinct elements: the syntax (what it looks like) and the semantics (what it means). The semantics are often specific to a system. (However, it does not hurt to look at the previous 50 years and not repeat past mistakes: yes, I am looking at you, TOSCA, but that is for another article.) Once you have the semantics, you design a syntax that supports the end user. Once you have a syntax, you need to define it formally and write a lexer and a parser that turn the source text into what is called an abstract syntax tree (AST). Depending on the language, this AST is then validated and evaluated, or compiled.
The lexer and parser are where the trouble starts. Not many developers like designing a syntax for a language, let alone writing it down formally in (E)BNF. Granted, it is not easy, but in my opinion it is not a step you can skip.
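A minimal Python sketch can make this pipeline concrete. It assumes a made-up resource syntax: the grammar in the comment, the `Resource` node and the `SCHEMA` are all hypothetical, invented for illustration. A regex-based lexer turns text into tokens, a hand-written recursive-descent parser with one method per EBNF rule builds the AST, and a separate pass validates the AST against the semantics. A real DSL would more likely use a parser generator than hand-roll all of this.

```python
import re
from dataclasses import dataclass, field

# Hypothetical resource syntax, written down in EBNF:
#   model     = { resource } ;
#   resource  = IDENT IDENT "{" { attribute } "}" ;   (* type, then name *)
#   attribute = IDENT "=" value ;
#   value     = NUMBER | STRING | IDENT ;

# Lexer: one named group per token kind; alternatives are tried left to right.
TOKEN_RE = re.compile(
    r'(?P<NUMBER>\d+)|(?P<STRING>"[^"]*")|(?P<IDENT>[A-Za-z_]\w*)'
    r"|(?P<SYMBOL>[{}=])|(?P<SKIP>\s+)|(?P<ERROR>.)"
)


def tokenize(text):
    tokens = []
    for m in TOKEN_RE.finditer(text):
        if m.lastgroup == "ERROR":
            raise SyntaxError(f"unexpected character {m.group()!r} at offset {m.start()}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
    return tokens


@dataclass
class Resource:  # AST node for one declared resource
    kind: str
    name: str
    attributes: dict = field(default_factory=dict)


class Parser:
    """Recursive-descent parser: one method per EBNF rule."""

    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ("EOF", "")

    def expect(self, kind, lexeme=None):
        got_kind, got_lexeme = self.peek()
        if got_kind != kind or (lexeme is not None and got_lexeme != lexeme):
            raise SyntaxError(f"expected {lexeme or kind}, got {got_lexeme or got_kind}")
        self.pos += 1
        return got_lexeme

    def model(self):
        resources = []
        while self.peek()[0] != "EOF":
            resources.append(self.resource())
        return resources

    def resource(self):
        kind = self.expect("IDENT")
        node = Resource(kind, self.expect("IDENT"))
        self.expect("SYMBOL", "{")
        while self.peek() != ("SYMBOL", "}"):
            key = self.expect("IDENT")
            self.expect("SYMBOL", "=")
            node.attributes[key] = self.value()
        self.expect("SYMBOL", "}")
        return node

    def value(self):
        kind, lexeme = self.peek()
        if kind in ("NUMBER", "STRING"):
            self.pos += 1
            return int(lexeme) if kind == "NUMBER" else lexeme[1:-1]  # strip quotes
        return self.expect("IDENT")  # a bare identifier, e.g. a reference


# Hypothetical semantics: accepted attributes per resource type, with their types.
SCHEMA = {"service": {"image": str, "port": int}}


def validate(resources):
    errors = []
    for r in resources:
        allowed = SCHEMA.get(r.kind)
        if allowed is None:
            errors.append(f"{r.name}: unknown resource type {r.kind!r}")
            continue
        for key, value in r.attributes.items():
            if key not in allowed:
                errors.append(f"{r.name}: unknown attribute {key!r}")
            elif not isinstance(value, allowed[key]):
                errors.append(f"{r.name}: {key!r} should be {allowed[key].__name__}")
    return errors


ast = Parser(tokenize('service web { image = "nginx" port = 80 }')).model()
print(ast)  # [Resource(kind='service', name='web', attributes={'image': 'nginx', 'port': 80})]
print(validate(ast))  # []
```

Keeping validation as its own pass over the AST mirrors the split between syntax and semantics: the parser only knows what the input looks like, while `validate` encodes what the resources mean.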
YAML, however, lets you skip the design step. You define lists and maps and load them into your tool; these lists and maps are effectively your AST. This is great for prototyping and for getting the semantics right. Once you reach that point, though, a good DSL becomes really important to support your users. There is a reason that many constructs in programming languages are called syntactic sugar: it makes the language less bitter! Syntactic sugar optimizes a language for the tasks users perform most often.
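To illustrate both points, the sketch below uses a hypothetical relationship model. `FROM_YAML` is what a YAML loader would return for the verbose form, and that nested structure is already the tool's AST. The `source -> target` line syntax is an invented piece of sugar, not taken from any of the tools mentioned above; the hypothetical `desugar` function parses it into exactly the same structure, so the rest of the tool does not need to change.

```python
import re

# What a YAML loader hands the tool for this verbose, hypothetical model:
#   relationships:
#     - type: depends_on
#       source: web
#       target: db
# The nested lists and maps *are* the AST; the tool walks them directly.
FROM_YAML = {"relationships": [{"type": "depends_on", "source": "web", "target": "db"}]}

# Hypothetical sugar for the most common case: "source -> target".
ARROW_RE = re.compile(r"^\s*(\w+)\s*->\s*(\w+)\s*$")


def desugar(text):
    """Parse 'a -> b' lines into the same lists-and-maps AST as the YAML form."""
    relationships = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        m = ARROW_RE.match(line)
        if m is None:
            raise SyntaxError(f"line {lineno}: expected 'source -> target'")
        relationships.append({"type": "depends_on", "source": m[1], "target": m[2]})
    return {"relationships": relationships}


print(desugar("web -> db") == FROM_YAML)  # True
```

One line of sugar replaces four lines of map entries for the case users write most often, while the semantics stay defined on the same AST.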
A good starting point for designing and writing your own DSL is ANTLR, together with the book “Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages” by Terence Parr, the author of ANTLR. ANTLR does not generate the most performant parsers, but it is rather user-friendly.
Configuration languages are a totally different category, and here it is the other way around: the YAML syntax is reasonably OK for entering configuration data, because configuration data often consists only of lists, tables and maps, which is exactly what YAML is good at. However, YAML only gives you an AST: you should still validate the input. An even better alternative, which is less error-prone, is TOML.
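Here is a minimal sketch of validating configuration data after a YAML loader has returned it. The schema and its keys are hypothetical. The sample data shows a well-known YAML 1.1 pitfall: unquoted `NO` (for example Norway's country code) and `off` are read as booleans, so a field meant to be a string silently becomes `False`. TOML sidesteps this particular trap because string values must be quoted, but it does not check your schema for you either.

```python
# Hypothetical schema for a small service config: key -> exact expected type.
SCHEMA = {"name": str, "country": str, "port": int, "debug": bool}


def validate_config(data, schema=SCHEMA):
    """Return a list of problems; an empty list means the config passed."""
    if not isinstance(data, dict):
        return [f"top level must be a map, got {type(data).__name__}"]
    problems = []
    # Exact type check: bool is a subclass of int, so isinstance() would
    # accept `port: yes` (loaded as True) as a port number.
    for key, expected in schema.items():
        if key not in data:
            problems.append(f"missing key {key!r}")
        elif type(data[key]) is not expected:
            problems.append(
                f"{key!r} should be {expected.__name__}, got {type(data[key]).__name__}"
            )
    for key in sorted(data.keys() - schema.keys(), key=str):
        problems.append(f"unknown key {key!r}")
    return problems


# Roughly what a YAML 1.1 loader such as PyYAML returns for:
#   name: shop
#   country: NO    # unquoted, so YAML 1.1 reads it as the boolean false
#   port: 8080
#   debug: off     # also a YAML 1.1 boolean
loaded = {"name": "shop", "country": False, "port": 8080, "debug": False}
print(validate_config(loaded))  # ["'country' should be str, got bool"]
```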
To conclude, YAML is nothing more than Yet Another Markup Language. It is OK to use it as a configuration language, as long as you validate the data. When the configuration language becomes more complex and offers templating, type coercion, … use a better language or create your own, with its own syntax, and validate the input. For programming and modelling languages, YAML can be used for a prototype; as soon as things get more serious, create a DSL. And remember: a good engineer is a lazy engineer, but not too lazy!