What is YAML?

YAML is a data serialization language that was first proposed in 2001, although it would take another few years before it became super popular. The acronym originally stood for Yet Another Markup Language, but this was later changed to YAML Ain’t Markup Language to emphasize that developers should use it for storing data, not for creating documents (like HTML or Markdown, for example).

To find out why YAML was created, we need to go back to XML, which was the common configuration language before JSON.

XML, characterized by the use of < > blocks, was used in traditional client-server applications to exchange data. Java applications, AJAX development, and PHP projects all used XML. Although it was primarily developed to pass data, it was also used to store configuration settings and was later extended to allow more logic directly within XML itself. Unfortunately, it was never really built for this purpose.

That’s where JSON came in as a replacement for XML, mainly to pass data to JavaScript in a browser. Storing configuration settings was not really a good use case for JSON, and its biggest limitation there is probably that it offers no way to write comments. JSON syntax is characterized by its { } blocks and strict quoting requirements.

Out of the limitations of both XML and JSON, YAML was born. It served the same purpose, passing data, but also addressed some of the issues and limitations of both XML and JSON.

In this article, I’ll explore YAML in more detail, examine why developers like you should learn this handy (non-markup) language, and present some sample code demonstrating YAML in action.

What Exactly is YAML?

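YAML is a data serialization language. When you think of a data serialization language, you might think of JavaScript Object Notation (JSON). In fact, the two are closely related: since version 1.2, YAML has been designed as a superset of JSON, so a valid JSON document is (apart from rare edge cases) also valid YAML, while YAML adds a simpler, more human-readable syntax on top.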
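YAML 1.2 was designed as a superset of JSON. As a small sketch of what that means (the keys and values are made up for illustration), the two documents below, separated by ---, should load into the same data in a YAML 1.2 parser. The first uses JSON syntax (apart from the comment line, since JSON itself has no comments) and the second uses YAML's indentation-based block style.

```yaml
# Document 1: JSON syntax, which a YAML 1.2 parser also accepts
{"name": "secsample", "replicas": 5, "ports": [80, 443]}
---
# Document 2: the same data in YAML block style
name: secsample
replicas: 5
ports:
  - 80
  - 443
```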

Possibly the biggest advantage of using YAML is that nearly every popular programming language has libraries to read and write it. This makes it easy to incorporate YAML into your application, no matter your preferred language.

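One interesting characteristic of YAML is that, in its common block style, it doesn’t need delimiters like < > or { }, but relies on hyphens and indentation instead. For example, items in a list start with a hyphen, and each entry in a map of configuration parameters is written as a key followed by a colon and a space, with nesting expressed through indentation. Note that tabs are not allowed for indentation; it’s all based on spaces. (YAML also supports a JSON-like flow style using { } and [ ], which shows up in some of the examples later in this article.)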
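A minimal sketch of those rules follows; all keys and values are hypothetical. It shows a map of key/value pairs, a nested map, and a list whose items start with a hyphen, all indented with spaces only. It also shows how YAML infers scalar types, which is why a value such as 'true' is sometimes quoted to keep it a string.

```yaml
# A map: each key is followed by a colon and a space
app: secsample
replicas: 5          # parsed as an integer
enabled: true        # parsed as a Boolean
dryRun: 'true'       # quoted, so parsed as the string "true"

# A nested map: its keys are indented (two spaces here) under the parent key
database:
  host: db.example.com
  port: 5432

# A list: each item starts with "- "
environments:
  - dev
  - test
  - prod
```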

Later in this article, a few examples will clarify this syntax use.

Where is YAML Used?

Some describe YAML as a programming language, which in reality it is not at all. Remember that none of the three data structure formats (XML, JSON, and YAML) were developed to have any logic within the file structure itself. This should be handled by a programming language.

YAML became the go-to format for storing configuration settings across different development languages and DevOps platforms.

One of the first major YAML adopters was Red Hat’s Ansible, a configuration management solution. Ansible lets system administrators and DevOps teams describe the desired configuration of machines and software, such as which packages to install, in YAML files called playbooks, which Ansible then runs against the target hosts. The YAML structure enables automating what used to be manual tasks.
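To give a feel for what this looks like, here is a minimal playbook sketch. The host group “webservers” and the choice of nginx are illustrative assumptions, and the “ansible.builtin.apt” module assumes Debian or Ubuntu target hosts.

```yaml
# A playbook is a list of plays; this one contains a single play
- name: Install and start nginx
  hosts: webservers        # hypothetical inventory group
  become: true             # run the tasks with elevated privileges
  tasks:
    - name: Install the nginx package
      ansible.builtin.apt:
        name: nginx
        state: present
        update_cache: true

    - name: Ensure nginx is running and starts on boot
      ansible.builtin.service:
        name: nginx
        state: started
        enabled: true
```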

More recently, YAML has been heavily used by many other tools, including:

  • Docker Compose (docker-compose.yml)
  • Kubernetes
  • Puppet
  • AWS CloudFormation
  • Azure DevOps Pipelines
  • GitHub Actions
  • Relay

In each of these tools, the corresponding YAML file (.yml or .yaml) outlines the configuration parameters and settings for a specific deployment. A Kubernetes YAML file defines the application service layout: which container image it should use, how many replicas to deploy, whether the service should run behind a load balancer, which credentials to use to pull images from the container registry, and so on.

As another example, Azure DevOps is adopting the YAML format to migrate away from the classic editor, which was the main tool for creating DevOps pipeline tasks in Visual Studio Team Services (VSTS) and Team Foundation Server (TFS).
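As a rough idea of the YAML that replaces a classic-editor pipeline, here is a minimal azure-pipelines.yml sketch. The branch name and the echo commands are placeholder assumptions; a real pipeline would build and test your application in these steps.

```yaml
# Run the pipeline on every push to the main branch
trigger:
  - main

# Use a Microsoft-hosted Ubuntu build agent
pool:
  vmImage: ubuntu-latest

steps:
  # Each script step runs a command on the build agent
  - script: echo "Building the application..."
    displayName: Build
  - script: echo "Running tests..."
    displayName: Test
```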

GitHub Actions, another DevOps pipeline engine, also relies fully on YAML. Using YAML as the configuration file structure builds on a concept known as pipelines as code: powerful deployment workflows whose task details and related configuration parameters are stored in a text-based YAML file. One of the bigger benefits is that your pipeline definition can now live in version control, like Git, alongside your application source code, which typically used to be the only thing stored there.

YAML in Action

In the previous section, I highlighted several of the more popular tools and solutions where YAML is adopted as a standard for configuration and settings. In this section, I will share some “generic” examples of such YAML syntax.

Kubernetes

A sample Kubernetes.yml file, to deploy a Docker container to a Kubernetes cluster and run an application service, could look like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: secsample
spec:
  replicas: 5
  selector:
    matchLabels:
      app: secsample
  template:
    metadata:
      labels:
        app: secsample
    spec:
      containers:
      - name: secsample
        image: myazureacr.azurecr.io/sampledotnet31app:latest
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: secsample
spec:
  type: LoadBalancer
  ports:
  - port: 80
  selector:
    app: secsample

The configuration contains the following information:

  • Deploy an application service (called secsample)
  • Based on five replicas (meaning five Kubernetes Pods run within the cluster at all times, typically spread by the scheduler across the cluster’s worker nodes)
  • Use the Docker image “sampledotnet31app:latest”, available in an Azure Container Registry called “myazureacr”
  • Have the container listen on port 80 within the cluster (containerPort)
  • Expose the running containers/application service behind a load balancer and listen on port 80
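The sample Service relies on Kubernetes defaults for some fields. As a sketch, here is the same Service with those defaults written out: when “targetPort” is omitted it takes the value of “port”, and “protocol” defaults to TCP. Spelling them out can make the mapping from the load balancer port to the container port easier to follow.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: secsample
spec:
  type: LoadBalancer
  ports:
  - port: 80          # port the Service (and its load balancer) listens on
    targetPort: 80    # containerPort that traffic is forwarded to
    protocol: TCP     # the default protocol
  selector:
    app: secsample    # route traffic to Pods carrying this label
```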

GitHub Actions

GitHub Actions is a relatively new, but already popular, “pipeline as a service” offering within GitHub (public and private) repositories. While GitHub provides source control, storing application source code and configuration files, you traditionally needed another DevOps tool to run the actual continuous integration and continuous deployment (CI/CD) pipeline. With Actions, you can now trigger and run the pipeline directly within GitHub.

A sample pipeline, deploying a Node.js application, could look like this:

# This workflow will do a clean install of node dependencies, build the source code, and run tests across different versions of node
# For more information see: https://help.github.com/actions/language-and-framework-guides/using-nodejs-with-github-actions

name: Node.js CI
on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]
jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        node-version: [18.x, 20.x, 22.x]
        # See supported Node.js release schedule at https://nodejs.org/en/about/releases/
    steps:
    - uses: actions/checkout@v4
    - name: Use Node.js ${{ matrix.node-version }}
      uses: actions/setup-node@v4
      with:
        node-version: ${{ matrix.node-version }}
    - run: npm ci
    - run: npm run build --if-present
    - run: npm test

To help familiarize you with the syntax characteristics, here is what happens in some of the steps:

  • The “push” and “pull_request” settings are the GitHub Actions pipeline’s starting point (trigger). Whenever someone pushes commits to the main branch, or opens or updates a pull request targeting main, the workflow will be triggered.
  • The “runs-on” parameter specifies that a GitHub-hosted Ubuntu runner (build agent) runs the pipeline itself. This is a pre-configured virtual machine (VM) provided as a service.
  • The “steps” section contains the actual step-by-step tasks to execute. Because of the “matrix” strategy, the job runs once for each listed Node.js version: it checks out the code, sets up that Node.js version, performs a clean install of the dependencies (npm ci), runs the build script if one is present, and finally runs npm test to validate that the application builds and its tests pass.
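The [ main ] branch filter and the bracketed list of Node.js versions in this workflow use YAML’s flow sequence syntax. As a sketch, here are the trigger and matrix sections rewritten in block style, with one hyphenated item per line; a YAML parser treats both forms as the same lists. This is only a fragment (the job’s steps are left out, so it is not a complete workflow), and the Node.js versions shown are illustrative.

```yaml
on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main
jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        # The matrix creates one copy of the job per listed version
        node-version:
          - 18.x
          - 20.x
          - 22.x
```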

Note the # symbol, which you use to insert comments. A comment runs from the # to the end of the line, so a comment spanning multiple lines needs a # at the start of each line.
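A short sketch of comment placement follows; the keys and URL are hypothetical. YAML has no block comment syntax, and when a comment follows a value, the # must be preceded by whitespace; otherwise it is treated as part of the value.

```yaml
# A full-line comment
# A multi-line comment is simply several
# single-line comments in a row
retries: 3       # an inline comment, separated from the value by a space
docs: https://example.com/page#section   # "#section" stays part of the URL value
# timeout: 30    # a commented-out setting, ignored by the parser
```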

Relay.sh Workflows

Puppet’s Relay is an automation-workflow-as-a-service solution that runs automated tasks against target environments across different private and public cloud platforms. The tasks themselves are containerized steps, which can be written in any language, but the workflows that tie the steps together are based on YAML as well. Let’s look at an example from the sample workflow library:

apiVersion: v1
summary: Stop untagged EC2 instances
description: This workflow looks at all of the EC2 instances in a given account and region and stops the ones that are untagged. Requires an AWS account with permissions to stop EC2 instances.
homepage: https://github.com/puppetlabs/relay-workflows/tree/master/ec2-stop-untagged-instances
tags:
  - security
parameters:
  awsRegion:
    description: The AWS region to run in
    default: us-east-1
  dryRun:
    description: True if you don't want to perform actual changes
    default: 'true'
steps:
- name: describe-instances
  image: relaysh/aws-ec2-step-instances-describe
  spec:
    aws: &aws
      connection: !Connection { type: aws, name: my-aws-account }
      region: !Parameter awsRegion
- name: filter-instances
  image: relaysh/core:latest-python
  spec:
    instances: !Output {from: describe-instances, name: instances}
  inputFile: https://raw.githubusercontent.com/puppetlabs/relay-workflows/master/ec2-stop-untagged-instances/filter-instances.py
- name: approval
  description: Wait for approval to stop instances
  type: approval
  dependsOn: filter-instances
  when:
    - !Fn.equals [!Parameter dryRun, 'false']
- name: stop-instances
  dependsOn: approval
  image: relaysh/aws-ec2-step-instances-stop
  when:
    - !Fn.equals [!Parameter dryRun, 'false']
  spec:
    aws: *aws
    instanceIDs: !Output {from: filter-instances, name: instanceIDs}

Here are some additional details for some sections of this YAML file:

  • “summary”, “description”, and “homepage” look like settings, but they are really metadata fields that provide an easy way to document the workflow. You can also add descriptions using # comments.
  • The “parameters” section has two settings, awsRegion and dryRun. awsRegion tells the Relay workflow which AWS region to target, while dryRun determines whether the workflow only identifies impacted resources without making any changes ('true', the default) or actually stops them ('false'). Note that the default is quoted, so it is the string 'true' rather than a Boolean.
  • The “steps” section runs four different tasks:

    • Describe all EC2 instances in the region, using an AWS connection (my-aws-account) configured within the Relay platform.
    • Filter the instances down to the untagged ones, based on a Python script available from a public GitHub repository.
    • Wait for approval to actually stop the EC2 instances; this step only runs when the dryRun parameter is set to 'false'.
    • Stop the impacted EC2 instances, again only when dryRun is set to 'false'.
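Two YAML features in this workflow are easy to miss. “&aws” defines an anchor on the map under the first step’s aws key, and “*aws” in the last step is an alias that reuses that same map, so the connection and region are declared only once. The “!Connection”, “!Parameter”, “!Output”, and “!Fn.equals” entries are YAML tags whose meaning is defined by Relay, not by YAML itself. The generic sketch below shows anchors and aliases on their own, with hypothetical keys. The “<<” merge key it uses is a YAML 1.1 extension that many, but not all, parsers support.

```yaml
# Define shared settings once and mark them with the anchor &defaults
defaults: &defaults
  region: us-east-1
  retries: 3

serviceA:
  settings: *defaults    # alias: reuses the exact map anchored above

serviceB:
  settings:
    <<: *defaults        # merge key: copies the anchored key/value pairs...
    retries: 5           # ...and overrides one of them
```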

Next Steps

We’ve seen that YAML is a widely used data serialization language. It’s easy to learn and easy to read. Since YAML is already everywhere, if you’re a developer or DevOps practitioner, it won’t be long before you have to create or edit YAML yourself. For a deeper dive into YAML, and to find libraries for reading and writing it from every popular programming language, see the official YAML website (yaml.org).

About the Author - Peter De Tender

Peter has been working as an IT expert for 24+ years, with a background in Microsoft datacenter technologies. In early 2012, Peter started shifting to cloud technologies (Office 365, Intune), and quickly jumped onto the Azure platform, working as a cloud solution architect and trainer out of his own company. In 2019, Peter took on an FTE position as Azure Technical Trainer within Microsoft Corp, providing Azure Readiness Workshops to larger customers and partners in the EMEA region and globally, with a focus on Azure DevOps, Apps and Infra, and SAP workloads.

Peter was an Azure MVP for 5 years, has been a Microsoft Certified Trainer (MCT) for 12+ years, and is still actively involved in the community as a public speaker, technical writer, book author, and publisher.