How to Provision Cloud Infrastructure

One of the best things about cloud computing is how it converts technical efficiency into cost savings. Some of those efficiencies are built into the toolkit, like pay-per-use Lambda functions. Good DevOps practice adds savings of its own: it smooths out high-friction state management and speeds up deployments. Sprucing up how you provision cloud services is a prime example. That’s where treating infrastructure like the rest of your codebase comes in.

Treating infrastructure as code opens the door to tons of optimization opportunities. One standout approach is standardization, which simplifies operations: when you deploy from a configuration document, you decrease risk and speed up development. You can also employ those configuration files in automated DevOps workflows. In this post, we’ll give some examples of how you can leverage these benefits, using Terraform to deploy cloud resources and Bolt to configure them.

Deploy From Documentation

Terraform is great for building and destroying temporary resources. It can simplify an ad hoc data processing workflow, for example. Let’s say you’re doing on-demand data processing in AWS: you need to spin up an EMR cluster, transform your data, and destroy the cluster immediately. This transient-cluster pattern saves you a ton, but manually deploying the cluster for each job slows down development. With Terraform, you can write the cluster’s specification once and check it into git to ensure you deploy the same version every time.

Terraform configurations are easy to write and read, and they can be modularized for reuse. Rather than hard-coding every argument in a single file, templatize the resource and supply the value for each argument from a tfvars file, which acts as the config.

Here is a truncated example of a templatized EMR resource that you might put in your main.tf file.

resource "aws_emr_cluster" "cluster" {
  # required args:
  name          = var.name
  release_label = var.release_label
  applications  = var.applications
  service_role  = var.service_role

  master_instance_group {
    instance_type = var.master_instance_type
  }

  core_instance_group {
    instance_type  = var.core_instance_type
    instance_count = var.core_instance_count
  }
}

The variables are declared in a variables.tf file and assigned their values in a terraform.tfvars file.

terraform.tfvars:

name                        = "spark-app"
release_label               = "emr-5.30.0"
applications                = ["Hadoop", "Spark"]
master_instance_type        = "m3.xlarge"
core_instance_type          = "m3.xlarge"
core_instance_count         = 1

variables.tf:

variable "name" {}
variable "release_label" {}
variable "applications" {
  type = list(string)
}
variable "master_instance_type" {}
variable "core_instance_type" {}
variable "core_instance_count" {}

Notice how easy it is to modify an instance type. Every argument is documented and centrally managed in the code: no one has to dig through a wiki or a previous version of the application. Just check the repo out of git and refer to a single, deployable config. Note that this is an incomplete list of arguments; for the full list of required and optional arguments, see Terraform’s aws_emr_cluster documentation.
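To tie this together, the transient-cluster workflow reduces to a handful of commands. This is a sketch that assumes the three files above sit in one directory and your AWS credentials are already configured:

```shell
terraform init                    # download the AWS provider
terraform plan                    # preview the cluster defined by terraform.tfvars
terraform apply -auto-approve     # create the EMR cluster
# ... submit the Spark job and wait for it to finish ...
terraform destroy -auto-approve   # tear the cluster down so you stop paying for it
```

The -auto-approve flag skips the interactive confirmation prompt, which is what you want in an automated job but worth omitting while you’re still iterating on the config.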

Furthermore, by storing your Terraform repo in git, you can leverage event-driven automation workflows, such as redeploying the resource on merges into your master branch.
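As one hypothetical sketch of that pattern, a minimal GitHub Actions workflow could re-apply the config on every merge to master. The workflow name, branch, and secret names here are illustrative, not prescribed:

```yaml
name: redeploy-emr
on:
  push:
    branches: [master]

jobs:
  apply:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
      - run: terraform apply -auto-approve
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
```

Any CI system with a git trigger works the same way; the point is that the deployable config lives in the repo, so the event and the deployment stay in lockstep.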

Automate Config Management

Now let’s look at how to conveniently update persistent infrastructure, such as a fleet of always-on EC2 instances. Applying new provisioning actions to each one can be time-consuming. Bolt by Puppet helps you manage multiple remote resources at once. You can use it to perform scheduled uptime monitoring or run one-off patching tasks. In either case, Bolt tools can be captured within your projects and maintained in git, which lets you apply the benefits of infrastructure as code to your configuration and maintenance programs.

Bolt actions are either tasks or plans. Tasks are on-demand actions; plans are orchestration scripts. Let’s start with a simple task. Suppose your development team needs the Docker engine installed on a suite of EC2 instances. The invocation would look like this:

bolt task run package action=install name=docker --targets my-ec2-fleet

The installation will be applied to all of the resources declared as targets in the project’s inventory file.
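The my-ec2-fleet group, for instance, could be defined in an inventory.yaml like the following. The hostnames and SSH settings are illustrative, not from the post:

```yaml
groups:
  - name: my-ec2-fleet
    targets:
      - ec2-54-0-0-1.compute-1.amazonaws.com
      - ec2-54-0-0-2.compute-1.amazonaws.com
    config:
      transport: ssh
      ssh:
        user: ec2-user
        private-key: ~/.ssh/my-ec2-key.pem
        host-key-check: false
```

Because the inventory lives in the project alongside your tasks and plans, adding a host to the fleet is a one-line git commit rather than a change to each operator’s local setup.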

Plans are declarative workflows written in YAML that run one or more tasks. That makes them easy to read and modify. A simple plan to provision newly deployed web servers with nginx would look like this:

parameters:
  targets:
    type: TargetSpec

steps:
  - description: "Set up nginx on the web servers"
    targets: $targets
    resources:
      - type: package
        title: nginx
        parameters:
          ensure: latest
      - type: service
        title: nginx
        parameters:
          ensure: running

Notice that targets is parameterized. That allows you to supply the list of resources dynamically when the plan is executed. You can leverage that further by integrating Bolt with other DevOps workflows.
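Assuming the plan above is saved as plans/nginx_setup.yaml inside a module named mymodule (both names are hypothetical), you could run it against the same fleet like this:

```shell
bolt plan run mymodule::nginx_setup targets=my-ec2-fleet
```

Plan parameters are passed as key=value pairs, so the same plan can provision a staging group one day and a production group the next without any edits.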

Consolidate Into a Workflow

Now we’ve covered provisioning with both Terraform and Bolt. Both are great tools that help you standardize infrastructure and configuration processes as code. You can even string them together into a modular, event-driven workflow that’s easy to reuse and modify. Relay, a workflow automation tool from Puppet, provides integrations with Terraform, Bolt, and AWS. For example, you can declaratively map a successful Terraform deployment as a trigger that passes AWS resource IDs to Bolt for further configuration.

Check out other integrations and see how Relay can streamline your cloud provisioning workflow.