Intro

This guide will walk you through how to use the Gruntwork Infrastructure as Code Library to define and manage a production-grade tech stack as code on top of AWS.

What is the Gruntwork Infrastructure as Code Library?

Figure 1. The Gruntwork Infrastructure as Code Library

The Gruntwork Infrastructure as Code Library is a collection of over 300,000 lines of reusable, battle-tested, production-ready infrastructure code for AWS that lets you:

Go to prod in days, not months

Most teams have the same basic infrastructure needs: a Docker cluster, load balancer, database, cache, CI/CD, monitoring, secrets management, and so on. Instead of spending months reinventing the wheel and building these pieces from scratch, you can leverage a library of reusable, battle-tested, off-the-shelf infrastructure code built by a team of DevOps experts and proven in production at hundreds of companies.

Customize everything using infrastructure as code

Everything in the Gruntwork Infrastructure as Code Library is defined as code (primarily in Terraform, Go, Python, and Bash), and as a Gruntwork subscriber, you get access to 100% of this code. You can combine and compose this code in any way you wish, see how everything works under the hood, debug any issues you run into, and customize and modify the code to fit your exact needs.

Learn best practices

The Infrastructure as Code Library includes both thorough documentation and access to the DevOps Training Library, a collection of video training courses that teach a variety of DevOps topics, including infrastructure as code, Terraform, Docker, Packer, AWS, security, and more.

Keep everything up to date

The Gruntwork team is constantly updating the Gruntwork Infrastructure as Code Library with the latest best practices, new features, and bug fixes. Instead of spending months fighting with Terraform or Kubernetes updates, better infrastructure is just a version number bump away (see the monthly Gruntwork Newsletter).

Get commercial support

Work with a team of DevOps experts who can help you set up your infrastructure, design highly available and scalable systems, automate your builds and deployments, troubleshoot issues, and avoid gotchas and pitfalls. Reach out to Gruntwork Support via Slack, email, or phone/video calls, and get code reviews, prioritized bug fixes, and SLAs on response times.

What you’ll learn in this guide

This guide consists of three main sections:

Core concepts

An overview of the core concepts you need to understand to use the Gruntwork Infrastructure as Code Library, including a look into how the Infrastructure as Code Library is designed, how the Reference Architecture is designed, how we build production-grade infrastructure, and how to make use of infrastructure as code, Terraform, Terragrunt, Packer, Docker, immutable infrastructure, versioning, automated testing, and more.

How to use the Gruntwork Infrastructure as Code Library

A step-by-step walkthrough of how to use the Gruntwork Infrastructure as Code Library.

Next steps

What to do next once you’ve finished reading this guide.

Feel free to read the guide from start to finish or skip around to whatever part interests you!

Core concepts

The Gruntwork Infrastructure as Code Library

The Gruntwork Infrastructure as Code Library is a collection of over 300,000 lines of reusable, battle-tested infrastructure code that is organized into 40+ GitHub repos, some public and open source, and some private and only accessible to Gruntwork customers. Each repo is focused on one type of infrastructure: e.g., there is one repo that contains code for deploying and managing Kubernetes on AWS, one repo with code for deploying and managing the ELK stack (Elasticsearch, Logstash, Kibana), one repo that contains a collection of CI/CD code, and so on.

The code in the Gruntwork Infrastructure as Code Library is written using a combination of:

Terraform

Used to define and manage most of the basic infrastructure, such as servers, databases, load balancers, and networking.

Go

Used to build cross-platform CLI applications (e.g., ssh-grunt is a Go app you can run on your EC2 instances to manage SSH access to those instances via IAM groups) and to write automated tests for all of the modules (using the open source Go library Terratest).

Bash

Used for small scripts on Linux and macOS, including:

  • Install scripts: used to install and configure a piece of software. Example: the install-elasticsearch script can be used to install Elasticsearch on Linux.

  • Run scripts: used to run a piece of software, typically during boot. Example: you can execute the run-elasticsearch script while a server is booting to auto-discover other Elasticsearch nodes and bootstrap an Elasticsearch cluster.

Python

Used for more complicated scripts, especially those that need to run on other operating systems (e.g., Windows) and/or those that need to be called directly from Terraform (e.g., to fill in some missing functionality).
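For instance, here's a minimal sketch (not a Gruntwork module; the script path and its inputs are hypothetical) of how a Python script can be called directly from Terraform via the external data source:

# Call a Python script from Terraform; the script receives the query as JSON on stdin
# and must print a JSON object of string values to stdout
data "external" "subnet_calculator" {
  program = ["python", "${path.module}/scripts/calculate_subnets.py"]

  query = {
    cidr_block  = "10.0.0.0/16"
    num_subnets = "3"
  }
}

# The script's output is then available as data.external.subnet_calculator.result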

Helm

Used to define and manage Kubernetes applications and resources. Example: k8s-service is a Helm chart that packages your application containers into a best practices deployment for Kubernetes.

Why these tools? We wrote a detailed blog post on why we use Terraform; as for Go, Bash, and Python, we use them because they work just about everywhere, with few or no external dependencies, and they can be integrated with almost any configuration management approach: e.g., you can use Bash scripts with Chef, Puppet, Ansible, Packer, and Docker.

We use Helm because it has a strong community (including official support from the Cloud Native Computing Foundation), with many vendors officially packaging their applications as Helm charts (e.g., HashiCorp uses Helm to package Vault and Consul for Kubernetes).

The code in each repo is organized into three primary folders: modules (or charts for Helm), examples, and test, as described in the following sections.

Modules

Figure 2. The modules folder of the ELK (Elasticsearch, Logstash, Kibana) repo

Each repo in the Gruntwork Infrastructure as Code Library contains a modules (or charts) folder that contains the main implementation code, broken down into multiple standalone, orthogonal, reusable, highly configurable modules. For example, the ELK repo (Elasticsearch, Logstash, Kibana) isn’t one giant module that deploys the entire ELK stack, but a bunch of separate modules for installing, running, and deploying Elasticsearch, Kibana, Logstash, Elastalert, Beats, Collectd, and so on.

This allows you to combine and compose the modules in many different permutations to fit your exact needs: e.g., some use cases need only Elasticsearch, while others need the full ELK stack, and for those that use the full stack, you may run each component in separate clusters in some environments (e.g., in prod, for high availability and scalability) and all in a single cluster in other environments (e.g., in dev, to save money).

Examples

Figure 3. The examples folder of the ELK (Elasticsearch, Logstash, Kibana) repo

Each repo in the Gruntwork Infrastructure as Code Library contains an examples folder that shows you how to assemble the modules from the modules folder into different permutations. This lets you try the modules out in minutes, without having to write a line of code. In other words, this is executable documentation.

Automated tests

Figure 4. The test folder of the ELK (Elasticsearch, Logstash, Kibana) repo

Each repo in the Gruntwork Infrastructure as Code Library contains a test folder that contains automated tests for the examples in the examples folder. These are mostly integration tests, which use Terratest under the hood to deploy the examples into real environments (e.g., real AWS accounts), validate that everything works, and then tear everything down.

For example, after every commit to the ELK repo, we spin up a dozen ELK clusters, perform a variety of validation steps (e.g., read data, write data, access Kibana, etc.) and then tear it all down again. This is how we build confidence that the code does what we say it does—and that it continues to do it over years of updates.

Versioning

Figure 5. An example of all the versioned updates announced in the monthly Gruntwork Newsletter

All of the code in the Gruntwork Infrastructure as Code Library is versioned. Every time we make a change, we put out a new versioned release, and announce it in the monthly Gruntwork Newsletter.

When you use the code from the Gruntwork Infrastructure as Code Library (a topic we’ll cover in How to use the Gruntwork Infrastructure as Code Library), you pin yourself to a specific version of the code. That way, you are not accidentally affected by any subsequent changes in the Gruntwork Infrastructure as Code Library until you explicitly choose to pull those changes in. And when you do want to pull the changes in, it’s just a matter of bumping the version number!
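For example, pinning a Terraform module to a specific release is just a matter of setting the ref parameter in the module's source URL (a sketch; the version shown is illustrative):

module "vpc" {
  # The ?ref parameter pins this module to a specific versioned release
  source = "git@github.com:gruntwork-io/module-vpc.git//modules/vpc-app?ref=v0.7.2"
  # ...
}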

We use version numbers of the form MAJOR.MINOR.PATCH (e.g., 1.2.3), following the principles of semantic versioning. In traditional semantic versioning, you increment the:

  1. MAJOR version when you make incompatible API changes,

  2. MINOR version when you add functionality in a backwards compatible manner, and

  3. PATCH version when you make backwards compatible bug fixes.

However, much of the Gruntwork Infrastructure as Code Library is built on Terraform, and as Terraform is still not at version 1.0.0 (the latest version as of August 2019 was 0.12.6), most of the Gruntwork Infrastructure as Code Library uses 0.MINOR.PATCH version numbers. With 0.MINOR.PATCH, the rules are a bit different, where you increment the:

  1. MINOR version when you make incompatible API changes

  2. PATCH version when you add backwards compatible functionality or bug fixes.

The Gruntwork Reference Architecture

Figure 6. An example of the Gruntwork Reference Architecture

The Gruntwork Reference Architecture is a production-grade, end-to-end tech stack built on top of the modules from the Gruntwork Infrastructure as Code Library.

What’s included

The Reference Architecture includes just about everything the typical company needs: multiple environments, each configured with server orchestration (e.g., Kubernetes), load balancers, databases, caches, network topology, monitoring, alerting, log aggregation, CI/CD, user management, secrets management, SSH management, VPN management, and much more. For a detailed walkthrough, see How to Build an End to End Production-Grade Architecture on AWS. We wire all these pieces together according to your needs, deploy everything into your AWS accounts, and give you 100% of the code—all in about one day.

Opinionated code

Whereas the Gruntwork Infrastructure as Code Library is relatively unopinionated, allowing you to combine and compose modules, tools, and approaches however you want ("à la carte"), the Gruntwork Reference Architecture is more opinionated, giving you a pre-defined, standardized set of modules, tools, and approaches to choose from ("prix fixe"). If the opinionated design of the Reference Architecture looks like a good fit for your company, you may wish to purchase it as a way to save months of having to wire everything together and deploy it yourself. If the opinionated design is not a good fit, then you can use the Gruntwork Infrastructure as Code Library directly instead.

See an example Reference Architecture

You can find the code for an example Reference Architecture for a fictional Acme corporation in the following repos:

Important
You must be a Gruntwork subscriber to access these example repos.
  • Walkthrough documentation (start here!): The Reference Architecture comes with end-to-end documentation that walks you through all of the code so you know how to run things in dev, how to deploy changes to prod, how to find metrics and logs, how to connect over VPN and SSH, and so on. This is a great starting point for exploring the Reference Architecture.

  • infrastructure-modules: In this repo, you’ll find the reusable modules that define the infrastructure for the entire company (in this case, for Acme). These are like the blueprints for a house.

  • infrastructure-live: This repo uses the modules from infrastructure-modules to deploy all of the live environments for the company (dev, stage, prod, etc). These are like the real houses built from the blueprints.

  • sample-app-frontend: This repo contains a sample app that demonstrates best practices for a Docker-based frontend app or microservice, including examples of how to talk to backend apps (i.e., service discovery), manage secrets, use TLS certs, and render HTML and JSON. This app is written in Node.js but the underlying patterns apply to any language or technology.

  • sample-app-backend: This repo contains a sample app that demonstrates best practices for a Docker-based backend app or microservice, including examples of how to talk to a database, do schema migrations, manage secrets, and use TLS certs. This app is written in Node.js but the underlying patterns apply to any language or technology.

The production-grade infrastructure checklist

The Gruntwork Infrastructure as Code Library is a collection of production-grade infrastructure—that is, the type of reliable, secure, battle-tested infrastructure that you’d bet your company on. Every time you deploy something to production, you’re betting that your infrastructure won’t fall over if traffic goes up; you’re betting that your infrastructure won’t lose your data if there’s an outage; you’re betting that your infrastructure won’t allow your data to be compromised when hackers try to break in; and if these bets don’t work out, your company may go out of business. That’s what’s at stake when we talk about "production-grade."

Building production-grade infrastructure requires taking into account a long list of details, which we have captured in The Production-Grade Infrastructure Checklist:

Table 1. The Production-Grade Infrastructure Checklist

  • Install: Install the software binaries and all dependencies. Example tools: Bash, Chef, Ansible, Puppet.

  • Configure: Configure the software at runtime. Includes port settings, TLS certs, service discovery, leaders, followers, replication, etc. Example tools: Bash, Chef, Ansible, Puppet.

  • Provision: Provision the infrastructure. Includes EC2 instances, load balancers, network topology, security groups, IAM permissions, etc. Example tools: Terraform, CloudFormation.

  • Deploy: Deploy the service on top of the infrastructure. Roll out updates with no downtime. Includes blue-green, rolling, and canary deployments. Example tools: scripts, orchestration tools (ECS, k8s, Nomad).

  • High availability: Withstand outages of individual processes, EC2 instances, services, Availability Zones, and regions. Example tools: multi-AZ, multi-region, replication, ASGs, ELBs.

  • Scalability: Scale up and down in response to load. Scale horizontally (more servers) and/or vertically (bigger servers). Example tools: ASGs, replication, sharding, caching, divide and conquer.

  • Performance: Optimize CPU, memory, disk, network, and GPU usage. Includes query tuning, benchmarking, load testing, and profiling. Example tools: Dynatrace, Valgrind, VisualVM, ab, JMeter.

  • Networking: Configure static and dynamic IPs, ports, service discovery, firewalls, DNS, SSH access, and VPN access. Example tools: EIPs, ENIs, VPCs, NACLs, SGs, Route 53, OpenVPN.

  • Security: Encryption in transit (TLS) and on disk, authentication, authorization, secrets management, server hardening. Example tools: ACM, EBS volumes, Cognito, Vault, CIS.

  • Metrics: Availability metrics, business metrics, app metrics, server metrics, events, observability, tracing, and alerting. Example tools: CloudWatch, DataDog, New Relic, Honeycomb.

  • Logs: Rotate logs on disk. Aggregate log data to a central location. Example tools: CloudWatch Logs, ELK, Sumo Logic, Papertrail.

  • Backup and restore: Make backups of DBs, caches, and other data on a scheduled basis. Replicate to a separate region/account. Example tools: RDS, ElastiCache, ec2-snapper, Lambda.

  • Cost optimization: Pick proper instance types, use spot and reserved instances, use auto scaling, and nuke unused resources. Example tools: ASGs, spot instances, reserved instances.

  • Documentation: Document your code, architecture, and practices. Create playbooks to respond to incidents. Example tools: READMEs, wikis, Slack.

  • Tests: Write automated tests for your infrastructure code. Run tests after every commit and nightly. Example tools: Terratest.

Most other collections of infrastructure code (e.g., AWS Quick Starts, Bitnami Application Catalog, the Terraform Registry, Ansible Galaxy, Chef Supermarket, etc.) are useful for learning and example code, but they do not take most of this checklist into account, and therefore are not a good fit for direct production use. On the other hand, every module in the Gruntwork Infrastructure as Code Library goes through the production-grade checklist and is explicitly designed for use directly in production.

Infrastructure as code

Everything in the Gruntwork Infrastructure as Code Library is designed to allow you to define your infrastructure as code (IaC). That is, instead of deploying infrastructure manually (e.g., by clicking around a web page), the idea behind IaC is to write code to define, provision, and manage your infrastructure. This has a number of benefits:

Self-service

Most teams that deploy code manually have a small number of sysadmins (often, just one) who are the only ones who know all the magic incantations to make the deployment work and are the only ones with access to production. This becomes a major bottleneck as the company grows. If your infrastructure is defined in code, then the entire deployment process can be automated, and developers can kick off their own deployments whenever necessary.

Speed and safety

If the deployment process is automated, it’ll be significantly faster, since a computer can carry out the deployment steps far faster than a person; and safer, since an automated process will be more consistent, more repeatable, and not prone to manual error.

Documentation

Instead of the state of your infrastructure being locked away in a single sysadmin’s head, you can represent the state of your infrastructure in source files that anyone can read. In other words, IaC acts as documentation, allowing everyone in the organization to understand how things work, even if the sysadmin goes on vacation.

Version control

You can store your IaC source files in version control, which means the entire history of your infrastructure is now captured in the commit log. This becomes a powerful tool for debugging issues, as any time a problem pops up, your first step will be to check the commit log and find out what changed in your infrastructure, and your second step may be to resolve the problem by simply reverting to a previous, known-good version of your IaC code.

Validation

If the state of your infrastructure is defined in code, then for every single change, you can perform a code review, run a suite of automated tests, and pass the code through static analysis tools, all practices that are known to significantly reduce the chance of defects.

Happiness

Deploying code and managing infrastructure manually is repetitive and tedious. Developers and sysadmins resent this type of work, as it involves no creativity, no challenge, and no recognition. You could deploy code perfectly for months, and no one will take notice—until that one day when you mess it up. That creates a stressful and unpleasant environment. IaC offers a better alternative that allows computers to do what they do best (automation) and developers to do what they do best (coding).

Reuse

You can package your infrastructure into reusable modules, so that instead of doing every deployment for every product in every environment from scratch, you can build on top of known, documented, battle-tested pieces. You can build these reusable modules yourself or use an existing collection of modules, such as the Gruntwork Infrastructure as Code Library.

Some of the main IaC tools you’ll see used and referenced in the Gruntwork Infrastructure as Code Library are Terraform, Terragrunt, Packer, Docker, and Helm, each of which we’ll discuss in the next several sections.

Terraform

Terraform is an open source provisioning tool that allows you to define and manage a wide variety of infrastructure (e.g., servers, load balancers, databases, network settings, and so on) as code, across a wide variety of providers (e.g., AWS, GCP, Azure). For example, here's some Terraform code you can use to deploy an EC2 instance (a virtual server) running Ubuntu 18.04 into the us-east-2 region of AWS:

terraform-example.tf
# Deploy to the us-east-2 region of AWS
provider "aws" {
  region = "us-east-2"
}

# Deploy an EC2 instance running Ubuntu 18.04
resource "aws_instance" "example" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t2.micro"
}

You can deploy this server by running terraform init and terraform apply. Check out the Comprehensive Guide to Terraform for a thorough introduction to the language.

A large percentage of the infrastructure code in the Gruntwork Infrastructure as Code Library is defined using Terraform. We even wrote the book on it!

Terraform Cloud and Terraform Enterprise

Terraform Cloud and Terraform Enterprise are HashiCorp’s commercial Terraform products. They include many additional features for Terraform, including plan and apply workflows with approvals, role-based access control for teams, policy as code using Sentinel, and more.


The Gruntwork module library and open source tools are compatible with Terraform Cloud and Terraform Enterprise.

Terragrunt

Terragrunt is a thin, open source wrapper for Terraform. It is designed to fill in some missing features in Terraform, such as allowing you to define your Terraform backend configuration in one terragrunt.hcl file, rather than having to copy/paste the same config over and over again:

terragrunt.hcl
remote_state {
  backend = "s3"
  config = {
    bucket         = "my-terraform-state"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "my-lock-table"
  }
}

Once you’ve created your terragrunt.hcl configuration, you can run all the usual Terraform commands, but with terragrunt as the binary: e.g., terragrunt plan, terragrunt apply, terragrunt destroy. Check out Terragrunt: how to keep your Terraform code DRY and maintainable for a thorough introduction.

Note that while the Gruntwork Reference Architecture relies on Terragrunt as one of its opinionated tools, the Gruntwork Infrastructure as Code Library does NOT require Terragrunt; you can use the Terraform modules in the Gruntwork Infrastructure as Code Library with plain Terraform, Terraform Enterprise, Atlantis, Terragrunt, or any other tools you prefer.

Packer

Packer is an open source tool you can use to define machine images (e.g., VM images, Docker images) as code. For example, here is how you can use Packer to define an Ubuntu 18.04 Amazon Machine Image (AMI) that has Node.js installed:

packer-example.json
{
  "builders": [{
    "type": "amazon-ebs",
    "region": "us-east-2",
    "source_ami": "ami-0c55b159cbfafe1f0",
    "instance_type": "t2.micro",
    "ssh_username": "ubuntu",
    "ami_name": "packer-example-{{timestamp}}"
  }],
  "provisioners": [{
    "type": "shell",
    "inline": [
      "curl -sL https://deb.nodesource.com/setup_10.x | sudo -E bash -",
      "sudo apt-get update -y",
      "sudo apt-get install -y nodejs"
    ]
  }]
}

You can run packer build packer-example.json to build an AMI from this code and then deploy this AMI to your AWS account using other tools. For example, the Gruntwork Infrastructure as Code Library contains several Terraform modules that can deploy AMIs across one or more servers (e.g., into an AWS Auto Scaling Group), with support for auto scaling, auto healing, zero-downtime deployments, etc.
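For example, here's a minimal sketch (plain Terraform, not one of the Gruntwork modules) of how you might look up the most recent AMI built by the Packer template above and deploy it on a single EC2 instance:

# Find the most recent AMI built from the packer-example template in this account
data "aws_ami" "packer_example" {
  most_recent = true
  owners      = ["self"]

  filter {
    name   = "name"
    values = ["packer-example-*"]
  }
}

# Deploy that AMI on an EC2 instance
resource "aws_instance" "example" {
  ami           = data.aws_ami.packer_example.id
  instance_type = "t2.micro"
}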

The Gruntwork Infrastructure as Code Library contains a number of scripts and binaries that you can run on your servers: e.g., the ELK code includes scripts you run during boot on Elasticsearch servers to bootstrap the cluster, and the security code includes an ssh-grunt binary you can run on each server to manage SSH access to that server using IAM groups (i.e., IAM users in specific IAM groups will be able to SSH to specific servers using their own usernames and SSH keys).

To get these scripts and binaries onto your virtual servers (e.g., onto EC2 instances in AWS), we recommend using Packer to build VM images that have these scripts and binaries installed. You’ll see an example of how to do this in How to use the Gruntwork Infrastructure as Code Library. Note that the Gruntwork Infrastructure as Code Library does NOT require that you use Packer (e.g., you could also use Ansible or Chef to install the scripts and binaries), but the Gruntwork Reference Architecture does use Packer as one of its opinionated tools.

Docker

Docker is an open source tool you can use to run containers and define container images as code. A container is a bit like a lightweight VM, except instead of virtualizing all the hardware and the entire operating system, containers virtualize solely user space, which gives you many of the isolation benefits of a VM (each container is isolated in terms of memory, CPU, networking, hard drive, etc), but with much less memory, CPU, and start-up time overhead. For example, here is how you can define an Ubuntu 18.04 Docker image that has Node.js installed:

Dockerfile
FROM ubuntu:18.04

# The ubuntu:18.04 base image does not include curl (and RUN already executes as root, so sudo is unnecessary)
RUN apt-get update -y && \
    apt-get install -y curl && \
    curl -sL https://deb.nodesource.com/setup_10.x | bash - && \
    apt-get install -y nodejs

You can run docker build -t example-image . to build a Docker image from this code, push the image to a Docker Registry (e.g., ECR or Docker Hub), and then deploy the Docker image using other tools. For example, the Gruntwork Infrastructure as Code Library contains a number of modules for running container orchestration tools such as Kubernetes, ECS, and Nomad that you can use to deploy and manage Docker images.

Helm

Helm is a package and module manager for Kubernetes that allows you to define, install, and manage Kubernetes applications as reusable packages called Charts. Helm also maintains a repository of official charts for popular applications such as Jenkins, MySQL, and Consul.

Charts are written using Go templates, which let you parameterize Kubernetes manifests so that the same chart can be deployed with different inputs. This makes it easier for vendors to distribute their applications for Kubernetes, and for users to deploy and upgrade those applications with a single command.

Each vendor distributes its charts via a chart repository, which is a server that hosts chart packages. Once you add a repository to your local configuration, you can install any chart it contains.

For example, you can use the helm client to add the official Elastic repository and install Elasticsearch on Kubernetes:

helm repo add elastic https://helm.elastic.co
helm install elastic/elasticsearch

The Gruntwork Infrastructure as Code Library contains a number of Helm charts that help you deploy and manage your applications: e.g., the k8s-service Helm chart packages your application containers into a best practices deployment for Kubernetes. You can access the Gruntwork Helm charts by adding the Gruntwork Helm Chart Repository. You’ll see an example of how to do this in How to use the Gruntwork Infrastructure as Code Library.

Immutable infrastructure

With mutable infrastructure, you deploy a set of servers, and you continuously update those servers in place. Every new update gets installed on top of the previous updates, either manually (e.g., by SSHing to each server and running commands), or via tools like Ansible, Chef, or Puppet. The idea behind immutable infrastructure is that once you deploy a server, you never change it again. If you need to roll out an update, you deploy a new server with that update, and undeploy the old one. This paradigm is built for use with (a) the cloud, where you can easily spin up or tear down servers on-demand and (b) machine images, as every time there’s a change, you can use tools like Packer or Docker to build a new, immutable, versioned machine image (e.g., VM image or Docker image), and deploy new servers with that image.

The advantages of immutable infrastructure are:

Easier to reason about servers

With mutable infrastructure, each server builds up a unique history of changes, so each one is a little different, which (a) makes it difficult to reason about what’s actually installed and (b) leads to tricky bugs that only show up on some servers, and not on others. With immutable infrastructure, you avoid these sorts of bugs, and you always know what’s installed on any server, as you know the exact image each server is running, and that the image never changes.

You can run the same images in all environments

Whereas it’s rare to run mutable infrastructure tools such as Ansible, Chef, or Puppet in your local dev environment, it’s common to run the same Docker or VM image in all environments, including your laptop, staging, and production. This helps to reduce "works on my machine" and environment-specific bugs, and makes it easier to debug those issues when they do happen.

Easier scaling and rollback

With immutable images, you can quickly and easily spin up 100 or 1,000 servers, with no need to worry about how long it’ll take to configure all those servers (e.g., via Ansible, Chef, or Puppet), as all the configuration has already happened and is captured in the VM or Docker image. Rollback is easier too, as you can quickly jump back to a previous image, without having to wait for and worry about running a bunch of older install commands (which may no longer work, e.g., if certain packages have been removed from APT or YUM).

How to use the Gruntwork Infrastructure as Code Library

With all the core concepts out of the way, let’s now discuss how to use the Gruntwork Infrastructure as Code Library to build production-grade infrastructure.

Learning resources

The first step is to learn! You’ll need to learn about the AWS cloud, infrastructure (e.g., VPCs, Kubernetes, Kafka, ELK), tools (e.g., Terraform, Docker, Packer), and DevOps practices (e.g., CI, CD). Here are some useful resources:

  1. Gruntwork DevOps Training Library: a collection of video training courses that teach a variety of DevOps topics, such as infrastructure as code, Terraform, Docker, Packer, AWS, security, and more.

  2. Gruntwork Production Deployment Guides: a collection of guides that do step-by-step walkthroughs of how to go to production. You’re reading one now!

  3. Gruntwork DevOps Resources: a collection of blog posts, talks, books, and checklists for learning about DevOps, AWS, Terraform, Docker, Packer, and more.

Get access to the Gruntwork Infrastructure as Code Library

The next step is to get access to the Gruntwork Infrastructure as Code Library.

  1. To get access, you must become a Gruntwork subscriber.

  2. As part of the sign up process, we’ll ask for your GitHub user ID. The Gruntwork Infrastructure as Code Library lives in 40+ GitHub repos, most of them private, so you’ll need to send us a GitHub user ID so we can grant you access. If you don’t already have a GitHub user, you can create one for free on github.com.

  3. If you haven’t already, create an SSH key, add it to ssh-agent, and associate it with your GitHub user (instructions). You’ll need a working SSH key to access Terraform modules in the Gruntwork Infrastructure as Code Library (you’ll see examples of this later in the guide).

  4. If you haven’t already, create a GitHub personal access token (instructions). You’ll need a working GitHub personal access token to access scripts and binaries in the Gruntwork Infrastructure as Code Library (you’ll see examples of this later in the guide).

Find the modules you want to use

The next step is to find the modules you want to use. Head over to the Gruntwork Infrastructure as Code Library and find the repos that you wish to use. Browse the modules folder of each repo to see what modules are available and the examples folder to see the various ways to combine those modules. You can also browse the example Reference Architecture to find production-grade code to use as a starting point.

Within the Infrastructure as Code Library, you’ll find two types of modules: (1) Terraform modules and (2) scripts and binaries. The next two sections of the guide will walk you through how to use each of these.

Using Terraform Modules

This section will show you how to use Terraform modules from the Gruntwork Infrastructure as Code Library. As an illustrative example, we’ll deploy the vpc-app Terraform module from module-vpc.

Important
You must be a Gruntwork subscriber to access module-vpc.

You can use this module to deploy a production-grade VPC on AWS. For full background information on VPCs, check out How to deploy a production-grade VPC on AWS.

Create a wrapper module

The Terraform modules in the Gruntwork Infrastructure as Code Library are intentionally designed to be unopinionated, so they do not configure provider or backend settings. Moreover, you will often use multiple modules from the Infrastructure as Code Library, rather than just one at a time. Therefore, the canonical way to consume a Terraform module from the Gruntwork Infrastructure as Code Library is to create a wrapper module in one of your own Git repos.

Let’s assume you have a repo called infrastructure-modules and create a vpc-app wrapper module in it:

infrastructure-modules
  └ networking
    └ vpc-app
      └ main.tf
      └ outputs.tf
      └ variables.tf

Configure your providers

Inside of main.tf, configure whatever Terraform providers you’re using. Since the vpc-app module you’re using in this guide is an AWS module, you’ll need to configure the AWS provider:

infrastructure-modules/networking/vpc-app/main.tf
provider "aws" {
  # The AWS region in which all resources will be created
  region = var.aws_region

  # Require a 2.x version of the AWS provider
  version = "~> 2.6"

  # Only these AWS Account IDs may be operated on by this template
  allowed_account_ids = [var.aws_account_id]
}

This configures the AWS provider as follows:

Use a specific AWS region

The AWS region is configured via the aws_region input variable (you’ll declare this shortly). This allows you to deploy this module in multiple regions.

Pin the AWS provider version

The code above ensures that you always get AWS provider version 2.x and won’t accidentally get version 3.x in the future, which would be backwards incompatible. We recommend pinning the versions for all providers you’re using.

Pin AWS account IDs

The code above will only allow you to run it against the AWS account whose ID is passed in via the aws_account_id input variable (you’ll declare this shortly). This is an extra safety measure to ensure you don’t accidentally authenticate to the wrong AWS account while deploying this code—e.g., so you don’t accidentally deploy changes intended for staging to production (for more info on working with multiple AWS accounts, see How to Configure a Production Grade AWS Account Structure).

Let’s add the corresponding input variables in variables.tf:

infrastructure-modules/networking/vpc-app/variables.tf
variable "aws_region" {
  description = "The AWS region in which all resources will be created"
  type        = string
}

variable "aws_account_id" {
  description = "The ID of the AWS Account in which to create resources."
  type        = string
}

Configure Terraform

Next, configure Terraform itself in main.tf:

infrastructure-modules/networking/vpc-app/main.tf
terraform {
  # Partial configuration for the backend: https://www.terraform.io/docs/backends/config.html#partial-configuration
  backend "s3" {}

  # Only allow this Terraform version. Note that if you upgrade to a newer version, Terraform won't allow you to use an
  # older version, so when you upgrade, you should upgrade everyone on your team and your CI servers all at once.
  required_version = "= 0.12.6"
}

This configures Terraform as follows:

Configure a backend

The code above configures a backend, which is a shared location where Terraform state can be stored and accessed by your team. You can use any of the supported backends (the example above uses S3, which is a good choice for AWS users). See How to manage Terraform state for more info.

Partial configuration

The backend uses a partial configuration, which means most of the backend configuration (e.g., which S3 bucket and path to use) will be specified from outside of the code. You’ll see an example of this soon.

Pin the Terraform version

The code above will ONLY allow you to run it with a specific Terraform version. This is a safety measure to ensure you don’t accidentally pick up a new version of Terraform until you’re ready. This is important because (a) Terraform is a pre-1.0.0 tool, so even patch version number bumps (e.g., 0.12.6 → 0.12.7) are sometimes backwards incompatible or buggy and (b) once you’ve upgraded to a newer version, Terraform will no longer allow you to deploy that code with any older version. For example, if a single person on your team upgrades to 0.12.7 and runs apply, then you’ll no longer be able to use the state file with 0.12.6, and you’ll be forced to upgrade everyone on your team and all your CI servers to 0.12.7. It’s best to do this explicitly, rather than accidentally, so we recommend pinning Terraform versions.

Use the modules from the Gruntwork Infrastructure as Code Library

Now you can pull in the Terraform modules you want from the Gruntwork Infrastructure as Code Library as follows:

infrastructure-modules/networking/vpc-app/main.tf
module "vpc" {
  # Make sure to replace <VERSION> in this URL with the latest module-vpc release
  source = "git@github.com:gruntwork-io/module-vpc.git//modules/vpc-app?ref=<VERSION>"

  aws_region       = var.aws_region
  vpc_name         = var.vpc_name
  cidr_block       = var.cidr_block
  num_nat_gateways = var.num_nat_gateways
}

This code does the following:

Terraform module support

This code pulls in a module using Terraform’s native module functionality. For background info, see How to create reusable infrastructure with Terraform modules.

SSH Git URL

The source URL in the code above uses a Git URL with SSH authentication (see module sources for all the types of source URLs you can use). If you followed the SSH instructions in Get access to the Gruntwork Infrastructure as Code Library, this will allow you to access private repos in the Gruntwork Infrastructure as Code Library without having to hard-code a password in your Terraform code.

Versioned URL

Note the ?ref=<VERSION> at the end of the source URL. This parameter allows you to pull in a specific version of each module so that you don’t accidentally pull in potentially backwards incompatible code in the future. You should replace <VERSION> with the latest version from the releases page of the repo you’re using (e.g., here’s the releases page for module-vpc).

Module arguments

Below the source URL, you’ll need to pass in the module-specific arguments. You can find all the required and optional variables defined in vars.tf (old name) or variables.tf (new name) of the module (e.g., here’s the variables.tf for vpc-app). The code above sets these to input variables (which you’ll define shortly) so that you can use different values in different environments.

Let’s add the new input variables in variables.tf:

infrastructure-modules/networking/vpc-app/variables.tf
variable "vpc_name" {
  description = "Name of the VPC. Examples include 'prod', 'dev', 'mgmt', etc."
  type        = string
}

variable "cidr_block" {
  description = "The IP address range of the VPC in CIDR notation. A prefix of /16 is recommended. Do not use a prefix higher than /27. Example: '10.100.0.0/16'."
  type        = string
}

variable "num_nat_gateways" {
  description = "The number of NAT Gateways to launch for this VPC. For production VPCs, multiple NAT Gateways are recommended."
  type        = number
}

You may also want to add useful output variables in outputs.tf:

infrastructure-modules/networking/vpc-app/outputs.tf
output "vpc_name" {
  description = "The VPC name"
  value       = module.vpc.vpc_name
}

output "vpc_id" {
  description = "The VPC ID"
  value       = module.vpc.vpc_id
}

output "vpc_cidr_block" {
  description = "The VPC CIDR block"
  value       = module.vpc.vpc_cidr_block
}

output "public_subnet_cidr_blocks" {
  description = "The CIDR blocks of the public subnets"
  value       = module.vpc.public_subnet_cidr_blocks
}

output "private_app_subnet_cidr_blocks" {
  description = "The CIDR blocks of the private app subnets"
  value       = module.vpc.private_app_subnet_cidr_blocks
}

output "private_persistence_subnet_cidr_blocks" {
  description = "The CIDR blocks of the private persistence subnets"
  value       = module.vpc.private_persistence_subnet_cidr_blocks
}

output "public_subnet_ids" {
  description = "The IDs of the public subnets"
  value       = module.vpc.public_subnet_ids
}

output "private_app_subnet_ids" {
  description = "The IDs of the private app subnets"
  value       = module.vpc.private_app_subnet_ids
}

output "private_persistence_subnet_ids" {
  description = "The IDs of the private persistence subnets"
  value       = module.vpc.private_persistence_subnet_ids
}

Manual tests for Terraform code

Now that the code is written, you may want to test it manually. We recommend testing in a sandbox environment where you can deploy infrastructure without affecting any other environments (especially production!). For example, if you’re using AWS, this should be a separate AWS account.

The easiest way to test is to create a testing/terraform.tfvars file:

infrastructure-modules
  └ networking
    └ vpc-app
      └ main.tf
      └ outputs.tf
      └ variables.tf
      └ testing
        └ terraform.tfvars

Inside this file, you can set all the variables for your module to test-friendly values:

infrastructure-modules/networking/vpc-app/testing/terraform.tfvars
aws_region       = "us-east-2"
aws_account_id   = "555566667777"
vpc_name         = "example-vpc"
cidr_block       = "10.0.0.0/16"
num_nat_gateways = 1

You should also add a testing/backend.hcl file:

infrastructure-modules
  └ networking
    └ vpc-app
      └ main.tf
      └ outputs.tf
      └ variables.tf
      └ testing
        └ terraform.tfvars
        └ backend.hcl

In this file, you can configure test-friendly settings for your backend. For example, if you’re using the S3 backend, you can specify:

infrastructure-modules/networking/vpc-app/testing/backend.hcl
bucket = "<YOUR-BUCKET-FOR-TESTING>"
key    = "manual-testing/<YOUR-NAME>/terraform.tfstate"
region = "us-east-2"

You can now test manually by authenticating to your sandbox environment (see A Comprehensive Guide to Authenticating to AWS on the Command Line) and running:

cd infrastructure-modules/networking/vpc-app/testing
terraform init -backend-config=backend.hcl ../
terraform apply ../

When you’re done testing, clean up by running:

terraform destroy ../

Automated tests for Terraform code

You may also want to create automated tests for your module. Automated tests for infrastructure code will spin up and tear down a lot of infrastructure, so we recommend a separate testing environment (e.g. yet another AWS account) for running automated tests—separate even from the sandboxes you use for manual testing. You can run a tool like cloud-nuke on a schedule to periodically clean up left-over resources in your testing environment (e.g., delete all resources that are older than 24h).

The only way to build confidence that your infrastructure code works as you expect is to deploy it into a real AWS account. That means you’ll primarily be writing integration tests that:

  1. Run terraform apply to deploy your module

  2. Perform a bunch of validations that the deployed infrastructure works as expected

  3. Run terraform destroy at the end to clean up

In short, you’re automating the steps you took to manually test your module!

You can make it easier to write tests of this format by leveraging Terratest, an open source Go library that contains helpers for testing many types of infrastructure code, including Terraform, Packer, and Docker.

You can define tests for your vpc-app module in a vpc_app_test.go file in a test folder:

infrastructure-modules
  └ networking
    └ vpc-app
      └ main.tf
      └ outputs.tf
      └ variables.tf
      └ testing
        └ terraform.tfvars
        └ backend.hcl
  └ test
    └ vpc_app_test.go

Check out the Terratest install instructions for how to configure your environment for Go and install Terratest.

Next, write some test code in vpc_app_test.go that looks like this:

infrastructure-modules/test/vpc_app_test.go
package test

import (
	"fmt"
	"testing"

	"github.com/gruntwork-io/terratest/modules/random"
	"github.com/gruntwork-io/terratest/modules/terraform"
)

func TestVpcApp(t *testing.T) {
	// Run this test in parallel with all the others
	t.Parallel()

	// Unique ID to namespace resources
	uniqueId := random.UniqueId()
	// Generate a unique name for each VPC so tests running in parallel don't clash
	vpcName := fmt.Sprintf("test-vpc-%s", uniqueId)
	// Generate a unique key in the S3 bucket for the Terraform state
	backendS3Key := fmt.Sprintf("vpc-app-test/%s/terraform.tfstate", uniqueId)

	terraformOptions := &terraform.Options {
		// Where the Terraform code is located
		TerraformDir: "../networking/vpc-app",

		// Variables to pass to the Terraform code
		Vars: map[string]interface{}{
			"aws_region":       "us-east-2",
			"aws_account_id":   "111122223333", // ID of testing account
			"vpc_name":         vpcName,
			"cidr_block":       "10.0.0.0/16",
			"num_nat_gateways": 1,
		},

		// Backend configuration to pass to the Terraform code
		BackendConfig: map[string]interface{}{
			"bucket":   "<YOUR-S3-BUCKET>", // bucket in testing account
			"region":   "us-east-2", // region of bucket in testing account
			"key":      backendS3Key,
		},
	}

	// Run 'terraform destroy' at the end of the test to clean up
	defer terraform.Destroy(t, terraformOptions)

	// Run 'terraform init' and 'terraform apply' to deploy the module
	terraform.InitAndApply(t, terraformOptions)
}

The test code above implements a minimal test that does the following:

Configure variables

This is similar to the testing/terraform.tfvars used in manual testing.

Configure the backend

This is similar to the testing/backend.hcl used in manual testing.

Namespace resources

The code uses random.UniqueId() to generate unique identifiers for all the resources in this test. This allows multiple tests to run in parallel (e.g., on your computer, your teammates' computers, CI servers) without running into conflicts (e.g., without conflicts over resources that require unique names, such as VPCs).

Defer terraform destroy

The test code uses defer to schedule terraform.Destroy to run at the end of the test, whether or not the test passes.

terraform init and apply

The test runs terraform init and terraform apply on the module. If this hits any errors, the test will fail.

This is a minimal test that just makes sure your module can deploy and undeploy successfully. This is a great start, and will catch a surprising number of bugs, but for production-grade code, you’ll probably want more validation logic. Check out the real module-vpc tests to see how we validate VPCs by, for example, launching EC2 instances in various subnets and making sure that connections between some subnets work, and others are blocked, based on the networking settings in that VPC.

To run the test, authenticate to your testing environment and do the following:

cd infrastructure-modules/test
go test -v -timeout 45m

Note the use of the -timeout 45m argument with go test. By default, Go imposes a time limit of 10 minutes for tests, after which it forcibly kills the test run, causing the tests to not only fail, but even preventing the cleanup code (i.e., terraform destroy) from running. This VPC test should take closer to ten minutes, but whenever running a Go test that deploys real infrastructure, it’s safer to set an extra long timeout to avoid the test being killed part way through and leaving all sorts of infrastructure still running.

For a lot more information on writing automated tests for Terraform code, see:

  1. Terratest documentation, especially the many examples and corresponding tests in the examples and test folders, respectively, and the testing best practices section.

  2. Terraform: Up & Running, 2nd edition, has an entire chapter dedicated to automated testing for Terraform code, including unit tests, integration tests, end-to-end tests, dependency injection, running tests in parallel, test stages, and more.

Deploying Terraform code

Now that your module has been thoroughly tested, you can deploy it to your real environments (e.g., staging and production). There are many ways to deploy Terraform modules, so in this guide, we’ll focus on just the following ones:

Deploy using plain Terraform

One option is to deploy all of your environments using plain-old-Terraform. The approach is nearly identical to the way you did manual testing; let’s walk through it for the staging environment.

First, create a staging/terraform.tfvars file:

infrastructure-modules
  └ networking
    └ vpc-app
      └ main.tf
      └ outputs.tf
      └ variables.tf
      └ testing
        └ terraform.tfvars
        └ backend.hcl
      └ staging
        └ terraform.tfvars
  └ test
    └ vpc_app_test.go

Inside the file, set the variables for this module to values appropriate for this environment:

infrastructure-modules/networking/vpc-app/staging/terraform.tfvars
aws_region       = "us-east-2"
aws_account_id   = "888888888888"
vpc_name         = "staging-vpc"
cidr_block       = "10.10.0.0/16"
num_nat_gateways = 1

Next, create a staging/backend.hcl file:

infrastructure-modules
  └ networking
    └ vpc-app
      └ main.tf
      └ outputs.tf
      └ variables.tf
      └ testing
        └ terraform.tfvars
        └ backend.hcl
      └ staging
        └ terraform.tfvars
        └ backend.hcl
  └ test
    └ vpc_app_test.go

Inside this file, configure the backend for staging:

infrastructure-modules/networking/vpc-app/staging/backend.hcl
bucket         = "<YOUR-BUCKET-FOR-STAGING>"
key            = "networking/vpc-app/terraform.tfstate"
region         = "us-east-2"
encrypt        = true
dynamodb_table = "<DYNAMODB-TABLE-FOR-STAGING>"

And now you can deploy to the staging environment as follows:

cd infrastructure-modules/networking/vpc-app/staging
terraform init -backend-config=backend.hcl ../
terraform apply ../

To deploy to other environments, create analogous .tfvars and .hcl files (e.g., production/terraform.tfvars and production/backend.hcl) and run terraform init and terraform apply with those files.
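For example, a production/terraform.tfvars might look like the following (the account ID and CIDR block are placeholders; use values appropriate for your production environment):

infrastructure-modules/networking/vpc-app/production/terraform.tfvars
aws_region       = "us-east-2"
aws_account_id   = "999999999999"
vpc_name         = "prod-vpc"
cidr_block       = "10.20.0.0/16"
num_nat_gateways = 3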

Benefits of this approach

  • No external tooling required.

  • Analogous to how you run manual and automated tests.

  • Quick feedback cycle.

  • Completely free and open source.

Drawbacks to this approach

  • You’re always deploying "latest" from a branch. No versioning or easy rollback.

  • Lots of command-line arguments to pass. Easy to make mistakes. Most teams end up creating hacky wrapper scripts.

  • Lots of backend configuration to copy for each module. Manually setting a unique key for each module is repetitive and error prone.

  • From a quick glance at the code, it’s not clear what accounts, environments, or regions you deploy to. Figuring this out requires digging through many folders.

Deploy using Terragrunt

Another option is to use Terragrunt, an open source wrapper for Terraform that helps alleviate some of the drawbacks mentioned in the previous approach.

The first step with Terragrunt is to version your code. You can do this by creating Git tags in infrastructure-modules:

cd infrastructure-modules
git tag -a "v0.0.1" -m "Created vpc-app module"
git push --follow-tags

This will allow you to deploy different versions of your module in different environments (e.g., v0.0.1 in prod and v0.0.2 in stage) and rollback to previous versions if necessary. With Terragrunt, we recommend defining your live environments in a separate repo called infrastructure-live that uses a folder structure with the following format:

infrastructure-live
  └ <account>
    └ terragrunt.hcl
    └ _global
    └ <region>
      └ _global
      └ <environment>
        └ <resource>
          └ terragrunt.hcl

Where:

<account>

At the top level, you have accounts (e.g., an AWS account).

<region>

Within each account, there will be one or more regions (e.g., in AWS, us-east-1, eu-west-1, etc). There may also be a _global folder that defines resources that are available across all the regions in this account, such as IAM users and DNS settings. Each account also has a root terragrunt.hcl file that defines common Terraform settings that apply to the entire account, such as what backend to use to store Terraform state.

<environment>

Within each region, there will be one or more environments, such as dev, stage, prod, mgmt, etc. There may also be a _global folder that defines resources that are available across all the environments in this region.

<resource>

Within each environment, you use Terraform modules to deploy one or more resources, such as servers, databases, load balancers, and so on. Each module is configured via a terragrunt.hcl file.

For example, if you were using AWS, with separate accounts for staging and production (see How to Configure a Production Grade AWS Account Structure), and you wanted to deploy the vpc-app module in the us-east-2 region in each of these accounts, the folder structure would look like this:

infrastructure-live
  └ staging
    └ terragrunt.hcl
    └ us-east-2
      └ stage
         └ networking
           └ vpc-app
             └ terragrunt.hcl
  └ production
    └ terragrunt.hcl
    └ us-east-2
      └ prod
        └ networking
          └ vpc-app
            └ terragrunt.hcl
Note
Environments within accounts

In the folder structure above, it may seem redundant to have an environment, such as stage, within an account, such as staging. However, in real-world usage, it’s common to have multiple environments within a single account. For example, within the staging account, you may have not just a stage environment, which contains the VPC and some applications, but also a mgmt environment, which contains a separate VPC for running DevOps tooling (e.g., an OpenVPN server).

The terragrunt.hcl in the root of each account defines the backend settings for that account (including special helpers to automatically set the key value). Here’s an example of what staging/terragrunt.hcl might look like:

infrastructure-live/staging/terragrunt.hcl
remote_state {
  backend = "s3"
  config = {
    # Set defaults for all the backend settings for this environment
    bucket         = "<YOUR-BUCKET-FOR-STAGING>"
    region         = "us-east-2"
    encrypt        = true
    dynamodb_table = "<DYNAMODB-TABLE-FOR-STAGING>"

    # Automatically set the key parameter to the relative path between this root terragrunt.hcl file and the child
    # terragrunt.hcl file (e.g., for vpc-app, it'll end up us-east-2/stage/networking/vpc-app/terraform.tfstate).
    key = "${path_relative_to_include()}/terraform.tfstate"
  }
}

The terragrunt.hcl for each child module within an account specifies what module to deploy—including the version to use—and sets the variables to values appropriate for that environment. Here’s an example of what staging/us-east-2/stage/vpc-app/terragrunt.hcl might look like:

infrastructure-live/staging/us-east-2/stage/vpc-app/terragrunt.hcl
# Deploy the vpc-app module at a specific version (via the ref=xxx param)
terraform {
  source = "git@github.com:<ORG>/infrastructure-modules.git//networking/vpc-app?ref=v0.0.1"
}

# Set the variables for the vpc-app module in this environment
inputs = {
  aws_region       = "us-east-2"
  aws_account_id   = "888888888888"
  vpc_name         = "staging-vpc"
  cidr_block       = "10.10.0.0/16"
  num_nat_gateways = 1
}

# Automatically include settings from the root terragrunt.hcl in this account
include {
  path = find_in_parent_folders()
}

To deploy vpc-app in staging, you do the following:

cd infrastructure-live/staging/us-east-2/stage/networking/vpc-app
terragrunt apply

When you run this command, Terragrunt will:

  1. Check out the infrastructure-modules repo at version v0.0.1 into a scratch directory.

  2. Run terraform init in the scratch directory, configuring the backend to the values in the root terragrunt.hcl.

  3. Run terraform apply in the scratch directory, configuring the variables to the values in the inputs = { …​ } block.

You can deploy the production environment by creating an analogous infrastructure-live/production/us-east-2/prod/networking/vpc-app/terragrunt.hcl file and running terragrunt apply in infrastructure-live/production/us-east-2/prod/networking/vpc-app/. If you have multiple modules and you want to deploy all of them, you can use terragrunt apply-all. For example, to deploy everything in the production account, you would do the following:

cd infrastructure-live/production
terragrunt apply-all
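
For reference, here’s what the analogous production configuration mentioned above might look like. This is a sketch only: the input values below are illustrative, so use whatever is appropriate for your production environment.

infrastructure-live/production/us-east-2/prod/networking/vpc-app/terragrunt.hcl
# Deploy the vpc-app module at a specific version; each environment can pin its own version
terraform {
  source = "git@github.com:<ORG>/infrastructure-modules.git//networking/vpc-app?ref=v0.0.1"
}

# Set the variables for the vpc-app module in this environment (values below are illustrative)
inputs = {
  aws_region       = "us-east-2"
  aws_account_id   = "<YOUR-PROD-ACCOUNT-ID>"
  vpc_name         = "prod-vpc"
  cidr_block       = "10.20.0.0/16"
  num_nat_gateways = 3
}

# Automatically include settings from the root terragrunt.hcl in this account
include {
  path = find_in_parent_folders()
}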

Benefits of this approach

  • All your code is versioned. You can deploy different versions in different environments, and roll back to older versions if necessary.

  • Backend configuration is DRY. No more copy/pasting values, no more manually setting key for each module.

  • CLI arguments are DRY. No need for long commands or extra wrapper scripts.

  • The full structure of all accounts, environments, and regions is visible at a glance in the infrastructure-live repo.

  • Terragrunt is completely free and open source.

  • Deploy multiple modules using apply-all.

Drawbacks to this approach

  • You have to install, learn, and manage a new tool / abstraction layer.

  • The scratch directory can make debugging/troubleshooting tricky.

Updating

Now that you have your Terraform module deployed, you can pull in updates as follows:

  1. Subscribe to the monthly Gruntwork Newsletter to be notified of all updates to the Gruntwork Infrastructure as Code Library. Alternatively, you can "watch" repos in GitHub that you’re interested in.

  2. When you find an update you’d like for a specific module, update any code using that module in infrastructure-modules to the new version number. For example, if you were using module-vpc at v0.7.2 and you wanted to update to v0.7.3, you would change from:

    module "vpc" {
      source = "git@github.com:gruntwork-io/module-vpc.git//modules/vpc-app?ref=v0.7.2"
      # ...
    }

    to:

    module "vpc" {
      source = "git@github.com:gruntwork-io/module-vpc.git//modules/vpc-app?ref=v0.7.3"
      # ...
    }
  3. Pay close attention to the release notes for any additional instructions. In particular, if the MINOR version number was increased (e.g., v0.6.0 → v0.7.0), that implies a backwards incompatible change, and the release notes will explain what you need to do (e.g., you might have to add, remove, or change arguments you pass to the module).

  4. Test your changes locally. You do this using the same process outlined in Manual tests for Terraform code and Automated tests for Terraform code.

  5. Deploy your changes to each environment. You do this using the same process outlined in Deploying Terraform code.

Integrating with Terraform Cloud and Enterprise

This section will cover how to use Gruntwork in conjunction with two popular HashiCorp products: Terraform Cloud (TFC) and Terraform Enterprise (TFE). Although the open source edition of Terraform is quite powerful and flexible as a standalone project, many organizations turn to TFC/TFE for the CLI/UI integration, approval-based workflow capabilities, Sentinel policy framework, and more. At its core, Terraform Enterprise is Terraform Cloud repackaged for a self-hosted environment. We’ll use "TFC" as shorthand for both Terraform Cloud and Terraform Enterprise throughout this guide.

In the sections above, we discussed how to use Gruntwork with two VCS repositories: infrastructure-modules, containing your Terraform code that wraps the modules from Gruntwork Library, and infrastructure-live, containing Terragrunt configurations that enable you to manage Terraform easily across multiple accounts and environments. When using Gruntwork with TFC, you have two choices regarding these repositories:

Use TFC without Terragrunt

Using this approach, Terraform modules are still defined in an infrastructure-modules repository as discussed above. Each module has a dedicated TFC workspace. You create the workspace in advance, and then you can run the Terraform modules either from the terraform CLI or from the TFC UI. In essence, TFC replaces Terragrunt and infrastructure-live. You’ll be able to use the full TFC feature set, but you won’t be able to take advantage of the benefits provided by Terragrunt, such as keeping your code, variables, and CLI args DRY, and support for applying changes across multiple modules and environments.

Use TFC with Terragrunt

Alternatively, you can use both infrastructure-modules and infrastructure-live repositories as described above, storing the wrapper modules in infrastructure-modules, and using infrastructure-live and Terragrunt for deployments. In this approach, TFC is used as a remote backend for Terraform. You use Terragrunt to run deployments from the CLI, which in turn invokes Terraform on the TFC backend. The TFC UI is used for audit and tracking capabilities, but not for executing Terraform runs.

We cover each approach below, but first, you’ll need to complete a few one time set up steps.

One time set up

Important
If you’re using TFE, you’ll need to follow HashiCorp’s Enterprise installation instructions first. Most of the details below still apply to TFE, but you should adjust any URLs, such as app.terraform.io, to use your TFE installation instead.
Create an account

The first step is to establish an account. Visit the Terraform Cloud sign up page to create a free account.

tfc sign up
Figure 7. Sign up for a Terraform Cloud account
Create an organization

Once the account is established, verify your email address; it’s safe to skip the initial setup instructions. After verifying, you’ll be presented with the option to create a new organization. We suggest using lowercase organization names, optionally with - or _ characters, but without whitespace.

tfc create organization
Figure 8. Create an organization in TFC
Set up an SSH key

When you’re using Gruntwork modules, you use the source attribute on a module block to read a module from a Gruntwork code repository. As a simple example, if you wanted to create an SQS queue using the SQS module from package-messaging, you might create something like the following:

provider "aws" {
 region = "us-east-1"
}

module "sns" {
  source = "git::git@github.com:gruntwork-io/package-messaging.git//modules/sqs?ref=v0.3.2"
  name   = "my-queue"
}

The git::git@github.com:gruntwork-io portion of the source attribute indicates that this module is accessed over SSH. Thus, TFC will need access to the Gruntwork code repositories via SSH.

To set up this access, take the following steps:

  1. First, if you don’t have one already, you’ll need a machine user that has access to Gruntwork. A machine user is an account that is only used for automation, and is not used by humans. In this case, the "machine" in question is the TFC executor. Create a new GitHub user, and send the machine user’s username and email address to support@gruntwork.io. We’ll make sure the user has access to our repositories.

  2. Next, generate an SSH key pair, and add the public key to the new GitHub machine user. GitHub has easy-to-follow instructions.

  3. Now, add the private SSH key to TFC. You’ll find the option under SSH Keys in the TFC organization settings. We called ours Gruntwork access. TFC will use this key to clone Gruntwork code repositories.

tfc ssh key
Figure 9. Configuring an SSH key for the TFC organization

With the SSH key in place, the one time set up is complete.

Using TFC without Terragrunt

This section will demonstrate how to use TFC to deploy your infrastructure, leveraging Gruntwork’s Terraform modules to do the heavy lifting. With this method, you can run Terraform from your local CLI or from the TFC UI.

Workspaces store the state associated with the infrastructure that is managed by Terraform, and that state is stored in TFC. You can use the terraform_remote_state data source to feed the outputs of one workspace in as variable inputs to another workspace. In this manner, you can link multiple workspaces together to build an end-to-end infrastructure.
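
For example, here’s a minimal sketch of how a module in one workspace could read the outputs of another workspace via the terraform_remote_state data source. The organization placeholder, workspace name, and output name shown here are hypothetical:

# Read outputs from another TFC workspace (the workspace and output names are hypothetical)
data "terraform_remote_state" "vpc" {
  backend = "remote"

  config = {
    organization = "<YOUR-TFC-ORGANIZATION>"
    workspaces = {
      name = "vpc-dev-us-east-1"
    }
  }
}

# The outputs can then be passed into resources or modules in this workspace, e.g.:
# vpc_id = data.terraform_remote_state.vpc.outputs.vpc_id

Note that the workspace being read must expose the values you need as Terraform outputs, and, depending on your organization’s state access settings, you may need to explicitly allow the reading workspace to access that state.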

In the Using Terraform Modules section, we discussed the wrapper module pattern in which multiple Terraform modules are contained in a hierarchy of directories located under infrastructure-modules/modules. Using such a hierarchy, each workspace will use the same infrastructure-modules repository, but pointed at different subdirectories within the repository.

We’ll demonstrate how to set up a workspace for a simple SQS module. To get started, create a new workspace and connect it to your VCS provider.

Connect to a version control provider

Connect the workspace to the version control system of your choice, such as GitHub, GitLab, or Bitbucket. This allows TFC to access your infrastructure-modules repository. Once your VCS is connected, select your infrastructure-modules repository from the list of repositories presented.

tfc create workspace
Figure 10. Connect a workspace to a VCS
Configure the workspace settings

In this step, you’ll choose a name and configure additional settings relating to the workspace. Be sure to follow the workspace naming guidelines to ensure a consistent yet flexible naming convention. For this example, we’ll choose sqs-example-dev-us-east-1.

We also need to inform TFC where to find the correct module. Under the Advanced options, locate the Terraform Working Directory field, and enter the path to the module. TFC will automatically enter the same value for Automatic Run Triggering to ensure that Terraform runs for this workspace only occur when changes are detected to the module path. For our SQS example, we enter the value /modules/networking/sqs.

tfc workspace settings
Figure 11. Configure workspace settings
Use the SSH key

In the one time set up steps described above, you created an SSH key. You’ll now need to choose this SSH key in the workspace settings.

tfc workspace ssh
Figure 12. Choose the private SSH key
Configure credentials and variables

The Terraform AWS provider used by the SQS module will need credentials to communicate with the AWS API. You can set this up using the standard AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables in the workspace. When setting the variables, be sure to mark the Sensitive checkbox so that the variables are write only.

tfc env vars
Figure 13. Configure AWS API credentials as environment variables in the workspace


You can also enter values for any Terraform variables for the module. For our simple SQS module we don’t define any variables.

Queue and apply the configuration

With all the configuration complete, it’s time to kick off the plan and apply. You can trigger the run by making a commit to a file in the working directory that you set up when configuring workspace settings (in our case, in /modules/networking/sqs), or by manually triggering the run using the Queue plan button in the TFC UI. The run will:

  • Clone your infrastructure-modules repository using the VCS connection

  • Download the AWS provider and set credentials using the environment variables

  • Download the Gruntwork SQS module using the SSH key

  • Run a terraform plan

  • Wait for confirmation

  • Once confirmed, run a terraform apply

tfc run
Figure 14. TFC run results
Final thoughts on integrating TFC with the Gruntwork library

It’s easy to use TFC with the Gruntwork library. When using the infrastructure-modules approach outlined in this guide, all of your Terraform wrapper modules will be in one place. You can configure one workspace per module, and you can link modules together with the terraform_remote_state data source. Note that you’ll need to set up the AWS credentials and SSH key within each workspace.

You can use this pattern for any of Gruntwork’s Terraform modules. This even works for modules like terraform-aws-eks which need to download external dependencies (like kubergrunt). Those modules use the special executable-dependency module to install the external dependency within the executor (the TFC node, in this case) at run time. In the future, Gruntwork may offer modules through a private Terraform registry to remove the SSH key requirement, further simplifying the process.

Once the workspace is set up, you can trigger runs through the UI or from the terraform CLI. To use the CLI, follow the CLI-driven Run Workflow instructions.

Using TFC with Terragrunt

The TFC UI runs only Terraform commands. Terragrunt expects you to run it, and it, in turn, will run Terraform. Therefore, it currently is not possible to use the TFC UI to trigger Terragrunt. However, you can set up Terraform to run remote operations such as plan and apply. That is, you use Terragrunt to organize your code and keep it DRY, and you can configure it so that when you run terragrunt apply on your computer (or on a CI server), it runs terraform apply in TFC rather than on your local system.

You’ll still need workspaces, as described in the sections above on using TFC without Terragrunt. You can create each workspace automatically or re-use a workspace that already exists, including any variables and env vars already set in it (such as your AWS access keys), so you don’t need to have those locally. You’ll also be able to see the apply output and the history of all apply calls in the web UI, though you won’t be able to trigger runs from there.

The steps involved to set all this up include:

  • Obtain an API token to permit Terraform to use TFC

  • Generate a backend.tf file to configure Terraform to point at TFC as a remote backend

  • Set up workspaces for each module managed by Terragrunt

  • Set input variables

Before digging in, we’ll review an example Terragrunt configuration.

Setting up

For this example, we’ll once again deploy an SQS queue from the Gruntwork package-messaging repository.

Consider the following directory structure for an infrastructure-live repository:

.
├── dev
│   ├── account.hcl
│   └── us-east-1
│       ├── region.hcl
│       └── sqs
│           └── terragrunt.hcl
├── common.hcl
└── terragrunt.hcl

For simplicity, the example shows a single environment, dev, with a single module, sqs, but this pattern can be expanded to more environments and regions as needed.

Obtain an API token

Terraform needs an API token to communicate with TFC as a backend. To obtain a token, open the user settings, and click on Tokens. If you’re running Terragrunt on a CI system, this token should be associated with a machine user rather than an individual user account.

tfc token
Figure 15. Creating a TFC API token


Now you’ll add the token to your ~/.terraformrc file in a credentials block.

# For TFE, substitute the custom hostname for your TFE host
credentials "app.terraform.io" {
  token = "xxxxxxyyyyyyyyyzzzzzzzzzzzz"
}
Generating the backend

For Terragrunt to use TFC as a remote executor, you’ll need a backend block. The backend is what determines where Terraform reads state and how execution is handled. The Terragrunt generate block can generate the backend block dynamically so you don’t need to include one in each module.

Terraform workspaces store the state associated with a given backend. In the Terragrunt configuration hierarchy defined above, each module (just sqs in this case) uses a separate workspace and must have its own name in the backend block. Therefore, we’ll need to gather all the component parts of our chosen workspace naming convention to use when setting up the backend.

Following the advice from HashiCorp, we’ll need:

  • The name of the component, such as sqs

  • The name of the environment, such as dev

  • The region, such as us-east-1

The snippet below shows one technique for gathering the requisite information and generating the backend:

infrastructure-live/terragrunt.hcl
locals {
  tfc_hostname     = "app.terraform.io" # For TFE, substitute the custom hostname for your TFE host
  tfc_organization = "gruntwork-io"
  workspace        = reverse(split("/", get_terragrunt_dir()))[0] # This will find the name of the module, such as "sqs"
  account_vars     = read_terragrunt_config(find_in_parent_folders("account.hcl"))
  region_vars      = read_terragrunt_config(find_in_parent_folders("region.hcl"))
}

generate "remote_state" {
  path      = "backend.tf"
  if_exists = "overwrite_terragrunt"
  contents = <<EOF
terraform {
  backend "remote" {
    hostname = "${local.tfc_hostname}"
    organization = "${local.tfc_organization}"
    workspaces {
      name = "${local.workspace}-${local.account_vars.locals.account}-${local.region_vars.locals.region}"
    }
  }
}
EOF
}

This code shows off a few features of Terragrunt:

  • The generate block creates a file called backend.tf that will be created alongside the other *.tf files for the module that Terragrunt is operating on. This effectively adds a dynamically generated backend configuration to the module.

  • Terragrunt has a set of built-in functions, like find_in_parent_folders(), read_terragrunt_config(), and get_terragrunt_dir(). These functions are used to obtain the values needed to configure the backend.

  • Terragrunt can also use all of the native Terraform functions. In the configuration above, we’re using split() and reverse() to obtain the leaf directory name (such as sqs) that is used as the workspace suffix.

This configuration is at the root of the Terragrunt configuration tree so that it can be reused across each module, reducing code duplication. Note the use of get_terragrunt_dir() to obtain the workspace name. This function retrieves the current Terragrunt directory. If you run terragrunt apply from the sqs module, the value returned by get_terragrunt_dir() will be the fully qualified filesystem path to the sqs subdirectory. Although the function is in the root terragrunt.hcl, it runs in the context of the directory where terragrunt is invoked.

Creating a workspace and setting provider credentials

By default, if you configure the remote backend for a workspace that doesn’t yet exist, TFC will automatically create one when you run terraform init. This is known as implicit workspace creation. Once the workspace exists, your Terraform code will need API credentials to interface with the cloud provider.

There are a few ways to set these credentials:

  1. Create all the workspaces manually in advance, and set the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in each workspace, as described in Configure credentials and variables.

  2. Create the workspaces implicitly by running terragrunt init in each module, and then set up the environment variables in each workspace manually, as previously mentioned.

  3. To set this up programmatically, you can use the tfe_workspace and tfe_variable resources to configure the workspaces with Terraform.

In all cases, you’ll need to ensure that your workspaces stay in sync with your Terragrunt configuration. Each time you add a new module in Terragrunt, you’ll need a corresponding workspace. Furthermore, if you rotate your AWS API keys, you’ll need to update them within each workspace. For that reason, the final option above is recommended.
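
For example, here’s a hedged sketch of that third option using HashiCorp’s tfe provider. The organization, workspace name, and credential variables below are placeholders; adapt them to your own naming convention and secrets management:

# The tfe provider reads its API token from the Terraform CLI config
# (e.g., the credentials block you added to ~/.terraformrc earlier).
provider "tfe" {
  hostname = "app.terraform.io" # For TFE, substitute the custom hostname for your TFE host
}

# One workspace per module managed by Terragrunt, named per the convention above
resource "tfe_workspace" "sqs_dev_us_east_1" {
  name         = "sqs-dev-us-east-1"
  organization = "gruntwork-io" # Your TFC organization name
}

# AWS credentials as sensitive environment variables in the workspace
resource "tfe_variable" "aws_access_key_id" {
  key          = "AWS_ACCESS_KEY_ID"
  value        = var.aws_access_key_id
  category     = "env"
  sensitive    = true
  workspace_id = tfe_workspace.sqs_dev_us_east_1.id
}

resource "tfe_variable" "aws_secret_access_key" {
  key          = "AWS_SECRET_ACCESS_KEY"
  value        = var.aws_secret_access_key
  category     = "env"
  sensitive    = true
  workspace_id = tfe_workspace.sqs_dev_us_east_1.id
}

variable "aws_access_key_id" {
  description = "AWS access key ID to set in the workspace"
  type        = string
}

variable "aws_secret_access_key" {
  description = "AWS secret access key to set in the workspace"
  type        = string
}

Because the workspaces and variables are themselves managed as code, adding a new module or rotating your AWS keys becomes a change to this configuration rather than a manual task in the TFC UI.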

Setting variables

In typical Terragrunt usage, variables are passed to Terraform using the inputs attribute. Anything defined in the inputs will be passed to Terraform using the environment variable mechanism. However, this mechanism is not supported by TFC. Instead, you can generate a *.auto.tfvars file containing the inputs.

Important
The generated terragrunt.auto.tfvars file will be present on local disk. Depending upon your configuration, this file may contain sensitive data. Do not commit this file to version control. We recommend adding *.auto.tfvars to your .gitignore.

The following code shows how to generate a terragrunt.auto.tfvars file.

infrastructure-live/dev/us-east-1/sqs/terragrunt.hcl
terraform {
  source = "git::ssh://git@github.com/gruntwork-io/package-messaging//modules/sqs?ref=v0.3.2"
}

include {
  path = find_in_parent_folders()
}

locals {
  common_vars = read_terragrunt_config(find_in_parent_folders("common.hcl"))
}

generate "tfvars" {
  path      = "terragrunt.auto.tfvars"
  if_exists = "overwrite"
  disable_signature = true
  contents = <<-EOF
name = "${local.common_vars.locals.name}"
EOF
}

The configuration has a few sections:

  1. The terraform block at the top uses the Gruntwork sqs module from package-messaging.

  2. The include block includes the configuration from the parent directories. This is how the remote backend block from the root terragrunt.hcl is included.

  3. The locals block reads the values from common.hcl in the root of the hierarchy, making them available for local reference.

  4. Finally, the generate block creates a file called terragrunt.auto.tfvars. Like the backend.tf file, this file will be generated alongside the rest of the *.tf files that Terragrunt downloads from the sqs module, making those inputs available for TFC to read when running terraform commands in the remote executor.

Any of the inputs needed by the module must be included in the generated tfvars file. In the configuration above, only the name variable is specified. Most modules will need more configuration.

Running Terragrunt

With all the pieces in place, you can run terragrunt init to initialize the workspace, if it isn’t already there, and then terragrunt apply, and watch as Terragrunt invokes Terraform, which executes the plan and apply stages on TFC, pausing in between to allow confirmation. Note that when running a terragrunt apply-all, Terragrunt adds the -auto-approve flag to Terraform to skip interactive approval. This means that with apply-all, there will be no confirmation step.

First, we run terragrunt init. Terragrunt generates the backend configuration and the tfvars file, and then initializes the remote backend. Irrelevant details have been omitted from the output.

$ terragrunt init
[terragrunt] 2020/05/15 14:36:54 Reading Terragrunt config file at <redacted>/infrastructure-live/dev/us-east-1/sqs/terragrunt.hcl
... [snip] ...
[terragrunt] 2020/05/15 14:36:55 Generated file <redacted>/infrastructure-live/dev/us-east-1/sqs/.terragrunt-cache/m6q2Wdn6A9TxoqahZZQpvrPqzAU/owCGC2BufzNDW7KyzOE-oDrwatg/modules/sqs/backend.tf.
[terragrunt] 2020/05/15 14:36:55 Generated file <redacted>/infrastructure-live/us-east-1/sqs/.terragrunt-cache/m6q2Wdn6A9TxoqahZZQpvrPqzAU/owCGC2BufzNDW7KyzOE-oDrwatg/modules/sqs/terragrunt.auto.tfvars.
[terragrunt] 2020/05/15 14:36:55 Running command: terraform init

Initializing the backend...
Backend configuration changed!

Terraform has detected that the configuration specified for the backend
has changed. Terraform will now check for existing state in the backends.

Successfully configured the backend "remote"! Terraform will automatically
use this backend unless the backend configuration changes.

Next, we run terragrunt apply:

$ terragrunt apply
Running apply in the remote backend. Output will stream here. Pressing Ctrl-C
will cancel the remote apply if it's still pending. If the apply started it
will stop streaming the logs, but will not stop the apply running remotely.

Preparing the remote apply...

To view this run in a browser, visit:
https://app.terraform.io/app/gruntwork-io/sqs-dev-us-east-1/runs/run-esQKC9ATKPnDywN1

Waiting for the plan to start...

Terraform v0.12.24
Configuring remote state backend...
Initializing Terraform configuration...
2020/05/15 21:41:31 [DEBUG] Using modified User-Agent: Terraform/0.12.24 TFC/ba6190e398
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

------------------------------------------------------------------------

Terraform will perform the following actions:

  # aws_sqs_queue.queue will be created
  + resource "aws_sqs_queue" "queue" {
      + arn                               = (known after apply)
      + content_based_deduplication       = false
      + delay_seconds                     = 0
      + fifo_queue                        = false
      + id                                = (known after apply)
      + kms_data_key_reuse_period_seconds = 300
      + max_message_size                  = 262144
      + message_retention_seconds         = 345600
      + name                              = "example-name"
      + policy                            = (known after apply)
      + receive_wait_time_seconds         = 0
      + visibility_timeout_seconds        = 30
    }

Plan: 1 to add, 0 to change, 0 to destroy.
Do you want to perform these actions in workspace "sqs-dev-us-east-1"?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: <enter yes>

aws_sqs_queue.queue: Creating...
aws_sqs_queue.queue: Creation complete after 0s [id=https://sqs.us-east-1.amazonaws.com/0123456789012/example-name]
data.aws_iam_policy_document.limit_queue_access_by_ip_address: Refreshing state...

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

Outputs:

queue_arn = arn:aws:sqs:us-east-1:0123456789012:example-name
queue_name = example-name
queue_url = https://sqs.us-east-1.amazonaws.com/0123456789012/example-name

TFC runs a plan first, waits for confirmation, and then runs apply. The confirmation can be entered either on the command line or in the UI. Once complete, the results are visible in the TFC UI:

tfc terragrunt results
Figure 16. Viewing the Terragrunt command results in the TFC UI


Other commands, such as destroy, work in the same way.

Summarizing Terragrunt and TFC/TFE compatibility

Using the combination of features above, you can effectively use Terragrunt with TFC or TFE as a remote backend. Terragrunt can generate backend blocks and tfvars files. Commands like terragrunt apply-all will operate in the same way as they do without TFC/TFE, calling each module dependency in order, and allowing for passing outputs between modules. Workspaces must be created in advance so that you can set up credentials for access to the cloud.

Happy Terragrunting!

Using scripts and binaries

This section will show you how to use scripts and binaries from the Gruntwork Infrastructure as Code Library. As illustrative examples, we’ll deploy the ip-lockdown script and ssh-grunt binary from module-security.

Important
You must be a Gruntwork subscriber to access module-security.
ip-lockdown

ip-lockdown is a Bash script you can use to lock down specific outgoing IP addresses on a Linux server so only whitelisted OS users can access them. The main motivation is to lock down the metadata endpoint on EC2 instances so only specific users (e.g., root) can access it—and thereby access the attached IAM permissions—rather than just any user who happens to get access to the EC2 instance.

ssh-grunt

ssh-grunt is a binary (compiled from Go code) that allows you to manage SSH access to your EC2 instances using an identity provider such as AWS IAM or any SAML-based identity provider (e.g., Active Directory, Google, Okta, etc.). This allows you to grant access to specific EC2 instances by putting developers into certain groups (e.g., into specific IAM groups or AD groups) and for each developer to SSH to EC2 instances using their own username and SSH key.

Let’s walk through how to use these two tools.

Gruntwork Installer

The easiest way to install scripts and binaries from the Gruntwork Infrastructure as Code Library is to use the Gruntwork Installer. This is an open source Bash script that allows you to install scripts and binaries via a one-liner, similar to apt-get install or yum install (if you’re curious why we don’t use apt or yum directly, see Gruntwork Installer Motivation).

The first step is to install the Gruntwork Installer itself, which you can do as follows (make sure to replace <VERSION> below with the latest version from the Gruntwork Installer releases page):

curl -LsS https://raw.githubusercontent.com/gruntwork-io/gruntwork-installer/master/bootstrap-gruntwork-installer.sh | bash /dev/stdin --version <VERSION>

Now you can install any script XXX at version YYY from repo ZZZ of the Gruntwork Infrastructure as Code Library as follows:

gruntwork install --module-name XXX --tag YYY --repo ZZZ

And you can install any binary XXX at version YYY from repo ZZZ of the Gruntwork Infrastructure as Code Library as follows:

gruntwork install --binary-name XXX --tag YYY --repo ZZZ

Note that if repo ZZZ is a private GitHub repo (which module-security is), you’ll need to authenticate by setting your GitHub personal access token as the environment variable GITHUB_OAUTH_TOKEN:

export GITHUB_OAUTH_TOKEN=(YOUR_TOKEN)

Once you’ve set this token, to install the ip-lockdown script and ssh-grunt binary, you run the following (make sure to replace <VERSION> with the latest version from the module-security releases page):

gruntwork install \
  --module-name ip-lockdown \
  --tag <VERSION> \
  --repo https://github.com/gruntwork-io/module-security

gruntwork install \
  --binary-name ssh-grunt \
  --tag <VERSION> \
  --repo https://github.com/gruntwork-io/module-security

Now that you know how to use Gruntwork Installer, let’s talk about where to use it. Your goal is to install the scripts and binaries onto your servers (e.g., EC2 instances, compute instances). To do this, we recommend following Immutable infrastructure practices, and creating versioned, immutable images using Packer and/or Docker, as described in the next two sections.

Building images with Packer

If you want to build machine images with scripts and binaries from the Gruntwork Infrastructure as Code Library, such as an Amazon Machine Image (AMI) for use in AWS, you can use Packer.

Packer basics

Here’s an example of how to use gruntwork-install in a Packer template:

packer-example.json
{
  "variables": {
    "github_auth_token": "{{env `GITHUB_OAUTH_TOKEN`}}"
  },
  "builders": [{
    "type": "amazon-ebs",
    "ami_name": "packer-example-{{timestamp}}",
    "region": "us-east-2",
    "source_ami_filter": {
      "filters": {
        "virtualization-type": "hvm",
        "architecture": "x86_64",
        "name": "ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-*",
        "block-device-mapping.volume-type": "gp2",
        "root-device-type": "ebs"
      },
      "owners": [
        "099720109477"
      ],
      "most_recent": true
    },
    "instance_type": "t2.micro",
    "ssh_username": "ubuntu"
  }],
  "provisioners": [{
    "type": "shell",
    "inline": [
      "curl -LsS https://raw.githubusercontent.com/gruntwork-io/gruntwork-installer/master/bootstrap-gruntwork-installer.sh | bash /dev/stdin --version v0.0.22",
      "gruntwork install --module-name ip-lockdown --tag v0.18.4 --repo https://github.com/gruntwork-io/module-security",
      "gruntwork install --binary-name ssh-grunt --tag v0.18.4 --repo https://github.com/gruntwork-io/module-security"
    ],
    "environment_vars": [
      "GITHUB_OAUTH_TOKEN={{user `github_auth_token`}}"
    ]
  }]
}

A few things to notice about this Packer template:

  • The type of the builder is amazon-ebs, so this Packer template can be used to build an AMI.

  • The source_ami_filter is configured to find the latest Ubuntu 18.04 AMI to use as the base for this image.

  • The GitHub personal access token is read from the environment variable GITHUB_OAUTH_TOKEN (it is NOT hard-coded into the Packer template itself, as you should never store secrets in plain text) and passed to the shell provisioner.

  • The shell provisioner installs the Gruntwork Installer and then uses it to install ip-lockdown and ssh-grunt at specific versions.

Manually testing a Packer template

To build a machine image from a Packer template, you authenticate to a sandbox environment for testing and run:

export GITHUB_OAUTH_TOKEN=(YOUR_TOKEN)
packer build packer-example.json

At the end of the build, Packer will output the ID of your new image (e.g., the new AMI ID). You can then test the image works as you expect by deploying it into a sandbox environment. You can deploy manually or via an automated process, as described in Deploying Packer images.

Automated tests for a Packer template

Just as with Automated tests for Terraform code, you can use Terratest to create automated tests for your Packer template. These are typically integration tests that:

  1. Build an artifact using Packer.

  2. Deploy the artifact on top of real infrastructure (e.g., deploy the artifact onto real EC2 instances in AWS). Often, this is done by using Terraform code, passing in the artifact ID as an input variable.

  3. Validate infrastructure works as expected.

  4. Undeploy the infrastructure.

  5. Delete the artifact.

Take a look at terraform_packer_example_test.go for an example of an automated test that implements this exact pattern. Check out the Automated tests for Terraform code section for instructions on how to write and run tests written with Terratest.

Using Docker for faster testing

Building machine images in the cloud (i.e., in AWS) can take a while, as you have to fire up a virtual server, run your provisioners, snapshot the server, shut it down, etc. This can add a few minutes of overhead to each Packer build. One way to speed it up for local testing is to add a Docker builder to your Packer template that uses the same base OS image. This will allow you to run the build completely locally and run the resulting image locally too (see Manually testing a Docker image), both of which can dramatically improve iteration speed. Once the Docker image is working fully, you can then build the machine image you want, with most of the errors in your template already resolved.

See Iterating locally using Docker for more details.

Deploying Packer images

Now that your Packer template has been thoroughly tested, you can deploy machine images to your real environments (e.g., staging and production). There are many ways to deploy machine images. We recommend using Terraform modules, as explained in Using Terraform Modules. For example, you can use the following modules, which allow you to specify the machine image (i.e., AMI) to deploy via an input variable (e.g., ami_id):

  • AWS Auto Scaling Group: use the asg-rolling-deploy module for stateless apps that need auto scaling, auto healing, and zero-downtime rolling deployment; use the server-group module for stateful apps that need auto healing, zero-downtime rolling deployment, and persistent EBS volumes and ENIs. Both of these modules are part of module-asg.

    Important
    You must be a Gruntwork subscriber to access module-asg.
  • Single EC2 Instance: use the single-server module to run a single EC2 instance that can attach persistent EBS volumes and ENIs. This module is part of module-server.

    Important
    You must be a Gruntwork subscriber to access module-server.

    Note that a single server is a single point of failure, so we generally recommend using one of the Auto Scaling Group modules instead, even just for one server, so you get auto healing and zero-downtime deployment.
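
Whichever module you choose, the pattern is the same: the image ID that packer build prints at the end flows into Terraform as an input variable. Here’s a hedged sketch of that pattern; the module source path and variable names (ami_id, instance_type) are hypothetical, so check the inputs of the specific module you use:

# A hypothetical wrapper module call illustrating how the Packer-built AMI is passed in
variable "ami_id" {
  description = "The ID of the AMI built by Packer (output at the end of packer build)"
  type        = string
}

module "app" {
  source = "git@github.com:<ORG>/infrastructure-modules.git//services/app?ref=v0.0.1"

  ami_id        = var.ami_id
  instance_type = "t3.micro"
}

Because the AMI ID is just an input variable, rolling out a new image is a matter of building it with Packer, updating the variable value, and re-running your deployment.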

Updating Packer templates

Now that you have your Packer images deployed, you can pull in updates as follows:

  1. Subscribe to the monthly Gruntwork Newsletter to be notified of all updates to the Gruntwork Infrastructure as Code Library. Alternatively, you can "watch" repos in GitHub that you’re interested in.

  2. When you find an update you’d like for a specific module, update any code using that module in your Packer templates to the new version number. For example, if you were using module-security at v0.18.3 and you wanted to update to v0.18.4, you would change from:

    "gruntwork install --module-name ip-lockdown --tag v0.18.3 --repo https://github.com/gruntwork-io/module-security"

    to:

    "gruntwork install --module-name ip-lockdown --tag v0.18.4 --repo https://github.com/gruntwork-io/module-security"
  3. Pay close attention to the release notes for any additional instructions. In particular, if the MINOR version number was increased (e.g., v0.17.0 → v0.18.0), that implies a backwards incompatible change, and the release notes will explain what you need to do (e.g., you might have to add, remove, or change arguments you pass to the module).

  4. Test your changes locally. You do this using the same process outlined in Manually testing a Packer template and Automated tests for a Packer template.

  5. Deploy your changes to each environment. You do this using the same process outlined in Deploying Packer images.

Building images with Docker

If you want to build Docker images with scripts and binaries from the Gruntwork Infrastructure as Code Library, you can create a Dockerfile.

Dockerfile basics

Here’s an example of how to use gruntwork-install in a Dockerfile:

Dockerfile
# This is an intermediate stage. We can safely pass secrets into it, as this stage will be thrown away.
FROM ubuntu:18.04 as intermediate

# You must set a GitHub personal access token as a build arg. This will be used to access the private gruntwork-io
# GitHub repos
ARG GITHUB_OAUTH_TOKEN
RUN if [ -z "$GITHUB_OAUTH_TOKEN" ]; then echo "ERROR: You must set GITHUB_OAUTH_TOKEN as a Docker build arg."; exit 1; fi

# The ubuntu base image does not include curl, so install it before bootstrapping the Gruntwork Installer
RUN apt-get update && apt-get install -y curl && \
    curl -LsS https://raw.githubusercontent.com/gruntwork-io/gruntwork-installer/master/bootstrap-gruntwork-installer.sh | bash /dev/stdin --version v0.0.22 && \
    gruntwork install --module-name ip-lockdown --tag v0.18.4 --repo https://github.com/gruntwork-io/module-security && \
    gruntwork install --binary-name ssh-grunt --tag v0.18.4 --repo https://github.com/gruntwork-io/module-security

# This is the real Docker image that will be created in the end. It just carefully copies code from the intermediate.
# Note that the secrets from the intermediate will NOT be copied and published.
FROM ubuntu:18.04

# Copy just what we need from the intermediate image
COPY --from=intermediate /usr/local/bin/ip-lockdown /usr/local/bin/ip-lockdown
COPY --from=intermediate /usr/local/bin/ssh-grunt /usr/local/bin/ssh-grunt

A few things to notice about this Dockerfile:

  • We are using a multi-stage build. This ensures that the GitHub personal access token, which is passed in as a build argument, does not get stored in the final Docker image.

  • The GitHub personal access token is passed in as a Docker build arg to the first stage. This stage is therefore able to run gruntwork-install against private repos.

  • The second stage copies just the installed scripts/binaries from the first stage. Note that none of the build arguments get copied, so they will NOT end up in the final Docker image.

Building a Docker image

To build a Docker image from a Dockerfile, you run:

docker build --build-arg GITHUB_OAUTH_TOKEN=<YOUR_TOKEN> -t <IMAGE_NAME>:<IMAGE_TAG> .

where:

YOUR_TOKEN

Your GitHub personal access token.

IMAGE_NAME

The name to use for the image. When using Docker Hub as a Docker Registry (more on registries below), this is typically of the format <ORG>/<NAME>, where ORG is your organization name and NAME is the name for this image (e.g., gruntwork-io/example-image). When using ECR as a Docker Registry, this will be a URL of the form <ACCOUNT_ID>.dkr.ecr.<REGION>.amazonaws.com/<NAME>, where ACCOUNT_ID is your AWS account ID, REGION is the AWS region where the ECR repo lives, and NAME is the name for this image (e.g., 111122223333.dkr.ecr.us-east-2.amazonaws.com/example-image).

IMAGE_TAG

An optional tag to add to the image. This is often used for version numbers (e.g., v1).

Manually testing a Docker image

To manually test a Docker image, you run:

docker run <IMAGE_NAME>:<IMAGE_TAG>

Here, IMAGE_NAME and IMAGE_TAG are the name and tag for the Docker image you built earlier, as defined in Building a Docker image. This will run the default command (CMD) defined in the Dockerfile. Sometimes, you may want to override this command. For example, to fire up a Bash prompt in your Docker image, you can typically run the following:

docker run -it <IMAGE_NAME>:<IMAGE_TAG> bash

Note the -it flags to enable an interactive TTY so you can enter commands at the bash prompt.

If your Docker image needs to listen on a port, you can map this port to a port number on your host OS using the -p flag:

docker run -p 80:8080 <IMAGE_NAME>:<IMAGE_TAG>

# In another terminal
curl localhost:8080

If you find yourself running a lot of Docker commands and having to remember many command-line arguments and environment variables for each one, look into Docker Compose, which lets you define one or more Docker containers, along with all their parameters, in a YAML file and run everything with a single docker-compose up.

Automated tests for Docker images

Just as with Automated tests for Terraform code, you can use Terratest to create automated tests for your Docker images. These are typically integration tests that:

  1. Build a Docker image.

  2. Deploy the Docker image, either by running docker run, docker-compose up, or deploying it via a Docker orchestration tool such as Kubernetes (more on this in Deploying Docker images).

  3. Validate the Docker image works as expected.

  4. Undeploy the Docker image.

Take a look at kubernetes_basic_example_service_check_test.go for an example of an automated test that implements this exact pattern. Check out the Automated tests for Terraform code section for instructions on how to write and run tests written with Terratest.

Publishing Docker images to a registry

To be able to use your Docker images beyond a single computer, you can publish them to a Docker Registry. The most popular options include Docker Hub and Amazon ECR, both described above in Building a Docker image.

The general workflow is:

  1. Log in to your Docker Registry: docker login. Check your Registry’s documentation for authentication details.

  2. Build your Docker image: docker build -t <IMAGE_NAME>:<IMAGE_TAG> .

  3. Push your Docker image to the Registry: docker push <IMAGE_NAME>:<IMAGE_TAG>.

Deploying Docker images

Now that your Docker image has been thoroughly tested and pushed to a Docker Registry, you can deploy it to your real environments (e.g., staging and production). There are many ways to deploy Docker images, so in this guide, we’ll highlight just the following ones:

Kubernetes

To deploy your Docker image to Kubernetes, you’ll first need to set up a Kubernetes cluster. For AWS, we recommend using terraform-aws-eks to deploy an EKS cluster.

Important
You must be a Gruntwork subscriber to access terraform-aws-eks.

Once you have a working cluster, there are a number of ways to deploy Docker images. The basic version is to run kubectl apply and pass it a YAML manifest. However, most teams find this too verbose and repetitive, so a more DRY option is to use something like Helm. To run Helm, we recommend using terraform-kubernetes-helm and helm-kubernetes-services.

ECS

To deploy your Docker image to ECS, you’ll first need to set up an ECS cluster, which you can do using the ecs-cluster module in module-ecs.

Important
You must be a Gruntwork subscriber to access module-ecs.

Once you have a working cluster, you can deploy your Docker image using one of the ecs-service-xxx modules in module-ecs, such as ecs-service-with-alb or ecs-daemon-service.

Fargate

To deploy a Docker image using Fargate, you’ll first need to set up a Fargate cluster. You can do this by directly using the aws_ecs_cluster Terraform resource, as Fargate will manage all the worker nodes for you. Once you’ve created the cluster, you can deploy your image using the ecs-fargate module in module-ecs.

Important
You must be a Gruntwork subscriber to access module-ecs.
Updating Docker images

Now that you have your Docker images deployed, you can pull in updates as follows:

  1. Subscribe to the monthly Gruntwork Newsletter to be notified of all updates to the Gruntwork Infrastructure as Code Library. Alternatively, you can "watch" repos in GitHub that you’re interested in.

  2. When you find an update you’d like for a specific module, update any code using that module in your Dockerfiles to the new version number. For example, if you were using module-security at v0.18.3 and you wanted to update to v0.18.4, you would change from:

    RUN gruntwork install --module-name ip-lockdown --tag v0.18.3 --repo https://github.com/gruntwork-io/module-security

    to:

    RUN gruntwork install --module-name ip-lockdown --tag v0.18.4 --repo https://github.com/gruntwork-io/module-security
  3. Pay close attention to the release notes for any additional instructions. In particular, if the MINOR version number was increased (e.g., v0.17.0 → v0.18.0), that implies a backwards incompatible change, and the release notes will explain what you need to do (e.g., you might have to add, remove, or change arguments you pass to the module).

  4. Test your changes locally. You do this using the same process outlined in Manually testing a Docker image and Automated tests for Docker images.

  5. Deploy your changes to each environment. You do this using the same process outlined in Deploying Docker images.

Gruntwork Installer alternatives

The Gruntwork Installer may not be a good fit for all companies, either because it’s a Bash script, so it won’t work on Windows, or perhaps due to security concerns (see Gruntwork Installer Security). If that’s the case, your best alternative is to copy the script (from the modules folder of the repo) or binary (from the releases page of the repo) you’re interested in directly to your own repositories, so it’s a local file for your builds. This will allow you to use, for example, the file provisioner in Packer to copy it into your machine images, or the COPY command in Docker to copy it into your Docker images.

Note that the major drawback of copying these files is that you’ll have to manually pull in new versions any time the Gruntwork Infrastructure as Code Library is updated.

Packer and Docker alternatives

Although this guide outlines how to use Docker and Packer, you can also use the scripts and binaries in the Gruntwork Infrastructure as Code Library and the Gruntwork Installer with any other configuration management strategy, such as Chef, Ansible, Puppet, or Salt. Check the docs for your chosen configuration management tool on how to copy and execute scripts and binaries.

Using Helm Charts

This section will show you how to use Helm charts from the Gruntwork Infrastructure as Code Library. As an illustrative example, we’ll deploy the official Nginx Docker container using the k8s-service Helm chart from helm-kubernetes-services.

You can use this Helm chart to deploy any web or worker service Docker container onto Kubernetes.

Add Gruntwork Helm chart repository

The Helm charts in the Gruntwork Infrastructure as Code Library are distributed using a Helm Chart Repository. In order to access Helm charts that are a part of the Gruntwork Helm chart repository, you will need to add the repository for use with your client:

helm repo add gruntwork https://helmcharts.gruntwork.io

Once the repository is added, you can install any chart that is a part of the repository by accessing them under the namespace gruntwork. To find the available charts distributed by a particular repository, you can use the search command from the helm client:

helm search gruntwork

This will query the Gruntwork Helm chart repository for information on all the available charts distributed by the repository, showing you a list similar to below:

NAME                    CHART VERSION   APP VERSION     DESCRIPTION
gruntwork/k8s-service   v0.0.10                         A Helm chart to package your application container for Ku...

Deploying a chart

Unlike most charts in the community, Helm charts that are a part of the Gruntwork Infrastructure as Code Library have a set of required input values. Therefore, you cannot simply run helm install CHART_NAME to install a chart onto your Kubernetes cluster.

Each chart documents its required and optional input values in its values.yaml file. You can access a chart’s values.yaml either by inspecting the chart’s source code or by using the helm inspect command. For example, to see the values of the k8s-service chart, you can run:

helm inspect values gruntwork/k8s-service

You can use this as a starting point to construct your own values.yaml file:

helm inspect values gruntwork/k8s-service > values.yaml

You can then open up this values.yaml file and fill in the required and optional parameters. For k8s-service, the two required parameters are containerImage and applicationName. For this example, we can fill in the official Nginx container from Docker Hub for the containerImage and nginx for the applicationName:

values.yaml
#----------------------------------------------------------------------------------------------------------------------
# CHART PARAMETERS
#----------------------------------------------------------------------------------------------------------------------

#----------------------------------------------------------------------------------------------------------------------
# REQUIRED VALUES
# These values are expected to be defined and passed in by the operator when deploying this helm chart.
#----------------------------------------------------------------------------------------------------------------------

# containerImage is a map that describes the container image that should be used to serve the application managed by
# this chart.
containerImage:
  repository: nginx
  tag: 1.16.1

# applicationName is a string that names the application. This is used to label the pod and to name the main application
# container in the pod spec. The label is keyed under "gruntwork.io/app-name"
applicationName: "nginx"

Note that you are free to override any default parameters in the values.yaml file. When you install the chart, this values.yaml file is merged with the default values.yaml file distributed with the chart, allowing you to override those parameters.

Once you have a values.yaml that specifies all the inputs, you can install the chart using those as inputs:

helm install -n nginx -f values.yaml gruntwork/k8s-service

This will render the YAML files in the chart using the input values as parameters and apply them to the Kubernetes cluster.

Important
In the previous example, we passed in the parameter -n nginx to the install command. This allows us to force a release name for the deployment. When you install a chart using Helm, Helm tracks that deployment as a release and binds all the related resources under a unique name. If you don’t provide a release name, Helm will automatically generate a random name that is not related to the chart (e.g., witty-lobster). This makes maintenance difficult as you will need to refer to this release name any time you want to make changes to the deployment.

Updating a previous deployment

Once a chart is installed and deployed, you can use the upgrade command to make changes to it. For example, if you wanted to deploy an updated version of Nginx using our example, you can update the values.yaml file to point to a later version, and then run helm upgrade with the updated input values and release name:

helm upgrade -f values.yaml nginx gruntwork/k8s-service

This command will render the gruntwork/k8s-service chart using the updated input values, and then replace the resources under the Helm release nginx.

You can also use the upgrade command to update to a later version of the chart:

helm upgrade -f values.yaml --version v0.0.10 nginx gruntwork/k8s-service

The above command will use version v0.0.10 of the gruntwork/k8s-service chart when rendering the values.

Contributing to the Gruntwork Infrastructure as Code Library

Contributions to the Gruntwork Infrastructure as Code Library are very welcome and appreciated! If you find a bug or want to add a new feature or even contribute an entirely new module, we are very happy to accept pull requests, provide feedback, and run your changes through our automated test suite.

This section outlines the process for contributing.

File a GitHub issue

Before starting any work, we recommend filing a GitHub issue in the appropriate repo. This is your chance to ask questions and get feedback from the maintainers and the community before you sink a lot of time into writing (possibly the wrong) code. If there is anything you’re unsure about, just ask!

Update the documentation

We recommend updating the documentation before updating any code (see Readme Driven Development). This ensures the documentation stays up to date and allows you to think through the problem at a high level before you get lost in the weeds of coding.

Update the tests

We also recommend updating the automated tests before updating any code (see Test Driven Development). That means you add or update a test case, verify that it’s failing with a clear error message, and then make the code changes to get that test to pass. This ensures the tests stay up to date and verify all the functionality in the repo, including whatever new functionality you’re adding in your contribution. The test folder in every repo will have documentation on how to run the tests locally.

Update the code

At this point, make your code changes and use your new test case to verify that everything is working.

Create a pull request

Create a pull request with your changes. Please make sure to include the following:

  1. A description of the change, including a link to your GitHub issue.

  2. Any notes on backwards incompatibility.

Merge and release

The maintainers for the repo will review your code and provide feedback. If everything looks good, they will merge the code and release a new version.

Forking the Gruntwork Infrastructure as Code Library

The Gruntwork Terms of Service give you permission to fork the code from the Gruntwork Infrastructure as Code Library into your own repos. This is useful if your company does not allow external dependencies (e.g., you have a company policy that requires all source code to be pulled from an internal GitHub Enterprise or BitBucket server) or if you need to make modifications to the Infrastructure as Code Library that you do not wish to contribute back to Gruntwork. This section will walk you through what you need to do to fork the code.

Important
The definition of an Authorized User from the Gruntwork Terms of Service does NOT change if you fork the code. That is, if you create internal forks and give 50 users access to those internal forks, then the Gruntwork License requires that you pay for 50 Authorized Users.

How to fork the code

Here is how you fork the code in the Gruntwork Infrastructure as Code Library:

  1. Copy each Gruntwork repo into your private repositories.

  2. You’ll also want to copy all the versioned releases (see the /releases page for each repo).

  3. For repos that contain pre-built binaries (such as ssh-grunt mentioned earlier), you’ll want to copy those binaries as well.

  4. Within each repo, search for any cross-references to other Gruntwork repos. Most of the repos are standalone, but some of the Terraform and Go code is shared across repos (e.g., the package-kafka and package-zookeeper repos use the module-asg repo under the hood to run an Auto Scaling Group). You’ll need to update Terraform source URLs and Go import statements from github.com/gruntwork-io to your private Git repo URLs.
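
    For example, a Terraform module block that currently points at the gruntwork-io org:

    module "vpc" {
      source = "git@github.com:gruntwork-io/module-vpc.git//modules/vpc-app?ref=v0.7.2"
    }

    would need its source updated to point at your private fork (the URL below is a hypothetical placeholder):

    module "vpc" {
      source = "git@github.com:<YOUR-ORG>/module-vpc.git//modules/vpc-app?ref=v0.7.2"
    }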

You’ll want to automate the entire process above and run it on a regular schedule (e.g., daily). The Gruntwork Infrastructure as Code Library is updated continuously, both by the Gruntwork team and contributions from our community of customers (see the monthly Gruntwork Newsletter for details), so you’ll want to pull in these updates as quickly as you can.

How to use your forked code

Once you’ve forked the code, using it is very similar to what is outlined in Using Terraform Modules and Using scripts and binaries, except for the following differences:

  1. Point the source URLs of your Terraform modules to your own Git repos, rather than the gruntwork-io GitHub org.

  2. Point the --repo parameter of gruntwork-install to your own Git repos, rather than the gruntwork-io GitHub org.

Drawbacks to forking

While forking is allowed under the Gruntwork Terms of Service, it has some downsides:

  • You have to do a lot of work up-front to copy the repos, releases, and pre-compiled binaries and update internal links.

  • You have to do more work to run this process on a regular basis and deal with merge conflicts.

  • If your team isn’t directly using the Gruntwork GitHub repos on a regular basis, then you’re less likely to participate in issues and pull requests, and you won’t be benefiting as much from the Gruntwork community.

So, whenever possible, use the code directly from the gruntwork-io GitHub org, as documented in Using Terraform Modules and Using scripts and binaries. If your team relies on NPM, Docker Hub, Maven Central, GitHub, or the Terraform Registry, using Gruntwork repos directly is no different. However, if your company completely bans all outside sources, then follow the instructions above to fork the code, and good luck!

Operating system compatibility

Here’s a summary of the operating systems supported by the Gruntwork Infrastructure as Code Library as of August 2019:

Terraform modules

Just about all the Terraform modules in the Gruntwork Infrastructure as Code Library work on all major operating systems. However, there are a handful of modules where we have to call out to scripts from our Terraform code. Most of these scripts are Python and work on all major operating systems (as long as Python is installed), but there are a couple places where we call Bash code (mostly sleep 30 to work around Terraform bugs). If you run into a portability issue, please report it as a bug, and we’ll get it fixed!

Go binaries

The Gruntwork Infrastructure as Code Library includes a number of dev tools written in Go which are compiled into standalone binaries for every major operating system. This applies to tools such as Terragrunt, cloud-nuke, openvpn-admin, houston, gruntkms, ssh-grunt, gruntsam, and kubergrunt.

Linux server scripts

There are a number of scripts in the Gruntwork Infrastructure as Code Library that are meant to be installed and executed on Linux servers in AWS. These include install-xxx and run-xxx scripts (e.g., install-vault, run-vault, install-kafka, run-kafka, install-elasticsearch, run-elasticsearch, etc.), CI / CD scripts (e.g., aws-helpers, build-helpers, circleci-helpers, git-helpers, terraform-helpers), and server scripts (e.g., attach-eni, mount-ebs-volume, add-dns-a-record). See Using scripts and binaries for how these scripts are meant to be installed and used.

Deployment scripts

There are a handful of deployment scripts in the Gruntwork Infrastructure as Code Library that are meant to be executed by a developer directly, such as the roll-out-ecs-cluster-update.py script used to do zero-downtime rolling deployments for ECS clusters. These scripts are written in Python and should work on all major operating systems.

Need Windows support?

While Windows users can use most of the Gruntwork Infrastructure as Code Library, the vast majority of Gruntwork customers use Linux or Mac, so we haven’t prioritized improving our Windows support. If you would like us to improve our Windows support, please contact us.

Next steps

Now that you’ve learned how to use the Gruntwork Infrastructure as Code Library, you can start leveraging it to build your infrastructure! Here are some good first steps: