Posted by on December 7, 2018

Continuous Integration (CI) and Continuous Deployment (CD) for databases is a lot harder than those things are for stateless applications. I still regularly meet teams who are deploying their database by hand because automated CD was too hard. Not coincidentally, those teams also need help from a data engineering consultant.


image: FabrĂ­zio Mello CC Attribution License

You can and should build your data structures from source control. This post will show you how to do so with Amazon Web Services’ Codestar, Redshift, and flyway. The goal is for your database structures to be in a known state at all times, and for all upgrades to be one-click.

Prerequisites

VPC

A Virtual Private Cloud (VPC), with private subnets and Network Address Translation (NAT) gateways. I presume that most people already have one of these. If not, the following scripts should help:

  1. VPC
  2. NAT Gateway

IAM permissions

You’ll need codestar, cloud formation, IAM, and redshift permissions.

Steps:

  1. Create a Code Commit Repository

    There are no special requirements for Code Commit. I used a command line aws codecommit create-repository cfn-flyway call.

  2. Create the CI/CD pipeline

    Next we use a Cloudformation script to create the Code Pipeline and its Code Build step.

    There is a lot going on here, but its not too much to understand.

    1. First IAM roles are created for the Pipeline and Codebuild.
    2. Then VPC security is configured for redshift.
    3. Then a redshift cluster is created.
    4. Then the Code Build project that does the deploys is created.
    5. And finally a Code Pipeline is created which watches our repository and calls code build on every commit
  3. Configure Code Build

    Code Build has been left at its defaults, expecting a buildspec.yml file. I use the same YAML most of the time:

    This script does very little. It delegates the work to install.sh and build.sh and makes sure that they are executable. Those two shell scripts do the deployment.

    Install downloads and installs flyway and the redshift JDBC driver. It sources build.env for shared configuration:

    All that is left is to call flyway and let it build the database:

    The flyway migrate command does the actual work of creating the database objects. The ls command is for debugging. Its output is written to the Code Build log.

  4. Add your database scripts

    So far we’re migrating no content. We need to create a sql folder, and start putting flyway compatibly named SQL scripts into it, for example:

And now you’re done. You have a record of what changed, when, why, and by whom. You can build a correct database from scratch, reliably and with almost no effort. When you’re comfortable, take away everyone’s DDL permissions on the database. And bask in the resulting calm and productivity.

Wait! What happens when something goes wrong?

I often hear that people believe that they need permission to type untested commands into production in order to resolve incidents. Typically in an untraceable way. I phrased it differently when I had this permission, but typing untested commands into production is what we’re talking about here. And I’ve never met a database that I didn’t configure where DDL commands were being logged in a way that the auditors can be confident that the log is complete. We don’t typically type untested commands into production because we need to test, because the source control commit message provides the motivation for change, and because we require a reliable audit log. Right?

As for how incidents are fixed, they are fixed the same way that everything else happens. The “ALTER DATABASE SET smoke=back_in” commands that are desperately needed to resolve incidents are committed into source control and deployed by the pipeline. With a test case that demonstrates if they worked or not.

Comments

Be the first to comment.

Leave a Reply