Continuous Integration (CI) and Continuous Deployment (CD) for databases are a lot harder than they are for stateless applications. I still regularly meet teams who deploy their database by hand because automated CD was too hard. Not coincidentally, those teams also need help from a data engineering consultant.

image: Fabrízio Mello, CC Attribution License
You can and should build your data structures from source control. This post will show you how to do so with Amazon Web Services’ CodeStar, Redshift, and Flyway. The goal is for your database structures to be in a known state at all times, and for all upgrades to be one click.
You’ll need a Virtual Private Cloud (VPC) with private subnets and Network Address Translation (NAT) gateways. I presume that most people already have one of these. If not, the following scripts should help:
You’ll also need CodeStar, CloudFormation, IAM, and Redshift permissions.
There are no special requirements for CodeCommit. I created the repository from the command line with `aws codecommit create-repository --repository-name cfn-flyway`.
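For completeness, here is a sketch of the whole round trip; the region in the clone URL is just an example, substitute your own:
# Create the repository; the response includes the clone URL
aws codecommit create-repository --repository-name cfn-flyway
# Clone it locally, adjusting the region to wherever you created the repo
git clone https://git-codecommit.eu-west-1.amazonaws.com/v1/repos/cfn-flyway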
Next, we use a CloudFormation script to create the CodePipeline and its CodeBuild step.
---
AWSTemplateFormatVersion: "2010-09-09"
Description: Create a flyway CodePipeline for database migrations
Parameters:
  DBMasterUsername:
    Type: String
    Description: Name for the DB User
    Default: pipeline_is_dba
  DBMasterPassword:
    Type: String
    Description: Password for the DB
    NoEcho: true
  DBPortNumber:
    Description: The port number on which the database accepts incoming connections.
    Type: Number
    Default: 3306
  PipelineBucket:
    Type: String
    Description: Name of the S3 bucket to store CodePipeline artifacts.
  RepoName:
    Type: String
    Description: Name of the repo being deployed
  PrivateSubnet:
    Type: AWS::EC2::Subnet::Id
    Description: Subnet for codebuild to access DB
  VPC:
    Type: AWS::EC2::VPC::Id
    Description: Existing VPC in which to deploy
Resources:
  CodePipelineRole:
    Type: "AWS::IAM::Role"
    Properties:
      RoleName:
        Fn::Sub: CodePipelineRole-${AWS::StackName}
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          -
            Effect: "Allow"
            Principal:
              Service:
                - "codepipeline.amazonaws.com"
            Action:
              - "sts:AssumeRole"
      Path: /
      Policies:
        -
          PolicyName: "CodePipelineNestedCFNAccessPolicy"
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              -
                Effect: "Allow"
                Action:
                  - "s3:DeleteObject"
                  - "s3:GetObject"
                  - "s3:GetObjectVersion"
                  - "s3:ListBucket"
                  - "s3:PutObject"
                  - "s3:GetBucketPolicy"
                Resource:
                  - Fn::Sub: arn:aws:s3:::${PipelineBucket}
                  - Fn::Sub: arn:aws:s3:::${PipelineBucket}/*
              -
                Effect: "Allow"
                Action:
                  - "codecommit:ListBranches"
                  - "codecommit:ListRepositories"
                  - "codecommit:BatchGetRepositories"
                  - "codecommit:Get*"
                  - "codecommit:GitPull"
                  - "codecommit:UploadArchive"
                Resource:
                  - Fn::Sub: arn:aws:codecommit:${AWS::Region}:${AWS::AccountId}:${RepoName}
              -
                Effect: "Allow"
                Action:
                  - "codebuild:StartBuild"
                  - "codebuild:BatchGetBuilds"
                Resource:
                  - Fn::Sub: arn:aws:codebuild:${AWS::Region}:${AWS::AccountId}:project/FlywayCodebuild
                  - Fn::Sub: arn:aws:codebuild:${AWS::Region}:${AWS::AccountId}:build/FlywayCodebuild:*
              -
                Effect: "Allow"
                Action:
                  - "iam:PassRole"
                Resource:
                  - Fn::Sub: arn:aws:iam::${AWS::AccountId}:role/CloudFormationRole-${AWS::StackName}
  CodeBuildRole:
    Type: "AWS::IAM::Role"
    Properties:
      RoleName:
        Fn::Sub: CodeBuildRole-${AWS::StackName}
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          -
            Effect: "Allow"
            Principal:
              Service:
                - "codebuild.amazonaws.com"
            Action:
              - "sts:AssumeRole"
      Path: /service-role/
      Policies:
        -
          PolicyName: "CodeBuildAccessPolicy"
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Sid: CodeCommitReadPermissions
                Effect: "Allow"
                Action:
                  - "codecommit:ListBranches"
                  - "codecommit:ListRepositories"
                  - "codecommit:BatchGetRepositories"
                  - "codecommit:Get*"
                  - "codecommit:GitPull"
                Resource:
                  - Fn::Sub: arn:aws:codecommit:${AWS::Region}:${AWS::AccountId}:${RepoName}
              - Sid: GeneralReadPermissions
                Effect: "Allow"
                Action:
                  - "ec2:Describe*"
                  - "cloudformation:ValidateTemplate"
                  - "iam:Get*"
                  - "iam:List*"
                  - "logs:Describe*"
                  - "logs:Get*"
                  - "tag:Get*"
                Resource:
                  - "*"
              - Sid: WriteLogsPermissions
                Effect: "Allow"
                Action:
                  - "logs:CreateLogGroup"
                  - "logs:CreateLogStream"
                  - "logs:PutLogEvents"
                Resource:
                  - "*"
              - Sid: PipelineBucketPermissions
                Effect: "Allow"
                Action:
                  - "s3:PutObject"
                  - "s3:GetObject"
                  - "s3:GetObjectVersion"
                  - "s3:ListBucket"
                Resource:
                  - Fn::Sub: arn:aws:s3:::codepipeline-${AWS::Region}-*
                  - Fn::Sub: arn:aws:s3:::${PipelineBucket}/*
                  - Fn::Sub: arn:aws:s3:::${PipelineBucket}
  CodeBuildInVPCPolicy:
    Type: AWS::IAM::Policy
    Properties:
      PolicyName: codebuild-in-vpc
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Action:
              - ec2:CreateNetworkInterface
              - ec2:DescribeDhcpOptions
              - ec2:DescribeNetworkInterfaces
              - ec2:DeleteNetworkInterface
              - ec2:DescribeSubnets
              - ec2:DescribeSecurityGroups
              - ec2:DescribeVpcs
            Resource: "*"
          - Effect: Allow
            Action:
              - ec2:CreateNetworkInterfacePermission
            Resource: !Sub arn:aws:ec2:${AWS::Region}:${AWS::AccountId}:network-interface/*
            Condition:
              StringEquals:
                ec2:AuthorizedService: codebuild.amazonaws.com
      Roles:
        - !Ref CodeBuildRole
  DBSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Group for DB
      VpcId:
        Ref: VPC
  AccessDBSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Allow access to DB
      VpcId:
        Ref: VPC
  ToDBRule:
    Type: AWS::EC2::SecurityGroupEgress
    Properties:
      IpProtocol: tcp
      FromPort: !Ref DBPortNumber
      ToPort: !Ref DBPortNumber
      DestinationSecurityGroupId:
        Fn::GetAtt:
          - DBSecurityGroup
          - GroupId
      GroupId:
        Fn::GetAtt:
          - AccessDBSecurityGroup
          - GroupId
  DBListenRule:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      IpProtocol: tcp
      FromPort: !Ref DBPortNumber
      ToPort: !Ref DBPortNumber
      SourceSecurityGroupId:
        Fn::GetAtt:
          - AccessDBSecurityGroup
          - GroupId
      GroupId:
        Fn::GetAtt:
          - DBSecurityGroup
          - GroupId
  HTTPSSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Allow outbound HTTPS, for S3, CodeCommit, etc
      VpcId:
        Ref: VPC
  HTTPSOutRule:
    Type: AWS::EC2::SecurityGroupEgress
    Properties:
      IpProtocol: tcp
      FromPort: 443
      ToPort: 443
      CidrIp: 0.0.0.0/0
      GroupId:
        Fn::GetAtt:
          - HTTPSSecurityGroup
          - GroupId
  RedshiftSubnetGroup:
    Type: 'AWS::Redshift::ClusterSubnetGroup'
    Properties:
      Description: My ClusterSubnetGroup
      SubnetIds:
        - !Ref PrivateSubnet
  Redshift:
    Type: AWS::Redshift::Cluster
    Properties:
      ClusterSubnetGroupName:
        Ref: RedshiftSubnetGroup
      DBName: "mydb"
      MasterUsername:
        Ref: DBMasterUsername
      MasterUserPassword:
        Ref: DBMasterPassword
      NodeType: "dc2.large"
      ClusterType: "single-node"
      Port:
        Ref: DBPortNumber
      VpcSecurityGroupIds:
        - !GetAtt DBSecurityGroup.GroupId
  FlywayCodeBuild:
    Type: AWS::CodeBuild::Project
    Properties:
      Name: FlywayCodebuild
      Description: build a database
      ServiceRole:
        Fn::GetAtt: [CodeBuildRole, Arn]
      Artifacts:
        Type: NO_ARTIFACTS
      Environment:
        Type: LINUX_CONTAINER
        ComputeType: BUILD_GENERAL1_SMALL
        Image: aws/codebuild/nodejs:7.0.0
        EnvironmentVariables:
          - Name: FLYWAY_URL
            Value: !Join ["", ["jdbc:redshift://", !GetAtt Redshift.Endpoint.Address, ":", !Ref DBPortNumber, "/mydb"]]
          - Name: FLYWAY_USER
            Value: !Ref DBMasterUsername
          - Name: FLYWAY_PASSWORD
            Value: !Ref DBMasterPassword
      Source:
        Location:
          Fn::Sub: https://git-codecommit.${AWS::Region}.amazonaws.com/v1/repos/${RepoName}
        Type: CODECOMMIT
      TimeoutInMinutes: 15
      EncryptionKey:
        Fn::Sub: arn:aws:kms:${AWS::Region}:${AWS::AccountId}:alias/aws/s3
      Tags:
        - Key: Name
          Value:
            Fn::Sub: FlywayCFN-${AWS::StackName}
      VpcConfig:
        SecurityGroupIds:
          - !Ref AccessDBSecurityGroup
          - !Ref HTTPSSecurityGroup
        Subnets:
          - !Ref PrivateSubnet
        VpcId: !Ref VPC
  DeployPipeline:
    Type: "AWS::CodePipeline::Pipeline"
    Properties:
      Name:
        Fn::Sub: Flyway-demo-${AWS::StackName}
      RoleArn:
        Fn::GetAtt: [CodePipelineRole, Arn]
      Stages:
        -
          Name: CodeCommitStage
          Actions:
            -
              Name: CodeCommitAction
              ActionTypeId:
                Category: Source
                Owner: AWS
                Version: "1"
                Provider: CodeCommit
              OutputArtifacts:
                -
                  Name: GitOut
              Configuration:
                BranchName: master
                RepositoryName:
                  Ref: RepoName
              RunOrder: 1
        -
          Name: Deploy
          Actions:
            -
              Name: CodeBuild
              InputArtifacts:
                - Name: GitOut
              ActionTypeId:
                Category: Build
                Owner: AWS
                Version: "1"
                Provider: CodeBuild
              Configuration:
                ProjectName:
                  Ref: FlywayCodeBuild
              RunOrder: 1
      ArtifactStore:
        Type: S3
        Location:
          Ref: PipelineBucket
There is a lot going on here, but it’s not too much to understand: a role each for CodePipeline and CodeBuild, security groups that let CodeBuild reach the database and the outside world over HTTPS, the Redshift cluster itself, the CodeBuild project that will run Flyway, and the pipeline that ties source to build.
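I launch the stack with the CloudFormation CLI. A minimal sketch, assuming you saved the template as `flyway-pipeline.yml`; the stack name, password, bucket, subnet, and VPC values below are placeholders for your own, and the CAPABILITY_NAMED_IAM acknowledgement is required because the template creates named IAM roles:
aws cloudformation create-stack \
    --stack-name flyway-demo \
    --template-body file://flyway-pipeline.yml \
    --capabilities CAPABILITY_NAMED_IAM \
    --parameters \
        ParameterKey=DBMasterPassword,ParameterValue=change-me \
        ParameterKey=PipelineBucket,ParameterValue=my-artifact-bucket \
        ParameterKey=RepoName,ParameterValue=cfn-flyway \
        ParameterKey=PrivateSubnet,ParameterValue=subnet-0123456789abcdef0 \
        ParameterKey=VPC,ParameterValue=vpc-0123456789abcdef0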
CodeBuild has been left at its defaults, expecting a buildspec.yml file. I use the same YAML most of the time:
---
version: 0.1
phases:
  install:
    commands:
      - echo "Configuring build files"
      # Allow execute on the files, -v is verbose, -v -v is more verbose
      - chmod -v -v +x ./*.sh
      - ./install.sh
  build:
    commands:
      - ./build.sh
This script does very little. It delegates the work to `install.sh` and `build.sh` and makes sure that they are executable. Those two shell scripts do the deployment. First, `install.sh`:
#!/bin/bash
set -e
source ./build.env
# Fetch and unpack the Flyway command-line distribution
wget "https://repo1.maven.org/maven2/org/flywaydb/flyway-commandline/${FLYWAY_VERSION}/flyway-commandline-${FLYWAY_VERSION}-linux-x64.tar.gz"
tar -xvzf "flyway-commandline-${FLYWAY_VERSION}-linux-x64.tar.gz"
# Fetch the Redshift JDBC driver and put it where Flyway looks for drivers
wget "https://s3.amazonaws.com/redshift-downloads/drivers/jdbc/${JDBC_VERSION}/RedshiftJDBC42-${JDBC_VERSION}.jar"
mv "RedshiftJDBC42-${JDBC_VERSION}.jar" "./flyway-${FLYWAY_VERSION}/drivers/"
# Move our migration scripts into the sql directory that ships with Flyway
mv ./sql/* "./flyway-${FLYWAY_VERSION}/sql/"
`install.sh` downloads and installs Flyway and the Redshift JDBC driver. It sources `build.env` for shared configuration:
#!/bin/bash
# This file is meant to be sourced, not run; guard against direct execution
if [ "${BASH_SOURCE[0]}" -ef "$0" ]
then
    echo "Source this script, do not execute it!"
    exit 1
fi
export FLYWAY_VERSION="5.2.4"
export JDBC_VERSION="1.2.16.1027"
All that is left is to call Flyway and let it build the database:
#!/bin/bash
source ./build.env
# List the installation for debugging; the output lands in the CodeBuild log
ls -R "./flyway-${FLYWAY_VERSION}/"
# -X turns on Flyway's debug output
"./flyway-${FLYWAY_VERSION}/flyway" migrate -X
The `flyway migrate` command does the actual work of creating the database objects. The `ls` command is for debugging; its output is written to the CodeBuild log.
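Incidentally, the same approach works outside CodeBuild, since Flyway reads its connection settings from the FLYWAY_URL, FLYWAY_USER, and FLYWAY_PASSWORD environment variables. Assuming you have network access to the cluster (remember, it sits in a private subnet), a sketch like this, with a made-up endpoint, reports the migration state:
# Hypothetical endpoint; substitute your cluster's address and your password
export FLYWAY_URL="jdbc:redshift://example.abc123xyz4.eu-west-1.redshift.amazonaws.com:3306/mydb"
export FLYWAY_USER="pipeline_is_dba"
export FLYWAY_PASSWORD="..."
# Show which migrations have been applied and which are pending
"./flyway-5.2.4/flyway" info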
So far we’re migrating no content. We need to create a `sql` folder and start putting SQL scripts with Flyway-compatible names into it, such as `V1__first_table.sql`:
CREATE TABLE first_table (
    id INT PRIMARY KEY,
    name VARCHAR(100)
);
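Later changes get new, higher version numbers; Flyway records every applied script in its schema history table and only runs the ones it has not seen before. A hypothetical follow-up, `V2__add_email.sql`, might be:
-- V2__add_email.sql (hypothetical second migration)
ALTER TABLE first_table ADD COLUMN email VARCHAR(255);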
And now you’re done. You have a record of what changed, when, why, and by whom. You can build a correct database from scratch, reliably and with almost no effort. When you’re comfortable, take away everyone’s DDL permissions on the database. And bask in the resulting calm and productivity.
I often hear people claim that they need permission to type untested commands into production in order to resolve incidents, typically in an untraceable way. I phrased it differently when I had this permission, but typing untested commands into production is what we’re talking about here. And I’ve never met a database that I didn’t configure myself where DDL commands were logged in a way that gives auditors confidence the log is complete. We don’t type untested commands into production, because we need to test, because the source control commit message records the motivation for each change, and because we require a reliable audit log. Right?
As for how incidents are fixed: the same way everything else happens. The `ALTER DATABASE SET smoke=back_in` commands that are desperately needed to resolve an incident are committed to source control and deployed by the pipeline, with a test case that demonstrates whether they worked.