tl;dr: What we learned from building a serverless workflow to automate our DB syncing process using AWS Step Functions. Dive deep with code examples and SAM templates to go from zero to fully automated workflows.
This post follows up on Securely Syncing DBs with AWS DMS, and covers how we built an automated workflow to manage pre- and post-processing steps using AWS Step Functions.
You’ll see code examples related to our specific use case, but the aim is to provide a blueprint you could follow for any serverless workflows you may want to build.
We’ll cover a set of tools and technologies designed to get you moving quickly, including:
- Setting up the development environment with PyCharm IDE
- Building and deploying to Cloud Formation using AWS Toolkit
- Writing architecture as code using SAM templates
- Building a State Machine
- Using Lambda Layers to share code between functions
- Scheduling the pipeline with AWS CloudWatch events
At AWS re:Invent 2018, some great IDE integrations with AWS were announced that make building and deploying code to AWS super smooth. We’ll cover how we used the PyCharm IDE at Finimize, but note that Visual Studio and IntelliJ also have the AWS Toolkit.
You can download PyCharm here, and the AWS Toolkit extension. Once installed, you can create a new ‘AWS Serverless Application’ project, which comes with a couple of example SAM templates to get you started. SAM is short for Serverless Application Model, which is essentially shorthand syntax to express functions, APIs, databases, and event source mappings. SAM is an extension of Cloud Formation templates, and SAM templates are often much more concise.
Building our First AWS Lambda task
You’ll notice that the project has an example Lambda function in the app.py file under the hello_world folder. We need to keep the lambda_handler function which has the event and context arguments but we can replace the body of the function. In our case the first step in our workflow is to check that the database schema versions between our Production and Staging DBs are in sync. We keep these schema versions in S3, so our code looks like this:
An important thing to notice here is that we return a dictionary on success or failure. The returned dictionary can be interpreted by Step Functions, and we can implement branching logic based on the state returned from the function. Later we’ll see how this can be used by a Choice task to determine whether to proceed or move the State Machine to a ‘Fail’ state.
You may notice that we’ve imported boto dependencies here — these are available in Python Lambda runtimes out of the box. If we wanted to use any other pip dependencies we would need to define them in a requirements.txt file for each Lambda folder.
To deploy this Lambda function we’ll need to ensure that we have the relevant information defined in our SAM template file which is at the root of the project. Below is the updated template.yaml:
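A trimmed-down version is shown here; the resource and role names are illustrative.

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Serverless DB sync workflow

Globals:
  Function:
    Runtime: python3.7
    Timeout: 60
    MemorySize: 128

Resources:
  ProdAndStagingVersionCheckFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: prod_and_staging_version_check/
      Handler: app.lambda_handler
      Role: !GetAtt DbSyncLambdaRole.Arn  # IAM role defined separately
```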
You’ll notice I’ve changed the function name and code path, and moved some of the settings to the Globals block. I’ve also referenced a role for this Lambda function, which is an IAM role I’ve defined separately. Without this role, the default role is assigned, which has much more limited permissions. For more detailed information about building SAM templates please refer to the SAM Github documentation.
Deploying our first Lambda function
We can deploy this function to AWS by right-clicking on the template.yaml file in the Project window, then clicking ‘Deploy Serverless Application’. At this point we have options to update an existing Cloud Formation Stack, or to create a new one. If you’re building a Serverless workflow for the first time it probably makes sense to create a new stack so it’s nicely isolated from the rest of your AWS architecture. You can also choose to use or create an S3 bucket where generated deployment artefacts will be pushed to. If you have Lambda function(s) with any natively compiled dependencies, then you may also want to check the option to build the functions inside an AWS Lambda-like Docker container. More information on the deployment options can be found here.
Clicking deploy will perform a number of steps. Firstly it will trigger a build of your code and dependencies, which creates zip files ready to be uploaded. Next it creates a Cloud Formation deployment which builds out or updates the resources listed in template.yaml file in AWS. Under the hood this runs the SAM CLI commands, as AWS Toolkit essentially wraps this command line tool. The SAM CLI documentation gives more details on exactly what the build and deploy commands are doing.
Step Functions Pipeline
The pipeline here ties together all the steps in our database syncing process. At a high level this encompasses removing foreign key constraints from our target database, starting the AWS DMS migration task, then re-applying the key constraints and fixing auto-increment sequences on the target DB after the DMS task completes.
This pipeline is a single state machine which is defined using JSON. Each of the green boxes you can see above is a Task, representing a single unit of work performed by the State Machine. In our example these tasks are Lambda functions, Choice states (which handle branching logic based on the internal state produced by previous tasks), or Wait states (which we can use to retry tasks after a given interval). Wait states are quite useful when you want to check the result of a long-running process (like a DMS migration task) while keeping the number of costly state transitions to a minimum. The above state machine is just under 140 lines of pretty-printed JSON.
Next we’ll run through how we built our State Machine, highlighting some of the Step Functions features along the way.
Checking for failure with Choice tasks
To illustrate how a Choice task works we’ll show a state machine which runs the first Lambda function we created, then a Choice task which looks at the internal state created by that function and decides whether to move the State Machine into either the ‘Succeed’ or ‘Fail’ state.
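A sketch of that State Machine definition, with a placeholder Lambda ARN:

```json
{
  "Comment": "Run the version check, then branch on its result",
  "StartAt": "Prod And Staging Version Check",
  "States": {
    "Prod And Staging Version Check": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:eu-west-1:123456789012:function:ProdAndStagingVersionCheck",
      "Next": "Versions In Sync?"
    },
    "Versions In Sync?": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.versions_in_sync",
          "BooleanEquals": true,
          "Next": "Completed"
        }
      ],
      "Default": "Sync Failed"
    },
    "Completed": {
      "Type": "Succeed"
    },
    "Sync Failed": {
      "Type": "Fail"
    }
  }
}
```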
The “Versions In Sync?” Choice task above is looking at a portion of the internal state created by the previously run “Prod And Staging Version Check” Lambda function. The Variable is a JSON path expression. The BooleanEquals comparison operator is used to check if the result is true. If it is, we move to the Completed task. If not, we would run through any remaining choices to see if they could be satisfied. But since we don’t have any other choices we would follow the Default state transition, which in this case is to the failure task. In the full State Machine we have many of these choice tasks to evaluate if a previous Lambda function task failed.
In order to add the Step Functions State Machine to the SAM template file, we need to add the following snippet, where the definition string is the raw JSON of the State Machine displayed above.
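Something like the following, where the role and resource names are illustrative:

```yaml
  DbSyncStateMachine:
    Type: AWS::StepFunctions::StateMachine
    Properties:
      StateMachineName: db-sync-state-machine
      RoleArn: !GetAtt StepFunctionsExecutionRole.Arn  # IAM role defined separately
      DefinitionString: !Sub |
        {
          "Comment": "Run the version check, then branch on its result",
          "StartAt": "Prod And Staging Version Check",
          "States": {
            ...
          }
        }
```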
Step Functions State Machines are actually not yet supported by the SAM templating library, so they are written in Cloud Formation template syntax. You can find more information about how to define State Machines in Cloud Formation here.
At this point you could deploy your serverless application, and any changes would be deployed alongside the Lambda function we defined earlier.
Rather than building out the rest of our State Machine step by step, I’ll go over some of the other interesting features Step Functions provides.
Polling long running processes with Wait tasks
In addition to Lambda functions and Choice tasks, Step Functions also lets you define Wait tasks. This might not seem like much, but Step Functions pricing is based on the number of state transitions you make in a month. So if you have any long running tasks that might be started by your State Machine (in our case a DMS migration task), you can poll the task until it completes at a wait interval which you can define. Without the wait step you could very quickly exceed your free tier usage.
To check the progress of the DMS Migration task we have a Lambda function that queries the status of DMS migration task via the Boto SDK, and returns the result (see code below).
In the State Machine we build a loop around the above Lambda function, a Choice task and a Wait task. The loop only exits when the “$.status” Variable value equals “Stopped” (see the JSON below).
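An excerpt of that loop from the State Machine definition; the ARN and the state it transitions to on completion are illustrative.

```json
"Check DMS Task Status": {
  "Type": "Task",
  "Resource": "arn:aws:lambda:eu-west-1:123456789012:function:CheckDmsTaskStatus",
  "Next": "Migration Complete?"
},
"Migration Complete?": {
  "Type": "Choice",
  "Choices": [
    {
      "Variable": "$.status",
      "StringEquals": "Stopped",
      "Next": "Apply Key Constraints"
    }
  ],
  "Default": "Wait Before Retry"
},
"Wait Before Retry": {
  "Type": "Wait",
  "Seconds": 300,
  "Next": "Check DMS Task Status"
}
```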
Sharing code across functions with Lambda layers
One of the problems you can run into once you have a number of Lambda functions is code duplication between them. You may also have lots of boilerplate code in these functions which means they’re not as concise as you’d like them to be. Fortunately there is a way to solve this by factoring shared code out into Lambda Layers. There are other cases for using Layers; another obvious one is to package common dependencies into a Layer so that functions can use them simply by adding that Layer to their definition.
In our case we had a number of functions that needed access to a PostgreSQL client, which needed to initialise itself with credentials. We opted to create a DBConnection class which handled this and could be easily added to the associated Lambda functions. You can see a snippet of this code below.
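A simplified sketch of the class; the environment variable names are illustrative, and the driver is imported where it’s used since it’s packaged in the Layer.

```python
import os


class DBConnection:
    """Wraps PostgreSQL connection setup so individual Lambda functions
    don't repeat credential-handling boilerplate."""

    def __init__(self, env=os.environ):
        # Credentials come from the function's environment variables.
        self.host = env["DB_HOST"]
        self.dbname = env["DB_NAME"]
        self.user = env["DB_USER"]
        self.password = env["DB_PASSWORD"]
        self._conn = None

    def connect(self):
        """Open (or reuse) a connection to the target database."""
        if self._conn is None:
            import psycopg2  # packaged in the Layer alongside this module
            self._conn = psycopg2.connect(
                host=self.host,
                dbname=self.dbname,
                user=self.user,
                password=self.password,
            )
        return self._conn
```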
In order to get the above DBConnection class to work as a Lambda Layer we need to place the db_connection.py file in a directory structure that matches the one supported by your runtime. For more information about dependency folder structures see the AWS documentation. As we’re using Python the file is placed under ‘python/lib/python3.7/site-packages’.
We define the layer in the SAM template like so:
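The layer and content names below are illustrative:

```yaml
  DbConnectionLayer:
    Type: AWS::Serverless::LayerVersion
    Properties:
      LayerName: db-connection-layer
      ContentUri: db_connection_layer/
      CompatibleRuntimes:
        - python3.7
      RetentionPolicy: Retain
```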
For the Lambda functions that want to make use of this layer we add the Layers property (see example below).
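For instance (function name illustrative):

```yaml
  RemoveKeyConstraintsFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: remove_key_constraints/
      Handler: app.lambda_handler
      Layers:
        - !Ref DbConnectionLayer
```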
Automating State Machine Execution
Once the State Machine has been deployed to AWS, it’s possible to start an execution via the AWS console, or using command line tools or scripts. But you’ll likely not want to trigger this manually for every run. If you want to trigger an execution in response to some kind of event, then the best way to go is to create a simple Lambda function that starts an execution (see code below).
The code above uses the Boto SDK and takes the State Machine ARN (Amazon Resource Name). You can optionally supply input to the execution, which might make sense if there are relevant parameters coming from the event that triggered the execution. In our case the State Machine doesn’t need any input.
Once you have this Lambda function you could invoke it from other AWS services or trigger it using an Event Source Mapping or schedule it to run on a timer. In our case we wanted to run this on a fixed interval of once every 8 hours. To do this we created a CloudWatch event, which runs on a fixed schedule. You can also use cron expressions if you want more control over when these run. See the documentation for more information on CloudWatch events scheduling.
You can define CloudWatch events that trigger Lambda functions using SAM. The following snippet shows the Lambda definition in our SAM template file.
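A trimmed version, with illustrative names, using SAM’s Schedule event source:

```yaml
  StartDbSyncExecutionFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: start_db_sync_execution/
      Handler: app.lambda_handler
      Events:
        ScheduledSync:
          Type: Schedule
          Properties:
            Schedule: rate(8 hours)
```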
Further advice for building Serverless Workflows
I haven’t really touched on the IAM roles I’ve used for the Lambda functions described here. If you’re following this post you’ll need to ensure you create an IAM role for Step Functions. You’ll also need to create appropriate roles based on the access requirements of your Lambda functions. As best practice you should follow the principle of least privilege, so functions only have permissions to access the services they require. You can define these roles in your SAM template as Cloud Formation code, as SAM doesn’t support creating IAM role resources at the time of writing.
I’ve talked about setting up AWS Toolkit so you can build and deploy code quickly to get started. But I don’t recommend this approach for anything beyond quick prototyping. Once you have a mature Serverless workflow you should really look to deploy via a proper Continuous Integration pipeline. Deploying code and architecture from an IDE is simultaneously cool and terrifying.
When I first started building Serverless workflows my approach was to create a Lambda function for every logical step in the pipeline. This was the case even with some very simple functions like getting or writing an item in DynamoDB. Fortunately Step Functions has some tight service integrations that mean you can define some actions in JSON directly, rather than having a Lambda function. It’s worth looking into the Step Functions Service Integrations documentation to see if you can save some code.