Unit tests and integration tests are vitally important, but sometimes even those aren’t sufficient to ensure that critical services in your application will function smoothly in production.
In those cases, adding a staging step to your CI/CD process lets you test a feature with real data in a less supervised environment. For example, here at Lumigo we decided to use it for our Node.js tracer.
In this article, we’ll share how we automate both the deployment to the staging environment and the release itself using serverless components.
Before we get started, you should already have the following in place:
- Final Release – a flow for releasing the final version to production
- Deploy Staging – a flow for deploying the staging environment
Our original CI/CD architecture
Before adding a staging environment, our CI/CD flow at Lumigo consisted of two parts:
- Before a feature could be merged to master, it had to pass unit tests and integration tests.
- When the code was merged to master, a CircleCI job was triggered and released the new version to npm (“final release”).
For more on this, read Development workflow for serverless applications.
Our CI/CD architecture with staging
And this is how the CI/CD process looks with the staging step added:
Release beta and trigger
This is the first step of the process and, in our case, a CircleCI job is triggered on merge to master. It runs a bash file consisting of three parts:
- Release the beta version.
- Deploy the staging environment with the new beta version. A notification is triggered if errors occur in this environment.
- Release the final version with a delay. The 2-hour delay gives the staging environment enough time to run with the new beta version.
Let’s go into those steps in more detail:
1. Releasing the beta version to npm
With npm we can add a “beta” tag to the release.
In our package.json, the version is usually the release version and not the beta version. We need to update this version to be the beta version before the release to npm:
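A minimal sketch of that version bump, assuming the beta version is formed by appending a `-beta.N` prerelease suffix to the release version. The helper names, the `my-package` name and the numbering scheme are illustrative assumptions. Publishing the result with `npm publish --tag beta` keeps the beta off the default `latest` tag.

```javascript
// Hypothetical helper: derive a beta version string from a plain release version.
function toBetaVersion(releaseVersion, buildNumber) {
  if (!/^\d+\.\d+\.\d+$/.test(releaseVersion)) {
    throw new Error(`Expected a plain x.y.z version, got "${releaseVersion}"`);
  }
  if (!Number.isInteger(buildNumber) || buildNumber < 0) {
    throw new Error(`Expected a non-negative integer build number, got "${buildNumber}"`);
  }
  return `${releaseVersion}-beta.${buildNumber}`;
}

// Return a copy of a parsed package.json with its version switched to the beta version.
// The input object is left untouched.
function withBetaVersion(pkg, buildNumber) {
  return { ...pkg, version: toBetaVersion(pkg.version, buildNumber) };
}

// Example with a placeholder package and build number.
const pkg = { name: "my-package", version: "1.0.5" };
console.log(withBetaVersion(pkg, 42).version); // 1.0.5-beta.42
```

In a CI job, the build number could come from whatever counter the CI system exposes, which keeps every beta version unique.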
Now, in npm, under “Versions”, you should see something like this:
If your project isn’t an npm package, you can release the beta version in a different way.
2. Trigger deploy-staging
The beta version is now in npm, so the staging environment should start using it.
We should trigger the CircleCI job that deploys the staging environment:
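One way to trigger that job is through CircleCI's API v1.1 job-trigger endpoint. The sketch below assumes that endpoint, and the organization, repository, branch and job names are placeholders rather than real project values. The request is built separately from the code that sends it, so it can be inspected without any network access.

```javascript
// Build the HTTP request for CircleCI's API v1.1 "trigger a job" call.
function buildTriggerRequest({ token, vcs, org, repo, branch, job }) {
  // Form-encoded equivalent of: -d build_parameters[CIRCLE_JOB]=<job>
  const body = new URLSearchParams({ "build_parameters[CIRCLE_JOB]": job }).toString();
  return {
    method: "POST",
    hostname: "circleci.com",
    path: `/api/v1.1/project/${vcs}/${org}/${repo}/tree/${encodeURIComponent(branch)}`,
    headers: {
      // The API token is sent as the basic-auth user name with an empty password.
      Authorization: `Basic ${Buffer.from(`${token}:`).toString("base64")}`,
      "Content-Type": "application/x-www-form-urlencoded",
      "Content-Length": Buffer.byteLength(body),
    },
    body,
  };
}

// Send the request and resolve with the HTTP status code.
function triggerJob(params) {
  const https = require("https"); // loaded only when a request is actually sent
  const { body, ...options } = buildTriggerRequest(params);
  return new Promise((resolve, reject) => {
    const req = https.request(options, (res) => {
      res.resume(); // discard the response body
      res.on("end", () => resolve(res.statusCode));
    });
    req.on("error", reject);
    req.end(body);
  });
}

// Usage (placeholder values):
// triggerJob({ token: process.env.CIRCLECI_TOKEN, vcs: "github", org: "my-org",
//              repo: "staging-env", branch: "master", job: "deploy-staging" });
```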
3. Trigger release-with-delay
Finally, we start the delayed release flow by triggering the step-function-invoker Lambda:
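As a rough sketch, the invocation could look like the following. It assumes the AWS SDK for JavaScript v3 and an asynchronous ("Event") invocation, so the CI job does not wait for the Lambda to finish. The client is passed in as an argument, and the function name is a placeholder for whatever name your deployment gives the Lambda.

```javascript
// Invoke a Lambda asynchronously so the caller does not wait for its result.
// `lambda` is expected to behave like the `Lambda` client from @aws-sdk/client-lambda.
async function invokeAsync(lambda, functionName, payload = {}) {
  const res = await lambda.invoke({
    FunctionName: functionName,
    InvocationType: "Event", // fire-and-forget invocation
    Payload: Buffer.from(JSON.stringify(payload)),
  });
  // An accepted asynchronous invocation is reported with status code 202.
  if (res.StatusCode !== 202) {
    throw new Error(`Unexpected status ${res.StatusCode} when invoking ${functionName}`);
  }
  return res.StatusCode;
}

// Usage (needs the AWS SDK and valid credentials):
// const { Lambda } = require("@aws-sdk/client-lambda");
// invokeAsync(new Lambda({}), "step-function-invoker");
```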
Install beta on staging and monitor it
Use beta version
In the package.json file, we need to change the dependency to use the beta version:
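A small sketch of that change, assuming the staging project pins the dependency to npm's `beta` dist-tag instead of a fixed version. The package name `my-tracer` is a placeholder and the helper name is hypothetical.

```javascript
// Return a copy of a parsed package.json whose dependency follows the "beta" dist-tag.
function useBetaDependency(pkg, dependencyName) {
  if (!pkg.dependencies || !(dependencyName in pkg.dependencies)) {
    throw new Error(`"${dependencyName}" is not listed in dependencies`);
  }
  return {
    ...pkg,
    dependencies: { ...pkg.dependencies, [dependencyName]: "beta" },
  };
}

// Example with a placeholder staging project.
const stagingPkg = {
  name: "staging-env",
  dependencies: { "my-tracer": "^1.0.4", "other-lib": "2.0.0" },
};
console.log(useBetaDependency(stagingPkg, "my-tracer").dependencies);
```

Because the dependency points at a tag rather than a version, each deployment of the staging environment installs whichever version currently carries the `beta` tag.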
If you aren’t using Node.js, edit the equivalent requirements file for your platform instead.
Searching for errors in the staging environment
The next step is verifying that the staging environment works as expected and that there aren’t any errors. To automate this process we, of course, use Lumigo! It monitors your serverless application and sends you a notification when an error occurs. So, if there aren’t any problems, no manual work is needed. By default, an error means an exception in a Lambda, but you can configure other types of errors as well.
If you want, you can also manually check the status of specific Lambdas in the staging environment by using CloudWatch.
What can we do if there are errors in the staging environment?
A running Step Functions execution can be stopped at any point. So if we see errors in staging and we don’t want to release our version, we can just halt the execution of the release-with-delay Step Function. We can stop the execution from the AWS console by selecting the running execution and clicking “Stop execution”:
What if we have to release now and can’t wait for the delay to finish?
Sometimes we need to release as soon as possible, for example to ship a bugfix. In those cases, we can simply stop the execution of the release-with-delay Step Function, then manually trigger the final-release Lambda.
Building the final release flow
As we’ve already discussed, there are several things we want to achieve with our final release flow:
- A delay of 2 hours.
- It should be stoppable after it’s started.
- It should release automatically if not stopped manually.
- There should be an option to disable the delay if necessary.
First, we need to define the step-function-invoker Lambda. It makes sure that only one execution of this Step Function is running at any given time, so that releases don’t collide.
If no issues occur in staging that prompt us to stop the execution of the step function, the final-release Lambda will release our version automatically after the set delay.
Let’s define the final-release Lambda:
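A hedged sketch of such a handler, assuming the final release flow is a CircleCI job started through the API v1.1 job-trigger endpoint. Apart from `CIRCLECI_TOKEN`, the environment variable names (`PROJECT_SLUG`, `BRANCH`, `RELEASE_JOB`) are hypothetical, and the global `fetch` assumes a Node.js 18 or later runtime.

```javascript
// Factory so the HTTP client and the environment can be swapped out in tests.
function makeFinalReleaseHandler(fetchImpl, env) {
  return async () => {
    // PROJECT_SLUG is assumed to look like "github/my-org/my-repo".
    const url = `https://circleci.com/api/v1.1/project/${env.PROJECT_SLUG}/tree/${env.BRANCH}`;
    const res = await fetchImpl(url, {
      method: "POST",
      headers: {
        // The API token is sent as the basic-auth user name with an empty password.
        Authorization: `Basic ${Buffer.from(`${env.CIRCLECI_TOKEN}:`).toString("base64")}`,
        "Content-Type": "application/x-www-form-urlencoded",
      },
      body: new URLSearchParams({ "build_parameters[CIRCLE_JOB]": env.RELEASE_JOB }).toString(),
    });
    // Throwing makes the Lambda invocation fail visibly instead of silently skipping the release.
    if (!res.ok) throw new Error(`CircleCI responded with HTTP ${res.status}`);
    return { triggered: env.RELEASE_JOB };
  };
}

// Lambda entry point, wired to the runtime's global fetch and environment.
exports.handler = (event) => makeFinalReleaseHandler(fetch, process.env)(event);
```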
We also need to configure a CIRCLECI_TOKEN as an environment variable in CircleCI.
Make sure the version of your CircleCI config file is supported by the job-triggering API: https://circleci.com/docs/2.0/api-job-trigger/. At the time of writing, version 2.1 isn’t supported.
If you aren’t using CircleCI, replace the code in the handler so that it calls your final release flow.
Release-with-delay Step Function
We are using a Step Function here because it allows us to create a delay while giving us the option to cancel the process after it has started. You can read more about Step Functions here.
Let’s define the Step Function in our serverless.yml file:
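To keep the examples in one language, the sketch below writes the state machine as a JavaScript object in Amazon States Language form: a Wait state followed by a Task state that runs the final-release Lambda. The state names and the ARN are placeholders. In serverless.yml the same structure would be written as YAML, for example under a plugin such as serverless-step-functions, which is an assumption about the setup.

```javascript
// Build an Amazon States Language definition: wait, then run the final-release Lambda.
function buildReleaseWithDelayDefinition(finalReleaseArn, delaySeconds = 2 * 60 * 60) {
  return {
    Comment: "Wait, then release the final version",
    StartAt: "WaitBeforeRelease",
    States: {
      // Stopping the execution while it sits in this state cancels the release.
      WaitBeforeRelease: { Type: "Wait", Seconds: delaySeconds, Next: "FinalRelease" },
      FinalRelease: { Type: "Task", Resource: finalReleaseArn, End: true },
    },
  };
}

// Placeholder ARN; the real value depends on your account, region and function name.
const definition = buildReleaseWithDelayDefinition(
  "arn:aws:lambda:us-east-1:123456789012:function:final-release"
);
console.log(JSON.stringify(definition, null, 2));
```

Keeping the delay as a parameter makes it easy to shorten it in a test deployment, while the 2-hour default matches the flow described above.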
There you have it: a serverless-focused CI/CD flow that includes a staging environment, is composed entirely of serverless components, and can easily be expanded to include more services.
A staging environment in the CI/CD flow often mimics production behavior better than tests can, so we see it as a vitally important step when it comes to critical services.
Over the past two years the R&D team here at Lumigo has gained a wealth of hard-earned experience in the particular requirements of CI/CD as it pertains to serverless development, and we’ll continue to share what we learn in the serverless trenches as we hone our approach.