Opinionated secrets management that helps us sleep at night.
What is secrets management?
Secrets management is one of those things that is talked about quite frequently, but there seems to be little consensus on how to actually go about it. In order to understand our journey, we first have to establish what secrets management means (and doesn’t mean) to us. Secrets management is the process of ensuring passwords, API keys, certificates, etc. are kept secure at every stage of the software development lifecycle. Secrets management does NOT mean attempting to write our own crypto libraries or cipher algorithms. Rolling your own crypto isn’t a great idea. Suffice it to say, crypto will not be the focus of this post.
There’s such a wide spectrum of secrets management implementations out there ranging from powerful solutions that require a significant amount of operational overhead, like Hashicorp Vault, to solutions that require little to no operational overhead, like a .env file. No matter where they fall on that spectrum, each of these solutions has tradeoffs in its approach. Understanding these tradeoffs is what helped our Engineering team at Betterment decide on a solution that made the most sense for our applications. In this post, we’ll be sharing that journey.
How it used to work
We started out using Ansible Vault. One thing we liked about Ansible Vault is that it allows you to encrypt a whole file or just a string. We valued the ability to encrypt just the secret values themselves and leave the variable name in plain-text. We believe this is important so that we can quickly tell which secrets an app is dependent on just by opening the file. So the string option was appealing to us, but that workflow didn’t have the best editing experience as it required multiple steps in order to encrypt a value, insert it into the correct file, and then export it into the environment like the 12-factor app methodology tells us we should.
At the time, we also couldn’t find a way to federate permissions with Ansible Vault in a way that didn’t hinder our workflow by causing a bottleneck for developers. To assist us in expediting this workflow, we had an alias in our bash_profiles that allowed us to run a shortcut at the command line to encrypt the secret value from our clipboard and then insert that secret value in the appropriate Ansible variables file for the appropriate environment.
alias prod-encrypt="pbpaste | ansible-vault encrypt_string --vault-password-file=~/ansible-vault/production.key"
This wasn’t the worst setup, but didn’t scale well as we grew. As we created more applications and hired more engineers, this workflow became a bit much for our small SRE team to manage and introduced some key-person risk, also known as the Bus Factor. We needed a workflow with less of a bottleneck, but allowing every developer access to all the secrets across the organization was not an acceptable answer. We needed a solution that not only maintained our security posture throughout the software development lifecycle, but also enforced our opinions about how secrets should be managed across environments.