Amazon DynamoDB is built to deliver single-digit millisecond performance at any scale. It is built to store Terabytes of data. It is built to support Amazon’s Cyber Monday traffic.
However, this scalability comes with an overhead. No matter what resource you look up, you will be constantly reminded that “You Must Know Your Access Patterns In Advance”. Amazon Marketplace had been running for years before they migrated to DynamoDB so they were well equipped to know how they needed to access their data and fully reap the rewards of a NoSQL setup.
In contrast, a new startup with an exciting new product wants to build an MVP quickly. They don’t know how their app will grow over time. They want to be agile and react to their users to develop their product accordingly. They can’t know all their access patterns in advance.
This no doubt poses an issue, but not an insurmountable one.
Being able to map your access patterns in as much detail as possible will translate to a smooth DynamoDB adventure. However, building a startup is about disrupting the market, so you can’t be afraid of a little turbulence along the way to reach your goal.
If you use DynamoDB from the start, you are far less likely to have to worry about the scalability of your product. If your app gets a massive social media spike, you won’t have to scramble to prevent your site going down. Reaching that scale is what all startups aspire to, so how can we build towards this while limiting the turbulence?
When talking to people interested in using DynamoDB but who want to remain Agile, there are three common concerns they have. These are all valid and should be carefully considered, but they can be conquered with the right approach.
I need to know my access patterns in advance
In Alex DeBrie’s “The DynamoDB Book” (check it out if you haven’t already!), he addresses this misconception early on. The premise is that, yes, you do need to know your access patterns and it helps a lot, but you can add to them over time. When you need to add to your model, you should still plan thoroughly to maximise your index efficiency, but in most cases you are making additive changes that are formulaic to implement.
An RDBMS (Relational Database Management System) allows you to implement a tried and tested pattern to set up your database and worry about how to query it later. DynamoDB presents you with a riddle before you can utilise it. For the problem-solving lovers amongst us, though, this is exciting.
A Global Secondary Index (GSI) provides you with a new way to efficiently query your data. By default you are limited to 20 GSIs per table. For each new access pattern you could theoretically create a perfect GSI to suit that need. This would be very inefficient, as each GSI adds to your storage and write costs and the 20 GSI limit will catch up with you. You shouldn’t avoid adding a GSI, and the number you need will vary with your requirements, but if you are approaching double figures you likely need to reassess.
When starting out you can employ the Overload technique on your GSIs (and primary key) to really optimise the number of access patterns you can squeeze out of each one. I was recently acting as a rubber duck for a colleague trying to add a new access pattern. He was wondering how he could shape a new GSI to fit this. He went through a few approaches, when suddenly he got very excited and realised that his existing overloaded GSI could already satisfy his query.
DynamoDB is not designed to be perfectly flexible but by overloading your indexes you open yourself up to much more potential flexibility in the future. After adding a few overloaded GSIs you increase your chances of satisfying a new access pattern without any change.
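To make the overloading idea concrete, here is a minimal in-memory sketch in Python. It does not call DynamoDB; it only imitates how a single index with generic key attributes can serve more than one access pattern. The attribute names `GSI1PK` and `GSI1SK`, the key formats and the helper functions are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical items in a single table. GSI1PK / GSI1SK are generic,
# "overloaded" attributes: each entity type stores a different kind of value.
ITEMS = [
    {"PK": "USER#alice", "SK": "PROFILE", "GSI1PK": "ORG#acme", "GSI1SK": "USER#alice"},
    {"PK": "USER#bob", "SK": "PROFILE", "GSI1PK": "ORG#acme", "GSI1SK": "USER#bob"},
    {"PK": "USER#alice", "SK": "ORDER#1001", "GSI1PK": "ORG#acme", "GSI1SK": "ORDER#2024-01-15#1001"},
    {"PK": "USER#bob", "SK": "ORDER#1002", "GSI1PK": "ORG#acme", "GSI1SK": "ORDER#2024-02-03#1002"},
    {"PK": "USER#carol", "SK": "PROFILE", "GSI1PK": "ORG#globex", "GSI1SK": "USER#carol"},
]


def build_gsi1(items):
    """Group items by GSI1PK and sort each group by GSI1SK, as an index would."""
    index = defaultdict(list)
    for item in items:
        # Items without the index keys are simply left out of the index.
        if "GSI1PK" in item and "GSI1SK" in item:
            index[item["GSI1PK"]].append(item)
    for group in index.values():
        group.sort(key=lambda i: i["GSI1SK"])
    return index


def query_gsi1(index, pk, sk_prefix=""):
    """Imitate a Query with a begins_with condition on the sort key."""
    return [i for i in index.get(pk, []) if i["GSI1SK"].startswith(sk_prefix)]


gsi1 = build_gsi1(ITEMS)

# Access pattern 1: all users in an organisation.
users_in_acme = query_gsi1(gsi1, "ORG#acme", "USER#")

# Access pattern 2: an organisation's orders for January 2024, same index.
orders_in_acme_jan = query_gsi1(gsi1, "ORG#acme", "ORDER#2024-01")
```

Because the index keys carry no entity-specific meaning, a later entity type can often reuse them with a new prefix rather than needing a new GSI.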
I can’t easily migrate my data
This is a common phrase I hear when speaking to developers coming from an RDBMS approach. Migrations in DynamoDB are more involved compared to an RDBMS with an ORM (Object Relational Mapping), where they are largely automated for you.
The most involved DynamoDB migrations require you to write a script which scans your table and updates some or all of your items. This adds overhead to your migrations: creating the script, running it on your environments, handling any errors, dealing with sync issues etc.
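A Scan and Update script of this kind might look roughly like the sketch below. The `migrate_table` function assumes a table object with boto3-Table-style `scan` and `put_item` methods. `FakeTable` is a hypothetical in-memory stand-in so the example runs without AWS, and the `status` attribute and key formats are invented for illustration.

```python
def migrate_table(table, transform):
    """Scan the table page by page and write back the items that need changing.

    `transform` returns an updated copy of an item, or None to leave it alone.
    """
    updated = 0
    scan_kwargs = {}
    while True:
        page = table.scan(**scan_kwargs)
        for item in page["Items"]:
            new_item = transform(item)
            if new_item is not None:
                table.put_item(Item=new_item)
                updated += 1
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:  # no more pages
            return updated
        scan_kwargs["ExclusiveStartKey"] = last_key


class FakeTable:
    """Hypothetical in-memory table that paginates scans like DynamoDB does."""

    def __init__(self, items, page_size=2):
        self.items = {(i["PK"], i["SK"]): dict(i) for i in items}
        self.page_size = page_size

    def scan(self, ExclusiveStartKey=None):
        keys = sorted(self.items)
        start = 0
        if ExclusiveStartKey is not None:
            start = keys.index((ExclusiveStartKey["PK"], ExclusiveStartKey["SK"])) + 1
        page_keys = keys[start:start + self.page_size]
        page = {"Items": [dict(self.items[k]) for k in page_keys]}
        if start + self.page_size < len(keys):
            last = page_keys[-1]
            page["LastEvaluatedKey"] = {"PK": last[0], "SK": last[1]}
        return page

    def put_item(self, Item):
        self.items[(Item["PK"], Item["SK"])] = dict(Item)


def add_status(item):
    """Example transform: backfill a `status` attribute on orders that lack one."""
    if item["SK"].startswith("ORDER#") and "status" not in item:
        return {**item, "status": "COMPLETE"}
    return None


table = FakeTable([
    {"PK": "USER#alice", "SK": "ORDER#1001"},
    {"PK": "USER#alice", "SK": "ORDER#1002", "status": "PENDING"},
    {"PK": "USER#alice", "SK": "PROFILE"},
    {"PK": "USER#bob", "SK": "ORDER#1003"},
    {"PK": "USER#bob", "SK": "PROFILE"},
])
updated_count = migrate_table(table, add_status)
```

The transform skips items that already have the attribute, so the script can be re-run safely after a partial failure. Against a live table you would likely prefer a conditional update over a blind put, since a put overwrites any write that lands between the scan and the write-back.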
The incorrect assumption made is that they happen at the same frequency in both approaches. I have worked on a project with DynamoDB for the last 14 months. In that time we ran 3 such migrations. Compare that to a previous project using Django/PostgreSQL where we likely ran 100+ migrations in the same timeline. We may have been lucky with only needing 3 but it is still a different magnitude to a typical RDBMS project.
A migration which would add a new column in an RDBMS doesn’t require this Scan and Update style migration in DynamoDB. Because DynamoDB is schema-less, you can simply update your application code to start adding this new attribute to new items. To handle the case where the attribute doesn’t exist (i.e. on old items), you can define a default value in your data layer, which should abstract it away from the rest of your application code. You can of course run a Scan and Update to add a default value, but you don’t need to.
Understanding the Overloaded technique and how Item Collections work will often allow you to avoid needing a new migration script to implement a new access pattern. However, you will need to write one eventually. Migration scripts may seem scary, but once you have done a few you will realise how straightforward and formulaic they can be. Again, Alex DeBrie’s “The DynamoDB Book” runs through numerous examples of how to handle migration strategies that may arise.
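As a rough illustration of the data-layer default, the Python sketch below reads raw items into a `User` object. The `plan` attribute, its `"free"` default and the key formats are hypothetical, chosen only to show the shape of the approach.

```python
from dataclasses import dataclass

DEFAULT_PLAN = "free"  # hypothetical default for items written before `plan` existed


@dataclass
class User:
    user_id: str
    email: str
    plan: str = DEFAULT_PLAN

    @classmethod
    def from_item(cls, item):
        """Build a User from a raw table item, tolerating old items without `plan`."""
        return cls(
            user_id=item["PK"].split("#", 1)[1],
            email=item["email"],
            plan=item.get("plan", DEFAULT_PLAN),
        )

    def to_item(self):
        """Items written from now on always carry the new attribute."""
        return {
            "PK": f"USER#{self.user_id}",
            "SK": "PROFILE",
            "email": self.email,
            "plan": self.plan,
        }


# An old item (no `plan`) and a new item are read through the same code path.
old_user = User.from_item({"PK": "USER#alice", "SK": "PROFILE", "email": "alice@example.com"})
new_user = User.from_item({"PK": "USER#bob", "SK": "PROFILE", "email": "bob@example.com", "plan": "pro"})
```

The rest of the application only ever sees `User.plan`, so it never needs to know that some stored items predate the attribute. Old items also pick up the attribute gradually, whenever they are next saved through `to_item`.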
I can’t query the data I already have
As mentioned, an RDBMS setup effectively keeps all doors open when it comes to queries. You will rarely find yourself unable to query data you have been collecting for the last 6 months. One of the costs of DynamoDB is that you may code yourself into a hole in this respect.
DynamoDB allows you to create new indexes to query your data, but index keys must be top-level attributes of a scalar type: string, number or binary. If any of the data you wish to query is stored in a complex data type (map, list, set) then you hit a problem. You are not completely stuck, as you can potentially run a migration to extract this data into a new data model, but it will often lead to a large amount of code change.
This is a big downside but it can be avoided in most cases. Using complex data types is a good strategy in DynamoDB (whereas it violates the normalisation practices in an RDBMS). It allows you to “pre-join” the data your users are going to be requesting in an efficient way. However, it mustn’t be used lightly. By putting any data into a complex data type you are effectively saying you will never want to query based on this data (for user access patterns at least).
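The trade-off can be seen with a small Python check of which attributes on an item could ever act as an index key. This is only a sketch: the `order` item and the `shipping_city` attribute are invented, and the mapping from Python types to DynamoDB's string, number and binary types is a simplification.

```python
from decimal import Decimal

# Python types standing in for DynamoDB's String, Number and Binary types.
KEY_TYPES = (str, int, Decimal, bytes)


def indexable_attributes(item):
    """Return the top-level attribute names that could serve as an index key."""
    return sorted(
        name
        for name, value in item.items()
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, KEY_TYPES) and not isinstance(value, bool)
    )


order = {
    "PK": "ORDER#1001",
    "SK": "ORDER#1001",
    "total": Decimal("42.50"),
    "gift": True,
    "shipping": {"city": "Dublin", "country": "IE"},  # map: city is hidden from indexes
    "tags": ["gift", "express"],  # list: not usable as a key either
}

# Copying a nested value to a top-level attribute keeps the door open for a future GSI.
order_promoted = {**order, "shipping_city": order["shipping"]["city"]}

before = indexable_attributes(order)
after = indexable_attributes(order_promoted)
```

Only the promoted copy exposes the city as a candidate key. Duplicating a value like this at write time is far cheaper than extracting it from a map across every existing item later.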
It can be argued that you won’t know which data you will want to query on in the future but in most cases you can make a reasonable guess. In my experience, these issues tend to appear due to a misunderstanding of DynamoDB rather than an unexpected access pattern appearing. Many people fall into the trap of trying to embrace DynamoDB concepts of denormalisation and duplication without considering their impacts. By understanding the concepts of DynamoDB you can avoid many of these pitfalls.
When weighing up the cost of overcoming DynamoDB’s Agile hurdles, it is worth considering that DynamoDB has benefits beyond scalability that suit an Agile MVP project approach:
- Pay as you Use: While your app is getting off the ground and building traction you only pay for what you use. If you have no users for two days you will likely pay nothing (with up to 25GB of storage). If it then spikes to 1000s of users the next day, you pay for their usage as normal.
- Cheap Environments: You can create new environments cheaply and quickly with an Infrastructure as Code approach. This means you can test new features on new environments, you can set up an environment just for user testing, you can run experiments etc. Perhaps one of the biggest advantages is you can create new environments on the fly in your CI pipeline in order to run integration tests against (without worrying about cost). Check out my previous article which discusses how to take advantage of this while reducing CI time.
- Forced Efficiency: DynamoDB is architected to steer you away from writing an inefficient query. If your data is modelled in a way that you can query what you need, you don’t need to worry about that query becoming inefficient over time. RDBMS queries, on the other hand, will often work at time of development and then bite you as you scale.
DynamoDB Learning Curve
DynamoDB has a steep learning curve, no doubt. Many people try DynamoDB on their new projects and get discouraged as they can’t fetch the data they need efficiently. They will often lay the blame on DynamoDB at this point, but the root cause is generally an inefficient use of a powerful tool.
It is easy to fall into the trap of trying to reimplement RDBMS data patterns in DynamoDB, which appears to work until it doesn’t. For example, DynamoDB would allow you to create a table for Blog Posts and a table for Comments, create a relationship between them, scan each table and piece together the data your app needs. A new DynamoDB dev sees a cool way to use a scalable tech in the way they’re used to working, while anyone with DynamoDB experience sees 2 immediate red flags.
Without diving into the details too much (the best practice is to use a single table and avoid using Scans in nearly all situations), the point is that most bad experiences with DynamoDB are avoidable, but they do require some research to overcome. It is a different approach to an RDBMS, and while some relational techniques like creating an ERD (Entity Relationship Diagram) are still useful and beneficial, most relational methods should not be shoehorned in.
All of the hurdles mentioned require an understanding of DynamoDB concepts before they can be overcome. Every new DynamoDB project needs planning before implementation, and every new DynamoDB dev needs to research before they can plan.
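For the Blog Posts and Comments example, a single-table layout might look like the in-memory Python sketch below. A post and its comments share a partition key, so they form one item collection that a single Query can fetch. The key formats and the `query_collection` helper are assumptions for illustration, and the list comprehension merely imitates what DynamoDB does with a key lookup.

```python
# Hypothetical single-table items: a post and its comments share the same PK.
BLOG_ITEMS = [
    {"PK": "POST#1", "SK": "META", "title": "Agile DynamoDB"},
    {"PK": "POST#1", "SK": "COMMENT#2024-03-02T09:30#c2", "text": "Very helpful"},
    {"PK": "POST#1", "SK": "COMMENT#2024-03-01T10:00#c1", "text": "Great post"},
    {"PK": "POST#2", "SK": "META", "title": "Overloading indexes"},
    {"PK": "POST#2", "SK": "COMMENT#2024-03-05T14:00#c3", "text": "Nice follow-up"},
]


def query_collection(items, pk, sk_prefix=""):
    """Imitate a Query on the base table: one partition, sorted by sort key."""
    matches = [i for i in items if i["PK"] == pk and i["SK"].startswith(sk_prefix)]
    return sorted(matches, key=lambda i: i["SK"])


# One request returns the post and all of its comments, with no join and no Scan.
post_page = query_collection(BLOG_ITEMS, "POST#1")

# Narrowing by sort key prefix returns only the comments, oldest first.
post_comments = query_collection(BLOG_ITEMS, "POST#1", "COMMENT#")
```

Both red flags from the two-table version disappear: there is one table, and each read touches only the partition it needs.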
DynamoDB is an extremely powerful tool. If you implement it efficiently in your project from the start, you will avoid many future headaches. It poses some issues for an Agile development style, but by keeping the following 3 points in mind you can minimise the pain these cause while being assured you are building a future-proof solution.
- Implement Overloaded Indexes to remain as flexible as possible with your access patterns
- Learn the less time-consuming migration techniques but don’t be afraid to occasionally run a full table update
- Be very wary of the query impacts before deciding to put some data into a handy complex data type
If you think Serverless and DynamoDB could help get your exciting product off the ground, while building for the future and minimising cost, don’t hesitate to get in contact.