After making the initial design we needed to validate if the newly designed platform could handle the load that was estimated, up to and beyond 50k requests / minute.
When the base api functionality was developed we were ready to put things to the test. It became clear that Lambda was not going to be a bottleneck. However the load test showed that when the concurrency of the load test increased the time it took for the records to go from Kinesis to Dynamodb increased significantly.
This needed to be fixed before we continued...
Kinesis had no trouble capturing all the data. However the bottleneck was the Lambda function, that was triggered by Kinesis, responsible for pushing the data to DynamoDB. After tweaking the batch size of the chunks that were processed per Lambda function we didn't notice any significant improvement.
After digging around in the documentation of Kinesis we found out we were hitting the read limit of Kinesis (5 per second per shard). There were 2 options: we could increase the shards, which would increase the events to Lambda and thus increase the amount of Lambda functions that ran in parallel. Option 2 was to enable the Enhanced Fan-Out feature of Kinesis; this would improve the read parallelism per shard by providing each consumer with its own throughput limit.
We opted to increase the number of shards and confirmed with load testing that with 6 shards and batch sizes of 300 from Kinesis to the Lambda function we could meet the required capacity for the projects. We also confirmed that the platform could handle a LOT more by increasing the shards in Kinesis.