For many workloads, the pay-per-invocation pricing model makes AWS Lambda a far cheaper alternative to containers or VMs. Many companies have reported saving up to 90% by moving their existing workloads to AWS Lambda.
While the event-driven programming model is very powerful, these architectures also have some inherent complexities. Simple mistakes can lead to a rapid escalation in costs, which might come as a surprise at the end of the month. Other costs can creep up on you when you focus exclusively on the cost of AWS Lambda itself. For example, we often overlook the cost of peripheral services such as CloudWatch Logs or event sources such as API Gateway and Step Functions.
In this post, we’ll look at some common surprises in your serverless bill and how you can mitigate them. We’ll also explore how Epsagon can help you keep an eye on your serverless bill and quickly identify functions to optimize.
Common Surprises in Serverless Bills
Too Much Parallelism During Fan-Out
AWS Lambda is often used with SNS topics to implement the fan-out pattern. Every message published to SNS will trigger a Lambda invocation. This 1:1 ratio means the cost of Lambda invocations will grow linearly with the throughput of messages.
A linear growth of cost is not a bad thing on its own. But at scale, it can still amount to significant costs, especially if the function cannot finish quickly. There are several sources of waste to consider here:
- Lambda invocations are billed in 100ms blocks, so every invocation will incur some wastage (e.g., a 50ms invocation will incur 50ms of billed but unused invocation time).
- If the function needs to talk to external services, it will also spend some of the invocation time waiting for the IO operations to complete. That’s idle waiting time that you will still pay for.
It’s not worth optimizing functions with low throughput, as the gains are marginal. However, at scale, you can save considerable costs by processing messages in batches instead. You can do this by moving to event sources that support batch processing. Both SQS and Amazon Kinesis allow Lambda functions to process messages in batches.
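As a sketch of what batch processing looks like, here is a hypothetical handler for an SQS event source. The event shape follows the standard SQS-to-Lambda integration, while `process_message` stands in for your own business logic:

```python
import json


def process_message(body):
    # Placeholder for your own business logic; here we just
    # extract a hypothetical "order_id" field.
    return body["order_id"]


def handler(event, context):
    # With an SQS event source, a single invocation receives a
    # batch of messages (up to the configured batch size) under
    # event["Records"], instead of one invocation per message.
    results = []
    for record in event["Records"]:
        body = json.loads(record["body"])
        results.append(process_message(body))
    return results
```

Because one invocation now amortizes its 100ms billing blocks (and any idle IO time) across the whole batch, the per-message cost drops as the batch size grows.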
Accidental Infinite Recursion
You can trigger a Lambda invocation when an object is uploaded to Amazon S3 or when a row has been written to a DynamoDB table. The Lambda function often needs to perform post-processing on the data. However, if you modify the data in place, the modification will trigger the same function again. This is the most common scenario in which people create infinite recursions by mistake, and are then surprised by the resulting cost.
My rule of thumb is to always use a different S3 bucket or DynamoDB table for the processed data. If that is not an option for you, it’s also possible to leave a marker on the data to label it as “processed,” so that when the function is invoked again it knows to ignore it. For example, you can add a Metadata attribute to the processed S3 object in order to distinguish it from the raw input. However, this approach adds complexity and is easy to get wrong or forget, which is why I don’t generally recommend it.
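A minimal sketch of the marker approach might look like this. The S3 client is injected as a parameter purely so the logic can be exercised without AWS credentials (a real handler takes only `event` and `context` and would construct `boto3.client("s3")` at module load); the `"processed"` metadata key is an assumed convention, not an S3 feature:

```python
def is_processed(metadata):
    """Return True if the object carries our 'processed' marker.

    `metadata` is the user-defined metadata dict that
    s3.head_object(...) returns under its "Metadata" key.
    """
    return metadata.get("processed") == "true"


def handler(event, context, s3_client):
    # s3_client would normally be boto3.client("s3"); it is
    # injected here so the logic is testable offline.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        head = s3_client.head_object(Bucket=bucket, Key=key)
        if is_processed(head.get("Metadata", {})):
            continue  # our own output; skip to break the recursion
        # ... transform the object, then write it back WITH the
        # marker, e.g. via copy_object(..., Metadata={"processed":
        # "true"}, MetadataDirective="REPLACE"), so the re-triggered
        # invocation sees the marker and stops.
```

The crucial detail is that the marker must be written in the same operation that writes the output; otherwise, a second invocation can slip in between the write and the marker.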
Not Tuning Memory Allocations
A Lambda invocation is charged in GB-seconds. The cost per 100ms of invocation is directly proportional to the amount of memory allocated to the function. Furthermore, CPU time and network bandwidth are also allocated proportionally to the function’s memory allocation.
Functions with more memory will therefore also run faster. However, since invocations are charged in 100ms blocks, there is a lot of room for micro-optimizations.
If a 128MB function averages 102ms, then on average you will be charged for 200ms, or $0.000000416. If upping the memory allocation to 192MB can bring the average invocation time below 100ms, then you will lower the function’s average cost to $0.000000313, saving 25% per invocation.
Equally, suppose a 1024MB function averages 20ms, which is far below the thresholds defined in your business requirements. In this case, you can save money and still deliver a satisfactory level of performance by giving the function less memory.
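The arithmetic above can be checked with a few lines of code. The price per GB-second is taken from the Lambda pricing page at the time of writing; the rounding to 100ms blocks is what makes the memory bump pay off:

```python
import math

PRICE_PER_GB_SECOND = 0.0000166667  # Lambda pricing at the time of writing


def invocation_cost(memory_mb, duration_ms):
    # Invocations are billed in 100ms blocks, rounded up.
    billed_ms = math.ceil(duration_ms / 100) * 100
    gb_seconds = (memory_mb / 1024) * (billed_ms / 1000)
    return gb_seconds * PRICE_PER_GB_SECOND


# A 128MB function averaging 102ms is billed for a full 200ms:
print(invocation_cost(128, 102))  # roughly $0.00000042
# At 192MB, the same work finishing in under 100ms costs less:
print(invocation_cost(192, 99))   # roughly $0.00000031
```

Note that a 102ms invocation and a 200ms invocation cost exactly the same at a given memory size, which is why shaving a few milliseconds off a function that straddles a block boundary has an outsized effect.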
Finding the sweet spot between cost and performance can be tricky. Fortunately, Alex Casalboni, Senior Tech Evangelist at AWS, has an elegant solution for automating the fine-tuning process using Step Functions. You can read his excellent post on the subject here.
Not Considering the Cost of Event Sources
AWS Lambda is almost always used with some event source, such as API Gateway, Amazon Kinesis Data Streams, or SNS topics. It’s important to take into account the cost of these event sources, as they can be much more expensive than AWS Lambda itself.
API Gateway, for example, is charged at $3.50 per million API calls received, plus data transfer charges. In practice, API Gateway often costs more than AWS Lambda, sometimes several times more. In fact, at scale, API Gateway can be so expensive that you might wish to rewrite your API to run on containers or VMs.
Similarly, Step Functions is charged at $25 per million state transitions, which makes it one of the most expensive AWS services. Have a look at my previous post to see when you should consider using Step Functions.
Besides API Gateway, AWS Lambda is often used with SNS, SQS, or Amazon Kinesis to perform background processing. When projecting the cost of these event sources, you need to take scale into consideration.
A common argument used against Amazon Kinesis is that paying for shard hours makes it much more expensive than SNS/SQS. This is true when the throughput is low.
However, as the throughput increases, the cost of SNS/SQS grows much more rapidly. This is due to a much higher cost per million requests for these services compared to Amazon Kinesis. At even a moderate scale of 1,000 messages per second, the cost of SNS can be significantly higher than Amazon Kinesis. Add to that the fact that Amazon Kinesis supports batching, so a single Lambda invocation can process multiple messages, which also lowers your AWS Lambda costs.
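The crossover can be estimated with a back-of-the-envelope calculation. The prices below are illustrative figures from the AWS pricing pages at the time of writing, and the model assumes messages of 1KB or less (one PUT payload unit each) and the standard limit of 1,000 records per second per shard:

```python
import math

# Illustrative prices at the time of writing:
SNS_PER_MILLION_REQUESTS = 0.50   # $ per million publish requests
KINESIS_SHARD_HOUR = 0.015        # $ per shard-hour
KINESIS_PER_MILLION_PUTS = 0.014  # $ per million PUT payload units

HOURS_PER_MONTH = 720


def sns_monthly_cost(msgs_per_second):
    msgs = msgs_per_second * 3600 * HOURS_PER_MONTH
    return msgs / 1e6 * SNS_PER_MILLION_REQUESTS


def kinesis_monthly_cost(msgs_per_second):
    # One shard ingests up to 1,000 records/s; assuming <=1KB
    # messages, each message is a single PUT payload unit.
    shards = max(1, math.ceil(msgs_per_second / 1000))
    msgs = msgs_per_second * 3600 * HOURS_PER_MONTH
    return (shards * KINESIS_SHARD_HOUR * HOURS_PER_MONTH
            + msgs / 1e6 * KINESIS_PER_MILLION_PUTS)


print(sns_monthly_cost(1), kinesis_monthly_cost(1))        # Kinesis costs more
print(sns_monthly_cost(1000), kinesis_monthly_cost(1000))  # SNS costs more
```

At 1 message per second, the fixed shard-hour cost dominates and SNS wins; at 1,000 messages per second, the per-request cost dominates and Kinesis is an order of magnitude cheaper.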
Not Considering the Cost of Peripheral Services
Whenever a Lambda function writes to stdout, the content is captured and shipped to CloudWatch Logs asynchronously. This is great, as you get log delivery out of the box. But CloudWatch Logs is not free. In fact, at $0.50 per GB ingested, many people are finding that they spend more on CloudWatch Logs than the Lambda invocations that generated the logs.
To ensure your CloudWatch Logs cost stays low, while still keeping some of your debug logs in production to help diagnose issues, consider sampling debug logs in production.
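One way to sketch this sampling is to decide once per invocation whether to enable debug logging, so that a sampled invocation keeps all of its debug logs and reads as a coherent trace. The sample rate here is an assumed knob (in practice you would likely read it from an environment variable):

```python
import logging
import random

logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Fraction of invocations that keep their debug logs; an assumed
# value, typically configured via an environment variable.
DEBUG_SAMPLE_RATE = 0.01


def should_sample(rate=DEBUG_SAMPLE_RATE):
    # One decision per invocation, not per log line, so sampled
    # invocations retain their full debug output.
    return random.random() < rate


def handler(event, context):
    # Enable debug logging for a small sample of invocations.
    logger.setLevel(logging.DEBUG if should_sample() else logging.INFO)
    logger.debug("full event: %s", event)  # only shipped when sampled
    logger.info("processing request")
```

With a 1% sample rate, you keep a steady trickle of debug detail from production while paying CloudWatch Logs ingestion for only a fraction of the volume.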
Emerging Tools for Managing Serverless Cost
While it is still an emerging space, quite a number of vendors are already working on tools to help you better manage the cost of your serverless architecture.
Epsagon is an observability tool for microservices and serverless. Where it stands out from the rest is its ability to trace transactions as data flows through many components, like Lambda functions and event sources, as well as other non-AWS services that your functions depend on.
It also provides a high-level overview of the cost of your Lambda functions in the dashboard, which includes a list of the most invoked functions, along with their monthly cost.
In summary, here are the common mistakes that can lead to a surprise in your serverless bill, and tips on how you can mitigate them:
| Mistake | Mitigation |
| --- | --- |
| Too much parallelism during fan-out | Use event sources that support batching. |
| Accidental infinite recursions | Use a separate Amazon S3 bucket/DynamoDB table for processed data. |
| Not tuning memory allocations | Be specific about your latency requirements, and choose the lowest memory allocation that meets them. Use Alex Casalboni’s technique to autotune AWS Lambda memory allocations. |
| Not considering the cost of event sources | Refer to the AWS pricing page for the event source, and calculate the cost for your required monthly throughput. |
| Not considering the cost of peripheral services | Sample debug logs in production. |
In addition to these problems and tips, we also looked at how Epsagon can help you monitor your serverless bill.
As I mentioned already, serverless is an emerging and quickly growing space. Over the months and years to come, the tooling ecosystem will grow and mature—and I, for one, am excited about the future of serverless. In the meantime, I hope this post helps you avoid a few costly mistakes!
Looking for observability into your serverless application? Sign up to Epsagon’s free trial!