One of the most exciting features Amazon Aurora Serverless brings to the table is its ability to go to sleep (pause) when idle. This is a fantastic feature for development and test environments. You get access to a powerful database to run tests quickly, but it goes easy on your wallet as you only pay for storage when the instance is paused.
You can configure Amazon RDS Aurora Serverless to go to sleep after a specified period of inactivity. This can be set to anywhere between five minutes and 24 hours.
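As a rough sketch of how that setting could be scripted, the snippet below validates a pause delay against the five-minute to 24-hour range and passes it to the RDS API. It assumes the boto3 SDK and its `modify_db_cluster` call with a `ScalingConfiguration` argument; the helper names `auto_pause_config` and `apply_auto_pause` are made up for illustration, and the console or CLI works just as well.

```python
MIN_AUTO_PAUSE_S = 5 * 60          # five minutes
MAX_AUTO_PAUSE_S = 24 * 60 * 60    # 24 hours


def auto_pause_config(seconds_until_pause):
    """Build the scaling settings that enable auto-pause (hypothetical helper)."""
    if not MIN_AUTO_PAUSE_S <= seconds_until_pause <= MAX_AUTO_PAUSE_S:
        raise ValueError(
            f"pause delay must be between {MIN_AUTO_PAUSE_S} and "
            f"{MAX_AUTO_PAUSE_S} seconds, got {seconds_until_pause}"
        )
    return {"AutoPause": True, "SecondsUntilAutoPause": seconds_until_pause}


def apply_auto_pause(cluster_id, seconds_until_pause):
    """Apply the setting to a cluster. Assumes boto3 and AWS credentials."""
    import boto3  # third-party SDK, imported lazily so the helper above has no dependencies

    rds = boto3.client("rds")
    return rds.modify_db_cluster(
        DBClusterIdentifier=cluster_id,
        ScalingConfiguration=auto_pause_config(seconds_until_pause),
        ApplyImmediately=True,
    )

# Example (cluster name is a placeholder):
# apply_auto_pause("serverless-test", 300)
```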
For this feature to work, however, inactivity has to be complete. If you have so much as a single query or even maintain an idle open connection, Amazon Aurora Serverless will not be able to pause.
This means, for example, that pretty much any monitoring you may have enabled, including our own Percona Monitoring and Management (PMM), will prevent the instance from pausing. It would be great if Amazon RDS Aurora Serverless allowed us to specify user accounts to ignore, or additional service endpoints that should not prevent it from pausing. Currently, though, you need to get by without such monitoring and diagnostic tools, or else enable them only for the duration of the test run.
If you’re using Amazon Aurora Serverless to back very low traffic applications, you might consider disabling the automatic pause function, since waking up currently takes quite a while. Otherwise, your users should be prepared to wait 30 seconds or more while Amazon Aurora Serverless activates.
Such a long activation time means you need to be mindful of the timeout configuration in your test/dev scripts, so you do not have to deal with sporadic failures. Alternatively, you can use something like the mysqladmin ping command to activate the instance before your test run.
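The wake-up step could be sketched along these lines: keep pinging until the instance answers or a deadline passes, then start the tests. This is only an illustration; `wake_database` and `mysqladmin_ping` are hypothetical helper names, the 120-second deadline and 5-second retry interval are arbitrary choices, and the only thing assumed about mysqladmin is that `ping` exits with status 0 once the server responds.

```python
import subprocess
import time


def wake_database(ping, deadline_s=120.0, interval_s=5.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Call ping() until it returns True or the deadline would be exceeded.

    Returns the seconds elapsed before the first successful ping.
    Raises TimeoutError if the database never answered in time.
    """
    start = clock()
    while True:
        if ping():
            return clock() - start
        # Give up if another wait would take us past the deadline.
        if clock() - start + interval_s > deadline_s:
            raise TimeoutError(
                f"database did not resume within {deadline_s:.0f}s")
        sleep(interval_s)


def mysqladmin_ping(host, user, password, attempt_timeout_s=60):
    """Return True if `mysqladmin ping` exits with status 0."""
    try:
        result = subprocess.run(
            ["mysqladmin", "ping", "-h", host, "-u", user, f"-p{password}"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=attempt_timeout_s,  # do not let one attempt hang forever
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0

# Example, run before the test suite (credentials are placeholders):
# wake_database(lambda: mysqladmin_ping(
#     "serverless-test.cluster-XXXX.us-east-2.rds.amazonaws.com",
#     "user", "password"))
```

Passing the ping function in as an argument keeps the retry logic independent of the client tool, so the same loop works with a driver-level connection check.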
Some activation experiments
Let’s now take a closer look at Amazon RDS Aurora Serverless activation times. These times were measured on the MySQL 5.6-based Aurora Serverless, the only edition currently available. I expect the numbers could be different in other editions.
I measured the time it takes to run a trivial query (SELECT 1) after the instance goes to sleep. I manually scaled the Amazon RDS Aurora Serverless instance to the desired capacity in ACU (Aurora Compute Units), and then had the script wait for six minutes to allow the pause to happen before running the query. The test was performed 12 times, and the Min/Max/Avg times of these test runs for the different ACU settings are presented above.
You can see there is some variation between min and max times. I would expect to have even higher outliers, so plan for an activation time of more than a minute as a worst case scenario.
Also note that there is an interesting difference in activation time between instance sizes, and it does not simply follow size. In my tests the smallest possible size (2 ACU) consistently took longer to activate than the medium size (8 ACU), yet the largest size tested (64 ACU) was the slowest of all.
So make no assumptions about how long it will take an instance of a given size to wake up with your workload. Test it instead, if it is an important consideration for you.
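If you want to repeat this kind of measurement yourself, the procedure described above can be approximated as follows. This is a sketch rather than the script used for the numbers in this post: `measure_resume_times` is a hypothetical name, and `run_query` stands for whatever function opens a fresh connection and runs SELECT 1 with your client of choice.

```python
import statistics
import time


def measure_resume_times(run_query, runs=12, idle_wait_s=360,
                         sleep=time.sleep, clock=time.perf_counter):
    """Time a trivial query after each idle period and summarize the results.

    run_query: callable that connects, runs SELECT 1, and disconnects.
    idle_wait_s: idle time before each run (six minutes, so that a
                 five-minute auto-pause has time to trigger).
    """
    samples = []
    for _ in range(runs):
        sleep(idle_wait_s)   # stay fully disconnected so the pause can happen
        start = clock()
        run_query()          # this call pays the resume cost
        samples.append(clock() - start)
    return {
        "min": min(samples),
        "max": max(samples),
        "avg": statistics.mean(samples),
    }
```

Make sure nothing else (monitoring included) touches the instance during the idle wait, or the pause never happens and you end up timing a warm query.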
In some (rare) cases I also observed internal timeouts during the resume process:
[root@ip-172-31-16-160 serverless]# mysqladmin ping -h serverless-test.cluster-XXXX.us-east-2.rds.amazonaws.com -u user -ppassword
mysqladmin: connect to server at 'serverless-test.cluster-XXXX.us-east-2.rds.amazonaws.com' failed
error: 'Database was unable to resume within timeout period.'
What about Autoscaling?
Finally, you may wonder how Amazon Aurora Serverless pausing plays with Amazon Aurora Serverless Autoscaling.
In my tests, I observed that resume always restores the instance to the same ACU it had before it was paused. However, this is where the pause configuration matters a great deal. According to this document, Amazon Aurora Serverless will not scale down more frequently than once per 900 seconds. The document does not clarify over what period the conditions that initiate a scale-down (CPU usage, connection usage, etc.) have to be met for it to be triggered. What I can see is that if the instance is idle for five minutes, no scale-down is performed; it is just put to sleep.
At the same time, if you change this default five-minute period to a longer time, the idle instance will be automatically scaled down a notch every 900 seconds before it finally goes to sleep. Consequently, when it is awakened it will not be at the last stage at which the load was applied, but instead at the stage it had been scaled down to. Also, scaling down is considered an event by itself, which resets the idle counter and delays the pause. For example, if the initial instance scale is 8 ACU and the pause timer is set to one hour, it takes 1 hour 30 minutes for the pause to actually happen: 30 minutes to scale down twice, plus one hour at the minimum size for the pause to trigger.
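The arithmetic in that example can be written down as a small model. It is a simplification of the behavior I observed, not a documented formula, and it rests on three assumptions: each "notch" halves the capacity, a scale-down happens after every 900 idle seconds until the minimum is reached, and no scale-down happens at all when the pause timer is 900 seconds or less. The function name is hypothetical.

```python
def seconds_until_pause(current_acu, min_acu, pause_timeout_s, cooldown_s=900):
    """Estimate how long an idle instance takes to pause.

    Simplified model: with a pause timer longer than the scale-down
    cooldown, the instance first steps down to min_acu (one halving per
    cooldown period), and only then does the pause timer run to completion.
    """
    if pause_timeout_s <= cooldown_s:
        # The pause triggers before any scale-down gets a chance to happen.
        return pause_timeout_s

    steps = 0
    acu = current_acu
    while acu > min_acu:
        acu //= 2        # assumed: one notch halves the capacity
        steps += 1
    return steps * cooldown_s + pause_timeout_s


# The example from the text: 8 ACU, minimum 2 ACU, one-hour pause timer.
print(seconds_until_pause(8, 2, 3600) / 60, "minutes")  # prints "90.0 minutes"
```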
Here is a graph to illustrate this:
This also shows that when the load is re-applied at about 13:47, it recovers to the last number of ACU it had before the pause.
This means that a pause time of more than 15 minutes makes the pause behavior substantially different to the default.
Summary
- Amazon Aurora Serverless automatic pause is great for test/dev environments.
- Resume time is relatively long and can reach as much as one minute.
- Consider disabling automatic pausing for low traffic production applications, or at least let your users know they need to wait when they wake up the application.
- Pause and Resume behavior is different in practice for a pause timeout of more than 15 minutes. Sticking to the default 5 minutes is recommended unless you really know what you’re doing.