We are building a data hub on AWS, where we naturally use several S3 buckets. Because the whole data hub is built on serverless technologies, we also use several Lambdas, some of which are triggered when new files arrive in one of our S3 buckets.
AWS offers S3 bucket notifications for sending events to Lambda and to other services such as SQS and SNS. We use them in plenty of places to invoke Lambdas that then act on the new files.
This works great! Really. It is easy to set up and super fast: the Lambdas are invoked right after a new file arrives, so it works like a charm.
The AWS documentation states that delivery may take a minute or longer, but we have not noticed this yet. In all our tests, the target Lambda was invoked within a few seconds, and usually it felt instantaneous.
“In Amazon S3, event notifications are designed to be delivered at least once. Typically, they are delivered in seconds but can sometimes take a minute or longer.”
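"At least once" also means that a target can occasionally receive the same notification twice. The following minimal sketch shows one way a target could skip such duplicates, assuming that bucket name, object key and the record's sequencer value together identify one change to one object. The SeenEvents class and process function are hypothetical, and the in-memory set only illustrates the idea; a real Lambda would need durable storage because memory is not shared between Lambda instances.

```python
from typing import Any


class SeenEvents:
    """Hypothetical in-memory record of already processed S3 events.

    Illustration only: a real Lambda needs durable storage (e.g. a database
    table), because memory is not shared across Lambda instances.
    """

    def __init__(self) -> None:
        self._seen: set[tuple[str, str, str]] = set()

    def first_time(self, record: dict[str, Any]) -> bool:
        s3 = record["s3"]
        # bucket + key + sequencer identifies one change to one object
        ident = (
            s3["bucket"]["name"],
            s3["object"]["key"],
            s3["object"].get("sequencer", ""),
        )
        if ident in self._seen:
            return False
        self._seen.add(ident)
        return True


def process(event: dict[str, Any], seen: SeenEvents) -> list[str]:
    """Return the object keys handled in this invocation; duplicates are skipped."""
    handled = []
    for record in event.get("Records", []):
        if seen.first_time(record):
            handled.append(record["s3"]["object"]["key"])
    return handled
```

Delivering the same event twice to process() with the same SeenEvents instance handles the key only the first time.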
… so if I am happy with it, why wouldn't I choose S3 bucket notifications again? The main reason is this:
“Notification configurations that use Filter cannot define filtering rules with overlapping prefixes, overlapping suffixes, or prefix and suffix overlapping.”
Even if you don’t define any filters at all, you cannot define more than one notification rule for the same event type (e.g. two unfiltered rules for ObjectCreated events will not work).
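To make the restriction concrete, here is a simplified approximation of the overlap check in Python. It is not S3's actual validation logic: it assumes two rules conflict when their event types intersect (a trailing "*" such as "s3:ObjectCreated:*" covers "s3:ObjectCreated:Put"), one prefix is a prefix of the other, and one suffix is a suffix of the other. An empty prefix or suffix means "no filter" and therefore overlaps with everything. NotificationRule and rules_conflict are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NotificationRule:
    """One S3 notification rule, reduced to the fields relevant for overlaps."""
    event: str        # e.g. "s3:ObjectCreated:*" or "s3:ObjectCreated:Put"
    prefix: str = ""  # "" means no prefix filter
    suffix: str = ""  # "" means no suffix filter


def _events_overlap(a: str, b: str) -> bool:
    # A trailing "*" covers every event type that shares its stem.
    if a.endswith("*") and b.startswith(a[:-1]):
        return True
    if b.endswith("*") and a.startswith(b[:-1]):
        return True
    return a == b


def rules_conflict(a: NotificationRule, b: NotificationRule) -> bool:
    """Approximation: could a single object event match both rules?"""
    prefixes_overlap = a.prefix.startswith(b.prefix) or b.prefix.startswith(a.prefix)
    suffixes_overlap = a.suffix.endswith(b.suffix) or b.suffix.endswith(a.suffix)
    return _events_overlap(a.event, b.event) and prefixes_overlap and suffixes_overlap
```

Two unfiltered ObjectCreated rules conflict, as do "raw/" and "raw/2024/" prefixes, whereas rules for disjoint prefixes such as "raw/" and "curated/" do not.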
We knew about this restriction, but since we only ever planned to set up one target per filter setting, we were fine with it. Then Murphy’s law kicked in: within one sprint, we had two tickets for two buckets that each required a new notification target, and each bucket already had a notification with the same filter settings.
Whew! We had to completely rethink our notification design for these buckets without much time, because we planned to finish these tickets within the sprint.
In the end, we found a fairly simple solution that we will now use for all new notifications. We may even switch some or all of our existing bucket notifications to this approach so that we don’t end up having to refactor many things at once later.
That’s why I would always choose this solution again, unless I am absolutely sure that we will never need two notifications with the same filter settings.
Solution A: Use AWS SNS
A frequently mentioned solution is to use AWS SNS for any kind of event fan-out. I will not go into detail here, because this solution is described in the following blog post:
This is a reasonable solution and also quite easy to set up, but we decided against it mainly for the following reasons:
- You have to set up one SNS topic for every group of targets sharing the same notification configuration (filters, event type, bucket), so you might end up with a lot of new topics.
- SNS forwards the S3 event wrapped in an SNS message: the target receives an SNS message whose Message attribute contains the whole S3 event as a JSON string. You first have to unwrap it to get the S3 event, which means your target must be aware of your infrastructure.
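The unwrapping step from the last point looks roughly like this in a Python Lambda subscribed to the topic. It is a sketch: the function name is hypothetical, and it relies on the documented shape of SNS-to-Lambda events (Records[].Sns.Message holding the S3 event as a JSON string). S3 also sends an "s3:TestEvent" without any Records when a notification is configured, so such messages are skipped.

```python
import json
from typing import Any


def unwrap_sns_s3_records(lambda_event: dict[str, Any]) -> list[dict[str, Any]]:
    """Extract the S3 notification records from a Lambda event delivered by SNS."""
    s3_records: list[dict[str, Any]] = []
    for sns_record in lambda_event.get("Records", []):
        # The S3 event is a JSON *string* inside the SNS envelope.
        s3_event = json.loads(sns_record["Sns"]["Message"])
        # The "s3:TestEvent" sent on configuration has no "Records"; skip it.
        s3_records.extend(s3_event.get("Records", []))
    return s3_records
```

Every target behind SNS needs this extra layer, which is exactly the coupling to the infrastructure mentioned above.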
Anyway, if you expect a lot of targets (e.g. many thousands of e-mail subscribers), SNS is most likely the way to go!
Solution B: Use AWS EventBridge
What do you want to forward? Exactly. Events! And AWS EventBridge even has “event” in its name, so that’s the way I recommend!
Of course, I don’t recommend it just because of the name. For an example architecture, have a look at this article:
The funny thing is that we came up with a very similar solution before we knew about this article. We also decided to use AWS EventBridge to forward the events to our targets, but instead of CloudTrail, we created a small Lambda that is invoked by S3 bucket notifications and directly forwards its input to AWS EventBridge, from where it can be routed to any other target.
So okay, to be honest, we still use S3 bucket notifications… but only to forward our events to EventBridge; the event rules then take care of invoking our targets. We decided never to set up any S3 bucket notification other than this forwarding one.
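A forwarding Lambda of this kind could look like the following sketch. It is not our production code: the event source "custom.s3-forwarder", the EVENT_BUS_NAME environment variable and the choice to use the S3 eventName as detail type are assumptions. It sends each S3 record unchanged as the event detail and batches calls to the boto3 EventBridge client's put_events, which accepts at most 10 entries per request. The client parameter only exists so the function can be tested without AWS access.

```python
import json
import os
from typing import Any

# Hypothetical names: adjust to your own event bus and naming scheme.
EVENT_BUS_NAME = os.environ.get("EVENT_BUS_NAME", "default")
EVENT_SOURCE = "custom.s3-forwarder"  # custom sources must not start with "aws."
MAX_ENTRIES_PER_CALL = 10  # PutEvents accepts at most 10 entries per request


def build_entries(s3_event: dict[str, Any]) -> list[dict[str, str]]:
    """Turn each S3 notification record into one EventBridge entry."""
    return [
        {
            "Source": EVENT_SOURCE,
            "DetailType": record["eventName"],  # e.g. "ObjectCreated:Put"
            "Detail": json.dumps(record),       # the untouched S3 record
            "EventBusName": EVENT_BUS_NAME,
        }
        for record in s3_event.get("Records", [])
    ]


def handler(event: dict[str, Any], context: Any = None, client: Any = None) -> int:
    """Lambda entry point: forward the S3 records to EventBridge in batches."""
    if client is None:
        import boto3  # imported lazily so build_entries() works without boto3
        client = boto3.client("events")
    entries = build_entries(event)
    failed = 0
    for start in range(0, len(entries), MAX_ENTRIES_PER_CALL):
        response = client.put_events(Entries=entries[start:start + MAX_ENTRIES_PER_CALL])
        failed += response.get("FailedEntryCount", 0)
    if failed:
        # Raising lets Lambda retry the asynchronous invocation; the retry
        # resends every record, so consumers should tolerate duplicates.
        raise RuntimeError(f"{failed} event(s) could not be forwarded")
    return len(entries)
```

Because the function knows nothing about a specific bucket, the same deployment can be wired to the notifications of all buckets.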
Some remarks on why I will most likely not change our architecture to the one described in the AWS article:
- In my opinion, CloudTrail has a different focus than generating events: it should be used for auditing purposes and not be “misused” just to get bucket events into EventBridge. Anyway, it works, and if you are happy with it, do it! :)
- CloudTrail requires a logging bucket, so every event is written to a bucket even if you only need it in EventBridge. Of course, you can set up a retention policy, but I still consider this avoidable overhead.
- Pricing: okay, both solutions are most likely really cheap, so the development costs will probably be much higher anyway. In general, though, I expect CloudTrail plus a logging bucket to be more expensive than invoking a Lambda that needs little memory and finishes within a few milliseconds. I haven’t calculated the costs yet, so if I am wrong, let me know 😄.
AWS EventBridge lets you filter events with event patterns, so you can easily set up a rule that only invokes the target if certain conditions match. Compared to regular S3 bucket notification settings, you can define many more conditions: for example, you can restrict by object size or other metadata available in the event, and restricting on object key prefixes (= filter prefixes) works as well.
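As an illustration, here is an event pattern that restricts on bucket name, key prefix and object size, together with a deliberately simplified local matcher for a small subset of EventBridge's pattern semantics (exact values, "prefix" and "numeric"). The matcher is not EventBridge itself; it only shows how such a pattern is evaluated. The bucket name and thresholds are hypothetical, and the pattern assumes events whose detail is the raw S3 notification record, as produced by a forwarding Lambda that passes records through unchanged. In a real rule, the pattern is passed to EventBridge as JSON.

```python
import operator
from typing import Any

_OPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
        ">=": operator.ge, ">": operator.gt}

# Hypothetical rule pattern: only larger files below "incoming/" in one bucket.
PATTERN = {
    "detail": {
        "s3": {
            "bucket": {"name": ["my-data-hub-raw"]},
            "object": {
                "key": [{"prefix": "incoming/"}],
                "size": [{"numeric": [">", 1024]}],
            },
        }
    }
}


def _value_matches(condition: Any, value: Any) -> bool:
    """Check one pattern entry (a literal or a filter object) against one value."""
    if isinstance(condition, dict):
        if "prefix" in condition:
            return isinstance(value, str) and value.startswith(condition["prefix"])
        if "numeric" in condition:
            spec = condition["numeric"]  # e.g. [">", 0, "<=", 5]
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                return False
            return all(_OPS[spec[i]](value, spec[i + 1]) for i in range(0, len(spec), 2))
        raise ValueError(f"filter not supported by this sketch: {condition}")
    return condition == value


def matches(pattern: dict[str, Any], event: dict[str, Any]) -> bool:
    """Return True if every field named in the pattern is satisfied by the event."""
    for field, condition in pattern.items():
        if field not in event:
            return False  # fields named in the pattern must be present
        value = event[field]
        if isinstance(condition, dict):
            # nested object: recurse into it
            if not isinstance(value, dict) or not matches(condition, value):
                return False
        else:
            # list of alternatives: the field matches if any alternative does
            values = value if isinstance(value, list) else [value]
            if not any(_value_matches(c, v) for c in condition for v in values):
                return False
    return True
```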
But there is one thing you can’t do (at least at the time of writing): restrict on the object key suffix. If you want to forward notifications only for files ending with, e.g., .json, you can’t do it with this approach; your target Lambda will be invoked for all objects and has to do the filtering itself.
Check out the EventBridge documentation about the possible event patterns. We have at least placed a feature request for suffix (or, even better, regex) filtering; if you need this feature, please request it as well!
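Without suffix filtering in the rule, the check moves into the target. A minimal sketch, assuming the target receives S3 notification records (for example, forwarded unchanged); the function name is hypothetical. Object keys in S3 notifications are URL-encoded (a space arrives as "+"), so the key is decoded with urllib.parse.unquote_plus before comparing.

```python
from typing import Any
from urllib.parse import unquote_plus


def keys_with_suffix(records: list[dict[str, Any]], suffix: str) -> list[str]:
    """Return the decoded object keys that end with `suffix`; ignore the rest."""
    keys = []
    for record in records:
        # Keys in S3 notifications are URL-encoded, e.g. "my+file.json".
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.endswith(suffix):
            keys.append(key)
    return keys
```

The Lambda is still invoked (and billed) for every object; it just returns early for keys it does not care about.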
Why I love this approach
It’s easy to set up, really. Do it as described in the AWS article, or as I proposed with a small Lambda in between that you can even reuse for all your buckets, and your events are simply available in your event bus.
Because it is so simple, I would always prefer this small overhead to risking a later migration to a solution like this when I want to add one more notification target to a bucket.
Some other plus points:
- Decoupling: As soon as the bucket events are available in the event bus, other components that want to be invoked by these events only need to set up a new event rule; no changes at the bucket level are required anymore.
- More targets: EventBridge can route your events to far more target types than regular S3 bucket notifications can.
- Cross-account: Our system is split across several AWS accounts, and EventBridge lets you connect event buses so that you can easily forward events to a different account and act on them there.
- Hiding infrastructure: Event rules don’t just decide which events are forwarded; they also let you transform the event and forward only the input you need. Unwrapping events (as required with SNS) is therefore unnecessary, and you can even reshape the whole event, e.g. forward only the object key instead of the whole event. Your target then doesn’t even need to know that the event was triggered by S3.
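The decoupling and input transformation points can be sketched with the boto3 EventBridge client's put_rule and put_targets calls: a new consumer only registers a rule on the bus, and the input transformer hands the target a small document with bucket and key instead of the full event. The rule name, bus name, target id scheme and the JSON paths (which assume that detail is the raw S3 record) are hypothetical. For a Lambda target, EventBridge additionally needs permission to invoke the function, which this sketch does not set up.

```python
import json
from typing import Any


def register_consumer(client: Any, *, rule_name: str, bus_name: str,
                      pattern: dict[str, Any], target_arn: str) -> None:
    """Create (or update) a rule and attach one target that gets only bucket and key."""
    client.put_rule(
        Name=rule_name,
        EventBusName=bus_name,
        EventPattern=json.dumps(pattern),
        State="ENABLED",
    )
    client.put_targets(
        Rule=rule_name,
        EventBusName=bus_name,
        Targets=[{
            "Id": f"{rule_name}-target",  # hypothetical id scheme
            "Arn": target_arn,
            "InputTransformer": {
                # JSON paths into the incoming event (assumes detail = S3 record)
                "InputPathsMap": {
                    "bucket": "$.detail.s3.bucket.name",
                    "key": "$.detail.s3.object.key",
                },
                # The target receives this small document, not the S3 event.
                "InputTemplate": '{"bucket": <bucket>, "objectKey": <key>}',
            },
        }],
    )
```

In practice, client would be boto3.client("events"); no bucket configuration is touched when a new consumer is added.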