Create a Highly Scalable Image Processing Service on AWS Lambda and API Gateway in 10 Minutes


Like all serverless articles, I need to start this one with an explanation of what serverless is.

Serverless is not serverless, it’s just thinking about servers, less.

Now with that out of the way, let’s see how you can get a simple Python 3.6 application running that

  • converts an input photo into a black and white photo
  • depends on the OpenCV Python bindings

accessible through an API that

  • accepts binary data payload (jpeg)
  • returns binary data payload (jpeg)

all without

  • provisioning a server
  • paying for it*

*Unless your traffic exceeds the generous Lambda free pricing tier.

First make sure you have Docker installed.

For this example we’ll use an AWS service called Lambda, which lets us deploy our function and its dependencies and easily connect it to an API. To create the API we’ll use API Gateway, a service also provided by AWS.

To keep this tutorial simple, we’ll write our function code directly in the AWS web console and deploy it from there. For any serious project you would do deployments via the AWS CLI instead.
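As a rough illustration of what a scripted deployment starts from, the sketch below packages a handler file into an in-memory ZIP using only the Python standard library. The helper name build_deployment_zip is hypothetical, and the file name lambda_function.py is an assumption about how the console names the handler module. Uploading the resulting bytes via the AWS CLI or an SDK is not shown here.

```python
import io
import zipfile


def build_deployment_zip(handler_source: str, filename: str = "lambda_function.py") -> bytes:
    """Return the bytes of a ZIP archive containing a single handler file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # The handler module is placed at the root of the archive
        archive.writestr(filename, handler_source)
    return buffer.getvalue()


# Minimal handler source used only to have something to package
HANDLER_SOURCE = '''import json

def lambda_handler(event, context):
    return {"statusCode": 200, "body": json.dumps("Hello from Lambda!")}
'''

package = build_deployment_zip(HANDLER_SOURCE)
print(f"Deployment package size: {len(package)} bytes")
```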

1. Start by logging in to the AWS Console and search for Lambda.

2. Click on Create function.

3. Set up the function parameters.
We’re naming our function lambda-demo. Make sure to pick Python 3.6 as the Runtime and create a new role from AWS policy templates.

4. After creating the function, you’ll be given some template code in the Lambda console.

import json

def lambda_handler(event, context):
    # TODO implement
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }

You can invoke this function right away by configuring a test event. Click on Test and configure the first test event. For the purposes of this article, the default template works fine.

After creating the test event, click on Test. You should receive the following in the function logs:

{
  "statusCode": 200,
  "body": "\"Hello from Lambda!\""
}
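The same handler can also be exercised outside the console. As a minimal sketch, assuming an empty dictionary is an acceptable stand-in for the default test event and that the unused context argument can be None, a local call could look like this:

```python
import json


def lambda_handler(event, context):
    # Same body as the console template
    return {
        "statusCode": 200,
        "body": json.dumps("Hello from Lambda!"),
    }


# Hypothetical stand-ins for the console test event and the Lambda context
test_event = {}
response = lambda_handler(test_event, None)
print(json.dumps(response, indent=2))
```

Note that the body is itself a JSON string, which is why the quotes around the greeting show up escaped in the output.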

Brilliant.

Now let’s create something more useful than that.

Let’s build a function that takes an image as input and turns it to grayscale. We will be using OpenCV for that, specifically its Python bindings. Although using OpenCV might be overkill for such a task, it demonstrates how such a useful library can be included in your Lambda environment with relative ease.

We will now

  1. Generate a Lambda-ready Python package for OpenCV.
  2. Upload that package into Lambda Layers so it can be used in any function you build.
  3. Import OpenCV to our Lambda function.

I’ve put together a dead simple tool: a Docker image that can gather any pip package and generate a .ZIP we can upload to Lambda Layers. If you want to explore the tool, you can find it at LambdaZipper.

If you have Docker installed you can open your terminal and just run

docker run --rm -v $(pwd):/package tiivik/lambdazipper opencv-python

That’s it! In your current working directory you’ll find opencv-python.zip

One of the most useful serverless toolkits is the Serverless Framework. However, we are not going to use it in this example. Re-inventing the wheel is rarely a good idea, except when you want to learn how things work under the hood. Although mature frameworks such as Serverless exist, it is worth digging into some of the core functionality these frameworks abstract away.

Let’s explore what the tool abstracted from us.

If you take a look at package.sh, you can see that it ran a pip install command with opencv-python as its argument. All of that was executed in an amazonlinux:2017.03 environment that, to some extent, mimics the AWS Lambda environment. You can explore the execution environment in the Dockerfile.
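Without reproducing package.sh itself, the general idea can be sketched in Python: install the package into an empty directory, then zip that directory so the modules sit at the root of the archive. The function names and the build directory below are hypothetical, and the install step only produces Lambda-compatible binaries when it runs inside a Lambda-like environment such as the Docker image.

```python
import os
import subprocess
import sys
import zipfile


def install_package(package: str, target_dir: str) -> None:
    """Install a pip package and its dependencies into target_dir."""
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "--target", target_dir, package],
        check=True,
    )


def zip_directory(source_dir: str, zip_path: str) -> list:
    """Zip the contents of source_dir so its entries sit at the archive root."""
    written = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(source_dir):
            for name in files:
                full_path = os.path.join(root, name)
                # Store paths relative to source_dir, not absolute paths
                arcname = os.path.relpath(full_path, source_dir)
                archive.write(full_path, arcname)
                written.append(arcname)
    return sorted(written)


def build_layer_zip(package: str, build_dir: str, zip_path: str) -> list:
    """Install a package into build_dir and zip the result."""
    install_package(package, build_dir)
    return zip_directory(build_dir, zip_path)
```

Calling build_layer_zip("opencv-python", "build", "opencv-python.zip") inside the container would then leave a ZIP in the working directory, similar in spirit to what the tool generates.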

Let’s upload the opencv-python.zip to Lambda Layers so we can use that package from now on in all our functions. Think of Layers as data that can be used in any function you write. This can be Python modules, code snippets, binary files or anything.

Navigate to the Layers panel in AWS Lambda and press Create layer.

Set up the layer name, description and upload the zip file. Make sure to select the correct runtime, in our case Python 3.6. Press Create layer.

At the time of writing, uploading a ZIP file through the web interface is limited to 50MB. Fortunately our opencv-python package is smaller than that. If your package exceeds the limit, you can provide it as a link to an S3 bucket instead. Bear in mind that Lambda caps the unzipped deployment package, including all layers, at 250MB.
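If you want to check a package against these limits before uploading, a small sketch follows. It assumes that MB means 1024 * 1024 bytes and that the 250MB figure applies to the unzipped contents; check_layer_size is a hypothetical helper, and the tiny archive built at the end is only a placeholder for a real layer ZIP.

```python
import io
import zipfile

MAX_CONSOLE_UPLOAD_BYTES = 50 * 1024 * 1024  # console upload limit (assumed binary MB)
MAX_UNZIPPED_BYTES = 250 * 1024 * 1024       # package limit, applied to unzipped size


def check_layer_size(zip_bytes: bytes) -> dict:
    """Report zipped and unzipped sizes and whether they fit the limits."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        unzipped = sum(info.file_size for info in archive.infolist())
    zipped = len(zip_bytes)
    return {
        "zipped_bytes": zipped,
        "unzipped_bytes": unzipped,
        "fits_console_upload": zipped <= MAX_CONSOLE_UPLOAD_BYTES,
        "fits_unzipped_limit": unzipped <= MAX_UNZIPPED_BYTES,
    }


# Placeholder archive; in practice read the bytes of opencv-python.zip instead
placeholder_content = "# placeholder\n" * 100
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("cv2/__init__.py", placeholder_content)

report = check_layer_size(buffer.getvalue())
print(report)
```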

After creating the layer you should be greeted with the message

Successfully created layer opencv-python version 1.

Yay!

Let’s go back to our lambda-demo function and add the opencv-python layer to our function execution environment. Click on Layers > Add a layer and select your opencv-python layer.

Let’s try importing the library as usual:

import json
import cv2

def lambda_handler(event, context):
    # TODO implement
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }

Now let’s click on Test. We’re provided with the response:

Response:
{
  "errorMessage": "Unable to import module 'lambda_function'"
}

For some reason Lambda wasn’t able to find our Python package. Let’s explore.
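One way to investigate an import failure like this without triggering it is to ask Python where it would load a module from. The sketch below uses importlib.util.find_spec from the standard library; describe_module is a hypothetical helper name, and the handler simply reports what it finds for cv2.

```python
import importlib.util
import sys


def describe_module(name: str) -> str:
    """Return where a top-level module would be imported from, if anywhere."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return f"{name}: not found on sys.path"
    return f"{name}: found at {spec.origin}"


def lambda_handler(event, context):
    # Print the module search path and check cv2 without importing it
    print(sys.path)
    result = describe_module("cv2")
    print(result)
    return {"statusCode": 200, "body": result}
```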

By default all Lambda layers are mounted to /opt. Let’s comment out our cv2 import and take a look at what’s inside /opt.

import json
#import cv2
from os import listdir

def lambda_handler(event, context):
    # TODO implement
    print(listdir("/opt"))
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }

In the function logs we can see our cv2 module in /opt.

['bin', 'cv2', 'numpy', 'numpy-1.16.2.dist-info', 'opencv_python-4.0.0.21.dist-info']

By default /opt/bin is added to the $PATH environment variable. You can reference that from AWS docs. However, our layer modules live in /opt, not in /opt/bin, and Python looks up modules through PYTHONPATH rather than $PATH. So let’s add /opt to PYTHONPATH so Lambda can see our package.

In Environment Variables section, add the following environment variable. Key: PYTHONPATH Value: /opt/

Typically you can just import the package without altering the path, but in this case it’s necessary for Lambda environment to detect our package.
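To see why this helps: directories listed in PYTHONPATH end up on sys.path, the list of locations Python searches on import. The sketch below reproduces the effect locally, using a temporary directory as a stand-in for /opt and a made-up module name, layer_demo_module. Both are assumptions for illustration only.

```python
import importlib
import os
import sys
import tempfile

# Temporary directory standing in for /opt (hypothetical, for local testing)
fake_opt = tempfile.mkdtemp()

# A made-up module playing the role of a package shipped in a layer
with open(os.path.join(fake_opt, "layer_demo_module.py"), "w") as f:
    f.write("VERSION = '1.0'\n")

# Before the directory is on the search path, the import fails
try:
    importlib.import_module("layer_demo_module")
    found_before = True
except ImportError:
    found_before = False

# Appending to sys.path has a similar effect to listing the directory in PYTHONPATH
sys.path.append(fake_opt)
importlib.invalidate_caches()
module = importlib.import_module("layer_demo_module")

print(found_before, module.VERSION)
```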

Let’s modify our code:

import json
import cv2

def lambda_handler(event, context):
    # TODO implement
    print(cv2.__version__)
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }

Save your changes and click Test. We are greeted with 4.0.0 in the console, which tells us the OpenCV version in use.

Brilliant, Python OpenCV running in Lambda!

Let’s continue implementing the core application logic — converting images into grayscale. Let’s modify our Lambda function code:

import json
import cv2
import base64

def write_to_file(save_path, data):
    # Decode the base64 request body and store it as a binary file
    with open(save_path, "wb") as f:
        f.write(base64.b64decode(data))

def lambda_handler(event, context):
    # Write request body data into file
    write_to_file("/tmp/photo.jpg", event["body"])

    # Read the image
    image = cv2.imread("/tmp/photo.jpg")

    # Convert to grayscale
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Write grayscale image to /tmp
    cv2.imwrite("/tmp/gray.jpg", gray)

    # Convert grayscale image into utf-8 encoded base64
    # (named encoded_bytes to avoid shadowing the built-in str)
    with open("/tmp/gray.jpg", "rb") as image_file:
        encoded_bytes = base64.b64encode(image_file.read())
    encoded_img = encoded_bytes.decode("utf-8")

    # Return the data to API Gateway in base64.
    # API Gateway will handle the conversion back to binary.
    # Set content-type header as image/jpeg.
    return {
        "isBase64Encoded": True,
        "statusCode": 200,
        "headers": {"content-type": "image/jpeg"},
        "body": encoded_img
    }

The API which we will set up in a moment will accept a binary image from the client. The binary image will then be converted into base64 by AWS API Gateway and passed into the Lambda.

Of course the API Gateway is not set up yet so testing this code with our current test will fail. However, before we move into setting up an API that invokes this Lambda, we can test it in the Lambda console by providing a base64 encoded image in the event body.

Reconfigure the Test with this body. In case you’re wondering, this is a base64 encoded cat photo 😺 https://imgur.com/a/0NpkzzL
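If you would rather generate such a test event from your own photo, a small sketch follows. It assumes the event only needs a body key holding the base64 string, which is all the handler above reads. build_test_event is a hypothetical helper name, and the bytes used here are a few placeholder bytes rather than a real JPEG.

```python
import base64
import json


def build_test_event(image_bytes: bytes) -> dict:
    """Wrap raw image bytes in an event whose body is a base64 string."""
    return {"body": base64.b64encode(image_bytes).decode("utf-8")}


# Placeholder bytes; in practice use open("kitty.jpg", "rb").read()
sample_bytes = b"\xff\xd8\xff\xe0 not a real photo \xff\xd9"
event = build_test_event(sample_bytes)

# This JSON can be pasted into the console's test event editor
print(json.dumps(event))
```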

Invoking this test should now succeed with:

Response:
{
  "isBase64Encoded": true,
  "statusCode": 200,
  "headers": {
    "content-type": "image/jpeg"
  },
  "body": "/9j/4AJRgAB.....P+WqHNf//Z" <- long base64 string of black and white image here
}
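To actually look at the result, the base64 body has to be decoded back into a file. Here is a minimal sketch, assuming the response has the shape shown above; save_response_image and the output file name are hypothetical, and the stand-in response carries placeholder bytes instead of a real image.

```python
import base64
import os
import tempfile


def save_response_image(response: dict, out_path: str) -> int:
    """Decode a base64 response body, write it to out_path, return bytes written."""
    if not response.get("isBase64Encoded"):
        raise ValueError("response body is not marked as base64 encoded")
    data = base64.b64decode(response["body"])
    with open(out_path, "wb") as f:
        f.write(data)
    return len(data)


# Stand-in response with placeholder bytes instead of a real image
placeholder_image = b"\xff\xd8 placeholder \xff\xd9"
fake_response = {
    "isBase64Encoded": True,
    "statusCode": 200,
    "headers": {"content-type": "image/jpeg"},
    "body": base64.b64encode(placeholder_image).decode("utf-8"),
}

out_path = os.path.join(tempfile.gettempdir(), "gray_preview.jpg")
written = save_response_image(fake_response, out_path)
print(f"Wrote {written} bytes to {out_path}")
```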

We’re now ready to set up an API that invokes this Lambda function.

Open AWS API Gateway console. Press Create API.

Create a new REST API, provide API name and description. In this case we’re calling our API lambda-demo.

From Resources > Actions choose Create Method and define a POST method.

For the Integration type choose Lambda Function and pick your Lambda function from the dropdown menu. Enable Use Lambda Proxy integration and hit Save.

We want our API to be able to handle binary data.

From Settings > Binary Media Types click Add Binary Media Type and define the binary types as:

image/jpeg
image/png
*/*

Press Save Changes.

Navigate back to our POST method.

Under Method Response, add a Content-Type response header and set its type to image/jpeg.

Before publishing the API you can test it by clicking on the Client Test button.

In this case we provide the base64 image string itself as the body, not a JSON object. For convenience, you can paste the raw base64 string from cat_base64_body into the Request Body field.

The response is a base64 string of the black and white photo.

Click on Actions > Deploy API.

Create a new Deployment stage, give it a descriptive name, for example development, and press Deploy.

The API is now live and functional! You’ll receive a URL where your API is deployed:

https://XXXXX.execute-api.XXXX.amazonaws.com/development

Let’s try it out!

1. Let’s download the same photo to our local environment.

curl https://i.imgur.com/offvirS.jpg -o kitty.jpg

2. Post the image to our API as binary data. The result is a black and white image in your current working directory. 🎉

curl -X POST --data-binary @kitty.jpg https://XXXXX.execute-api.eu-central-1.amazonaws.com/development -o kitty_bw.jpg
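The same request can be made from Python with only the standard library. This is a sketch rather than a drop-in client: the URL is the placeholder from above, build_request and convert are hypothetical helper names, and sending explicit Content-Type and Accept headers of image/jpeg is an assumption on my part (curl sends different defaults).

```python
import urllib.request

# Placeholder URL: substitute the invoke URL of your own deployment stage
API_URL = "https://XXXXX.execute-api.eu-central-1.amazonaws.com/development"


def build_request(image_bytes: bytes, url: str = API_URL) -> urllib.request.Request:
    """Build a POST request carrying the raw image bytes as its body."""
    return urllib.request.Request(
        url,
        data=image_bytes,
        method="POST",
        headers={
            "Content-Type": "image/jpeg",
            # Assumption: ask for image/jpeg explicitly, since it is one of
            # the binary media types registered on the API
            "Accept": "image/jpeg",
        },
    )


def convert(in_path: str, out_path: str, url: str = API_URL) -> None:
    """Send a local photo to the API and save the response body to out_path."""
    with open(in_path, "rb") as f:
        request = build_request(f.read(), url)
    with urllib.request.urlopen(request) as response:
        with open(out_path, "wb") as out:
            out.write(response.read())
```

With a real invoke URL in place, convert("kitty.jpg", "kitty_bw.jpg") would mirror the curl command.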

Bear in mind that for the simplicity of this tutorial there is no error handling, request validation or authorization set up. For a production application these should be set up in API Gateway and your Lambda function code.
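To give a flavour of what request validation in the function code might look like, here is a minimal sketch using only the standard library. It checks that the body is valid base64 and starts with the JPEG signature bytes before any image processing happens. The helper names, the 400 status and the plain-text error format are all assumptions, not part of the function built above.

```python
import base64
import binascii

JPEG_MAGIC = b"\xff\xd8\xff"  # first bytes of a JPEG file


def decode_jpeg_body(body):
    """Return decoded JPEG bytes, or None if the body is not usable."""
    if not isinstance(body, str) or not body:
        return None
    try:
        data = base64.b64decode(body, validate=True)
    except (binascii.Error, ValueError):
        return None
    if not data.startswith(JPEG_MAGIC):
        return None
    return data


def error_response(status_code, message):
    """Build a plain-text error response for API Gateway."""
    return {
        "statusCode": status_code,
        "headers": {"content-type": "text/plain"},
        "body": message,
    }


def lambda_handler(event, context):
    data = decode_jpeg_body(event.get("body"))
    if data is None:
        return error_response(400, "Request body must be a base64 encoded JPEG")
    # The grayscale conversion from the function above would go here;
    # this sketch only confirms that the input passed validation.
    return {"statusCode": 200, "headers": {"content-type": "text/plain"}, "body": "ok"}
```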

That’s it, I hope you found this guide useful! If you did you can subscribe for future articles here or find me on Twitter. 👏