Building your own Portrait Mode using Machine Learning in < 30 min [iOS]

By Christopher Kelly

Took me like 30 minutes to segment this using GIMP — this happens in real time using Fritz!

Image segmentation is a computer vision task that labels each pixel in an image with the type of object it belongs to. We can use that labeled information to build more engaging, immersive experiences.

Take a green screen, for instance. Shooting images where the background is a single, distinct color makes it much easier to swap those pixels out with visual effects, transporting us to new worlds.

But most of us don’t have a green screen lying around. That’s where machine learning comes in. In this post, we’ll use Fritz Image Segmentation, powered by deep learning, to separate people from backgrounds—no green screen required.

We’ll use the Fritz SDK to segment people and blur only the background of an image. It should take us about 30 minutes. After this tutorial, you’ll be able to safely FaceTime your mom without worrying about cleaning your room.

After finishing this tutorial — your personal portrait mode

The completed demo app is available here. For more demos of mobile machine learning-powered features, check out Heartbeat.

First things first, create a new Xcode project. If you want, fork the Fritz Image Segmentation Demo project. Each commit in the demo app will follow a step in this tutorial. Before going on to the next step, make sure your empty project builds (you will likely need to change the Bundle Identifier).

To use the Image Segmentation Feature, you need to create a Fritz account and set up the SDK in your app. This is a quick process that shouldn’t take you more than 5 minutes. The Fritz SDK is available on CocoaPods.

  1. Create an account here
  2. Create a project — projects group your apps together, e.g. iOS and Android flavors
  3. Add an app to the project. You’ll need the bundle identifier you used in Step 1.
  4. After following the directions in the webapp, run your app with the Fritz SDK configured

Today, we’re just using the Fritz SDK for image segmentation, but you can also use it to create apps with other machine learning powered features. You can label the contents of an image (think “not hotdog”), detect the location of objects in an image, or stylize images to look like famous paintings. Check out our docs for more information.

After you’ve initialized your app per the instructions in the webapp, you’re ready to start building your Image Segmentation app.

Next, we’re going to add the camera to our app and display live video using our new VideoPreviewView.

First, we need to add camera permissions to the Info.plist file. As a key, add “Privacy - Camera Usage Description”, and as its value, a short description of why the app needs the camera.
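If you open Info.plist as source code instead of the property list editor, the “Privacy - Camera Usage Description” entry is stored under the raw key NSCameraUsageDescription. A minimal fragment might look like this; the description string is only an example, so write one that fits your app:

```xml
<key>NSCameraUsageDescription</key>
<string>The camera is used to blur the background behind people in live video.</string>
```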

To display frames from the camera, we need to configure AVCaptureSession and AVCaptureVideoDataOutput objects.
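Here is a minimal sketch of that configuration using only AVFoundation. The function name makeCaptureSession and its parameters are our own choices for illustration, not names from the demo project, and the sketch assumes the back wide-angle camera:

```swift
import AVFoundation

/// Builds a capture session that delivers BGRA frames in portrait orientation.
/// Returns nil when no back camera is available (for example, in the simulator).
func makeCaptureSession(
    delegate: AVCaptureVideoDataOutputSampleBufferDelegate,
    queue: DispatchQueue
) -> AVCaptureSession? {
    let session = AVCaptureSession()

    // Camera input.
    guard
        let camera = AVCaptureDevice.default(.builtInWideAngleCamera, for: .video, position: .back),
        let input = try? AVCaptureDeviceInput(device: camera),
        session.canAddInput(input)
    else { return nil }
    session.addInput(input)

    // Frame output, delivered to the delegate on the given queue.
    let output = AVCaptureVideoDataOutput()
    output.videoSettings = [
        kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_32BGRA
    ]
    output.alwaysDiscardsLateVideoFrames = true
    output.setSampleBufferDelegate(delegate, queue: queue)
    guard session.canAddOutput(output) else { return nil }
    session.addOutput(output)

    // Rotate frames so captureOutput receives upright portrait images.
    output.connection(with: .video)?.videoOrientation = .portrait
    return session
}
```

Keep a reference to the returned session and call startRunning() on it from a background queue, since that call blocks until the session has started.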

A couple of notes. First, we have to set the pixel buffer format to kCVPixelFormatType_32BGRA for the segmentation model. Also, we’re setting the videoOrientation to .portrait so that the image is fed into the captureOutput delegate function in the correct orientation.

To make sure we have everything hooked up correctly, let’s display the raw video coming from the camera. For reasons that will become apparent in the next step, we’re going to display the camera output via a UIImageView.

When we run the app, we should see normal video displayed. Here we make sure to update the previewView asynchronously on the main queue so we don’t block the UI thread.
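As a rough sketch, the delegate callback can convert each frame to a UIImage and hand it to the image view on the main queue. The class name CameraViewController is an assumption for illustration; previewView is the UIImageView described above:

```swift
import AVFoundation
import UIKit

final class CameraViewController: UIViewController, AVCaptureVideoDataOutputSampleBufferDelegate {
    /// Image view that shows the most recent camera frame.
    let previewView = UIImageView()

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        // Called on the capture queue for every frame.
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        let frame = UIImage(ciImage: CIImage(cvPixelBuffer: pixelBuffer))

        // UIKit views may only be touched on the main queue.
        DispatchQueue.main.async { [weak self] in
            self?.previewView.image = frame
        }
    }
}
```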

Run this and you should see live video in your app!

Once you have the video displaying your preview view, it’s time to run the image segmentation model.

First, update your Podfile to include the People Image Segmentation Model.

pod 'Fritz/VisionSegmentationModel/People'

There are actually three different kinds of models: Living Room, Outdoor, and People. Learn more about these models here.

Initialize the image segmentation model as a variable in the view controller and run the model in the captureOutput function.
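The exact Fritz calls are best taken from the Fritz docs, so the sketch below only shows the shape of this step: hold the model as a property, run it on each frame inside captureOutput, and pass the frame and its mask to the UI. PersonSegmenting is a hypothetical protocol standing in for the Fritz people model, and SegmentationRunner is likewise a name made up for this example:

```swift
import AVFoundation
import UIKit

/// Hypothetical stand-in for the Fritz people segmentation model.
/// A real implementation would run the model and convert its result
/// into a UIImage mask (the post uses the result's toImageMask function).
protocol PersonSegmenting {
    func mask(for sampleBuffer: CMSampleBuffer) -> UIImage?
}

final class SegmentationRunner: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    private let segmenter: PersonSegmenting
    private let onResult: (_ frame: UIImage, _ mask: UIImage) -> Void

    init(segmenter: PersonSegmenting,
         onResult: @escaping (_ frame: UIImage, _ mask: UIImage) -> Void) {
        self.segmenter = segmenter
        self.onResult = onResult
        super.init()
    }

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        // Runs on the capture queue, so the model does not block the UI.
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer),
              let mask = segmenter.mask(for: sampleBuffer) else { return }
        let frame = UIImage(ciImage: CIImage(cvPixelBuffer: pixelBuffer))

        // Deliver the frame and its mask together on the main queue.
        let deliver = onResult
        DispatchQueue.main.async { deliver(frame, mask) }
    }
}
```

Delivering the frame and mask as a pair keeps them in sync, which matters because the mask is only valid for the frame it was computed from.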

The model’s output is a FritzVisionSegmentationResult object. The easiest way to work with the result is to call its toImageMask function. The output is a UIImage that can be overlaid on the model’s input image. The color of each pixel represents the class the model predicts. For our people model, black pixels represent people.

The reason we’re using the UIImageView is that it lets you set another UIImageView as its mask. Wherever a pixel in the mask image has an alpha value greater than zero, the camera image shows through; everywhere else, it is hidden.
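Masking relies on the standard UIView mask property. A minimal sketch, where the helper name show(frame:maskedBy:in:) is our own and maskImage is whatever UIImage the model step produced:

```swift
import UIKit

/// Shows `frame` in `previewView`, clipped to the non-transparent pixels of `maskImage`.
/// Call this on the main queue.
func show(frame: UIImage, maskedBy maskImage: UIImage, in previewView: UIImageView) {
    previewView.image = frame

    // The mask view is laid out in the preview view's coordinate space,
    // so it uses bounds rather than frame.
    let maskView = UIImageView(image: maskImage)
    maskView.frame = previewView.bounds
    maskView.contentMode = previewView.contentMode

    // Only the alpha channel of the mask view matters; its colors are ignored.
    previewView.mask = maskView
}
```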

When you build and run this example, you should only see people—nothing else!

At the end of step 4 your app should be able to “see” people. And only people…

The last step will be blurring the background. To do this, we need to add _another_ UIImageView to display the background video. Before adding the blur, let’s add the background video back in.

Now, adding the blur is as simple as adding in a UIVisualEffectView.
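One way to arrange the layers, sketched with plain UIKit: background video at the bottom, the blur above it, and the masked people view on top. The function name installBlurredBackground and the .light blur style are assumptions for illustration:

```swift
import UIKit

/// Stacks the views bottom to top: background video, blur, masked people.
@discardableResult
func installBlurredBackground(in container: UIView,
                              backgroundView: UIImageView,
                              previewView: UIImageView) -> UIVisualEffectView {
    let blurView = UIVisualEffectView(effect: UIBlurEffect(style: .light))

    // addSubview places each view above the previous one.
    for layer in [backgroundView, blurView, previewView] as [UIView] {
        layer.frame = container.bounds
        layer.autoresizingMask = [.flexibleWidth, .flexibleHeight]
        container.addSubview(layer)
    }
    return blurView
}
```

Because the blur sits between the two image views, it softens only the background video and leaves the masked people sharp.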

App with built-in BlurEffectView

However, this blur view is a bit strong. The effect will work better with a configurable blur radius. By defining our own CustomBlurView, we can tailor the effect to our liking.
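The demo’s CustomBlurView is not reproduced here. As an alternative way to get a configurable radius, the sketch below blurs the background frame itself with Core Image’s CIGaussianBlur filter. This is a swapped-in technique rather than the demo’s approach, and the function name blurred is our own:

```swift
import CoreImage
import UIKit

/// Returns `image` blurred with a Gaussian kernel of the given radius (in pixels).
func blurred(_ image: CIImage, radius: Double) -> CIImage {
    // Clamping extends the edge pixels outward so the blur does not fade to
    // transparent at the borders; cropping restores the original size.
    return image
        .clampedToExtent()
        .applyingFilter("CIGaussianBlur", parameters: [kCIInputRadiusKey: radius])
        .cropped(to: image.extent)
}
```

With this approach you would assign UIImage(ciImage: blurred(frame, radius: 12)) to the background image view for each frame and drop the UIVisualEffectView. The radius of 12 is just a starting point to tune by eye.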

We’ve built our first app using image segmentation and hopefully have a feel for the power of image segmentation.

There are many next steps we could take. Create an app that puts custom backgrounds behind people, or one that replaces the sky with cheese. There are so many options available. Have an idea? Let us know in the comments!
