The FaceTime Bug and the Dangers of Implicit State Machines

By David Khourshid


FaceTime hit a pretty serious bug. Apparently, if you start a FaceTime video call, swipe up, tap “Add Person” before the call is answered, and add yourself, you’ll be able to listen to the person on the other end, before they accept the call.

In retrospect, the immediate question that gets asked is:

How does this get past testing?

Assumptions can be made here, such as the thoroughness of testing (I’m sure their QA team is top-notch) or deadlines (I’m sure this feature was appropriately planned). Here’s the interesting thing: it does not matter.

Why? Because no matter how thoroughly you manually test your software or follow best programming practices, these types of serious bugs are really easy to introduce into production software, and difficult to detect until it’s too late.

In fact, you can have full test coverage, thousands of unit tests, and plenty of integration and end-to-end tests that cover all the happy-paths and many edge cases, and something like this can still sneak in. But that does not mean it’s unpreventable.

If you read the Hacker News thread, you’ll see a few theories on how such a severe bug could manifest itself in production software.

Notice a common theme? State, or more specifically, a state-machine-related bug seems to be present. I don’t have direct access to the FaceTime code, and most of us can’t be entirely sure what the real cause of this bug is, unless Apple decides to make a technical post-mortem public.

To be clear, we do not know the root cause of the bug. But regardless of how this bug manifested itself, the problem remains the same: something happened in a state it wasn’t supposed to happen in.

So instead, we’re going to use this as a learning opportunity and “recreate” this bug. We’ll see how easy these bugs are to create, but also how they can be mitigated by modeling our software with explicit state machines.

Imagine that much of the FaceTime code, protocols and all, is already written, and you are tasked with implementing a (simplified) user story:

As a user, when I swipe up, there should be an Add Person button that adds a person to the group conversation.

Shouldn’t be too difficult. You get to work (pseudocode, or JavaScript, same thing):

function addPersonToGroupChat(person) {
  establishAudioConnection(groupChat, person);
}

function onTapAddPerson(person) {
  addPersonToGroupChat(person);
}

The logic seems simple — when you tap Add Person, the onTapAddPerson event handler will addPersonToGroupChat(), and that subroutine will establishAudioConnection() between the added person and the group chat.

You write tests for this, such as:

it('should add person to group chat when "Add Person" tapped', () => {
  // ...
});

You compile it, test it on a real device or two, stress-test it to see how many people you can add, maybe add some business logic for limiting the number of people that can be added, verify everything works, and call it a day. You’ve just added an important feature, with full test coverage. You feel accomplished.

In hindsight, you notice the obvious mistake; you missed a huge edge-case, but you can’t put your finger on what exactly you did wrong. The code and tests are completely sound, and all you can think of is “I probably should have added an if statement in my event handler to make sure the call is active.”

That’s still the wrong way to think about it. Bolting on conditionals only increases code and logic complexity, which makes bugs like this harder to prevent in the future. It also makes it harder to add features, detect edge cases, collaborate with other developers, and properly document the business logic represented in the code.

We need to start getting in the habit of properly modeling our software. State machines to the rescue!

Wikipedia has a useful but technical description on what a finite state machine is. In essence, a finite state machine is a computational model centered around states, events, and transitions between states. To make it simpler, think of it this way:

  • Any software you make can be described in a finite number of states (e.g., idle, loading, success, error)
  • You can only be in one of those states at any given time (e.g., you can’t be in success and error at the same time)
  • You always start at an initial state (e.g., idle)
  • You move from state to state, or transition, based on events (e.g., from the idle state, when the LOAD event occurs, you immediately transition to the loading state)

It’s like the software that you’re used to writing, but with more explicit rules. You might have been used to writing isLoading or isSuccess as boolean flags before, but state machines make it so that you’re not allowed to have isLoading === true && isSuccess === true at the same time.
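To make the contrast concrete, here is a minimal sketch. The names `flags`, `onLoadWithFlags`, `onSuccessWithFlags`, and `status` are hypothetical and only serve the illustration. With independent booleans, nothing stops the code from drifting into a combination that should not exist; with a single state value, that combination cannot even be represented.

```javascript
// Implicit: two independent booleans allow four combinations,
// including the "loading AND success" one that should be impossible.
const flags = { isLoading: false, isSuccess: false };

function onLoadWithFlags() {
  flags.isLoading = true;
}

function onSuccessWithFlags() {
  // Easy-to-miss bug: isLoading is never reset here.
  flags.isSuccess = true;
}

onLoadWithFlags();
onSuccessWithFlags();
const impossibleFlagState = flags.isLoading && flags.isSuccess;

// Explicit: a single value can hold only one state at a time.
let status = "idle";
status = "loading";
status = "success";
const impossibleStatus = status === "loading" && status === "success";

console.log(impossibleFlagState, impossibleStatus);
```

The flag version ends up both loading and successful at once, while the single `status` value can never satisfy both checks.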

It also makes it visually clear that event handlers can only do one main thing: forward their events to a state machine. They’re not allowed to “escape” the state machine and execute business logic, just like real-world physical devices: buttons on calculators or ATMs don’t actually do operations or execute actions; rather, they send signals to some central unit that manages state, and that unit decides what should happen when it receives that signal.

All programming languages have primitives that enable you to code explicit state machines. For most simple use-cases, a switch/case block will suffice. Creating state machines is a three-step process:

  1. Identify states (and the initial state)
  2. Identify events (anything that can cause a transition)
  3. Determine transitions (i.e., what is the next state based on the current state and event that just occurred?)
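As a rough sketch of those three steps, here is the idle/loading/success/error example from earlier written as a transition function. The `LOAD` event comes from the list above; `RESOLVE` and `REJECT`, the retry transition out of `error`, and the name `loaderMachine` are assumptions made for this illustration.

```javascript
// Step 1: identify states (idle, loading, success, error) and the initial state.
const initialState = "idle";

// Steps 2 and 3: identify events (LOAD, RESOLVE, REJECT), then decide the
// next state for each (state, event) pair.
function loaderMachine(state, event) {
  switch (state) {
    case "idle":
      return event.type === "LOAD" ? "loading" : state;
    case "loading":
      switch (event.type) {
        case "RESOLVE":
          return "success";
        case "REJECT":
          return "error";
        default:
          return state;
      }
    case "error":
      // Assumed for this sketch: loading again from "error" is a retry.
      return event.type === "LOAD" ? "loading" : state;
    default:
      // "success" handles no events in this sketch.
      return state;
  }
}

let state = initialState;
state = loaderMachine(state, { type: "RESOLVE" }); // not handled in "idle"
state = loaderMachine(state, { type: "LOAD" });
state = loaderMachine(state, { type: "RESOLVE" });
console.log(state);
```

Note that the first `RESOLVE` is ignored: nothing has been loaded yet, so there is no transition for it in the `idle` state.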

So let’s pretend we went back in time, and you were given the same user story:

As a user, when I swipe up, there should be an Add Person button that adds a person to the group conversation.

Forget about edge-cases or potential bugs for now. Your only goal right now is to accomplish this task within a finite state machine, where you explicitly model the states the program can be in at any given moment in time.

Using a switch statement in a transition function that determines the next state given the current state and event, suppose that this state machine already exists:

function faceTimeMachine(state, event) {
  switch (state) {
    case "idle":
      switch (event.type) {
        case "CALL":
          // next state should be 'calling'
          return "calling";
        default:
          // event not handled; stay on same state
          return state;
      }
    case "calling":
      switch (event.type) {
        case "CALL_REJECTED":
          return "idle";
        case "CALL_ACCEPTED":
          return "callActive";
        default:
          return state;
      }
    case "callActive":
      switch (event.type) {
        case "END_CALL":
          return "idle";
        default:
          return state;
      }
    default:
      // should never reach here
      return state;
  }
}

If we were to visualize this as a state diagram, it would look something like this:

We can clearly see all the possible states in the FaceTime program, as well as which transitions happen between states on certain events. Each event is handled in the context of the current state, rather than in isolation; e.g., it doesn’t make sense to handle END_CALL if you’re not even in a call.

Okay, so we have our task at hand — create an Add Person button and handle what happens when it is tapped:

function onTapAddPerson(person) {
  // only allowed to send an event!
  // send a 'TAP_ADD_PERSON' event to the service
  faceTimeService.send({
    type: 'TAP_ADD_PERSON',
    person: person
  });
}

Where’s the code for adding the person to the group chat, and establishing an audio connection with that person? Well, because of our state machine, all that logic must be handled by the running “service” or whatever you want to call it (the thing that is interpreting and running the state machine). Remember: we’re not allowed to do any logic inside event/IO handlers besides dispatching events to a service that manages state. This is because the logic of our app is determined by both state and event, not just by event, and not just by state. That service knows, at all times, what the current state is, and it can receive events and handle them appropriately.
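The article never shows what such a “service” looks like, so here is one minimal way it could be sketched. The `createService` helper and the trimmed-down `callMachine` are hypothetical names for this illustration; this is not the API of XState or of any real FaceTime code. The service simply holds the current state and runs every incoming event through the transition function.

```javascript
// Hypothetical helper: wraps a transition function and tracks the current state.
function createService(machine, initialState) {
  let current = initialState;
  return {
    // The only way to affect the state is to send an event.
    send(event) {
      current = machine(current, event);
      return current;
    },
    getState() {
      return current;
    },
  };
}

// A trimmed-down stand-in for the FaceTime transition function.
function callMachine(state, event) {
  if (state === "idle" && event.type === "CALL") return "calling";
  if (state === "calling" && event.type === "CALL_ACCEPTED") return "callActive";
  if (state === "calling" && event.type === "CALL_REJECTED") return "idle";
  if (state === "callActive" && event.type === "END_CALL") return "idle";
  return state; // event not handled in this state
}

const service = createService(callMachine, "idle");
service.send({ type: "END_CALL" });      // not in a call: ignored
service.send({ type: "CALL" });          // idle -> calling
service.send({ type: "CALL_ACCEPTED" }); // calling -> callActive
console.log(service.getState());
```

Because `current` is private to the closure, event handlers have no way to change the state except through `send()`, which is the constraint described above.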

So now let’s handle that 'TAP_ADD_PERSON' event inside the existing FaceTime state machine instead of the event handler. The first question you’ll ask yourself is:

Which state is this event supposed to be handled in?

And that’s because the design of the state machine is such that you can’t just handle events arbitrarily; they must be handled within the context of the current state.

Looking through the states, you see an appropriate, obvious state: 'callActive'. You think to yourself, “well of course this event shouldn’t be handled unless we’re in an active call…”

// ...
case "callActive":
  switch (event.type) {
    case "TAP_ADD_PERSON":
      // execute side-effects: addPersonToGroupChat() also
      // establishes the audio connection with the added person
      addPersonToGroupChat(event.person);
      // stay in the same state
      return state;
    case "END_CALL":
      return "idle";
    default:
      return state;
  }
// ...

Guess what. You mitigated a major iOS FaceTime bug without even realizing it. Take a look at the updated state machine visualization and be proud of yourself:

You can see that it is mathematically impossible for the 'TAP_ADD_PERSON' event to be handled in any state other than 'callActive'; in other words, it’s impossible to establish an audio connection with a person unless the call is active. There is simply no transition that executes that side-effect unless you are in that state, and that is by explicit design.
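One way to convince yourself is to replay the steps of the bug against the machine. The sketch below is a compact, self-contained version of the transition function, with `addPersonToGroupChat` replaced by a hypothetical stub that only records who was connected (the `connected` array is an assumption for this illustration).

```javascript
// Stub side-effect (hypothetical): records who received an audio connection.
const connected = [];
function addPersonToGroupChat(person) {
  connected.push(person);
}

function faceTimeMachine(state, event) {
  switch (state) {
    case "idle":
      return event.type === "CALL" ? "calling" : state;
    case "calling":
      if (event.type === "CALL_ACCEPTED") return "callActive";
      if (event.type === "CALL_REJECTED") return "idle";
      return state; // TAP_ADD_PERSON lands here and does nothing
    case "callActive":
      if (event.type === "TAP_ADD_PERSON") {
        addPersonToGroupChat(event.person);
        return state;
      }
      return event.type === "END_CALL" ? "idle" : state;
    default:
      return state;
  }
}

// Replay the bug: start a call, then tap Add Person before it is answered.
let state = "idle";
state = faceTimeMachine(state, { type: "CALL" });
state = faceTimeMachine(state, { type: "TAP_ADD_PERSON", person: "me" });
const connectedBeforeAnswer = connected.length;

// Once the call is accepted, the same event is handled.
state = faceTimeMachine(state, { type: "CALL_ACCEPTED" });
state = faceTimeMachine(state, { type: "TAP_ADD_PERSON", person: "friend" });
console.log(connectedBeforeAnswer, connected);
```

Nobody is connected while the machine is still in `calling`; only the tap that arrives in `callActive` triggers the side-effect.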

That question, “Which state is this event supposed to be handled in?”, is guided by the design of the state machine; it is one that you might not have asked if you were writing this program without an explicit state machine.

I often emphasize that state machines are an extremely important part of designing software. They’re fundamental to computing itself, as all computers are essentially complex networks of inter-communicating state machines. And you are creating state machines in every bit of code that you write without even realizing it. The problem is that you are creating implicit state machines: the program states, and the events that might change those states, do exist, but the design of the state machine is scattered throughout your code base, hidden in event handlers, if-statements, and other abstractions.

If you imagine the states of your application as rooms in a house, an implicit state machine is like having doorways and thinking “Hmm… there really ought to be a door there.” When you have guests over, you can simply tell each one of them “don’t go into that room… er, just don’t go through that doorway” and hope that they listen.

Having an explicit state machine in this analogy means that each of these doorways has a door, and the only way to go from one room to the other is to open the door, if allowed (e.g., if you have a key to that door). This is the result of properly modeling your house and anticipating the flow of people from room to room.

Implicit state machines are dangerous because the application logic they imply is scattered around the code base, so it exists only in the developers’ heads and is never well specified. Nobody can quickly reference what states the application can be in, nor how the states can transition due to events, nor what happens when events occur in certain states, because that crucial information is just not readily available without studying and untangling the code in which it exists.

This means that bugs, such as the FaceTime bug, are hard to detect, because you need to know:

  • Every possible state the application can be in
  • Every possible event that can occur in the application
  • Every possible action/side-effect that can be executed when an event occurs in any given state

Without explicitly enumerating these states, events, and transitions, these are extremely difficult to determine without spending hours manually analyzing the code. Even worse, since there are few to no constraints on which actions can occur in response to events (unless you litter your code with if statements), a combinatorial explosion results, and it becomes infeasible to test every possible edge case in your code.

In our pseudocode, we have 3 possible states and 5 possible events. Without an explicit state machine, there are 3 × 5 = 15 different ways the events can interact with your app. With an explicit state machine, where events are strictly handled in certain states, that’s reduced to just 5. And the bug? It’s in one or more of those 15 possibilities.
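The arithmetic can be checked by enumerating the pairs directly. In this sketch the `handled` table is transcribed by hand from the transition function above (including `TAP_ADD_PERSON`); the variable names are only for illustration.

```javascript
const states = ["idle", "calling", "callActive"];
const events = ["CALL", "CALL_REJECTED", "CALL_ACCEPTED", "END_CALL", "TAP_ADD_PERSON"];

// Which events each state explicitly handles in the machine above.
const handled = {
  idle: ["CALL"],
  calling: ["CALL_REJECTED", "CALL_ACCEPTED"],
  callActive: ["END_CALL", "TAP_ADD_PERSON"],
};

// Every (state, event) pair that unconstrained handlers could run into.
const allPairs = states.flatMap((s) => events.map((e) => [s, e]));

// Pairs the explicit machine actually handles.
const handledPairs = allPairs.filter(([s, e]) => handled[s].includes(e));

// Pairs that are deliberately ignored, e.g. TAP_ADD_PERSON while "calling".
const ignoredPairs = allPairs.filter(([s, e]) => !handled[s].includes(e));

console.log(allPairs.length, handledPairs.length, ignoredPairs.length);
```

The remaining 10 pairs are exactly the cases an implicit design leaves undefined, and the `calling` plus `TAP_ADD_PERSON` pair is among them.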

The solution is simple: explicitly model the states, events, and transitions in your programs. This can be done at a higher level of abstraction, at a lower level, or both. The idea of using explicit state machines in your application is not new; it’s been around for half a century and is a core part of how mission-critical software has been working in many different industries.

It doesn’t matter what language you’re using. Search “state machine <language>” and you’ll find dozens of articles and techniques for creating and incorporating state machines into your software.

If you want to go deeper, there’s a hierarchical extended formalism for state machines called statecharts, created by David Harel in 1987. There’s even a W3C Spec called SCXML which describes a declarative way of specifying statecharts (albeit in XML… I know, I know).

On the front-end, I’ve created a library called XState which allows you to create, execute, and visualize state machines and statecharts in JavaScript and TypeScript. However, it needs to be said: you do not need any library to start modeling your applications using state machines.

For more resources on using state machines and statecharts in user interfaces, I recommend going to:

I hope that we can make software modeling a priority in all of our applications in 2019 and beyond, so we can avoid introducing “FaceTime bugs” and any other potentially severe (yet easily overlooked) bugs in the future.

P.S. Speaking of state-related bugs, has anyone else noticed how terrible Medium’s editor is? 🤐