Why Can a Machine Beat Mario but not Pokemon?

By Shayaan Jagtap

By now, you’ve probably heard of bots playing video games at superhuman levels. These bots can be programmed explicitly, reacting to set inputs with set outputs, or they can learn and evolve, reacting in different ways to the same inputs in hopes of finding the optimal responses.

A couple of famous examples are:

  • AlphaZero, a Chess bot that became the greatest player on Earth with 24 hours of training
  • AlphaGo, a Go bot that famously beat world class players Lee Sedol and Ke Jie
  • MarI/O, a Super Mario bot that could learn on its own to speed run any Mario level

These games are complex, and training these machines takes clever combinations of complicated algorithms, repeated simulations, and time. I want to focus on MarI/O and why we can’t use a similar approach to beat a game of Pokemon (watch the video in the link above if you are unfamiliar with how it works).

There are 3 key differences between Mario and Pokemon responsible for this:

  1. Number of Objectives
  2. Branching Factor
  3. Global vs Local Optimization

Let’s compare the games using each of these factors.

The way a machine learns is by optimizing some kind of objective function. Whether it’s maximizing a reward or fitness function (in Reinforcement Learning and Genetic Algorithms) or minimizing a cost function (in Supervised Learning), the goal is analogous: achieve the best score possible.

Mario has one objective: get to the end of the level. Simply put, the further right you get before dying, the better you did. This is your single objective function, and your model's ability can be measured directly with this one number.
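As a sketch of what that looks like, here is a toy fitness function in the spirit of MarI/O (the names and the 0.1-per-frame penalty are illustrative, not the bot's actual code): reward rightward progress, and lightly penalize elapsed time so faster runs score higher.

```python
def mario_fitness(max_x_reached: float, frames_elapsed: int) -> float:
    """Toy fitness function for a Mario-style agent.

    Rewards horizontal progress and lightly penalizes time, so a run
    that reaches the same point faster scores higher. Illustrative only.
    """
    distance_reward = max_x_reached        # further right is better
    time_penalty = 0.1 * frames_elapsed    # small cost per frame
    return distance_reward - time_penalty

# A run that got further in the same time scores higher.
slow_run = mario_fitness(max_x_reached=800, frames_elapsed=1200)
fast_run = mario_fitness(max_x_reached=950, frames_elapsed=1200)
assert fast_run > slow_run
```

The whole point is that this single scalar is enough to rank any two runs against each other, which is exactly what evolutionary training needs.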

Pokemon has… many. Let’s try to identify our objective. Is it to beat the Elite 4? To catch all the Pokemon? Train the strongest team? Is it all of the above or something else entirely? Just asking this question is strange, as the answer is probably some personally subjective combination of all of them.

Not only do we have to define the end goal, but also what progress toward it looks like, so that every action can be scored as a reward or a loss out of the many, many possible choices available at any one time.
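One common (if unsatisfying) workaround is to collapse the competing objectives into a single weighted score. The function and weights below are invented for illustration; picking those weights well is exactly the hard, subjective part the text describes.

```python
def pokemon_reward(badges: int, pokedex_caught: int, team_levels: list[int],
                   w_badges: float = 100.0, w_dex: float = 1.0,
                   w_level: float = 0.5) -> float:
    """Collapse several sub-objectives into one scalar reward.

    The weights are arbitrary: different weightings define different
    notions of 'playing well', which is the core of the problem.
    """
    return (w_badges * badges
            + w_dex * pokedex_caught
            + w_level * sum(team_levels))

# With these weights, one gym badge outweighs catching 50 Pokemon.
print(pokemon_reward(badges=1, pokedex_caught=0, team_levels=[]))   # 100.0
print(pokemon_reward(badges=0, pokedex_caught=50, team_levels=[]))  # 50.0
```

Change the weights and the "optimal" way to play the game changes with them, which is why there is no single objective function everyone would agree on.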

This leads us to the next topic.

Simply put, the branching factor is how many choices you can make at any step. In Chess, the average branching factor is 35; in Go, it’s 250. For every additional step into the future, you have (branching factor)^(steps) choices to evaluate.

In Mario, you either go left, right, jump, or do nothing. The number of choices the machine has to evaluate is small. And, the smaller the branching factor, the further ahead into the future the bot can afford to look, computationally.
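A quick calculation makes the gap concrete. The branching factors come from the text (four basic Mario inputs, 35 for Chess, 250 for Go); the look-ahead depth of 5 is just an example.

```python
# How many futures must be evaluated at a fixed look-ahead depth?
depth = 5  # look five steps ahead

for game, b in [("Mario", 4), ("Chess", 35), ("Go", 250)]:
    print(f"{game}: {b}^{depth} = {b**depth:,} positions")

# Mario: 4^5 = 1,024 positions
# Chess: 35^5 = 52,521,875 positions
# Go: 250^5 = 976,562,500,000 positions
```

Three extra orders of magnitude in the base become eight-plus orders of magnitude in the search space after only five steps.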

Pokemon is an open-world game, which means you have a lot of choices at any given time. Simply moving up, down, left, or right isn’t a useful measure of the branching factor. Instead, we look at the next meaningful action: is it entering a battle, talking to an NPC, or moving on to the next local area to the left, right, up, or down? The number of possible choices ranges from large to very large as you progress through the game.

Building a machine that can figure out the best set of choices requires it to consider its short and long term goals, which leads us to the final topic.

Local and global optimization can be thought of both spatially and chronologically. Short term goals and the immediate geographic area are considered local, whereas long term goals and relatively larger areas such as cities or even the whole map are considered global.

Breaking down each run into its constituent parts is one way to decompose the Pokemon problem into bite-sized chunks. Optimizing locally to get from point A to point B in one area is easy, but deciding which destination is the best point B is a much harder problem. Greedy algorithms fail here, since locally optimal decisions do not necessarily lead to a global optimum.
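Here is a toy illustration of that failure (the map and reward values are made up): a greedy agent that always moves toward the highest immediate reward misses a route with a far better total payoff.

```python
# Toy route-planning map: nodes carry rewards, edges are possible moves.
rewards = {"A": 0, "B": 5, "C": 1, "D": 2, "E": 50}
edges = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": [], "E": []}

def greedy_path(start: str) -> list[str]:
    """Always step to the neighbor with the highest immediate reward."""
    path = [start]
    while edges[path[-1]]:
        path.append(max(edges[path[-1]], key=rewards.get))
    return path

def best_path(start: str) -> list[str]:
    """Exhaustively search all paths and keep the best total reward."""
    if not edges[start]:
        return [start]
    candidates = [[start] + best_path(n) for n in edges[start]]
    return max(candidates, key=lambda p: sum(rewards[x] for x in p))

print(greedy_path("A"))  # ['A', 'B', 'D'] -> total reward 7
print(best_path("A"))    # ['A', 'C', 'E'] -> total reward 51
```

The greedy agent takes the tempting 5-point step to B and ends up with 7; the globally optimal route passes through the unappealing 1-point node C to reach the 50-point node E. Exhaustive search fixes this on a five-node map, but it is exactly what the branching factor makes infeasible at Pokemon's scale.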

Mario maps are small and linear. Pokemon maps are large, intricate, and non-linear. Your top priority changes over time in pursuit of higher order goals, and translating global objectives into prioritized local optimization problems is no easy task. This is just not something our current models are equipped to handle.

From the perspective of the bots, Pokemon is not just one game. Bots are specialists, and the bot that helps you move around the map is powerless to help when you encounter an NPC who wants to battle. From their perspective, these are two completely different tasks.

When in battle, every turn presents dozens of options. Choosing which move to use, which Pokemon to swap in, and when to use items is a complex optimization problem all by itself. After looking around, I found this article where someone explains their process for building a battle simulator. It’s well thought out and incredibly complex, and it still doesn’t consider item usage, a key factor in determining battle outcomes.
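To give a flavor of just one slice of that problem, here is a heavily simplified move chooser based on expected damage (power times hit chance). The numbers are simplified stand-ins, and a real evaluator would also need type matchups, stats, status effects, switching, and item usage.

```python
def expected_damage(power: float, accuracy: float) -> float:
    """Expected damage of a move: payoff weighted by hit probability."""
    return power * accuracy

# Simplified movepool: a reliable weak move vs. risky strong ones.
moves = {
    "Tackle":     {"power": 40,  "accuracy": 1.00},  # 40 expected
    "Thunder":    {"power": 110, "accuracy": 0.70},  # 77 expected
    "Hydro Pump": {"power": 110, "accuracy": 0.80},  # 88 expected
}

best = max(moves, key=lambda m: expected_damage(**moves[m]))
print(best)  # Hydro Pump
```

Even this one-line risk/payoff trade-off already requires a modeling decision (maximize expected value? avoid variance when ahead?), and it is the smallest piece of the battle sub-game.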

For now, we can rejoice in the fact that we can build bots that are better at our own games than we are. These games are mathematically complex, but simple in objective. As advancements in AI are made, we will create machines capable of solving increasingly impactful real-world problems, all through their own learning of complex optimization problems. Rest assured that there are still things we are better at, including our childhood games, at least for now. Thanks for reading!