An Analysis of How Deepmind’s Starcraft 2 AI’s Superhuman Speed Is Probably a Band-Aid Fix for the Limitations of Imitation Learning

By Aleksi Pietikäinen

As you have probably heard by now, an AI called AlphaStar, developed by Google Deepmind, has recently beaten human professionals at the real-time strategy game Starcraft 2. This is an unprecedented feat in the field of AI. However, I do have some constructive criticism about the way they did it.

I will try to make a convincing argument for the following:

  1. AlphaStar played with superhuman speed and precision.
  2. Deepmind claimed to have restricted the AI from performing actions that would be physically impossible for a human. They have not succeeded in this, and they are most likely aware of it.
  3. The reason AlphaStar performs at superhuman speeds is most likely its inability to unlearn human players’ tendency to spam click. I suspect Deepmind wanted to restrict it to a more human-like performance but was simply not able to. It’s going to take us some time to work our way to this point, but it is the whole reason I’m writing this, so I ask for your patience.

First of all, I want to clarify that I am a layman. I’ve been following AI development and the Starcraft 2 scene for years, but I do not claim to be an expert in either topic. If you notice any misconceptions in what I’m about to write, please do point them out. I’m only a fanboy, and all of this is incredibly fascinating to me. This essay will contain a lot of speculation, and I admit that I can’t prove all of my core claims definitively. Having said that, if you are kind enough to read all of this and disagree with me, please argue in good faith. I would love to be proven wrong.

Lastly, I want to emphasize that I do find AlphaStar to be an amazing achievement. It is, in my opinion, Deepmind’s greatest feat to date, and I’m eagerly waiting to see how it continues to improve. Thank you for your patience. Ok, here we go.

David Silver, the Co-Lead of the AlphaStar team: “AlphaStar can’t react faster than a human player can, nor can it execute more clicks than a human player”.
Here is the co-lead of the AlphaStar team giving us their mission statement.

The Starcraft 2 scene was dominated in 2018 by a player called Serral. He is the current world champion and won 7 of the 9 major tournaments he attended that year, resulting in the single most dominant run of any Starcraft 2 player in the history of the game. This guy is fast. Maybe the fastest player in the world.

Here’s a first person view of him playing:

Serral is the player in the pinkish-white color. Take a look at his APM, displayed in the upper-left corner of the screen. APM is short for actions per minute: a number representing how fast the player is clicking with his mouse and keyboard. At no point is Serral able to sustain more than 500 APM for long. There is a burst of 800 APM, but it only lasts a fraction of a second and most likely results from spam clicking, which I will discuss shortly.

While the arguably fastest human player can sustain an impressive 500 APM, AlphaStar had bursts going up to 1500+. These inhuman 1000+ APM bursts sometimes lasted for 5-second stretches and were full of meaningful actions. 1500 actions per minute translates to 25 actions a second, which is physically impossible for a human. I also want you to take into account that in a game of Starcraft, 5 seconds is a long time, especially at the very beginning of a big battle. If superhuman execution during the first 5 seconds gives the AI the upper hand, it will win the engagement by a large margin because of the snowball effect. Here’s an engagement from AlphaStar in game 3 vs Mana:

AlphaStar is able to sustain 1000+ APM over a period of 5 seconds. Another engagement in game 4 had bursts going up to a dizzying 1500+ APM:

One of the commentators points out that the average APM is still acceptable, but it is quite clear that the sustained bursts are far beyond what a human could do.
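
To make these numbers concrete, here is a minimal sketch of how burst APM can be measured from a list of action timestamps. The replay parsing itself is out of scope, and the sample timestamps below are hypothetical:

```python
from bisect import bisect_left

def burst_apm(timestamps, window=5.0):
    """Highest APM sustained over any `window`-second span.

    `timestamps` is a sorted list of action times in seconds.
    """
    best = 0.0
    for i, t in enumerate(timestamps):
        # Count actions that fall inside [t, t + window).
        j = bisect_left(timestamps, t + window)
        best = max(best, (j - i) / window * 60.0)
    return best

# 25 actions per second sustained for 5 seconds -> 1500 APM.
actions = [i * 0.04 for i in range(125)]
print(burst_apm(actions))  # 1500.0
```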

Most human players have a tendency to spam click. Spam clicks are exactly what they sound like: meaningless clicks that don’t have an effect on anything. For example, when a human player moves their army, curiously enough, they click more than once on the spot where they want the army to go. What effect does this have? None. The army won’t walk any faster; a single click would have done the job just as well. Why do they do it, then? There are two reasons:

  1. Spam clicking is the natural by-product of a human being trying to click around as fast as possible.
  2. It helps to warm up finger muscles.

Remember Serral, the player we talked about earlier? The impressive thing about him is actually not how fast he clicks but how precise he is. Not only does Serral possess a really high APM (total clicks per minute, including spam clicks) but also a ridiculously high effective APM (total clicks per minute, excluding spam clicks). I will abbreviate effective APM as EPM from this point onwards. The important thing to remember is that EPM only counts meaningful actions.
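
There is no single official formula for what counts as “effective”; community tools use heuristics. A toy heuristic of my own (not the exact rule any real EPM calculator uses) is to discard an action when it repeats the same command within a short time window:

```python
def effective_actions(actions, repeat_window=0.5):
    """Filter out spam: identical commands repeated in quick succession.

    `actions` is a list of (time_seconds, command) tuples sorted by time.
    """
    kept = []
    last_counted = {}  # command -> time it was last counted
    for t, cmd in actions:
        if cmd in last_counted and t - last_counted[cmd] < repeat_window:
            continue  # same command again within the window: spam
        last_counted[cmd] = t
        kept.append((t, cmd))
    return kept

# Three rapid identical move commands collapse into one effective action.
sample = [(0.00, "move(50,60)"), (0.08, "move(50,60)"),
          (0.15, "move(50,60)"), (2.00, "attack(70,80)")]
print(len(effective_actions(sample)))  # 2 effective actions out of 4
```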

Take a look at how a former pro player lost his mind on Twitter after discovering Serral’s EPM:

Serral’s EPM of 344 is practically unheard of. It is so high that to this day I have a hard time believing it to be true. The differentiation between APM and EPM has implications for AlphaStar as well. If AlphaStar can potentially play without spam, wouldn’t this mean that its peak EPM could at times equal its peak APM? This makes the 1000+ spikes even more inhuman. When we also take into consideration that AlphaStar plays with perfect accuracy, its mechanical capabilities seem downright absurd. It always clicks exactly where it intends to. Humans misclick. AlphaStar might not play with its foot on the gas pedal all the time, but when it truly matters, it can execute 4 times faster than the fastest player in the world, with accuracy that a human pro can only dream of.

There is a clear, almost unanimous consensus within the Starcraft 2 scene that AlphaStar performed sequences that no human could ever hope to replicate. It was faster and more precise than is physically possible. The most mechanically impressive human pro in the world is several times slower, and the accuracy can’t even be compared.

David Silver’s claim that AlphaStar can only perform actions that a human player is able to replicate is simply not true.

Oriol Vinyals, the Lead Designer of AlphaStar: It is important that we play the games that we created and collectively agreed on by the community as “grand challenges”. We are trying to build intelligent systems that develop the amazing learning capabilities that we possess, so it is indeed desirable to make our systems learn in a way that’s as “human-like” as possible. As cool as it may sound to push a game to its limits by, for example, playing at very high APMs, that doesn’t really help us measure our agents’ capabilities and progress, making the benchmark useless.

Why is Deepmind interested in restricting the agent to play like a human? Why not just let it run wild with no limitations? The reason is that Starcraft 2 is a game that can be broken by mechanical perfection. In this video, a bot attacks a group of tanks with some zerglings, implementing perfect micro. Normally the zerglings would not be able to do much against the tanks, but thanks to the bot’s perfect micro they become much more deadly and destroy the tanks with minimal losses. When unit control is this good, the AI doesn’t even need to learn strategy. Deepmind is not necessarily interested in creating an AI that can simply beat Starcraft pros; rather, they want to use this project as a stepping stone in advancing AI research as a whole. It is deeply unsatisfying to have prominent members of this research project make claims of human-like mechanical limitations when the agent is very obviously breaking them and winning its games specifically because it demonstrates superhuman execution.

AlphaStar is able to outperform human players with unit control that was never taken into consideration when the game developers were carefully balancing the game. This inhuman control can obfuscate any strategic thinking the AI has learned. It can even make strategic thinking completely unnecessary. This is not the same thing as being stuck in a local maximum. When the game is played with inhuman speed and accuracy, abusing superior control is very likely the most effective way to play the game, as disappointing as that sounds.

This is what one of the pros who played AlphaStar had to say about the AI’s strengths and weaknesses after losing to it with a score of 1–5:

MaNa: I would say that clearly the best aspect of its game is the unit control. In all of the games when we had a similar unit count, AlphaStar came victorious. The worst aspect from the few games that we were able to play was its stubbornness to tech up. It was so convinced to win with basic units that it barely made anything else and eventually in the exhibition match that did not work out. There weren’t many crucial decision making moments so I would say its mechanics were the reason for victory.

There’s an almost unanimous consensus among Starcraft fans that AlphaStar won almost purely because of its superhuman speed, reaction time, and accuracy. The pros who played against it seem to agree. There was a member of the Deepmind team who played against AlphaStar before they let the pros test it; most likely he would agree with the assessment as well. David Silver and Oriol Vinyals keep repeating the mantra that AlphaStar is only able to do things that a human could do as well, but as we have already seen, that is simply not true.

This does not sound like doing things the right way.

Are you sure about that, David?

Something about this is really sketchy.

Now we finally get to the meat and potatoes of this essay. Thank you for sticking with me for this long. First let’s recap.

  • We know what APM, EPM and spam clicking are.
  • We have a rudimentary understanding of what the upper limits of human play look like.
  • We understand that AlphaStar’s gameplay directly contradicts what the developers claim it was allowed to execute.
  • We understand that the consensus within the Starcraft 2 scene is that AlphaStar won its games through superhuman army control, and that superior strategic thinking wasn’t even needed.
  • We understand that Deepmind’s goal is not to create a bot that only microes really well or abuses the game in ways it was never meant to be played.
  • It is incredibly unlikely that no one on Deepmind’s Starcraft AI team questioned whether a burst APM of 1500+ is possible for a human player to replicate. Their Starcraft expert probably knows more about the game than I do. They are working closely with Blizzard, the company that owns the Starcraft IP. It is in their interest (see the previous bullet point and the mission statements from David Silver and Oriol Vinyals quoted earlier in this essay) to make the bot act as close to a human as possible.

Taking all those points into account, why would Deepmind ever allow their AI to perform clearly beyond the limitations of a human body? David Silver and Oriol Vinyals keep hammering home the point that AlphaStar can’t do anything a human couldn’t replicate, but we have seen that this is simply not true.

Here’s what I suspect happened:

This is pure speculation on my part, and I don’t claim to know for sure that this is what happened. At the very start of the project, Deepmind agrees upon heavy APM restrictions for AlphaStar. At this point the AI is not allowed the superhuman bursts of speed we saw in the demonstration. If I were designing these restrictions, they would probably look something like the following (a rough code sketch follows the list):

  • Maximum average APM over the span of a whole game.
  • Maximum burst APM over a short period of time. I think capping it around 4–6 clicks per second would be reasonable. Remember Serral and his 344 EPM that was head and shoulders above his competitors? That is less than 6 clicks per second. The version of AlphaStar that played against Mana was able to perform 25 clicks per second over sustained periods of time. This is so much faster than even the fastest spam clicks a human can do that I don’t think the original restrictions allowed for it.
  • Minimum time between clicks. Even if the bot’s speed bursts were capped, it could still issue several near-instantaneous actions back to back within whatever time slice it was occupying, which would still be inhuman. A human being obviously could not do this.
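
Here is the rough sketch I promised. Every threshold below is my own guess for illustration, not a value Deepmind has published:

```python
from collections import deque

class HumanlikeThrottle:
    """Toy limiter enforcing the three restrictions proposed above."""

    def __init__(self, avg_apm=300, burst_actions=6, burst_window=1.0,
                 min_gap=0.05):
        self.avg_apm = avg_apm              # whole-game average cap
        self.burst_actions = burst_actions  # max actions per burst window
        self.burst_window = burst_window    # burst window length (seconds)
        self.min_gap = min_gap              # minimum seconds between actions
        self.recent = deque()               # timestamps inside the burst window
        self.total = 0                      # actions issued so far
        self.last = None                    # time of the previous action

    def allow(self, t):
        """Return True if an action at game time `t` (seconds) may pass."""
        if self.last is not None and t - self.last < self.min_gap:
            return False  # two clicks too close together
        while self.recent and t - self.recent[0] >= self.burst_window:
            self.recent.popleft()
        if len(self.recent) >= self.burst_actions:
            return False  # burst cap reached
        if t > 0 and (self.total + 1) * 60.0 / t > self.avg_apm:
            return False  # would push the whole-game average over the cap
        self.recent.append(t)
        self.total += 1
        self.last = t
        return True
```

An agent behind this throttle would call `allow` for every action it wants to issue and have anything rejected simply dropped, which is the same enforcement mechanism Deepmind describes later in this essay.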

Some people would argue for adding a random element to accuracy as well, but I suspect that it would hinder training progress far too much.

Next, Deepmind downloads thousands of high-ranking amateur games and begins imitation learning. At this stage the agent is simply trying to imitate what humans do in games, and it adopts the habit of spam clicking. This is highly likely because human players spam click so much: it is almost certainly the single most repeated pattern of action humans perform, and thus would root itself very deeply in the agent’s behavior.

AlphaStar’s maximum burst APM has initially been restricted to roughly how fast a human spam clicks. Because most of the actions AlphaStar executes are spam clicks, it does not have the APM available to experiment in fights. If the agent doesn’t experiment, it won’t learn. Here’s what one of the developers said in an AMA yesterday; I think he tipped his hand a little:

Oriol Vinyals, the Lead Designer of AlphaStar: Training an AI to play with low APM is quite interesting. In the early days, we had agents trained with very low APMs, but they did not micro at all.

In order to speed up development, they change the APM restrictions to allow high bursts. Here are the APM restrictions AlphaStar was playing under in the demonstration:

Oriol Vinyals, the Lead Designer of AlphaStar: In particular, we set a maximum of 600 APMs over 5 second periods, 400 over 15 second periods, 320 over 30 second periods, and 300 over 60 second period. If the agent issues more actions in such periods, we drop / ignore the actions. These were values taken from human statistics.

At first glance this looks reasonable to someone with a shallow understanding of Starcraft, but it allows for the superhuman bursts of speed we discussed earlier, as well as the superhuman mouse precision.
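
To see why, read each cap as an action budget for its window: nothing in the rule says the budget has to be spent evenly. A quick back-of-the-envelope check, assuming the caps work as per-window budgets the way Vinyals describes:

```python
# Each cap translates into an action budget for its window:
# APM / 60 * window_seconds = allowed actions per window.
caps = {5: 600, 15: 400, 30: 320, 60: 300}
budgets = {w: apm / 60 * w for w, apm in caps.items()}
print(budgets)  # {5: 50.0, 15: 100.0, 30: 160.0, 60: 300.0}

# Spend the 5-second budget of 50 actions inside just 2 seconds and the
# instantaneous rate is still legal under every cap above:
print(50 / 2 * 60)  # 1500.0 APM
```

Fifty actions crammed into 2 seconds is exactly the kind of 1500 APM burst we saw in the games, and it never trips a single one of the published caps.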

There’s a limit to how fast a human can spam click. The most typical form of spam clicking is issuing a movement or attack command to a unit, done by clicking a place on the map with your mouse. Try clicking your mouse as fast as you can: that is the kind of spam clicking the agent learnt. It would not click any faster, because the humans it imitates do not click any faster. The extra APM above that rate, the headroom that allows superhuman speeds, can be considered “free” APM which it can experiment with.

The free APM is used to experiment in engagements, and this kind of interaction would happen often during training. AlphaStar starts to learn new kinds of behavior that lead to better outcomes, and it begins to break away from the constant spam clicking.

If the agent learned genuinely useful actions, why didn’t Deepmind go back to the speculated initial, harsher, more humanlike APM limitations? Surely they must have realized that their AI was performing superhuman actions. The Starcraft community is in near-unanimous agreement that AlphaStar had superhuman micro. The human pros said in the AMA that AlphaStar’s greatest strength was its unit control and its greatest weakness its strategic thinking. The Starcraft people within Deepmind’s team must have been thinking the same. The reason is probably that the agent still occasionally displays spam clicking. Even though it seems able to execute crisply with very little spam most of the time, it still regularly engages in spam clicking. This is apparent in game 1 against Mana, when AlphaStar is moving up the ramp:

Look closely for the blue circle animation thingy.

The agent was spam clicking movement commands at 800 APM. It still had not completely unlearned the habit, even though it is useless and eats into its APM budget. The spam clicking would hurt the agent most during big engagements, and the APM cap was probably tinkered with to allow it to perform well even in those.

So there you have it. I suspect that the agent was not able to unlearn the spam clicking it picked up from imitating human players, and Deepmind had to tinker with the APM cap to allow experimentation. This had the unfortunate side effect of superhuman execution, which resulted in the agent essentially breaking the game by executing strategies that were never intended to be possible in the first place.

I care about this because the way Deepmind beat the human pros directly contradicts their mission statement and what they repeatedly claimed to be the “right way” to do it. What leaves the sourest taste in my mouth is this image:

It seems designed to mislead people unfamiliar with Starcraft 2 by portraying AlphaStar’s APM as reasonable. Look at Mana’s APM and compare it to AlphaStar’s. While Mana’s mean is higher, AlphaStar’s tail goes way above what any human is capable of with any kind of intent or precision. Notice how Mana’s peak APM is around 750 while AlphaStar’s is above 1500. Now take into account that over 50% of Mana’s 750 consists of spam clicks, while AlphaStar’s actions are all deliberate and perfectly accurate.

Now take a look at TLO’s APM. The tail goes up into the 2000s. Think about that for a second. How is that even possible? It is made possible by a trick called rapid fire. TLO is not clicking super fast; he is holding down a button, and the game registers this as 2000 APM. The only thing you can do with rapid fire is spam a spell. That’s it. TLO just overuses it for some reason. The neat little effect this has is that TLO’s upper tail masks AlphaStar’s burst APM, making it look reasonable to people who are not familiar with Starcraft.

Deepmind’s blog post makes no attempt to explain TLO’s absurd numbers. If they don’t explain TLO’s funky numbers, they should not include them in the graph. Period.

This is getting dangerously close to lying with statistics. Deepmind has to be held to a higher standard than this.