An AI crushed two human pros at StarCraft—but it wasn’t a fair fight
By Timothy B. Lee
DeepMind, the AI startup Google acquired in 2014, is probably best known for creating the first AI to beat a world champion at Go. So what do you do after mastering one of the world's most challenging board games? You tackle a complex video game. Specifically, DeepMind decided to write an AI to play the real-time strategy game StarCraft II.
StarCraft requires players to gather resources, build dozens of military units, and use them to try to destroy their opponents. StarCraft is particularly challenging for an AI because players must carry out long-term plans over several minutes of gameplay, tweaking them on the fly in the face of enemy counterattacks. DeepMind says that prior to its own effort, no one had come close to designing a StarCraft AI as good as the best human players.
Last Thursday, DeepMind announced a significant breakthrough. The company pitted its AI, dubbed AlphaStar, against two top StarCraft players—Dario "TLO" Wünsch and Grzegorz "MaNa" Komincz. AlphaStar won a five-game series against Wünsch 5-0, then beat Komincz 5-0, too.
AlphaStar may be the strongest StarCraft AI ever created. But it wasn't as big an accomplishment as it might appear at first glance, because it wasn't an entirely fair fight.
AlphaStar was trained using "up to 200 years" of virtual gameplay
I'll cop to not fully understanding how all of this works. DeepMind declined to talk to me for this story, and the company has yet to publish a peer-reviewed paper explaining exactly how AlphaStar works. But DeepMind does explain in some detail how it trained its virtual StarCraft players to get better over time.
The process started by using supervised learning to help agents learn to mimic the strategies of human players. This imitation-based training was sufficient to build a competent StarCraft II bot. DeepMind says that this initial agent "defeated the built-in Elite level AI—around gold level for a human player—in 95% of games."
DeepMind then branched this initial AI into multiple variants, each with a slightly different playing style. All of these agents were thrown into a virtual StarCraft league, with each agent playing others around the clock, learning from their mistakes, and evolving their strategies over time.
"To encourage diversity in the league, each agent has its own learning objective: for example, which competitors should this agent aim to beat, and any additional internal motivations that bias how the agent plays," DeepMind writes. "One agent may have an objective to beat one specific competitor, while another agent may have to beat a whole distribution of competitors, but do so by building more of a particular game unit."
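DeepMind hasn't published the full training details, but the league it describes resembles population-based self-play with per-agent matchmaking objectives. Here is a minimal toy sketch of that idea (every number, the scalar "skill", and the rival-set objectives below are my own illustrative assumptions, not AlphaStar's actual mechanics):

```python
import random

random.seed(0)

# Eight agent variants branched from one imitation-learned seed agent.
N_AGENTS = 8
skill = [1.0] * N_AGENTS

# Each agent's "learning objective" biases whom it plays against:
# here, a fixed subset of rivals it is most motivated to beat.
rivals = [random.sample([j for j in range(N_AGENTS) if j != i], 3)
          for i in range(N_AGENTS)]

def play(i, j):
    """Toy match: the higher-skill agent wins proportionally more often."""
    return i if random.random() < skill[i] / (skill[i] + skill[j]) else j

for _ in range(20_000):           # league games running around the clock
    i = random.randrange(N_AGENTS)
    j = random.choice(rivals[i])  # pick an opponent from i's objective set
    loser = j if play(i, j) == i else i
    skill[loser] *= 1.0005        # losers "learn from their mistakes"

# Pick the five strongest agents to face the human challengers.
top_five = sorted(range(N_AGENTS), key=lambda k: -skill[k])[:5]
print(top_five)
```

In this toy, biased matchmaking keeps each agent facing the rivals its objective names, and the selection at the end mirrors DeepMind's choice of five agents for the exhibition matches.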
According to DeepMind, some agents gained the equivalent of 200 years of practice playing StarCraft against other agents. Over a two-week period, this Darwinian process improved the average skill of the agents dramatically.
At the end of this process, DeepMind selected five of the strongest agents from its virtual menagerie to face off against AlphaStar's human challengers. One consequence of this approach was that the human players faced a different opposing strategy in each game they played against AlphaStar.
AlphaStar had an unfair advantage in its initial games
Last week DeepMind invited two professional StarCraft players and announcers to provide commentary as they replayed some of AlphaStar's 10 games against Wünsch and Komincz. The commentators were blown away by AlphaStar's "micro" capabilities—the ability to make quick tactical decisions in the heat of battle.
This ability was most obvious in Game 4 of AlphaStar's series against Komincz. Komincz was the stronger of the two human players AlphaStar faced, and Game 4 was the closest Komincz came to winning during the five-game series. The climactic battle of the game pitted a Komincz army composed of several different unit types (mostly Immortals, Archons, and Zealots) against an AlphaStar army composed entirely of Stalkers.
Stalkers don't have particularly strong weapons and armor, so they'll generally lose against Immortals and Archons in a head-to-head fight. But Stalkers move fast, and they have a capability called "blink" that allows them to teleport a short distance.
That created an opportunity for AlphaStar: it could attack with a big group of Stalkers, have the front row of Stalkers take some damage, and then blink them to the rear of the army before they got killed. Stalker shields gradually recharge, so by continuously rotating their troops, AlphaStar was able to do a lot of damage to the enemy while losing very few of its own units.
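A toy simulation gives a feel for why this rotation is so lethal. The numbers below are made up for illustration and are not real StarCraft II unit stats; the point is only that blinking shield-depleted units to the rear, where their shields recharge, can cut losses dramatically:

```python
from collections import deque

def battle(blink, ticks=300, n_units=20, shield=80, hp=80,
           dmg=10, regen=2):
    # Each unit is a [shield, hp] pair; the leftmost unit is under fire.
    army = deque([shield, hp] for _ in range(n_units))
    losses = 0
    for _ in range(ticks):
        if not army:
            break
        front = army[0]
        absorbed = min(front[0], dmg)    # shields soak damage first
        front[0] -= absorbed
        front[1] -= dmg - absorbed       # leftover damage hits health
        if front[1] <= 0:
            army.popleft()               # unit destroyed
            losses += 1
        elif blink and front[0] == 0:
            army.append(army.popleft())  # blink the hurt unit to the rear
        for unit in list(army)[1:]:      # shields recharge out of combat
            unit[0] = min(shield, unit[0] + regen)
    return losses

print("losses without blink:", battle(blink=False))
print("losses with blink:   ", battle(blink=True))
```

With these made-up numbers, the non-blinking army steadily bleeds units while the rotating army loses none, because every unit returns to the front with a full shield.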
The downside of this approach is that it demands constant player attention. The player needs to monitor the health of Stalkers to figure out which ones need to blink away. And that can get tricky, because a StarCraft player often has a lot of other stuff on his plate—he needs to worry about building new units in his base, scouting for enemy bases, watching for enemy attacks, and so forth.
Commentators watching the climactic Game 4 battle between AlphaStar and Komincz marveled at AlphaStar's micro abilities.
"We keep seeing AlphaStar do that trick that you're talking about," said commentator Dan Stemkoski. AlphaStar would attack Komincz's units and "then blink away" before taking significant damage. "I feel like most pros would have lost all these stalkers by now," he added.
AlphaStar's performance was particularly impressive because at some points it was using this tactic with multiple groups of Stalkers in different locations.
"It's incredibly difficult to do this in a game of StarCraft II, where you micro units on the south side of your screen, but at the same time you also have to do it on the north side," said commentator Kevin "RotterdaM" van der Kooi. "This is phenomenally good control."
"What's really kind of shocking about this is we went over the actions per minute, and it's not really that high," added Stemkoski. "It's an acceptable pro level of speed coming out of AlphaStar."
DeepMind produced a graphic that illustrates the point. Top StarCraft players can issue instructions to their units very quickly: Grzegorz "MaNa" Komincz averaged 390 actions per minute (more than six actions per second!) over the course of his games against AlphaStar. But of course, a computer program can easily issue thousands of actions per minute, allowing it to exert a level of control over its units that no human player could match.
To avoid that, DeepMind says it put a hard cap on the number of actions per minute AlphaStar could make. "We set a maximum of 600 APMs over 5-second periods, 400 over 15-second periods, 320 over 30-second periods, and 300 over 60-second period," wrote DeepMind researcher Oriol Vinyals in a reddit AMA following the demonstration.
But as other redditors quickly pointed out, five seconds is a long time in a StarCraft game. Because each cap limits only the total number of actions within its window, AlphaStar could take 50 actions in a single second, or 15 actions per second for three seconds.
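The redditors' arithmetic is easy to check. Treating each cap as a total budget over its window (my reading of Vinyals' description, not DeepMind's actual enforcement code):

```python
# The windowed APM caps Vinyals described, as (window_seconds, max_apm).
caps = [(5, 600), (15, 400), (30, 320), (60, 300)]

def window_budget(window_seconds, apm):
    # A cap of `apm` actions per minute over a window limits the
    # TOTAL number of actions allowed anywhere inside that window.
    return apm * window_seconds / 60

for window, apm in caps:
    print(f"{apm} APM over {window}s -> {window_budget(window, apm):.0f} actions total")

# The 5-second window permits 50 actions, and nothing requires them to
# be spread out: all 50 could land in a single second...
assert window_budget(5, 600) == 50
# ...or AlphaStar could sustain 15 actions/second for 3 seconds
# (45 actions) and stay under the budget.
assert 15 * 3 <= window_budget(5, 600)
```

A cap defined over a window, in other words, constrains the average rate but not the burst rate within that window.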
More importantly, AlphaStar has the ability to make its clicks with surgical precision using an API, whereas human players are constrained by the mechanical limits of computer mice. And if you watch a pro like Komincz play, you'll see that the number of raw actions often far exceeds the number of meaningful actions.
For example, if a human player is guiding a single unit on an important mission, he will often issue a series of "move" commands along the unit's current trajectory. Each command barely changes the unit's path, but once the player has selected the unit, each extra click takes hardly any time. Most of these commands aren't strictly necessary; an AI like AlphaStar could easily figure out the unit's optimal route and then issue a much smaller number of move commands to achieve the same result.
So limiting the raw number of actions an AI can take to that of a typical human does not necessarily mean that the number of meaningful actions will be remotely comparable.
On top of that, the API used by AlphaStar in its initial games gave it a godlike view of the entire battlefield (albeit only those portions of the battlefield within range of one of AlphaStar's units). If a human player wants to take actions in two different parts of the board, he has to first take the extra step of moving the camera to the new location.
Forcing AlphaStar to use a camera helped level the playing field
To its credit, DeepMind was cognizant of this issue. So after playing back some of AlphaStar's back-to-back 5-0 victories over StarCraft pros, the company staged a final live match between AlphaStar and Komincz. This match used a new version of AlphaStar with an important new limitation: it was forced to use a camera view that tried to simulate the limitations of the human StarCraft interface. The new interface only allowed AlphaStar to see a small portion of the battlefield at once, and it could only issue orders to units that were in its current field of view.
DeepMind had several weeks to train this new version of AlphaStar, but last week it still seemed significantly weaker than the version Komincz played the month before.
In the early minutes of the live exhibition game, Komincz held his own, easily parrying AlphaStar's attacks. Then he launched a devious counterattack.
Komincz loaded two powerful Immortal units into a transport ship called a Warp Prism and flew them into the back of AlphaStar's base, where fragile probes were gathering the minerals that power AlphaStar's war machine. He dropped the Immortals into the base and began blasting away at the probes.
AlphaStar had—once again—built a large army of Stalkers, which it immediately dispatched to defend the probes. But before the Stalkers could get into range of the Immortals, Komincz loaded them back into the Warp Prism and zoomed them out of range. Once the Warp Prism was gone, AlphaStar sent its Stalker army back out toward Komincz's base.
Komincz then repeated the gambit: drop the Immortals, destroy a couple of probes, then pick the Immortals up again just before the Stalkers arrive. He did it again. And again. As he did this, AlphaStar's Stalker army wasted precious seconds marching back and forth indecisively.
"This is what I'm used to seeing when humans battle AIs," Stemkoski said, as Komincz dropped the Immortals in AlphaStar's base for the third time. "You're finding something that they're doing that's a mistake and you're making them do it over, and over, and over."
While the Warp Prism gambit kept AlphaStar's large Stalker army occupied, Komincz amassed a large army of his own. He marched it straight into AlphaStar's base and attacked the probes gathering resources for AlphaStar's war machine. AlphaStar harassed Komincz's army with its Stalkers, but Komincz kept his forces together, and AlphaStar wasn't able to stop him before he did crippling damage to its base.
We don't know exactly why Komincz won this game after losing the previous five. It doesn't seem like the limitation of the camera view directly explains AlphaStar's inability to respond effectively to the drop attack from the Warp Prism.
But a reasonable conjecture is that the limitations of the camera view degraded AlphaStar's performance across the board, preventing it from producing units quite as effectively or managing its troops with quite the same deadly precision in the opening minutes. That may have given Komincz enough breathing room to find a weakness in AlphaStar's strategic understanding and exploit it.
Ultimately, last week's presentation raised more questions than it answered. DeepMind says that it has seen significant improvements in AlphaStar's performance in as little as a week of training. If that progress continues, AlphaStar may regain a decisive advantage over the best human players even with the new limitations on its API.
On the other hand, it's not yet clear if the camera restrictions introduced last week are sufficient to make the fight truly fair. AlphaStar is still accessing information about the game through a specialized API that's different from the rendering shown to human players. This API may allow the software to glean more information and issue instructions more quickly and precisely than would be possible for a human player.
The ultimate way to level the playing field would be to make AlphaStar use the exact same user interface as human players. The interface could be virtualized, of course, but AlphaStar should get the same raw pixel inputs as a human player and should be required to issue instructions as a sequence of mouse movements and keystrokes—with inputs limited to speeds that human hands can achieve. This is the only way to be completely sure that DeepMind isn't giving its software an unfair advantage.