Tencent's AI Plays And Defeats StarCraft II's Built-In "AI" In Full Matches

Curious as we can be, we just love to test AIs by making them play games.

This time, researchers from Chinese technology giant Tencent have developed a pair of AI agents capable of defeating StarCraft II’s (SC2) AI on the highest difficulty levels in full matches. This makes the company the first to do so.

In a published white paper, the researchers explained the creation of the two agents, named TSTARBOT1 and TSTARBOT2.

The first agent acts as a macro-level controller that oversees several specific algorithms designed to handle lower-level functions. The second agent, TSTARBOT2, is more robust: its macro-micro controller consists of several modules, each capable of handling an entire facet of the gameplay independently.
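To make the macro-micro idea more concrete, here is a minimal sketch in Python. It is not Tencent's code: the `GameState` fields, the module names, the command strings, the thresholds and the unit cost are all hypothetical and chosen only for illustration. A macro-level controller picks one high-level decision, and a dedicated module turns that decision into low-level commands.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# All names, thresholds and command strings below are hypothetical.
ATTACK_THRESHOLD = 10   # army size at which the sketch decides to attack
UNIT_COST = 50          # illustrative mineral cost per production order


@dataclass
class GameState:
    minerals: int
    army_size: int
    enemy_seen: bool


# Micro-level modules: each expands one macro decision into concrete commands.
def expand_economy(state: GameState) -> List[str]:
    return ["train_drone"] * min(state.minerals // UNIT_COST, 3)


def build_army(state: GameState) -> List[str]:
    return ["train_zergling"] * (state.minerals // UNIT_COST)


def attack(state: GameState) -> List[str]:
    return ["attack_move_army"]


class MacroController:
    """Chooses a macro action, then delegates the details to a module."""

    def __init__(self, modules: Dict[str, Callable[[GameState], List[str]]]):
        self.modules = modules

    def choose(self, state: GameState) -> str:
        # Simple hand-written rules stand in for a learned or scripted policy.
        if state.enemy_seen and state.army_size >= ATTACK_THRESHOLD:
            return "attack"
        if state.enemy_seen:
            return "build_army"
        return "expand_economy"

    def step(self, state: GameState) -> List[str]:
        return self.modules[self.choose(state)](state)


controller = MacroController({
    "expand_economy": expand_economy,
    "build_army": build_army,
    "attack": attack,
})

if __name__ == "__main__":
    print(controller.step(GameState(minerals=150, army_size=0, enemy_seen=False)))
```

The point of the split is that the controller only ever reasons about a handful of high-level choices, while each module hides the many individual unit commands behind it.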

The agents were trained by playing one-on-one games, with both sides using the Zerg race. The training used Abyssal Reef, a map known to have thwarted neural network AIs from winning against SC2's built-in AIs. But after just a couple of days of training, the TSTARBOTs were able to defeat the traditional AI opponent on the hardest setting.

Module diagram for Tencent AI's agents based on the macro-micro hierarchical action

Researchers in the AI field have built numerous agents to play numerous games: AIs that play Doom, Unreal Tournament 2004 and Quake III Arena, OpenAI's AI that defeated the world's best Dota 2 player, and Google's AI that defeated the world's best Go player, to name a few.

But unlike Go or chess, where all pieces are laid on the table in plain sight (allowing AIs to plan and anticipate moves in advance), StarCraft II is a real-time strategy game in which players must plan carefully to manage resources, defend and attack.

Tencent's AI played SC2 with the "fog-of-war" turned on. This means the AI can't see the enemy AI's units and base until it has scouted the map.

So here, the TSTARBOTs were designed to imitate the human thought process, with a lot of information for the agents to process.

Interestingly, Tencent trained the agents using only a single GPU. To compensate, the company used a large number of CPUs to generate the data needed to train the bots on billions of frames.

This made the training efficient, as the researchers explained:

"We currently take 1920 parallel actors (with 3840 CPUs across 80 machines) to generate the replay transitions, at the speed of about 16,000 frames per second. This significantly reduces the training time (from weeks to days), and also improves the learning stability thanks to the increased diversity of the explored trajectories."

Overview of the agents' macro-micro hierarchical actions
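The figures the researchers quote for their training setup (1920 actors, 3840 CPUs, 80 machines, about 16,000 frames per second) can be broken down with some simple arithmetic. The sketch below assumes the actors and CPUs are spread evenly across the machines and that the quoted frame rate is sustained around the clock; neither assumption comes from the researchers.

```python
# Figures quoted by the researchers.
ACTORS = 1920
CPUS = 3840
MACHINES = 80
FRAMES_PER_SECOND = 16_000

SECONDS_PER_DAY = 24 * 60 * 60


def cluster_stats() -> dict:
    """Derive per-actor and per-machine figures, assuming an even split."""
    return {
        "cpus_per_actor": CPUS / ACTORS,
        "actors_per_machine": ACTORS / MACHINES,
        "cpus_per_machine": CPUS / MACHINES,
        "frames_per_actor_per_second": FRAMES_PER_SECOND / ACTORS,
        # Assumes the quoted rate holds for a full 24 hours.
        "frames_per_day": FRAMES_PER_SECOND * SECONDS_PER_DAY,
    }


if __name__ == "__main__":
    for name, value in cluster_stats().items():
        print(f"{name}: {value:,.2f}")
```

Under those assumptions, each actor gets two CPUs and produces only around eight frames per second; the throughput comes from running nearly two thousand of them at once, which adds up to roughly 1.38 billion frames per day.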

One major difficulty the researchers needed to tackle was that SC2's three highest difficulty levels feature an "AI" that cheats. For example, on the highest difficulty setting (level 10), the computer knows in advance where the resources are, as if there were no fog-of-war. It can also see all units on the map, giving it an unfair advantage.

Tencent's AI plays like a regular human, meaning it has no such advantage: it plays the game using methods similar to mouse clicks and macros, and does exactly what a human player would do. The AI sees the game by interpreting video output on a frame-by-frame basis and translates the information into data it can work with.

Clearly, Tencent's AI is at a huge disadvantage when playing against SC2's AI.

Tencent managed to win because the two agents were trained using a high-level commander paradigm that Tencent developed. This paradigm keeps track of the overall strategy while depending on middle- and low-level algorithms for unit-level management. As a result, the AI could play in a manner similar to an experienced human SC2 player, rather than a computer opponent.

The agents were evaluated over 100 games with different random seeds, with a tie counted as 0.5 when calculating the win-rate. "We can see that the agent is able to consistently defeat built-in AIs in all levels, showing the effectiveness of the hierarchical action modeling," said the researchers.

Tencent's AI dominated the game, winning more than 90 percent of the time.
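The win-rate convention mentioned above, where a tie is counted as 0.5, is easy to express in code. The short Python sketch below shows the calculation; the function name and the example counts are made up for illustration and are not the researchers' actual results.

```python
def win_rate(wins: int, ties: int, losses: int) -> float:
    """Win-rate with each tie counted as half a win."""
    total = wins + ties + losses
    if total == 0:
        raise ValueError("at least one game is required")
    return (wins + 0.5 * ties) / total


if __name__ == "__main__":
    # Hypothetical 100-game run: 90 wins, 4 ties, 6 losses.
    print(f"{win_rate(90, 4, 6):.2%}")
```

Counting a tie as half a win means a drawn game neither rewards nor fully penalizes the agent, so the score for the hypothetical run above sits slightly above the 90 percent of games won outright.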