OpenAI Five

  • Defeat the world’s top professionals at 1v1

    Achieved last year, this milestone showed that our Dota system had learned the mechanical rules of Dota at a world-competitive level in a restricted 1-on-1 format: one of the three lanes, one of the three game phases, games typically lasting 10 rather than 45 minutes, a single hero, and no neutral creeps, Roshan, warding, or invisibility.

  • Defeat five of the world’s top professionals

    Five will attempt this live at The International in Vancouver’s Rogers Arena this week! We’ve been playing a ladder of increasingly skilled opponents, with pro players at the top. (One way to compare: the Benchmark players’ median tournament earnings are $20,000, while those of the pro players at The International are $600,000.) This milestone will show Five’s ability to navigate the vast complexity and strategy of 5-on-5 Dota.

  • Defeat the world’s top professional team

    To achieve an extreme level of teamwork and coordination, pro teams live and work together for many hours a day. Defeating the top team, which will be determined by The International this week, will show that Five’s teamwork and strategic execution can match the highest level achievable by humans.

  • Dota is selected by looking down the list of games on Twitch, picking the most popular one that ran on Linux and had an API.

  • First commit in our Dota repository.

  • First commit in Rapid repository.

  • 1v1 bot beats top professional Dota 2 players at The International 7

  • First game won by a Dota 2 professional by normal gameplay against final 1v1 bot (tried by dozens of pros for thousands of games).

  • First 5v5 results: OpenAI Five beats our scripted bot in exceedingly restricted 5v5 (playing to first tower death, with 5 invulnerable couriers, mirror match, five fixed heroes, no neutrals, runes, shrines, wards, Roshan, or invisibility).

  • OpenAI Five beats in-house OpenAI team at very restricted 5v5 (objective of max net worth at 7 minutes, with 5 invulnerable couriers, mirror match, five fixed heroes, no neutrals, runes, shrines, wards, Roshan, or invisibility).

  • OpenAI Five defeats in-house OpenAI team at fairly restricted 5v5 (5 invulnerable couriers, mirror match, five fixed heroes, no wards, Roshan, or invisibility)

  • OpenAI Five defeats popular casters at the Benchmark in front of a live audience and 100k livestream viewers, with somewhat restricted 5v5 (5 invulnerable couriers, 18 heroes)

  • OpenAI Five to play a team of top professional Dota 2 players at The International 8.

OpenAI Five is a team of five artificial neural networks. You can think of them as simulated “brains” that our team has designed to be well-shaped for learning Dota but that start with no knowledge. OpenAI Five sees the world as a list of 20,000 numbers which encode the visible game state (limited to the information a human player is permitted to see), and chooses an action by emitting a list of 8 numbers. The OpenAI team writes code which maps between game state/actions and lists of numbers. Once trained, these neural networks are creatures of pure instinct: they implement memory but do not otherwise learn further. They play as a team, but we do not design special communication structures; we only provide them with an incentive.
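To make the "lists of numbers" idea concrete, here is a minimal sketch of what such mapping code could look like. Only the two sizes (20,000 and 8) come from the text above. The feature groups, the zero-padding scheme, the function names, and the generic `field_0` to `field_7` action labels are all hypothetical and are not the actual OpenAI Five encoding.

```python
from typing import Dict, List

OBS_SIZE = 20_000   # observation length, taken from the text
ACTION_SIZE = 8     # action length, taken from the text

# Hypothetical placeholder names: the text does not say what each of the
# 8 action numbers means, so they are left generic here.
ACTION_FIELDS = tuple(f"field_{i}" for i in range(ACTION_SIZE))


def encode_observation(features: Dict[str, List[float]]) -> List[float]:
    """Flatten named feature groups into one fixed-length list of numbers."""
    flat: List[float] = []
    # Sort group names so every slot keeps the same meaning from frame to frame.
    for name in sorted(features):
        flat.extend(float(x) for x in features[name])
    if len(flat) > OBS_SIZE:
        raise ValueError("too many features for the observation vector")
    # Zero-pad the unused slots (an assumed convention, not from the source).
    return flat + [0.0] * (OBS_SIZE - len(flat))


def decode_action(numbers: List[float]) -> Dict[str, float]:
    """Label the 8 numbers emitted by the network so game code can read them."""
    if len(numbers) != ACTION_SIZE:
        raise ValueError(f"expected {ACTION_SIZE} numbers, got {len(numbers)}")
    return dict(zip(ACTION_FIELDS, numbers))


if __name__ == "__main__":
    obs = encode_observation({"hero": [1.0, 2.0], "creeps": [3.0]})
    print(len(obs), obs[:3])
    print(decode_action([0.0] * ACTION_SIZE))
```

The point of the sketch is the fixed layout: the network only ever sees and produces fixed-length vectors, and all knowledge of what each slot means lives in hand-written code on either side of it.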

OpenAI Five’s neural networks start out with random parameters and use our general-purpose training system, Rapid, to learn better parameters. Rapid has OpenAI Five play copies of itself, generating 180 years of gameplay data each day across tens of thousands of simultaneous games, consuming 128,000 CPU cores and 256 GPUs. At each game frame, Rapid computes a numeric reward which is positive when something good has happened (e.g. an allied hero gained experience) and negative when something bad has happened (e.g. an allied hero was killed). Rapid then applies our Proximal Policy Optimization algorithm to update the parameters of the neural network, making actions which occurred soon before positive reward more likely and those soon before negative reward less likely.
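The reward-then-update loop described above can be sketched in a few lines. This is an illustration under stated assumptions, not Rapid's implementation: the reward weights, the discount factor, and all function names are hypothetical, and the sketch uses plain discounted returns in place of a proper advantage estimate. The clipped objective itself is the standard one from Proximal Policy Optimization.

```python
import math
from typing import List


def frame_reward(xp_gained: float, allied_deaths: int,
                 xp_weight: float = 0.002, death_penalty: float = 1.0) -> float:
    """Per-frame reward: positive for good events, negative for bad ones.

    The weights are made-up values chosen only for illustration.
    """
    return xp_weight * xp_gained - death_penalty * allied_deaths


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Credit each frame with the rewards that follow it, nearer ones counting more."""
    returns: List[float] = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def ppo_clipped_objective(new_logp: List[float], old_logp: List[float],
                          advantages: List[float], clip_eps: float = 0.2) -> float:
    """Standard PPO clipped surrogate objective (to be maximized)."""
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)  # how much more likely the action is now
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the two estimates to limit the update size.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    rewards = [frame_reward(0, 0), frame_reward(150, 0), frame_reward(0, 1)]
    advantages = discounted_returns(rewards)  # simplification: no value baseline
    print(advantages)
    print(ppo_clipped_objective([-1.0, -0.9, -1.2], [-1.0, -1.0, -1.0], advantages))
```

Discounting is what makes actions "soon before" a reward receive most of the credit: a frame just before a kill or a death gets nearly the full signal, while frames further back get progressively less.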

Just like humans don’t plan out their muscle movements while planning out their day, the community (OpenAI included) had expected long-term planning to require algorithms which handle short-term and long-term plans separately—perhaps via a hierarchical reinforcement learning breakthrough. But despite its very simple underlying algorithm, OpenAI Five learns professional-level strategies from scratch—no human data provided.

Special thanks to Scott Gray for blazing-fast GPU kernels, Diane Yoon and Larissa Schiavo for help organizing the Benchmark, and Jonas Schneider and Jack Clark for help with communications.