Atari - Solving Games with AI🤖 (Part 2: Neuroevolution)
By Greg Surma
13 - 16 minutes
In today’s article, we are going to so solve another Atari game with AI. In Part 1 of the Atari series, we’ve successfully solved Breakout and Space Invaders with Reinforcement Learning.
In Part 2 we are going to use Neuroevolution to solve Atlantis. After the end of this article, you will be able to implement an evolutionary algorithm that can optimize neural networks and beat average human scores in multiple Atari games.
Atlantis is a shooter video game from 1982 for the Atari 2600 and 7800.
The player controls the last defenses of the City of Atlantis against the Gorgon invaders. The city has seven bases, which are vulnerable to attack but only three of them can shoot. The gun bases have fixed cannons, which have triggers controlled by the player, thus game’s action space (possible actions) looks as follows.
[‘NOOP’, ‘FIRE’, ‘RIGHTFIRE’, ‘LEFTFIRE’]
The center base fires straight up, while the far left and far right bases fire diagonally upwards across the screen.
For every hit opponent (Gorgon invader) player scores 100 points.
The goal of the game is to keep Atlantis alive and score as many points as possible.
Before we proceed to the Genetic Algorithm that learns how to play Atlantis, let’s start with Neuroevolution basics.
tl;dr Neuroevolution is an evolution of Neural Networks
I strongly recommend you to check my previous article that covers Genetic Evolution basics.
Similarly to the above example of the evolutionary algorithm used in the game of Snake, we are also going to map specific model weights (chromosomes) with evaluation scores returned by the game environment.
Afterward, we are going to perform an evolution of the population of the chromosomes (model weights) using the ‘survival of the fittest’ principles, which will ultimately lead us to selecting the top chromosomes. While our neuroevolution agent will optimize to successfully play Atlantis, such a system would be able to solve any black box problem with a similar structure (i.e reward function, ease of emulation, etc.).
In other words, we can describe our approach as follows.
We have a Convolutional Neural Network model that takes game state in the form of pixels as an input and returns an action to perform as an output.
Our goal is to optimize the model’s weights so that the network’s outputs give us the best actions/results.
We are going to optimize our network with an Evolutionary Algorithm.
This is how our GE algorithm looks like.
While it may look overwhelming at the first sight, we are going to go through it step by step.
We are selecting the top performers of the population (10%) based on their fitness scores. Only the selected chromosomes will be allowed to procreate and breed a new generation.
For each chromosome, we are performing a gameplay and storing a final score which will be used for an evaluation.
In order to perform a ‘gameplay for chromosome’, we need to set the weights of the model with a specified chromosome and then perform every move based on the model’s predictions. This is a key for mapping chromosomes with the evaluation scores.
You may ask now if we can set the weights of the model with a vector of any shape.
No, we cannot. Our chromosomes need to have the exact shape as our model’s parameters.
To give you a better understanding of this concept, let’s look at the following piece of code that sets the initial population with the random values.
In line 2 we are getting an array of weights from the CNN model. Specific values are not important there because we are going to overwrite them anyway but with such an array, we can easily derive its shape.
Afterward, from line 7 to line 22 we are disentangling the network array with a series of the for-loops that can access every value in the array. Ultimately, we are going to access and set each of the 1 686 180 params.
To wrap things up, in the selection phase each chromosome in the population gets evaluated in the game environment and only the top performers get qualified for the next step.
In the crossover stage, we are going to mix the top chromosomes from the selection phase and get their offsprings to populate a new generation.
We are going to match our parents based on the roulette selection. Without going into the further details (which I already covered), its underlying concept states that the better the score of a single chromosome, the more likely it is to reproduce.
After we have picked our pairs of chromosomes, we actually need to make them perform a crossover.
There are a couple of ways to perform a crossover and in this project, we are going to perform the one in which we randomly change the corresponding genes across the two parents.
After we’ve breed our new population, there is one more step to go that would keep our population genetically diverse - mutation.
The concept is very simple, to maintain a genetic diversity we are going to set random values to stochastically selected (with the probability of 0.01) genes.
Keep in mind that the above code with the for-loop array unwrapping can (and should) be refactored and optimized. I just left it like this to give you a better understanding of how each network’s layer/parameter gets accessed.
Now as we’ve covered how a single generation is being created and evaluated, we can run our algorithm and watch how our population gets better with every generation.
I’ve picked the following hyperparams for our GE algorithm
but you should definitely play with them and hopefully get even better results.
Above training phase took about a week on the 2.9 GHz Intel i7 Quad-Core CPU but there are a couple of ways that can speed up the process:
reduce the network’s size, it’s definitely possible to get decent scores with way less than 1 686 180 params
optimize the chromosome un-wrapping code
speed up the game emulation phase (the biggest bottleneck)
Finally, after ~220 generations of training we can evaluate our final model which is an outcome of the two top performers from the last generation.
Human average: ~29,000
GE average: ~31,000 (106%)
We’ve proved that it’s possible to optimize Neural Network’s weights with a Genetic Evolution algorithm in order to successfully play Atlantis on an above-human level. The very important thing about such algorithm is its general nature. We didn’t have to hardcode any domain-specifics or use heuristics to make it successfully play Atlantis. Keep in mind that, while we focused and trained on a specific game, we could apply above GE algorithm to any of the Atari games without any modifications and I encourage you to do so (it’s super easy with the atari project).
Questions? Comments? Feel free to leave your feedback in the comments section or contact me directly at https://gsurma.github.io.
And don’t forget to 👏 if you enjoyed this article 🙂.