Reinforcement Learning (RL) promises to solve the problem of designing intelligent agents within a formal yet simple framework. At the same time, there is a large pool of methods for optimizing an agent's policy to maximize return, such as value-based methods, policy-based methods, imitation learning, and model-based approaches. These methods themselves have many variants and incremental improvements, mostly driven by a set of major challenges in the field of reinforcement learning. All in all, it is easy to get lost in the sheer number of publications and subfields of research.
This blog post aims to tackle this massive quantity of approaches and challenges, providing an overview of the problems researchers are working on and the methods they have devised to solve them. This mind map is far from complete and in large part driven by my own interests; if you have any suggestions, please let me know!
What is the goal of Reinforcement Learning? The framework was introduced to solve the problem of designing intelligent agents. It can be formalized further as 'an agent that maximizes reward in a particular environment'. In the context of AGI, Shane Legg and Marcus Hutter defined intelligence as 'the ability to achieve goals in a wide range of environments'. Marcus Hutter formalized the optimal universal agent AIXI, which I have described in another blog post.
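In the standard Markov decision process formulation (a textbook definition, not specific to this post), 'maximizing reward' is usually made precise as maximizing the expected discounted return:

$$ J(\pi) = \mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t}\right], \qquad 0 \le \gamma < 1 $$

where $\pi$ is the agent's policy, $r_t$ the reward received at time step $t$, and $\gamma$ the discount factor that trades off immediate against future reward.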
There is a variety of methods to optimize an agent's policy. Here we look at the different categories and at specific implementations of methods to optimize RL policies.
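As a concrete example of the value-based family, here is a minimal tabular Q-learning sketch. The 5-state chain environment, the `step` helper, and all hyperparameters are illustrative assumptions for this sketch, not something taken from a specific library or paper:

```python
import numpy as np

# Minimal sketch of a value-based method: tabular Q-learning on a toy
# 5-state chain (environment and hyperparameters are assumptions for
# illustration). Moving right from the last state yields reward 1 and
# resets the agent to the start; every other transition yields reward 0.
N_STATES, N_ACTIONS = 5, 2  # actions: 0 = left, 1 = right

def step(state, action):
    if action == 1:  # move right
        if state == N_STATES - 1:
            return 0, 1.0          # goal reached: reward, reset to start
        return state + 1, 0.0
    return max(state - 1, 0), 0.0  # move left (left wall is reflecting)

def q_learning(steps=10_000, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((N_STATES, N_ACTIONS))
    state = 0
    for _ in range(steps):
        if rng.random() < eps:                 # explore: random action
            action = int(rng.integers(N_ACTIONS))
        else:                                  # exploit: greedy, break ties randomly
            best = np.flatnonzero(Q[state] == Q[state].max())
            action = int(rng.choice(best))
        next_state, reward = step(state, action)
        # Q-learning update: bootstrap from the greedy value of the next state
        Q[state, action] += alpha * (
            reward + gamma * Q[next_state].max() - Q[state, action]
        )
        state = next_state
    return Q

Q = q_learning()
print(Q.argmax(axis=1))  # learned greedy policy, one action per state
```

After enough steps the greedy policy moves right in every state. Swapping the `Q[next_state].max()` target for the value of the action actually taken next would turn this off-policy update into SARSA, a closely related on-policy variant.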
There is a wide range of challenges that current major algorithms do not yet handle well, and many new specialized algorithms have been developed in response. To understand where and why specific research is happening, I tried to sort recent research into the respective challenges it addresses.