Open Sourcing Active Question Reformulation with Reinforcement Learning

By Google



Natural language understanding is a significant ongoing focus of Google’s AI research, with application to machine translation, syntactic and semantic parsing, and much more. Importantly, as conversational technology increasingly requires the ability to directly answer users’ questions, one of the most active areas of research we pursue is question answering (QA), a fundamental building block of human dialogue.

Because open sourcing code is a critical component of reproducible research, we are releasing a TensorFlow package for Active Question Answering (ActiveQA), a research project that investigates using reinforcement learning to train artificial agents for question answering. Introduced for the first time in our ICLR 2018 paper “Ask the Right Questions: Active Question Reformulation with Reinforcement Learning”, ActiveQA interacts with QA systems using natural language with the goal of providing better answers.

Active Question Answering
In traditional QA, supervised learning techniques are used in combination with labeled data to train a system that answers arbitrary input questions. While this is effective, it suffers from a lack of ability to deal with uncertainty like humans would, by reformulating questions, issuing multiple searches, evaluating and aggregating responses. Inspired by humans’ ability to "ask the right questions", ActiveQA introduces an agent that repeatedly consults the QA system. In doing so, the agent may reformulate the original question multiple times in order to find the best possible answer. We call this approach active because the agent engages in a dynamic interaction with the QA system, with the goal of improving the quality of the answers returned.

For example, consider the question “When was Tesla born?”. The agent reformulates the question in two different ways: “When is Tesla’s birthday” and “Which year was Tesla born”, retrieving answers to both questions from the QA system. Using all this information it decides to return “July 10 1856”.
What characterizes an ActiveQA system is that it learns to ask questions that lead to good answers. However, because training data in the form of question pairs, with an original question and a more successful variant, is not readily available, ActiveQA uses reinforcement learning, an approach to machine learning concerned with training agents so that they take actions that maximize a reward, while interacting with an environment.

The learning takes place as the ActiveQA agent interacts with the QA system; each question reformulation is evaluated in terms of how good the corresponding answer is, which constitutes the reward. If the answer is good, then the learning algorithm will adjust the model’s parameters so that the question reformulation that lead to the answer is more likely to be generated again, or otherwise less likely, if the answer was bad.

In our paper, we show that it is possible to train such agents to outperform the underlying QA system, the one used to provide answers to reformulations, by asking better questions. This is an important result, as the QA system is already trained with supervised learning to solve the same task. Another compelling finding of our research is that the ActiveQA agent can learn a fairly sophisticated, and still somewhat interpretable, reformulation strategy (the policy in reinforcement learning). The learned policy uses well-known information retrieval techniques such as tf-idf query term re-weighting, the process by which more informative terms are weighted more than generic ones, and word stemming.

Build Your Own ActiveQA System
The TensorFlow ActiveQA package we are releasing consists of three main components, and contains all the code necessary to train and run the ActiveQA agent.
  • A pretrained sequence to sequence model that takes as input a question and returns its reformulations. This task is similar to machine translation, translating from English to English, and indeed the initial model can be used for general paraphrasing. For its implementation we use and customize the TensorFlow Neural Machine Translation Tutorial code. We adapted the code to support training with reinforcement learning, using policy gradient methods.*
  • An answer selection model. The answer selector uses a convolutional neural network and assigns a score to each triplet of original question, reformulation and answer. The selector uses pre-trained, publicly available word embeddings (GloVe).
  • A question answering system (the environment). For this purpose we use BiDAF, a popular question answering system, described in Seo et al. (2017).
We also provide pointers to checkpoints for all the trained models.

Google’s mission is to organize the world's information and make it universally accessible and useful, and we believe that ActiveQA is an important step in realizing that mission. We envision that this research will help us design systems that provide better and more interpretable answers, and hope it will help others develop systems that can interact with the world using natural language.

Acknowledgments
Contributors to this research and release include Alham Fikri Aji, Christian Buck, Jannis Bulian, Massimiliano Ciaramita, Wojciech Gajewski, Andrea Gesmundo, Alexey Gronskiy, Neil Houlsby, Yannic Kilcher, and Wei Wang.


* The system we reported on in our paper used the TensorFlow sequence-to-sequence code used in Britz et al. (2017). Later, an open source version of the Google Translation model (GNMT) was published as a tutorial. The ActiveQA version released today is based on this more recent, and actively developed implementation. For this reason the released system varies slightly from the paper’s. Nevertheless, the performance and behavior are qualitatively and quantitatively comparable.