A brief summary of "Reading Wikipedia to Answer Open-Domain Questions" by Danqi Chen, Adam Fisch, Jason Weston and Antoine Bordes.

Rajath Nag Nagaraj (Raj)
5 min read · Mar 11, 2021


This is a brief summary of my understanding of the DrQA system by Danqi Chen et al.

Source — the original paper by Danqi Chen et al.

Before jumping into the paper, it is worth knowing a little about what QA means in NLP. In simple words, it is a system that can provide an answer to a question asked in natural language, e.g., Q: Who won the ICC Cricket World Cup in 2011? A: India.

Now, let's work through the paper.

The goal of this paper is to build a system that can answer factoid questions with the help of Wikipedia. For example: who won the 2011 ICC cricket tournament? Ans: India. This is a classic example of the end result of the system built by Danqi Chen et al. The example is a factoid question that expects the outcome of an event as its answer. The system they built is known as DrQA, and it has two main parts: a Document Retriever and a Document Reader. The retriever finds relevant articles from Wikipedia (around 5 million articles), and the reader identifies answer spans within the selected articles. The datasets used to train the system include SQuAD, TREC, WebQuestions, etc.

Going a little deeper into the idea of the paper: Wikipedia is a huge body of information that can be used to answer questions from almost any domain. It has a lot of potential to help modern models perform QA, if those models are capable of utilizing and understanding it. Systems can process information from structured knowledge bases like Freebase, but the drawback is that such sources are too sparse and incomplete to support open-domain question answering. Wikipedia, with its vast and constantly updated content, poses the difficult task of answering open-ended questions. The key idea in this process is to first identify documents relevant to the question and then, as a second step, extract the answer from the retrieved documents. The authors name this process machine reading at scale. One important design decision is that the system treats Wikipedia as a plain collection of articles rather than as a graph structure, which creates greater flexibility for switching the source to other document collections, articles or even books. The model is deliberately restricted to a single source of information, Wikipedia, unlike systems such as IBM's DeepQA, which combine multiple resources to provide answers.

The paper addresses evaluating MRS (machine reading at scale) with the help of various available datasets, e.g., SQuAD. As mentioned above, DrQA consists of two parts: the Document Retriever uses bigram hashing and TF-IDF matching to sub-select articles relevant to the question, and the Document Reader, a multi-layer recurrent neural network, detects answer spans in those returned documents. The experiments show strong results, with the retriever outperforming Wikipedia's built-in search engine. The paper also evaluates the full system under multitask training and distant supervision, i.e., it benchmarks across several datasets rather than relying on single-task training.

A little history of QA is discussed in the paper. Early on, QA was defined as finding answers in unstructured text. QA over Wikipedia data was attempted previously by Ryu et al. (2014), but they combined article content with other answer-bearing resources. A few other groups worked on similar QA systems, but the present work emphasizes using the text of Wikipedia only, and it compares against systems like YodaQA, which again relies on a combination of sources.

A deeper understanding of each step in DrQA is important. Step one, document retrieval, narrows down the search and collects candidate articles. A simple inverted-index lookup with a term-vector scoring model performs better than the built-in Wikipedia search engine. In DrQA, questions and articles are represented as TF-IDF weighted bag-of-words vectors. A further improvement comes from adding n-gram features, and bigrams work best.
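A minimal sketch of this retrieval idea, assuming scikit-learn and a toy three-article corpus: documents and the question become TF-IDF weighted bag-of-words vectors over unigrams and bigrams, and the closest articles are returned. The paper additionally hashes bigrams into a fixed number of bins for speed and memory, which this sketch omits; none of this is the authors' actual implementation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-in for the ~5 million Wikipedia articles.
articles = [
    "India won the 2011 ICC Cricket World Cup, defeating Sri Lanka in the final.",
    "The Eiffel Tower is a wrought-iron lattice tower in Paris, France.",
    "The 2011 Cricket World Cup final was played at Wankhede Stadium in Mumbai.",
]

# TF-IDF weighted bag-of-words over unigrams and bigrams.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
article_vectors = vectorizer.fit_transform(articles)

question = "Who won the ICC cricket world cup in 2011?"
question_vector = vectorizer.transform([question])

# Rank articles by similarity to the question and keep the top two.
scores = cosine_similarity(question_vector, article_vectors).ravel()
for i in scores.argsort()[::-1][:2]:
    print(f"{scores[i]:.3f}  {articles[i]}")
```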

The Document Reader is the second stage of DrQA; it consists of an RNN model applied to each retrieved paragraph, with the predicted answers aggregated afterwards. Looking a little deeper at how this step works, every word (token) in a paragraph is represented as a feature vector, which is passed to the RNN as input. Useful context information is encoded here, and the feature vector consists of several parts. The first part is the word embedding: the system uses 300-dimensional GloVe embeddings, keeping most of them fixed and fine-tuning only the 1,000 most frequent question words. Exact match is the next part: three binary features indicating whether the token matches a question word in its original, lowercase or lemma form. Next come manually engineered token features that describe the token itself (part-of-speech tag, named-entity tag and normalized term frequency). The last part is the aligned question embedding, which captures soft similarity between each paragraph token and the question words.
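To make the four parts concrete, here is a minimal sketch, using numpy, of how one paragraph token's input vector might be assembled. The toy embedding table, the crude lemma check and the placeholder POS/NER one-hots are illustrative assumptions, not the paper's actual preprocessing pipeline.

```python
import numpy as np

EMB_DIM = 300
# Toy stand-in for a 300-dimensional GloVe lookup table.
toy_glove = {w: np.random.randn(EMB_DIM) for w in ["india", "won", "the", "cup", "who"]}

def token_features(token, question_tokens, pos_onehot, ner_onehot, norm_tf, aligned_q_emb):
    word = token.lower()
    emb = toy_glove.get(word, np.zeros(EMB_DIM))             # 1) word embedding

    q_lower = [q.lower() for q in question_tokens]
    exact_match = np.array([                                  # 2) three binary exact-match flags
        float(token in question_tokens),                      #    original form
        float(word in q_lower),                               #    lowercase form
        float(word.rstrip("s") in q_lower),                   #    crude stand-in for lemma match
    ])

    manual = np.concatenate([pos_onehot, ner_onehot, [norm_tf]])  # 3) token features (POS, NER, TF)

    # 4) aligned question embedding, appended last.
    return np.concatenate([emb, exact_match, manual, aligned_q_emb])

# Example: build the vector for the token "India" against a toy question.
vec = token_features(
    "India", ["Who", "won", "the", "cup"],
    pos_onehot=np.eye(5)[0], ner_onehot=np.eye(3)[1],
    norm_tf=0.05, aligned_q_emb=np.random.randn(EMB_DIM),
)
print(vec.shape)  # (300 + 3 + 9 + 300,) = (612,)
```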

The goal here is to predict, within each paragraph, the span of tokens most likely to be the correct answer. The paragraph token vectors and the question vector are fed into two classifiers that predict the start and the end of the answer span, and a bilinear term captures the similarity between each paragraph token and the question when computing the start and end probabilities. Wikipedia serves as the knowledge source, while the other datasets are used to train the model.
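A minimal sketch of that bilinear span scoring, assuming PyTorch: each encoded paragraph token p_i is compared against a single question vector q through separate bilinear terms for the start and end positions. The hidden size and the example shapes are illustrative choices, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class BilinearSpanScorer(nn.Module):
    def __init__(self, hidden_dim):
        super().__init__()
        self.start_weight = nn.Linear(hidden_dim, hidden_dim, bias=False)  # W_s
        self.end_weight = nn.Linear(hidden_dim, hidden_dim, bias=False)    # W_e

    def forward(self, paragraph, question):
        # paragraph: (batch, seq_len, hidden_dim), question: (batch, hidden_dim)
        # P_start(i) ∝ exp(p_i · W_s · q), P_end(i) ∝ exp(p_i · W_e · q)
        start_scores = torch.bmm(paragraph, self.start_weight(question).unsqueeze(2)).squeeze(2)
        end_scores = torch.bmm(paragraph, self.end_weight(question).unsqueeze(2)).squeeze(2)
        return start_scores.softmax(dim=1), end_scores.softmax(dim=1)

# Example: score a batch of 2 paragraphs, 40 tokens each, with 256-dim encodings.
scorer = BilinearSpanScorer(hidden_dim=256)
p = torch.randn(2, 40, 256)
q = torch.randn(2, 256)
p_start, p_end = scorer(p, q)
print(p_start.shape, p_end.shape)  # torch.Size([2, 40]) torch.Size([2, 40])
```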

The experiments first evaluate each module individually, the Document Retriever and the Document Reader. The Document Retriever is evaluated on how well it pulls relevant Wikipedia articles for questions from various datasets such as SQuAD, and the results show it outperforms the Wikipedia search engine. The Document Reader is then evaluated: 3-layer bidirectional LSTMs are used to encode the paragraphs and questions. Finally, results on both the development and test sets are reported. The model achieves 70% exact match and 79% F1 on the SQuAD test set, which is competitive with the top-performing systems on SQuAD while being one of the simplest models among those available.
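For reference, a minimal sketch of a 3-layer bidirectional LSTM encoder of the kind mentioned above, assuming PyTorch; the hidden size and dropout are illustrative, and the paper concatenates the hidden states of all layers, whereas this sketch returns only the top layer for simplicity.

```python
import torch
import torch.nn as nn

class StackedBiLSTMEncoder(nn.Module):
    def __init__(self, input_dim, hidden_dim=128, num_layers=3, dropout=0.3):
        super().__init__()
        self.rnn = nn.LSTM(
            input_size=input_dim,
            hidden_size=hidden_dim,
            num_layers=num_layers,
            bidirectional=True,
            dropout=dropout,
            batch_first=True,
        )

    def forward(self, x):
        # x: (batch, seq_len, input_dim) -> (batch, seq_len, 2 * hidden_dim)
        output, _ = self.rnn(x)
        return output

# Example: encode 2 paragraphs of 40 tokens with 300-dim input features.
encoder = StackedBiLSTMEncoder(input_dim=300)
encoded = encoder(torch.randn(2, 40, 300))
print(encoded.shape)  # torch.Size([2, 40, 256])
```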

The final evaluation covers the full DrQA system answering open-domain questions. Three versions are compared to show the impact of distant supervision and multitask learning across the training sources: a model trained only on SQuAD, a fine-tuned model with distant supervision, and a multitask model with distant supervision. The single model trained only on SQuAD was outperformed across the datasets by the multitask model.

Suggested further improvements include training the Document Reader end-to-end together with the Document Retriever and letting it aggregate evidence over multiple paragraphs and documents during training.

On a positive note, I felt the system was evaluated thoroughly on the various available datasets, and I liked that each module of DrQA was evaluated separately. The results are easy to follow, and the tables in particular are clearly laid out.

References:

  1. Reading Wikipedia to Answer Open-Domain Questions — Danqi Chen, Adam Fisch, Jason Weston & Antoine Bordes.
