Inspiration
Conversational AI services such as Dialogflow, Lex, or Watson Assistant require creating and managing complex conversation trees. State-of-the-art question-answering systems can learn from question/answer pairs, but they also require passage context as training data and input. We developed a chatbot trained on question/answer pairs alone. It works well without passage contexts and is therefore simpler to operationalize. The chatbot does not rely on word meanings or linguistic properties such as parts of speech or stop words, so it can work with non-English questions and answers.
What it does
Answers questions based on Wikipedia pages (Google Natural Questions).
How we built it
We used data from Google's Natural Questions corpus, posted at https://ai.google.com/research/NaturalQuestions/download.
From this data we use only the questions and short answers, such as the following, to train the chatbot:
question: what is the orange stuff on my sushi
answer: tobiko
question: who spread the theory that one is a product of the mind and body
answer: rene descartes
question: when did star trek the next generation first air
answer: september 28 , 1987
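Extracting such pairs from the simplified Natural Questions release is mostly JSON bookkeeping: each line is one example, and a short answer is a token span into the document text. A minimal sketch (field names follow the simplified NQ format; the sample record below is synthetic, not real NQ data):

```python
import json

def extract_pairs(jsonl_lines):
    """Extract (question, short answer) pairs from simplified
    Natural Questions JSON lines. Examples without a short
    answer annotation are skipped."""
    pairs = []
    for line in jsonl_lines:
        example = json.loads(line)
        tokens = example["document_text"].split(" ")
        for annotation in example["annotations"]:
            for short in annotation.get("short_answers", []):
                answer = " ".join(tokens[short["start_token"]:short["end_token"]])
                pairs.append((example["question_text"].lower(), answer.lower()))
    return pairs

# Tiny synthetic record in the same shape as a simplified NQ line:
sample = json.dumps({
    "question_text": "what is the orange stuff on my sushi",
    "document_text": "Tobiko is the Japanese word for flying fish roe",
    "annotations": [{"short_answers": [{"start_token": 0, "end_token": 1}]}],
})
print(extract_pairs([sample]))
# → [('what is the orange stuff on my sushi', 'tobiko')]
```

Examples whose annotations contain only a long answer (or no answer) yield no pairs, which is why only 70 of the 101 long-answer questions below produced usable training data.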
The training data is in English, but the chatbot can work with other languages if it is fed question/answer pairs in that language. The model does not need to know word meanings or use linguistic features such as word stems, parts of speech, or stop words.
A key feature of our model is the attention mechanism. Here is an attention plot for the question "where does jinx you owe me a coke come from", whose answer was predicted correctly.

Here is an attention plot for the same question reworded as "jinx u owe me coke come from where?".

The two questions are worded differently, but the intent is the same: to ask about the origin of a children's game. The attention mechanism assigns a weight to each word in the question, and the chatbot uses these weights when predicting the next word of the answer.
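The per-word weights in the plots come from an attention layer. A minimal NumPy sketch of additive (Bahdanau-style) attention; the actual model was built with TensorFlow/Keras, and the layer sizes here are made up for illustration:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention(encoder_states, decoder_state, W1, W2, v):
    """Additive (Bahdanau-style) attention.

    encoder_states: (num_words, units) - one hidden state per question word
    decoder_state:  (units,)           - current decoder hidden state
    Returns (context, weights); the weights are what the attention
    plots visualize, one value per question word."""
    scores = np.tanh(encoder_states @ W1 + decoder_state @ W2) @ v
    weights = softmax(scores)           # a distribution over question words
    context = weights @ encoder_states  # weighted sum of encoder states
    return context, weights

# Toy dimensions, illustrative only (not the trained model's sizes)
rng = np.random.default_rng(0)
num_words, units = 8, 16
enc = rng.normal(size=(num_words, units))
dec = rng.normal(size=(units,))
W1 = rng.normal(size=(units, units))
W2 = rng.normal(size=(units, units))
v = rng.normal(size=(units,))

context, weights = additive_attention(enc, dec, W1, W2, v)
print(weights.round(3))  # one weight per question word, summing to 1
```

The decoder recomputes these weights at every output step, which is why rewording a question shifts the weights without changing the predicted answer.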
Challenges we ran into
Identifying a useful and efficient collaborative platform. Getting a reasonably big dataset in a compact form for training and running TF2. The original files were huge (several GB), and Google Colab does not let you persist files in the container. So we adopted a sample of 200 examples, stored only the simplified version (12 MB), and ran our code from Google Drive via Google Colab.
Accomplishments that we're proud of
For the 200-example dataset we identified 70 short answers out of 101 possible long answers, i.e., 69% of the data was usable for training. After training we got 76 answers (other than 'unknown') for the 101 possible questions with long answers, i.e., 75% (5 new answers). Within this set there were at least 52 correct answers out of the 76 non-trivial answers, an accuracy rate of 68%. Total training time was 11.72 minutes for 100 epochs, with a final total_loss of 0.1290. Two AnswerFlow answers had similarity above 85% to the correct answer, based on spaCy 'en_core_web_md' similarity comparison. These two examples are below:
Question: "where is zimbabwe located in the world map"
AnswerFlow answer: "in southern africa , botswana , zambia and mozambique"
Google NQ answer: "in southern africa , between the zambezi and limpopo rivers , bordered by south africa , botswana , zambia and mozambique"
Similarity: 94.24%
Question: "where is arkansas river located on a map"
AnswerFlow answer: "the arkansas river flows through colorado , kansas , oklahoma , and arkansas , and arkansas , and arkansas , and arkansas , and arkansas , and arkansas , and "
Google NQ answer: "the arkansas river flows through colorado , kansas , oklahoma , and arkansas , and its watershed also drains parts of texas , new mexico and missouri. "
Similarity: 95.19%
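The similarity scores above came from spaCy's `en_core_web_md` model, whose `Doc.similarity` is a cosine similarity over averaged word vectors. As a dependency-free illustration of the underlying metric, here is cosine similarity computed over bag-of-words counts instead of spaCy's embeddings (a rough stand-in: it rewards shared words but, unlike word vectors, cannot credit synonyms):

```python
from collections import Counter
from math import sqrt

def cosine_similarity(text_a, text_b):
    """Cosine similarity between two texts over bag-of-words counts.
    (The actual comparison used spaCy 'en_core_web_md' word vectors.)"""
    a, b = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(a[w] * b[w] for w in a)
    norm = (sqrt(sum(c * c for c in a.values()))
            * sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0

predicted = "in southern africa , botswana , zambia and mozambique"
reference = ("in southern africa , between the zambezi and limpopo rivers , "
             "bordered by south africa , botswana , zambia and mozambique")
print(round(cosine_similarity(predicted, reference), 4))
```

Identical texts score 1.0 and texts with no shared words score 0.0, so a threshold such as the 85% used above separates near-duplicates from unrelated answers.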
What we learned
Google Colab is a powerful playground and lets us use TPUs and GPUs (the TPU runtime environment was more efficient for our training purposes, but the GPU was faster).
What's next for AnswerFlow
1) Use a larger dataset for training on short answers.
2) Train on long answers.
3) Implement a measure of how reasonable a given answer is.
4) Implement a QA skin via a webpage, app, or another chatbot interface.
Built With
- google-colab
- gpu
- keras
- python
- tensorflow
- tpu
