Inspiration
Hi, we're Alex, Anuraag, and Michael. These days, it’s very easy for people to just skim news headlines without truly understanding what is going on in the world. Personally, we've noticed this tendency with ourselves and others, and felt inspired to harness modern computational techniques to address this issue.
What it does
Our web app uses novel unsupervised NLP methods to generate quizzes on trending news articles, with the goal of increasing news awareness and comprehension through fun and engaging methods.
To begin, a news article's raw text is extracted and passed into a preprocessor, which generates "cleaned" article snippets. From there, GPT-3 enters our pipeline for the first time: novel prompting techniques that we designed coax GPT-3 into generating short, simple sentences from these snippets. Next, the simple sentences are passed through a neural coreference resolution model to reduce any ambiguity present in them (the rationale behind this is described in more detail in the Challenges section below). Finally, these simple, disambiguated sentences are passed into GPT-3, once again using novel prompting techniques that we designed, to generate question and answer pairs.
As an example, our full pipeline can take a phrase such as:
In a post on Telegram, presidential adviser Kyrylo Timoshenko listed the number of people in each region without power, a total of about 1.5 million.
Simplify it into:
There are 1.5 million people in regions without power.
And then generate the question/answer pair we’re seeking:
Q: How many people are in regions without power?
A: 1.5 million
From there, these question and answer pairs are fed into our web app, which displays them in a traditional free-response quiz format. The user's answer is then compared to the "known" answer, and a score is presented to the user at the end.
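The exact answer-checking logic isn't spelled out above, so here is a minimal sketch of one way a user's free-response answer could be compared to the "known" answer; the function names and the fuzzy-match threshold are our illustration, not the project's actual code:

```python
import re
from difflib import SequenceMatcher

def normalize(answer: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    answer = answer.lower()
    answer = re.sub(r"[^\w\s.]", "", answer)
    return " ".join(answer.split())

def score_answer(user_answer: str, known_answer: str, threshold: float = 0.8) -> bool:
    """Mark the answer correct if its normalized form is close enough to the known one."""
    ratio = SequenceMatcher(None, normalize(user_answer), normalize(known_answer)).ratio()
    return ratio >= threshold
```

A similarity ratio rather than strict equality tolerates small differences in casing and punctuation ("1.5 Million" vs. "1.5 million") while still rejecting wrong answers.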
We pull in some of the most popular news articles each day, generating a “daily digest” quiz, but our method of quiz generation is uniquely flexible and scalable, so we also allow the user to input any news article URL of their choosing for quiz generation.
How we built it
Prior to embarking on this, we conducted some research that we intended to build off of:
We took a lot of inspiration from recent papers like Language Models are Few-Shot Learners and Emergent Abilities of Large Language Models on the subject of prompt engineering for large language models. These papers show that large language models are capable of performing meaning-dependent tasks with surprising accuracy after being shown only a few examples (and being explicitly trained on none). For our use case, this meant we could leverage the abilities of a GPT-3-sized model without going through the computationally expensive procedure of training or fine-tuning it.
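Few-shot prompting of this kind boils down to prepending worked examples to the new input. A minimal sketch of how such a prompt for the simplification step might be assembled (the instruction wording and example pair below are illustrative, drawn from the article snippet earlier in this writeup, not our actual prompts):

```python
# Hypothetical few-shot example pairs: (original passage, simplified sentence).
SIMPLIFY_EXAMPLES = [
    ("In a post on Telegram, presidential adviser Kyrylo Timoshenko listed "
     "the number of people in each region without power, a total of about 1.5 million.",
     "There are 1.5 million people in regions without power."),
]

def build_simplify_prompt(snippet: str, examples=SIMPLIFY_EXAMPLES) -> str:
    """Assemble a few-shot prompt: an instruction, worked examples, then the new snippet."""
    parts = ["Rewrite each passage as short, simple sentences.\n"]
    for source, simple in examples:
        parts.append(f"Passage: {source}\nSimple: {simple}\n")
    parts.append(f"Passage: {snippet}\nSimple:")
    return "\n".join(parts)
```

The resulting string would be sent as the prompt of a completion request; the model continues from the trailing "Simple:" marker.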
We iteratively built up a pipeline of GPT-3 prompts, text-processing, and coreference resolution models, performing ablation studies to determine the effects of possible design decisions. Our Challenges we ran into section has more detail on these ablations, but we ended up leveraging the OpenAI API for GPT-3 access and using a small model to perform coreference resolution to improve the fidelity of our questions.
As input to our pipeline, we pull in articles via URLs and perform pre-processing using the Python libraries newspaper3k, spaCy, and NLTK.
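The real pre-processing relies on newspaper3k for extraction and spaCy/NLTK for segmentation; as a rough stand-in, the cleaning step can be sketched with only the standard library (the heuristics and the minimum-length cutoff here are our illustration):

```python
import re

def clean_snippets(raw_text: str, min_words: int = 8) -> list[str]:
    """Split raw article text into cleaned sentence snippets.

    A stdlib stand-in for the real pipeline (newspaper3k extraction plus
    spaCy/NLTK sentence segmentation); the heuristics are illustrative.
    """
    text = re.sub(r"\s+", " ", raw_text).strip()  # collapse whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text)  # naive sentence split
    # Drop fragments too short to support a meaningful question.
    return [s for s in sentences if len(s.split()) >= min_words]
```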
Concurrently with developing our processing pipeline, we were also working on building out our web app. We developed the frontend with React and Material UI, and then wrapped our processing pipeline with Flask to expose the relevant data and processing endpoints that we needed to run our quizzes.
Challenges we ran into
We spent a lot of time performing ablations on each step of the question-answer generation pipeline. Initially we devoted a lot of time to implementing and testing a question-generation system based on the PAQ paper, but because that paper emphasizes recall over precision (not suitable for a quiz where precision is important) we ended up scrapping almost the entire system and building our own generation pipeline from scratch.
Due to the finicky nature of GPT-3 prompt engineering, we also spent a lot of time designing prompts on cheaper versions of GPT-3 that didn’t scale up well to the larger models, which were more difficult to test. In particular, the order of examples in our prompts tended to matter a lot (which we only knew to test because of the paper Fantastically Ordered Prompts and Where to Find Them). We ended up creating our own “challenge” test set of article snippets we could test ablations with, which improved the rigor of our experiments and helped us confirm which prompts perform best.
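An ordering ablation over a challenge set can be sketched as follows; `generate` here is a stand-in for a call to the model, and exact-match scoring is a simplification of however one actually grades outputs (all names are ours, not the project's):

```python
from itertools import permutations

def best_example_order(examples, challenge_set, generate):
    """Score every ordering of few-shot examples on a challenge test set.

    `generate(ordered_examples, snippet)` stands in for a GPT-3 call;
    scoring is exact match against reference outputs. Illustrative only.
    """
    best_order, best_acc = None, -1.0
    for order in permutations(examples):
        hits = sum(generate(order, snippet) == reference
                   for snippet, reference in challenge_set)
        acc = hits / len(challenge_set)
        if acc > best_acc:
            best_order, best_acc = order, acc
    return best_order, best_acc
```

Exhaustive permutation only works for a handful of examples (n! orderings), which matches the few-shot regime.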
We also ran into some problems with pronoun ambiguity in sentences. For example, consider the following sentence (which we ran into while testing some early iterations of our pipeline):
Roger Federer won his first major singles title at Wimbledon in 2003 at age 21. He was part of an era where he dominated men's tennis along with Rafael Nadal and Novak Djokovic.
The “he” in the second sentence is ambiguous, and if this sentence were used in a downstream question/answer generation task, it could produce confusing and unclear questions. This was a twist we had not entirely expected; to mitigate it, we turned to a coreference resolution model. Coreference resolution is the process of inferring what an ambiguous term in a sentence (pronouns are a great example of this!) refers to. By integrating such a model into our processing pipeline, we transformed the above into:
Roger Federer won his first major singles title at Wimbledon in 2003 at age 21. Roger Federer was part of an era where Roger Federer dominated men's tennis along with Rafael Nadal and Novak Djokovic.
This disambiguation allowed us to produce coherent and understandable questions on a consistent basis.
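The substitution step itself is mechanical once the model has resolved the clusters: each pronoun mention is overwritten with its antecedent. A minimal sketch, assuming the model hands back character spans per cluster (this interface is our illustration, not the model's real API):

```python
def resolve_pronouns(text: str, clusters) -> str:
    """Replace pronoun mentions with their antecedent.

    `clusters` maps each cluster's canonical mention to a list of
    (start, end) character spans to overwrite, the kind of output a
    neural coreference model produces. Illustrative stdlib sketch.
    """
    replacements = []
    for antecedent, spans in clusters.items():
        replacements.extend((start, end, antecedent) for start, end in spans)
    # Apply right-to-left so earlier character offsets stay valid.
    for start, end, antecedent in sorted(replacements, reverse=True):
        text = text[:start] + antecedent + text[end:]
    return text
```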
Accomplishments that we're proud of
We have a fully functional article processing pipeline, which is capable of producing coherent question and answer pairs, which we can use with our web app to generate a quiz! Achieving that in under 36 hours is pretty awesome, and exceeded all of our expectations.
Also, our web app is fully optimized for both mobile and desktop with a responsive design, so we’re able to target multiple platforms.
What we learned
We learned about the effectiveness of different question-answer generation techniques: Probably Asked Questions (PAQ) turned out to generate noisier QA pairs than GPT-3 prompting. We also learned how to design comprehensive prompts that sharpen GPT-3’s ability to identify the key entities in simplified sentences and structure questions and answers around them.
What's next for QuizNews
- Increased gamification (leaderboard and additional features to encourage users to actually read and understand articles daily)
- Refined QA-generation pipeline (more hyperparameter tuning and prompt optimization)
- Categorization of trending news and a wider selection of news articles on which to quiz