Chrome Quizzer

Inspiration

The timing for this couldn't be better. The World Wide Web gave us access to enormous amounts of information on basically any topic imaginable: science, pop culture, sports, art, and more. A few decades later, we are living through another revolution: AI. It is ever more present in our daily lives, from simple chatbots to the discovery of cures for diseases around the world. For the average person, it is reshaping the way we interact with (and think of) technology.

That's why the marriage between AI and the browser is all the more exciting to me. Not only do everyday users have access to groundbreaking technology LOCALLY (!), but for the first time, the browser is able to understand the content it displays.

For me, reading so much content every day on very different topics can get overwhelming, and I am never 100% sure I have grasped the whole picture, especially when I am learning a new topic. So I thought it would be great if there were a tool that could quiz me on the topic. So I built Quizzer.

What it does

Quizzer is a Chrome extension that uses Google Chrome's built-in AI APIs to extract relevant content from the current tab and generate a summary, quizzes, and games related to that content. After the user answers all the questions, it also generates a list of follow-up suggestions consisting of categories, summaries, recommendations, and follow-up search terms for diving further into the topic.

How we built it

Tab and Article Extraction

Quizzer uses Mozilla's Readability.js to parse the current tab and extract the relevant article, stripping away any distractions. This data is then fed into the models.
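
As a rough illustration of the extraction step (a sketch under stated assumptions, not the extension's actual code), the function below wraps Readability's parse() call. The function name extractArticle and the choice of returned fields are hypothetical, and the Readability class is passed in as a parameter so the sketch stays independent of how the library is bundled.

```javascript
// Sketch only: `ReadabilityCtor` stands in for the Readability class from
// Mozilla's Readability.js; it is injected so this function has no hard dependency.
function extractArticle(doc, ReadabilityCtor) {
  // Readability modifies the DOM it parses, so work on a clone of the page.
  const clone = doc.cloneNode(true);
  const article = new ReadabilityCtor(clone).parse();

  // parse() returns null when no article-like content is found.
  if (!article) {
    return null;
  }

  // Keep only the fields the models need (this selection is an assumption).
  return {
    title: article.title,
    text: article.textContent.trim(),
  };
}
```

In a content script this would presumably be called as extractArticle(document, Readability), and a null result could be surfaced to the user as "no readable article found on this page".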

Model Acquisition

Quizzer uses Google Chrome's Summarizer API and LanguageModel/Prompt API, both of which need to be enabled and downloaded before their first use. The ModelAcquisition class therefore provides common functionality for determining availability, downloading the models, and caching them for future reuse. The cached models are then cloned for each "quiz generation" to avoid polluting the session with unnecessary context.
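
The availability check, one-time download, caching, and cloning described above could look roughly like the sketch below. It is not the actual ModelAcquisition class: the name ModelCache and its methods are hypothetical, and the api argument is assumed to behave like Chrome's global LanguageModel object (availability() and create(), with clone() on the resulting session).

```javascript
// Sketch only: `api` is expected to look like Chrome's global `LanguageModel`.
// The class and method names here are hypothetical.
class ModelCache {
  constructor(api, onProgress = () => {}) {
    this.api = api;
    this.onProgress = onProgress;
    this.basePromise = null; // cached promise for the shared base session
  }

  // Resolve the base session, creating (and downloading) it at most once.
  getBase() {
    if (!this.basePromise) {
      this.basePromise = this.createBase().catch((error) => {
        this.basePromise = null; // allow a retry after a failure
        throw error;
      });
    }
    return this.basePromise;
  }

  async createBase() {
    const status = await this.api.availability();
    if (status === "unavailable") {
      throw new Error("Model is not available on this device");
    }
    // create() triggers the download when the model is not yet on the device.
    return this.api.create({
      monitor: (m) => {
        m.addEventListener("downloadprogress", (e) => this.onProgress(e.loaded));
      },
    });
  }

  // Hand out a clone so each quiz starts without earlier conversation context.
  async freshSession() {
    const base = await this.getBase();
    return base.clone();
  }
}
```

Caching the promise rather than the session means that two requests arriving at the same time share a single create() call instead of starting two downloads.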

Service Worker

A service worker was set up to handle asynchronous tasks such as prompting the models or executing cron jobs.

getTabData - Extracts data from the current browser tab.
generateSummary - Based on the article content, generates a summary using the Summarizer API.
generateQuizData - Based on the article content, generates a quiz using the LanguageModel API with {responseConstraint: quizSchema}, where quizSchema is a custom JSON schema.
generateCrossword - Based on the article content, generates a list of words and clues using the LanguageModel API with {responseConstraint: crosswordSchema}, where crosswordSchema is a custom JSON schema. The list is then fed into Michael Wehar's crossword layout generator npm package, which returns the layout information for a crossword puzzle.
generateSuggestions - Runs as a cron job in the background. Based on the most recent quiz answer attempts, generates a list of learning suggestions including categories, a summary, recommendations, and follow-up search terms, using the LanguageModel API.
generateFlashCard - This executes whenever the command on the context menu is selected and uses the LanguageModel API with {responseConstraint: flashcardSchema} to generate flashcards, store them in Chrome's local storage, and display them in the sidebar.
evaluateDrawing - Receives a base64-encoded image and transforms it into a bitmap image to be fed into the LanguageModel API with multimodal functionality to generate a score for each drawing.
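
To make the responseConstraint pattern used by several of these tasks more concrete, here is a minimal sketch. The schema is hypothetical (the real quizSchema is not shown in this write-up), the prompt wording is made up, and the session argument is assumed to behave like a LanguageModel session whose prompt() resolves to a JSON string when a constraint is supplied.

```javascript
// Hypothetical schema: the real quizSchema is not shown in this write-up.
const quizSchema = {
  type: "object",
  required: ["questions"],
  properties: {
    questions: {
      type: "array",
      items: {
        type: "object",
        required: ["question", "options", "answerIndex"],
        properties: {
          question: { type: "string" },
          options: { type: "array", items: { type: "string" }, minItems: 2 },
          answerIndex: { type: "integer", minimum: 0 },
        },
      },
    },
  },
};

// `session` is expected to behave like a LanguageModel session: prompt() resolves
// to a string, which is JSON text when a responseConstraint is supplied.
async function generateQuizData(session, articleText) {
  const raw = await session.prompt(
    `Write a multiple-choice quiz about this article:\n\n${articleText}`,
    { responseConstraint: quizSchema },
  );
  const quiz = JSON.parse(raw);

  // The schema cannot tie answerIndex to the options length, so check it here.
  quiz.questions = quiz.questions.filter(
    (q) => q.answerIndex >= 0 && q.answerIndex < q.options.length,
  );
  return quiz;
}
```

The extra filter is a design choice of this sketch: a schema guarantees the shape of the reply, but cross-field rules such as "the answer must be one of the options" still have to be checked in code.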

The service worker communicates with the frontend through Chrome's messaging API (chrome.runtime.sendMessage and chrome.runtime.onMessage), which keeps the exchange lightweight and asynchronous.
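
One way to route such messages to the tasks listed above is sketched below. The { type, payload } message shape, the { ok, data } reply envelope, and the name createMessageListener are all assumptions made for illustration; only the listener signature and the "return true to reply asynchronously" rule come from Chrome's messaging API.

```javascript
// Sketch only: builds a listener suitable for chrome.runtime.onMessage.addListener.
// The { type, payload } message shape and the reply envelope are assumptions.
function createMessageListener(handlers) {
  return (message, sender, sendResponse) => {
    const type = message?.type;
    const handler = Object.hasOwn(handlers, type) ? handlers[type] : undefined;
    if (!handler) {
      return false; // not ours: no asynchronous response will follow
    }

    Promise.resolve()
      .then(() => handler(message.payload, sender))
      .then((data) => sendResponse({ ok: true, data }))
      .catch((error) =>
        sendResponse({ ok: false, error: String(error?.message ?? error) }),
      );

    // Returning true tells Chrome the response will be sent asynchronously.
    return true;
  };
}
```

In the service worker this would presumably be registered once, for example by passing createMessageListener({ generateSummary, generateQuizData }) to chrome.runtime.onMessage.addListener, while the sidebar awaits chrome.runtime.sendMessage({ type: "generateSummary", payload }).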

Sidebar and Styling

The sidebar communicates with the service worker and loads the responses asynchronously. Here, I really focused on getting the UI right. I wanted it to be non-distracting, yet feel inviting and intuitive.

I used CSS variables whenever possible to standardize the look and feel, and to support both adaptive dark and light themes.

At the top it consists mostly of two buttons: one for generating a new quiz, and another one for visualizing the suggestions in the dashboard. Then, a pill displays the title and favicon of the relevant tab for better clarity and context, followed by a quick summary of the tab's contents. I wanted the score to be very big and visible so that it feels inviting.

Questions, Crossword, and Hangman are each rendered by a custom web component (QuestionComponent, CrosswordComponent, and HangmanComponent, respectively) to keep the UI extensible and atomic.

Flashcards are rendered using a custom web component called FlashCardComponent and displayed using fun microinteractions, such as random tilt and hover effects.

Challenges we ran into

Because the APIs are so new, learning resources were very limited, although the APIs themselves were really straightforward to use.

One challenge I ran into was figuring out the best way of handling concurrent requests to the same model, specifically for generating the suggestions immediately after the user attempted an answer. When done several times in a row, this would create a bottleneck. Eventually, I decided it would be more efficient to generate the new suggestions every 60 minutes regardless.
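
A periodic refresh of this kind can be sketched with Chrome's alarms API as shown below. This is an assumption about the mechanism rather than the extension's actual code: the alarm name and the function scheduleSuggestions are hypothetical, and the alarms argument is expected to look like chrome.alarms (which requires the "alarms" permission).

```javascript
const SUGGESTIONS_ALARM = "refresh-suggestions"; // hypothetical alarm name

// `alarms` is expected to look like chrome.alarms.
function scheduleSuggestions(alarms, regenerate, periodInMinutes = 60) {
  let running = false;

  alarms.onAlarm.addListener(async (alarm) => {
    if (alarm.name !== SUGGESTIONS_ALARM || running) {
      return; // ignore other alarms and skip overlapping runs
    }
    running = true;
    try {
      await regenerate();
    } catch (error) {
      console.error("Suggestion refresh failed:", error);
    } finally {
      running = false;
    }
  });

  // Creating an alarm with an existing name replaces the previous one.
  alarms.create(SUGGESTIONS_ALARM, { periodInMinutes });
}
```

The running flag covers the original bottleneck: even if an alarm fires while a previous generation is still in progress, only one request reaches the model at a time.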

Another challenge I ran into was the Prompt API not generating the expected answers or formats. This was quickly solved by using a custom JSON schema, which also allowed me to build more robust code.

The multimodal mode seems powerful but due to the size of the model it can be a bit buggy at times.

Accomplishments that we're proud of

I am really happy with the way the project turned out. Thanks to asynchronous programming, performance turned out really well. Likewise, I was really impressed by the Prompt API's ability to generate structured content based on a JSON schema (I really liked not having to build another data type. Kudos to the team for embracing an already great standard), which allowed me to take this even further than I expected.

I explored every possibility and tried catching exceptions early in a way that gives good feedback to the user, eliminating the "it seems broken" feeling.

I also included several quality-of-life UI features, such as automatic links to search suggestions (using the browser's default search engine), automatic light/dark mode switching with matching colors to ensure high contrast, instant feedback on questions and games, and random rotation on flashcards.

What we learned

I learned an awful lot about generative AI in this project, both from a prompting perspective (i.e., which prompts work best) and from a possibility perspective. The APIs are really powerful and easy to use once you get the hang of them, and I can't wait to see more examples in action.

Likewise, this was my first time building a Chrome extension, and it gave me very good insights into the way the extensions I use every day (and the browser itself) work.

What's next for Chrome Quizzer

I want to take this to new levels as time goes on. I will try to keep it updated to use the latest and greatest on-device APIs.

I want to integrate with Gemini on the cloud to generate images that are relevant to the content of the article.
I want to support more languages, both in the UI and in the model's replies themselves.
I want to add more games down the line to make the experience more inviting.
I would like to add a scoreboard so that users can compete with friends and feel more enticed to use the extension and learn more.
I would like it to be able to read more types of documents (e.g., PDFs, images, videos, audio, etc.).
The tool is open source, so I would like lots of people to use it and give feedback so I can further improve it with time.