Inspiration

Around 45% of people in the US use LLMs like ChatGPT, Claude, and Gemini on a daily basis, and almost none of them fact-check what the models tell them. Unfortunately, according to a study from Purdue University, 52% of responses generated by ChatGPT are factually incorrect, with errors especially common in health and the sciences.

This inspired us to create LLMuminate, a Chrome extension that fact-checks AI responses in real time against peer-reviewed literature indexed in databases like PubMed, all with the click of a button. With its ease of use and its ability to evaluate and score AI responses, LLMuminate provides an efficient way to mitigate misinformation and evaluate LLMs.

What it does

LLMuminate scans your latest AI-generated response and provides the following:

-Similarity Score: Compares the AI response to 10-20 of the most relevant peer-reviewed papers, assigning it a score from 0-100 based on factual consistency.

-Factual Comparison: Extracts the key claims from the AI response and evaluates each one against the most relevant scholarly sources. LLMuminate provides direct quotes from these papers as evidence that supports, is neutral toward, or contradicts what the original LLM stated, along with links to the papers.

-Non-LLM Sources: LLMuminate works on any webpage, or even the new tab page in Chrome. If it detects that the user isn't on ChatGPT, Claude, or Gemini, it provides a text box where the user can paste text to be fact-checked.
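The Factual Comparison output above maps naturally to one record per claim. Here is a minimal sketch of how such records could be parsed from a structured Claude reply; the field names and JSON shape are illustrative assumptions, not the extension's exact format:

```python
import json
from dataclasses import dataclass

@dataclass
class ClaimVerdict:
    claim: str   # a key claim extracted from the AI response
    stance: str  # "supports", "neutral", or "contradicts"
    quote: str   # direct quote from the paper used as evidence
    url: str     # link to the source paper

def parse_verdicts(reply: str) -> list[ClaimVerdict]:
    """Parse a JSON array of per-claim verdicts into typed records."""
    return [ClaimVerdict(**item) for item in json.loads(reply)]
```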

How we built it

-Keyword Extraction: We use Claude 3.5 Sonnet via the Anthropic API to extract 10-15 keywords from the LLM response. The keywords are formatted into search queries with boolean logic to surface the most relevant results, then run against PubMed.
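The query-building step can be sketched as a small pure function, assuming Claude has already returned the keyword list. The exact AND/OR grouping below (anchor the first term, OR together the rest) is an illustrative assumption, not necessarily the extension's production logic:

```python
def build_pubmed_query(keywords: list[str], max_terms: int = 15) -> str:
    """Combine extracted keywords into a boolean PubMed search query."""
    # Quote each keyword so multi-word terms are searched as phrases
    terms = [f'"{k.strip()}"' for k in keywords[:max_terms] if k.strip()]
    if not terms:
        return ""
    head, *rest = terms
    if not rest:
        return head
    # Require the first (most salient) keyword; any of the rest may match
    return f"{head} AND ({' OR '.join(rest)})"
```

For example, `build_pubmed_query(["aspirin", "stroke", "prevention"])` produces `"aspirin" AND ("stroke" OR "prevention")`.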

-Web Scraping: The resulting articles are scraped with BeautifulSoup, and their abstracts and full text are extracted.
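A minimal sketch of the abstract-extraction step is below. The CSS selector is an assumption about PubMed's markup (the real pages may differ), and a tiny stand-in page is used so the sketch runs without a network request:

```python
from bs4 import BeautifulSoup

def extract_abstract(html: str) -> str:
    """Pull the abstract text out of a fetched article page.

    The selector is an assumed stand-in for PubMed's actual markup.
    """
    soup = BeautifulSoup(html, "html.parser")
    node = soup.select_one("div.abstract-content")
    return node.get_text(strip=True) if node else ""

# Stand-in page so the sketch is self-contained
SAMPLE_HTML = '<div class="abstract-content"><p>Aspirin reduces stroke risk.</p></div>'
```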

-Similarity Score: Claude 3.5 Sonnet extracts relevant quotes from each research paper and compares them directly to the original AI-generated response, producing a similarity score based on the factual consistency between the two.
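The scoring step comes down to how the comparison prompt is assembled. A simplified sketch of building such a prompt is below; the wording is illustrative, not our production prompt:

```python
def build_scoring_prompt(ai_response: str, quotes: list[str]) -> str:
    """Assemble a prompt asking Claude to rate factual consistency 0-100."""
    # One bullet per extracted quote from the scraped papers
    evidence = "\n".join(f"- {q}" for q in quotes)
    return (
        "Rate the factual consistency of RESPONSE against EVIDENCE on a "
        "0-100 scale, where 100 means fully supported and 0 means "
        "contradicted. Reply with the integer only.\n\n"
        f"RESPONSE:\n{ai_response}\n\nEVIDENCE:\n{evidence}"
    )
```

Asking for "the integer only" keeps the reply trivially parseable, which matters when the score feeds directly into the extension's UI.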

Challenges

-Computation Time: Scraping several research papers and extracting relevant quotes took too long (up to a minute per query, which hurts the UX), so we optimized through prompt engineering and by limiting extraction to the abstract and results sections.

-Similarity Quantification: There's no standard metric for scoring the factual similarity between two sources, so we had Claude assign the score based on the discrepancies it finds.

What's next for LLMuminate

-Model Training Applications: Use fact checking during the LLM training process to reduce misinformation and hallucinations.

-Improve Computation Time: Currently, LLMuminate takes 15-30 seconds per run. Ideally, the Chrome extension would be usable in real time within a conversation with ChatGPT, Claude, Gemini, or any other LLM.

-Multimodal Capabilities: Fact check voice, video, and image input to support various input and output types for AI responses.

Built With

-Anthropic API (Claude 3.5 Sonnet)
-BeautifulSoup
-Chrome Extensions