PubMed RAG Researcher

About

PubMed RAG Researcher is a Streamlit app for rapid literature review. It searches PubMed abstracts, summarizes findings with Gemini, and supports follow-up Q&A grounded in retrieved papers.

Updated Features

PubMed search with configurable date range and target paper count.
Iterative PMID batching to keep searching until enough valid papers are collected.
Optional free-access filtering, controlled by a sidebar toggle.
Real-time retrieval progress UI (progress bar + per-paper status text).
Gemini-generated structured summary with:
- findings synthesis,
- most relevant paper recommendation,
- IEEE-style references.
Vector retrieval for chat using ChromaDB + Google Generative AI embeddings.
Automatic fallback to stored abstracts if vector retrieval is unavailable.
Retrieval-mode transparency in the UI (shows whether each answer used vector search or the fallback).
Session-state chat history and conversation continuity.
PDF export options:
- summary only,
- summary with full follow-up chat log.
Graceful degradation when ChromaDB is unavailable.
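
The iterative PMID batching listed above can be pictured with a small sketch. This is not the code in app.py; it only illustrates the idea of paging through search results until enough papers with usable abstracts are collected. The names collect_papers, search_batch, fetch_abstract, and max_batches are hypothetical, and the fake search and fetch functions stand in for the real PubMed calls.

```python
from __future__ import annotations

from typing import Callable, Optional


def collect_papers(
    search_batch: Callable[[int, int], list[str]],
    fetch_abstract: Callable[[str], Optional[str]],
    target_count: int,
    batch_size: int = 20,
    max_batches: int = 10,
) -> dict[str, str]:
    """Page through PMIDs until target_count papers with abstracts are found.

    search_batch(offset, size) is assumed to return the next slice of PMIDs;
    fetch_abstract(pmid) is assumed to return None when no abstract exists.
    """
    collected: dict[str, str] = {}
    for batch_index in range(max_batches):
        pmids = search_batch(batch_index * batch_size, batch_size)
        if not pmids:
            break  # search results exhausted
        for pmid in pmids:
            abstract = fetch_abstract(pmid)
            if abstract:  # skip records without a usable abstract
                collected[pmid] = abstract
            if len(collected) >= target_count:
                return collected
    return collected


# Stand-in data: ten PMIDs, only the even ones have abstracts.
FAKE_RESULTS = [str(i) for i in range(1, 11)]
FAKE_ABSTRACTS = {p: f"Abstract {p}" for p in FAKE_RESULTS if int(p) % 2 == 0}


def fake_search(offset: int, size: int) -> list[str]:
    return FAKE_RESULTS[offset:offset + size]


def fake_fetch(pmid: str) -> Optional[str]:
    return FAKE_ABSTRACTS.get(pmid)


papers = collect_papers(fake_search, fake_fetch, target_count=3, batch_size=4)
print(list(papers))  # PMIDs that were kept, in retrieval order
```

The max_batches cap is an assumption added so the loop always terminates, even when a topic has few papers with abstracts.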

Retrieval Flow

Search PubMed with your topic and date range.
Build a local abstract corpus for the current run.
Store documents in ChromaDB (when available).
For follow-up questions, retrieve top-k semantically similar chunks.
Fall back to full stored abstracts if DB retrieval is unavailable.
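
The last two steps of the flow can be sketched as follows. This is a minimal illustration, not the ResearchDB implementation in database_manager.py. The VectorStore protocol, retrieve_context, and the stand-in stores are hypothetical names, and the word-overlap ranking only imitates semantic search so the example runs without ChromaDB or an embedding API.

```python
from __future__ import annotations

from typing import Optional, Protocol, Sequence


class VectorStore(Protocol):
    """Hypothetical minimal interface for whatever wraps the vector DB."""

    def query(self, question: str, k: int) -> list[str]: ...


def retrieve_context(
    question: str,
    store: Optional[VectorStore],
    abstracts: Sequence[str],
    k: int = 3,
) -> tuple[str, list[str]]:
    """Return (mode, context), where mode is "vector" or "fallback"."""
    if store is not None:
        try:
            chunks = store.query(question, k)
            if chunks:
                return "vector", chunks
        except Exception:
            pass  # any store failure is treated as "unavailable"
    # Fallback: hand the model every stored abstract for this run.
    return "fallback", list(abstracts)


class OverlapStore:
    """Stand-in store that ranks by shared words instead of embeddings."""

    def __init__(self, docs: Sequence[str]) -> None:
        self.docs = list(docs)

    def query(self, question: str, k: int) -> list[str]:
        words = set(question.lower().split())
        ranked = sorted(
            self.docs,
            key=lambda d: len(words & set(d.lower().split())),
            reverse=True,
        )
        return ranked[:k]


class BrokenStore:
    """Stand-in store that simulates an unavailable vector DB."""

    def query(self, question: str, k: int) -> list[str]:
        raise RuntimeError("vector DB unavailable")


abstracts = [
    "Statins reduce cardiovascular risk",
    "Gut microbiome and diet",
    "Statins and muscle pain",
]
print(retrieve_context("statins risk", OverlapStore(abstracts), abstracts, k=1))
print(retrieve_context("statins risk", BrokenStore(), abstracts)[0])
```

Returning the mode alongside the context is what lets the UI report whether an answer came from vector search or from the fallback.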

Tech Stack

Python
Streamlit
Google Gemini API (google-generativeai)
Metapub (PubMed access)
ChromaDB (vector store)
FPDF (PDF generation)
python-dotenv (env configuration)

Demo

PUBMED.RAG.VIDEO.mp4

Setup

Clone this repository.
Install dependencies:

pip install -r requirements.txt

Create a .env file in the project root:

GEMINI_API_KEY=your_key_here
NCBI_API_KEY=your_ncbi_key_here

NCBI_API_KEY is optional but recommended for higher PubMed request throughput.

Run the app:

streamlit run app.py
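
For reference, the two .env keys described above could be read at startup roughly like this. It is a sketch under assumptions, not the code in app.py: read_api_keys is a hypothetical helper, and it takes a plain mapping so it can be tried without a real .env file. In the app itself, python-dotenv's load_dotenv() would typically be called first to copy the .env values into os.environ.

```python
from __future__ import annotations

import os
from typing import Mapping, Optional

# In the real app, something like the following would run first:
#   from dotenv import load_dotenv
#   load_dotenv()  # copies values from .env into os.environ


def read_api_keys(
    env: Mapping[str, str] = os.environ,
) -> tuple[str, Optional[str]]:
    """Return (gemini_key, ncbi_key); the NCBI key may be None."""
    gemini_key = env.get("GEMINI_API_KEY", "").strip()
    if not gemini_key:
        raise RuntimeError("GEMINI_API_KEY is missing; add it to .env")
    # NCBI_API_KEY is optional, so an empty or absent value becomes None.
    ncbi_key = env.get("NCBI_API_KEY", "").strip() or None
    return gemini_key, ncbi_key


# Example with an explicit mapping instead of the real environment.
example_env = {"GEMINI_API_KEY": "your_key_here"}
gemini_key, ncbi_key = read_api_keys(example_env)
print(gemini_key, ncbi_key)
```

Failing early on a missing GEMINI_API_KEY gives a clearer error than a rejected API call later in the session.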

Testing

This repo currently includes tests for:

pdf_generator.py (smoke + regression coverage)
database_manager.py (ResearchDB behavior and fallbacks)

Recommended (runs both unittest-style and pytest-style tests):

pytest -v tests

If pytest is not installed:

pip install pytest

Current Limitations

Summaries and Q&A are limited by the availability and quality of abstracts in PubMed records.
If embeddings or vector DB initialization fails, chat falls back to raw abstract context.
Export currently targets PDF only.

Future Improvements

Add citation-level source snippets in chat responses.
Improve ranking with metadata-aware re-ranking (year, study type, relevance score).
Add trend visualizations (publication count over time by topic).
Add optional full-text ingestion for open-access papers.
