Graph-RAG-Engine is a graph-plus-vector retrieval project that indexes Markdown documents, builds a lightweight knowledge graph, retrieves relevant chunks with FAISS, expands results through shared graph concepts, and returns explainable answers with citations and reasoning paths.
Important: This project is an LLM-ready retrieval system, not a fully autonomous knowledge engine.
By default, it uses extractive answer mode, meaning it answers from retrieved passages instead of generating unsupported claims. Optional LLM mode can be enabled with an API key for grounded answer generation.
- Project Overview
- What This Project Does
- What This Project Does Not Do
- Features
- System Behavior
- Architecture
- Answer Modes
- Retrieval Evaluation
- Project Structure
- Installation
- Building the Index
- Running the Backend
- Running the Streamlit App
- API Usage
- Testing
- CI
- Limitations
- Security Notes
- Future Improvements
- Tech Stack
- Author
- License
Many RAG projects behave like black boxes: a question goes in, an answer comes out, and the user cannot easily see why the system selected certain sources.
Graph-RAG-Engine takes a more explainable approach. It combines dense vector search with a lightweight graph layer so retrieved answers can include supporting citations, related concepts, and reasoning paths through documents, chunks, and concepts.
The goal of this project is to demonstrate:
- Hybrid retrieval with vector similarity and graph expansion
- Explainable source paths through a document/chunk/concept graph
- Extractive answers by default for safer local use
- Optional LLM-generated answers when credentials are provided
- Retrieval evaluation using golden queries and ranking metrics
- A testable project structure with CI support
- FastAPI and Streamlit interfaces for practical interaction
This project can:
- Read Markdown documents from data/docs/
- Split documents into text chunks
- Extract lightweight concepts from each chunk
- Create sentence embeddings with Sentence Transformers
- Build a FAISS vector index
- Build a NetworkX graph connecting documents, chunks, and concepts
- Retrieve relevant chunks for a user question
- Expand retrieval using graph concepts and document relationships
- Rerank results using similarity, concept overlap, and PageRank signals
- Return extractive answers with citations
- Return graph-based reasoning paths
- Recommend related documents
- Run retrieval evaluation with golden queries
- Run unit tests and CI checks
- Optionally generate grounded LLM answers using retrieved context
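The chunking and concept-extraction steps listed above can be illustrated with a minimal standard-library sketch. It assumes paragraphs are separated by blank lines and that concepts are simply the most frequent non-stopword tokens. The function names, the stopword list, and the `top_n` parameter are hypothetical and are not taken from the project's `ingest/` code.

```python
import re
from collections import Counter

# Hypothetical stopword list for illustration only.
STOPWORDS = {
    "the", "an", "and", "or", "of", "to", "in", "is", "are",
    "for", "with", "on", "that", "this", "it", "as", "by", "be",
}


def split_into_chunks(text: str) -> list[str]:
    """Split Markdown text into paragraph chunks on blank lines."""
    paragraphs = re.split(r"\n\s*\n", text)
    return [p.strip() for p in paragraphs if p.strip()]


def extract_concepts(chunk: str, top_n: int = 3) -> list[str]:
    """Return the most frequent non-stopword tokens in a chunk."""
    # Tokens are lowercase words of at least two characters.
    tokens = re.findall(r"[a-z][a-z0-9_-]+", chunk.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [token for token, _ in ranked[:top_n]]


doc = "FAISS vector search. FAISS vector. FAISS.\n\nA graph links concepts to chunks."
chunks = split_into_chunks(doc)
concepts_per_chunk = [extract_concepts(chunk) for chunk in chunks]
print(concepts_per_chunk)
```

A frequency-based extractor like this is cheap and transparent, which suits a lightweight graph, but it cannot recognize multi-word phrases or synonyms.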
This project does not:
- Guarantee that every generated or retrieved answer is correct
- Replace human review for important research or business decisions
- Verify facts against the live web
- Use a production graph database by default
- Provide large-scale production observability or monitoring
- Make LLM access mandatory
- Claim that graph expansion always improves retrieval quality
- Protect against unsafe index artifacts from untrusted sources
A production-grade RAG platform would need stronger retrieval evaluation, source governance, monitoring, user feedback loops, access control, observability, and security hardening.
- FAISS vector search over normalized sentence embeddings
- SentenceTransformer embeddings using all-MiniLM-L6-v2
- NetworkX knowledge graph connecting documents, chunks, and concepts
- Graph expansion through shared concepts
- Hybrid reranking using similarity, concept overlap, and PageRank
- Extractive answer mode as the default behavior
- Optional LLM answer mode through environment variables
- Citations and source paths for explainability
- Document recommendation endpoint
- Golden-query retrieval benchmark
- Retrieval metrics: hit@k, precision@k, recall@k, MRR
- FastAPI backend
- Streamlit frontend
- Unit tests
- GitHub Actions CI
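The first feature relies on a standard fact: once vectors are scaled to unit length, their inner product equals their cosine similarity. The sketch below shows this with plain Python lists and a brute-force search that stands in for the FAISS index. The function names and the toy two-dimensional vectors are illustrative assumptions, not the project's code.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        return list(vec)
    return [x / norm for x in vec]


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def search(query, index, k=2):
    """Brute-force top-k search by inner product over unit-length vectors."""
    q = normalize(query)
    scored = [(inner_product(q, vec), i) for i, vec in enumerate(index)]
    # Highest score first; ties are broken by position in the index.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [(i, score) for score, i in scored[:k]]


# Toy "embeddings", normalized once at indexing time.
index = [normalize(v) for v in ([1.0, 0.0], [1.0, 1.0], [0.0, 1.0])]
results = search([2.0, 0.0], index, k=2)
print(results)
```

Because the query is normalized too, its magnitude does not affect the ranking; only its direction matters.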
| Component | Current Behavior |
|---|---|
| Document source | Markdown files in data/docs/ |
| Chunking | Simple paragraph-based chunking |
| Concept extraction | Lightweight keyword/frequency-based concept extraction |
| Embeddings | sentence-transformers/all-MiniLM-L6-v2 |
| Vector search | FAISS inner-product search over normalized embeddings |
| Graph layer | NetworkX graph connecting docs, chunks, and concepts |
| Retrieval expansion | Shared-concept expansion with configurable hop depth |
| Reranking | Embedding similarity + concept overlap + document PageRank |
| Default answer mode | Extractive answer composed from retrieved passages |
| Optional answer mode | LLM-generated answer using retrieved passages as source context |
| Evaluation | Golden-query retrieval benchmark with ranking metrics |
| Interface | FastAPI backend and Streamlit frontend |
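The reranking row combines three signals. One simple way to do that is a weighted sum, sketched below. The weights, the `Candidate` fields, and the overlap definition (the share of query concepts found in the chunk) are all assumptions chosen for illustration; the README does not state the project's actual formula.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    chunk_id: str
    similarity: float      # embedding similarity to the query
    concepts: frozenset    # concepts attached to the chunk
    doc_pagerank: float    # PageRank score of the parent document


def concept_overlap(query_concepts, chunk_concepts):
    """Fraction of the query's concepts that also appear in the chunk."""
    if not query_concepts:
        return 0.0
    return len(query_concepts & chunk_concepts) / len(query_concepts)


def rerank(candidates, query_concepts, w_sim=0.7, w_overlap=0.2, w_pr=0.1):
    """Order candidates by a weighted sum of the three signals (weights are hypothetical)."""
    def score(c):
        return (
            w_sim * c.similarity
            + w_overlap * concept_overlap(query_concepts, c.concepts)
            + w_pr * c.doc_pagerank
        )
    return sorted(candidates, key=score, reverse=True)


candidates = [
    Candidate("graph_notes_chunk_0", 0.80, frozenset({"graph"}), 0.1),
    Candidate("faiss_notes_chunk_0", 0.75, frozenset({"faiss", "vector"}), 0.3),
]
ranked = rerank(candidates, frozenset({"faiss", "vector"}))
print([c.chunk_id for c in ranked])
```

In this example the second chunk has slightly lower embedding similarity but wins after reranking because it matches both query concepts and belongs to a better-connected document.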
Markdown documents
│
▼
Chunking + concept extraction
│
├───────────────► Sentence embeddings ─────► FAISS vector index
│
└───────────────► Docs / chunks / concepts ─► NetworkX graph
│
▼
Question ─► Vector search ─► Graph expansion ─► Hybrid reranking
│
▼
Extractive answer or optional LLM answer
│
▼
Citations + graph explanation paths
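The graph expansion step in the diagram can be sketched as a breadth-first walk over shared concepts. In this illustration plain dictionaries stand in for the NetworkX graph, and the chunk IDs, the `expand` function, and its `hops` parameter are hypothetical. Each newly reached chunk records which chunk and which concept led to it, which is the kind of information a reasoning path needs.

```python
from collections import defaultdict


def build_concept_index(chunk_concepts):
    """Invert chunk -> concepts into concept -> chunks."""
    index = defaultdict(set)
    for chunk_id, concepts in chunk_concepts.items():
        for concept in concepts:
            index[concept].add(chunk_id)
    return index


def expand(seed_chunks, chunk_concepts, hops=1):
    """Return seeds plus chunks reachable through shared concepts.

    Maps each chunk to None (a seed) or to the (source_chunk, concept)
    pair through which it was first reached.
    """
    concept_index = build_concept_index(chunk_concepts)
    found = {chunk_id: None for chunk_id in seed_chunks}
    frontier = list(seed_chunks)
    for _ in range(hops):
        next_frontier = []
        for chunk_id in frontier:
            # Sorting keeps the traversal order deterministic.
            for concept in sorted(chunk_concepts.get(chunk_id, ())):
                for neighbor in sorted(concept_index[concept]):
                    if neighbor not in found:
                        found[neighbor] = (chunk_id, concept)
                        next_frontier.append(neighbor)
        frontier = next_frontier
    return found


chunk_concepts = {
    "faiss_0": {"faiss", "vector"},
    "embed_0": {"vector", "embedding"},
    "graph_0": {"graph", "embedding"},
    "misc_0": {"license"},
}
one_hop = expand(["faiss_0"], chunk_concepts, hops=1)
two_hops = expand(["faiss_0"], chunk_concepts, hops=2)
print(one_hop)
print(two_hops)
```

Each extra hop widens recall but also admits chunks that are only loosely related to the question, which is one reason expansion does not always improve retrieval quality.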
| Component | Purpose |
|---|---|
| ingest/ | Builds chunks, concepts, embeddings, FAISS index, and graph artifacts |
| backend/retriever.py | Loads retrieval artifacts lazily and performs hybrid retrieval |
| backend/rag.py | Orchestrates answer generation from retrieved sources |
| backend/llm.py | Provides optional OpenAI-compatible LLM answer generation |
| graph/graph_store.py | Stores and explains graph relationships |
| evaluation/ | Runs retrieval benchmark metrics |
| ui/app.py | Provides a Streamlit interface |
The project supports two answer modes.
Extractive mode is the default.
It uses retrieved passages directly and avoids generating unsupported claims.
{
"question": "What is FAISS?",
"mode": "extractive"
}
This mode is useful for:
- local demos
- transparent retrieval debugging
- source-first answer display
- running the system without API keys
LLM mode sends retrieved passages to an OpenAI-compatible chat-completions API and asks the model to answer only from the provided context.
{
"question": "How does FAISS relate to embeddings?",
"mode": "llm"
}
If LLM mode is requested without a valid API key, the system falls back to extractive mode and returns an llm_error field.
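The fallback behavior can be sketched as follows. The response field names (`answer`, `answer_mode`, `llm_error`) and the two environment variable names come from this README. Everything else is an assumption made for illustration: the function names, the order in which the variables are checked, the injected `llm_call` callable, and the choice to fall back when the call raises.

```python
import os


def resolve_api_key(env=None):
    """Return the first non-empty API key found, or None (lookup order is assumed)."""
    env = os.environ if env is None else env
    for name in ("GRAPH_RAG_LLM_API_KEY", "OPENAI_API_KEY"):
        value = env.get(name, "").strip()
        if value:
            return value
    return None


def answer(question, passages, mode="extractive", env=None, llm_call=None):
    """Answer extractively, or with an LLM when requested and possible."""
    extractive = {"answer": " ".join(passages), "answer_mode": "extractive"}
    if mode != "llm":
        return extractive
    api_key = resolve_api_key(env)
    if api_key is None or llm_call is None:
        # No credentials: fall back and report why.
        return {**extractive, "llm_error": "LLM mode requested but no API key is configured"}
    try:
        return {"answer": llm_call(question, passages, api_key), "answer_mode": "llm"}
    except Exception as exc:  # assumed: provider failures also trigger the fallback
        return {**extractive, "llm_error": str(exc)}


def fake_llm(question, passages, api_key):
    """Stand-in for a real chat-completions call."""
    return f"Grounded answer based on {len(passages)} passage(s)."


passages = ["FAISS is a vector search library."]
fallback = answer("What is FAISS?", passages, mode="llm", env={})
generated = answer("What is FAISS?", passages, mode="llm",
                   env={"OPENAI_API_KEY": "test-key"}, llm_call=fake_llm)
print(fallback["answer_mode"], generated["answer_mode"])
```

Returning the extractive answer together with `llm_error` lets a client show a usable result while still surfacing the configuration problem.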
The project includes a small retrieval benchmark in:
evaluation/golden_queries.json
Run evaluation with:
python -m evaluation.evaluate_retrieval
The evaluation generates a report at:
evaluation/results/retrieval_eval.json
| Metric | Meaning |
|---|---|
| hit@k | Whether at least one relevant document appears in the top-k results |
| precision@k | Fraction of top-k retrieved documents that are relevant |
| recall@k | Fraction of relevant documents retrieved in the top-k results |
| MRR | Reciprocal rank of the first relevant retrieved document |
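The metrics in the table can be computed with a few lines of Python. This is a sketch of the textbook definitions, not necessarily identical to `evaluation/metrics.py`. In particular, dividing precision by `k` rather than by the number of results actually returned, and computing the reciprocal rank over the full ranked list, are assumptions. MRR is the mean of the per-query reciprocal ranks.

```python
def hit_at_k(retrieved, relevant, k):
    """1.0 if any relevant document is in the top-k, else 0.0."""
    return 1.0 if any(doc in relevant for doc in retrieved[:k]) else 0.0


def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k slots filled by relevant documents."""
    if k <= 0:
        return 0.0
    return sum(1 for doc in retrieved[:k] if doc in relevant) / k


def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant documents found in the top-k."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (retrieved, relevant) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


# Hypothetical document IDs for one query.
retrieved = ["faiss_notes", "graph_notes", "rag_notes"]
relevant = {"graph_notes", "embeddings_notes"}
print(hit_at_k(retrieved, relevant, 3))        # 1.0
print(precision_at_k(retrieved, relevant, 3))  # one relevant hit in three slots
print(recall_at_k(retrieved, relevant, 3))     # 0.5
print(reciprocal_rank(retrieved, relevant))    # 0.5
```

With a small golden-query set, a single query can move these averages noticeably, so the numbers are best read as a regression signal rather than an absolute quality score.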
Example custom run:
python -m evaluation.evaluate_retrieval \
--k 3 \
--base-k 8 \
--top-n 6 \
--expand-hops 1
Graph-RAG-Engine/
│
├── .github/
│   └── workflows/
│       └── ci.yml
│
├── backend/
│   ├── api.py
│   ├── llm.py
│   ├── rag.py
│   └── retriever.py
│
├── data/
│   ├── docs/
│   └── index/
│
├── env/
│   └── requirements.txt
│
├── evaluation/
│   ├── __init__.py
│   ├── golden_queries.json
│   ├── metrics.py
│   └── evaluate_retrieval.py
│
├── graph/
│   └── graph_store.py
│
├── ingest/
│   ├── ingest_docs.py
│   └── split.py
│
├── tests/
│   ├── test_graph_store.py
│   ├── test_llm.py
│   ├── test_project_integrity.py
│   ├── test_retrieval_evaluation_dataset.py
│   ├── test_retrieval_metrics.py
│   ├── test_retriever_api_contract.py
│   └── test_split.py
│
├── ui/
│   └── app.py
│
├── run_backend.py
├── README.md
└── LICENSE
Clone the repository:
git clone https://github.com/AmirhosseinHonardoust/Graph-RAG-Engine.git
cd Graph-RAG-Engine
Create and activate a virtual environment.
On Windows CMD:
python -m venv .venv
.venv\Scripts\activate
On Windows PowerShell:
python -m venv .venv
.venv\Scripts\Activate.ps1
On macOS/Linux:
python -m venv .venv
source .venv/bin/activate
Install the dependencies:
python -m pip install --upgrade pip
pip install -r env/requirements.txt
The first run may download the SentenceTransformer model, so an internet connection is required for initial setup.
Run ingestion from the project root:
python -m ingest.ingest_docs
This will:
- read Markdown files from data/docs/
- split each document into chunks
- extract lightweight concepts
- create sentence embeddings
- save a FAISS index
- build and save the graph
- write retrieval artifacts to data/index/
Generated index artifacts are saved in:
data/index/
Start the FastAPI backend:
uvicorn backend.api:app --reload --port 8000
Open the API docs:
http://localhost:8000/docs
Health check:
curl http://localhost:8000/health
Expected response:
{
"ok": true
}
In another terminal, run:
streamlit run ui/app.py
The UI will open at:
http://localhost:8501
By default, the UI calls the backend at:
http://localhost:8000
You can override this with:
export GRAPH_RAG_API_URL="http://localhost:8000"
On Windows PowerShell:
$env:GRAPH_RAG_API_URL="http://localhost:8000"
Ask a question:
curl -X POST http://localhost:8000/ask \
-H "Content-Type: application/json" \
-d '{"question": "What is FAISS?", "mode": "extractive"}'
Example response shape:
{
"answer": "...",
"answer_mode": "extractive",
"citations": [
{
"doc_title": "faiss_notes.md",
"url": "file:///.../data/docs/faiss_notes.md"
}
],
"paths": [
{
"chunk_id": "faiss_notes_chunk_0",
"doc_id": "faiss_notes",
"doc_title": "faiss_notes.md",
"concepts": ["faiss", "vector", "search"]
}
]
}
To use LLM mode, set an API key:
export GRAPH_RAG_LLM_API_KEY="your_api_key_here"
Or use:
export OPENAI_API_KEY="your_api_key_here"
Optional configuration:
export GRAPH_RAG_LLM_MODEL="gpt-4o-mini"
export GRAPH_RAG_LLM_BASE_URL="https://api.openai.com/v1"
export GRAPH_RAG_LLM_TEMPERATURE="0.2"
export GRAPH_RAG_LLM_MAX_TOKENS="500"
export GRAPH_RAG_LLM_TIMEOUT_SECONDS="30"
Then call:
curl -X POST http://localhost:8000/ask \
-H "Content-Type: application/json" \
-d '{"question": "How does FAISS relate to embeddings?", "mode": "llm"}'
Recommend related documents:
curl -X POST http://localhost:8000/recommend \
-H "Content-Type: application/json" \
-d '{"doc_id": "faiss_notes"}'
List documents:
curl http://localhost:8000/docs_list
Run the test suite:
python -m unittest discover -s tests -v
The tests check important project behavior, including:
- Python source compilation
- Chunking behavior
- Concept extraction
- Graph neighbor lookup
- Graph save/load behavior
- Saved index consistency
- API health endpoint contract
- Retrieval metric calculations
- Golden-query dataset validation
- Optional LLM prompt/config behavior
The project includes a GitHub Actions workflow at:
.github/workflows/ci.yml
CI runs on:
- push
- pull request
- manual dispatch
The workflow:
- installs dependencies from env/requirements.txt
- checks that source files compile
- runs the unit test suite
- tests against Python 3.10, 3.11, and 3.12
This project has important limitations.
The system:
- uses a small sample corpus
- uses simple keyword/frequency-based concept extraction
- stores the graph in memory with NetworkX
- does not use a production graph database by default
- uses extractive answers by default
- depends on an external API provider for optional LLM mode
- uses a small golden-query retrieval benchmark
- does not include full production monitoring or observability
- may retrieve weak or incomplete context for ambiguous questions
- should not be treated as a fully reliable knowledge system
The project is best understood as a clean, testable, and explainable Graph-RAG MVP.
- Index artifacts are loaded from local files, including pickle files.
- Only load artifacts generated by this project from trusted sources.
- Pickle files from untrusted sources can be unsafe.
- The FastAPI CORS configuration is permissive for local development.
- Restrict allowed origins before deploying publicly.
- Do not commit API keys.
- Use environment variables for LLM credentials.
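One way to act on the pickle warning is to record a SHA-256 digest when an artifact is built and to refuse to unpickle a file whose digest no longer matches. The sketch below is a hypothetical add-on, not something the project is stated to do, and the function names and the artifact file name are invented for the example. It detects files that were swapped or modified after the digest was recorded. It does not make pickle itself safe, so the digest must come from a trusted place.

```python
import hashlib
import hmac
import pickle
import tempfile
from pathlib import Path


def sha256_of(path):
    """Compute the SHA-256 hex digest of a file, reading it in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def load_trusted_pickle(path, expected_sha256):
    """Unpickle a file only if its digest matches the recorded one."""
    actual = sha256_of(path)
    if not hmac.compare_digest(actual, expected_sha256):
        raise ValueError(f"Digest mismatch for {path}; refusing to unpickle")
    with open(path, "rb") as f:
        return pickle.load(f)


# Demo with a temporary file standing in for an index artifact.
with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "chunks.pkl"  # hypothetical artifact name
    artifact.write_bytes(pickle.dumps({"chunk_0": "FAISS notes"}))
    recorded = sha256_of(artifact)       # would be stored at build time
    loaded = load_trusted_pickle(artifact, recorded)
    try:
        load_trusted_pickle(artifact, "0" * 64)
        rejected = False
    except ValueError:
        rejected = True

print(loaded, rejected)
```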
This project is intended for:
- RAG architecture practice
- retrieval evaluation practice
- graph-based retrieval experimentation
- AI engineering portfolio demonstration
- FastAPI and Streamlit application development
- explainable retrieval interface design
It should not be used for:
- high-stakes decision-making
- legal, medical, financial, or safety-critical advice
- fully automated research conclusions
- private document processing without additional security controls
- production deployment without monitoring, access control, and audit logging
Possible future improvements include:
- Add a larger sample corpus
- Improve concept extraction with KeyBERT, spaCy noun chunks, YAKE, or embedding-based clustering
- Add more retrieval evaluation questions
- Add answer-quality evaluation for optional LLM mode
- Add screenshot or demo GIF of the UI
- Add Docker support
- Add a Neo4j-backed graph store option
- Add user feedback collection
- Add feedback-based reranking
- Add source-grounded answer faithfulness checks
- Add observability and retrieval trace logging
- Python
- FastAPI
- Pydantic
- Uvicorn
- Streamlit
- Sentence Transformers
- FAISS
- NetworkX
- NumPy
- Markdown files
- Pickle artifacts
- unittest
- GitHub Actions
Amir Honardoust
GitHub: @AmirhosseinHonardoust
This project is released under the MIT License.