🤖 CodeoGraph

AI-powered codebase understanding agent. Index any GitHub repo and chat with it, analyze change impact, review PRs, and visualize architecture - all grounded in your actual source code.

Features

Feature	Description
AST Chunking	Tree-sitter extracts functions and classes as semantic chunks rather than fixed-size token windows
Hybrid Retrieval	BM25 + vector search fused (0.6/0.4 weighting) for precision and recall
RAG Chat	Streaming Gemini responses grounded in retrieved code context
Symbol Search	"Go to definition" — find any function or class instantly
Change Impact	NetworkX reverse-BFS finds all dependents; Gemini rates each as Low/Medium/High risk
PR Review	Paste a git diff → structured review with risks, tests, and risk score
Architecture Diagram	Auto-generated Mermaid diagram from static import analysis
Multi-repo	Pinecone namespaces isolate repos; switch between them in the sidebar
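
To make the hybrid retrieval row more concrete, here is a minimal sketch of weighted score fusion. It is not taken from retrieval.py: the function names, the min-max normalization step, and the choice of which side gets 0.6 versus 0.4 are all assumptions made for illustration (the table only states the 0.6/0.4 split).

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale scores into [0, 1] so BM25 and cosine scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All candidates scored the same; treat them as equally relevant.
        return {chunk_id: 1.0 for chunk_id in scores}
    return {chunk_id: (s - lo) / (hi - lo) for chunk_id, s in scores.items()}


def fuse_scores(
    vector_scores: Dict[str, float],
    bm25_scores: Dict[str, float],
    vector_weight: float = 0.6,  # assumption: vector search gets the larger weight
    bm25_weight: float = 0.4,
) -> List[Tuple[str, float]]:
    """Return (chunk_id, fused_score) pairs, best first."""
    v = min_max_normalize(vector_scores)
    b = min_max_normalize(bm25_scores)
    fused = {
        chunk_id: vector_weight * v.get(chunk_id, 0.0) + bm25_weight * b.get(chunk_id, 0.0)
        for chunk_id in set(v) | set(b)
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical chunk ids and raw scores from the two retrievers.
ranked = fuse_scores({"a": 0.9, "b": 0.5}, {"b": 10.0, "c": 2.0})
```

A chunk found by only one retriever contributes zero from the other side, so it can still rank but is penalized relative to chunks both retrievers agree on.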

Watch Demo

CodeoGraphforGit.mp4

Please watch the demo video on YouTube 👉🏼👈🏼

Setup

1. Install dependencies

pip install -r requirements.txt

2. Configure API keys

Copy .env to .env.local and fill in your keys:

cp .env .env.local

GOOGLE_API_KEY=...       # Google AI Studio — free tier works
PINECONE_API_KEY=...     # Pinecone serverless — free tier works
PINECONE_INDEX_NAME=codeograph

The Pinecone index is created automatically on first run (dimension 3072, cosine metric).
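
Before starting the backend, it can help to check that the three settings above are actually present. The sketch below uses only the standard library; `load_settings` and `REQUIRED_KEYS` are hypothetical names, and how the project itself loads these values is not shown here.

```python
import os
from typing import Dict, Mapping

# The three variable names listed in the configuration step.
REQUIRED_KEYS = ("GOOGLE_API_KEY", "PINECONE_API_KEY", "PINECONE_INDEX_NAME")


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the required settings, or raise if any is missing or empty."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}
```

Passing a plain dict instead of `os.environ` makes the check easy to exercise without touching the real environment.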

3. Start the backend

cd backend
uvicorn main:app --reload --port 8000

4. Start the frontend

cd frontend
streamlit run app.py --server.port 8501

Open http://localhost:8501

Architecture

frontend/app.py (Streamlit)
        │  HTTP / SSE
        ▼
backend/main.py (FastAPI)
    ├── ingestion.py   → GitPython clone → Tree-sitter parse → Google Embed → Pinecone upsert
    ├── retrieval.py   → Pinecone vector search + BM25 → score fusion
    ├── graph.py       → NetworkX reverse-BFS → Gemini risk assessment
    ├── pr_review.py   → Diff parser → context retrieval → Gemini review
    └── diagram.py     → Graph → Mermaid syntax
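
The graph.py and diagram.py steps above can be sketched roughly as follows. The project uses a NetworkX DiGraph; this illustration swaps in plain dictionaries and a deque so it runs with the standard library alone. The sample import graph and the function names are hypothetical, and the Gemini risk rating is left out.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical import graph: each key imports the modules in its list.
IMPORTS: Dict[str, List[str]] = {
    "main.py": ["retrieval.py", "graph.py"],
    "retrieval.py": ["utils.py"],
    "graph.py": ["utils.py"],
    "utils.py": [],
}


def reverse_edges(imports: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each module to the set of modules that import it."""
    dependents: Dict[str, Set[str]] = {module: set() for module in imports}
    for importer, targets in imports.items():
        for target in targets:
            dependents.setdefault(target, set()).add(importer)
    return dependents


def find_dependents(imports: Dict[str, List[str]], changed: str) -> Dict[str, int]:
    """Breadth-first search over reversed edges; returns hop distance per dependent."""
    dependents = reverse_edges(imports)
    distance = {changed: 0}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for importer in sorted(dependents.get(current, ())):
            if importer not in distance:
                distance[importer] = distance[current] + 1
                queue.append(importer)
    del distance[changed]  # the changed module is not its own dependent
    return distance


def to_mermaid(imports: Dict[str, List[str]]) -> str:
    """Emit a Mermaid flowchart with one arrow per import edge."""
    nodes = set(imports) | {t for targets in imports.values() for t in targets}
    ids = {module: f"n{i}" for i, module in enumerate(sorted(nodes))}
    lines = ["graph TD"]
    for importer in sorted(imports):
        for target in sorted(imports[importer]):
            lines.append(f'    {ids[importer]}["{importer}"] --> {ids[target]}["{target}"]')
    return "\n".join(lines)


impacted = find_dependents(IMPORTS, "utils.py")
diagram = to_mermaid(IMPORTS)
```

The hop distance is one plausible input for a risk rating (direct importers first), though how the project weighs it is not stated.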

Tech Stack

LLM: Gemini 2.5 Flash (streaming)
Embeddings: Google models/gemini-embedding-2 (3072d)
Vector DB: Pinecone serverless (cosine)
Code Parsing: Tree-sitter (Python + JavaScript grammars)
Keyword Search: BM25Okapi via rank-bm25
Dependency Graph: NetworkX DiGraph
Repo Ingestion: GitPython
Backend: FastAPI + uvicorn
Frontend: Streamlit
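
As a rough illustration of the function/class chunking that the Code Parsing entry refers to, the sketch below uses Python's built-in `ast` module instead of Tree-sitter. That is a deliberate substitution to keep the example dependency-free; it only handles Python source, and the chunk fields and function name are assumptions rather than the project's schema.

```python
import ast
from typing import Dict, List


def chunk_python_source(source: str, path: str = "<memory>") -> List[Dict[str, object]]:
    """Split Python source into one chunk per function or class definition."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks: List[Dict[str, object]] = []
    # ast.walk also visits nested definitions, so methods get their own
    # chunk in addition to the chunk for the enclosing class.
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            start, end = node.lineno, node.end_lineno  # end_lineno needs Python 3.8+
            chunks.append({
                "path": path,
                "name": node.name,
                "kind": "class" if isinstance(node, ast.ClassDef) else "function",
                "start_line": start,
                "end_line": end,
                "text": "\n".join(lines[start - 1:end]),
            })
    chunks.sort(key=lambda chunk: chunk["start_line"])
    return chunks


SAMPLE = (
    "def add(a, b):\n"
    "    return a + b\n"
    "\n"
    "class Greeter:\n"
    "    def hi(self):\n"
    "        return 'hi'\n"
)
chunks = chunk_python_source(SAMPLE, path="sample.py")
```

Each chunk carries its name and line range, which is the kind of metadata a symbol search or "go to definition" lookup could be built on.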

Notes

Server restart: Restarting the server clears the BM25 corpus and dependency graphs from memory, while Pinecone data persists. Queries still work after a restart (vector search only); re-indexing restores BM25 and the graphs.
Rate limits: The indexer sleeps 1 s between embedding batches. For large repos (500+ files), expect indexing to take 5–15 minutes.
Pinecone index dimension: If the existing index was created with a different embedding model, delete it or set a new PINECONE_INDEX_NAME before re-indexing.
Token safety: Prompts are capped at 6000 context tokens before sending to Gemini.
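
The rate-limit and token-safety notes above can be sketched as two small helpers. This is an illustration under stated assumptions, not the project's code: `embed_batch` stands in for the real embedding call, the batch size of 50 is arbitrary, and token counting is approximated by whitespace splitting because the actual tokenizer is not specified.

```python
import time
from typing import Callable, List, Sequence


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[Sequence[str]], List[List[float]]],  # hypothetical embedding call
    batch_size: int = 50,          # assumption: batch size is not given in the notes
    pause_seconds: float = 1.0,    # the 1 s sleep between calls
    sleep: Callable[[float], None] = time.sleep,
) -> List[List[float]]:
    """Embed texts batch by batch, pausing between consecutive calls."""
    vectors: List[List[float]] = []
    starts = range(0, len(texts), batch_size)
    for i, start in enumerate(starts):
        vectors.extend(embed_batch(texts[start:start + batch_size]))
        if i < len(starts) - 1:
            sleep(pause_seconds)  # no pause needed after the final batch
    return vectors


def cap_context(
    chunks: Sequence[str],
    max_tokens: int = 6000,
    count_tokens: Callable[[str], int] = lambda text: len(text.split()),  # crude estimate
) -> List[str]:
    """Keep chunks in ranked order until the token budget would be exceeded."""
    kept: List[str] = []
    used = 0
    for chunk in chunks:
        cost = count_tokens(chunk)
        if used + cost > max_tokens:
            break
        kept.append(chunk)
        used += cost
    return kept
```

Injecting `sleep` and `count_tokens` as parameters keeps both helpers testable without waiting on real delays or a real tokenizer.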
