Repolect

Stop searching for code. Start repolecting it 😉.

Semantic code intelligence powered by LLM reasoning. Ask questions, trace execution flows, plan changes, analyze impact. All local-first, no vector database needed.

Python 3.10+ · License: MIT · MCP


Who is this for?

Repolect is built primarily for:

  • 🧠 Developers exploring new code: Quickly understand a project's architecture and logic without reading thousands of lines of code.
  • 🤖 AI Coding Agent Users: Supercharge agents (like Cursor, Claude Code) with precise structural context to improve edit performance and significantly reduce hallucinations.
  • 📊 Local-First Enthusiasts: Index, query, and beautifully visualize your codebase's dependencies entirely locally.
  • ⚡ SLM Power Users: Maximize the potential of locally hosted Small Language Models (via Ollama) to autonomously analyze, edit, and update your codebases.

Features

  • 🌳 Hierarchical Semantic Tree: Every node (module, file, class, function) gets a bottom-up LLM-generated summary. The abstract meaning of your codebase is indexed, not just the raw text.
  • 🎯 Vectorless Search: Navigate the semantic tree using LLM reasoning in O(log N) steps. Finds actual answers, saving huge amounts of tokens compared to blind similarity searches.
  • 🕸️ Knowledge Graph: Maps CALLS, IMPORTS, EXTENDS, and IMPLEMENTS relations across your codebase. Useful for tracing execution paths or finding the "blast radius" of a change.
  • 🔌 Full MCP Integration: Exposes 14 powerful tools to AI editors (Cursor, Claude Code, etc.) out of the box, drastically reducing token usage and round trips.
  • 🛡️ Prescriptive Agent Context: Generates "Agent Skills" based on functional groups (Louvain communities) in your code, injecting targeted context when and where it's needed.
  • 🔒 Local-First & SLM Optimized: Engineered to run perfectly on efficient local models like qwen3.5 or qwen2.5-coder via Ollama. No data leaves your machine unless you want it to.

repolect viz

Note: This is a graph representation of the codebase of Repolect itself.


How It Works

Repolect builds a hierarchical tree of your codebase where every node (module, file, class, function) gets an LLM-generated summary. Queries navigate this tree using LLM reasoning, finding relevant code in O(log N) steps without any vector similarity search.

RepoNode: "E-commerce backend in Python/FastAPI..."
├── ModuleNode src/auth: "JWT-based authentication layer..."
│   ├── FileNode jwt.py: "Token generation and validation..."
│   │   ├── ClassNode JWTService: "Manages token lifecycle..."
│   │   └── FunctionNode verify_token: "Validates Bearer tokens..."
│   └── DocNode README.md: "Auth module documentation..."
└── ModuleNode src/payments: "Stripe payment processing..."

A knowledge graph runs alongside the tree, storing structural relations (CALLS, IMPORTS, EXTENDS, IMPLEMENTS) that power dependency analysis, impact tracing, and execution flow tracking.
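
To make the "blast radius" idea concrete, here is a minimal, self-contained sketch. The edge data and helper name are hypothetical; Repolect's real traversal runs over its NetworkX/FalkorDB graph:

```python
from collections import deque

# Toy knowledge graph: symbol -> list of (relation, target) edges,
# shaped like Repolect's CALLS / IMPORTS relations.
EDGES = {
    "api.checkout": [("CALLS", "payments.charge")],
    "payments.charge": [("CALLS", "stripe_adapter.create_intent")],
    "payments": [("IMPORTS", "stripe_adapter")],
}

def blast_radius(symbol, max_hops=3):
    """Symbols that transitively depend on `symbol` within max_hops."""
    # Reverse the edges: we want the callers/importers of the symbol.
    reverse = {}
    for src, edges in EDGES.items():
        for _rel, dst in edges:
            reverse.setdefault(dst, []).append(src)
    seen, queue = set(), deque([(symbol, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops >= max_hops:
            continue
        for caller in reverse.get(node, []):
            if caller not in seen:
                seen.add(caller)
                queue.append((caller, hops + 1))
    return seen

print(sorted(blast_radius("stripe_adapter.create_intent")))
# ['api.checkout', 'payments.charge']
```

Changing `create_intent` would ripple up through its direct caller and that caller's caller, which is exactly what `repolect impact` and the `impact_analysis` tool report.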

Architecture

flowchart LR
    subgraph indexing [Indexing Pipeline]
        Scan[Scan Repo] --> Parse[Parse Files]
        Parse --> Summarize[LLM Summarize]
        Summarize --> Graph[Build Graph]
    end
 
    subgraph storage [Dual Storage]
        Tree["tree.json\n(semantic tree)"]
        GraphDB["graph.pkl / graph.db\n(knowledge graph)"]
    end
 
    subgraph query [Query Layer]
        CLI[CLI Commands]
        MCP[MCP Server]
    end
 
    Graph --> Tree
    Graph --> GraphDB
    Tree --> CLI
    Tree --> MCP
    GraphDB --> CLI
    GraphDB --> MCP

Quick Start

Recommended: One-liner Installer

The interactive installer sets up Ollama, configures your LLM provider, and makes repolect available system-wide:

curl -fsSL https://raw.githubusercontent.com/Bibyutatsu/Repolect/main/install.sh | bash

The installer uses pipx (isolated environment, no dependency conflicts) with a pip --user fallback. It automatically updates your shell PATH via a Conda-style marker block in .zshrc/.bashrc.
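
The marker block appended to your rc file might look roughly like this (the marker text here is illustrative, not the installer's exact output):

```shell
# >>> repolect initialize >>>
# Lines between these markers are managed by the repolect installer.
export PATH="$HOME/.local/bin:$PATH"
# <<< repolect initialize <<<
```

On re-runs the installer can locate the paired markers and rewrite only the lines between them, leaving the rest of your .zshrc/.bashrc untouched.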

Install via pipx (recommended for CLI tools)

pipx install repolect
pipx inject repolect ollama          # for Ollama support
pipx inject repolect falkordblite    # for FalkorDB graph backend

Install from PyPI

pip install repolect[all]

Install from source

git clone https://github.com/Bibyutatsu/Repolect.git
cd Repolect
pip install -e ".[all]"

Index and query

cd your-project/
repolect analyze          # Index the codebase
repolect ask "how does authentication work?"

Requires an LLM provider. Repolect defaults to Ollama (local, free, private). See Configuration for other providers.


CLI Reference

| Command | Description | Key Flags |
|---|---|---|
| `repolect analyze` | Full index: semantic tree + knowledge graph + agent skills | `--force`, `--all-branches`, `--skills`, `--graph-backend`, `--parse-workers`, `--num-workers`, `--no-git`, `--quiet` |
| `repolect sync` | Incremental re-index (changed files only) | `--parse-workers`, `--num-workers`, `--quiet`, `--no-cache` |
| `repolect ask "query"` | Natural-language Q&A with citations | `--max-results`, `--quiet` |
| `repolect why <path>` | Explain why a file or symbol exists | `--repo` |
| `repolect tree` | Print the semantic tree | `--depth` (default 3) |
| `repolect graph "MATCH ..."` | Run Cypher queries on the knowledge graph | `--repo` |
| `repolect impact <symbol>` | Blast radius analysis | `--max-hops` (default 3) |
| `repolect diff` | Map git changes to affected symbols | `--ref` (default HEAD~1), `--with-impact` |
| `repolect communities` | Show functional clusters (Louvain) | `--repo` |
| `repolect list` | List all indexed repositories | — |
| `repolect mcp` | Configure editors + start MCP server | `--serve` (skip menu, start server directly), `--scope global\|project` |
| `repolect viz` | Launch Streamlit graph explorer | `--port` (default 8501) |

MCP Server Integration

The Model Context Protocol (MCP) lets AI editors use Repolect as a live code intelligence backend.

Auto-configure with repolect mcp

Running repolect mcp opens an interactive setup flow:

  1. Displays the config snippet you can copy into any editor manually
  2. Detects installed editors (Cursor, Claude Code, Antigravity, Windsurf, VS Code)
  3. Asks which to configure (select by number or press a for all)
  4. Writes/merges the correct JSON config into each editor automatically
$ repolect mcp

  🔌 Repolect MCP Server
  ────────────────────────────────────────────────────────

  Add this to your editor's MCP config file:

    {
      "mcpServers": {
        "repolect": {
          "command": "/usr/local/bin/repolect",
          "args": ["mcp", "--serve"]
        }
      }
    }

  Binary resolved to: /usr/local/bin/repolect

  ────────────────────────────────────────────────────────
  Detected editors:  [1] Cursor  ,  [2] Antigravity (Gemini)

  Enter numbers to auto-configure (e.g. 1,3), 'a' for all, or Enter to skip:
  → a

  Cursor  →  ~/.cursor/mcp.json  [✓ written]
  Antigravity (Gemini)  →  ~/.gemini/mcp.json  [✓ written]

  ✅ Done! Restart your editor for changes to take effect.

Manual config (all editors use the same format)

{
  "mcpServers": {
    "repolect": {
      "command": "repolect",
      "args": ["mcp", "--serve"]
    }
  }
}
| Editor | Config file |
|---|---|
| Cursor (global) | `~/.cursor/mcp.json` |
| Cursor (project) | `.cursor/mcp.json` |
| Claude Code (global) | `~/.claude.json` → `mcpServers` |
| Claude Code (project) | `.mcp.json` |
| Antigravity / Gemini | `~/.gemini/mcp.json` |
| Windsurf | `~/.codeium/windsurf/mcp_config.json` |
| VS Code (Copilot) | `~/.vscode/mcp.json` → `servers` |

--serve flag: Use args: ["mcp", "--serve"] in your mcp.json. This skips the interactive menu and starts the stdio server directly, which is what editors need.

MCP Tools

14 tools exposed via MCP:

| Tool | What It Does |
|---|---|
| `tree_search` | Semantic search: answers "how does X work?" using LLM tree reasoning |
| `get_node` | 360-degree symbol view: source code, callers, callees, relations |
| `explain_node` | LLM-powered explanation of why a symbol exists in the codebase |
| `trace_flow` | Follow CALLS edges from an entry point to build an execution flow |
| `graph_query` | Run raw Cypher queries against the knowledge graph |
| `impact_analysis` | Blast radius: what breaks if you change a given symbol |
| `diff_analysis` | Map git diff to affected symbols + downstream blast radius |
| `plan_change` | Structured change plan: ADD / MODIFY / READ_ONLY / TEST_AFTER |
| `find_similar` | Find an existing implementation to use as a template |
| `get_conventions` | Extract coding conventions from a module's neighborhood |
| `scope_test` | Find the minimal test set for modified nodes (MUST / SHOULD tiers) |
| `rename` | Multi-file rename plan with graph + text search, confidence tagging |
| `repo_summary` | Top-level codebase overview with stats and module descriptions |
| `list_repos` | Discover all indexed repositories |

Resources & Prompts

| Resource | Description |
|---|---|
| `repolect://tree` | Full semantic tree as JSON |
| `repolect://summary` | Top-level codebase overview |

| Prompt | Description |
|---|---|
| `code_search_guide` | Guided workflow: summary → search → node → trace |
| `explain_codebase` | Generate a codebase explanation from the tree |

Agent Skills & Context

Repolect influences AI agent behavior through three layers:

Layer 1: MCP Tools (what the agent can do)

The 14 tools listed above: plan_change, tree_search, impact_analysis, etc.

Layer 2: Prescriptive Context File (what the agent should do)

repolect analyze generates REPOLECT.md at the repo root with:

  • "Always Do" rules: call plan_change before changes, find_similar before creating, get_conventions before modifying, diff_analysis before committing, scope_test after changes
  • "Never Do" rules: never skip impact analysis on widely-used symbols, never commit without diff_analysis
  • Debugging and Refactoring workflows: step-by-step tool chains
  • Community map: Louvain-detected functional areas with key symbols
  • Marker-based upsert: re-indexing replaces only the Repolect section, preserving any user-written content
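
The marker-based upsert amounts to a replace-between-markers operation. A minimal sketch, with hypothetical marker strings and function name (the real REPOLECT.md uses its own delimiters):

```python
import re

# Hypothetical markers delimiting the managed section.
BEGIN = "<!-- REPOLECT:BEGIN -->"
END = "<!-- REPOLECT:END -->"

def upsert_section(existing: str, generated: str) -> str:
    """Replace only the marked Repolect section, or append one if absent."""
    block = f"{BEGIN}\n{generated}\n{END}"
    if BEGIN in existing and END in existing:
        pattern = re.escape(BEGIN) + r".*?" + re.escape(END)
        # A lambda replacement avoids backslash-escape handling in `block`.
        return re.sub(pattern, lambda _: block, existing, flags=re.DOTALL)
    return existing.rstrip() + "\n\n" + block + "\n"

doc = "# My notes\n\n<!-- REPOLECT:BEGIN -->\nold rules\n<!-- REPOLECT:END -->\n"
updated = upsert_section(doc, "new rules")
print("# My notes" in updated, "old rules" in updated)  # True False
```

User-written content outside the markers survives every re-index; only the generated section is rewritten.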

Layer 3: Workflow Skills (what the agent does in specific situations)

Static skills (installed every repolect analyze):

| Skill | Trigger |
|---|---|
| `repolect-exploring` | Navigating unfamiliar code, "how does X work?" |
| `repolect-planning` | Before implementing any feature or change |
| `repolect-debugging` | Tracing bugs, investigating errors |
| `repolect-refactoring` | Renaming, extracting, restructuring |
| `repolect-reviewing` | Pre-commit safety checks, code review |

Generated community skills (repolect analyze --skills):

Per-community skill files describing each functional area of the codebase: key files, entry points, cross-community connections, associated tests, and LLM-synthesized descriptions of what each area does.

Skills are auto-installed into detected editors:

  • Cursor: .cursor/rules/repolect-*.mdc
  • Claude Code: .claude/skills/repolect/*.md

Configuration

Repolect reads from ~/.repolect/config.yaml:

# LLM Provider
provider: ollama                    # or "openai-compatible"
base_url: http://localhost:11434    # or your API endpoint
model_name: qwen3.5:4b              # your preferred model
api_key: ""                         # empty for Ollama

# Embeddings (optional; enables hybrid vector+tree search)
embedding_provider: ollama
embedding_model: qwen3-embedding:0.6b

Using an OpenAI-compatible API

provider: openai-compatible
base_url: https://api.openai.com/v1
model_name: gpt-4o-mini
api_key: sk-...

embedding_provider: openai-compatible
embedding_model: text-embedding-3-small
embedding_api_key: sk-...

Environment variables override config: REPOLECT_PROVIDER, REPOLECT_BASE_URL, REPOLECT_MODEL, REPOLECT_API_KEY, REPOLECT_EMBEDDINGS (1/0).
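
The precedence (environment variable wins, config value as fallback) amounts to something like the following; the helper name is illustrative, not Repolect's actual API:

```python
import os

def resolve_setting(name: str, config: dict, env_prefix: str = "REPOLECT_"):
    """Environment variables override values loaded from config.yaml."""
    env_value = os.environ.get(env_prefix + name.upper())
    return env_value if env_value is not None else config.get(name)

config = {"provider": "ollama", "model": "qwen3.5:4b"}
os.environ["REPOLECT_PROVIDER"] = "openai-compatible"
print(resolve_setting("provider", config))  # openai-compatible (env wins)
print(resolve_setting("model", config))     # qwen3.5:4b (config fallback)
```

This makes it easy to point a single shell session at a different provider without editing ~/.repolect/config.yaml.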


Why Vectorless?

Vector similarity finds files that are similar to your query, not files that answer it.

"How does payment work?" doesn't semantically resemble stripe_adapter.py. LLM reasoning over a structured tree does.

Repolect's tree search operates in O(log N) LLM calls: probe the root, pick the most relevant branch, descend until you reach the answer. Every node has a pre-computed summary, so the LLM reasons about meaning, not similarity.
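
The descent loop is essentially the following minimal sketch, where `pick_child` stands in for an LLM call that chooses the most relevant branch (the toy word-overlap chooser below is purely illustrative):

```python
def tree_search(root, query, pick_child):
    """Descend from the root, one decision per level: O(depth) calls."""
    node, hops = root, 0
    while node.get("children"):
        node = pick_child(query, node["children"])
        hops += 1
    return node, hops

tree = {
    "summary": "repo",
    "children": [
        {"summary": "auth module", "children": [
            {"summary": "verify_token: validates Bearer tokens", "children": []},
        ]},
        {"summary": "payments module", "children": []},
    ],
}

# Toy chooser: pick the child whose summary shares the most words with the query.
def pick_child(query, children):
    words = set(query.lower().split())
    return max(children, key=lambda c: len(words & set(c["summary"].lower().split())))

node, hops = tree_search(tree, "auth token validation", pick_child)
print(hops)  # 2
```

Two decisions reach the leaf here; in a real codebase the number of LLM calls grows with tree depth, not with the number of files.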

Embeddings are optional; enable them for hybrid search when you want both approaches.


MCP Performance Analysis

Benchmarked across 8 complex real-world coding scenarios on Repolect's own codebase (807 nodes, 28 files).

Summary

| Metric | Without MCP Tools | With MCP Tools | Improvement |
|---|---|---|---|
| Input tokens | 330,363 | 10,964 | 97% reduction |
| Tool calls | 87 | 17 | 5.1x fewer |
| Round trips | 34 | 9 | 3.8x fewer |
| Tokens saved | — | — | 319,399 |
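
The headline numbers are internally consistent, as a quick arithmetic check shows:

```python
without, with_mcp = 330_363, 10_964

print(without - with_mcp)                      # 319399 tokens saved
print(round((1 - with_mcp / without) * 100))   # 97 (% reduction)
print(round(87 / 17, 1), round(34 / 9, 1))     # 5.1 3.8 (x fewer)
```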

Tool Tier Ranking

Tier 1 – Transformative (use on every task):

| Tool | Value |
|---|---|
| `plan_change` | Replaces 15+ calls with 1 structured roadmap |
| `tree_search` | Answers "how does X work?" without reading any file |
| `trace_flow` | 82-node call graph impossible to build manually |
| `diff_analysis` | Pre-commit safety net in 1 call vs 14+ |

Tier 2 – High Value (use frequently):

| Tool | Value |
|---|---|
| `find_similar` | Template + copy/replace/match advice |
| `impact_analysis` | Multi-hop blast radius with test tagging |
| `rename` | Graph + text confidence tagging |
| `scope_test` | Specific test names with MUST/SHOULD tiers |
| `get_node` | 360-degree symbol view replaces 4+ calls |

Tier 3 – Useful (for specific tasks):

| Tool | Value |
|---|---|
| `get_conventions` | 8 convention categories from neighboring code |
| `graph_query` | Structural questions impossible without a graph |
| `explain_node` | LLM-powered context for unfamiliar symbols |
| `repo_summary` | Quick orientation for first interaction |

In Practice

For a typical coding session with 5–10 tasks, Repolect MCP tools save approximately:

  • ~150,000–300,000 input tokens (~$0.45–$0.90 per session at $0.003/1K tokens)
  • 30–50 tool calls reduced to 8–15
  • 15–25 round trips reduced to 5–8 (each round trip = 2–5 seconds of latency)
  • 30–120 seconds of latency eliminated from fewer round trips

Project Structure

repolect/
├── __init__.py          # Package exports and version
├── cli.py               # Click CLI commands (analyze, ask, sync, mcp, ...)
├── config.py            # Config loading (~/.repolect/config.yaml)
├── embedder.py          # Optional vector embeddings (Ollama, OpenAI)
├── git_utils.py         # Git operations (branch, diff, hash, etc.)
├── graph_db.py          # Knowledge graph (NetworkX + FalkorDB backends)
├── mcp_server.py        # MCP server with 14 tools, 2 resources, 2 prompts
├── models.py            # Core data models (CodeNode, Relation, TreeMeta)
├── parser.py            # Hybrid parser (tree-sitter + regex enhancer)
├── search.py            # Tree search, explanation, flow tracing
├── skill_installer.py   # Agent skill installer (static + generated community skills)
├── skills/              # Static workflow skills (exploring, planning, debugging, ...)
├── storage.py           # Persistence (tree.json, meta.json, REPOLECT.md)
├── summarizer.py        # Bottom-up LLM summarization pipeline
└── tree_builder.py      # Indexing orchestrator (scan → parse → link → graph)

License

MIT
