CodeLore

CodeLore is an AI agent that serves as the institutional memory of a codebase. It solves the problem every engineering team faces: critical design context is trapped in closed pull requests, old Slack threads, and stale documentation. New developers waste weeks understanding why code exists. CodeLore turns that scattered history into searchable, AI-powered institutional memory.

Built on Elastic Agent Builder and Elastic Cloud Serverless, CodeLore ingests git commits with diffs, pull request discussions with reviews and comments, and architecture docs from any GitHub repository connected via OAuth. All text is embedded with Sentence Transformers (all-MiniLM-L6-v2) and stored as 384-dimensional dense vectors with int8_hnsw quantization across 5 Elasticsearch indices.
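The storage side of this can be sketched as an index mapping. This is a minimal sketch, not CodeLore's actual mapping: the field names (repo, message, embedding), the cosine similarity, and the choice to exclude the vector from _source are assumptions. The 384 dimensions and int8_hnsw come from the description above.

```python
import json

# Output size of all-MiniLM-L6-v2, as stated in the description.
EMBEDDING_DIMS = 384


def build_commit_mapping() -> dict:
    """Return an index-creation body with a quantized dense_vector field.

    Field names are hypothetical; only dims and int8_hnsw come from the writeup.
    """
    return {
        "mappings": {
            # Keep the large vector out of stored _source (assumed field choice).
            "_source": {"excludes": ["embedding"]},
            "properties": {
                "repo": {"type": "keyword"},
                "message": {"type": "text"},
                "embedding": {
                    "type": "dense_vector",
                    "dims": EMBEDDING_DIMS,
                    "index": True,
                    "similarity": "cosine",  # assumption, not stated in the writeup
                    "index_options": {"type": "int8_hnsw"},
                },
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_commit_mapping(), indent=2))
```

The resulting dictionary is the kind of body that would be sent when creating an index; each of the five indices would presumably carry a similar embedding field.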

The agent operates in three modes.

Ask Mode lets developers ask natural-language questions. The agent searches across all indices using 6 custom ES|QL tools, cross-references commits with PR discussions, and returns cited answers.
Onboard Mode generates multi-step guided learning paths using multi-turn Agent Builder conversations.
Explore Mode provides a 5-tab code archaeology dashboard: File Timeline (msearch-batched queries across 4 indices), Decision Browser, Semantic Search (multi-index kNN), Expert Finder (aggregations with on-call scoring), and Impact Analysis (risk assessment with co-change coupling).
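To illustrate the msearch batching behind the File Timeline tab, here is a rough sketch that builds one _msearch body covering four indices in a single round trip. The index names, the file_path and timestamp fields, and the page size are all assumptions made for the example.

```python
import json
from typing import List

# Hypothetical index names; the writeup only says four indices feed the timeline.
TIMELINE_INDICES: List[str] = ["commits", "pull_requests", "reviews", "docs"]


def build_timeline_msearch(repo: str, file_path: str, size: int = 20) -> str:
    """Build an NDJSON _msearch body: one header line plus one query line per index."""
    lines = []
    for index in TIMELINE_INDICES:
        header = {"index": index}
        body = {
            "size": size,
            "query": {
                "bool": {
                    "filter": [
                        {"term": {"repo": repo}},            # assumed keyword field
                        {"term": {"file_path": file_path}},  # assumed keyword field
                    ]
                }
            },
            "sort": [{"timestamp": {"order": "desc"}}],      # assumed date field
        }
        lines.append(json.dumps(header))
        lines.append(json.dumps(body))
    # The msearch body is newline-delimited JSON and must end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_timeline_msearch("octo/demo", "src/app.py"))
```

The responses come back in the same order as the requests, so the four result sets can be merged by timestamp into one timeline.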

Elasticsearch features used:

kNN vector search with int8_hnsw quantization, multi-index kNN queries, msearch API for batched timeline construction, terms/cardinality/date_histogram aggregations for expert and impact analysis, delete_by_query for repo data isolation, force merge for post-ingestion optimization, and _source excludes at mapping level.
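As an example of how the aggregations could combine for Expert Finder, the sketch below nests cardinality and date_histogram under a terms aggregation on author. It is only an illustration: the author, file_path, and timestamp fields are assumed names, and the on-call scoring mentioned above is not modeled here.

```python
def build_expert_aggs(repo: str, path_prefix: str, top_n: int = 10) -> dict:
    """Build a search body that ranks authors by activity under a path prefix.

    Field names (repo, author, file_path, timestamp) are hypothetical.
    """
    return {
        "size": 0,  # only aggregation results are needed, not hits
        "query": {
            "bool": {
                "filter": [
                    {"term": {"repo": repo}},
                    {"prefix": {"file_path": path_prefix}},
                ]
            }
        },
        "aggs": {
            "by_author": {
                "terms": {"field": "author", "size": top_n},
                "aggs": {
                    # How many distinct files each author has touched.
                    "files_touched": {"cardinality": {"field": "file_path"}},
                    # How that author's activity is spread over time.
                    "activity": {
                        "date_histogram": {
                            "field": "timestamp",
                            "calendar_interval": "month",
                        }
                    },
                },
            }
        },
    }


if __name__ == "__main__":
    import json
    print(json.dumps(build_expert_aggs("octo/demo", "src/payments/"), indent=2))
```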

Agent Builder features used:

6 custom ES|QL tools with parameterized match() queries, platform tools for dynamic queries, multi-turn conversations for stateful onboarding, and repo-scoped agent instructions.
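A parameterized match() query might look roughly like the sketch below. It builds a request body in the shape the ES|QL query API accepts, with named ?parameters supplied separately from the query text. The index name, field names, and parameter names are assumptions, and the exact tool definition used in Agent Builder is not shown in this writeup.

```python
from typing import Any, Dict


def build_esql_request(repo: str, question: str) -> Dict[str, Any]:
    """Build an ES|QL request that full-text searches commit messages in one repo.

    The commits index and the repo/message/sha/author fields are hypothetical.
    """
    query = (
        "FROM commits "
        "| WHERE repo == ?repo AND match(message, ?question) "
        "| KEEP sha, author, message "
        "| LIMIT 10"
    )
    return {
        "query": query,
        # Named parameters are passed as a list of single-key objects, so user
        # input is never spliced into the query string itself.
        "params": [{"repo": repo}, {"question": question}],
    }


if __name__ == "__main__":
    import json
    print(json.dumps(build_esql_request("octo/demo", "why was retry logic added"), indent=2))
```

Keeping the repo as a bound parameter rather than free text is one way to make every tool call repo-scoped.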

Features we liked:

The ES|QL tool type made creating parameterized search tools trivial. Platform tools like generate_esql let the agent write dynamic queries for questions pre-built tools don't cover. kNN + filter gives semantic understanding and strict repo isolation in a single query.
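The kNN + filter combination can be sketched as a single search body. This is a sketch under assumptions: the embedding and repo field names, and the k and num_candidates defaults, are illustrative, and the query vector would come from the same all-MiniLM-L6-v2 model used at ingestion.

```python
from typing import Any, Dict, Sequence

EMBEDDING_DIMS = 384  # all-MiniLM-L6-v2 output size


def build_knn_search(
    query_vector: Sequence[float],
    repo: str,
    k: int = 10,
    num_candidates: int = 100,
) -> Dict[str, Any]:
    """Build a kNN search body restricted to a single repo.

    Field names (embedding, repo) are hypothetical.
    """
    if len(query_vector) != EMBEDDING_DIMS:
        raise ValueError(f"expected {EMBEDDING_DIMS} dims, got {len(query_vector)}")
    return {
        "knn": {
            "field": "embedding",
            "query_vector": list(query_vector),
            "k": k,
            "num_candidates": num_candidates,
            # The filter is applied during the vector search, so only documents
            # from this repo are eligible to be returned as neighbors.
            "filter": {"term": {"repo": repo}},
        },
        "_source": {"excludes": ["embedding"]},
    }


if __name__ == "__main__":
    body = build_knn_search([0.0] * EMBEDDING_DIMS, "octo/demo")
    print(sorted(body["knn"].keys()))
```

Sending the same body to a comma-separated list of indices gives the multi-index variant used by Semantic Search.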

Challenges:

Discovering that match() is the correct ES|QL full-text function on serverless took significant debugging.
Cross-repo data leakage caused by in-memory auth state was fixed by passing repo context from the frontend.
Reaching acceptable response times on serverless required multi-index kNN, msearch batching, int8_hnsw quantization, and multi-layer caching.
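One caching layer could be as simple as an in-memory cache with a time-to-live, sketched below. This is not CodeLore's actual cache: the class name, the TTL value, and the key shape are assumptions. Including the repo in every key is one way to keep cached results from crossing repos.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """A tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        """Return the cached value for key, recomputing it if missing or expired."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None:
            expires_at, value = entry
            if now < expires_at:
                return value
        value = compute()
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate_repo(self, repo: str) -> None:
        """Drop every entry whose key is a tuple starting with the given repo."""
        stale = [k for k in self._entries if isinstance(k, tuple) and k and k[0] == repo]
        for k in stale:
            del self._entries[k]


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=30)
    # Hypothetical key shape: (repo, kind of query, query text).
    key = ("octo/demo", "semantic_search", "why was retry logic added")
    result = cache.get_or_compute(key, lambda: {"hits": []})
    print(result)
```

A cache like this would sit in front of the search calls, with invalidate_repo called after a repo is re-ingested or deleted.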

Built With

elastic-agent-builder
elastic-cloud-serverless
elasticsearch
es|ql
fastapi
framer-motion
github-api
httpx
kibana-api
python
react
sentence-transformers
tailwind-css
typescript
vite
