Semantic code search for AI agents.
Ecosystem: Memory API • Dashboard • Local Embeddings • Code Search
Ingest your codebase, search it semantically. Built for AI agents that need to understand and navigate code.
AI agents need to find relevant code — not by filename, but by meaning:
"where is CRUD/FLS checked" → Apex classes with security checks
"authentication and authorization" → Auth-related classes/methods
"trigger handlers for Account" → Account trigger implementations
"DML operations without checks" → Potential security issues
Traditional search: Grep, exact matches, regex patterns
engram-code: Natural language queries, semantic understanding, multi-model ensemble
# Clone
git clone https://github.com/heybeaux/engram-code
cd engram-code
# Install dependencies
pnpm install
# Configure
cp .env.example .env
# Edit DATABASE_URL and ENGRAM_EMBED_URL
# Database setup
pnpm prisma generate
pnpm prisma migrate dev
# Run
pnpm start:dev
Server starts at http://localhost:3002.
curl -X POST http://localhost:3002/v1/projects \
-H "Content-Type: application/json" \
-d '{
"name": "my-salesforce-app",
"rootPath": "/Users/dev/salesforce/my-app",
"languages": ["apex", "lwc", "typescript"]
}'
# Ingest all files in the project
curl -X POST http://localhost:3002/v1/projects/{projectId}/ingest
# Response
{
"success": true,
"stats": {
"filesProcessed": 142,
"chunksCreated": 856,
"chunksStored": 856,
"duration": 12340
}
}
# Basic semantic search
curl -X POST http://localhost:3002/v1/search \
-H "Content-Type: application/json" \
-d '{"query": "where is CRUD/FLS checked"}'
# Ensemble search (multi-model, better recall)
curl -X POST http://localhost:3002/v1/search/ensemble \
-H "Content-Type: application/json" \
-d '{
"query": "authentication logic",
"models": ["bge-base", "nomic"]
}'
┌─────────────────┐
│ engram-embed │
│ (Rust, local) │
│ │
│ bge-base (768) │
│ nomic (768) │
│ minilm (384) │
└────────▲────────┘
│ embeddings
┌──────────────┐ ┌───────────────────────┴──────────────────────┐
│ AI Agent │────▶│ engram-code │
│ (OpenClaw, │ │ ┌──────────┐ ┌──────────┐ ┌────────────┐ │
│ Cursor, │ │ │Discovery │ │ Parsers │ │ Chunker │ │
│ etc.) │ │ │ Service │──│Apex, LWC │──│ Service │ │
└──────────────┘ │ └──────────┘ └──────────┘ └────────────┘ │
│ │ │ │
│ │ ┌──────────┐ ┌──────────┐ │ │
│ │ │ Search │ │ Vectors │◀────────┘ │
│ │ │ Service │──│ Service │ │
└────────────▶│ └──────────┘ └──────────┘ │
search query │ │ │ │
└───────┼──────────────┼───────────────────────┘
│ │
▼ ▼
┌──────────────────────────┐
│ PostgreSQL + pgvector │
│ │
│ projects │
│ code_chunks │
│ embedding_bge (768) │
│ embedding_nomic (768) │
│ embedding_minilm(384) │
└──────────────────────────┘
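The diagram pairs each embedding model with a vector column and a dimension. A minimal sketch of that pairing is shown here; the `EMBEDDING_MODELS` registry and the `assertDimension` helper are hypothetical names, not part of engram-code, while the column names and dimensions are the ones in the diagram.

```typescript
// Hypothetical registry: model names, columns, and dimensions come from the
// diagram above; the structure itself is an illustration, not engram-code's.
type ModelName = "bge-base" | "nomic" | "minilm";

interface ModelInfo {
  column: string;     // pgvector column in code_chunks
  dimensions: number; // expected embedding length
}

const EMBEDDING_MODELS: Record<ModelName, ModelInfo> = {
  "bge-base": { column: "embedding_bge", dimensions: 768 },
  nomic: { column: "embedding_nomic", dimensions: 768 },
  minilm: { column: "embedding_minilm", dimensions: 384 },
};

// Reject a vector whose length does not match the model's column
// before it is written to the database.
function assertDimension(model: ModelName, vector: number[]): void {
  const expected = EMBEDDING_MODELS[model].dimensions;
  if (vector.length !== expected) {
    throw new Error(`${model} expects ${expected} dimensions, got ${vector.length}`);
  }
}
```

Checking the length up front turns a mismatched model or column into an immediate, readable error instead of a failed insert.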
The TypeScript parser extracts semantic chunks using the TypeScript compiler API:
| Chunk Type | What's Extracted |
|---|---|
| class | Classes with decorators, extends/implements |
| method | Methods with access modifiers, parameters, return type |
| function | Exported and top-level functions |
| interface | Interface declarations with properties |
| type | Type alias declarations |
Metadata extracted:
- Decorators (@Injectable, @Controller, etc.)
- Export visibility (exported vs internal)
- Import dependencies
- JSDoc comments
This gives better chunks than line-based splitting: classes and their methods are individually searchable, and parent-child relationships are preserved.
The Apex parser extracts:
| Chunk Type | What's Extracted |
|---|---|
| class | Classes with annotations, sharing mode, extends/implements |
| method | Methods with access modifiers, parameters, return type |
| trigger | Triggers with sObject and events |
| test | Test classes and @IsTest methods |
Metadata extracted:
- Sharing mode (with sharing, without sharing, inherited sharing)
- Annotations (@AuraEnabled, @InvocableMethod, @IsTest, etc.)
- SOQL queries (inline and dynamic)
- DML operations (insert, update, delete, upsert)
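To make the metadata list concrete, here is a deliberately naive, regex-based sketch of pulling those fields out of Apex source. It is not engram-code's parser: `extractApexMetadata`, `ApexMetadata`, and `uniqueMatches` are hypothetical names, and a real implementation would need a syntax tree to avoid false positives in comments and strings.

```typescript
// Naive illustration only; all names here are hypothetical.
type SharingMode = "with sharing" | "without sharing" | "inherited sharing";

interface ApexMetadata {
  sharingMode: SharingMode | null;
  annotations: string[];
  dmlOperations: string[];
  hasInlineSoql: boolean;
}

// Collect the first capture group of every match, without duplicates.
// The pattern must carry the global flag, otherwise exec() never advances.
function uniqueMatches(source: string, pattern: RegExp): string[] {
  const found: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(source)) !== null) {
    if (found.indexOf(m[1]) === -1) found.push(m[1]);
  }
  return found;
}

function extractApexMetadata(source: string): ApexMetadata {
  const sharing = /\b(with|without|inherited)\s+sharing\b/i.exec(source);
  return {
    sharingMode: sharing ? (`${sharing[1].toLowerCase()} sharing` as SharingMode) : null,
    annotations: uniqueMatches(source, /@(\w+)/g).map((name) => `@${name}`),
    // Apex keywords are case-insensitive, so match against lowercased source.
    dmlOperations: uniqueMatches(source.toLowerCase(), /\b(insert|update|delete|upsert)\s+\w+/g),
    hasInlineSoql: /\[\s*SELECT\b/i.test(source),
  };
}
```

Metadata like this is what lets a query such as "DML operations without checks" be narrowed to chunks that actually contain DML.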
// Example: This becomes a searchable chunk with rich metadata
public with sharing class AccountService {
    // @AuraEnabled applies to methods, not classes; cacheable=true is required for @wire
    @AuraEnabled(cacheable=true)
    public static List<Account> getAccounts() {
        return [SELECT Id, Name FROM Account];
    }
}
The LWC parser extracts:
| Chunk Type | What's Extracted |
|---|---|
| component | LWC component classes extending LightningElement |
| method | Class methods (including event handlers) |
| function | Arrow function properties |
Metadata extracted:
- @api properties (public API)
- @track properties (reactive)
- @wire decorators (data binding)
- Event handlers (methods starting with handle)
- Dispatched custom events
- Imports and dependencies
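The decorator and handler items above can be illustrated with a small regex-based sketch. This is an assumption-laden approximation, not the LWC parser itself: `extractLwcMetadata`, `LwcMetadata`, and `collectNames` are hypothetical names, and the handler pattern would also match a bare `handleX()` call on its own line.

```typescript
// Naive illustration only; all names here are hypothetical.
interface LwcMetadata {
  apiProperties: string[]; // @api fields
  wiredAdapters: string[]; // first argument of @wire(...)
  eventHandlers: string[]; // methods whose name starts with "handle"
}

// Collect the first capture group of every match, without duplicates.
// The pattern must carry the global flag, otherwise exec() never advances.
function collectNames(source: string, pattern: RegExp): string[] {
  const names: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(source)) !== null) {
    if (names.indexOf(m[1]) === -1) names.push(m[1]);
  }
  return names;
}

function extractLwcMetadata(source: string): LwcMetadata {
  return {
    apiProperties: collectNames(source, /@api\s+(\w+)/g),
    wiredAdapters: collectNames(source, /@wire\(\s*(\w+)/g),
    eventHandlers: collectNames(source, /^\s*(handle\w*)\s*\(/gm),
  };
}
```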
// Example: Fully parsed with decorators and methods
import { LightningElement, api, wire } from 'lwc';
import getAccounts from '@salesforce/apex/AccountService.getAccounts';
export default class AccountList extends LightningElement {
    @api recordId;
    @wire(getAccounts) accounts;
    handleRefresh() {
        // ...
    }
}

| Endpoint | Method | Description |
|---|---|---|
| /v1/projects | POST | Register a new project |
| /v1/projects | GET | List all projects |
| /v1/projects/:id | GET | Get project details |
| /v1/projects/:id | DELETE | Delete project and all chunks |
| /v1/projects/:id/stats | GET | Get project statistics |
| /v1/projects/:id/ingest | POST | Ingest/re-ingest project code |
| Endpoint | Method | Description |
|---|---|---|
| /v1/search | POST | Semantic search (single model) |
| /v1/search/ensemble | POST | Multi-model ensemble search with RRF fusion |
| /v1/search/similar/:chunkId | GET | Find similar code to a chunk |
| /v1/search/models | GET | List available embedding models |
| /v1/search/examples | GET | Get example search queries |
| /v1/search/health | GET | Health check |
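For agents written in TypeScript, a call to POST /v1/search might be wrapped roughly as follows. This is a sketch under assumptions: the request fields follow the /v1/search request format documented in this README, but `buildSearchBody`, `semanticSearch`, the injected `PostJson` transport, the client-side clamping of `limit` to 1-100, and the default of 10 are illustrative choices, not part of engram-code.

```typescript
// Request fields follow the /v1/search request format in this README.
interface SearchRequest {
  query: string;
  projectId?: string;
  language?: string;
  chunkType?: string;
  limit?: number;
}

// Minimal transport type so the sketch does not depend on a specific HTTP library.
type PostJson = (url: string, body: string) => Promise<unknown>;

// Default port from the quick start.
const BASE_URL = "http://localhost:3002";

// Validate the query and clamp limit to the documented 1-100 range.
// The fallback of 10 is an assumption taken from the README's example value.
function buildSearchBody(req: SearchRequest): string {
  if (req.query.trim() === "") throw new Error("query must not be empty");
  const limit = Math.min(100, Math.max(1, Math.floor(req.limit ?? 10)));
  return JSON.stringify({ ...req, limit });
}

async function semanticSearch(req: SearchRequest, post: PostJson): Promise<unknown> {
  return post(`${BASE_URL}/v1/search`, buildSearchBody(req));
}
```

On Node 18 or later, `post` can be a thin wrapper around the global `fetch` that sends the body with method POST and a `Content-Type: application/json` header, then returns the parsed JSON. Injecting the transport also makes the function easy to exercise with a fake in tests.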
POST /v1/projects/:id/ingest
{
"clearExisting": false, // true to re-ingest from scratch
"skipEmbeddings": false // true to skip embedding generation (for testing)
}
POST /v1/search
{
"query": "authentication logic",
"projectId": "uuid", // optional: filter by project
"language": "apex", // optional: filter by language
"chunkType": "class", // optional: filter by chunk type
"limit": 10 // max results (1-100)
}
POST /v1/search/ensemble
{
"query": "where is CRUD/FLS checked",
"models": ["bge-base", "nomic"], // models to use
"limit": 10
}
{
"query": "authentication logic",
"results": [
{
"chunk": {
"id": "uuid",
"filePath": "force-app/main/classes/AuthService.cls",
"lineStart": 1,
"lineEnd": 45,
"content": "public class AuthService { ... }",
"language": "apex",
"chunkType": "class",
"name": "AuthService",
"parentName": null,
"dependencies": []
},
"score": 0.89,
"highlights": ["isAccessible()", "CRUD"]
}
],
"totalFound": 5,
"searchTimeMs": 42
}
-- Projects table
CREATE TABLE projects (
id UUID PRIMARY KEY,
name VARCHAR UNIQUE,
root_path VARCHAR,
languages TEXT[],
created_at TIMESTAMP,
updated_at TIMESTAMP,
last_ingested_at TIMESTAMP
);
-- Code chunks with multi-model embeddings
CREATE TABLE code_chunks (
id UUID PRIMARY KEY,
project_id UUID REFERENCES projects(id) ON DELETE CASCADE,
-- Location
file_path VARCHAR,
line_start INT,
line_end INT,
-- Content
content TEXT,
language VARCHAR,
chunk_type VARCHAR,
name VARCHAR,
parent_name VARCHAR,
dependencies TEXT[],
-- Multi-model embeddings (pgvector)
embedding_bge VECTOR(768),
embedding_nomic VECTOR(768),
embedding_gte VECTOR(768),
embedding_minilm VECTOR(384),
-- Change detection
checksum VARCHAR,
created_at TIMESTAMP
);
-- Indexes for fast filtering
CREATE INDEX idx_chunks_project ON code_chunks(project_id);
CREATE INDEX idx_chunks_language ON code_chunks(language);
CREATE INDEX idx_chunks_type ON code_chunks(chunk_type);
CREATE INDEX idx_chunks_file ON code_chunks(file_path);
Code is chunked by semantic units, not arbitrary line counts:
┌─────────────────────────────────────────────────────┐
│ AccountService.cls │
│ ┌───────────────────────────────────────────────┐ │
│ │ Chunk 1: file_header │ │
│ │ (imports, top comments) │ │
│ └───────────────────────────────────────────────┘ │
│ ┌───────────────────────────────────────────────┐ │
│ │ Chunk 2: class AccountService │ │
│ │ (entire class with annotations) │ │
│ └───────────────────────────────────────────────┘ │
│ ┌───────────────────────────────────────────────┐ │
│ │ Chunk 3: method getAccounts │ │
│ │ (method with annotations, linked to class) │ │
│ └───────────────────────────────────────────────┘ │
│ ┌───────────────────────────────────────────────┐ │
│ │ Chunk 4: method createAccount │ │
│ │ (method with annotations, linked to class) │ │
│ └───────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────┘
Why semantic chunking?
- Methods are searchable independently AND as part of their class
- Class-level search returns the whole class context
- Parent-child relationships are preserved (parentName field)
- Metadata (annotations, sharing mode) is attached to each chunk
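The parent-child link can be pictured with a small in-memory sketch. The `name`, `parentName`, and `chunkType` fields mirror the search response format; the `ChunkRef` type and the `childrenOf` and `parentOf` helpers are hypothetical and only illustrate how a method chunk can be traced back to its class.

```typescript
// Field names mirror the search response; the helpers are hypothetical.
interface ChunkRef {
  name: string;
  chunkType: string;
  parentName: string | null; // null for top-level chunks such as classes
}

// All chunks that declare the given chunk as their parent.
function childrenOf(chunks: ChunkRef[], parent: string): ChunkRef[] {
  return chunks.filter((c) => c.parentName === parent);
}

// The enclosing chunk of a method, or undefined for top-level chunks.
function parentOf(chunks: ChunkRef[], chunk: ChunkRef): ChunkRef | undefined {
  if (chunk.parentName === null) return undefined;
  return chunks.find((c) => c.name === chunk.parentName && c.parentName === null);
}

// The chunks from the AccountService.cls diagram above.
const accountServiceChunks: ChunkRef[] = [
  { name: "AccountService", chunkType: "class", parentName: null },
  { name: "getAccounts", chunkType: "method", parentName: "AccountService" },
  { name: "createAccount", chunkType: "method", parentName: "AccountService" },
];
```

An agent that gets `getAccounts` as a hit can use a lookup like `parentOf` to pull in the surrounding class when it needs more context.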
Multi-model search uses Reciprocal Rank Fusion for better recall:
┌─────────────────────────────────────────────────────────┐
│ Query: "authentication logic" │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────┐ │
│ │ bge-base │ │ nomic │ │ minilm │ │
│ │ 768-dim │ │ 768-dim │ │ 384-dim │ │
│ └──────┬──────┘ └──────┬──────┘ └────────┬────────┘ │
│ │ │ │ │
│ AuthService AuthService AuthService │
│ LoginController UserManager LoginController │
│ UserManager LoginController AuthHelper │
│ │ │ │ │
│ └────────────────┼───────────────────┘ │
│ ▼ │
│ ┌───────────────────────┐ │
│ │ RRF Fusion │ │
│ │ score = Σ 1/(k+rank) │ │
│ └───────────────────────┘ │
│ │ │
│ 1. AuthService (found by 3/3 models) │
│ 2. LoginController (found by 3/3 models) │
│ 3. UserManager (found by 2/3 models) │
│ 4. AuthHelper (found by 1/3 models) │
└─────────────────────────────────────────────────────────┘
Benefits:
- Chunks found by multiple models score higher (consensus)
- Different models catch different semantic aspects
- Reduces single-model blind spots
- Response includes per-model rankings for debugging
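The fusion step from the diagram, score = Σ 1/(k+rank), can be sketched in a few lines. This is an illustration rather than engram-code's implementation: `rrfFuse` and `FusedResult` are hypothetical names, and k = 60 is an assumed constant (a common default for RRF) since this README does not state the value used.

```typescript
interface FusedResult {
  id: string;
  score: number;
  foundBy: string[]; // models that returned this chunk
}

// rankings maps a model name to its result ids, best first.
// k = 60 is an assumption; the README does not specify the constant.
function rrfFuse(rankings: Record<string, string[]>, k: number = 60): FusedResult[] {
  const byId = new Map<string, FusedResult>();
  for (const model of Object.keys(rankings)) {
    rankings[model].forEach((id, index) => {
      let entry = byId.get(id);
      if (!entry) {
        entry = { id, score: 0, foundBy: [] };
        byId.set(id, entry);
      }
      entry.score += 1 / (k + index + 1); // ranks are 1-based
      entry.foundBy.push(model);
    });
  }
  return Array.from(byId.values()).sort((a, b) => b.score - a.score);
}

// The three rankings from the diagram above.
const fused = rrfFuse({
  "bge-base": ["AuthService", "LoginController", "UserManager"],
  nomic: ["AuthService", "UserManager", "LoginController"],
  minilm: ["AuthService", "LoginController", "AuthHelper"],
});
```

With these inputs, `fused` is ordered AuthService, LoginController, UserManager, AuthHelper, matching the diagram: a chunk returned by all three models accumulates three reciprocal-rank terms, while AuthHelper gets only one.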
# Database (PostgreSQL with pgvector)
DATABASE_URL=postgresql://user:pass@localhost:5432/engram_code
# Embedding server (engram-embed)
ENGRAM_EMBED_URL=http://127.0.0.1:8080
# Server
PORT=3002
engram-code is designed to work alongside Engram (memory) and share infrastructure:
| Component | Port | Purpose |
|---|---|---|
| engram | 3001 | Agent memory (facts, preferences, events) |
| engram-code | 3002 | Code search (classes, methods, components) |
| engram-embed | 8080 | Local embeddings (shared by both) |
| engram-dashboard | 3000 | Web UI (visualizes both) |
All components use the same embedding server for consistent vector representations.
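The three environment variables shown earlier (DATABASE_URL, ENGRAM_EMBED_URL, PORT) could be read and validated at startup along these lines. This is only a sketch: `loadConfig` and `EngramCodeConfig` are hypothetical names, and treating the example values for the embedding URL and port as defaults is an assumption, not documented behavior.

```typescript
interface EngramCodeConfig {
  databaseUrl: string;
  embedUrl: string;
  port: number;
}

// env is passed in (for example process.env) so the function stays easy to test.
// Falling back to the README's example values is an assumption of this sketch.
function loadConfig(env: Record<string, string | undefined>): EngramCodeConfig {
  const databaseUrl = env.DATABASE_URL;
  if (!databaseUrl) throw new Error("DATABASE_URL is required");

  const port = Number(env.PORT ?? "3002");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }

  return {
    databaseUrl,
    embedUrl: env.ENGRAM_EMBED_URL ?? "http://127.0.0.1:8080",
    port,
  };
}
```

Failing fast on a missing database URL or a malformed port surfaces misconfiguration before the first ingest or search request.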
# Security-focused
"where is CRUD/FLS checked"
"classes using without sharing"
"DML operations without security"
"SOQL injection vulnerabilities"
# Architecture
"trigger handlers"
"service layer methods"
"batch job implementations"
"API integration methods"
# LWC
"wire service usage"
"components with API properties"
"event handling methods"
# General
"authentication and authorization"
"error handling patterns"
"utility functions"
"test classes"
On an M2 MacBook Pro:
| Operation | Time |
|---|---|
| Ingest 100 Apex files | ~5s |
| Generate embeddings (bge-base) | ~10ms/chunk |
| Single-model search | ~40ms |
| Ensemble search (3 models) | ~80ms |
MIT
Let AI agents understand your code.