engram-code

Semantic code search for AI agents.

Ecosystem: Memory API • Dashboard • Local Embeddings • Code Search

Ingest your codebase, search it semantically. Built for AI agents that need to understand and navigate code.

Why engram-code?

AI agents need to find relevant code by meaning, not by filename:

"where is CRUD/FLS checked"           → Apex classes with security checks
"authentication and authorization"     → Auth-related classes/methods
"trigger handlers for Account"         → Account trigger implementations
"DML operations without checks"        → Potential security issues

Traditional search: Grep, exact matches, regex patterns
engram-code: Natural language queries, semantic understanding, multi-model ensemble

Quick Start

# Clone
git clone https://github.com/heybeaux/engram-code
cd engram-code

# Install dependencies
pnpm install

# Configure
cp .env.example .env
# Edit DATABASE_URL and ENGRAM_EMBED_URL

# Database setup
pnpm prisma generate
pnpm prisma migrate dev

# Run
pnpm start:dev

Server starts at http://localhost:3002.

Register a Project

curl -X POST http://localhost:3002/v1/projects \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-salesforce-app",
    "rootPath": "/Users/dev/salesforce/my-app",
    "languages": ["apex", "lwc", "typescript"]
  }'

Ingest Code

# Ingest all files in the project
curl -X POST http://localhost:3002/v1/projects/{projectId}/ingest

# Response
{
  "success": true,
  "stats": {
    "filesProcessed": 142,
    "chunksCreated": 856,
    "chunksStored": 856,
    "duration": 12340
  }
}

Search

# Basic semantic search
curl -X POST http://localhost:3002/v1/search \
  -H "Content-Type: application/json" \
  -d '{"query": "where is CRUD/FLS checked"}'

# Ensemble search (multi-model, better recall)
curl -X POST http://localhost:3002/v1/search/ensemble \
  -H "Content-Type: application/json" \
  -d '{
    "query": "authentication logic",
    "models": ["bge-base", "nomic"]
  }'
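
For agents that call the API from TypeScript, the two curl requests above could be wrapped in a small client. This is a minimal sketch that assumes only the endpoints and body fields shown in this README; `HttpPost`, `SearchClient`, and `createSearchClient` are hypothetical names, and the transport is injected so that no particular HTTP library is assumed.

```typescript
// Hypothetical transport: posts a JSON body to a URL and resolves with the parsed response.
type HttpPost = (url: string, body: unknown) => Promise<unknown>;

interface SearchClient {
  search(query: string): Promise<unknown>;
  ensemble(query: string, models: string[]): Promise<unknown>;
}

// Builds a client for the two search endpoints shown above.
function createSearchClient(baseUrl: string, post: HttpPost): SearchClient {
  const root = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    search: (query) => post(`${root}/v1/search`, { query }),
    ensemble: (query, models) =>
      post(`${root}/v1/search/ensemble`, { query, models }),
  };
}
```

In practice `post` would be a thin wrapper around whatever HTTP client the agent already uses, sending `Content-Type: application/json` as in the curl examples.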

Architecture

                                    ┌─────────────────┐
                                    │  engram-embed   │
                                    │  (Rust, local)  │
                                    │                 │
                                    │  bge-base (768) │
                                    │  nomic (768)    │
                                    │  minilm (384)   │
                                    └────────▲────────┘
                                             │ embeddings
┌──────────────┐     ┌───────────────────────┴──────────────────────┐
│   AI Agent   │────▶│                 engram-code                   │
│  (OpenClaw,  │     │  ┌──────────┐  ┌──────────┐  ┌────────────┐  │
│   Cursor,    │     │  │Discovery │  │ Parsers  │  │  Chunker   │  │
│   etc.)      │     │  │ Service  │──│Apex, LWC │──│  Service   │  │
└──────────────┘     │  └──────────┘  └──────────┘  └────────────┘  │
       │             │                                      │        │
       │             │  ┌──────────┐  ┌──────────┐         │        │
       │             │  │  Search  │  │ Vectors  │◀────────┘        │
       │             │  │ Service  │──│ Service  │                  │
       └────────────▶│  └──────────┘  └──────────┘                  │
      search query   │       │              │                       │
                     └───────┼──────────────┼───────────────────────┘
                             │              │
                             ▼              ▼
                     ┌──────────────────────────┐
                     │   PostgreSQL + pgvector   │
                     │                          │
                     │  projects                │
                     │  code_chunks             │
                     │    embedding_bge (768)   │
                     │    embedding_nomic (768) │
                     │    embedding_minilm(384) │
                     └──────────────────────────┘

Language Support

TypeScript (.ts, .tsx)

The TypeScript parser extracts semantic chunks using the TypeScript compiler API:

Chunk Type	What's Extracted
`class`	Classes with decorators, extends/implements
`method`	Methods with access modifiers, parameters, return type
`function`	Exported and top-level functions
`interface`	Interface declarations with properties
`type`	Type alias declarations

Metadata extracted:

Decorators (@Injectable, @Controller, etc.)
Export visibility (exported vs internal)
Import dependencies
JSDoc comments

This gives better chunking than line-based splitting: classes and their methods are individually searchable, and parent-child relationships are preserved.

Apex (.cls, .trigger)

The Apex parser extracts:

Chunk Type	What's Extracted
`class`	Classes with annotations, sharing mode, extends/implements
`method`	Methods with access modifiers, parameters, return type
`trigger`	Triggers with sObject and events
`test`	Test classes and @IsTest methods

Metadata extracted:

Sharing mode (with sharing, without sharing, inherited sharing)
Annotations (@AuraEnabled, @InvocableMethod, @IsTest, etc.)
SOQL queries (inline and dynamic)
DML operations (insert, update, delete, upsert)

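// Example: this class and its method become searchable chunks with rich metadata
public with sharing class AccountService {
    // @AuraEnabled applies to methods, not classes; cacheable=true allows @wire usage
    @AuraEnabled(cacheable=true)
    public static List<Account> getAccounts() {
        return [SELECT Id, Name FROM Account];
    }
}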

LWC (.js in lwc/ folders)

The LWC parser extracts:

Chunk Type	What's Extracted
`component`	LWC component classes extending LightningElement
`method`	Class methods (including event handlers)
`function`	Arrow function properties

Metadata extracted:

@api properties (public API)
@track properties (reactive)
@wire decorators (data binding)
Event handlers (methods starting with handle)
Dispatched custom events
Imports and dependencies

// Example: Fully parsed with decorators and methods
import { LightningElement, api, wire } from 'lwc';
import getAccounts from '@salesforce/apex/AccountService.getAccounts';

export default class AccountList extends LightningElement {
    @api recordId;
    @wire(getAccounts) accounts;
    
    handleRefresh() {
        // ...
    }
}
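
To make the metadata list above concrete, the sketch below pulls three of the listed items out of LWC source text with regular expressions. It is an illustration only and is not how the engram-code LWC parser is implemented (the parser's internals are not described here); `LwcMetadata` and `sketchLwcMetadata` are hypothetical names.

```typescript
interface LwcMetadata {
  apiProperties: string[]; // names following @api
  wiredAdapters: string[]; // first argument passed to @wire(...)
  eventHandlers: string[]; // methods whose name starts with "handle"
}

// Regex-based illustration; a real parser would walk a syntax tree instead.
function sketchLwcMetadata(source: string): LwcMetadata {
  const collect = (pattern: RegExp): string[] => {
    const found: string[] = [];
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(source)) !== null) {
      found.push(match[1]);
    }
    return found;
  };
  return {
    apiProperties: collect(/@api\s+(\w+)/g),
    wiredAdapters: collect(/@wire\(\s*(\w+)/g),
    eventHandlers: collect(/^\s*(handle\w*)\s*\(/gm),
  };
}
```

Applied to the `AccountList` component above, this yields `recordId` as an API property, `getAccounts` as a wired adapter, and `handleRefresh` as an event handler.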

API Reference

Projects

Endpoint	Method	Description
`/v1/projects`	POST	Register a new project
`/v1/projects`	GET	List all projects
`/v1/projects/:id`	GET	Get project details
`/v1/projects/:id`	DELETE	Delete project and all chunks
`/v1/projects/:id/stats`	GET	Get project statistics
`/v1/projects/:id/ingest`	POST	Ingest/re-ingest project code

Search

Endpoint	Method	Description
`/v1/search`	POST	Semantic search (single model)
`/v1/search/ensemble`	POST	Multi-model ensemble search with RRF fusion
`/v1/search/similar/:chunkId`	GET	Find similar code to a chunk
`/v1/search/models`	GET	List available embedding models
`/v1/search/examples`	GET	Get example search queries
`/v1/search/health`	GET	Health check

Ingestion Request

POST /v1/projects/:id/ingest
{
  "clearExisting": false,  // true to re-ingest from scratch
  "skipEmbeddings": false  // true to skip embedding generation (for testing)
}

Search Request

POST /v1/search
{
  "query": "authentication logic",
  "projectId": "uuid",     // optional: filter by project
  "language": "apex",      // optional: filter by language
  "chunkType": "class",    // optional: filter by chunk type
  "limit": 10              // max results (1-100)
}
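
A caller may want to normalize a request before sending it. The sketch below mirrors the fields above and clamps `limit` to the documented 1-100 range. Clamping (rather than rejecting) and refusing empty queries are assumptions made for illustration; the server's own validation behavior is not documented here. `SearchRequest` and `buildSearchRequest` are hypothetical names.

```typescript
interface SearchRequest {
  query: string;
  projectId?: string; // optional: filter by project
  language?: string;  // optional: filter by language
  chunkType?: string; // optional: filter by chunk type
  limit?: number;     // max results (1-100)
}

function buildSearchRequest(
  query: string,
  options: Omit<SearchRequest, "query"> = {},
): SearchRequest {
  const trimmed = query.trim();
  if (trimmed.length === 0) {
    throw new Error("query must not be empty"); // assumption: empty queries are not useful
  }
  const request: SearchRequest = { query: trimmed };
  // Only copy filters that were actually provided, so the JSON body stays minimal.
  if (options.projectId) request.projectId = options.projectId;
  if (options.language) request.language = options.language;
  if (options.chunkType) request.chunkType = options.chunkType;
  if (options.limit !== undefined) {
    if (!isFinite(options.limit)) {
      throw new Error("limit must be a finite number");
    }
    // Assumption: out-of-range values are clamped to the documented 1-100 range.
    request.limit = Math.min(100, Math.max(1, Math.floor(options.limit)));
  }
  return request;
}
```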

Ensemble Search Request

POST /v1/search/ensemble
{
  "query": "where is CRUD/FLS checked",
  "models": ["bge-base", "nomic"],  // models to use
  "limit": 10
}

Search Response

{
  "query": "authentication logic",
  "results": [
    {
      "chunk": {
        "id": "uuid",
        "filePath": "force-app/main/classes/AuthService.cls",
        "lineStart": 1,
        "lineEnd": 45,
        "content": "public class AuthService { ... }",
        "language": "apex",
        "chunkType": "class",
        "name": "AuthService",
        "parentName": null,
        "dependencies": []
      },
      "score": 0.89,
      "highlights": ["isAccessible()", "CRUD"]
    }
  ],
  "totalFound": 5,
  "searchTimeMs": 42
}
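
For typed consumers, the response above can be described with a few interfaces. The field names come from the sample response; the interface names and the `formatHits` helper are hypothetical, and any fields the server returns beyond those shown are not covered.

```typescript
interface CodeChunk {
  id: string;
  filePath: string;
  lineStart: number;
  lineEnd: number;
  content: string;
  language: string;
  chunkType: string;
  name: string;
  parentName: string | null; // enclosing class for methods, null for top-level chunks
  dependencies: string[];
}

interface SearchHit {
  chunk: CodeChunk;
  score: number;
  highlights: string[];
}

interface SearchResponse {
  query: string;
  results: SearchHit[];
  totalFound: number;
  searchTimeMs: number;
}

// Renders each hit as "path:start-end type Parent.name (score)" for an agent prompt or a log line.
function formatHits(response: SearchResponse): string[] {
  return response.results.map(({ chunk, score }) => {
    const owner = chunk.parentName ? `${chunk.parentName}.` : "";
    const location = `${chunk.filePath}:${chunk.lineStart}-${chunk.lineEnd}`;
    return `${location} ${chunk.chunkType} ${owner}${chunk.name} (${score.toFixed(2)})`;
  });
}
```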

Database Schema

-- Projects table
CREATE TABLE projects (
  id UUID PRIMARY KEY,
  name VARCHAR UNIQUE,
  root_path VARCHAR,
  languages TEXT[],
  created_at TIMESTAMP,
  updated_at TIMESTAMP,
  last_ingested_at TIMESTAMP
);

-- Code chunks with multi-model embeddings
CREATE TABLE code_chunks (
  id UUID PRIMARY KEY,
  project_id UUID REFERENCES projects(id) ON DELETE CASCADE,
  
  -- Location
  file_path VARCHAR,
  line_start INT,
  line_end INT,
  
  -- Content
  content TEXT,
  language VARCHAR,
  chunk_type VARCHAR,
  name VARCHAR,
  parent_name VARCHAR,
  dependencies TEXT[],
  
  -- Multi-model embeddings (pgvector)
  embedding_bge VECTOR(768),
  embedding_nomic VECTOR(768),
  embedding_gte VECTOR(768),
  embedding_minilm VECTOR(384),
  
  -- Change detection
  checksum VARCHAR,
  created_at TIMESTAMP
);

-- Indexes for fast filtering
CREATE INDEX idx_chunks_project ON code_chunks(project_id);
CREATE INDEX idx_chunks_language ON code_chunks(language);
CREATE INDEX idx_chunks_type ON code_chunks(chunk_type);
CREATE INDEX idx_chunks_file ON code_chunks(file_path);
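
With this schema, a single-model search can be expressed as a nearest-neighbor query over one embedding column. The sketch below builds such a query as a parameterized statement, using pgvector's `<=>` cosine-distance operator. The query engram-code actually runs is not shown in this README, so treat this as one plausible shape; the model-to-column mapping is inferred from the column names, and `buildSimilarityQuery` is a hypothetical helper.

```typescript
// Assumption: model names map to columns as their names suggest.
const EMBEDDING_COLUMNS: Record<string, string> = {
  "bge-base": "embedding_bge",
  nomic: "embedding_nomic",
  minilm: "embedding_minilm",
};

interface SimilarityQuery {
  text: string;      // SQL with $1, $2, ... placeholders
  values: unknown[]; // parameter values in placeholder order
}

function buildSimilarityQuery(
  model: string,
  queryVector: number[],
  opts: { projectId?: string; language?: string; limit?: number } = {},
): SimilarityQuery {
  const column = EMBEDDING_COLUMNS[model];
  if (!column) throw new Error(`unknown model: ${model}`);

  // pgvector accepts a text literal such as '[0.1,0.2]' cast to vector.
  const values: unknown[] = [`[${queryVector.join(",")}]`];
  const where = [`${column} IS NOT NULL`];
  if (opts.projectId) {
    values.push(opts.projectId);
    where.push(`project_id = $${values.length}`);
  }
  if (opts.language) {
    values.push(opts.language);
    where.push(`language = $${values.length}`);
  }
  values.push(opts.limit ?? 10);

  const text = [
    // Cosine similarity is 1 minus cosine distance.
    `SELECT id, file_path, name, 1 - (${column} <=> $1::vector) AS score`,
    `FROM code_chunks`,
    `WHERE ${where.join(" AND ")}`,
    `ORDER BY ${column} <=> $1::vector`,
    `LIMIT $${values.length}`,
  ].join("\n");
  return { text, values };
}
```

The column name is taken from a fixed map rather than from user input, so only the values travel as query parameters.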

Chunking Strategy

Code is chunked by semantic units, not arbitrary line counts:

┌─────────────────────────────────────────────────────┐
│  AccountService.cls                                  │
│  ┌───────────────────────────────────────────────┐  │
│  │ Chunk 1: file_header                          │  │
│  │ (imports, top comments)                       │  │
│  └───────────────────────────────────────────────┘  │
│  ┌───────────────────────────────────────────────┐  │
│  │ Chunk 2: class AccountService                 │  │
│  │ (entire class with annotations)               │  │
│  └───────────────────────────────────────────────┘  │
│  ┌───────────────────────────────────────────────┐  │
│  │ Chunk 3: method getAccounts                   │  │
│  │ (method with annotations, linked to class)    │  │
│  └───────────────────────────────────────────────┘  │
│  ┌───────────────────────────────────────────────┐  │
│  │ Chunk 4: method createAccount                 │  │
│  │ (method with annotations, linked to class)    │  │
│  └───────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────┘

Why semantic chunking?

Methods are searchable independently AND as part of their class
Class-level search returns the whole class context
Parent-child relationships are preserved (parentName field)
Metadata (annotations, sharing mode) is attached to each chunk
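
The parent-child linkage can be sketched as follows: one chunk for the class and one per method, with each method's `parentName` pointing back at the class. The `ParsedClass` input shape and the `chunkClass` function are hypothetical, standing in for whatever the language parsers produce; the `file_header` chunk and per-chunk metadata from the diagram are left out for brevity.

```typescript
interface ParsedMember {
  name: string;
  lineStart: number; // 1-based, inclusive
  lineEnd: number;   // 1-based, inclusive
}

interface ParsedClass extends ParsedMember {
  methods: ParsedMember[];
}

interface Chunk {
  filePath: string;
  chunkType: "class" | "method";
  name: string;
  parentName: string | null;
  lineStart: number;
  lineEnd: number;
  content: string;
}

// Emits the class as a whole plus one chunk per method, linked through parentName.
function chunkClass(filePath: string, source: string, parsed: ParsedClass): Chunk[] {
  const lines = source.split("\n");
  const slice = (m: ParsedMember): string =>
    lines.slice(m.lineStart - 1, m.lineEnd).join("\n");

  const classChunk: Chunk = {
    filePath,
    chunkType: "class",
    name: parsed.name,
    parentName: null,
    lineStart: parsed.lineStart,
    lineEnd: parsed.lineEnd,
    content: slice(parsed),
  };
  const methodChunks = parsed.methods.map((m): Chunk => ({
    filePath,
    chunkType: "method",
    name: m.name,
    parentName: parsed.name, // link back to the enclosing class
    lineStart: m.lineStart,
    lineEnd: m.lineEnd,
    content: slice(m),
  }));
  return [classChunk, ...methodChunks];
}
```

Because the method text appears both in its own chunk and inside the class chunk, a query can match either the narrow unit or the wider context.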

Ensemble Search (RRF Fusion)

Multi-model search uses Reciprocal Rank Fusion for better recall:

┌─────────────────────────────────────────────────────────┐
│  Query: "authentication logic"                          │
│                                                         │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────┐ │
│  │  bge-base   │  │    nomic    │  │    minilm       │ │
│  │  768-dim    │  │    768-dim  │  │    384-dim      │ │
│  └──────┬──────┘  └──────┬──────┘  └────────┬────────┘ │
│         │                │                   │          │
│    AuthService      AuthService        AuthService      │
│    LoginController  UserManager        LoginController  │
│    UserManager      LoginController    AuthHelper       │
│         │                │                   │          │
│         └────────────────┼───────────────────┘          │
│                          ▼                              │
│              ┌───────────────────────┐                  │
│              │     RRF Fusion        │                  │
│              │  score = Σ 1/(k+rank) │                  │
│              └───────────────────────┘                  │
│                          │                              │
│            1. AuthService     (found by 3/3 models)     │
│            2. LoginController (found by 3/3 models)     │
│            3. UserManager     (found by 2/3 models)     │
│            4. AuthHelper      (found by 1/3 models)     │
└─────────────────────────────────────────────────────────┘

Benefits:

Chunks found by multiple models score higher (consensus)
Different models catch different semantic aspects
Reduces single-model blind spots
Response includes per-model rankings for debugging
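
The fusion step in the diagram can be sketched in a few lines. Each model contributes 1/(k + rank) for every chunk it returns, and the contributions are summed per chunk. The constant k = 60 used here is the value commonly chosen for RRF; the value engram-code uses is not stated in this README, so it is an assumption, as are the names `rrfFuse` and `FusedResult`.

```typescript
interface FusedResult {
  id: string;        // chunk identifier
  score: number;     // summed reciprocal-rank score
  foundBy: string[]; // models that returned this chunk
}

// rankings maps a model name to its result ids, best first.
function rrfFuse(rankings: Record<string, string[]>, k = 60): FusedResult[] {
  const fused = new Map<string, FusedResult>();
  for (const model of Object.keys(rankings)) {
    rankings[model].forEach((id, index) => {
      const entry: FusedResult = fused.get(id) ?? { id, score: 0, foundBy: [] };
      entry.score += 1 / (k + index + 1); // ranks are 1-based
      entry.foundBy.push(model);
      fused.set(id, entry);
    });
  }
  return Array.from(fused.values()).sort((a, b) => b.score - a.score);
}

// The three per-model rankings from the diagram above.
const exampleFusion = rrfFuse({
  "bge-base": ["AuthService", "LoginController", "UserManager"],
  nomic: ["AuthService", "UserManager", "LoginController"],
  minilm: ["AuthService", "LoginController", "AuthHelper"],
});
// Resulting order: AuthService, LoginController, UserManager, AuthHelper
```

AuthService is ranked first by all three models and scores 3/61; AuthHelper appears once at rank 3 and scores 1/63, which is why consensus results rise to the top.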

Environment Variables

# Database (PostgreSQL with pgvector)
DATABASE_URL=postgresql://user:pass@localhost:5432/engram_code

# Embedding server (engram-embed)
ENGRAM_EMBED_URL=http://127.0.0.1:8080

# Server
PORT=3002
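
A service reading these variables might validate them at startup. The sketch below is one way to do that and is not taken from the engram-code source: `loadConfig` and `EngramCodeConfig` are hypothetical names, and the fallback values for `ENGRAM_EMBED_URL` and `PORT` simply reuse the example values above as assumed defaults.

```typescript
interface EngramCodeConfig {
  databaseUrl: string;
  embedUrl: string;
  port: number;
}

// env is passed in (for example process.env) so the function stays easy to test.
function loadConfig(env: Record<string, string | undefined>): EngramCodeConfig {
  const databaseUrl = env["DATABASE_URL"];
  if (!databaseUrl) {
    throw new Error("DATABASE_URL is required");
  }
  // Assumed defaults, copied from the example values in this README.
  const embedUrl = env["ENGRAM_EMBED_URL"] ?? "http://127.0.0.1:8080";
  const port = Number(env["PORT"] ?? "3002");
  if (Math.floor(port) !== port || port < 1 || port > 65535) {
    throw new Error(`PORT must be an integer between 1 and 65535, got "${env["PORT"]}"`);
  }
  return { databaseUrl, embedUrl, port };
}
```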

Integration with Engram Ecosystem

engram-code is designed to work alongside Engram (memory) and share infrastructure:

Component	Port	Purpose
engram	3001	Agent memory (facts, preferences, events)
engram-code	3002	Code search (classes, methods, components)
engram-embed	8080	Local embeddings (shared by both)
engram-dashboard	3000	Web UI (visualizes both)

engram and engram-code use the same embedding server, so their vector representations are consistent.

Example Queries

# Security-focused
"where is CRUD/FLS checked"
"classes using without sharing"
"DML operations without security"
"SOQL injection vulnerabilities"

# Architecture
"trigger handlers"
"service layer methods"
"batch job implementations"
"API integration methods"

# LWC
"wire service usage"
"components with API properties"
"event handling methods"

# General
"authentication and authorization"
"error handling patterns"
"utility functions"
"test classes"

Performance

On an M2 MacBook Pro:

Operation	Time
Ingest 100 Apex files	~5s
Generate embeddings (bge-base)	~10ms/chunk
Single-model search	~40ms
Ensemble search (3 models)	~80ms

License

MIT

Let AI agents understand your code.
