feat: SQLiteEngine — zero-cost local brain (no Supabase needed) by lmanchu · Pull Request #19 · garrytan/gbrain

lmanchu · 2026-04-10T08:57:26Z

Summary

Implements the SQLiteEngine as documented in docs/SQLITE_ENGINE.md — a zero-cost, zero-dependency alternative to PostgresEngine using bun:sqlite.

One file. No server. No subscription. Works offline.

What's implemented

Full BrainEngine interface (all 30 methods):

Pages CRUD with upsert, slug validation, content hashing
FTS5 keyword search with BM25 ranking (porter + unicode61 tokenizer), auto-sync triggers
Content chunks with BLOB embedding storage (vec0/sqlite-vss ready)
Links + graph traversal via recursive CTE (depth configurable)
Tags, timeline, raw data, page versions: all at parity with PostgresEngine
Stats + health — embed coverage, stale pages, orphans, dead links
Ingest log, config KV, slug resolution (LIKE fallback for fuzzy matching)
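As an illustration of the BLOB embedding storage mentioned above, here is a minimal sketch of packing a float32 vector into bytes and reading it back. The helper names `embeddingToBlob` and `blobToEmbedding` are hypothetical and not taken from the PR; the sketch assumes embeddings are stored as raw float32 bytes in native byte order.

```typescript
// Hypothetical helpers (names are illustrative, not from the PR).
// Assumption: embeddings are stored as raw float32 bytes in native byte order.

/** Copy the vector's raw float32 bytes into a standalone Uint8Array. */
function embeddingToBlob(vec: Float32Array): Uint8Array {
  // slice() copies, so the result does not alias the caller's buffer.
  return new Uint8Array(
    vec.buffer.slice(vec.byteOffset, vec.byteOffset + vec.byteLength),
  );
}

/** Rebuild a Float32Array from BLOB bytes read back from the database. */
function blobToEmbedding(blob: Uint8Array): Float32Array {
  if (blob.byteLength % 4 !== 0) {
    throw new Error(`BLOB length ${blob.byteLength} is not a multiple of 4`);
  }
  // Copy first: a Float32Array view needs a 4-byte-aligned offset,
  // which a driver-returned buffer does not guarantee.
  const copy = blob.slice();
  return new Float32Array(copy.buffer, copy.byteOffset, copy.byteLength / 4);
}
```

A 1536-dimension embedding occupies 6144 bytes in this layout, and the round trip is lossless because no numeric conversion takes place.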

What works without any API keys

Everything except vector search and embeddings. FTS5+BM25 keyword search is genuinely good for structured wiki content.

What needs optional setup

Vector search: searchVector() returns [] until vec0/sqlite-vss extension is loaded. Graceful degradation — hybrid search falls back to keyword-only.
Embeddings: needs OpenAI API key (same as PostgresEngine)
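The graceful degradation described above can be sketched roughly as follows. The `SearchResult` and `SearchBackend` shapes are assumptions for illustration, and the reciprocal rank fusion step is a stand-in merge strategy; the PR does not state how the real hybrid search combines the two result lists.

```typescript
// Hypothetical shapes; the real BrainEngine types may differ.
interface SearchResult {
  slug: string;
  score: number;
}

interface SearchBackend {
  searchKeyword(query: string, limit: number): Promise<SearchResult[]>;
  searchVector(embedding: Float32Array, limit: number): Promise<SearchResult[]>;
}

/** Hybrid search that degrades to keyword-only when no vector hits exist. */
async function hybridSearch(
  backend: SearchBackend,
  query: string,
  embedding: Float32Array | null,
  limit = 10,
): Promise<SearchResult[]> {
  const keyword = await backend.searchKeyword(query, limit);
  // searchVector() returning [] (no extension loaded) is treated like "no embedding".
  const vector = embedding ? await backend.searchVector(embedding, limit) : [];
  if (vector.length === 0) return keyword.slice(0, limit);

  // Assumed merge strategy: reciprocal rank fusion with the common constant 60.
  const K = 60;
  const fused = new Map<string, number>();
  for (const list of [keyword, vector]) {
    list.forEach((hit, rank) => {
      fused.set(hit.slug, (fused.get(hit.slug) ?? 0) + 1 / (K + rank + 1));
    });
  }
  return Array.from(fused, ([slug, score]) => ({ slug, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```

With this shape the caller never needs to know whether a vector extension is loaded: an empty vector result simply returns the keyword hits in their original BM25 order.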

Design choices

bun:sqlite — zero npm dependencies, as suggested in SQLITE_ENGINE.md
WAL mode — concurrent reads while writing
Schema exactly matches SQLITE_ENGINE.md spec (all tables, indexes, triggers)
Same helper pattern as PostgresEngine (rowToPage, rowToChunk, rowToSearchResult)
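To make the helper pattern concrete, a row mapper in this style might look like the sketch below. The `PageRow` and `Page` fields are hypothetical (the actual columns are defined in SQLITE_ENGINE.md); the point is only that structured values and timestamps stored as TEXT need explicit parsing on the way out of SQLite.

```typescript
// Hypothetical row and page shapes, for illustration only.
interface PageRow {
  id: number;
  slug: string;
  title: string;
  frontmatter: string | null; // JSON serialized as TEXT (assumed)
  updated_at: string; // ISO-8601 timestamp stored as TEXT (assumed)
}

interface Page {
  id: number;
  slug: string;
  title: string;
  frontmatter: Record<string, unknown>;
  updatedAt: Date;
}

/** Convert a raw SQLite row into the engine-agnostic Page shape. */
function rowToPage(row: PageRow): Page {
  return {
    id: row.id,
    slug: row.slug,
    title: row.title,
    // A NULL or empty column becomes an empty object rather than a parse error.
    frontmatter: row.frontmatter ? JSON.parse(row.frontmatter) : {},
    updatedAt: new Date(row.updated_at),
  };
}
```

Keeping these conversions in one helper per table means query code can stay identical in shape across engines, with only the mapper differing.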

Not yet implemented (follow-up PRs)

vec0/sqlite-vss integration for searchVector()
Fuzzy slug resolution via trigram simulation (currently LIKE fallback)
Engine detection in CLI (gbrain init --sqlite)
Parameterized test suite running against both engines
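For the trigram follow-up, one possible direction is to compute trigram similarity in application code. The sketch below assumes the goal is behavior in the style of PostgreSQL's pg_trgm (lowercase, split into words, pad each word, compare trigram sets); the function names and the ASCII-only word split are simplifications introduced here, not part of the PR.

```typescript
/** Trigram set in the style of pg_trgm: lowercase, split on non-alphanumerics, pad each word. */
function trigrams(text: string): Set<string> {
  const out = new Set<string>();
  // Simplification: only ASCII letters and digits count as word characters.
  for (const word of text.toLowerCase().split(/[^a-z0-9]+/)) {
    if (word.length === 0) continue;
    const padded = `  ${word} `; // two leading spaces, one trailing
    for (let i = 0; i + 3 <= padded.length; i++) {
      out.add(padded.slice(i, i + 3));
    }
  }
  return out;
}

/** Shared trigrams divided by total distinct trigrams (0 to 1). */
function trigramSimilarity(a: string, b: string): number {
  const ta = trigrams(a);
  const tb = trigrams(b);
  if (ta.size === 0 && tb.size === 0) return 0;
  let shared = 0;
  ta.forEach((t) => {
    if (tb.has(t)) shared++;
  });
  return shared / (ta.size + tb.size - shared);
}

/** Hypothetical resolver: rank candidate slugs by similarity, best first. */
function resolveSlugFuzzy(
  input: string,
  candidates: string[],
  threshold = 0.3, // pg_trgm's default similarity threshold
): string[] {
  return candidates
    .map((slug) => ({ slug, score: trigramSimilarity(input, slug) }))
    .filter((c) => c.score >= threshold)
    .sort((x, y) => y.score - x.score)
    .map((c) => c.slug);
}
```

Scoring every candidate in memory is fine for a personal brain with a few thousand slugs; a larger corpus would likely want the LIKE query to narrow the candidate set first.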

Testing

Manually tested CRUD, FTS5 search, graph traversal, and stats against a local brain.db. The engine-agnostic test harness described in SQLITE_ENGINE.md would be the ideal next step.

Motivation

From the README: "You don't need Postgres to start." This PR makes that literally true for everything except vector search, which still awaits the vec0/sqlite-vss integration. A personal brain should be as simple as a single file.

I run a similar system (lmanwiki) with local SQLite + FTS5 for my personal AI agent stack. Happy to iterate on feedback.

Test plan

bun test passes (no regressions to PostgresEngine)
SQLiteEngine CRUD round-trip works
FTS5 search returns ranked results
Graph traversal with recursive CTE works at depth 3+
Stats/health queries return correct counts
Graceful degradation when no vector extension loaded

🤖 Generated with Claude Code

Implements the full BrainEngine interface using bun:sqlite (zero dependencies):
- Pages CRUD with FTS5 full-text search (porter + unicode61 tokenizer)
- Content chunks with BLOB embedding storage (vec0/sqlite-vss ready)
- Links, backlinks, and recursive CTE graph traversal
- Tags, timeline entries, raw data sidecar, page versioning
- Stats + health dashboard (stale pages, orphans, dead links, embed coverage)
- Ingest log, config KV store, slug resolution via LIKE fallback
- WAL mode for concurrent reads, foreign keys enforced

What works without any API keys:
- Full CRUD, keyword search (FTS5+BM25), graph traversal, all admin ops

What needs optional setup:
- Vector search: returns [] until vec0/sqlite-vss extension is loaded
- Embeddings: needs OpenAI API key (same as PostgresEngine)

Schema follows SQLITE_ENGINE.md spec exactly. Single file database, no server, no Supabase subscription, works offline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

garrytan · 2026-04-11T04:49:07Z

Thank you for this clean SQLiteEngine implementation! The interface-driven design is excellent. We're deferring the local DB architecture decision to a future release, but this work will inform that decision. The 658-line standalone engine with FTS5+BM25 is impressive work. Appreciate the contribution!

… parity (#1519)

* feat(worker-pool): shared sliding pool + bounded semaphore + PGLite-clamp wrapper

T1 + T2 of the v0.41.16.0 workers cathedral. New src/core/worker-pool.ts is the canonical primitive every --workers N bulk command in this wave (and future bulk commands) builds on. Atomic-claim invariant enforced by scripts/check-worker-pool-atomicity.sh (wired into bun run verify). BudgetExhausted bypass + AbortSignal composition baked into the helper so budget caps are a structural ceiling under concurrency, not a per-caller convention.

The new resolveWorkersWithClamp wrapper composes existing autoConcurrency with PGLite-clamp + per-(command, requested) stderr dedup. Deliberately NOT a modification to shared autoConcurrency (silent today, used by sync + import); embed.ts keeps GBRAIN_EMBED_CONCURRENCY || 20 default per codex #13.

23 + 12 + 9 = 44 hermetic tests pin every contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: structural + dim-check regression suites for v0.41.16.0 wave

- test/embed-helper-migration.test.ts (T3): asserts embed.ts's two sliding-pool sites are migrated to runSlidingPool, pre-migration shapes (let nextIdx = 0, Promise.all(Array.from(...))) are gone, GBRAIN_EMBED_CONCURRENCY || 20 default preserved, failureLabel threads page.slug. Per codex #16/#17 these are invariant assertions, not byte-equality on progress event ORDERING.
- test/embedding-dim-check-facts.test.ts (T6): readFactsEmbeddingDim covers vector(N) + halfvec(N), halfvec-before-vector regex ordering pinned (codex #19), buildFactsAlterRecipe emits DROP INDEX + ALTER USING + CREATE INDEX (codex #18, not bare REINDEX), FactsEmbeddingDimMismatchError tagged class shape, assertFactsEmbeddingDimMatchesConfig PGLite skip + Postgres absent-column skip, doctor check + insert-cast wiring assertions.
- test/extract-conversation-facts-workers.test.ts (T5): helper exports (extractConversationFactsLockId, PER_PAGE_LOCK_TTL_MINUTES), structural wiring (runSlidingPool, resolveWorkersWithClamp, withRefreshingLock, LockUnavailableError, delete-orphans-first before segment loop, preflight before pool, exit 3 when lock_skipped > 0), Minion handler round-trip.
- test/extract-workers.test.ts (T7): --workers wiring on all 3 inner fs-walk loops (extractForSlugs, extractLinksFromDir, extractTimelineFromDir) + CLI parse + opts threading through runExtractCore.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: rebump v0.41.16.0 → v0.41.17.0 (queue collision with PR #1510)

PR #1510 (garrytan/dynamic-regex-conversation-formats) claimed v0.41.16.0 on master in parallel. Advancing this wave to v0.41.17.0 so both can land cleanly.

Pure mechanical version bump:
- VERSION + package.json → 0.41.17.0
- CHANGELOG.md header + "To take advantage of v0.41.17.0" block
- TODOS.md section header + v0.41.18+ forward references
- CLAUDE.md inline version tags
- Regenerated llms-full.txt / llms.txt

No code changes. The actual workers cathedral feature set is unchanged from the two prior commits in this branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(test): search-image-column probes column dim at runtime

CI shard 5 failed on `searchVector column routing (v0.27.1)` with:

  error: expected 1280 dimensions, not 1536

The test had a hardcoded `fakeText1536` helper that seeded chunks at 1536-d vectors. Master's default embedding model switched from OpenAI text-embedding-3-large (1536) to ZeroEntropy zembed-1 (1280) so a fresh PGLite brain on CI now sizes content_chunks.embedding at 1280; the test's 1536-d INSERT trips pgvector's CheckExpectedDim.

Fix: probe `content_chunks.embedding` width via `readContentChunksEmbeddingDim(engine)` in `beforeAll`, store in `TEXT_DIM`, and build `fakeTextDefault(seed)` at that width.

The test now passes regardless of which default ships (the model has flipped twice and may flip again). Local dev (1536 from older config) and CI fresh-install (1280 from new default) both pass. Image-side vectors stay at 1024 (matches Voyage multimodal-3 + the column's fixed width on the image side).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(test): bump PGLite hook timeout for shard-4 deep-process files

facts-anti-loop.test.ts and ingest-capture.test.ts were timing out in CI shard 4 with "beforeEach/afterEach hook timed out" after the v0.41.16.0 master merge brought migration count to 99. When these files run deep in a shard process that has already created ~20 PGLite engines, the WASM cold-start + 95-migration replay legitimately exceeds bun's 5s default hook timeout (observed 5.6s and 7.3s locally when reproducing).

Bun's --timeout=60000 from scripts/test-shard.sh covers TEST timeouts but NOT hook timeouts; those default to 5s and must be set per-hook via the optional 2nd arg to beforeAll/afterAll.

Reproduced locally by running the first 21 shard-4 files via

  head -21 /tmp/shard4-list.txt | xargs bun test
  → 179 pass, 2 fail (both with hook-timeout error)

After fix:

  → 198 pass, 0 fail (the 4 anti-loop + 15 ingest-capture tests recover)

Full shard 4 with fix: 955 pass, 0 fail.
Full shard 5 with fix: 1261 pass, 0 fail.

Also added a defensive diagnostic to the two put_page tests: if facts_backstop is missing in the response payload, throw with the full payload + isError so future failures surface the actual handler error instead of a bare "expected {...} got undefined" assertion. No-op when the test passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

garrytan closed this Apr 11, 2026

garrytan mentioned this pull request May 27, 2026

v0.41.17.0 feat: --workers N on every bulk command + facts dim doctor parity #1519

Merged

7 tasks

lmanchu wants to merge 1 commit into garrytan:master from lmanchu:feat/sqlite-engine
