
feat: SQLiteEngine — zero-cost local brain (no Supabase needed) #19

Closed
lmanchu wants to merge 1 commit into garrytan:master from lmanchu:feat/sqlite-engine

@lmanchu lmanchu commented Apr 10, 2026

Summary

Implements the SQLiteEngine as documented in docs/SQLITE_ENGINE.md — a zero-cost, zero-dependency alternative to PostgresEngine using bun:sqlite.

One file. No server. No subscription. Works offline.

What's implemented

Full BrainEngine interface (all 30 methods):

  • Pages CRUD with upsert, slug validation, content hashing
  • FTS5 keyword search with BM25 ranking (porter + unicode61 tokenizer), auto-sync triggers
  • Content chunks with BLOB embedding storage (vec0/sqlite-vss ready)
  • Links + graph traversal via recursive CTE (depth configurable)
  • Tags, timeline, raw data, page versions: all at parity with PostgresEngine
  • Stats + health — embed coverage, stale pages, orphans, dead links
  • Ingest log, config KV, slug resolution (LIKE fallback for fuzzy matching)
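
The BLOB embedding storage listed above can be pictured as a round trip between a `Float32Array` and raw bytes. This is a minimal sketch rather than the PR's code: `embeddingToBlob` and `blobToEmbedding` are hypothetical names, and the packed float32 layout in platform byte order is an assumption.

```typescript
// Hypothetical helpers for illustration; not taken from the PR.
// Assumption: an embedding is stored as packed float32 values (4 bytes each).

function embeddingToBlob(embedding: Float32Array): Uint8Array {
  // Copy first so the returned bytes never alias the caller's buffer.
  const copy = new Float32Array(embedding);
  return new Uint8Array(copy.buffer);
}

function blobToEmbedding(blob: Uint8Array): Float32Array {
  if (blob.byteLength % 4 !== 0) {
    throw new Error(`blob length ${blob.byteLength} is not a multiple of 4`);
  }
  // slice() yields a fresh buffer starting at offset 0, which keeps the
  // Float32Array view 4-byte aligned even if `blob` was an offset view.
  const aligned = blob.slice();
  return new Float32Array(aligned.buffer, 0, aligned.byteLength / 4);
}
```

Storing the raw bytes keeps the column usable by a vector extension later, as long as that extension expects the same float32 layout.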

What works without any API keys

Everything except vector search and embeddings. FTS5+BM25 keyword search is genuinely good for structured wiki content.

What needs optional setup

  • Vector search: searchVector() returns [] until vec0/sqlite-vss extension is loaded. Graceful degradation — hybrid search falls back to keyword-only.
  • Embeddings: needs OpenAI API key (same as PostgresEngine)
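
The graceful degradation described above can be sketched as follows. The `Hit` and `SearchableEngine` shapes and the `hybridSearch` function are hypothetical and may not match the real BrainEngine interface. The merge step uses reciprocal rank fusion purely as an illustrative choice; the PR does not say how results are combined.

```typescript
// Hypothetical shapes for illustration; the real BrainEngine types may differ.
interface Hit {
  slug: string;
  score: number;
}

interface SearchableEngine {
  searchKeyword(query: string): Promise<Hit[]>;
  searchVector(embedding: Float32Array): Promise<Hit[]>;
}

async function hybridSearch(
  engine: SearchableEngine,
  query: string,
  embedding: Float32Array | null,
): Promise<Hit[]> {
  const keyword = await engine.searchKeyword(query);
  // With no embedding, or no vector extension loaded, this list is empty.
  const vector = embedding ? await engine.searchVector(embedding) : [];

  // Graceful degradation: nothing to fuse, so return keyword results as-is.
  if (vector.length === 0) return keyword;

  // Assumed merge strategy: reciprocal rank fusion with the usual k = 60.
  const K = 60;
  const fused = new Map<string, number>();
  for (const list of [keyword, vector]) {
    list.forEach((hit, rank) => {
      fused.set(hit.slug, (fused.get(hit.slug) ?? 0) + 1 / (K + rank + 1));
    });
  }
  return Array.from(fused, ([slug, score]) => ({ slug, score })).sort(
    (a, b) => b.score - a.score,
  );
}
```

Because the fallback keys off an empty vector result, callers do not need to know whether an extension is loaded.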

Design choices

  • bun:sqlite — zero npm dependencies, as suggested in SQLITE_ENGINE.md
  • WAL mode — concurrent reads while writing
  • Schema exactly matches SQLITE_ENGINE.md spec (all tables, indexes, triggers)
  • Same helper pattern as PostgresEngine (rowToPage, rowToChunk, rowToSearchResult)

Not yet implemented (follow-up PRs)

  • vec0/sqlite-vss integration for searchVector()
  • Fuzzy slug resolution via trigram simulation (currently LIKE fallback)
  • Engine detection in CLI (gbrain init --sqlite)
  • Parameterized test suite running against both engines

Testing

Manually tested CRUD, FTS5 search, graph traversal, and stats against a local brain.db. The engine-agnostic test harness described in SQLITE_ENGINE.md would be the ideal next step.

Motivation

From the README: "You don't need Postgres to start." This PR makes that literally true for everything except vector search, which still needs an extension. A personal brain should be as simple as a single file.

I run a similar system (lmanwiki) with local SQLite + FTS5 for my personal AI agent stack. Happy to iterate on feedback.

Test plan

  • bun test passes (no regressions to PostgresEngine)
  • SQLiteEngine CRUD round-trip works
  • FTS5 search returns ranked results
  • Graph traversal with recursive CTE works at depth 3+
  • Stats/health queries return correct counts
  • Graceful degradation when no vector extension loaded

🤖 Generated with Claude Code

Implements the full BrainEngine interface using bun:sqlite (zero dependencies):

- Pages CRUD with FTS5 full-text search (porter + unicode61 tokenizer)
- Content chunks with BLOB embedding storage (vec0/sqlite-vss ready)
- Links, backlinks, and recursive CTE graph traversal
- Tags, timeline entries, raw data sidecar, page versioning
- Stats + health dashboard (stale pages, orphans, dead links, embed coverage)
- Ingest log, config KV store, slug resolution via LIKE fallback
- WAL mode for concurrent reads, foreign keys enforced

What works without any API keys:
- Full CRUD, keyword search (FTS5+BM25), graph traversal, all admin ops

What needs optional setup:
- Vector search: returns [] until vec0/sqlite-vss extension is loaded
- Embeddings: needs OpenAI API key (same as PostgresEngine)

Schema follows SQLITE_ENGINE.md spec exactly. Single file database,
no server, no Supabase subscription, works offline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@garrytan

Owner

Thank you for this clean SQLiteEngine implementation! The interface-driven design is excellent. We're deferring the local DB architecture decision to a future release, but this work will inform that decision. The 658-line standalone engine with FTS5+BM25 is impressive work. Appreciate the contribution!

@garrytan garrytan closed this Apr 11, 2026
garrytan added a commit that referenced this pull request May 27, 2026
… parity (#1519)

* feat(worker-pool): shared sliding pool + bounded semaphore + PGLite-clamp wrapper

T1 + T2 of the v0.41.16.0 workers cathedral. New src/core/worker-pool.ts is
the canonical primitive every --workers N bulk command in this wave (and
future bulk commands) builds on. Atomic-claim invariant enforced by
scripts/check-worker-pool-atomicity.sh (wired into bun run verify).
BudgetExhausted bypass + AbortSignal composition baked into the helper so
budget caps are a structural ceiling under concurrency, not a per-caller
convention.
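
As a rough illustration of the sliding-pool idea and the atomic-claim invariant, a minimal sketch might look like the following. `slidingPoolSketch` is a hypothetical name and signature, not the API of src/core/worker-pool.ts, and the BudgetExhausted bypass is not modeled here.

```typescript
// Hypothetical sketch; not the actual runSlidingPool implementation.
async function slidingPoolSketch<T, R>(
  items: readonly T[],
  workers: number,
  task: (item: T, index: number) => Promise<R>,
  signal?: AbortSignal,
): Promise<(R | undefined)[]> {
  const results: (R | undefined)[] = new Array(items.length).fill(undefined);
  let next = 0;

  const worker = async (): Promise<void> => {
    // Stop claiming new items once the caller aborts.
    while (!signal?.aborted) {
      // Atomic claim: there is no await between reading and incrementing
      // `next`, so two workers can never take the same index.
      const index = next++;
      if (index >= items.length) return;
      results[index] = await task(items[index] as T, index);
    }
  };

  // Never start more workers than items, and always start at least one.
  const count = Math.max(1, Math.min(workers, items.length));
  await Promise.all(Array.from({ length: count }, () => worker()));
  return results;
}
```

Each worker pulls the next index as soon as it finishes its current item, so a slow item delays only its own worker. In this sketch a rejected task rejects the whole pool, and items skipped after an abort stay `undefined`.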

The new resolveWorkersWithClamp wrapper composes existing autoConcurrency
with PGLite-clamp + per-(command, requested) stderr dedup. Deliberately
NOT a modification to shared autoConcurrency (silent today, used by sync
+ import); embed.ts keeps GBRAIN_EMBED_CONCURRENCY || 20 default per
codex #13.

23 + 12 + 9 = 44 hermetic tests pin every contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: structural + dim-check regression suites for v0.41.16.0 wave

- test/embed-helper-migration.test.ts (T3): asserts embed.ts's two
  sliding-pool sites are migrated to runSlidingPool, pre-migration
  shapes (let nextIdx = 0, Promise.all(Array.from(...))) are gone,
  GBRAIN_EMBED_CONCURRENCY || 20 default preserved, failureLabel
  threads page.slug. Per codex #16/#17 these are invariant assertions,
  not byte-equality on progress event ORDERING.
- test/embedding-dim-check-facts.test.ts (T6): readFactsEmbeddingDim
  covers vector(N) + halfvec(N), halfvec-before-vector regex ordering
  pinned (codex #19), buildFactsAlterRecipe emits DROP INDEX + ALTER
  USING + CREATE INDEX (codex #18, not bare REINDEX),
  FactsEmbeddingDimMismatchError tagged class shape,
  assertFactsEmbeddingDimMatchesConfig PGLite skip + Postgres absent-
  column skip, doctor check + insert-cast wiring assertions.
- test/extract-conversation-facts-workers.test.ts (T5): helper
  exports (extractConversationFactsLockId, PER_PAGE_LOCK_TTL_MINUTES),
  structural wiring (runSlidingPool, resolveWorkersWithClamp,
  withRefreshingLock, LockUnavailableError, delete-orphans-first
  before segment loop, preflight before pool, exit 3 when lock_skipped
  > 0), Minion handler round-trip.
- test/extract-workers.test.ts (T7): --workers wiring on all 3 inner
  fs-walk loops (extractForSlugs, extractLinksFromDir,
  extractTimelineFromDir) + CLI parse + opts threading through
  runExtractCore.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: rebump v0.41.16.0 → v0.41.17.0 (queue collision with PR #1510)

PR #1510 (garrytan/dynamic-regex-conversation-formats) claimed v0.41.16.0
on master in parallel. Advancing this wave to v0.41.17.0 so both can land
cleanly. Pure mechanical version bump:

- VERSION + package.json → 0.41.17.0
- CHANGELOG.md header + "To take advantage of v0.41.17.0" block
- TODOS.md section header + v0.41.18+ forward references
- CLAUDE.md inline version tags
- Regenerated llms-full.txt / llms.txt

No code changes. The actual workers cathedral feature set is unchanged
from the two prior commits in this branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(test): search-image-column probes column dim at runtime

CI shard 5 failed on `searchVector column routing (v0.27.1)` with:
  error: expected 1280 dimensions, not 1536

The test had a hardcoded `fakeText1536` helper that seeded chunks at
1536-d vectors. Master's default embedding model switched from OpenAI
text-embedding-3-large (1536) to ZeroEntropy zembed-1 (1280) so a fresh
PGLite brain on CI now sizes content_chunks.embedding at 1280; the
test's 1536-d INSERT trips pgvector's CheckExpectedDim.

Fix: probe `content_chunks.embedding` width via
`readContentChunksEmbeddingDim(engine)` in `beforeAll`, store in
`TEXT_DIM`, and build `fakeTextDefault(seed)` at that width. The test
now passes regardless of which default ships (the model has flipped
twice and may flip again). Local dev (1536 from older config) and CI
fresh-install (1280 from new default) both pass.
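
The shape of this fix, building fixtures at whatever width the column reports instead of a hardcoded 1536, can be sketched as below. `fakeVector` and `toVectorLiteral` are hypothetical stand-ins for the test's helpers. The dimension is passed in as a plain number here, whereas the real test probes it through `readContentChunksEmbeddingDim(engine)`.

```typescript
// Hypothetical stand-ins for the test helpers; names are illustrative only.

// Build a deterministic unit vector of the requested width.
function fakeVector(seed: number, dim: number): Float32Array {
  if (!Number.isInteger(dim) || dim <= 0) {
    throw new Error(`invalid dimension: ${dim}`);
  }
  const v = new Float32Array(dim);
  // One-hot at a seed-derived position; the double modulo keeps
  // negative seeds inside [0, dim).
  v[((seed % dim) + dim) % dim] = 1;
  return v;
}

// pgvector accepts vectors as text literals of the form '[1,0,0]'.
function toVectorLiteral(v: Float32Array): string {
  return `[${Array.from(v).join(",")}]`;
}

// Usage: the width would come from the runtime probe, not a constant.
const probedDim = 1280; // assumption: value returned by the column probe
const fixture = fakeVector(7, probedDim);
```

Because the width is a parameter, the same fixture builder works for a 1536-wide local brain and a 1280-wide fresh install.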

Image-side vectors stay at 1024 (matches Voyage multimodal-3 + the
column's fixed width on the image side).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(test): bump PGLite hook timeout for shard-4 deep-process files

facts-anti-loop.test.ts and ingest-capture.test.ts were timing out in CI
shard 4 with "beforeEach/afterEach hook timed out" after the v0.41.16.0
master merge brought migration count to 99. When these files run deep in
a shard process that has already created ~20 PGLite engines, the WASM
cold-start + 95-migration replay legitimately exceeds bun's 5s default
hook timeout (observed 5.6s and 7.3s locally when reproducing).

Bun's --timeout=60000 from scripts/test-shard.sh covers TEST timeouts
but NOT hook timeouts; those default to 5s and must be set per-hook via
the optional 2nd arg to beforeAll/afterAll.

Reproduced locally by running the first 21 shard-4 files via
  head -21 /tmp/shard4-list.txt | xargs bun test
  → 179 pass, 2 fail (both with hook-timeout error)

After fix:
  → 198 pass, 0 fail (the 4 anti-loop + 15 ingest-capture tests recover)

Full shard 4 with fix:  955 pass, 0 fail.
Full shard 5 with fix:  1261 pass, 0 fail.

Also added a defensive diagnostic to the two put_page tests: if
facts_backstop is missing in the response payload, throw with the full
payload + isError so future failures surface the actual handler error
instead of a bare "expected {...} got undefined" assertion. No-op when
the test passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>