feat: SQLiteEngine — zero-cost local brain (no Supabase needed) #19
Closed
lmanchu wants to merge 1 commit into
Implements the full BrainEngine interface using bun:sqlite (zero dependencies):
- Pages CRUD with FTS5 full-text search (porter + unicode61 tokenizer)
- Content chunks with BLOB embedding storage (vec0/sqlite-vss ready)
- Links, backlinks, and recursive CTE graph traversal
- Tags, timeline entries, raw data sidecar, page versioning
- Stats + health dashboard (stale pages, orphans, dead links, embed coverage)
- Ingest log, config KV store, slug resolution via LIKE fallback
- WAL mode for concurrent reads, foreign keys enforced

What works without any API keys:
- Full CRUD, keyword search (FTS5+BM25), graph traversal, all admin ops

What needs optional setup:
- Vector search: returns [] until vec0/sqlite-vss extension is loaded
- Embeddings: needs OpenAI API key (same as PostgresEngine)

Schema follows SQLITE_ENGINE.md spec exactly. Single file database, no server, no Supabase subscription, works offline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
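The "BLOB embedding storage" item above can be pictured with a small round-trip sketch. This is an illustration only, assuming embeddings are stored as raw platform-native float32 bytes; the PR does not show its encoding, and the helper names `embeddingToBlob` and `blobToEmbedding` are hypothetical.

```typescript
// Hypothetical helpers (not taken from the PR): pack an embedding into the
// bytes a SQLite BLOB column could hold, and unpack it again.

function embeddingToBlob(embedding: number[]): Uint8Array {
  // Float32Array.from allocates a fresh buffer of exactly 4 * length bytes.
  const floats = Float32Array.from(embedding);
  return new Uint8Array(floats.buffer);
}

function blobToEmbedding(blob: Uint8Array): Float32Array {
  if (blob.byteLength % 4 !== 0) {
    throw new Error(`BLOB length ${blob.byteLength} is not a multiple of 4`);
  }
  // Copy into a fresh buffer first: a byte view handed back by a database
  // driver may start at an offset that is not 4-byte aligned, and a
  // Float32Array cannot be created over a misaligned offset.
  const copy = new Uint8Array(blob.byteLength);
  copy.set(blob);
  return new Float32Array(copy.buffer);
}
```

Storing float32 rather than float64 halves the size of each row, at the cost of rounding values that are not exactly representable in 32 bits.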
Owner
Thank you for this clean SQLiteEngine implementation! The interface-driven design is excellent. We're deferring the local DB architecture decision to a future release, but this work will inform that decision. The 658-line standalone engine with FTS5+BM25 is impressive work. Appreciate the contribution!
garrytan added a commit that referenced this pull request May 27, 2026
… parity (#1519)

* feat(worker-pool): shared sliding pool + bounded semaphore + PGLite-clamp wrapper

T1 + T2 of the v0.41.16.0 workers cathedral. New src/core/worker-pool.ts is the canonical primitive every --workers N bulk command in this wave (and future bulk commands) builds on. Atomic-claim invariant enforced by scripts/check-worker-pool-atomicity.sh (wired into bun run verify). BudgetExhausted bypass + AbortSignal composition baked into the helper so budget caps are a structural ceiling under concurrency, not a per-caller convention.

The new resolveWorkersWithClamp wrapper composes existing autoConcurrency with PGLite-clamp + per-(command, requested) stderr dedup. Deliberately NOT a modification to shared autoConcurrency (silent today, used by sync + import); embed.ts keeps GBRAIN_EMBED_CONCURRENCY || 20 default per codex #13.

23 + 12 + 9 = 44 hermetic tests pin every contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: structural + dim-check regression suites for v0.41.16.0 wave

- test/embed-helper-migration.test.ts (T3): asserts embed.ts's two sliding-pool sites are migrated to runSlidingPool, pre-migration shapes (let nextIdx = 0, Promise.all(Array.from(...))) are gone, GBRAIN_EMBED_CONCURRENCY || 20 default preserved, failureLabel threads page.slug. Per codex #16/#17 these are invariant assertions, not byte-equality on progress event ORDERING.
- test/embedding-dim-check-facts.test.ts (T6): readFactsEmbeddingDim covers vector(N) + halfvec(N), halfvec-before-vector regex ordering pinned (codex #19), buildFactsAlterRecipe emits DROP INDEX + ALTER USING + CREATE INDEX (codex #18, not bare REINDEX), FactsEmbeddingDimMismatchError tagged class shape, assertFactsEmbeddingDimMatchesConfig PGLite skip + Postgres absent-column skip, doctor check + insert-cast wiring assertions.
- test/extract-conversation-facts-workers.test.ts (T5): helper exports (extractConversationFactsLockId, PER_PAGE_LOCK_TTL_MINUTES), structural wiring (runSlidingPool, resolveWorkersWithClamp, withRefreshingLock, LockUnavailableError, delete-orphans-first before segment loop, preflight before pool, exit 3 when lock_skipped > 0), Minion handler round-trip.
- test/extract-workers.test.ts (T7): --workers wiring on all 3 inner fs-walk loops (extractForSlugs, extractLinksFromDir, extractTimelineFromDir) + CLI parse + opts threading through runExtractCore.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: rebump v0.41.16.0 → v0.41.17.0 (queue collision with PR #1510)

PR #1510 (garrytan/dynamic-regex-conversation-formats) claimed v0.41.16.0 on master in parallel. Advancing this wave to v0.41.17.0 so both can land cleanly. Pure mechanical version bump:
- VERSION + package.json → 0.41.17.0
- CHANGELOG.md header + "To take advantage of v0.41.17.0" block
- TODOS.md section header + v0.41.18+ forward references
- CLAUDE.md inline version tags
- Regenerated llms-full.txt / llms.txt

No code changes. The actual workers cathedral feature set is unchanged from the two prior commits in this branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(test): search-image-column probes column dim at runtime

CI shard 5 failed on `searchVector column routing (v0.27.1)` with:

error: expected 1280 dimensions, not 1536

The test had a hardcoded `fakeText1536` helper that seeded chunks at 1536-d vectors. Master's default embedding model switched from OpenAI text-embedding-3-large (1536) to ZeroEntropy zembed-1 (1280) so a fresh PGLite brain on CI now sizes content_chunks.embedding at 1280; the test's 1536-d INSERT trips pgvector's CheckExpectedDim.

Fix: probe `content_chunks.embedding` width via `readContentChunksEmbeddingDim(engine)` in `beforeAll`, store in `TEXT_DIM`, and build `fakeTextDefault(seed)` at that width. The test now passes regardless of which default ships (the model has flipped twice and may flip again). Local dev (1536 from older config) and CI fresh-install (1280 from new default) both pass.

Image-side vectors stay at 1024 (matches Voyage multimodal-3 + the column's fixed width on the image side).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(test): bump PGLite hook timeout for shard-4 deep-process files

facts-anti-loop.test.ts and ingest-capture.test.ts were timing out in CI shard 4 with "beforeEach/afterEach hook timed out" after the v0.41.16.0 master merge brought migration count to 99. When these files run deep in a shard process that has already created ~20 PGLite engines, the WASM cold-start + 95-migration replay legitimately exceeds bun's 5s default hook timeout (observed 5.6s and 7.3s locally when reproducing).

Bun's --timeout=60000 from scripts/test-shard.sh covers TEST timeouts but NOT hook timeouts; those default to 5s and must be set per-hook via the optional 2nd arg to beforeAll/afterAll.

Reproduced locally by running the first 21 shard-4 files via
head -21 /tmp/shard4-list.txt | xargs bun test
→ 179 pass, 2 fail (both with hook-timeout error)

After fix:
→ 198 pass, 0 fail (the 4 anti-loop + 15 ingest-capture tests recover)

Full shard 4 with fix: 955 pass, 0 fail.
Full shard 5 with fix: 1261 pass, 0 fail.

Also added a defensive diagnostic to the two put_page tests: if facts_backstop is missing in the response payload, throw with the full payload + isError so future failures surface the actual handler error instead of a bare "expected {...} got undefined" assertion. No-op when the test passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Implements the SQLiteEngine as documented in docs/SQLITE_ENGINE.md, a zero-cost, zero-dependency alternative to PostgresEngine using bun:sqlite.
One file. No server. No subscription. Works offline.
What's implemented
Full BrainEngine interface (all 30 methods).
What works without any API keys
Everything except vector search and embeddings. FTS5+BM25 keyword search is genuinely good for structured wiki content.
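The keyword-only behaviour can be sketched as a merge step that hands the keyword results back untouched whenever vector search yields nothing, which is how the PR describes hybrid search degrading. The `RankedHit` shape, the function names, and the use of reciprocal rank fusion (with the conventional constant 60) are all assumptions made for illustration; the PR does not say how the engine fuses results.

```typescript
// Hypothetical result shape; the real search result type may differ.
interface RankedHit {
  slug: string;
  score: number;
}

// Assumed fusion strategy: reciprocal rank fusion. Each list contributes
// 1 / (k + rank) per hit, with rank starting at 1.
function fuseByReciprocalRank(lists: RankedHit[][], k: number = 60): RankedHit[] {
  const fused = new Map<string, number>();
  for (const list of lists) {
    list.forEach((hit, index) => {
      const previous = fused.get(hit.slug) ?? 0;
      fused.set(hit.slug, previous + 1 / (k + index + 1));
    });
  }
  return Array.from(fused, ([slug, score]) => ({ slug, score })).sort(
    (a, b) => b.score - a.score
  );
}

function combineHits(keywordHits: RankedHit[], vectorHits: RankedHit[]): RankedHit[] {
  // No vector results (for example, no vector extension loaded):
  // degrade to keyword-only and keep the keyword ranking as-is.
  if (vectorHits.length === 0) {
    return keywordHits;
  }
  return fuseByReciprocalRank([keywordHits, vectorHits]);
}
```

With an empty vector list the caller sees exactly the FTS5 ranking, so loading a vector extension later changes ranking quality without changing the call site.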
What needs optional setup
- searchVector() returns [] until the vec0/sqlite-vss extension is loaded. Graceful degradation: hybrid search falls back to keyword-only.
Design choices
- bun:sqlite: zero npm dependencies, as suggested in SQLITE_ENGINE.md
- … rowToPage, rowToChunk, rowToSearchResult)
Not yet implemented (follow-up PRs)
- searchVector() …
- … gbrain init --sqlite)
Testing
Manually tested CRUD, FTS5 search, graph traversal, and stats against a local brain.db. The engine-agnostic test harness described in SQLITE_ENGINE.md would be the ideal next step.
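An engine-agnostic harness of the kind mentioned above could take the shape of a contract function that receives an engine factory and runs identical checks against whatever it builds. The sketch below is a guess at that shape, not the harness described in SQLITE_ENGINE.md: `MiniEngine` is a deliberately tiny, synchronous, hypothetical stand-in for the real BrainEngine interface, and the in-memory engine exists only so the example runs without a database.

```typescript
// Hypothetical three-method slice of an engine interface, for illustration.
interface MiniEngine {
  putPage(slug: string, body: string): void;
  getPage(slug: string): string | undefined;
  deletePage(slug: string): void;
}

type EngineFactory = () => MiniEngine;

// Runs the same CRUD checks against any engine and returns failure messages.
function runCrudContract(engineName: string, makeEngine: EngineFactory): string[] {
  const failures: string[] = [];
  const check = (label: string, ok: boolean): void => {
    if (!ok) failures.push(`${engineName}: ${label}`);
  };

  const engine = makeEngine();
  engine.putPage("people/ada", "v1");
  check("reads back a written page", engine.getPage("people/ada") === "v1");
  engine.putPage("people/ada", "v2");
  check("second put overwrites", engine.getPage("people/ada") === "v2");
  engine.deletePage("people/ada");
  check("delete removes the page", engine.getPage("people/ada") === undefined);
  return failures;
}

// In-memory stand-in; a SQLite- or Postgres-backed factory would plug in here.
function makeMemoryEngine(): MiniEngine {
  const pages = new Map<string, string>();
  return {
    putPage: (slug, body) => {
      pages.set(slug, body);
    },
    getPage: (slug) => pages.get(slug),
    deletePage: (slug) => {
      pages.delete(slug);
    },
  };
}
```

Calling `runCrudContract("memory", makeMemoryEngine)` returns an empty array when every check passes. Passing a second factory runs the same checks against another backend, so any behavioural difference between engines shows up as a named failure.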
Motivation
From the README: "You don't need Postgres to start." This PR makes that literally true for the full feature set. A personal brain should be as simple as a single file.
I run a similar system (lmanwiki) with local SQLite + FTS5 for my personal AI agent stack. Happy to iterate on feedback.
Test plan
- bun test passes (no regressions to PostgresEngine)
🤖 Generated with Claude Code