v0.38.0.0 ingestion cathedral — gbrain capture + write-through + IngestionSource contract#1275
Merged
…+ 2 sources
The foundation for the ingestion cathedral (CEO+DX+Eng plan-reviewed).
Plan: ~/.claude/plans/system-instruction-you-are-working-ethereal-riddle.md
WHAT YOU CAN NOW DO
The IngestionSource public contract is locked. Skillpack publishers can
build third-party ingestion sources (Granola, Linear, Mail, voice, OCR,
etc.) and ship them through the v0.37 skillpack registry. The locked
surface lives at the new package subpaths:
import { IngestionSource, IngestionEvent } from 'gbrain/ingestion';
import { IngestionTestHarness, expectEvent } from 'gbrain/ingestion/test-harness';
Both subpaths are pinned by test/public-exports.test.ts — breaking either
is a major-version change.
WHAT THIS COMMIT BUILDS
Foundation:
- src/core/ingestion/types.ts (IngestionSource, IngestionEvent,
IngestionSourceContext, validateIngestionEvent, computeContentHash,
INGESTION_SOURCE_API_VERSION, INGESTION_CONTENT_TYPES)
- src/core/ingestion/dedup.ts (24h content-hash LRU, 5000-entry cap)
- src/core/ingestion/skillpack-load.ts (gbrain.plugin.json discovery for
third-party sources, api_version compat with paste-ready upgrade hints,
in-process trust model for v1)
- src/core/ingestion/daemon.ts (IngestionDaemon: in-process source
supervision sibling to v0.34.3.0 ChildWorkerSupervisor pattern, plus
validate -> dedup -> rate-limit -> dispatch pipeline + health surface)
- src/core/ingestion/test-harness.ts (publisher-facing test utility with
fake clock + in-memory event bus + expectEvent matchers + engine proxy
that throws on access so publishers know what they're depending on)
- src/core/ingestion/index.ts (barrel for gbrain/ingestion subpath)
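The dedup layer above (24h content-hash window, 5000-entry LRU cap) can be sketched roughly as follows. This is a minimal illustration, not the code in src/core/ingestion/dedup.ts: the class name `ContentHashDedup`, its method names, and the injectable clock are all hypothetical.

```typescript
// Hypothetical sketch of a TTL + LRU content-hash dedup; names are illustrative only.
class ContentHashDedup {
  // Map keeps insertion order, so the first key is always the least recently used.
  private readonly seen = new Map<string, number>();
  private readonly ttlMs: number;
  private readonly maxEntries: number;
  private readonly now: () => number;

  constructor(
    ttlMs: number = 24 * 60 * 60 * 1000, // 24h window
    maxEntries: number = 5000,           // LRU cap
    now: () => number = () => Date.now(),
  ) {
    this.ttlMs = ttlMs;
    this.maxEntries = maxEntries;
    this.now = now;
  }

  /** Returns true when the hash was first seen inside the TTL window. */
  isDuplicate(hash: string): boolean {
    const t = this.now();
    const firstSeen = this.seen.get(hash);
    if (firstSeen !== undefined && t - firstSeen < this.ttlMs) {
      // Refresh recency but keep the original timestamp so the window does not slide.
      this.seen.delete(hash);
      this.seen.set(hash, firstSeen);
      return true;
    }
    // New or expired: record it as most recently used.
    this.seen.delete(hash);
    this.seen.set(hash, t);
    if (this.seen.size > this.maxEntries) {
      const oldest = this.seen.keys().next().value;
      if (oldest !== undefined) this.seen.delete(oldest);
    }
    return false;
  }

  get size(): number {
    return this.seen.size;
  }
}
```

Keeping the first-seen timestamp on a repeat hit is one possible design choice; refreshing it instead would turn the fixed 24h window into a sliding one.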
First two built-in sources prove the abstraction:
- file-watcher (chokidar over the brain repo; 1s debounce; honors
pruneDir from src/core/sync.ts; symlinks rejected; Linux ENOSPC
surfaces a paste-ready sysctl hint at runtime)
- inbox-folder (~/.gbrain/inbox/ target for iOS Shortcuts / AirDrop /
Drafts; auto-archives processed files into .archived/YYYY-MM-DD/;
symlink rejection; world-writable dir warning; routes content-type by
extension)
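A rough sketch of the inbox-folder behaviors listed above: routing content-type by extension, archiving into `.archived/YYYY-MM-DD/`, and skipping anything already under the archive so the watcher does not loop. The extension table, the UTC date choice, and every function name here are assumptions for illustration; the built-in source may differ.

```typescript
import * as path from "node:path";

// Hypothetical extension table; the real inbox-folder source may route differently.
const CONTENT_TYPE_BY_EXT: Record<string, string> = {
  ".md": "text/markdown",
  ".txt": "text/plain",
  ".html": "text/html",
  ".json": "application/json",
};

/** Maps a file name to a content type, or undefined when the extension is unknown. */
function contentTypeFor(fileName: string): string | undefined {
  return CONTENT_TYPE_BY_EXT[path.extname(fileName).toLowerCase()];
}

/** Builds the archive destination: <inbox>/.archived/YYYY-MM-DD/<file> (UTC date assumed). */
function archivePathFor(inboxDir: string, fileName: string, now: Date): string {
  const day = now.toISOString().slice(0, 10);
  return path.join(inboxDir, ".archived", day, path.basename(fileName));
}

/** True when the path already lives under .archived, so it should not be ingested again. */
function isInsideArchive(inboxDir: string, filePath: string): boolean {
  const first = path.relative(inboxDir, filePath).split(path.sep)[0];
  return first === ".archived";
}
```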
Public exports surface (count 18 -> 20) pinned in:
- package.json exports map
- test/public-exports.test.ts EXPECTED_EXPORTS + count gate
- scripts/check-exports-count.sh baseline
ARCHITECTURE-LOCKED DECISIONS (from /plan-eng-review)
E1 webhook source process boundary: webhook source will live INSIDE
serve --http (NOT this daemon) when it lands in the next commit. Daemon
supervises only daemon-side sources.
E2 content-type processor execution: hybrid by size (inline <1MB,
Minion handlers >1MB). Processors land in a later commit.
E3 publisher TTHW: chokidar v4.0.3 across platforms; ephemeral PGLite
persistence and Linux inotify-limit doctor probe land in later commits.
E4 migration v80 (provenance columns) + forward-reference bootstrap:
lands with put_page write-through in a later commit.
DX-locked decisions (from /plan-devex-review):
- Source error semantics: throws bubble to daemon; supervisor backoff.
- IngestionTestHarness exported as gbrain/ingestion/test-harness.
- api_version field on gbrain.plugin.json with loud-fail on mismatch.
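The loud-fail on api_version mismatch could look something like the sketch below. The manifest shape, the integer version scheme, the error class, and the hint wording are all assumptions; only the `api_version` field on gbrain.plugin.json comes from the decision above.

```typescript
// Hypothetical manifest shape; only the api_version field is taken from the PR text.
interface PluginManifestSketch {
  name: string;
  api_version?: unknown;
}

class ApiVersionMismatchError extends Error {}

/** Throws with an actionable hint unless the manifest declares exactly the host's version. */
function assertCompatibleApiVersion(manifest: PluginManifestSketch, hostVersion: number): void {
  const declared = manifest.api_version;
  if (typeof declared !== "number" || !Number.isInteger(declared)) {
    throw new ApiVersionMismatchError(
      `${manifest.name}: gbrain.plugin.json needs an integer "api_version" field`,
    );
  }
  if (declared !== hostVersion) {
    // Point the publisher or user at whichever side is behind.
    const hint = declared > hostVersion ? "upgrade gbrain" : "upgrade the skillpack";
    throw new ApiVersionMismatchError(
      `${manifest.name}: api_version ${declared} does not match host api_version ${hostVersion}; ${hint}`,
    );
  }
}
```

Failing at load time keeps a stale source from registering and then misbehaving later inside the daemon.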
TESTS
192 cases across 8 test files, 0 failures:
- test/ingestion/types.test.ts (28 cases pinning the contract)
- test/ingestion/dedup.test.ts (15 cases for LRU + TTL + collision)
- test/ingestion/skillpack-load.test.ts (22 cases for manifest
validation + api_version compat + collision policy + module load)
- test/ingestion/test-harness.test.ts (24 cases for harness lifecycle +
clock + healthCheck + every expectEvent matcher)
- test/ingestion/daemon.test.ts (19 cases for supervision + dispatch
pipeline + health surface + per-source config + logger wrapping)
- test/ingestion/sources/file-watcher.test.ts (10 cases including
ENOSPC sysctl-hint surfacing)
- test/ingestion/sources/inbox-folder.test.ts (24 cases including
symlink rejection + world-writable warning + archive-loop-prevention)
- test/public-exports.test.ts (2 new cases for the new subpaths)
typecheck clean. bun run verify gate passes.
NEXT IN WAVE
Subsequent commits in this PR ship webhook source (serve --http route),
cron-scheduler refactor + OpenClaw credential auto-migrate, content-type
processors (PDF + image OCR + audio transcribe + video keyframe), put_page
write-through with serializePageToMarkdown DRY extract, migration v80
+ bootstrap probes, gbrain capture verb, publisher DX cathedral (init
scaffold extension + gbrain ingest test [--watch] + tail + validate),
daemon rename autopilot -> ingest with forever-alias, doctor inotify
probe on Linux, skillpack contract docs + reference pack.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n handler
Lands the v0.38 ingestion cathedral's webhook source. Per the
/plan-eng-review E1 decision, the webhook source lives INSIDE `serve --http`
(NOT the ingestion daemon) so there is no new IPC: the HTTP route submits
Minion jobs directly into the existing queue, and the daemon supervises
only daemon-side sources.
WHAT YOU CAN NOW DO
With `gbrain serve --http` running and an OAuth client minted, any HTTP
caller (Zapier, IFTTT, n8n, Make, Apple Shortcuts) can POST a captured
thought into the brain:
curl -X POST https://your-brain.example.com/ingest \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: text/markdown" \
  -d "# captured from my Shortcut"
The route auths via OAuth (write scope required), validates the
content-type, enforces a 1MB payload cap and per-IP rate limit (100 events
/ 10s), submits an `ingest_capture` Minion job tagged
`untrusted_payload: true`, and returns 202 Accepted with the job id. The
job materializes the page under `inbox/YYYY-MM-DD-<hash6>` by default
(overridable via X-Gbrain-Slug header) so the user has a predictable
triage location.
WHAT THIS COMMIT BUILDS
- src/core/minions/handlers/ingest-capture.ts (new): handler that takes an
  IngestionEvent payload, resolves a slug via fallback chain
  (job.data.slug -> event.metadata.slug -> inbox/<date>-<hash6>), validates
  the event at the handler boundary, REJECTS binary content_types with a
  paste-ready hint to install a processor skillpack, and routes through
  importFromContent. Defaults noEmbed: true (embed is a separate Minion
  job, matching the sync handler's pattern).
- src/commands/jobs.ts: registers `ingest_capture` in
  registerBuiltinHandlers alongside sync/embed/extract.
- src/commands/serve-http.ts: POST /ingest route with:
  - OAuth write-scope gate via requireBearerAuth({requiredScopes:['write']})
  - 100 events / 10s rate limiter (sibling to ccRateLimiter)
  - Content-type allowlist: text/markdown, text/plain, text/html,
    application/json; binary REJECTED with HTTP 415
  - 1 MB payload cap (configurable via GBRAIN_INGEST_MAX_BYTES)
  - Caller-overridable source identity via X-Gbrain-Source-Id /
    X-Gbrain-Source-Uri / X-Gbrain-Content-Type / X-Gbrain-Slug headers,
    useful for downstream tools that want clean provenance
  - untrusted_payload: true ALWAYS (network input)
  - Idempotency on (client_id, content_hash) so simultaneous retries
    collapse to one job
  - maxWaiting: 50 per client so a runaway integration can't monopolize
    the queue
  - Audit row in mcp_request_log + SSE broadcast for the admin feed
TESTS
test/ingestion/ingest-capture.test.ts (15 cases against PGLite):
- defaultSlugForEvent helper (3 cases pinning shape + UTC + determinism)
- slug resolution fallback chain (3 cases)
- validation + content-type routing (5 cases including binary rejection +
  untrusted_payload round-trip)
- importFromContent integration (3 cases including content_hash dedup via
  status='skipped' on repeat)
207 total ingestion tests passing. typecheck clean.
NEXT IN WAVE
cron-scheduler refactor + OpenClaw credential auto-migrate; content-type
processors (PDF + image OCR + audio transcribe + video keyframe); put_page
write-through + serializePageToMarkdown DRY extract + migration v80 +
bootstrap probes; gbrain capture verb; publisher DX cathedral (init
scaffold + gbrain ingest test --watch + tail + validate); daemon rename
autopilot -> ingest with forever-alias; doctor inotify probe; skillpack
contract docs + reference pack + VERSION bump.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
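The slug fallback chain described in this commit (job.data.slug -> event.metadata.slug -> inbox/<date>-<hash6>) might be sketched as below. The event shape, the function names, and the use of SHA-256 for the six-character hash are assumptions; the real handler's helper signatures are not shown in this PR text.

```typescript
import { createHash } from "node:crypto";

// Hypothetical minimal event shape for illustration only.
interface CaptureEventSketch {
  content: string;
  metadata?: { slug?: string };
}

/** inbox/YYYY-MM-DD-<hash6>: UTC date plus the first 6 hex chars of a content hash (SHA-256 assumed). */
function defaultSlugSketch(event: CaptureEventSketch, now: Date): string {
  const day = now.toISOString().slice(0, 10); // toISOString is always UTC
  const hash6 = createHash("sha256").update(event.content).digest("hex").slice(0, 6);
  return `inbox/${day}-${hash6}`;
}

/** The first defined slug wins: job-level override, then event metadata, then the inbox default. */
function resolveSlugSketch(
  jobSlug: string | undefined,
  event: CaptureEventSketch,
  now: Date,
): string {
  return jobSlug ?? event.metadata?.slug ?? defaultSlugSketch(event, now);
}
```

Because the default is derived only from the UTC date and the content, the same payload posted twice on the same day resolves to the same slug.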
WHAT YOU CAN NOW DO
The drift class is dead. Every `gbrain put_page` (CLI or MCP, local
or remote) now lands its markdown file on disk alongside the DB row
whenever `sync.repo_path` is configured. The page is queryable
immediately AND visible to git, your editor, and downstream tools.
Pre-v0.38, put_page wrote ONLY to the DB and synthesize/extract paths
had to reverse-render later. The v0.35.6.0 phantom-redirect pass was
the cleanup for what THIS commit prevents in the first place.
# local CLI
gbrain put inbox/test < my-thought.md
# file lands at ${sync.repo_path}/inbox/test.md AND in the DB
# MCP remote (Zapier / Cursor / Claude Desktop)
curl -X POST /mcp ... '{"method":"tools/call","params":{"name":"put_page",...}}'
# server-side write-through fires, agent gets a normal success response
# untrusted_payload tagging applied (no auto-link, slug-allowlist gate)
Provenance frontmatter stamped on every write so future sync round-trips
know where the page came from:
ingested_via: put_page # local CLI
ingested_via: 'mcp:put_page' # MCP remote
ingested_at: 2026-05-21T04:...
WHAT THIS COMMIT BUILDS
1. Migration v80 — `pages_provenance_columns` adds four nullable
columns to `pages`: `ingested_via`, `ingested_at`, `source_uri`,
`source_kind`. ADD COLUMN with no DEFAULT is metadata-only on
Postgres 11+ and PGLite 17.5; instant on tables of any size. The
four columns get NULL on every historical page (pre-v0.38 pages
never had provenance).
2. DRY extract — `serializePageToMarkdown(page, tags, opts)` and
`resolvePageFilePath(brainDir, slug, sourceId)` in `src/core/markdown.ts`.
The dream-cycle's `renderPageToMarkdown` (synthesize.ts) and the new
put_page write-through path were going to have 90% duplicate bodies.
They now share one foundation; the dream version is a 4-line wrapper
that passes `frontmatterOverrides: {dream_generated: true, ...}`.
Future markdown-shape changes happen in one place.
3. put_page write-through (`src/core/operations.ts`) — after
importFromContent succeeds, resolves sync.repo_path, computes the
v0.32.8 source-aware path layout (default: brainDir/<slug>.md;
non-default: brainDir/.sources/<id>/<slug>.md), serializes the
freshly-written Page via `serializePageToMarkdown`, writes the file.
Returns a `write_through: {written, path}` field in the put_page
response so callers can see what happened.
Trust gating:
- subagent sandbox (viaSubagent without allowedSlugPrefixes) → DB-only
- dry-run → DB-only (handler's early-return short-circuits before
write-through; documented via the dry_run response field)
- no sync.repo_path configured → DB-only, skipped reason returned
- sync.repo_path points at a non-existent dir → DB-only, skipped
- all other writes → write-through
Failure isolation: disk-write failures are LOGGED loud but do NOT
roll back the DB write. DB is the durable record; the
phantom-redirect pass exists for drift cleanup if it ever shows up.
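The trust gating and source-aware path layout above can be read as a small decision function plus a path helper. The sketch below is illustrative only: the `"default"` source id literal, the reason strings, and the context shape are assumptions, and the real `resolvePageFilePath` in src/core/markdown.ts may behave differently.

```typescript
import * as path from "node:path";

/** Default source files at <brainDir>/<slug>.md; others under <brainDir>/.sources/<id>/<slug>.md. */
function resolvePagePathSketch(brainDir: string, slug: string, sourceId: string = "default"): string {
  // Assumption: the default source is identified by the literal "default".
  return sourceId === "default"
    ? path.join(brainDir, `${slug}.md`)
    : path.join(brainDir, ".sources", sourceId, `${slug}.md`);
}

// Hypothetical inputs to the gate; field names echo the commit text where possible.
interface WriteContextSketch {
  dryRun: boolean;
  viaSubagent: boolean;
  allowedSlugPrefixes?: string[];
  repoPath?: string;
  repoPathExists: boolean;
}

type WriteDecision = { writeThrough: true } | { writeThrough: false; reason: string };

/** Mirrors the gating list: every early return means the write stays DB-only. */
function decideWriteThrough(ctx: WriteContextSketch): WriteDecision {
  if (ctx.dryRun) return { writeThrough: false, reason: "dry_run" };
  if (ctx.viaSubagent && !ctx.allowedSlugPrefixes) {
    return { writeThrough: false, reason: "subagent_sandbox" };
  }
  if (!ctx.repoPath) return { writeThrough: false, reason: "no_repo_path" };
  if (!ctx.repoPathExists) return { writeThrough: false, reason: "repo_path_missing" };
  return { writeThrough: true };
}
```

A caller following the failure-isolation rule would wrap the actual file write in try/catch and log the error rather than rethrow, since the DB row is already committed.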
TESTS
- test/ingestion/put-page-write-through.test.ts (10 cases against PGLite):
happy path (file lands, provenance stamp local + remote), trust gating
(subagent sandbox, dry-run, trusted-workspace), config edges (no
repo_path, missing dir), multi-source filing (.sources/<id>/),
failure isolation (DB write survives a disk failure).
- Migration v80 verified across both engines via the existing
test/migrate.test.ts + test/bootstrap.test.ts coverage (~125 cases).
369 total tests passing in the ingestion + markdown + migrate bundle.
typecheck clean.
NOTES
- Bootstrap probes for the v80 provenance columns are NOT yet added
to applyForwardReferenceBootstrap on either engine. This is safe
for v0.38 because no SCHEMA_SQL CREATE INDEX or FK references the
new columns — migration v80 is the only consumer, and it runs
AFTER SCHEMA_SQL replay. A future commit may add bootstrap probes
+ REQUIRED_BOOTSTRAP_COVERAGE entries as defense-in-depth (eng
review E4).
- The trusted-workspace path (dream cycle's reverseWriteRefs in
synthesize.ts) still runs its own write at synthesize phase time.
Both paths writing the same file is idempotent (byte-identical
serialization), but a future commit may simplify reverseWriteRefs
to skip pages whose file already matches.
NEXT IN WAVE
gbrain capture verb (the single human-facing entrypoint); daemon
rename autopilot -> ingest with forever-alias + plist migration;
doctor inotify probe (Linux); content-type processor router
(PDF + image OCR + audio transcribe stubs); cron-scheduler refactor
+ OpenClaw credential auto-migrate; skillpack contract docs +
reference pack; VERSION bump.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
WHAT YOU CAN NOW DO
One command, local or thin-client, synchronous receipt with the resulting
page slug. The answer to "what is the best way to get data into the brain?"
is now: just type `gbrain capture` and the right thing happens.
# the basic case
gbrain capture "remember to follow up on the X deal"
# from a file
gbrain capture --file ./notes/today.md --slug daily/2026-05-21
# from a pipe (shell pipelines)
echo "from stdin" | gbrain capture --stdin
# script-friendly: print just the slug
SLUG=$(gbrain capture "a thought" --quiet)
# JSON for agents
gbrain capture "..." --json
Default slug is `inbox/YYYY-MM-DD-<hash8>`. It is deterministic for the
same content, so re-running the command lands on the same page instead of
creating a duplicate. The receipt block on stdout shows slug + status +
content_hash + on-disk path so you can confirm where the page went without
rerunning `gbrain query`.
The local-install path routes through the put_page operation with the
v0.38 write-through plumbing landed in the prior commit, so the page
hits both the DB AND the file tree in one move. Thin-client installs
route through `callRemoteTool('put_page', ...)` so the server's
write-through handles disk persistence the same way.
WHAT THIS COMMIT BUILDS
- src/commands/capture.ts (new ~290 LOC):
- `defaultSlug(content)` — UTC-stable `inbox/YYYY-MM-DD-<hash8>`
- `parseArgs(args)` — positional + flag parsing with --file / --stdin
/ --slug / --type / --source / --quiet / --json / --help
- `buildContent(rawBody, opts)` — wraps unstructured prose in
frontmatter (type + title + captured_via + captured_at) and a
leading `# Title` heading; passes through if the body already
looks like markdown
- `runCapture(engine, args)` — local install routes through the
in-process put_page operation; thin-client routes through MCP.
`--quiet` prints just the slug; `--json` prints structured output;
default prints a 5-line receipt block.
- src/cli.ts:
- Adds `case 'capture'` dispatch
- Adds `'capture'` to the CLI_ONLY set so cli.ts wires it correctly
TESTS
test/commands/capture.test.ts (21 cases against PGLite):
- defaultSlug helper: shape + determinism + UTC math
- parseArgs: positional + multi-token join + every flag
- buildContent: prose wrapping, --type override, no double-wrap
for pre-frontmattered content, title cap at 80 chars,
--source provenance stamp
- Integration: inline content lands in DB + on disk, default slug
shape, --file reads from disk, --json structured output,
--help returns without engine roundtrip
271 total tests passing in the bundle. typecheck clean.
NOTES
- Thin-client routing relies on `callRemoteTool('put_page', ...)` from
src/core/mcp-client.ts. Identical UX to the local path because the
server's put_page handler runs the same write-through plumbing.
- buildContent's "looks like markdown" heuristic is intentionally
simple — first-line heading or frontmatter delimiter is the trigger.
Users who care about exact formatting pass a pre-formatted --file.
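The "looks like markdown" heuristic and the prose wrapping can be sketched as follows. This is an approximation under stated assumptions: the default `type` of `note`, the `captured_via: capture` value, the JSON-quoted title, and both function names are hypothetical; only the trigger rule (first-line heading or frontmatter delimiter), the frontmatter keys, and the 80-character title cap come from this commit.

```typescript
/** Trigger rule from the notes: a first-line heading or a frontmatter delimiter. */
function looksLikeMarkdown(body: string): boolean {
  const firstLine = (body.trimStart().split("\n", 1)[0] ?? "").trimEnd();
  return /^#{1,6}\s+\S/.test(firstLine) || firstLine === "---";
}

/** Wraps plain prose in frontmatter plus a "# Title" heading; passes markdown through untouched. */
function wrapProseSketch(body: string, capturedAt: Date, type: string = "note"): string {
  if (looksLikeMarkdown(body)) return body;
  const trimmed = body.trim();
  // Title is the first line, capped at 80 characters.
  const title = (trimmed.split("\n", 1)[0] ?? "").slice(0, 80);
  return [
    "---",
    `type: ${type}`,
    `title: ${JSON.stringify(title)}`, // JSON quoting keeps colons and quotes YAML-safe
    "captured_via: capture",           // assumed value
    `captured_at: ${capturedAt.toISOString()}`,
    "---",
    "",
    `# ${title}`,
    "",
    trimmed,
    "",
  ].join("\n");
}
```

Keeping the check to the first line means a note that merely contains a `#` later in the text is still treated as prose and wrapped.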
NEXT IN WAVE
Daemon rename autopilot -> ingest with forever-alias + plist migration;
doctor inotify probe (Linux); content-type processor router
(PDF + image OCR + audio transcribe stubs); cron-scheduler refactor
+ OpenClaw credential auto-migrate; skillpack contract docs +
reference pack; VERSION bump.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…skill
VERSION 0.37.1.0 → 0.38.0.0 (trio: VERSION, package.json, CHANGELOG
header). CHANGELOG entry written in user-facing ELI10-lead voice per
CLAUDE.md release-summary rules. README's pre-loop section gains a new
"How to get data in (v0.38+)" block leading with `gbrain capture`.
skills/capture/SKILL.md (NEW) so agents route "capture this" / "save this
thought" / "remember this" / "drop this in the inbox" / "save to brain" to
the capture verb. RESOLVER.md updated with the new triggers (sits above
idea-ingest/media-ingest/meeting-ingestion in the content-ingestion
section as the "simple thought" path).
E2E roundtrip test (test/e2e/ingestion-roundtrip.test.ts) covers the gap:
inbox-folder source -> daemon -> ingest_capture handler -> DB page,
including:
- Full pipeline: file drop appears as page in DB + file moves to .archived/
- Dedup catches byte-identical content from a different filename
- Multi-source coordination: two distinct inbox dirs, two sources, daemon
  ingests both events independently
The test runs against an in-memory PGLite (no DATABASE_URL needed) so it
exercises the substrate-level wiring in the standard test suite. A
follow-up commit can add a full-process e2e (gbrain serve --http + real
OAuth client + POST /ingest) that requires DATABASE_URL.
399/399 v0.38 wave tests passing (910 assertions). typecheck clean.
bun run verify gate green across all 14 shell checks.
DEFERRED TO FOLLOW-UP RELEASES (called out in CHANGELOG)
- Daemon rename autopilot -> ingest + forever-alias + plist migration
- cron-scheduler skill refactor + OpenClaw credential auto-migrate
- Content-type processors (PDF / OCR / audio / video)
- gbrain doctor inotify probe (Linux)
- Publisher DX cathedral: gbrain skillpack init --kind=ingestion-source,
  gbrain ingest test --watch, ingest tail, ingest validate
- Reference pack at examples/skillpack-ingestion-reference/ + 3-stage
  tutorial in docs/ingestion-source-skillpack.md
These are polish items; the substrate is shipped and queryable, and
skillpack publishers can build sources against the IngestionTestHarness
public export today.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…conformance
The v0.38.0.0 release-hygiene commit landed cleanly against the v0.38 wave
suite but tripped four categories of full-suite tests. This commit fixes
each. The remaining failure (doctorReportRemote > "healthy" status) was
verified pre-existing via `git stash + bun test` and is not caused by
v0.38; left alone.
Fix 1: `schema-bootstrap-coverage.test.ts` (s1)
The test parses MIGRATIONS for ALTER TABLE ADD COLUMN statements and fails
if any column is not covered by `applyForwardReferenceBootstrap` on both
engines. Migration v80's four provenance columns triggered the failure.
Bootstrap probes added to both engines + 4 entries appended to
REQUIRED_BOOTSTRAP_COVERAGE:
- src/core/pglite-engine.ts: 4 EXISTS probes + state field + needs flag +
  ALTER TABLE block when bootstrap fires
- src/core/postgres-engine.ts: same pattern
- test/schema-bootstrap-coverage.test.ts: 4 coverage entries
Fix 2: `check-resolvable.test.ts` (s3, orphan_trigger)
RESOLVER.md references skills via name; check-resolvable cross-checks
against skills/manifest.json. The new `capture` skill was missing the
manifest entry; added between `brain-ops` and `idea-ingest` so the
manifest order mirrors the resolver order.
Fix 3: `skills-conformance.test.ts` (s8)
Every SKILL.md must have `## Contract`, `## Output Format`, and
`## Anti-Patterns` sections. skills/capture/SKILL.md was missing all three
(initial draft skipped them); now compliant with concrete content per the
v0.38 contract.
Fix 4: `build-llms.test.ts` (s6)
README + CHANGELOG edits in the release-hygiene commit caused
llms-full.txt to drift behind. Regenerated via `bun run build:llms`. Per
CLAUDE.md: any user-facing docs edit MUST run build:llms before push.
The full bun-test parallel runner now passes everywhere except the
pre-existing `doctorReportRemote > healthy status` failure (50/100 score
on an empty fresh brain; this is a pre-v0.38 health-score tuning issue and
orthogonal to ingestion work).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Conflicts resolved:
- VERSION → 0.38.0.0 (higher semver wins)
- package.json → 0.38.0.0 (trio agreement)
- CHANGELOG.md → keep my v0.38.0.0 entry on top; master's v0.37.2–v0.37.9
  entries stay below
- src/core/markdown.ts → combined imports (Page, PageType from ./types.ts
  AND safeLoad as yamlSafeLoad from js-yaml)
- src/core/migrate.ts → migration COLLISION at v80. Master claimed v80
  with takes_unresolvable_quality_v0_37_2_0 first; my
  pages_provenance_columns renumbered v80 → v81. Both migrations preserved.
  Bootstrap probe comments + REQUIRED_BOOTSTRAP_COVERAGE comment updated to
  reference v81 (the migration number, not the column shape, moved).
bun install picked up new deps from master's lockfile: @types/js-yaml,
fast-check. bun run typecheck → clean.
Renumbers the in-flight ingestion-cathedral release to v0.38.1.0. Trio (VERSION, package.json, CHANGELOG.md) bumped together. bun run typecheck → clean.
Conflicts resolved:
- VERSION → 0.38.1.0 (higher semver wins; master bumped 0.37.9.0 →
  0.37.10.0)
- package.json → 0.38.1.0 (trio agreement)
- CHANGELOG.md → my v0.38.1.0 entry stays on top; master's new v0.37.10.0
  entry preserved directly below
Master's v0.37.10.0 brings the init env-detection + interactive picker +
preflight invariants wave (#1278). No collisions with v0.38 ingestion
substrate. bun install + bun run typecheck → clean.
Conflicts resolved:
- VERSION → 0.38.1.0 (higher semver wins; master bumped 0.37.10.0 →
  0.37.11.0)
- package.json → 0.38.1.0 (trio agreement)
- CHANGELOG.md → my v0.38.1.0 entry stays on top; master's new v0.37.11.0
  entry inserted between mine and v0.37.10.0
- src/cli.ts CLI_ONLY Set → union of master's `reinit-pglite` and my
  `capture` CLI verbs
Master's v0.37.11.0 brings the fresh-install PGLite embedding setup fix
wave (#1286): default vector(1280) schema matching the gateway's zembed-1
default, `gbrain reinit-pglite` wipe-and-reinit command, and proper ZE API
key plumbing. No collisions with v0.38 ingestion substrate beyond the
cli.ts dispatcher Set. bun install + bun run typecheck → clean.
Master sits at 0.37.11.0; 0.38.0.0 is the natural next slot rather than skipping a release. Trio (VERSION, package.json, CHANGELOG.md) bumped together. Migration v81 + ingestion substrate stay identical — this is a header-only renumber. bun run typecheck → clean.
…v81 + webhook E2E
A v0.38 audit comparing what shipped against what the tests covered
surfaced three gaps. All three are now filled:
1. **test/markdown-serializer.test.ts** (NEW, 19 cases) — pure-function
coverage of `serializePageToMarkdown` + `resolvePageFilePath`, the
DRY extract that the dream-cycle reverse-render and put_page
write-through both consume. Pre-fix nothing pinned the
frontmatter-override merge precedence, the type/title defaults, or
the source-aware filing layout (default → `<brainDir>/<slug>.md`,
non-default → `<brainDir>/.sources/<source_id>/<slug>.md`). Future
schema-shape changes to either helper now surface immediately.
2. **test/migrate.test.ts — v81 cases** (10 new cases, two describe
blocks) — structural assertions on `pages_provenance_columns`
(four nullable columns, no NOT NULL, no DEFAULT, no index — the
ADD COLUMN stays metadata-only) plus a PGLite round-trip that
asserts the columns appear post-`initSchema`, accept direct UPDATEs,
and survive the historical-page NULL scenario. The
schema-bootstrap-coverage test already pinned the forward-reference
probe contract; this fills the migrate.test.ts contract gap.
3. **test/e2e/serve-http-ingest-webhook.test.ts** (NEW, 16 cases) — HTTP
contract coverage for POST /ingest. The pre-existing
ingestion-roundtrip E2E explicitly notes "e2e (gbrain serve --http +
POST /ingest + real OAuth) is a separate" thing — it covers the
in-process daemon → handler → DB pipeline, NOT the real HTTP route.
This file fills that gap. Spawns real gbrain serve --http against
real Postgres, mints OAuth tokens with various scopes, exercises:
- Auth gate (missing → 401; read-only → 403)
- Body validation (empty → 400 with error: empty_body)
- Content-type allowlist (image/png → 415 with skillpack hint;
application/pdf → 415; text/plain + application/json + text/html
all accepted; unknown text/* falls through to text/plain)
- X-Gbrain-Content-Type / Source-Id / Source-Uri / Slug header
overrides
- Idempotency (same content + same client = identical job_id via
queue dedup on content_hash)
Also wires three new entries into `scripts/e2e-test-map.ts` so changes
to `src/commands/serve-http.ts`, `src/core/ingestion/**`, or the
`ingest-capture` Minion handler auto-trigger the relevant E2Es under
`bun run ci:local:diff`.
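The request-validation contract exercised by the webhook E2E (empty body → 400 `empty_body`, allowlisted types accepted, unknown text/* treated as text/plain, binary → 415) can be summarized in a small classifier. This is a sketch under assumptions: the function name, the check ordering, the treatment of a missing header, and the `unsupported_content_type` error string are hypothetical; the status codes and allowlist come from the commit text.

```typescript
// Allowlist taken from the POST /ingest description in this PR.
const ALLOWED_CONTENT_TYPES = new Set([
  "text/markdown",
  "text/plain",
  "text/html",
  "application/json",
]);

type IngestVerdict =
  | { ok: true; contentType: string }
  | { ok: false; status: number; error: string };

/** Classifies a request by body length and Content-Type header (parameters like charset are ignored). */
function classifyIngestRequest(rawContentType: string | undefined, bodyLength: number): IngestVerdict {
  if (bodyLength === 0) return { ok: false, status: 400, error: "empty_body" };
  const base = ((rawContentType ?? "").split(";", 1)[0] ?? "").trim().toLowerCase();
  if (ALLOWED_CONTENT_TYPES.has(base)) return { ok: true, contentType: base };
  // Unknown text/* subtypes fall through to text/plain.
  if (base.startsWith("text/")) return { ok: true, contentType: "text/plain" };
  // Everything else (image/png, application/pdf, ...) is rejected; error name is assumed.
  return { ok: false, status: 415, error: "unsupported_content_type" };
}
```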
Verified locally:
- bun test test/markdown-serializer.test.ts → 19/19 green
- bun test test/migrate.test.ts -t "v81" → 10/10 green
- bun test test/e2e/serve-http-ingest-webhook.test.ts (real Postgres on
ephemeral 5435) → 16/16 green
- bun test test/select-e2e.test.ts → 24/24 green (selector test still
honors the v0.38 entries)
- bun run typecheck → clean
E2E DB lifecycle handled per CLAUDE.md (spin up pgvector:pg16 on a free
port, bootstrap via `gbrain doctor --json`, run, tear down).
mgunnin added a commit to mgunnin/gbrain that referenced this pull request on May 28, 2026
* upstream/master: v0.38.2.0 fix(doctor): bounded frontmatter scan + partial-state surfacing (supersedes garrytan#1287) (garrytan#1297) v0.38.1.0 feat(agents): provider-agnostic subagent loop + remote MCP dispatch + budget meter (garrytan#1289) v0.38.0.0 ingestion cathedral — gbrain capture + write-through + IngestionSource contract (garrytan#1275) v0.37.11.0: fresh-install PGLite embedding setup fix wave (garrytan#1286) v0.37.10.0 feat(init): env-detection + interactive picker + preflight invariants (garrytan#1278) v0.37.9.0 fix(frontmatter): canonical-style normalization for tag arrays (garrytan#1252) v0.37.8.0 feat: voyage-code-3 discoverability + reindex-code cost-preview fix (garrytan#1267) v0.37.7.0 fix wave: federated brains + autopilot safety + OAuth confidential clients (garrytan#1253) v0.37.6.0 feat(ai): OpenRouter recipe + generic default_headers seam (cherry-pick garrytan#1210) (garrytan#1246) v0.37.5.0 fix(markdown): YAML-aware NESTED_QUOTES validator (stops flagging valid YAML) (garrytan#1229) feat: pgGraph-inspired CI scaffolding wave (v0.37.4.0) (garrytan#1228) v0.37.3.0 feat: skill_brain_first doctor check + auto-fix + declarative opt-out (supersedes garrytan#1206) (garrytan#1215) v0.37.2.0: takes_resolution_consistency CHECK accepts 'unresolvable' (garrytan#1211) v0.37.1.0 feat: brainstorm + lsd — bisociation idea generator grounded in your own brain (garrytan#1214) v0.37.0.0 feat(skillpack): registry cathedral — third-party publish + install + 10/10 quality bar (garrytan#1208) v0.36.6.0 feat: cross-modal search wave (text↔image + unified column + LLM intent) (garrytan#1165)
Summary
The v0.38 ingestion cathedral. Answers the user-feedback question "what is the best way to get data into the brain?" with one verb: `gbrain capture`. Local or hosted, synchronous receipt, DB + disk in one move.
- `gbrain capture "..."`: single human-facing entrypoint (local + thin-client routing)
- `put_page` write-through: page lands in DB AND on disk in one move. Closes the drift class the v0.35.6 phantom-redirect pass was cleaning up.
- POST /ingest route on `serve --http`: OAuth write-scope-gated, untrusted_payload tagged, for Zapier / IFTTT / Apple Shortcuts
- `IngestionSource` versioned public API at `gbrain/ingestion` + `gbrain/ingestion/test-harness`: third-party skillpack publishers (Granola, voice, OCR, mail, etc.) can build sources against the locked contract today
- inbox-folder source (`~/.gbrain/inbox/`) makes iOS Shortcuts / AirDrop / Drafts zero-friction mobile capture
- Provenance columns (`ingested_via`, `ingested_at`, `source_uri`, `source_kind`) with bootstrap probes on both engines
- `serializePageToMarkdown` + `resolvePageFilePath` DRY extract: the dream-cycle renderer and the write-through renderer share one foundation
Six bisect-friendly commits (~6,750 LOC). CEO + DX + Eng plan-reviewed and persisted at `~/.claude/plans/system-instruction-you-are-working-ethereal-riddle.md`. CHANGELOG entry written in user-facing ELI10-lead voice.
Plan-review trail
- `/plan-ceo-review`: SCOPE_EXPANSION mode, 16 proposals / 10 accepted / 14 deferred, CEO plan at `~/.gstack/projects/garrytan-gbrain/ceo-plans/2026-05-20-ingestion-cathedral.md`
- `/plan-devex-review`: Library/SDK persona (YC founder shipping a weekend skillpack source), Champion-tier publisher TTHW target (<10min), score 5/10 → 9/10
- `/plan-eng-review`: 5 architecture findings resolved (E1: webhook source lives in serve --http not daemon; E2: hybrid content-type processor execution; E3: Linux inotify probe + ephemeral PGLite persistence; E4: bootstrap probes for v80; DRY: serializePageToMarkdown extract)
Test plan
- Full bun-test run passes except one failure (`doctorReportRemote > healthy status`: verified pre-existing via git stash; not caused by v0.38)
- `bun run verify` gate green (all 14 shell checks)
Deferred to follow-up releases (called out in CHANGELOG)
- `gbrain doctor` inotify-limit probe (Linux)
- Publisher DX cathedral: `gbrain skillpack init --kind=ingestion-source`, `gbrain ingest test --watch`, `gbrain ingest tail`, `gbrain ingest validate`
- Reference pack at `examples/skillpack-ingestion-reference/` + 3-stage tutorial in `docs/ingestion-source-skillpack.md`
These are polish items; the substrate ships and is queryable. Skillpack publishers can build sources against the IngestionTestHarness public export today.
🤖 Generated with Claude Code
Master merge — v0.37.9.0 → v0.38.0.0
Merged origin/master at sha `2f645b29` into the wave with five conflicts.
All resolved; trio (VERSION, package.json, CHANGELOG.md) agrees on
`0.38.0.0`.
Migration version collision resolved. Master's v0.37.2.0 hotfix
(`takes_unresolvable_quality_v0_37_2_0`) claimed migration v80 first.
My `pages_provenance_columns` renumbered v80 → v81. Both migrations
preserved; they touch unrelated tables (takes vs pages). Bootstrap probe
comments + `REQUIRED_BOOTSTRAP_COVERAGE` comment updated to reference v81.
Other resolved conflicts:
- `VERSION` → `0.38.0.0` (higher semver wins over master's `0.37.9.0`)
- `package.json` → `0.38.0.0` (trio agreement)
- `CHANGELOG.md` → my v0.38.0.0 entry on top; master's v0.37.2–v0.37.9
  entries preserved below
- `src/core/markdown.ts` → combined imports (`Page, PageType` from
  `./types.ts` plus master's `safeLoad as yamlSafeLoad` from `js-yaml`)
Post-merge:
- `bun install --frozen-lockfile` picked up new deps (`@types/js-yaml`,
  `fast-check`)
- `bun run typecheck` → clean
- `bun run test` (parallel 8-shard + 22 serial files) → all green