
v0.38.0.0 ingestion cathedral — gbrain capture + write-through + IngestionSource contract#1275

Merged
garrytan merged 12 commits into
masterfrom
garrytan/albany-v1
May 22, 2026

garrytan (Owner) commented May 21, 2026

Summary

The v0.38 ingestion cathedral. Answers the user-feedback question "what is the best way to get data into the brain?" with one verb: gbrain capture. Local or hosted, synchronous receipt, DB + disk in one move.

  • gbrain capture "..." — single human-facing entrypoint (local + thin-client routing)
  • put_page write-through — page lands in DB AND on disk in one move. Closes the drift class the v0.35.6 phantom-redirect pass was cleaning up.
  • POST /ingest webhook source on serve --http — OAuth write-scope-gated, untrusted_payload tagged, for Zapier / IFTTT / Apple Shortcuts
  • IngestionSource versioned public API at gbrain/ingestion + gbrain/ingestion/test-harness — third-party skillpack publishers (Granola, voice, OCR, mail, etc.) can build sources against the locked contract today
  • Daemon with supervision + dedup + rate-limit + dispatch pipeline (in-process sibling to v0.34.3 ChildWorkerSupervisor pattern)
  • Two built-in sources: file-watcher (chokidar) replaces poll-autopilot; inbox-folder (~/.gbrain/inbox/) makes iOS Shortcuts / AirDrop / Drafts zero-friction mobile capture
  • Migration v81 (renumbered from v80 after the master merge) adds provenance columns (ingested_via, ingested_at, source_uri, source_kind) with bootstrap probes on both engines
  • serializePageToMarkdown + resolvePageFilePath DRY extract — the dream-cycle renderer and the write-through renderer share one foundation

Six bisect-friendly commits (~6,750 LOC). CEO + DX + Eng plan-reviewed and persisted at ~/.claude/plans/system-instruction-you-are-working-ethereal-riddle.md. CHANGELOG entry written in user-facing ELI10-lead voice.

Plan-review trail

  • /plan-ceo-review — SCOPE_EXPANSION mode, 16 proposals / 10 accepted / 14 deferred, CEO plan at ~/.gstack/projects/garrytan-gbrain/ceo-plans/2026-05-20-ingestion-cathedral.md
  • /plan-devex-review — Library/SDK persona (YC founder shipping a weekend skillpack source), Champion-tier publisher TTHW target (<10min), score 5/10 → 9/10
  • /plan-eng-review — 5 architecture findings resolved (E1: webhook source lives in serve --http not daemon; E2: hybrid content-type processor execution; E3: Linux inotify probe + ephemeral PGLite persistence; E4: bootstrap probes for v80; DRY: serializePageToMarkdown extract)

Test plan

  • Foundation unit tests: 192 cases against types + dedup + daemon + skillpack-load + test-harness
  • Built-in source tests: 34 cases (chokidar mocks, atomic writes, symlink rejection, archive flow)
  • ingest_capture Minion handler: 15 cases (slug resolution chain, untrusted_payload round-trip, binary rejection, importFromContent integration)
  • put_page write-through: 10 cases (trust gating, multi-source filing, DB-write-doesnt-rollback-on-disk-fail)
  • capture verb: 21 cases (parseArgs + buildContent + helpers + local integration)
  • E2E roundtrip: 3 cases (inbox-folder → daemon → ingest_capture → DB; multi-source coordination; cross-source dedup)
  • Bootstrap coverage: 4 new REQUIRED_BOOTSTRAP_COVERAGE entries pinning v80 provenance columns
  • Public exports: count 18 → 20, canary symbols pinned for both new subpaths
  • Full bun test suite: 6961 tests passing, 1 pre-existing failure (doctorReportRemote > healthy status — verified pre-existing via git stash; not caused by v0.38)
  • typecheck clean
  • bun run verify gate green (all 14 shell checks)

Deferred to follow-up releases (called out in CHANGELOG)

  • Daemon rename autopilot → ingest + forever-alias + launchd plist migration
  • cron-scheduler skill refactor + OpenClaw credential auto-migrate
  • Content-type processors (PDF text / image OCR / audio transcribe / video keyframe)
  • gbrain doctor inotify-limit probe (Linux)
  • Publisher DX cathedral: gbrain skillpack init --kind=ingestion-source, gbrain ingest test --watch, gbrain ingest tail, gbrain ingest validate
  • Reference pack at examples/skillpack-ingestion-reference/ + 3-stage tutorial in docs/ingestion-source-skillpack.md

These are polish items; the substrate ships and is queryable. Skillpack publishers can build sources against the IngestionTestHarness public export today.

🤖 Generated with Claude Code

Master merge — v0.37.9.0 → v0.38.0.0

Merged origin/master at sha 2f645b29 into the wave with five conflicts.
All resolved; trio (VERSION, package.json, CHANGELOG.md) agrees on
0.38.0.0.

Migration version collision resolved. Master's v0.37.2.0 hotfix
(takes_unresolvable_quality_v0_37_2_0) claimed migration v80 first.
My pages_provenance_columns renumbered v80 → v81. Both migrations
preserved; they touch unrelated tables (takes vs pages). Bootstrap probe
comments + REQUIRED_BOOTSTRAP_COVERAGE comment updated to reference v81.

Other resolved conflicts:

  • VERSION → 0.38.0.0 (higher semver wins over master's 0.37.9.0)
  • package.json → 0.38.0.0 (trio agreement)
  • CHANGELOG.md → my v0.38.0.0 entry on top; master's v0.37.2–v0.37.9
    entries preserved below
  • src/core/markdown.ts → combined imports (Page, PageType from
    ./types.ts plus master's safeLoad as yamlSafeLoad from js-yaml)

Post-merge:

  • bun install --frozen-lockfile picked up new deps (@types/js-yaml,
    fast-check)
  • bun run typecheck → clean
  • bun run test (parallel 8-shard + 22 serial files) → all green

garrytan and others added 8 commits May 21, 2026 08:46
…+ 2 sources

The foundation for the ingestion cathedral (CEO+DX+Eng plan-reviewed).
Plan: ~/.claude/plans/system-instruction-you-are-working-ethereal-riddle.md

WHAT YOU CAN NOW DO
The IngestionSource public contract is locked. Skillpack publishers can
build third-party ingestion sources (Granola, Linear, Mail, voice, OCR,
etc.) and ship them through the v0.37 skillpack registry. The locked
surface lives at the new package subpaths:

  import { IngestionSource, IngestionEvent } from 'gbrain/ingestion';
  import { IngestionTestHarness, expectEvent } from 'gbrain/ingestion/test-harness';

Both subpaths are pinned by test/public-exports.test.ts — breaking either
is a major-version change.

WHAT THIS COMMIT BUILDS
Foundation:
- src/core/ingestion/types.ts (IngestionSource, IngestionEvent,
  IngestionSourceContext, validateIngestionEvent, computeContentHash,
  INGESTION_SOURCE_API_VERSION, INGESTION_CONTENT_TYPES)
- src/core/ingestion/dedup.ts (24h content-hash LRU, 5000-entry cap)
- src/core/ingestion/skillpack-load.ts (gbrain.plugin.json discovery for
  third-party sources, api_version compat with paste-ready upgrade hints,
  in-process trust model for v1)
- src/core/ingestion/daemon.ts (IngestionDaemon: in-process source
  supervision sibling to v0.34.3.0 ChildWorkerSupervisor pattern, plus
  validate -> dedup -> rate-limit -> dispatch pipeline + health surface)
- src/core/ingestion/test-harness.ts (publisher-facing test utility with
  fake clock + in-memory event bus + expectEvent matchers + engine proxy
  that throws on access so publishers know what they're depending on)
- src/core/ingestion/index.ts (barrel for gbrain/ingestion subpath)
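A minimal sketch of the dedup stage listed above (24h content-hash window, 5000-entry cap), assuming a JavaScript `Map` used as an LRU via its insertion order and an injectable clock in the spirit of the harness's fake clock. `ContentHashDedup`, `seen`, and `sha256Hex` are hypothetical names, not the `gbrain/ingestion` API, and sha256 is an assumed hash choice.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: sha256 hex digest of an event body.
function sha256Hex(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// Hypothetical 24h / 5000-entry dedup cache. A Map iterates in insertion order,
// so delete + set moves a key to the most-recent end and the first key is the LRU.
class ContentHashDedup {
  private readonly entries = new Map<string, number>(); // hash -> last-seen ms
  private readonly ttlMs: number;
  private readonly maxEntries: number;
  private readonly now: () => number;

  constructor(ttlMs = 24 * 60 * 60 * 1000, maxEntries = 5000, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.maxEntries = maxEntries;
    this.now = now;
  }

  /** True if `hash` was already seen within the TTL window. */
  seen(hash: string): boolean {
    const t = this.now();
    const last = this.entries.get(hash);
    const duplicate = last !== undefined && t - last < this.ttlMs;

    // Refresh recency and timestamp, then evict least-recently-seen overflow.
    this.entries.delete(hash);
    this.entries.set(hash, t);
    while (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
    return duplicate;
  }
}
```

This sketch refreshes an entry's timestamp on every sighting, so a payload that keeps repeating inside the window stays deduplicated; anchoring the window to the first sighting instead is an equally reasonable policy.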

First two built-in sources prove the abstraction:
- file-watcher (chokidar over the brain repo; 1s debounce; honors
  pruneDir from src/core/sync.ts; symlinks rejected; Linux ENOSPC
  surfaces a paste-ready sysctl hint at runtime)
- inbox-folder (~/.gbrain/inbox/ target for iOS Shortcuts / AirDrop /
  Drafts; auto-archives processed files into .archived/YYYY-MM-DD/;
  symlink rejection; world-writable dir warning; routes content-type by
  extension)
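The inbox-folder rules above (route content-type by extension, archive into `.archived/YYYY-MM-DD/`, reject symlinks) reduce to a few pure helpers. This is a sketch under assumptions: the extension table, the UTC dating of the archive folder, and the helper names are illustrative, since the commit does not list the exact mapping.

```typescript
import { lstatSync } from "node:fs";
import { basename, extname, join } from "node:path";

// Hypothetical extension -> content-type table; unknown extensions map to undefined.
const EXT_TO_CONTENT_TYPE: Record<string, string> = {
  ".md": "text/markdown",
  ".markdown": "text/markdown",
  ".txt": "text/plain",
  ".html": "text/html",
  ".htm": "text/html",
  ".json": "application/json",
};

function contentTypeForFile(path: string): string | undefined {
  return EXT_TO_CONTENT_TYPE[extname(path).toLowerCase()];
}

// Archive destination: <inboxDir>/.archived/YYYY-MM-DD/<filename>, dated in UTC.
function archivePathFor(inboxDir: string, filePath: string, when: Date): string {
  const day = when.toISOString().slice(0, 10); // "YYYY-MM-DD"
  return join(inboxDir, ".archived", day, basename(filePath));
}

// lstat inspects the link itself (not its target), so symlinks can be rejected.
function isSymlink(path: string): boolean {
  return lstatSync(path).isSymbolicLink();
}
```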

Public exports surface (count 18 -> 20) pinned in:
- package.json exports map
- test/public-exports.test.ts EXPECTED_EXPORTS + count gate
- scripts/check-exports-count.sh baseline

ARCHITECTURE-LOCKED DECISIONS (from /plan-eng-review)
E1 webhook source process boundary: webhook source will live INSIDE
serve --http (NOT this daemon) when it lands in the next commit. Daemon
supervises only daemon-side sources.
E2 content-type processor execution: hybrid by size (inline <1MB,
Minion handlers >1MB). Processors land in a later commit.
E3 publisher TTHW: chokidar v4.0.3 across platforms; ephemeral PGLite
persistence and Linux inotify-limit doctor probe land in later commits.
E4 migration v80 (provenance columns) + forward-reference bootstrap:
lands with put_page write-through in a later commit.
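E2's hybrid-by-size rule is a threshold check. A minimal sketch, assuming "1MB" means 1,048,576 bytes of UTF-8 and that a payload of exactly 1MB takes the Minion path (the text only says <1MB inline, >1MB Minion); `chooseProcessorPath` and `payloadBytes` are hypothetical names.

```typescript
type ProcessorPath = "inline" | "minion";

// Assumed reading of "1MB": 1,048,576 bytes.
const INLINE_LIMIT_BYTES = 1024 * 1024;

// UTF-8 byte length, so multi-byte characters count correctly.
function payloadBytes(content: string): number {
  return new TextEncoder().encode(content).length;
}

// Small payloads are processed in-process; large ones go to a Minion handler.
function chooseProcessorPath(sizeBytes: number): ProcessorPath {
  return sizeBytes < INLINE_LIMIT_BYTES ? "inline" : "minion";
}
```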

DX-locked decisions (from /plan-devex-review):
- Source error semantics: throws bubble to daemon; supervisor backoff.
- IngestionTestHarness exported as gbrain/ingestion/test-harness.
- api_version field on gbrain.plugin.json with loud-fail on mismatch.
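The loud-fail `api_version` check can be sketched as below. The commit names `INGESTION_SOURCE_API_VERSION` but not its value or the compatibility policy, so `HOST_API_VERSION = 1`, the exact-match rule, the `PluginManifest` shape, and the hint wording are all assumptions.

```typescript
// Hypothetical stand-in for INGESTION_SOURCE_API_VERSION; the real value is not stated.
const HOST_API_VERSION = 1;

interface PluginManifest {
  name: string;
  api_version?: number;
}

// Throws (loud fail) instead of loading a source built against another contract.
function assertApiVersionCompatible(manifest: PluginManifest): void {
  if (manifest.api_version === undefined) {
    throw new Error(
      `${manifest.name}: gbrain.plugin.json is missing api_version (host expects ${HOST_API_VERSION}).`,
    );
  }
  if (manifest.api_version !== HOST_API_VERSION) {
    const hint =
      manifest.api_version < HOST_API_VERSION
        ? "upgrade the skillpack to a release built for this gbrain"
        : "upgrade gbrain to a release that supports this skillpack";
    throw new Error(
      `${manifest.name}: api_version ${manifest.api_version} is incompatible with host ` +
        `api_version ${HOST_API_VERSION}; ${hint}.`,
    );
  }
}
```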

TESTS
192 cases across 8 test files, 0 failures:
- test/ingestion/types.test.ts (28 cases pinning the contract)
- test/ingestion/dedup.test.ts (15 cases for LRU + TTL + collision)
- test/ingestion/skillpack-load.test.ts (22 cases for manifest
  validation + api_version compat + collision policy + module load)
- test/ingestion/test-harness.test.ts (24 cases for harness lifecycle +
  clock + healthCheck + every expectEvent matcher)
- test/ingestion/daemon.test.ts (19 cases for supervision + dispatch
  pipeline + health surface + per-source config + logger wrapping)
- test/ingestion/sources/file-watcher.test.ts (10 cases including
  ENOSPC sysctl-hint surfacing)
- test/ingestion/sources/inbox-folder.test.ts (24 cases including
  symlink rejection + world-writable warning + archive-loop-prevention)
- test/public-exports.test.ts (2 new cases for the new subpaths)

typecheck clean. bun run verify gate passes.

NEXT IN WAVE
Subsequent commits in this PR ship webhook source (serve --http route),
cron-scheduler refactor + OpenClaw credential auto-migrate, content-type
processors (PDF + image OCR + audio transcribe + video keyframe), put_page
write-through with serializePageToMarkdown DRY extract, migration v80
+ bootstrap probes, gbrain capture verb, publisher DX cathedral (init
scaffold extension + gbrain ingest test [--watch] + tail + validate),
daemon rename autopilot -> ingest with forever-alias, doctor inotify
probe on Linux, skillpack contract docs + reference pack.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n handler

Lands the v0.38 ingestion cathedral's webhook source. Per the
/plan-eng-review E1 decision, the webhook source lives INSIDE
`serve --http` (NOT the ingestion daemon) so there is no new IPC: the
HTTP route submits Minion jobs directly into the existing queue, and
the daemon supervises only daemon-side sources.

WHAT YOU CAN NOW DO
With `gbrain serve --http` running and an OAuth client minted, any
HTTP caller (Zapier, IFTTT, n8n, Make, Apple Shortcuts) can POST a
captured thought into the brain:

  curl -X POST https://your-brain.example.com/ingest \
    -H "Authorization: Bearer $TOKEN" \
    -H "Content-Type: text/markdown" \
    -d "# captured from my Shortcut"

The route auths via OAuth (write scope required), validates the
content-type, enforces a 1MB payload cap and per-IP rate limit
(100 events / 10s), submits an `ingest_capture` Minion job tagged
`untrusted_payload: true`, and returns 202 Accepted with the job id.
The job materializes the page under `inbox/YYYY-MM-DD-<hash6>` by
default (overridable via X-Gbrain-Slug header) so the user has a
predictable triage location.
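The per-IP limit (100 events / 10s) can be modeled as a sliding-window log keyed by client IP. This is an assumed algorithm: the commit only says the limiter is a sibling to `ccRateLimiter`, and `SlidingWindowLimiter` is a hypothetical name. The rejection status is not stated either; 429 appears in a comment only as a typical choice.

```typescript
class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>(); // key -> request timestamps (ms)
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(limit = 100, windowMs = 10_000, now: () => number = Date.now) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
  }

  /** Records the hit and returns true if `key` is still under the limit. */
  tryAcquire(key: string): boolean {
    const t = this.now();
    // Drop timestamps that have aged out of the trailing window.
    const recent = (this.hits.get(key) ?? []).filter((ts) => t - ts < this.windowMs);
    if (recent.length >= this.limit) {
      this.hits.set(key, recent);
      return false; // the route could answer e.g. 429 here (assumption)
    }
    recent.push(t);
    this.hits.set(key, recent);
    return true;
  }
}
```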

WHAT THIS COMMIT BUILDS
- src/core/minions/handlers/ingest-capture.ts (new) — handler that
  takes an IngestionEvent payload, resolves a slug via fallback chain
  (job.data.slug -> event.metadata.slug -> inbox/<date>-<hash6>),
  validates the event at the handler boundary, REJECTS binary
  content_types with a paste-ready hint to install a processor
  skillpack, and routes through importFromContent. Defaults
  noEmbed: true (embed is a separate Minion job, matching the sync
  handler's pattern).
- src/commands/jobs.ts — registers `ingest_capture` in
  registerBuiltinHandlers alongside sync/embed/extract.
- src/commands/serve-http.ts — POST /ingest route with:
    - OAuth write-scope gate via requireBearerAuth({requiredScopes:['write']})
    - 100 events / 10s rate limiter (sibling to ccRateLimiter)
    - Content-type allowlist: text/markdown, text/plain, text/html,
      application/json; binary REJECTED with HTTP 415
    - 1 MB payload cap (configurable via GBRAIN_INGEST_MAX_BYTES)
    - Caller-overridable source identity via X-Gbrain-Source-Id /
      X-Gbrain-Source-Uri / X-Gbrain-Content-Type / X-Gbrain-Slug
      headers — useful for downstream tools that want clean provenance
    - untrusted_payload: true ALWAYS (network input)
    - Idempotency on (client_id, content_hash) so simultaneous retries
      collapse to one job
    - maxWaiting: 50 per client so a runaway integration can't
      monopolize the queue
    - Audit row in mcp_request_log + SSE broadcast for the admin feed
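The route's pre-queue checks can be sketched as one pure function. From this PR: the four-type allowlist, the 1 MB default cap overridable by `GBRAIN_INGEST_MAX_BYTES`, 415 for binary types, 400 `empty_body`, unknown `text/*` falling through to `text/plain`, idempotency on (client_id, content_hash), and `untrusted_payload: true`. Assumed: the check order, 413 for oversize bodies, treating a missing Content-Type as `text/plain`, sha256 as the content hash, and the `classifyIngest` name, error strings, and key format.

```typescript
import { createHash } from "node:crypto";

const ALLOWED_TYPES = new Set(["text/markdown", "text/plain", "text/html", "application/json"]);

// 1 MB default (assumed to mean 1,048,576 bytes), overridable via env.
const envMax = Number(process.env.GBRAIN_INGEST_MAX_BYTES);
const DEFAULT_MAX_BYTES = Number.isFinite(envMax) && envMax > 0 ? envMax : 1024 * 1024;

type IngestDecision =
  | { status: 400 | 413 | 415; error: string }
  | { status: 202; contentType: string; idempotencyKey: string; untrusted_payload: true };

function classifyIngest(
  clientId: string,
  rawContentType: string | undefined,
  body: string,
  maxBytes: number = DEFAULT_MAX_BYTES,
): IngestDecision {
  if (body.length === 0) return { status: 400, error: "empty_body" };

  // Compare the media type only, ignoring parameters like "; charset=utf-8".
  let contentType = (rawContentType ?? "text/plain").split(";")[0].trim().toLowerCase();
  if (!ALLOWED_TYPES.has(contentType)) {
    if (!contentType.startsWith("text/")) {
      // Binary (image/png, application/pdf, ...) is rejected before queueing.
      return { status: 415, error: "unsupported_content_type" }; // hypothetical error string
    }
    contentType = "text/plain"; // unknown text/* falls through to text/plain
  }

  if (new TextEncoder().encode(body).length > maxBytes) {
    return { status: 413, error: "payload_too_large" }; // status and string assumed
  }

  const contentHash = createHash("sha256").update(body, "utf8").digest("hex");
  return {
    status: 202,
    contentType,
    idempotencyKey: `${clientId}:${contentHash}`, // hypothetical key format
    untrusted_payload: true, // network input is always untrusted
  };
}
```

Keying idempotency on the client plus the content hash means two retries of the same POST collapse to one job, while the same text sent by two different integrations still yields two jobs.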

TESTS
test/ingestion/ingest-capture.test.ts (15 cases against PGLite):
- defaultSlugForEvent helper (3 cases pinning shape + UTC + determinism)
- slug resolution fallback chain (3 cases)
- validation + content-type routing (5 cases including binary rejection
  + untrusted_payload round-trip)
- importFromContent integration (3 cases including content_hash dedup
  via status='skipped' on repeat)

207 total ingestion tests passing. typecheck clean.

NEXT IN WAVE
cron-scheduler refactor + OpenClaw credential auto-migrate; content-type
processors (PDF + image OCR + audio transcribe + video keyframe);
put_page write-through + serializePageToMarkdown DRY extract +
migration v80 + bootstrap probes; gbrain capture verb; publisher DX
cathedral (init scaffold + gbrain ingest test --watch + tail + validate);
daemon rename autopilot -> ingest with forever-alias; doctor inotify
probe; skillpack contract docs + reference pack + VERSION bump.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
WHAT YOU CAN NOW DO
The drift class is dead. Every `gbrain put_page` (CLI or MCP, local
or remote) now lands its markdown file on disk alongside the DB row
whenever `sync.repo_path` is configured. The page is queryable
immediately AND visible to git, your editor, and downstream tools.

Pre-v0.38, put_page wrote ONLY to the DB and synthesize/extract paths
had to reverse-render later. The v0.35.6.0 phantom-redirect pass was
the cleanup for what THIS commit prevents in the first place.

  # local CLI
  gbrain put inbox/test < my-thought.md
  # file lands at ${sync.repo_path}/inbox/test.md AND in the DB

  # MCP remote (Zapier / Cursor / Claude Desktop)
  curl -X POST /mcp ... '{"method":"tools/call","params":{"name":"put_page",...}}'
  # server-side write-through fires, agent gets a normal success response
  # untrusted_payload tagging applied (no auto-link, slug-allowlist gate)

Provenance frontmatter stamped on every write so future sync round-trips
know where the page came from:
  ingested_via: put_page         # local CLI
  ingested_via: 'mcp:put_page'   # MCP remote
  ingested_at: 2026-05-21T04:...

WHAT THIS COMMIT BUILDS
1. Migration v80 — `pages_provenance_columns` adds four nullable
   columns to `pages`: `ingested_via`, `ingested_at`, `source_uri`,
   `source_kind`. ADD COLUMN with no DEFAULT is metadata-only on
   Postgres 11+ and PGLite 17.5; instant on tables of any size. The
   four columns get NULL on every historical page (pre-v0.38 pages
   never had provenance).

2. DRY extract — `serializePageToMarkdown(page, tags, opts)` and
   `resolvePageFilePath(brainDir, slug, sourceId)` in `src/core/markdown.ts`.
   The dream-cycle's `renderPageToMarkdown` (synthesize.ts) and the new
   put_page write-through path were going to have 90% duplicate bodies.
   They now share one foundation; the dream version is a 4-line wrapper
   that passes `frontmatterOverrides: {dream_generated: true, ...}`.
   Future markdown-shape changes happen in one place.

3. put_page write-through (`src/core/operations.ts`) — after
   importFromContent succeeds, resolves sync.repo_path, computes the
   v0.32.8 source-aware path layout (default: brainDir/<slug>.md;
   non-default: brainDir/.sources/<id>/<slug>.md), serializes the
   freshly-written Page via `serializePageToMarkdown`, writes the file.
   Returns a `write_through: {written, path}` field in the put_page
   response so callers can see what happened.

   Trust gating:
   - subagent sandbox (viaSubagent without allowedSlugPrefixes) → DB-only
   - dry-run → DB-only (handler's early-return short-circuits before
     write-through; documented via the dry_run response field)
   - no sync.repo_path configured → DB-only, skipped reason returned
   - sync.repo_path points at a non-existent dir → DB-only, skipped
   - all other writes → write-through

   Failure isolation: disk-write failures are LOGGED loud but do NOT
   roll back the DB write. DB is the durable record; the
   phantom-redirect pass exists for drift cleanup if it ever shows up.
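The trust gating and failure isolation above can be sketched as a decision function plus a best-effort disk write that runs only after the DB write. The input shape, the `skipped` reason strings, and the function names are hypothetical; the gating conditions and the log-but-don't-roll-back behavior follow the commit text, and treating an empty `allowedSlugPrefixes` array like a missing one is an assumption.

```typescript
import { existsSync, mkdirSync, writeFileSync } from "node:fs";
import { dirname } from "node:path";

interface WriteThroughInput {
  dryRun: boolean;
  viaSubagent: boolean;
  allowedSlugPrefixes?: string[];
  repoPath?: string; // sync.repo_path
}

type WriteThroughPlan = { write: true } | { write: false; skipped: string };

function planWriteThrough(input: WriteThroughInput): WriteThroughPlan {
  if (input.dryRun) return { write: false, skipped: "dry_run" };
  if (input.viaSubagent && !input.allowedSlugPrefixes?.length) {
    return { write: false, skipped: "subagent_sandbox" }; // sandboxed subagent: DB-only
  }
  if (!input.repoPath) return { write: false, skipped: "no_repo_path" };
  if (!existsSync(input.repoPath)) return { write: false, skipped: "repo_path_missing" };
  return { write: true };
}

// Runs only after the DB write has committed. Disk failures are logged loudly and
// reported in the result, never thrown, so the DB row stays the durable record.
function writeThrough(path: string, markdown: string): { written: boolean; path: string } {
  try {
    mkdirSync(dirname(path), { recursive: true });
    writeFileSync(path, markdown, "utf8");
    return { written: true, path };
  } catch (err) {
    console.error(`[put_page] write-through failed for ${path}:`, err);
    return { written: false, path };
  }
}
```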

TESTS
- test/ingestion/put-page-write-through.test.ts (10 cases against PGLite):
  happy path (file land, provenance stamp local + remote), trust gating
  (subagent sandbox, dry-run, trusted-workspace), config edges (no
  repo_path, missing dir), multi-source filing (.sources/<id>/),
  failure isolation (DB write survives a disk failure).
- Migration v80 verified across both engines via the existing
  test/migrate.test.ts + test/bootstrap.test.ts coverage (~125 cases).

369 total tests passing in the ingestion + markdown + migrate bundle.
typecheck clean.

NOTES
- Bootstrap probes for the v80 provenance columns are NOT yet added
  to applyForwardReferenceBootstrap on either engine. This is safe
  for v0.38 because no SCHEMA_SQL CREATE INDEX or FK references the
  new columns — migration v80 is the only consumer, and it runs
  AFTER SCHEMA_SQL replay. A future commit may add bootstrap probes
  + REQUIRED_BOOTSTRAP_COVERAGE entries as defense-in-depth (eng
  review E4).

- The trusted-workspace path (dream cycle's reverseWriteRefs in
  synthesize.ts) still runs its own write at synthesize phase time.
  Both paths writing the same file is idempotent (byte-identical
  serialization), but a future commit may simplify reverseWriteRefs
  to skip pages whose file already matches.

NEXT IN WAVE
gbrain capture verb (the single human-facing entrypoint); daemon
rename autopilot -> ingest with forever-alias + plist migration;
doctor inotify probe (Linux); content-type processor router
(PDF + image OCR + audio transcribe stubs); cron-scheduler refactor
+ OpenClaw credential auto-migrate; skillpack contract docs +
reference pack; VERSION bump.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
WHAT YOU CAN NOW DO
One command, local or thin-client, synchronous receipt with the resulting
page slug. The answer to "what is the best way to get data into the brain?"
is now: just type `gbrain capture` and the right thing happens.

  # the basic case
  gbrain capture "remember to follow up on the X deal"

  # from a file
  gbrain capture --file ./notes/today.md --slug daily/2026-05-21

  # from a pipe (shell pipelines)
  echo "from stdin" | gbrain capture --stdin

  # script-friendly: print just the slug
  SLUG=$(gbrain capture "a thought" --quiet)

  # JSON for agents
  gbrain capture "..." --json

Default slug is `inbox/YYYY-MM-DD-<hash8>`, deterministic for the same
content on the same UTC day, so re-running idempotently lands the same
page. The receipt block on stdout shows slug + status + content_hash +
on-disk path so you can confirm where the page went without a follow-up
`gbrain query`.

The local-install path routes through the put_page operation with the
v0.38 write-through plumbing landed in the prior commit, so the page
hits both the DB AND the file tree in one move. Thin-client installs
route through `callRemoteTool('put_page', ...)` so the server's
write-through handles disk persistence the same way.

WHAT THIS COMMIT BUILDS
- src/commands/capture.ts (new ~290 LOC):
  - `defaultSlug(content)` — UTC-stable `inbox/YYYY-MM-DD-<hash8>`
  - `parseArgs(args)` — positional + flag parsing with --file / --stdin
    / --slug / --type / --source / --quiet / --json / --help
  - `buildContent(rawBody, opts)` — wraps unstructured prose in
    frontmatter (type + title + captured_via + captured_at) and a
    leading `# Title` heading; passes through if the body already
    looks like markdown
  - `runCapture(engine, args)` — local install routes through the
    in-process put_page operation; thin-client routes through MCP.
    `--quiet` prints just the slug; `--json` prints structured output;
    default prints a 5-line receipt block.

- src/cli.ts:
  - Adds `case 'capture'` dispatch
  - Adds `'capture'` to the CLI_ONLY set so cli.ts wires it correctly
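A hand-rolled parser covers the flag set listed above. This sketch assumes `--file`, `--slug`, `--type`, and `--source` take a value while `--stdin`, `--quiet`, `--json`, and `--help` are booleans (consistent with the usage examples earlier in this commit), and that leftover positional tokens are joined with single spaces; `parseCaptureArgs` and `CaptureOpts` are hypothetical names, not the shipped `parseArgs`.

```typescript
interface CaptureOpts {
  text?: string;
  file?: string;
  slug?: string;
  type?: string;
  source?: string;
  stdin: boolean;
  quiet: boolean;
  json: boolean;
  help: boolean;
}

const VALUE_FLAGS = new Set(["--file", "--slug", "--type", "--source"]);

function parseCaptureArgs(args: string[]): CaptureOpts {
  const opts: CaptureOpts = { stdin: false, quiet: false, json: false, help: false };
  const positional: string[] = [];
  for (let i = 0; i < args.length; i++) {
    const arg = args[i];
    if (VALUE_FLAGS.has(arg)) {
      const value = args[++i]; // consume the flag's value
      if (value === undefined) throw new Error(`${arg} requires a value`);
      opts[arg.slice(2) as "file" | "slug" | "type" | "source"] = value;
    } else if (arg === "--stdin") opts.stdin = true;
    else if (arg === "--quiet") opts.quiet = true;
    else if (arg === "--json") opts.json = true;
    else if (arg === "--help") opts.help = true;
    else positional.push(arg);
  }
  // Unquoted multi-word thoughts arrive as several tokens; join them back up.
  if (positional.length > 0) opts.text = positional.join(" ");
  return opts;
}
```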

TESTS
test/commands/capture.test.ts (21 cases against PGLite):
- defaultSlug helper: shape + determinism + UTC math
- parseArgs: positional + multi-token join + every flag
- buildContent: prose wrapping, --type override, no double-wrap
  for pre-frontmattered content, title cap at 80 chars,
  --source provenance stamp
- Integration: inline content lands in DB + on disk, default slug
  shape, --file reads from disk, --json structured output,
  --help returns without engine roundtrip

271 total tests passing in the bundle. typecheck clean.

NOTES
- Thin-client routing relies on `callRemoteTool('put_page', ...)` from
  src/core/mcp-client.ts. Identical UX to the local path because the
  server's put_page handler runs the same write-through plumbing.

- buildContent's "looks like markdown" heuristic is intentionally
  simple — first-line heading or frontmatter delimiter is the trigger.
  Users who care about exact formatting pass a pre-formatted --file.
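The "looks like markdown" heuristic and the prose wrapping can be sketched as below. Assumptions: a heading means a first line starting with one to six `#` characters and a space, the default type is `note`, `captured_via` defaults to `capture` and takes the `--source` value when given, and values are quoted with `JSON.stringify`, which yields valid YAML double-quoted strings for ordinary text. The 80-character title cap and the no-double-wrap rule come from the commit; the function names are hypothetical.

```typescript
// Trigger: first non-blank line is a markdown heading or a frontmatter delimiter.
function looksLikeMarkdown(body: string): boolean {
  const firstLine = body.trimStart().split("\n", 1)[0];
  return /^#{1,6}\s/.test(firstLine) || firstLine.trim() === "---";
}

function buildCaptureContent(
  rawBody: string,
  opts: { type?: string; source?: string; now?: Date } = {},
): string {
  if (looksLikeMarkdown(rawBody)) return rawBody; // no double-wrap
  const body = rawBody.trim();
  const title = body.split("\n", 1)[0].slice(0, 80); // title capped at 80 chars
  const frontmatter = [
    "---",
    `type: ${opts.type ?? "note"}`,
    `title: ${JSON.stringify(title)}`,
    `captured_via: ${JSON.stringify(opts.source ?? "capture")}`,
    `captured_at: ${(opts.now ?? new Date()).toISOString()}`,
    "---",
  ].join("\n");
  return `${frontmatter}\n\n# ${title}\n\n${body}\n`;
}
```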

NEXT IN WAVE
Daemon rename autopilot -> ingest with forever-alias + plist migration;
doctor inotify probe (Linux); content-type processor router
(PDF + image OCR + audio transcribe stubs); cron-scheduler refactor
+ OpenClaw credential auto-migrate; skillpack contract docs +
reference pack; VERSION bump.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…skill

VERSION 0.37.1.0 → 0.38.0.0 (trio: VERSION, package.json, CHANGELOG header).
CHANGELOG entry written in user-facing ELI10-lead voice per CLAUDE.md
release-summary rules. README's pre-loop section gains a new "How to get
data in (v0.38+)" block leading with `gbrain capture`.

skills/capture/SKILL.md (NEW) so agents route "capture this" / "save this
thought" / "remember this" / "drop this in the inbox" / "save to brain" to
the capture verb. RESOLVER.md updated with the new triggers (sits above
idea-ingest/media-ingest/meeting-ingestion in the content-ingestion
section as the "simple thought" path).

E2E roundtrip test (test/e2e/ingestion-roundtrip.test.ts) covers the gap:
inbox-folder source -> daemon -> ingest_capture handler -> DB page,
including:
- Full pipeline: file drop appears as page in DB + file moves to .archived/
- Dedup catches byte-identical content from a different filename
- Multi-source coordination: two distinct inbox dirs, two sources, daemon
  ingests both events independently

The test runs against an in-memory PGLite (no DATABASE_URL needed) so it
exercises the substrate-level wiring in the standard test suite. A
follow-up commit can add a full-process e2e (gbrain serve --http + real
OAuth client + POST /ingest) that requires DATABASE_URL.

399/399 v0.38 wave tests passing (910 assertions). typecheck clean.
bun run verify gate green across all 14 shell checks.

DEFERRED TO FOLLOW-UP RELEASES (called out in CHANGELOG)
- Daemon rename autopilot -> ingest + forever-alias + plist migration
- cron-scheduler skill refactor + OpenClaw credential auto-migrate
- Content-type processors (PDF / OCR / audio / video)
- gbrain doctor inotify probe (Linux)
- Publisher DX cathedral: gbrain skillpack init --kind=ingestion-source,
  gbrain ingest test --watch, ingest tail, ingest validate
- Reference pack at examples/skillpack-ingestion-reference/ + 3-stage
  tutorial in docs/ingestion-source-skillpack.md

These are polish items; the substrate is shipped and queryable, and
skillpack publishers can build sources against the IngestionTestHarness
public export today.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…conformance

The v0.38.0.0 release-hygiene commit landed cleanly against the v0.38 wave
suite but tripped four categories of full-suite tests. This commit fixes
each. The remaining failure (doctorReportRemote > "healthy" status) was
verified pre-existing via `git stash + bun test` and is not caused by
v0.38; left alone.

Fix 1 — `schema-bootstrap-coverage.test.ts` (s1)
The test parses MIGRATIONS for ALTER TABLE ADD COLUMN statements and
fails if any column is not covered by `applyForwardReferenceBootstrap`
on both engines. Migration v80's four provenance columns triggered
the failure. Bootstrap probes added to both engines + 4 entries
appended to REQUIRED_BOOTSTRAP_COVERAGE:

- src/core/pglite-engine.ts — 4 EXISTS probes + state field + needs
  flag + ALTER TABLE block when bootstrap fires
- src/core/postgres-engine.ts — same pattern
- test/schema-bootstrap-coverage.test.ts — 4 coverage entries

Fix 2 — `check-resolvable.test.ts` (s3 — orphan_trigger)
RESOLVER.md references skills via name; check-resolvable cross-checks
against skills/manifest.json. The new `capture` skill was missing the
manifest entry; added between `brain-ops` and `idea-ingest` so the
manifest order mirrors the resolver order.

Fix 3 — `skills-conformance.test.ts` (s8)
Every SKILL.md must have `## Contract`, `## Output Format`, and
`## Anti-Patterns` sections. skills/capture/SKILL.md was missing all
three (initial draft skipped them); now compliant with concrete
content per the v0.38 contract.

Fix 4 — `build-llms.test.ts` (s6)
README + CHANGELOG edits in the release-hygiene commit caused
llms-full.txt to drift behind. Regenerated via `bun run build:llms`.
Per CLAUDE.md: any user-facing docs edit MUST run build:llms before
push.

The full bun-test parallel runner now passes everywhere except the
pre-existing `doctorReportRemote > healthy status` failure (50/100
score on an empty fresh brain — this is a pre-v0.38 health-score
tuning issue and orthogonal to ingestion work).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Conflicts resolved:
- VERSION → 0.38.0.0 (higher semver wins)
- package.json → 0.38.0.0 (trio agreement)
- CHANGELOG.md → keep my v0.38.0.0 entry on top; master's v0.37.2–v0.37.9
  entries stay below
- src/core/markdown.ts → combined imports (Page, PageType from ./types.ts
  AND safeLoad as yamlSafeLoad from js-yaml)
- src/core/migrate.ts → migration COLLISION at v80. Master claimed v80
  with takes_unresolvable_quality_v0_37_2_0 first; my
  pages_provenance_columns renumbered v80 → v81. Both migrations preserved.

Bootstrap probe comments + REQUIRED_BOOTSTRAP_COVERAGE comment updated
to reference v81 (the migration number, not the column shape, moved).

bun install picked up new deps from master's lockfile: @types/js-yaml,
fast-check.

bun run typecheck → clean.
Renumbers the in-flight ingestion-cathedral release to v0.38.1.0.
Trio (VERSION, package.json, CHANGELOG.md) bumped together.

bun run typecheck → clean.
@garrytan garrytan changed the title v0.38.0.0 ingestion cathedral — gbrain capture + write-through + IngestionSource contract v0.38.1.0 ingestion cathedral — gbrain capture + write-through + IngestionSource contract May 21, 2026
garrytan added 3 commits May 21, 2026 22:01
Conflicts resolved:
- VERSION → 0.38.1.0 (higher semver wins; master bumped 0.37.9.0 → 0.37.10.0)
- package.json → 0.38.1.0 (trio agreement)
- CHANGELOG.md → my v0.38.1.0 entry stays on top; master's new v0.37.10.0
  entry preserved directly below

Master's v0.37.10.0 brings the init env-detection + interactive picker +
preflight invariants wave (#1278). No collisions with v0.38 ingestion
substrate.

bun install + bun run typecheck → clean.
Conflicts resolved:
- VERSION → 0.38.1.0 (higher semver wins; master bumped 0.37.10.0 → 0.37.11.0)
- package.json → 0.38.1.0 (trio agreement)
- CHANGELOG.md → my v0.38.1.0 entry stays on top; master's new v0.37.11.0
  entry inserted between mine and v0.37.10.0
- src/cli.ts CLI_ONLY Set → union of master's `reinit-pglite` and my
  `capture` CLI verbs

Master's v0.37.11.0 brings the fresh-install PGLite embedding setup fix
wave (#1286): default vector(1280) schema matching the gateway's
zembed-1 default, `gbrain reinit-pglite` wipe-and-reinit command, and
proper ZE API key plumbing. No collisions with v0.38 ingestion substrate
beyond the cli.ts dispatcher Set.

bun install + bun run typecheck → clean.
Master sits at 0.37.11.0; 0.38.0.0 is the natural next slot rather than
skipping a release. Trio (VERSION, package.json, CHANGELOG.md) bumped
together. Migration v81 + ingestion substrate stay identical — this is a
header-only renumber.

bun run typecheck → clean.
@garrytan garrytan changed the title v0.38.1.0 ingestion cathedral — gbrain capture + write-through + IngestionSource contract v0.38.0.0 ingestion cathedral — gbrain capture + write-through + IngestionSource contract May 22, 2026
…v81 + webhook E2E

Three gaps surfaced from a v0.38 audit against what shipped vs what was
covered. All three filled:

1. **test/markdown-serializer.test.ts** (NEW, 19 cases) — pure-function
   coverage of `serializePageToMarkdown` + `resolvePageFilePath`, the
   DRY extract that the dream-cycle reverse-render and put_page
   write-through both consume. Pre-fix nothing pinned the
   frontmatter-override merge precedence, the type/title defaults, or
   the source-aware filing layout (default → `<brainDir>/<slug>.md`,
   non-default → `<brainDir>/.sources/<source_id>/<slug>.md`). Future
   schema-shape changes to either helper now surface immediately.

2. **test/migrate.test.ts — v81 cases** (10 new cases, two describe
   blocks) — structural assertions on `pages_provenance_columns`
   (four nullable columns, no NOT NULL, no DEFAULT, no index — the
   ADD COLUMN stays metadata-only) plus a PGLite round-trip that
   asserts the columns appear post-`initSchema`, accept direct UPDATEs,
   and survive the historical-page NULL scenario. The
   schema-bootstrap-coverage test already pinned the forward-reference
   probe contract; this fills the migrate.test.ts contract gap.

3. **test/e2e/serve-http-ingest-webhook.test.ts** (NEW, 16 cases) — HTTP
   contract coverage for POST /ingest. The pre-existing
   ingestion-roundtrip E2E explicitly defers the full-process case
   (gbrain serve --http + POST /ingest + real OAuth) to a separate test;
   it covers the in-process daemon → handler → DB pipeline, NOT the real
   HTTP route.
   This file fills that gap. Spawns real gbrain serve --http against
   real Postgres, mints OAuth tokens with various scopes, exercises:
     - Auth gate (missing → 401; read-only → 403)
     - Body validation (empty → 400 with error: empty_body)
     - Content-type allowlist (image/png → 415 with skillpack hint;
       application/pdf → 415; text/plain + application/json + text/html
       all accepted; unknown text/* falls through to text/plain)
     - X-Gbrain-Content-Type / Source-Id / Source-Uri / Slug header
       overrides
     - Idempotency (same content + same client = identical job_id via
       queue dedup on content_hash)
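The serializer behaviors pinned by test/markdown-serializer.test.ts (item 1 above) can be sketched as two helpers. Assumptions: overrides are spread last so they win on key conflicts, and the default source is identified by the literal id `default`; neither detail is stated in this PR, and `mergeFrontmatter` and `pageFilePath` are hypothetical stand-ins for `serializePageToMarkdown` and `resolvePageFilePath`.

```typescript
import { join } from "node:path";

type Frontmatter = Record<string, unknown>;

// Page-derived fields first, caller overrides last, so overrides win on conflict
// (e.g. the dream cycle's { dream_generated: true }).
function mergeFrontmatter(pageFields: Frontmatter, overrides: Frontmatter = {}): Frontmatter {
  return { ...pageFields, ...overrides };
}

const DEFAULT_SOURCE_ID = "default"; // hypothetical id for the default source

// default source -> <brainDir>/<slug>.md
// other sources  -> <brainDir>/.sources/<sourceId>/<slug>.md
function pageFilePath(brainDir: string, slug: string, sourceId: string = DEFAULT_SOURCE_ID): string {
  return sourceId === DEFAULT_SOURCE_ID
    ? join(brainDir, `${slug}.md`)
    : join(brainDir, ".sources", sourceId, `${slug}.md`);
}
```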

Also wires three new entries into `scripts/e2e-test-map.ts` so changes
to `src/commands/serve-http.ts`, `src/core/ingestion/**`, or the
`ingest-capture` Minion handler auto-trigger the relevant E2Es under
`bun run ci:local:diff`.

Verified locally:
- bun test test/markdown-serializer.test.ts → 19/19 green
- bun test test/migrate.test.ts -t "v81" → 10/10 green
- bun test test/e2e/serve-http-ingest-webhook.test.ts (real Postgres on
  ephemeral 5435) → 16/16 green
- bun test test/select-e2e.test.ts → 24/24 green (selector test still
  honors the v0.38 entries)
- bun run typecheck → clean

E2E DB lifecycle handled per CLAUDE.md (spin up pgvector:pg16 on a free
port, bootstrap via `gbrain doctor --json`, run, tear down).
@garrytan garrytan merged commit 26c5458 into master May 22, 2026
8 checks passed
mgunnin added a commit to mgunnin/gbrain that referenced this pull request May 28, 2026
* upstream/master:
  v0.38.2.0 fix(doctor): bounded frontmatter scan + partial-state surfacing (supersedes garrytan#1287) (garrytan#1297)
  v0.38.1.0 feat(agents): provider-agnostic subagent loop + remote MCP dispatch + budget meter (garrytan#1289)
  v0.38.0.0 ingestion cathedral — gbrain capture + write-through + IngestionSource contract (garrytan#1275)
  v0.37.11.0: fresh-install PGLite embedding setup fix wave (garrytan#1286)
  v0.37.10.0 feat(init): env-detection + interactive picker + preflight invariants (garrytan#1278)
  v0.37.9.0 fix(frontmatter): canonical-style normalization for tag arrays (garrytan#1252)
  v0.37.8.0 feat: voyage-code-3 discoverability + reindex-code cost-preview fix (garrytan#1267)
  v0.37.7.0 fix wave: federated brains + autopilot safety + OAuth confidential clients (garrytan#1253)
  v0.37.6.0 feat(ai): OpenRouter recipe + generic default_headers seam (cherry-pick garrytan#1210) (garrytan#1246)
  v0.37.5.0 fix(markdown): YAML-aware NESTED_QUOTES validator (stops flagging valid YAML) (garrytan#1229)
  feat: pgGraph-inspired CI scaffolding wave (v0.37.4.0) (garrytan#1228)
  v0.37.3.0 feat: skill_brain_first doctor check + auto-fix + declarative opt-out (supersedes garrytan#1206) (garrytan#1215)
  v0.37.2.0: takes_resolution_consistency CHECK accepts 'unresolvable' (garrytan#1211)
  v0.37.1.0 feat: brainstorm + lsd — bisociation idea generator grounded in your own brain (garrytan#1214)
  v0.37.0.0 feat(skillpack): registry cathedral — third-party publish + install + 10/10 quality bar (garrytan#1208)
  v0.36.6.0 feat: cross-modal search wave (text↔image + unified column + LLM intent) (garrytan#1165)