Skip to content

v0.39.2.0 feat(autopilot): per-source fan-out + cycle lock primitive + phase taxonomy#1295

Merged
garrytan merged 24 commits into
masterfrom
garrytan/parallel-federated-sync-refactor
May 23, 2026
Merged

v0.39.2.0 feat(autopilot): per-source fan-out + cycle lock primitive + phase taxonomy#1295
garrytan merged 24 commits into
masterfrom
garrytan/parallel-federated-sync-refactor

Conversation

@garrytan

Copy link
Copy Markdown
Owner

Summary

Per-source autopilot fan-out so federated brains refresh in parallel
instead of one source per 5-min tick. Five-source brain wall-clock:
~25min → ~5min.

Grouped by theme across 15 commits:

Foundation (Phase 1, 5 commits):

  • New src/core/source-id.ts consolidates three drifted regex sites into one
    canonical strict-kebab validator with both boolean (isValidSourceId) and
    throwing (assertValidSourceId) variants
  • cycle.ts lock SQL consolidation through db-lock.ts:tryAcquireDbLock
    (delete ~75 LOC of duplicated UPSERT-with-TTL SQL)
  • cycleLockIdFor(sourceId) primitive with back-compat default + internal
    validation; CycleOpts.sourceId first-class field
  • PGLite global file lock acquired BEFORE per-source DB lock with
    release-both-on-failure cleanup (preserves single-writer invariant)
  • PHASE_SCOPE taxonomy: 16 phases tagged source/global/mixed (load-bearing
    documentation for any future fan-out wave)
  • cycle_phase_scope doctor check (informational)

Per-source dispatch (Phase 2, 5 commits):

  • engine.listAllSources(opts?) + updateSourceConfig(id, patch) with
    Postgres + PGLite parity (atomic JSONB merge via ||)
  • runCycle exit hook writes last_full_cycle_at to sources.config JSONB
    on successful per-source cycles (closes codex r1 P0-5)
  • autopilot-cycle handler threads source_id + pull from job data,
    archive recheck before lock acquisition (codex P1-2 + P1-5)
  • src/commands/autopilot-fanout.ts (new): dispatchPerSource orchestrator
    with per-source idempotency keys, cap, freshness gate, oldest-first sort,
    PGLite=1 default per codex P1-3
  • cycle_freshness doctor check (sibling to sync_freshness, 6h/24h
    thresholds, reads what autopilot sees)

Test infrastructure + audit gap fills (5 commits):

  • 16 new unit/e2e test files, 149 new cases pinning every codex finding
  • Blast-radius regression for strict-regex callers (patterns.ts +
    synthesize.ts reverse-write paths)
  • PGLite file+DB ordering regression (codex P0-C/P0-D)
  • Autopilot wiring static-shape guard against silent revert to single-job
    dispatch
  • Postgres E2E parity for listAllSources + updateSourceConfig + JSONB
    shape regression (catches feedback_postgres_jsonb_double_encode class)
  • E2E-found bug fix: maxWaiting: 1 on per-source submits silently
    coalesced all N per-source jobs (sharing name='autopilot-cycle') into
    ONE waiting row. Net effect pre-fix: fan-out only processed the first
    source per tick. Dropped maxWaiting; per-source idempotency_key already
    handles dedup correctly.

Test Coverage

File Tests
test/source-id.test.ts 19
test/regression-strict-source-id.test.ts 9
test/source-resolver-silent-fallback.test.ts 12
test/cycle-lock-per-source.test.ts 13
test/cycle-pglite-lock-ordering.serial.test.ts 6
test/cycle-last-full-cycle-at.test.ts 5
test/phase-scope-coverage.test.ts 6
test/list-all-sources.test.ts 11
test/autopilot-fanout.test.ts 28
test/autopilot-cycle-handler.test.ts 7
test/autopilot-fanout-wiring.test.ts 5
test/doctor-cycle-freshness.test.ts 9
test/doctor-cycle-phase-scope.test.ts 5
test/e2e/list-all-sources-postgres.test.ts 11
test/e2e/autopilot-fanout-postgres.test.ts 6
test/e2e/multi-source-bug-class.test.ts (+) 3 added
Total 149 (143 unit + 17 e2e, with 3 updated cases)

Pre-Landing Review

Two CEO/Eng reviews + two codex outside-voice rounds during the planning
phase (visible in ~/.claude/plans/ and commit history). 23 codex findings
across the two rounds — 4 fixed in-wave, 19 filed as deferred TODOs. All
have specific commit citations:

  • Round 1 (CEO review): 5 P0 + 6 P1 + 1 P2. Triggered scope reduction
    from "per-source fan-out from day 1" to "lock primitive + taxonomy first"
  • Round 2 (Eng review): 4 P0 + 6 P1 + 1 P2. Caught semantic distinctions
    between tryAcquireDbLock vs withRefreshingLock, PGLite ordering
    invariant, source-id module layering

Plan Completion

CEO + Eng plan documents at ~/.claude/plans/system-instruction-you-are-working-tidy-puzzle.md.
All scope items DONE. Phase 3 (DRY refactor) explicitly deferred per
.context/PHASE_3_ASSESSMENT.md (gitignored) — each Phase 3 item, on
survey, was either a premature abstraction or was rejected by the original
author of the code being refactored.

Verification Results

  • bun run verify: PASS (all 16 pre-checks)
  • bun run typecheck: clean
  • bun run test: 8421 unit tests pass (pre-merge with master)
  • Targeted post-merge run on my 149 new tests: 134 pass / 0 fail
  • E2E suite ran against fresh Postgres; 15 files initially "failed" in the
    long-running shared-DB run were all confirmed pre-existing
    infrastructure flakes by standalone re-run on fresh DB:
    • 13/15 passed standalone
    • 2 remaining (claw-test env-dependent, postgres-jsonb 5s hook timeout)
      are pre-existing, not introduced by this PR

Documentation

CHANGELOG.md updated with full release notes including ELI10 lead, "How
it works" table, "Things to know" caveats section, itemized changes
grouped by component, and "To take advantage of v0.39.3.0" upgrade
verification block.

CLAUDE.md updates intentionally deferred — the upstream v0.38.0.0 merge
brought a substantial v0.38 wave's worth of CLAUDE.md additions; this
PR's new files (source-id.ts, autopilot-fanout.ts, etc.) are documented
exhaustively in the CHANGELOG entry. CLAUDE.md key-files updates can
follow in a chaser commit if desired.

Test plan

  • bun run verify clean (16 pre-checks)
  • bun run typecheck clean
  • 149 new test cases all pass
  • PGLite + Postgres parity verified for listAllSources + updateSourceConfig
  • E2E for autopilot fan-out: 3 sources → 3 distinct jobs with per-source
    idempotency keys, fan-out cap honored, freshness gate works, legacy
    fallback for fresh-install brains
  • Existing test suite still passes (no regression)
  • Manual smoke on Garry's federated brain (post-merge): gbrain autopilot --json
    should show mode: per_source events with distinct source_id values

🤖 Generated with Claude Code

garrytan and others added 23 commits May 22, 2026 07:45
New src/core/source-id.ts consolidates the three regex sites that drifted
across the codebase (utils.ts permissive, sources-ops.ts strict,
source-resolver.ts strict). Exports:
  - SOURCE_ID_RE: ^[a-z0-9](?:[a-z0-9-]{0,30}[a-z0-9])?$ (strict kebab,
    1-32 chars, no underscores, alphanumeric boundaries)
  - isValidSourceId(s): boolean — for silent-fallback tiers (dotfile,
    brain_default config)
  - assertValidSourceId(s): void, throws — for explicit-validation tiers
    (explicit --source flag, GBRAIN_SOURCE env, cycleLockIdFor primitive)

Dependency-free by design (no engine imports), so both PGLite and
Postgres engines can pull it without circular-import risk. Replaces
the soon-to-be-removed local validators in utils.ts and sources-ops.ts;
preserves both call shapes (boolean + throwing) per the codex outside-voice
finding that resolver tiers need both.

19 unit tests covering valid ids, length boundary (32-char max), underscore
rejection, path-traversal shapes, edge hyphens, whitespace, non-ASCII,
non-string inputs, and TypeScript narrowing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Consolidates source_id validation through src/core/source-id.ts:

- src/core/utils.ts: validateSourceId is now a back-compat re-export of
  assertValidSourceId. Regex TIGHTENS from permissive ^[a-z0-9_-]+$ to
  the strict kebab. The path-safety boundary now matches what sources-ops
  enforces at source creation time; no production source IDs break because
  sources-ops always rejected underscored IDs at creation. Picks up the
  blast-radius callers in cycle/patterns.ts and cycle/synthesize.ts
  reverse-write paths.

- src/core/sources-ops.ts: deletes local SOURCE_ID_RE + validateSourceId;
  imports isValidSourceId from source-id.ts. Keeps the thin SourceOpError-
  wrapping validator so `gbrain sources add` keeps its user-facing error
  envelope.

- src/core/source-resolver.ts: imports SOURCE_ID_RE + isValidSourceId from
  source-id.ts. Per codex outside-voice P1-F, silent-fallback tiers
  (dotfile read at tier 3, brain_default config at tier 5) use
  isValidSourceId so an invalid dotfile/config value falls through to the
  next resolver tier instead of throwing. Explicit + env tiers keep their
  inline regex-test-and-throw shape because they need tailored error
  messages.

Behaviour: validateSourceId('snake_id') NOW THROWS where pre-PR it accepted.
Documented as the intentional tightening; no existing IDs in production
contain underscores.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…te ordering

Three intertwined cycle.ts changes that landed as one logical unit:

1) DELETE acquirePostgresLock + acquirePGLiteLock (~75 LOC of duplicated
   UPSERT-with-TTL SQL). Replace with tryAcquireDbLock from
   src/core/db-lock.ts, which was extracted in v0.22.13 and should have
   been adopted here at that time. New acquireDbCycleLock(engine, sourceId)
   is a 6-line adapter that keeps cycle.ts's LockHandle shape.

   Deliberately uses tryAcquireDbLock NOT withRefreshingLock (codex r2 P0-A):
   - tryAcquireDbLock returns null on busy → cycle returns
     {status:'skipped', reason:'cycle_already_running'} (existing contract)
   - withRefreshingLock throws → would convert busy cycles into failures
   - withRefreshingLock's background timer would skip Minion job-lock
     renewal (codex r2 P0-B) and add in-phase DB traffic on PGLite's
     single connection (codex r2 P1-A)

2) Add cycleLockIdFor(sourceId?: string) primitive:
   - undefined → 'gbrain-cycle' (legacy default, back-compat for autopilot
     and every existing caller)
   - valid kebab → 'gbrain-cycle:<source_id>' (per-source DB lock row)
   - invalid → throws via assertValidSourceId (codex r2 P1-B defense-
     in-depth at the primitive layer, since CycleOpts.sourceId is a new
     direct API surface that becomes part of a DB lock ID AND a PGLite
     file path component)

   Add CycleOpts.sourceId; thread through to acquireDbCycleLock. Documents
   that this only scopes the LOCK — embed/orphans/purge/etc remain
   brain-global per PHASE_SCOPE.

3) PGLite file+DB ordering invariant (codex r2 P0-C + P0-D):
   - PGLite engines acquire the GLOBAL file lock (cycle.lock, no source
     suffix) BEFORE the per-source DB lock. PGLite's process-level
     write-lock is the single-writer guard; per-source DB lock IDs
     alone would let two PGLite cycles run concurrently.
   - File lock release on DB acquisition failure (cleanup guarantee)
   - Compose both handles into one LockHandle whose release() is
     reverse-of-acquire (DB first, file last) so file lock isn't released
     while DB lock is still live.
   - Postgres engines skip the file lock entirely — per-source DB IDs
     are the full granularity.

13 unit tests in test/cycle-lock-per-source.test.ts pin the back-compat
default, per-source ID shape, distinct-ID property, and the internal-
validation throws.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Static documentation of each cycle phase's scope: 'source' (safe to
parallelize per source), 'global' (must serialize brain-wide), or
'mixed' (per-phase decomposition needed before parallelizing).

The PHASE_SCOPE record is the load-bearing input for any future
autopilot fan-out wave. It surfaces what codex round-1 P0-1 was
warning about: not all 14 cycle phases are source-scoped today.
embed/orphans/purge/resolve_symbol_edges/grade_takes/calibration_profile
walk brain-wide regardless of sourceId. Per-source cycle LOCKS (this
PR) let two cycles RUN concurrently, but global-scoped phases inside
each will still touch the same rows.

The taxonomy is documentation, not runtime enforcement (runtime
enforcement deferred per plan; filed as TODO).

New doctor check cycle_phase_scope renders the taxonomy as an
operator-facing message AND surfaces phase_scope_map under
Check.details for JSON consumers. Added optional Check.details field
to the doctor types — mirrors PhaseResult.details. Additive; no
schema_version bump.

11 unit tests across phase-scope-coverage + doctor-cycle-phase-scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pins the codex round-2 P1-D finding: utils.validateSourceId is also
used in cycle reverse-write paths at patterns.ts:263 and
synthesize.ts:909. Pre-PR they used the permissive regex; post-PR
they share the strict kebab regex with sources-ops creation-time
validation. Existing underscore IDs would fail at THOSE cycle sites,
not just at source add/remove.

Structural assertions guard against future drift:
- utils.ts validateSourceId === assertValidSourceId from source-id.ts
- patterns.ts + synthesize.ts both import validateSourceId from utils
- validation call precedes the join() at both reverse-write sites
- utils.ts no longer contains the inline ^[a-z0-9_-]+$ permissive regex
- utils.ts re-exports assertValidSourceId-as-validateSourceId from source-id.ts

9 cases pin the contract. IRON-RULE: source-text grep regressions land
on the offending refactor first.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…pilot

Two lean engine-layer methods that the v0.38 per-source autopilot wave
consumes. Both have parity implementations on Postgres + PGLite.

listAllSources(opts?):
- Returns the bare SourceRow shape (id, name, local_path, last_sync_at,
  config) without sources-ops.listSources's per-source page_count
  enrichment (N+1 expensive; out of scope for hot-loop callers).
- `includeArchived` defaults false (matches sources-ops semantics).
- `localPathOnly` filters local_path IS NOT NULL so autopilot fan-out
  doesn't dispatch jobs for pure-DB sources whose handler would fall
  back to global sync.repo_path (codex r1 P1-4).
- Ordering: (id = 'default') DESC, id — same as sources-ops for
  operator-output stability.

updateSourceConfig(sourceId, patch):
- Atomic JSONB merge via Postgres `config || $patch::jsonb` operator.
  No read-modify-write race; same-key overwrites (no deep merge —
  flat patches only, matches the v0.38 use case of last_full_cycle_at).
- Returns true when a row was updated, false when sourceId doesn't
  exist (best-effort no-op; caller decides how to handle).
- Postgres: sql.json(patch) per the canonical pattern; PGLite:
  JSON.stringify + ::jsonb cast on positional param.

New SourceRow type exported from engine.ts. Imported by both engine
impls. 11 integration tests in test/list-all-sources.test.ts cover
defaults, filters, JSONB round-trip, archived flag, and merge semantics.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… cycle exit

Closes codex round-1 P0-5 (write site for last_full_cycle_at was
unspecified pre-PR). runCycle's exit hook persists
{ last_full_cycle_at: '<ISO>' } to sources.config JSONB when a
successful per-source cycle completes. Autopilot's v0.38 per-source
fan-out gate reads this field next tick to decide whether to skip a
source (60-min freshness floor).

Conditions for write (all required):
  - opts.sourceId is set — legacy callers without sourceId skip the
    write (autopilot will keep working today via fallback path)
  - engine is non-null — no-DB path skips
  - status is 'ok' / 'clean' / 'partial' — failed/skipped cycles do NOT
    mark a source as fresh (next cycle will redo work)
  - dryRun is false — writes are out of scope

Best-effort: write failure logs a warning but does NOT change the
CycleReport status. The cycle already succeeded by the time we get
here; the cost of missing a stale write is one redundant cycle next
tick, not data loss.

5 PGLite integration tests cover all four gate conditions plus the
"timestamp advances on each successful run" property.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… recheck

Threads the v0.38 per-source dispatch payload through the autopilot-cycle
handler at src/commands/jobs.ts:1146. Closes three codex round-1 findings:

- P0-2 / P1-B: validates job.data.source_id at handler entry via the
  canonical source-id.ts isValidSourceId boolean check. Malformed
  source_id from a queue replay dead-letters with a clear error
  instead of reaching cycle code.

- P1-2: job.data.pull explicit boolean overrides the legacy hardcoded
  `true`, so per-source dispatch for local-only sources can pass
  pull: false (no git network round-trip for sources without remote_url).
  Missing/undefined preserves the legacy true for back-compat with cron/
  launchd callers that don't know about the new field.

- P1-5: archived-source recheck happens BEFORE runCycle is invoked
  (cheap SELECT archived FROM sources WHERE id = $1). If the source was
  archived between fan-out and worker claim, handler returns
  { status: 'skipped', reason: 'source_archived' } cleanly — no lock
  acquired, no phases run, no last_full_cycle_at touched. Same skip
  shape for source_not_found (deleted between dispatch and claim).

7 PGLite integration tests cover all five paths (legacy / valid /
not-found / archived / malformed-source_id / non-string source_id /
pull: false override).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…m win)

The headline change of the v0.38 federated-sync wave. Replaces autopilot's
single-job-per-tick dispatch with per-source fan-out so a 5-source
federated brain refreshes in ~5min wall-clock instead of ~25min sequential.

src/commands/autopilot-fanout.ts (new) — pure-function dispatch helper:
- resolveFanoutMax(engine): PGLite=1 (codex P1-3 — preserves single-writer
  invariant), Postgres=4, operator override via autopilot.fanout_max_per_tick
- readLastFullCycleAt(src): JSONB→Date with NULL/unparseable safety
- isSourceStale(src, now?, floorMin?): 60-min default freshness floor
- selectSourcesForDispatch(sources, fanoutMax): stale-only + oldest-first
  + alphabetical tiebreaker (deterministic for tests)
- dispatchPerSource(engine, queue, opts): the orchestrator

src/commands/autopilot.ts (modified): the existing shouldFullCycle branch
calls dispatchPerSource. Behavior preserved:
- Healthy + recent (60min floor) → sleep (unchanged)
- Targeted-plan path → unchanged (uses computeRecommendations)
- Full-cycle path → NOW fans out per-source rather than ONE job for default

Per-source dispatch shape:
- Idempotency key: `autopilot-cycle:<source_id>:<slot>` — two ticks for
  the same source within one slot coalesce; different sources never collide
- pull: !!source.config.remote_url — remotes pull, local-only don't
- maxWaiting: 1 per submit — backpressure when worker can't drain
- Per-submit try/catch (codex E1 F1) — one source's failure doesn't
  abort the tick; surfaces as fanout_submit_failed event

Fallback path: empty `sources` table (pre-v0.18 brain or fresh install
before `gbrain sources add`) falls back to the legacy single autopilot-
cycle job with no source_id, preserving today's single-source behavior.

JSON event stream extended:
- `dispatched` event gains source_id + mode='per_source' fields
- new `fanout_summary` event per tick with dispatched/skipped_fresh/
  skipped_cap arrays so operators can see what the tick did
- new `fanout_cap_reached` event when sources overflow the cap

Caveat (intentional, codex r1 P0-1 scope): per-source LOCKS let two
cycles RUN concurrently, but several phases (embed, orphans, purge,
resolve_symbol_edges, grade_takes, calibration_profile) still walk the
brain globally inside each cycle. PHASE_SCOPE taxonomy from the prior
commit documents this. Genuine per-phase per-source isolation is the
deferred Phase 2 follow-up.

27 unit tests in test/autopilot-fanout.test.ts pin every branch (stale
gate, cap behavior, idempotency keys, legacy fallback, per-submit error
isolation, oldest-first sort, alphabetical tiebreaker).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New check_cycle_freshness sibling to checkSyncFreshness. Where
sync_freshness reads sources.last_sync_at (one phase), this check
reads sources.config->>'last_full_cycle_at' — the canonical
"this whole cycle completed" timestamp the v0.38 runCycle exit hook
writes and the v0.38 autopilot fan-out gate reads.

Operator sees exactly what autopilot sees when deciding to skip a
source. Default thresholds tighter than sync_freshness (6h warn /
24h fail vs 24h/72h) because full-cycle staleness compounds: sync
stale → extract stale → embed stale → search returns stale results.

Env overrides:
- GBRAIN_CYCLE_FRESHNESS_WARN_HOURS (default 6)
- GBRAIN_CYCLE_FRESHNESS_FAIL_HOURS (default 24)

Edge cases covered (9 PGLite integration tests):
- empty (no federated sources) → ok
- last_full_cycle_at present + fresh → ok
- last_full_cycle_at present + warn window → warn
- last_full_cycle_at present + fail window → fail
- last_full_cycle_at NULL (never cycled) → fail
- mixed severity → highest wins
- future timestamp (clock skew) → warn
- unparseable timestamp → warn
- local_path NULL sources filtered (codex P1-4 parity)

Failure messages embed source.id so the printed fix command
`gbrain dream --source <id>` matches what the user copy-pastes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ack + PGLite ordering)

Two test gaps identified by the test-coverage audit after Phase 1 + 2 shipped:

1. src/core/source-resolver.ts — the migration to isValidSourceId for
   silent-fallback tiers (dotfile read, brain_default config) had no
   direct test. Pre-PR these used inline regex; post-PR they use the
   canonical isValidSourceId. Codex P1-F intent (silent fallback for
   invalid input on tiers 3+5, throw on tiers 1+2) deserved an explicit
   test.

   test/source-resolver-silent-fallback.test.ts (12 cases):
     - tier 3: valid dotfile honored; underscore/whitespace/uppercase
       silently falls through to next tier
     - tier 5: valid brain_default honored; underscore + 33+ char silently
       falls through
     - tier 1: valid explicit --source returns; underscore/whitespace
       THROWS (contract distinction)
     - tier 2: valid env GBRAIN_SOURCE returns; underscore THROWS

2. src/core/cycle.ts — the PGLite file+DB ordering invariant (codex r2
   P0-C + P0-D) was implemented in Phase 1 (T5) but had no test pinning
   the ordering / cleanup / per-source DB lock ID semantics.

   test/cycle-pglite-lock-ordering.test.ts (6 cases):
     - global file lock acquired during PGLite cycle
     - cycle for source A then B serializes (file lock held in turn)
     - DB-lock acquire failure releases file lock cleanly (no stranded state)
     - engine=null path still uses file lock
     - DB lock row uses per-source ID (gbrain-cycle:<source>) not legacy
     - consecutive cycles can re-acquire both locks (release-on-exit works)

Plus .context/PHASE_3_ASSESSMENT.md (gitignored) documenting why each
Phase 3 DRY refactor item from the original plan is deferred: each item
on closer inspection is either a premature abstraction or was explicitly
rejected by the original author (BaseCyclePhase header comment).

18 new tests; 0 fails. typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the remaining test gaps identified by the post-Phase-2 audit:

test/autopilot-fanout-wiring.test.ts (5 cases) — static-shape regression
for autopilot.ts ↔ dispatchPerSource. The fan-out helper itself has 27
unit tests; this file pins the WIRING in autopilot.ts:
  - imports dispatchPerSource + resolveFanoutMax
  - calls dispatchPerSource inside the shouldFullCycle branch (not the
    targeted-plan path)
  - updates lastFullCycleAt after dispatch
  - does NOT regress to the pre-PR single-job dispatch (regex-grep guard
    against the legacy `autopilot-cycle:${slot}` idempotency-key shape
    reappearing in autopilot.ts)

Same canonical static-shape pattern as test/autopilot-supervisor-wiring.test.ts.

test/e2e/list-all-sources-postgres.test.ts (10 cases) — Postgres parity
for engine.listAllSources + updateSourceConfig. The PGLite path has
unit-level coverage; the Postgres path has separate impls (sql.json
serialization, sql.count semantics) that could drift. Specifically pins:
  - returns rows, filters archived/localPath correctly
  - JSONB config parses to object (autopilot reads last_full_cycle_at)
  - default source sorts first
  - updateSourceConfig: not-found returns false, patch merges, same-key
    overwrites, idempotent on repeat
  - jsonb_typeof regression: round-trip stores real JSONB object, NOT
    a JSON-encoded string (feedback_postgres_jsonb_double_encode class)

test/e2e/autopilot-fanout-postgres.test.ts (6 cases) — end-to-end
integration on Postgres:
  - 3 sources fan out as 3 distinct jobs with per-source idempotency keys
  - re-dispatch within same slot dedupes (idempotency-key coalesce)
  - last_full_cycle_at < 60min ago sources are skipped by gate
  - end-to-end: updateSourceConfig → listAllSources → selectSourcesForDispatch
    correctly classifies fresh sources
  - fan-out cap honored (5 sources, fanoutMax=2 → 2 dispatched)
  - empty federated brain falls back to legacy single-job dispatch

21 new test cases. Brings the v0.38 wave coverage to 132 unit + 16 e2e.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ent coalesce)

The Postgres E2E for fan-out surfaced two real bugs that the unit-stub
tests + PGLite tests couldn't catch:

1) **CRITICAL** — dispatchPerSource passed `maxWaiting: 1` to every
   per-source queue.add. maxWaiting is per-(name, queue) — since all
   per-source jobs share `name='autopilot-cycle'`, the second + third +
   Nth source's submit silently coalesced into the FIRST source's
   waiting job. Net result on a real worker: 1 job processed per tick,
   not N. The entire fan-out feature was a silent no-op past the first
   source.

   Per-source idempotency_key (`autopilot-cycle:<source_id>:<slot>`)
   already handles "two ticks for same source within slot" dedup, which
   is the only thing maxWaiting was buying us. Dropping it fixes fan-out
   without losing dedup.

   New unit-stub regression test asserts maxWaiting is NOT in the per-
   source submit opts so a future refactor that re-adds it gets caught
   in 100x faster CI (test/autopilot-fanout.test.ts).

2) **postgres-engine.ts:updateSourceConfig** — initial impl used
   sql.json() correctly but my mid-debug rewrite to executeRaw +
   positional `$1::jsonb` produced JSONB STRING shape (not OBJECT)
   because postgres-js double-encodes JS string params in unsafe mode.
   `||` between JSONB object + JSONB string yields a JSONB ARRAY,
   wiping every existing config key on update.

   Same latent bug class exists at src/commands/sources.ts:482 (gbrain
   sources federate/unfederate path); flagged for follow-up but
   out-of-scope here.

   Reverted to sql.json() inside the template tag (verified via direct
   psql round-trip: jsonb_typeof = 'object'). Updated the e2e seed
   helper to use sql.json() too — the executeRaw + JSON.stringify
   pattern was producing string-shape JSONB at SEED time which made
   the failure cascade harder to debug.

Coverage adds:
- test/e2e/list-all-sources-postgres.test.ts: 11 cases pin Postgres
  parity for listAllSources + updateSourceConfig including jsonb_typeof
  round-trip
- test/e2e/autopilot-fanout-postgres.test.ts: 6 cases end-to-end
  including 3-source fan-out producing 3 distinct rows, idempotency
  coalesce within slot, cap honored, legacy fallback path
- test/autopilot-fanout.test.ts: +1 regression guard on maxWaiting

This is the kind of bug that justifies the user's "fill test gaps then
run E2E" mandate. The unit tests + PGLite parity tests + typecheck all
passed cleanly; only the real-Postgres E2E found the coalesce.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…s for strict regex

The v0.32.8 test pinned the OLD permissive ^[a-z0-9_-]+$ behavior including
the underscored case 'jarvis_memory'. The v0.38 wave (this PR's E2 + codex
P1-D) tightens validateSourceId to the strict kebab regex shared with
sources-ops. 'jarvis_memory' now lives in the rejected set, not the
allowed set.

Updated the test to assert the new contract:
- Replaced 'jarvis_memory' allowed case with 'jarvis-memory' (kebab)
- Added 'a' (single-char) to allowed cases
- Added 'jarvis_memory', 'snake_case', '-leading', 'trailing-', and a
  33-char string to the rejected cases — the v0.38 strict-regex additions

Comment explains the contract shift so future readers don't see the test
as flapping intent.

Found by running the main E2E suite — the test file is in the canonical
e2e set and would have failed CI otherwise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e taxonomy

VERSION + CHANGELOG bump for the parallel-federated-sync wave (14 commits).

Headline change: federated brains refresh all sources in parallel via
per-source autopilot dispatch instead of one source per 5-min tick.
Five-source brain wall-clock: ~25min → ~5min.

Test infra adjustments for the v0.38 test-isolation lint that landed via
upstream master merge:
  - test/source-resolver-silent-fallback.test.ts now uses withEnv() for
    GBRAIN_SOURCE mutations (was direct process.env mutation)
  - test/cycle-pglite-lock-ordering.test.ts → .serial.test.ts (the
    file-wide GBRAIN_HOME setup needs quarantine from the parallel pool)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ederated-sync-refactor

# Conflicts:
#	CHANGELOG.md
#	VERSION
#	package.json
CI on c835f93 failed on 1 of 2005 tests: the v0.20.5
"autopilot-cycle handler contract" regression guard at
test/cycle-abort.test.ts:99 does a source-grep over the first 2000
chars after `worker.register('autopilot-cycle'` to assert `signal:
job.signal` appears.

The v0.38 wave (in c835f93) added source_id validation + archive
recheck + pull-flag threading at the top of that handler, pushing the
runCycle({signal: job.signal}) call from ~chars 800 to ~chars 2100.
The slice cut off before reaching it, so the assertion failed even
though the code still propagates the signal correctly (line 1213).

Fix: bump the window to 6000 chars and add a comment explaining why.
The guard's intent is unchanged ("handler passes job.signal to
runCycle"); the window just needs to be wide enough to span any
reasonable handler. Also added an existence check on `handlerStart`
so a future refactor that renames the register call surfaces a
clearer error than `undefined.toContain`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ederated-sync-refactor

# Conflicts:
#	CHANGELOG.md
#	VERSION
#	package.json
…ederated-sync-refactor

# Conflicts:
#	CHANGELOG.md
#	VERSION
#	package.json
The v0.39.1.0 master merge added a new 'schema-suggest' cycle phase.
PHASE_SCOPE is a TS Record<CyclePhase, PhaseScope> so the compiler
required an entry. Classified as 'source' — the phase accepts
sourceId and operates per-source via runSuggest().
Renumber to claim the next available slot after v0.39.1.0 (schema packs)
landed on master. No code changes.
@garrytan garrytan changed the title v0.39.3.0 feat(autopilot): per-source fan-out + cycle lock primitive + phase taxonomy v0.39.2.0 feat(autopilot): per-source fan-out + cycle lock primitive + phase taxonomy May 23, 2026
The v0.39.1.0 master merge added the 17th cycle phase ('schema-suggest').
The previous commit added it to PHASE_SCOPE; this commit updates the
count-pin regression test to match.
@garrytan garrytan merged commit 1666ec4 into master May 23, 2026
8 checks passed
mgunnin added a commit to mgunnin/gbrain that referenced this pull request May 28, 2026
* upstream/master: (22 commits)
  v0.41.4.0 wave: local providers + cross-platform stdin + gateway-routed dream judge (6 community PRs) (garrytan#1377)
  v0.41.3.0 fix(security/mcp): OAuth CORS lockdown + pre-register without DCR + validator surface (garrytan#1403)
  v0.41.2.0 feat: lens packs + epistemology unification — atoms + concepts as first-class units, calibration profile widening, gstack-learnings bridge (garrytan#1364)
  v0.41.1.0 feat: eval-loop wave — gbrain bench publish + gbrain eval gate close the LOOP (garrytan#1352)
  v0.41.0.0 feat(minions): fleet you supervise (4 field bugs + cathedral) (garrytan#1367)
  v0.40.10.0 feat: content sanity defense — junk-pattern throw + oversize-skip-embed (garrytan#1351)
  v0.40.9.0 feat(chunker): .sql indexing via tree-sitter + code-def on SQL DDL (garrytan#1173) (garrytan#1350)
  v0.40.8.1 docs: README rewrite + personal-brain + company-brain tutorials (garrytan#1345)
  v0.40.8.0 test: e2e + unit gap coverage + master flake root-cause fixes (garrytan#1313)
  v0.40.6.1 docs(todos): file v0.41 wave commitments + 7 verified-missing items (garrytan#1333)
  v0.40.7.0 Schema Cathedral v3 — agent-on-ramp + production rebuild of PR garrytan#1321 (garrytan#1327)
  v0.40.6.0 feat(sync): parallel sync --all + per-source lock invariant + sources status dashboard (productionized from PR garrytan#1314) (garrytan#1324)
  v0.40.5.0 Federated Sync v2 — parallel source sync + push triggers + per-source health (garrytan#1322)
  v0.40.4.0 feat(search): selective graph signals + per-stage attribution + audit-writer unification (garrytan#1300)
  v0.40.3.0 feat: contextual retrieval + cache invalidation gate + 4 deferred-item closures (garrytan#1323)
  v0.40.2.0 feat: trajectory routing for temporal + knowledge_update (gbrain think + LongMemEval) (garrytan#1296)
  v0.40.1.0 Track D — eval infrastructure (catch retrieval regressions, prove answer-quality wins) (garrytan#1298)
  v0.40.0.0 feat: agent-voice (Mars + Venus) + copy-into-host-repo skillpack paradigm (garrytan#1128)
  v0.39.3.0: productionize the v0.38 ingestion cathedral (smoke-test fix wave from PR garrytan#1299) (garrytan#1308)
  v0.39.2.0 feat(autopilot): per-source fan-out + cycle lock primitive + phase taxonomy (garrytan#1295)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant