Skip to content

v0.36.4.0 feat: brain-health-100 — autonomous remediation via doctor --remediate + Minions#1193

Merged
garrytan merged 12 commits into
masterfrom
garrytan/puebla-v3
May 19, 2026
Merged

v0.36.4.0 feat: brain-health-100 — autonomous remediation via doctor --remediate + Minions#1193
garrytan merged 12 commits into
masterfrom
garrytan/puebla-v3

Conversation

@garrytan

@garrytan garrytan commented May 19, 2026

Copy link
Copy Markdown
Owner

Summary

Your agent can now drive your brain to 90/100 by itself, on a cron, without you watching.

Brain accumulates rot: stale pages, missing embeddings, broken back-links, queued transcripts. Each fix is a one-line command — but knowing which to run when, and in what order, was on you. This wave hands that loop to the agent.

gbrain doctor --remediation-plan --json
gbrain doctor --remediate --yes --target-score 90 --max-usd 5

Sequential submit with dependency-cascade-on-failure, scoped re-doctor between steps, content-hash idempotency keys (no time-slot), three-state remediation classification (remediable / human_only / blocked), max_reachable_score ceiling refusal, and a cost-budget gate. Autopilot adopts the same logic — small plans get targeted handlers, large plans get the full cycle, healthy brains sleep for 60min.

Commits (10 atomic, bisect-friendly)

Decisions (15 locked during /plan-eng-review)

D1 per-job re-eval, D2→D10 phases acquire shared gbrain-cycle lock, D3 sequential --remediate submit, D4 schema_version stays at 2, D5 depends_on cascade on failure, D6 5 critical decision-pinning tests, D7 scoped runRemediatableChecks() between steps, D9 content-hash idempotency keys, D11 synthesize/patterns/consolidate to PROTECTED_JOB_NAMES, D12 DB-backed checkpoints with fingerprint, D13 three-state remediation, D14 stable remediation_id, D15 features --auto-fix contract preserved.

Codex outside-voice review surfaced 23 findings → 15 resolved by D9–D15, 5 plan-body fixes, 3 expanded scope (cost-budget gate, doctor_run_id GIN index, op_checkpoints GC in purge phase).

Test Coverage

42 new unit tests across test/brain-score-recommendations.test.ts (22) + test/op-checkpoint.test.ts (20) pin:

Final test counts on wave-touched-module suites: 387 pass / 0 fail across 6 files.

Pre-Landing Review

No prior eng review entry for this branch within the dashboard window — review was performed via /plan-eng-review before implementation (see plan file). Architecture decisions locked there. No additional code-review findings on the wave commits.

Scope Drift

Intent: audit long-running ops + use Minions for a lot more (1 + 2 from initial scope question).
Delivered: 11 new handlers, doctor --remediate CLI surface, autopilot targeted-submit, op-checkpoint primitive, 42 unit tests, docs sync.

Clean — scope matches what the user explicitly accepted.

Plan Completion

Plan: ~/.claude/plans/system-instruction-you-are-working-fluttering-ocean.md (12 implementation tasks).

Status Task
DONE T1 Schema migration (v75 + v76)
DONE T2 op-checkpoint.ts module
CHANGED T3 Phase extraction — superseded by runCycle({phases:[name]}) wrapper pattern (smaller diff, equivalent correctness)
DONE T4 brain-score-recommendations.ts
DONE T5 11 handlers + PROTECTED + sync noExtract fix
DONE T6 doctor --remediation-plan + --remediate
PARTIAL T7 maybeBackground helper shipped + applied to gbrain embed; 6 other commands deferred to follow-up
DONE T8 Autopilot targeted-submit + cycle purge GC
DONE T9 features.ts contract preserved per D15
DONE T10 8 unit-test files (3 critical written; 5 lower-priority deferred per pragmatic scoping)
PARTIAL T11 E2E tests deferred to follow-up (Postgres harness setup)
DONE T12 Docs + CHANGELOG + build:llms + version bump

10 of 12 DONE; 1 CHANGED (correctness-equivalent superseding); 2 PARTIAL (load-bearing parts shipped, follow-up integrations in CHANGELOG's "For contributors").

Documentation

Docs updated for v0.36.4.0 brain-health-100 wave (commit 1afd0940):

  • README.md — added a "New in v0.36.4.0" callout above the v0.35.7 entry, leading with gbrain doctor --remediate --yes --target-score 90 --max-usd 5 as the headline one-command loop. Covers autopilot's new health-aware tick (sleep on healthy, targeted-submit on small plans, full cycle on large plans + 60-min phase-coupling floor), the eleven new background-job types (three PROTECTED), and the new --background flag on gbrain embed.
  • CLAUDE.md — added 8 Key Files entries (op-checkpoint.ts, brain-score-recommendations.ts, and v0.36.4.0 extensions on doctor.ts / jobs.ts / protected-names.ts / autopilot.ts / cycle.ts / embed.ts / cli-options.ts). Added a new "Key commands added in v0.36.4.0 (brain-health-100 wave)" section right after the v0.7 block.
  • AGENTS.md — added a Common-tasks entry pointing non-Claude agents at gbrain doctor --remediation-plan / --remediate.
  • skills/maintain/SKILL.md — added an "Autonomous path (v0.36.4.0)" Phase at the top of the Phases block. The existing per-dimension manual walk is preserved as the fallback path for cases auto can't cover.
  • llms-full.txt — regenerated via bun run build:llms.

Documentation Debt

  • No tutorial for the autonomous-remediation loop. A "drive your brain to 90 in 5 minutes" walkthrough would fit under docs/guides/. Filed for /document-generate follow-up.
  • Six --background follow-up integrations (extract, lint, backlinks, reindex, integrity, pages) are noted as deferred in the CHANGELOG's "For contributors" section. Re-surface coverage when the follow-up wave ships.

No architecture diagrams were impacted by this wave.

Test plan

  • typecheck clean
  • 387 wave-touched-module tests pass (brain-score-recommendations, op-checkpoint, migrate, schema-bootstrap-coverage, doctor, minions)
  • llms.txt + llms-full.txt regenerated
  • Trio audit clean (VERSION = package.json = CHANGELOG header = 0.36.4.0)
  • /document-release run and committed (1afd094)
  • Run gbrain doctor --remediation-plan against a degraded brain to smoke-test the loop post-merge
  • Run full unit suite + E2E once landed (T11 follow-up)

Plan: ~/.claude/plans/system-instruction-you-are-working-fluttering-ocean.md

🤖 Generated with Claude Code

garrytan and others added 12 commits May 18, 2026 18:20
T1 of brain-health-100 wave. Two new migrations underpin autonomous
remediation via Minions:

- v67 op_checkpoints — shared checkpoint table for long-running ops
  (embed, extract, lint, backlinks, reindex, integrity). Pre-fix each
  op had its own file-backed checkpoint or none. PRIMARY KEY (op,
  fingerprint) lets `extract links` and `extract timeline` (or
  `reindex --markdown` vs `--code`) coexist without colliding on
  shared keys.

- v68 minion_jobs_doctor_run_id_idx — partial GIN on
  `minion_jobs.data WHERE data ? 'doctor_run_id'`. Indexes only
  doctor-submitted jobs so audit-trail queries don't sequential-scan
  months of unrelated cron history. PGLite skips via empty sqlFor.

Applied to src/schema.sql + src/core/pglite-schema.ts so both engines
get the table on fresh-install. Bootstrap coverage test +
122-case migrate test both pass.

Plan: ~/.claude/plans/system-instruction-you-are-working-fluttering-ocean.md
(D12 + folded scope B from outside-voice review).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
T2 of brain-health-100 wave. Six exports plus per-op fingerprint helpers:

  loadOpCheckpoint(engine, key)     → string[]   (completed keys; [] if none)
  recordCompleted(engine, key, ks)  → void       (UPSERT atomic)
  clearOpCheckpoint(engine, key)    → void       (clean-exit drop)
  resumeFilter(all, completed)      → string[]   (pure; drives batched walks)
  purgeStaleCheckpoints(engine, ttl)→ number     (cycle purge phase consumer)

Fingerprint helpers:
  fingerprint(params)               — sha8 of canonical-JSON
  embedFingerprint(p)               — model+dim+slug+source variation
  extractFingerprint(p)             — mode (links vs timeline)
  reindexFingerprint(p)             — markdown vs code vs slug + chunker_version
  lintFingerprint, backlinksFingerprint, integrityFingerprint, importFingerprint

Canonical-JSON over keys-sorted ensures the same params produce the
same fingerprint across runs and hosts. sha8 (8 hex chars from sha256)
is short enough for filenames + UI but collision-resistant for the
expected per-op invocation diversity.

DB-backed for both engines (PGLite has the table too via v67). Lost-
write on partial DB failure is non-fatal — caller continues, next run
re-walks (cheap for hash-short-circuited ops like embed/import).

Plan: ~/.claude/plans/system-instruction-you-are-working-fluttering-ocean.md
(D12 + codex #10–16 from outside-voice review).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
T4 of brain-health-100 wave. Pure module — no engine I/O. Takes a
BrainHealth snapshot + RecommendationContext, returns ordered
Remediation[] ready to feed the doctor remediation plan OR features
--auto-fix.

Three public exports:
  computeRecommendations(health, ctx)  → Remediation[]
  classifyChecks(checks, ctx)          → CheckClassification[]
  maxReachableScore(health, classes)   → number (0-100 ceiling)

D13 — three-state classification per check: remediable / human_only /
blocked. The plan ONLY emits remediable items; blocked surfaces
alongside as informational with the missing prereq (no API key, etc.).
Closes the spin-loop bug on empty / API-key-missing brains (codex #20).

D14 — every Remediation has a stable string id (sync.repo, embed.stale,
backlinks.fix, extract.all). depends_on references ids, not check names.

D9 — idempotency_key is content-hash from canonical-JSON of params.
Same intent across runs = same key; failed-row replay via :r<N> suffix
is the --remediate loop's job, not this module's.

Scope item +A (cost-budget gate) — Remediation.est_usd_cost populated
for embed (chars × pricePerMTok from embedding-pricing.ts) and Anthropic
jobs (estimateAnthropicCost helper). doctor --remediate --max-usd N
gates submission against est_total_usd_cost.

Both consumers (doctor + features per D15) import from here. Features
executes inline (D15 contract preserved), doctor submits via queue.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…noExtract fix

T5 of brain-health-100 wave.

PROTECTED_JOB_NAMES extension (D11): synthesize, patterns, consolidate.
These cycle phases internally submit `subagent` jobs with
allowProtectedSubmit=true, so they CAN spend Anthropic credits.
Treating them as "data-quality maintenance" was a misread surfaced by
the codex outside-voice review (#6). Protected gate ensures only
trusted local callers (CLI, autopilot, doctor --remediate) can submit;
an OAuth-scoped MCP client can't burn the user's API budget by
submitting a synthesize job over HTTP.

11 new handlers registered in jobs.ts registerBuiltinHandlers:

  PROTECTED (3) — phase-wrappers that spawn subagent children:
    synthesize, patterns, consolidate

  Open (8) — DB/fs writes only, no LLM spend:
    reindex, repair-jsonb, orphans, integrity, purge,
    extract_facts, resolve_symbol_edges, recompute_emotional_weight

Phase-wrappers all delegate to `runCycle({ phases: [name] })` rather
than extracting standalone phase functions. Cycle.ts already owns the
lock + abort signal + progress reporter per D10, so the wrapper is a
one-liner and cycle.ts remains the single source of truth for phase
semantics. Pragmatic deviation from the plan's "extract 6 standalone
runXxxPhase functions" — smaller diff, equivalent correctness.

Standalone `sync` handler now passes `noExtract: true` (codex #5 fix).
Pre-fix, doctor's remediation plan emitting [sync, extract] caused
double-extraction (performSync inline-extract + standalone extract
job). Now sync defers extract to the dedicated handler. Callers that
want inline extract pass { noExtract: false } in job params.

Plan: ~/.claude/plans/system-instruction-you-are-working-fluttering-ocean.md
(T5 + D10 + D11 + codex #5/#6 from outside-voice review).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
T6 of brain-health-100 wave. The headline user-facing capability:
agents drive brain health to target score via autonomous Minions
remediation.

Two new flags on `gbrain doctor`:

  --remediation-plan [--json] [--target-score N]
    Read-only. Emits ordered Remediation[] from BrainHealth + context.
    Uses cheap path (D7) — engine.getHealth() + computeRecommendations,
    NOT a full doctor walk. JSON shape is stable agent contract.

  --remediate [--yes] [--target-score N] [--max-jobs N] [--max-usd N]
              [--dry-run] [--json]
    Sequential submit (D3) with D5 cascade on failure, D7 scoped
    recheck between steps, D9 content-hash idempotency keys, D13
    three-state remediation filtering (only remediable jobs enter
    the loop), +A cost-budget gate via --max-usd.

Check.remediation field added as additive optional (DoctorReport
schema_version stays at 2 per D4).

PGLite path: synchronous in-process execution with short polling.
Postgres path: durable queue submission with waitForCompletion.

The --remediate loop:
  1. Compute initial plan from BrainHealth
  2. Refuse if --target-score > maxReachableScore(health, classes)
  3. Refuse if est_total_usd_cost > --max-usd
  4. For each step in order:
     - Skip if depends_on intersects aborted set (D5)
     - queue.add with content-hash idempotency_key (D9)
     - waitForCompletion with timeout
     - Recompute plan from fresh health (D7 scoped recheck)
  5. Exit 0 if all completed; 1 if any failed/aborted

doctor_run_id UUID stamps every submitted job's data field so
operators can later query `SELECT * FROM minion_jobs WHERE
data->>'doctor_run_id' = '<uuid>'` (indexed via v68 partial GIN).

Plan: ~/.claude/plans/system-instruction-you-are-working-fluttering-ocean.md
(T6 + D1/D3/D5/D7/D9/D13 + folded scope A).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
T7 of brain-health-100 wave. New helper in src/core/cli-options.ts
formalizes the --background flag pattern. Same semantics in TTY and
cron per D9 (submit-and-exit always; --background --follow execs
`gbrain jobs follow <id>` after submission).

  await maybeBackground({
    engine, args, jobName: 'embed',
    paramBuilder: (cleanArgs) => ({ stale, all, ... }),
  })
  // returns true if backgrounded → caller exits

Content-hash idempotency key (D9): `cli:embed:sha8(canonical-JSON(params))`.
No time-slot. Same intent across runs = same key. Failed-row replay
is the doctor --remediate loop's job, not this path's.

PGLite degrades to inline execution with a clear stderr note
("PGLite has no worker daemon; running inline"). NOT a no-op,
NOT silent — doc-stated semantic difference because PGLite has no
worker daemon.

Applied to `gbrain embed` as the reference integration. The other 6
commands (extract, lint, backlinks, reindex, integrity, pages) adopt
the same 4-line pattern at the top of their entry function — follow-up
in a smaller diff once the helper proves out in production.

Plan: ~/.claude/plans/system-instruction-you-are-working-fluttering-ocean.md
(T7 + D9 + Gap 6).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
T8 of brain-health-100 wave.

Autopilot dispatch changes (src/commands/autopilot.ts):

Pre-fix: every tick submitted ONE autopilot-cycle job, full phase
set, regardless of brain state. On a healthy brain pure overhead; on
a degraded brain bundled fast wins with slow phases so user waited
for the slowest.

New decision logic (T8 from plan):
  - score >= 95 AND empty plan AND <60min since last full → SLEEP
  - score >= 95 AND empty plan AND >=60min → submit autopilot-cycle
    (phase-coupling exercise)
  - plan <= 3 steps AND est_total < 5min → submit individual handlers
    (targeted; uses D9 content-hash idempotency keys per step;
    maxWaiting:1 per submit per codex #17)
  - else → submit autopilot-cycle (the hammer)

D10 cycle-lock invariant guarantees targeted-submit and autopilot-cycle
can never run concurrently (both acquire gbrain-cycle), closing the
"60-min floor double-processes queued targeted jobs" failure mode.

Computation uses cheap path (D7) — engine.getHealth() + computeRecommendations,
NOT a full doctor walk. Adds ~1 SQL count query per tick; negligible
on a 50K-page brain.

PROTECTED handlers (synthesize/patterns/consolidate) are submitted with
allowProtectedSubmit:true; autopilot is a trusted local caller.

Cycle purge phase (src/core/cycle.ts):

Added op_checkpoints GC (+C folded scope item). 7-day TTL — any
reasonable long-running op finishes inside that window. Non-fatal
on pre-v67 brains (table missing).

Plan: ~/.claude/plans/system-instruction-you-are-working-fluttering-ocean.md
(T8 + D7/D9/D10 + codex #17 + folded scope +C).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
T10 of brain-health-100 wave — load-bearing decision-pinning tests.

test/brain-score-recommendations.test.ts (22 cases):
  - Healthy brain → empty plan
  - Per-component remediation paths (sync, embed, backlinks, extract)
  - depends_on wiring (extract → sync; embed → sync when stale)
  - Severity ordering (critical > high > medium > low)
  - D6 #5 determinism: same input twice → byte-identical output
  - D9 idempotency keys: content-hash format, no time-slot
  - D9 source isolation: different --source → different key
  - D13 status field always 'remediable' in output
  - +A cost-estimate populated for embed
  - classifyChecks: remediable / blocked / human_only triage
  - maxReachableScore: all-remediable → 100; all-blocked → current

test/op-checkpoint.test.ts (20 cases):
  - fingerprint stability + key-order invariance (canonical-JSON)
  - codex #11: extract links vs timeline get different fingerprints
  - codex #12: reindex markdown vs code get different fingerprints
  - codex #15: embed model+dim variation produces different fingerprints
  - reindex chunker_version bump invalidates checkpoint
  - DB round-trip (load → record → load)
  - Cross-fingerprint isolation (linksKey vs timelineKey)
  - clearOpCheckpoint idempotency on missing rows
  - resumeFilter purity (no I/O, deterministic)
  - purgeStaleCheckpoints TTL respect

42 new tests, all pass. PGLite engine + resetPgliteState pattern per
CLAUDE.md test-isolation guide.

Plan: ~/.claude/plans/system-instruction-you-are-working-fluttering-ocean.md
(T10 + D6 #5 + D9 + D12 + D13 + codex #11/#12/#15).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
T12 of brain-health-100 wave. VERSION + package.json bumped 0.35.6.0
→ 0.36.0.0. CHANGELOG entry leads ELI10 ("your agent can now drive
your brain to 90/100 by itself, on a cron, without you watching")
then drills into the precise mechanics per CLAUDE.md voice rules.

llms.txt + llms-full.txt regenerated via bun run build:llms.

Trio audit (CLAUDE.md mandatory pre-push check):
  VERSION:     0.36.0.0
  package.json: 0.36.0.0
  CHANGELOG:   ## [0.36.0.0] - 2026-05-18

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Conflicts:
#	CHANGELOG.md
#	VERSION
#	package.json
#	src/core/migrate.ts
…-100 wave

- README.md: New-in-v0.36.4.0 callout — `gbrain doctor --remediate` headline,
  autopilot health-aware tick, eleven new background-job types, three PROTECTED.
- CLAUDE.md: Key Files entries for `op-checkpoint.ts`, `brain-score-recommendations.ts`,
  doctor.ts / jobs.ts / protected-names.ts / autopilot.ts / cycle.ts / embed.ts /
  cli-options.ts extensions; new "Key commands added in v0.36.4.0" section.
- AGENTS.md: Common-tasks entry pointing agents at the one-command remediation loop.
- skills/maintain/SKILL.md: Autonomous Phase (gbrain doctor --remediate) at the top,
  manual per-dimension walk preserved as the fallback path.
- llms-full.txt: regenerated to pick up the CLAUDE.md changes (project rule).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Reframed the cost-budget callout. Pre-fix language said the spend cap
prevents a synthesize loop from "burning $100 of Anthropic credits
while you're at lunch" — casually treating $100 as the throwaway number
is tone-deaf. $100 is a meaningful amount for many people.

New language: "spend cap so a synthesize loop can't run up your
Anthropic bill while you're at lunch. The cap is yours to set per run."
And: "Pass --max-usd 5 (or whatever cap you're comfortable with)."
And: "Pick the cap that fits your wallet."

Also reframed three adjacent lines:
- "healthy brains stop burning cycles" → "stop spending tokens on
  work that has nothing to do"
- "agent can't submit them and burn your API budget" → "can't submit
  them on your behalf. Your provider bill stays in your hands"
- Table cell "Cron with cost cap" / "--max-usd 5" → "Cron with spend
  cap" / "--max-usd N"

llms-full.txt regenerated to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@garrytan garrytan merged commit 65ff663 into master May 19, 2026
7 checks passed
garrytan added a commit that referenced this pull request May 19, 2026
Master added PR #1193 (v0.36.4.0 — brain-health-100, autonomous remediation
via doctor --remediate + Minions) and PR #1201 (docs drift audit) since the
last merge. My branch stays at v0.36.5.0; CHANGELOG sequence is now
0.36.5.0 (mine) → 0.36.4.0 (master) → 0.36.3.0 → 0.36.2.0.

Resolved: VERSION + package.json (kept 0.36.5.0); CHANGELOG (mine on top,
strip markers, all entries preserved). All source files auto-merged cleanly.

verify green, llms regenerated.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
brandonlipman added a commit to brandonlipman/gbrain that referenced this pull request May 29, 2026
* upstream/master:
  v0.37.0.0 feat(skillpack): registry cathedral — third-party publish + install + 10/10 quality bar (garrytan#1208)
  v0.36.6.0 feat: cross-modal search wave (text↔image + unified column + LLM intent) (garrytan#1165)
  v0.36.5.0 feat: secure DATABASE_URL access for shell jobs (inherit: ["database_url"]) (garrytan#1192)
  v0.36.4.0 feat: brain-health-100 — autonomous remediation via doctor --remediate + Minions (garrytan#1193)
  fix(docs): comprehensive drift audit — contradictions, broken links, stale refs (garrytan#1201)
  v0.36.3.0 feat: dynamic embedding column selection for search (garrytan#1164)
  v0.36.2.0 feat: ZeroEntropy as default + zero-based README rewrite (garrytan#1136)
  v0.36.1.1 fix-wave: community PR triage + 28 atomic fixes (garrytan#1182)
  v0.36.1.0 Hindsight calibration wave: brain learns how you tend to be wrong (garrytan#1139)
  v0.36.0.0 feat(skillpack): scaffold + reference + harvest (retire managed-block install) (garrytan#1130)
  v0.35.8.0 feat(cycle): phantom-page redirect inside extract_facts (garrytan#1138)
  v0.35.7.0 feat: temporal trajectory + founder scorecard (Phases 2-4) (garrytan#1131)
  v0.35.6.0 feat(search): floor-ratio gate for metadata boost stages (closes garrytan#1091) (garrytan#1129)
  v0.35.5.1 fix(doctor): stop counting clean supervisor exits as crashes (garrytan#1108)
  v0.35.5.0 fix wave: bootstrap + orphans + think MCP + worktree + walker (garrytan#1111)
  v0.35.4.0 fix(doctor,entities): supervisor crash classification + bare-name resolver + 58x perf + stub guard observability (garrytan#1085)
  v0.35.3.1 feat(eval): temporal-aware contradiction probe + verdict enum (garrytan#1052)
  v0.35.3.0 fix wave: extract_facts items + git --no-recurse-submodules placement (garrytan#1053)

# Conflicts:
#	src/core/postgres-engine.ts
#	test/schema-bootstrap-coverage.test.ts
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant