Skip to content

v0.36.0.0 feat(skillpack): scaffold + reference + harvest (retire managed-block install)#1130

Merged
garrytan merged 25 commits into
masterfrom
garrytan/addis-ababa-v1
May 18, 2026
Merged

v0.36.0.0 feat(skillpack): scaffold + reference + harvest (retire managed-block install)#1130
garrytan merged 25 commits into
masterfrom
garrytan/addis-ababa-v1

Conversation

@garrytan

Copy link
Copy Markdown
Owner

Summary

Skillpacks are scaffolding now, not amber. Scaffold once, own the files, fork freely.

v0.36 retires the managed-block install model that shipped in v0.19 and accumulated machinery through v0.32. install and uninstall are gone — replaced by six new subcommands that treat skills as first-class code in your agent repo, not vendor packages.

New CLI surface:

  • scaffold <name> — one-time, additive copy into your repo (refuses to overwrite). Includes paired source files declared in each SKILL.md's frontmatter sources: array. Partial-state fills missing paired sources even when the skill dir already exists.
  • reference <name> — read-only diff lens with an agent-readable framing line. --apply-clean-hunks runs a two-way auto-apply via a pure-JS unified-diff parser + applier (documented two-way limitation: no scaffold-time base tracking).
  • migrate-fence — one-shot strip of the legacy <!-- gbrain:skillpack:begin --> fence. Cumulative-slugs receipt → row-parsing fallback when stale. Preserves rows verbatim as user-owned routing during the transition.
  • scrub-legacy-fence-rows — opt-in cleanup after migrate-fence. Two-condition gate: removes a row only when skills/<slug>/ exists AND the skill's frontmatter declares non-empty triggers:.
  • harvest <slug> --from <host-repo-root> — the inverse loop. Lift a proven skill from ~/git/wintermute (or any host repo) back into gbrain. Symlink-reject + canonical-path containment + default-on privacy linter (catches \bWintermute\b, email, Slack channel patterns) with rollback on match.
  • check --strict — exit non-zero on bundle drift for CI gating.

Companion editorial skill skills/skillpack-harvest/SKILL.md drives the genericization checklist (scrub fork names, generalize triggers, lift fork-specific conventions to references) before running the harvest CLI.

autoDetectSkillsDir gains a cwd_walk_up tier ahead of ~/.openclaw/workspace. cd ~/git/wintermute && gbrain skillpack scaffold ... now auto-detects wintermute without needing a RESOLVER.md. R5 regression preserves $OPENCLAW_WORKSPACE precedence.

Subtractive surface gone: managed block, cumulative-slugs receipt, content-hash gates, lockfile, prune semantics. ~600 LOC of mechanism deleted from installer.ts (kept around in v0.36 to back the informational diff command; slated for v0.37 cleanup).

11 bisectable commits.

Test Coverage

NEW MODULES (10 src + 11 test files):
  src/core/skillpack/copy.ts            +  test/skillpack-copy.test.ts (16 cases)
  src/core/skillpack/scaffold.ts        +  test/skillpack-scaffold.test.ts (12 cases)
                                        +  test/skillpack-frontmatter-sources.test.ts (10 cases)
  src/core/skillpack/diff-text.ts       
  src/core/skillpack/reference.ts       +  test/skillpack-reference.test.ts (9 cases)
                                        +  test/skillpack-reference-apply.test.ts (7 cases)
  src/core/skillpack/apply-hunks.ts     +  test/skillpack-apply-hunks.test.ts (16 cases)
  src/core/skillpack/migrate-fence.ts   +  test/skillpack-migrate-fence.test.ts (18 cases)
  src/core/skillpack/scrub-legacy.ts    +  test/skillpack-scrub-legacy.test.ts (8 cases)
  src/core/skillpack/harvest.ts         +  test/skillpack-harvest.test.ts (9 cases)
  src/core/skillpack/harvest-lint.ts    +  test/skillpack-harvest-lint.test.ts (11 cases)
  src/core/skillpack/bundle.ts (ext)    
  src/core/repo-root.ts (ext, +5 cases)
  + test/e2e/skillpack-flow.test.ts     (9 real-subprocess flows)

DELETED:
  test/skillpack-uninstall.test.ts      (412 lines — uninstall command gone)
  test/skillpack-sync-guard.test.ts     (managed-block sync-guard gone)

Tests: 518/518 pass across 17 skillpack-related files. ~140 new cases.

Pre-Landing Review

Plan was reviewed via /plan-eng-review with codex outside-voice during plan mode (14 codex findings, all addressed in plan or implementation). All 16 implementation tasks complete; design surface matches the locked plan. No new findings in code review.

Eval Results

No prompt-related files changed — evals skipped.

Plan Completion

All 16 tasks from ~/.claude/plans/system-instruction-you-are-working-glittery-crab.md:

  • ✅ T1: copyArtifacts shared helper
  • ✅ T2: scaffold.ts + bundle.ts frontmatter sources
  • ✅ T3: skillpack.ts CLI rewrite (drop install + uninstall)
  • ✅ T4: reference subcommand
  • ✅ T5: migrate-fence
  • ✅ T6: harvest
  • ✅ T7: harvest privacy linter
  • ✅ T8: skillpack-harvest editorial skill
  • ✅ T9: autoDetectSkillsDir cwd_walk_up tier
  • ✅ T10: skillpack-check --strict
  • ✅ T11: 9-case E2E flow test
  • ✅ T12: delete obsolete tests (folded into T1 commit)
  • ✅ T13: docs sweep (README + CLAUDE.md + new guide + CHANGELOG)
  • ✅ T14: repo-wide stale-reference grep
  • ✅ T15: apply-hunks + reference --apply-clean-hunks (TODO-3 folded)
  • ✅ T16: scrub-legacy-fence-rows (TODO-2 folded)

Verification Results

No dev server flow needed — skillpack is filesystem-only. Verified via E2E real-subprocess tests (test/e2e/skillpack-flow.test.ts, 9 cases covering scaffold first-run + re-run no-op, reference diff, reference --apply-clean-hunks, migrate-fence, scrub-legacy-fence-rows, harvest privacy-lint catch, harvest --no-lint bypass, install removed-error).

TODOS

All three TODOs originally deferred during plan-eng-review were folded into this PR as bisect commits per user direction:

  • TODO-1 (install alias removal) → clean break, no alias
  • TODO-2 (fence-row scrub) → scrub-legacy-fence-rows command
  • TODO-3 (clean-hunk merge driver) → reference --apply-clean-hunks

Documentation

  • README: skillpack section rewritten to scaffold + reference framing
  • CLAUDE.md: skillpack entry rewritten to reflect v0.36 contract change
  • New guide: docs/guides/skillpacks-as-scaffolding.md (model + workflow)
  • CHANGELOG: loud v0.36.0.0 entry with migration instructions
  • llms-full.txt regenerated

Test plan

  • All 518 unit + E2E tests pass on merged code
  • typecheck clean (bunx tsc --noEmit)
  • Skills conformance pass (skillpack-harvest meets the frontmatter contract)
  • check-resolvable pass (resolver row + routing-eval fixture wired)
  • CLI smoke verified: `scaffold book-mirror` lands 15 files; `install` exits non-zero with hint

🤖 Generated with Claude Code

garrytan and others added 18 commits May 17, 2026 15:01
Pure file-copy primitive for scaffold (gbrain→host) and harvest (host→gbrain).
Atomic-refusal contract: symlink-reject + canonical-path containment validate
every item before any write. Used by both directions of the v0.33 loop.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
New scaffold.ts replaces the managed-block installer. One-time additive copy
into the user's repo via copyArtifacts; refuses to overwrite existing files
(user owns them). Partial-state policy: copies missing paired sources even
when the skill dir already exists.

bundle.ts extended with loadSkillSources + enumerateScaffoldEntries — paired
source files declared in each SKILL.md's frontmatter sources: array, not in
openclaw.plugin.json. Single source of truth, co-located with the skill.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
reference is the read-only diff lens with an agent-readable framing line. Pure-JS
unified-diff producer + parser + applier (no patch(1) dependency). Two-way merge
with documented limitation: without scaffold-time base tracking, applied hunks
align everything to gbrain. The agent dry-runs reference first, then decides.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
migrate-fence is the one-shot transition from the pre-v0.36 managed-block model.
Strips begin/end markers and the cumulative-slugs receipt comment; preserves
fence rows verbatim as user-owned routing during the transition to frontmatter
discovery. Receipt-then-row fallback (F-CDX-8) covers stale/missing receipts.

scrub-legacy-fence-rows is the opt-in cleanup after migrate-fence. Two-condition
gate: removes a row only when skills/<slug>/ exists AND that skill's frontmatter
declares non-empty triggers (proof frontmatter discovery covers it).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The inverse loop: lift a proven skill from a host repo (~/git/wintermute, etc.)
back into gbrain so other clients can scaffold it. --from <host-repo-root> is
symmetric with scaffold's --workspace.

Security: symlink rejection + canonical-path containment (mirrors validateUploadPath).
Privacy: default-on linter scans harvested files against ~/.gbrain/harvest-private-patterns.txt
plus built-in defaults (Wintermute, email, Slack channel patterns). Any match
rolls back the copy and exits non-zero. --no-lint bypasses for the editorial
workflow after a manual scrub.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
autoDetectSkillsDir now walks up from cwd looking for any skills/ directory,
ahead of the implicit ~/.openclaw/workspace fallback. cd ~/git/wintermute &&
gbrain skillpack scaffold ... finds wintermute automatically without requiring
a RESOLVER.md/AGENTS.md to exist yet.

R5 regression preserved: $OPENCLAW_WORKSPACE still wins when explicitly set.
+5 test cases in test/repo-root.test.ts pin the new tier order and the R5 guard.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… T10)

skillpack.ts dispatcher rewritten for the v0.36 contract: scaffold, reference
(+ --apply-clean-hunks), migrate-fence, scrub-legacy-fence-rows, harvest, plus
the existing list / diff / check.

install and uninstall are gone — both exit non-zero with a hint pointing at
scaffold / migrate-fence. Clean break, no deprecated alias.

skillpack-check gains --strict for CI gating. When invoked as the subcommand
`gbrain skillpack check`, default is informational (exit 0 even with drift);
--strict opts back into the cron-friendly exit-1-on-issues behavior. Top-level
gbrain skillpack-check preserves its existing exit semantics for backwards compat.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…(T8)

The companion editorial skill for the gbrain skillpack harvest CLI. Walks the
genericization checklist (scrub fork names, generalize triggers, lift fork-
specific conventions to references) before the CLI runs. Routing-eval fixtures
use paraphrased intents to avoid the intent_copies_trigger lint.

Wires the new slug into openclaw.plugin.json#skills, skills/manifest.json, and
skills/RESOLVER.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Spawns gbrain as a subprocess against tempdir workspaces. Covers: scaffold
first-run + re-run no-op, reference diff + --apply-clean-hunks, migrate-fence,
scrub-legacy-fence-rows, harvest privacy-lint catch + --no-lint bypass, and
the install removed-error path. No DATABASE_URL needed — skillpack is
filesystem-only.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Skillpacks as scaffolding, not amber.

v0.36 retires the managed-block install model. Six new subcommands replace
install + uninstall: scaffold, reference (with --apply-clean-hunks), migrate-fence,
scrub-legacy-fence-rows, harvest, plus the existing list / diff / check
(check gains --strict for CI gating). Routing comes from each skill's
frontmatter triggers — gbrain does not touch your RESOLVER.md or AGENTS.md.

Companion editorial skill skillpack-harvest drives the genericization
checklist; default-on privacy linter catches Wintermute / email / Slack
references before they leak into gbrain core.

New docs guide at docs/guides/skillpacks-as-scaffolding.md walks the model
and the migration path for pre-v0.36 installs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…a-v1

# Conflicts:
#	CHANGELOG.md
#	VERSION
#	package.json
…acing fork-name references

CI's check-privacy.sh and check-test-real-names.sh both flagged the literal
fork name across the v0.36 skillpack diff. Two failure modes, two fixes:

1. **Meta-rule-enforcement files** added to both allow-lists. The harvest
   privacy linter's whole job is to catch the banned literal leaking into
   gbrain; its source has the regex pattern, its tests verify the linter
   fires by feeding it the banned string, and the skill markdown documents
   the substitution policy. Same exception status as check-privacy.sh and
   check-proposal-pii.sh themselves. Files allow-listed:
   - src/core/skillpack/harvest-lint.ts
   - test/skillpack-harvest-lint.test.ts
   - test/skillpack-harvest.test.ts
   - test/e2e/skillpack-flow.test.ts
   - skills/skillpack-harvest/SKILL.md

2. **User-facing references** swapped for canonical phrasing per CLAUDE.md's
   responsible-disclosure rule. README + new docs guide + 4 src docstrings
   + 1 test now say 'your OpenClaw' / 'host agent repo' / 'agentRepo' var
   name. Behavior unchanged — only documentation strings touched.

Verify gate (the script CI runs) passes locally: EXIT=0.
Tests still pass: 60/60 across the affected files.
llms-full.txt regenerated.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sister fix to the test/repo-root.test.ts update in commit a31418e. The new
v0.33 cwd_walk_up tier fires before repo_root when running from inside the
gbrain repo — same skills/ dir matched, different source label. Behavior
unchanged; the legacy repo_root tier is now functionally subsumed (kept in
the type union for back-compat).

CI shard 3 failure: test/check-resolvable-cli.test.ts:171.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The 24h and 72h exact-boundary tests scheduled last_sync_at relative to
Date.now() at construction time, then let the check call Date.now() again
internally. CI scheduler jitter between the two reads pushed ageMs past
the strict > thresholds by microseconds, dropping the 72h-boundary case
into the fail branch instead of warn.

Fix: add an optional `opts.now` test seam to checkSyncFreshness. The two
boundary tests now capture t0 once and pass it both to the timestamp
constructor and to the check, making ageMs deterministically equal to
the boundary. The non-boundary tests (4d, 30h, 2h, etc.) don't need
pinning — they're comfortably away from the > comparison.

CI shard 1 flake: test/doctor.test.ts:479. Locally 48/48 doctor tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…a-v1

# Conflicts:
#	CHANGELOG.md
#	VERSION
#	package.json
… CLI surface (DX review)

DX audit of the v0.36 scaffold model surfaced one structural gap and four
output gaps. When scaffolded files land on a downstream agent's disk, the
agent had no agent-facing manifest telling it what to do — no routing
contract, no upgrade flow, no two-way merge warning at the right surface.

Fixes:

1. **New shared dep: skills/_AGENT_README.md.** Lands on every scaffold +
   migrate-fence alongside the existing _brain-filing-rules.md and
   _output-rules.md. Short, agent-readable contract: walk *.SKILL.md
   frontmatter triggers: for routing, gbrain is reference not law on
   upgrade, no managed-block fence anymore, two-way merge has known
   limitations. Single source of truth for the agent operating contract.

2. **scaffold stdout** prints a next-action hint pointing at the readme
   (with absolute path) and the reference --all upgrade-sweep command.

3. **reference stdout** adds per-category decision policy:
   - missing → scaffold again
   - differs → was edit intentional? keep it. Accidental? patch by hand or
     apply-clean-hunks after reading the two-way warning.

4. **reference --apply-clean-hunks** prints the two-way merge WARNING
   BEFORE the apply (to stderr, survives stdout redirect). Spells out
   that gbrain has no scaffold-time base and local edits in differing
   sections WILL be aligned to gbrain. Skipped in --json mode for
   machine consumers. On conflicts, prints how to inspect and patch.

5. **migrate-fence stdout** tells the agent its routing model just
   changed (fence gone, walk frontmatter now) and points at
   scrub-legacy-fence-rows as the eventual cleanup. References the new
   _AGENT_README for fresh-install agents.

Smoke verified end-to-end: 16 files land (was 15, +1 for _AGENT_README),
hint prints with absolute path, readme lands on disk. Tests + verify gate
pass clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…sion filter (DX deferred items)

Closes the last two DX gaps from the v0.36 audit:

1. **Post-upgrade reference sweep.** New `postUpgradeReferenceSweep`
   helper called at the end of `gbrain post-upgrade`. After migrations
   apply, auto-runs `reference --all` against the detected host
   workspace and prints a one-line-per-skill summary of drift. Five
   gates: GBRAIN_SKIP_REFERENCE_SWEEP env-var bypass, no detected
   workspace (silent), workspace IS gbrain repo (dev-mode silent),
   zero drift (silent), and pure-missing skills the host never
   scaffolded are filtered out as noise. All errors swallowed —
   never blocks post-upgrade. Helper accepts test-seam opts
   (gbrainRoot, targetWorkspace) for unit testability.

2. **`reference --all --since <version>`.** Filters the sweep to
   skills whose source actually changed in gbrain between
   <version> and HEAD, using a new `changedSlugsSinceVersion`
   helper in bundle.ts. Pure-JS git wrapper (spawnSync), no deps.
   Accepts bare '0.X.Y.Z' or 'v0.X.Y.Z' or commit SHA. Falls back
   loudly to full sweep when git can't resolve the ref (tarball
   install, missing tag).

Test coverage added — total +32 new test cases:

UNIT (15 cases):
- test/skillpack-changed-since-version.test.ts (9 cases): git-aware
  filter against a fixture git repo. Covers null on non-repo,
  null on bad tag, empty array on no changes, single + multi-slug
  drift (deduped + sorted), bare + v-prefix version forms, non-
  skills/ path filtering, SHA-prefix ref form.
- test/upgrade-reference-sweep.test.ts (6 cases): gate logic.
  Covers env-var bypass, zero drift, empty-host suppression,
  drift-detected output shape, dev-mode workspace==gbrain guard,
  error-swallowing contract.

E2E (8 new cases in test/e2e/skillpack-flow.test.ts):
- 10: scaffold lands skills/_AGENT_README.md
- 11: scaffold stdout prints the Next: hint
- 12: scaffold re-run (skipped-existing) suppresses the hint
- 13: reference stdout prints per-category decision policy
- 14: --apply-clean-hunks WARNING on stderr, not stdout
- 15: --apply-clean-hunks --json suppresses the WARNING (bug fix
  surfaced here: code originally printed unconditionally, now
  gated on !json)
- 16: migrate-fence stdout points at the new routing model
- 17: --since with a bad tag falls back to full sweep with warn

Local sweep: 579/579 pass across 18 affected test files, verify
gate EXIT=0, llms regenerated.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…a-v1

# Conflicts:
#	CHANGELOG.md
#	VERSION
#	package.json
garrytan added a commit that referenced this pull request May 18, 2026
Three open PRs were claiming v0.36.0.0 (#1130 skillpack, #1139
hindsight, #1136 this PR). Ship-aware queue allocator says this
branch lands at v0.36.2.0.

Trio audit:
  VERSION       0.36.2.0
  package.json  0.36.2.0
  CHANGELOG     ## [0.36.2.0] - 2026-05-17

Updates: VERSION, package.json, CHANGELOG header + body refs,
README "New default in v0.36.2.0" announcement + credit line,
skills/migrations/v0.36.0.0.md renamed to v0.36.2.0.md with
frontmatter + body refs updated. llms-full.txt regenerated.
garrytan and others added 7 commits May 17, 2026 19:54
… MECE structure

The README had drifted into a changelog dumping ground. Four 'New in vX.Y'
paragraphs competed for the lead, 16 version tags scattered through
headings, the production-numbers hook (17,888 pages, 4,383 people) was
six months stale, and skills were described in three places (Skills section,
Commands section, inline marketing prose).

Zero-based rewrite:

**Refreshed catalog** (surveyed live brain + live agent fork, broad strokes
per CLAUDE.md privacy rules):
- ~100K total brain items (was 17,888 in the old README — 6x stale)
- ~16K people (was 4,383)
- ~5K companies (was 723)
- ~8K concepts, ~4K originals, ~3.5K daily notes
- ~31K media (30K tweets, 179 books, papers/films/games/interviews)
- 108 cron jobs running (was 21)
- 273 skills in the live agent fork (35 bundled + 238 user-built)

**Structure** — MECE, single source of truth per concept:
1. Hook + at-a-glance table (refreshed numbers)
2. Install (3 paths, terse)
3. What it does (5 capability areas — replaces 12 scattered sections)
4. Skills (categorized one-liners — 35 lines, was ~200)
5. How it works (one coherent flow — replaces 4 overlapping sections:
   Architecture, Knowledge Model, Knowledge Graph, Search, Why It Works)
6. Commands (terse cheatsheet — every command, one line each)
7. Docs (link map — points to docs/ for the heavy stuff)
8. Origin / Contributing / License

**Cut entirely** (moved or deleted):
- 4 'New in vX.Y' leads (→ CHANGELOG.md is the changelog)
- 16 (vX.Y) version tags in section headings
- Minions stats subsection (subsumed into hook + 'durable background work')
- Voice section (was 12 lines of brand prose)
- Engine Architecture detail (→ docs/architecture/)
- File Storage section (→ docs/guides/storage-tiering.md)
- Per-skill marketing prose (one-liner per skill in the table)

The README is no longer the changelog. Future releases append to
CHANGELOG.md; the README only changes when a structural capability does.

llms-full.txt regenerated. Privacy check + verify gate pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two fixes in one:

1. **Markdown bug fix.** The OAuth 2.1 paragraph had `+ PKCE,` on a line
   start (column 1), which GitHub-flavored markdown interprets as a list
   marker — the line break before it broke the paragraph and rendered as
   an orphan first line followed by a bullet. Rewrote the OAuth 2.1
   capabilities as inline-comma-separated, escaped the `+` semantics.
   Swept the whole file for the same bug class — no other instances.

2. **Maximum-sell mode for evals.** Surveyed every published benchmark
   in both this repo and ~/git/gbrain-evals. Strongest evidence pulled
   to the top:

   - **97.60% R@5 on the public LongMemEval _s (500 questions).** No LLM
     in the retrieval loop. $0.50 per 1000 queries. Beats MemPalace raw
     by a point on the same dataset, beats every academic dense
     retriever (Stella, Contriever, BM25). Mastra/Supermemory measure
     a different metric (QA accuracy with LLM judge) — flagged honestly.

   - **+31.4 points P@5 from the self-wiring knowledge graph** on
     BrainBench v0.20.0 (240-page rich-prose corpus, 145 relational
     gold queries). Separable, measured, load-bearing. Zero retrieval
     regression across seven releases (v0.16 → v0.20).

   New '## Benchmarks' section after Install:
   - Public benchmark table with cross-system comparison
   - In-house BrainBench scorecard with per-adapter Δ vs gbrain
   - Source-swamp resistance result (93.3% top-1 vs 80% grep-only)
   - Skill/prompt compression: 25KB → 13KB AGENTS.md, +13-17pp accuracy
     across Opus 4.7 / Sonnet 4.6 / Haiku 4.5
   - 'Run your own evals' subsection with copy-pasteable commands for
     every eval surface (longmemeval, cross-modal, eval capture/replay,
     BrainBench)

   Tightened the lead's cost-comparison claim to what's defensible per
   the underlying eval doc (MemPal LLM-rerank $0.001/q vs gbrain
   $0.0005/q; dropped the overstated '6x' I'd written initially).

Privacy + verify gate + build-llms test all pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…o 'Receipts on the evals'

Previous lead dumped metric acronyms (R@5, P@5, P@5 deltas, MemPalace,
Stella, Contriever, BM25) before the reader knew what gbrain does. A
'somewhat technical' reader hits the wall of jargon and bounces.

Rewritten:

**Lead (jargon-free, 3 paragraphs)** — describes the value in plain
English, with two anchor numbers:
- 'right answer in top 5 results 97.6% of the time' (not 'R@5 97.60%')
- 'roughly 4x more relevant than plain vector RAG' (not '+31.4 pts P@5')
- 'better than every comparable system that doesn't pay for a language-
  model call on every retrieval' (the load-bearing honest framing,
  without naming the competitors mid-hook)
- ends with '[Receipts on the evals →]' linking down

**'## Benchmarks' renamed '## Receipts on the evals'** with a glossary
at the top defining R@5, P@5, and 'no LLM in the loop' in one line each.
Then the full tables: LongMemEval cross-system (with the metric-mismatch
flag for Mastra/Supermemory), in-house BrainBench scorecard, source-swamp
resistance, and prompt compression. The competitor names + metrics stay
here where readers who want the receipts can find them, with the
glossary so the acronyms don't tax cold readers.

Net: lead reads as 'here's what it does and the proof' instead of 'here
are the benchmark numbers, figure out what they mean.' Comparison facts
unchanged.

Privacy + verify gate + build-llms test all pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…a-v1

# Conflicts:
#	CHANGELOG.md
#	VERSION
#	package.json
#	src/commands/doctor.ts
#	test/doctor.test.ts
Two specific edits from user feedback:

1. 'the standard public benchmark for AI memory systems' → 'LongMemEval'
   (linked to the HuggingFace dataset). The benchmark has a name; use it.

2. 'Built by the President and CEO of Y Combinator to run his own AI
   agents' (passive third-person) → 'I'm the President and CEO of Y
   Combinator, and I use this 16 hours a day' (active first-person).
   Carried the voice change through the rest of the README — the
   downstream 'Garry's personal agent' line and the Origin section's
   'Garry Tan needed... he'd ever drafted... so he built one' all flip
   to first person ('my personal agent', 'I needed', 'I'd ever drafted',
   'so I built one'). The README is now consistently first-person from
   the author's voice instead of a hagiographic third-person framing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three deployment patterns documented:

1. Single GBrain server + thin MCP clients (recommended). Tailscale
   private networking, OAuth scope, source-scoped clients, exhaustive
   what-clients-can/cannot-do lists.
2. Local PGLite + GStack for per-worktree code search.
3. Federated repos (advanced) — multiple servers indexing the same
   brain repo.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ceral query example

Self-eval as a cold reader surfaced four gaps blocking a 10/10 first read:

1. Lead never says WHAT it is technically — CLI? service? cloud? local?
   Added a "What it is, technically" block right after the hook: open-source
   MIT, Bun CLI + MCP server, local-first, data stays on disk, MCP-native.
2. Install path optimized for committed users not evaluators. The old
   "recommended" path (deploy OpenClaw on Render, 8GB RAM) blocked anyone
   trying gbrain for the first time. Reordered into 3 paths by commitment:
   60-second standalone CLI first, MCP for Claude Code / Cursor second,
   full agentic install third.
3. No example output showing what success looks like. Added a real sample
   `gbrain query` invocation with the hybrid-search result format so a
   reader can feel the experience before they install.
4. Privacy / data-locality unaddressed in lead. Now stated up front:
   embedding calls only hit external APIs if you configure them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@garrytan garrytan merged commit 0394766 into master May 18, 2026
7 checks passed
garrytan added a commit that referenced this pull request May 18, 2026
Master shipped v0.36.0.0 (skillpack scaffold / reference / harvest;
retired managed-block install, #1130) — naming overlap with this
branch's slot. This branch's slot stays 0.36.1.0 (already higher);
master's v0.36.0.0 entry preserved in CHANGELOG.

VERSION trio resolved: my 0.36.1.0 wins over master's 0.36.0.0 on
VERSION, package.json, and CHANGELOG.md top entry. llms-full.txt
regenerated. All other files auto-merged cleanly (CLAUDE.md,
README.md, skills/RESOLVER.md, etc).

Verification:
- bun run typecheck: green
- bun install: lockfile up to date

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
garrytan added a commit that referenced this pull request May 19, 2026
…1136)

* feat(dims): OpenAI text-embedding-3 Matryoshka range validation (D13)

dimsProviderOptions now fail-loud at the embed boundary when the
configured embedding_dimensions is outside the model's native range
(1..1536 for -small, 1..3072 for -large). Paste-ready fix hint in the
AIConfigError.fix field. Closes the silent-HTTP-400 path that would
have bit OpenAI-fallback users on v0.36.0.0 ZE-default installs.

16 new test cases in test/ai/dims-openai.test.ts pinning the contract
across native-openai and openai-compatible adapter paths.

* feat(ai): flip defaults to ZeroEntropy zembed-1 1280d + zerank-2 reranker

Default embedding model is now zeroentropyai:zembed-1 at 1280d via
Matryoshka. Real-corpus benchmark: 2.2x faster than OpenAI, 2.6x
cheaper at regular pricing, wins 11/20 head-to-head queries.

1280 is the closest valid ZE Matryoshka step to the prior OpenAI 1536d
default (valid set: 2560/1280/640/320/160/80/40). 1024 (Voyage's step)
is NOT on ZE's list — pinned by AIConfigError fail-loud in dims.ts.

balanced mode bundle now defaults reranker_enabled=true. zerank-2
reshuffles 60% of top-1 results in benchmarks. Missing-key fail-open
contract in src/core/search/rerank.ts handles unauthenticated cases.
Opt out with: gbrain config set search.reranker.enabled false

Existing tests updated (gateway.test.ts, search-mode.test.ts) and a
new test/balanced-reranker-default.test.ts (10 cases) pins the fail-
open invariants.

* feat(retrieval-upgrade): RetrievalUpgradePlanner + interactive prompt UX

New src/core/retrieval-upgrade-planner.ts is the consolidated planner
that computes the brain's pending retrieval-upgrade work (chunker
bumps + ZE switch) in one pass and applies the schema transition +
config updates atomically.

Tagged-union ApplyResult enum (D15): 'applied' | 'skipped_already_
applied' | 'skipped_no_work' | 'declined' | 'planned' | 'failed'.
No string-parsing reasons.

Three config keys (D12): ze_switch_prompt_shown (UI state),
ze_switch_requested (user intent), ze_switch_applied (work done).
Plus ze_switch_previous_snapshot (JSON, full prior config for --undo
per D16) and ze_switch_declined_at (90-day re-ask window).

Schema transition (D18) is atomic: DROP indexes + ALTER COLUMN +
CREATE INDEX inside a single engine.transaction(). HNSW recreation
is part of the same transaction — no silent slow-search window.

C3 eligibility logic: ze_switch_offered iff NOT on ZE + NOT declined
recently + NOT applied + (legacy default OR >100 pages).

C4 cost math: MAX(chunker_pending, dim_pending) not SUM — one
re-embed pass invalidates both surfaces simultaneously.

New src/core/retrieval-upgrade-prompt.ts wires the planner to a
TTY-only interactive prompt with two-line cost split (D10) and
privacy callout for the reranker flip.

Tests: test/retrieval-upgrade-planner.test.ts (24 cases) pins the
state machine. test/asymmetric-encoding-contract.test.ts (6 cases)
pins D17: search read path uses gateway.embedQuery() not embed(),
asserted via __setEmbedTransportForTests mock.

* feat(cli): gbrain ze-switch — manual lever for the ZE switch

New gbrain ze-switch CLI with --dry-run, --json, --resume, --force,
--undo, --non-interactive, --confirm-reembed, --ignore-missing-key
flags. Mirrors the upgrade prompt's UX symmetry: --undo presents a
cost-warning before re-embedding back to the prior width.

src/cli.ts: dispatch case + CLI_ONLY entry. ze-switch owns its own
engine lifecycle (mirrors the doctor pattern).

test/ze-switch-cli.test.ts (11 cases): --help, --dry-run, --json,
--non-interactive, --ignore-missing-key, --resume, --undo,
--confirm-reembed. Uses captureExit harness to test process.exit()
paths without breaking the test process.

* feat(doctor): ze_embedding_health + embedding_width_consistency checks

Two new doctor checks (D-A5):

ze_embedding_health: when embedding_model starts with zeroentropyai:,
verify ZEROENTROPY_API_KEY is set (env or config). Paste-ready setup
hint with the signup URL on failure.

embedding_width_consistency: cross-check that the configured
embedding_dimensions matches the actual vector(N) column width on
content_chunks.embedding. Catches the half-applied switch state
(schema migrated but config write crashed) with a paste-ready
gbrain ze-switch --resume hint.

Wired into runDoctor between reranker_health and the existing
sync_freshness checks. Both checks gracefully no-op on non-ZE
embedding configs.

test/doctor-ze-checks.test.ts (8 cases) pins both checks across
happy + missing-key + missing-config + drift paths. Uses withEnv()
helper to clear ZEROENTROPY_API_KEY for the no-key path so tests
are hermetic against contributor env state.

test/e2e/v0_28_5-fix-wave.test.ts + test/openai-compat-multimodal.test.ts:
updated to explicit-configure the gateway when the test depends on
specific dims that diverge from the v0.36.0.0 default (1280d).

* docs: README zero-based rewrite (884 -> 139 lines) + new docs files

Strip 4 months of accreted "New in v0.X.Y" hero blocks and reorganize
around what gbrain does today. 33 H2s -> 8. The Commands section
(136 lines duplicating gbrain --help) moved out; the 6-table skills
enumeration collapsed to a one-paragraph capability description with
a link to skills/RESOLVER.md.

Hero retains load-bearing facts: OpenClaw + Hermes credit, production
numbers (17,888 pages / 4,383 people / 723 companies), BrainBench
numbers (P@5 49.1% / R@5 97.9% / +31.4 lift), ZE comparison numbers,
30-min install claim. Adds one paragraph announcing the v0.36.0.0 ZE
default with the explicit gbrain config set escape for OpenAI/Voyage
users.

New files:
- docs/INSTALL.md: every install path consolidated (agent platform,
  CLI standalone, MCP server). Thin-client mode covered.
- docs/architecture/RETRIEVAL.md: why the hybrid + graph stack works.
  BrainBench numbers, why each strategy alone fails, the source-aware
  ranking + intent classification + multi-query expansion story.
- docs/ethos/ORIGIN.md: origin story lifted from the old README so
  the front door stays factual + concrete.

test/readme-hero-anchors.test.ts (5 cases) is the D9 regression
guard. Five load-bearing strings: OpenClaw, Hermes, ZE,
production-numbers regex, P@5/R@5. Light anchors that let voice/
structure evolve but block accidental loss of headline facts.

scripts/check-test-real-names.sh: allowlist entries for OpenClaw +
Hermes literals in the anchor test (it explicitly asserts those
strings appear in README).

* chore: bump version and changelog (v0.36.0.0)

ZeroEntropy as the new default for embedding (zembed-1 at 1280d via
Matryoshka) and reranker (zerank-2 cross-encoder, on by default in
balanced mode bundle). README zero-based rewrite (884 -> 139 lines).
3 new docs files. Two new doctor checks. New gbrain ze-switch CLI
with --undo for symmetric reversibility.

skills/migrations/v0.36.0.0.md tells the agent how to surface the
retrieval-upgrade prompt post-upgrade.

llms-full.txt regenerated via bun run build:llms.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(docs): scrub Wintermute from RETRIEVAL.md per privacy rule

* chore: rebump version 0.36.0.0 → 0.36.2.0 (queue collision)

Three open PRs were claiming v0.36.0.0 (#1130 skillpack, #1139
hindsight, #1136 this PR). Ship-aware queue allocator says this
branch lands at v0.36.2.0.

Trio audit:
  VERSION       0.36.2.0
  package.json  0.36.2.0
  CHANGELOG     ## [0.36.2.0] - 2026-05-17

Updates: VERSION, package.json, CHANGELOG header + body refs,
README "New default in v0.36.2.0" announcement + credit line,
skills/migrations/v0.36.0.0.md renamed to v0.36.2.0.md with
frontmatter + body refs updated. llms-full.txt regenerated.

* fix(test): pin gateway dim=1536 in cross-file-stateful PGLite tests

CI shard 1 reported 10 failures across `query-cache.test.ts` (6) and
`consolidate-valid-until.test.ts` (4). Both files hardcode 1536-dim
vectors but rely on `PGLiteEngine.initSchema()` to size
`vector(__EMBEDDING_DIMS__)` at the right width.

Root cause: v0.36.2.0 flipped DEFAULT_EMBEDDING_DIMENSIONS from 1536
to 1280 (ZE Matryoshka step). The gateway module is process-singleton;
when ANOTHER test file in the same shard's bun-test process configures
the gateway before us, `pglite-engine.ts:216` reads
`getEmbeddingDimensions() === 1280` and sizes the schema columns at
vector(1280). The hardcoded 1536-dim INSERTs then fail with
"expected 1280 dimensions, not 1536".

Locally these tests pass in isolation because the gateway falls back
through the try/catch at pglite-engine.ts:218 (1536 default). CI runs
multiple test files in one process, so cross-file state poisons the
schema width.

Fix: explicit `resetGateway()` + `configureGateway({embedding_dimensions:
1536, ...})` at the top of `beforeAll`, plus `resetGateway()` in
`afterAll`. Pins the schema width regardless of cross-file state.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
brandonlipman added a commit to brandonlipman/gbrain that referenced this pull request May 29, 2026
* upstream/master:
  v0.37.0.0 feat(skillpack): registry cathedral — third-party publish + install + 10/10 quality bar (garrytan#1208)
  v0.36.6.0 feat: cross-modal search wave (text↔image + unified column + LLM intent) (garrytan#1165)
  v0.36.5.0 feat: secure DATABASE_URL access for shell jobs (inherit: ["database_url"]) (garrytan#1192)
  v0.36.4.0 feat: brain-health-100 — autonomous remediation via doctor --remediate + Minions (garrytan#1193)
  fix(docs): comprehensive drift audit — contradictions, broken links, stale refs (garrytan#1201)
  v0.36.3.0 feat: dynamic embedding column selection for search (garrytan#1164)
  v0.36.2.0 feat: ZeroEntropy as default + zero-based README rewrite (garrytan#1136)
  v0.36.1.1 fix-wave: community PR triage + 28 atomic fixes (garrytan#1182)
  v0.36.1.0 Hindsight calibration wave: brain learns how you tend to be wrong (garrytan#1139)
  v0.36.0.0 feat(skillpack): scaffold + reference + harvest (retire managed-block install) (garrytan#1130)
  v0.35.8.0 feat(cycle): phantom-page redirect inside extract_facts (garrytan#1138)
  v0.35.7.0 feat: temporal trajectory + founder scorecard (Phases 2-4) (garrytan#1131)
  v0.35.6.0 feat(search): floor-ratio gate for metadata boost stages (closes garrytan#1091) (garrytan#1129)
  v0.35.5.1 fix(doctor): stop counting clean supervisor exits as crashes (garrytan#1108)
  v0.35.5.0 fix wave: bootstrap + orphans + think MCP + worktree + walker (garrytan#1111)
  v0.35.4.0 fix(doctor,entities): supervisor crash classification + bare-name resolver + 58x perf + stub guard observability (garrytan#1085)
  v0.35.3.1 feat(eval): temporal-aware contradiction probe + verdict enum (garrytan#1052)
  v0.35.3.0 fix wave: extract_facts items + git --no-recurse-submodules placement (garrytan#1053)

# Conflicts:
#	src/core/postgres-engine.ts
#	test/schema-bootstrap-coverage.test.ts
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant