v0.41.22.1 feat: brainstorm/lsd judge fixes (closes #1540 end-to-end) by garrytan · Pull Request #1562 · garrytan/gbrain

garrytan · 2026-05-27T14:03:26Z

Summary

Closes the brainstorm/lsd judge failure that has been silently returning judge_failed: true since the calibration cold-start landed. Productionizes PR #1540 (closed as superseded), expands scope from 2 patched sites to 5, and extends the gateway resolver so the fix actually works end-to-end.

Per-commit:

feat(core) — new splitProviderModelId centralizer for pricing-side parsing
feat(gateway) — model-resolver.ts:parseModelId extended to accept slash form (closes the load-bearing gap Codex caught: pricing fix alone left gateway throwing on slash-form mid-judge)
refactor — 5 pricing/config sites routed through the centralizer
feat(brainstorm) — judge maxTokens scales with idea count + respects each model's actual output cap via ANTHROPIC_OUTPUT_CAPS (closes the headline truncation bug)
chore — VERSION + CHANGELOG + TODOS
docs — README + CLAUDE.md + llms-full.txt sync

Test Coverage

[+] src/core/model-id.ts (NEW splitProviderModelId)        16 cases
[+] src/core/ai/model-resolver.ts (slash extension)        10 cases (new file)
[+] src/core/anthropic-pricing.ts (estimateMaxCostUsd)      7 cases (new file)
[+] src/core/budget/budget-tracker.ts (lookupPricing)       2 cases (extended)
[+] src/core/eval-contradictions/cost-tracker.ts            6 cases (new file)
[+] src/core/minions/batch-projection.ts                    2 cases (extended)
[+] src/core/model-config.ts:isAnthropicProvider            2 cases (extended)
[+] src/core/brainstorm/judges.ts (computeJudgeMaxTokens)  16 cases (new file)
                                                           ─────────
                                                            61 new test cases

Targeted suite: 190/190 pass. Full unit suite: pre-existing parallel-cross-contamination flakes only (schema-cli + 2 autopilot-cycle-handler tests, all pass standalone, present on master).

Pre-Landing Review

No issues found in code review. Codex adversarial review caught 4 substantive issues during /ship; all are absorbed into the shipped version:

Original 32K maxTokens cap was unsafe for legacy 3.5 models (8K cap) → fixed via ANTHROPIC_OUTPUT_CAPS per-model map
splitProviderModelId defensive contract was test-only → moved into the type signature
Pricing fix alone would let BudgetTracker pass but gateway would still throw → extended model-resolver.ts:parseModelId to also accept slash form
computeJudgeMaxTokens cap looked at modelOverride not the actual chat model → routes through getChatModel() when override is undefined
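The last item above is easiest to see in code. The following is a minimal sketch, not the shipped implementation: `resolveOutputCap`, its parameters, and the model ids in the usage lines are hypothetical, and the configured-model lookup is injected as a plain function standing in for `getChatModel()`.

```typescript
// Hypothetical sketch: pick the output cap for the model that chat() will
// actually use, not for whatever the (possibly undefined) override hints at.
type OutputCaps = Readonly<Record<string, number>>;

function resolveOutputCap(
  modelOverride: string | undefined,
  getChatModel: () => string, // stand-in for the gateway's configured chat model
  caps: OutputCaps,           // stand-in for a per-model cap map
  fallbackCap: number,        // used only when the model is not in the map
): number {
  // Post-fix behavior described above: an undefined override resolves to the
  // configured chat model before the cap is looked up.
  const effectiveModel = modelOverride ?? getChatModel();
  return caps[effectiveModel] ?? fallbackCap;
}

// Usage with made-up ids and caps:
const exampleCaps: OutputCaps = { "example-legacy-model": 8_000 };
const capWithoutOverride = resolveOutputCap(
  undefined,
  () => "example-legacy-model",
  exampleCaps,
  32_000,
); // 8000, not the 32000 fallback
```

Under this reading, the pre-fix bug corresponds to skipping the `getChatModel()` step and going straight to the fallback whenever the override was undefined.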

Plan Completion

All 13 plan-eng-review decisions (D1-D13) implemented. 3 follow-up TODOs filed in TODOS.md for the explicitly-deferred items (config-write normalization, non-Anthropic pricing tables, eval-contradictions duplicate pricing table consolidation).

Plan file: ~/.claude/plans/system-instruction-you-are-working-joyful-river.md

Verification Results

# Pre-fix this would silently exit with judge_failed:
gbrain brainstorm "what should I work on next" --max-cost 1
# Should now show: scored idea list with "passing N/M" > 0

# Pre-fix this would refuse to start with BudgetExhausted no_pricing,
# OR pass pricing then throw AIConfigError at gateway:
gbrain brainstorm "topic" --judge-model anthropic/claude-sonnet-4-6 --max-cost 1
# Should now run end-to-end to completion

E2E suite NOT required (no DB-shape changes; the unit suite covers the judge integration via the chatFn DI seam, and the new gateway resolver tests pin the resolveRecipe round-trip).

Documentation

README.md — added a v0.41.21.0 user-facing callout for the brainstorm judge_failed: true + slash-form --judge-model fixes, mirroring the v0.41.19.0 callout shape.
CLAUDE.md — added v0.41.21.0 annotations to two existing key-files entries (src/core/model-config.ts:isAnthropicProvider now routes through splitProviderModelId; src/core/brainstorm/{...,judges}.ts now scales maxTokens per-model via ANTHROPIC_OUTPUT_CAPS). Added NEW key-files entries for src/core/model-id.ts and the src/core/ai/model-resolver.ts:parseModelId slash-form extension.
llms-full.txt — regenerated to absorb the CLAUDE.md + README changes (passes the test/build-llms.test.ts drift guard).
CHANGELOG.md — v0.41.21.0 entry follows the ELI10 voice rules (lead with what users can now do, table comparing pre/post behavior, "what we caught and fixed before merging" section).
TODOS.md — 3 v0.41.21.0 follow-ups filed during ship.

TODOS

3 new P2/P3 items filed under "v0.41.21.0 brainstorm judge fix-wave follow-ups":

Config-write normalization (canonicalize provider IDs to : form on config write)
Non-Anthropic pricing tables (OpenAI / Gemini / OpenRouter pricing surface)
Eval-contradictions duplicate ANTHROPIC_PRICING consolidation

These were explicitly deferred per the plan's Step 0 scope decision (Option A: ship the centralizer + 5 sites cleanly, defer the full pricing-system DRY).

Credit

Thanks to @garrytan-agents whose original bug report and first-pass diff in PR #1540 drove the whole investigation. Their pricing-side patch is incorporated in spirit (centralized + extended to 3 more sites); commits route through the new centralizer so their literal lines aren't in this PR's history, but credit lives in the CHANGELOG entry and this PR body per the attribution discussion in plan-eng-review D10.

Test plan

bun run verify (28 checks green)
bun run typecheck clean
Targeted suite 190/190 pass
Full unit suite: only pre-existing master flakes fail (schema-cli + autopilot-cycle-handler)
Hand-verify after merge: gbrain brainstorm "topic" --judge-model anthropic/claude-sonnet-4-6 --max-cost 1
Close PR #1540 (fix: brainstorm/lsd judge truncation + slash-prefix pricing lookup) with a link to this superseding PR

🤖 Generated with Claude Code

…sing

New pure helper in src/core/model-id.ts that splits provider:model, provider/model, and bare model strings into a {provider, model} pair. Defensive contract: null/undefined/empty/whitespace returns {provider: null, model: ''}. Will be wired into the 5 pricing/budget sites in the next commit.

Named splitProviderModelId (not parseModelId) to avoid the in-project collision with the gateway-side src/core/ai/model-resolver.ts:parseModelId, which has a different bare-name contract.

Pinned by 16 cases in test/model-id.test.ts covering all separator forms plus defensive + edge inputs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
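A minimal sketch of the contract described in this commit message, for illustration only; it is not the code in src/core/model-id.ts. The `SplitModelId` type name is hypothetical, and the handling of ids that contain both separators (colon preferred here) is an assumption, since the message does not state it.

```typescript
// Illustrative sketch of the splitProviderModelId contract; not the real source.
interface SplitModelId {
  provider: string | null;
  model: string;
}

function splitProviderModelId(id: string | null | undefined): SplitModelId {
  // Defensive contract: null / undefined / empty / whitespace-only input.
  const trimmed = (id ?? "").trim();
  if (trimmed === "") return { provider: null, model: "" };

  // Assumption: prefer ':' when present, otherwise fall back to '/'.
  const colon = trimmed.indexOf(":");
  const sep = colon !== -1 ? colon : trimmed.indexOf("/");

  // Bare model name: no provider prefix at all.
  if (sep === -1) return { provider: null, model: trimmed };

  return { provider: trimmed.slice(0, sep), model: trimmed.slice(sep + 1) };
}

// Usage:
const colonForm = splitProviderModelId("anthropic:claude-sonnet-4-6");
const slashForm = splitProviderModelId("anthropic/claude-sonnet-4-6");
const bareForm = splitProviderModelId("claude-sonnet-4-6");
const emptyForm = splitProviderModelId("   ");
// colonForm and slashForm: { provider: "anthropic", model: "claude-sonnet-4-6" }
// bareForm:                { provider: null, model: "claude-sonnet-4-6" }
// emptyForm:               { provider: null, model: "" }
```

Accepting `null` and `undefined` in the parameter type is one way to read "moved into the type signature" from the review notes; the real signature may differ.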

src/core/ai/model-resolver.ts:parseModelId now accepts both provider:model (colon) and provider/model (slash) forms. Colon wins when both separators are present, so OpenRouter nested ids like openrouter:anthropic/claude-sonnet-4.6 route as {providerId: 'openrouter', modelId: 'anthropic/claude-sonnet-4.6'}.

Pre-fix: every gateway entry point (chat / embed / rerank) threw AIConfigError 'missing a provider prefix' on slash-form ids. That meant CLI users running gbrain brainstorm --judge-model anthropic/claude-sonnet-4-6 would still fail mid-judge with AIConfigError even after pricing was relaxed to accept slash form. Closes the end-to-end bug class.

Bare names without ANY separator still throw; gateway routing always needs an explicit provider. Existing tests pinning that throw (test/ai/capabilities.test.ts:43) stay green.

Pinned by 10 cases in test/ai/model-resolver-slash.test.ts, including a resolveRecipe round-trip confirming that slash and colon forms land on the same recipe.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
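The gateway-side rules above (colon wins, slash accepted, bare names throw) can be sketched as follows. This is an assumption-laden illustration rather than the contents of model-resolver.ts: the `AIConfigError` class here is a local stand-in, the `ParsedModelId` type name is hypothetical, and the exact error wording and empty-segment handling are guesses.

```typescript
// Local stand-in for the project's AIConfigError; the real class lives elsewhere.
class AIConfigError extends Error {}

interface ParsedModelId {
  providerId: string;
  modelId: string;
}

function parseModelId(id: string): ParsedModelId {
  // Colon wins when both separators are present, so an id like
  // "openrouter:anthropic/claude-sonnet-4.6" keeps its nested slash in modelId.
  const colon = id.indexOf(":");
  const sep = colon !== -1 ? colon : id.indexOf("/");

  // Bare names (no separator) still throw. Treating an empty provider or an
  // empty model segment as an error is an assumption of this sketch.
  if (sep <= 0 || sep === id.length - 1) {
    throw new AIConfigError(`Model id "${id}" is missing a provider prefix`);
  }
  return { providerId: id.slice(0, sep), modelId: id.slice(sep + 1) };
}

// Usage:
const nested = parseModelId("openrouter:anthropic/claude-sonnet-4.6");
// { providerId: "openrouter", modelId: "anthropic/claude-sonnet-4.6" }
const slash = parseModelId("anthropic/claude-sonnet-4-6");
// { providerId: "anthropic", modelId: "claude-sonnet-4-6" }
```

The contrast with the pricing-side helper is the bare-name case: there a bare name is a valid input with no provider, while here it is a configuration error.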

Five sites had inline ':'-only provider-prefix splits that silently missed slash-form ids. Centralizing through splitProviderModelId closes the bug class:

- src/core/anthropic-pricing.ts:estimateMaxCostUsd
- src/core/budget/budget-tracker.ts:lookupPricing (closes the headline BudgetExhausted no_pricing failure on --max-cost + slash-form --judge-model)
- src/core/eval-contradictions/cost-tracker.ts:pricingFor (legacy silent-Haiku fallback preserved per plan D9)
- src/core/minions/batch-projection.ts (deleted bareModel inline helper; inlined splitProviderModelId at 2 call sites)
- src/core/model-config.ts:isAnthropicProvider (silently fixed v0.31.12 subagent-guard bypass for slash-form Anthropic ids)

Test gates land together so any bisect step is green:

- NEW test/anthropic-pricing.test.ts (7 cases including structural regression guard: every ANTHROPIC_PRICING key reachable via all three forms)
- NEW test/eval-contradictions/cost-tracker-slash.test.ts (6 cases including legacy-Haiku-fallback pin)
- EXTENDED test/batch-projection.test.ts (slash + double-separator cases)
- EXTENDED test/model-config.serial.test.ts (2 slash-form isAnthropicProvider cases)
- EXTENDED test/core/budget/budget-tracker.test.ts (2 slash + colon reserve() cases)

Behavior changes for slash-prefix ids only; bare and colon ids unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
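To make the bug class concrete, here is a hypothetical reconstruction of a ':'-only split next to a separator-agnostic one. None of these function names exist in the repository, and the real isAnthropicProvider may apply additional rules (for example, for bare model names) that this sketch ignores.

```typescript
// Hypothetical pre-fix shape: only ':' is recognized as a provider separator.
function providerColonOnly(id: string): string | null {
  const i = id.indexOf(":");
  return i === -1 ? null : id.slice(0, i);
}

// Hypothetical post-fix shape: ':' is preferred, '/' is accepted as a fallback.
function providerEitherSeparator(id: string): string | null {
  const colon = id.indexOf(":");
  const sep = colon !== -1 ? colon : id.indexOf("/");
  return sep === -1 ? null : id.slice(0, sep);
}

const isAnthropicBefore = (id: string): boolean =>
  providerColonOnly(id) === "anthropic";
const isAnthropicAfter = (id: string): boolean =>
  providerEitherSeparator(id) === "anthropic";

// Usage: the slash form is the only input whose answer changes.
const slashBefore = isAnthropicBefore("anthropic/claude-sonnet-4-6"); // false
const slashAfter = isAnthropicAfter("anthropic/claude-sonnet-4-6");   // true
const colonBefore = isAnthropicBefore("anthropic:claude-sonnet-4-6"); // true
const colonAfter = isAnthropicAfter("anthropic:claude-sonnet-4-6");   // true
```

The silent part of the failure is that the colon-only version returns `null` rather than throwing, so a guard or pricing lookup built on it simply takes the "not this provider" branch.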

Replace the hard-coded maxTokens: 4000 with computeJudgeMaxTokens that scales with idea count and respects each model's actual output cap.

Pre-fix: any judge call with 36+ ideas produced ~100 tokens/idea of JSON that got truncated mid-output. parseJudgeJSON threw, orchestrator surfaced judge_failed: true, all ideas saved unscored. Verified failure mode on 72-idea fixture: 0/72 passing before, 39/72 after.

Formula: min(modelCap, max(LEGACY_MIN_MAX_TOKENS, ideaCount*150+500))

Named constants extracted at top of judges.ts:

- TOKEN_BUDGET_PER_IDEA = 150 (1.5x headroom over observed ~100/idea)
- TOKEN_BUDGET_ENVELOPE = 500 (JSON wrapper)
- LEGACY_MIN_MAX_TOKENS = 4000 (pre-fix floor preserved for 1-idea)
- MAX_OUTPUT_TOKENS_CEIL = 32_000 (fallback when model unknown)
- ANTHROPIC_OUTPUT_CAPS (per-model: Opus 4.7 = 32K, Sonnet 4.6 / Haiku 4.5 = 64K, legacy 3.5 = 8K)

When the caller passes no modelOverride, the cap routes through the gateway's actual configured chat model via getChatModel() so the formula matches what chat() will use, not whatever the override hints at. Pre-fix, the undefined-override case fell back to 32K even if the configured default was a legacy 8K model.

Pinned by 16 cases in test/brainstorm/judges-maxtokens.test.ts: formula at 1/10/36/96/200/300 ideas, per-model cap binding (Haiku 3.5 8K, Opus 4.7 32K, Sonnet 4.6 64K), and integration via runJudge with a stubbed chatFn that captures ChatOpts.maxTokens.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
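The formula and named constants above translate directly into a few lines. This sketch assumes a simplified signature that takes the model's cap as a number; the real computeJudgeMaxTokens in judges.ts presumably resolves the cap from the model id itself, and the 8_000 used for the legacy cap in the usage lines is an illustrative reading of "8K", not a confirmed value.

```typescript
// Constants as named in the commit message.
const TOKEN_BUDGET_PER_IDEA = 150;     // 1.5x headroom over observed ~100/idea
const TOKEN_BUDGET_ENVELOPE = 500;     // JSON wrapper
const LEGACY_MIN_MAX_TOKENS = 4000;    // pre-fix floor
const MAX_OUTPUT_TOKENS_CEIL = 32_000; // fallback when the model is unknown

// Simplified signature (assumption): the caller supplies the model's output cap.
function computeJudgeMaxTokens(
  ideaCount: number,
  modelCap: number = MAX_OUTPUT_TOKENS_CEIL,
): number {
  const wanted = ideaCount * TOKEN_BUDGET_PER_IDEA + TOKEN_BUDGET_ENVELOPE;
  // min(modelCap, max(LEGACY_MIN_MAX_TOKENS, ideaCount*150+500))
  return Math.min(modelCap, Math.max(LEGACY_MIN_MAX_TOKENS, wanted));
}

// Worked values:
const oneIdea = computeJudgeMaxTokens(1);           // 4000  (floor binds)
const thirtySix = computeJudgeMaxTokens(36);        // 5900  (36*150 + 500)
const ninetySix = computeJudgeMaxTokens(96);        // 14900 (96*150 + 500)
const threeHundred = computeJudgeMaxTokens(300);    // 32000 (default ceiling binds)
const legacyCap = computeJudgeMaxTokens(96, 8_000); // 8000  (model cap binds)
```

At the observed ~100 tokens per idea, 36 ideas come to roughly 3,600 tokens before any wrapper overhead, which is consistent with the old fixed 4000 limit starting to truncate at about that count.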

Brainstorm judge fix-wave: closes #1540 end-to-end. splitProviderModelId centralizer + gateway resolver (parseModelId) slash-form acceptance + per-model maxTokens cap. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

CLAUDE.md: add v0.41.21.0 annotations to brainstorm/judges + model-config entries; add new key-files entry for src/core/model-id.ts (the shared splitProviderModelId centralizer) and src/core/ai/model-resolver.ts slash-form extension.

README.md: add user-facing callout for the brainstorm judge_failed + slash-form pricing fix, mirroring the v0.41.19.0 callout shape.

llms-full.txt: regenerated to absorb the CLAUDE.md + README changes (passes test/build-llms.test.ts drift guard).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Master claimed both v0.41.21.0 (ops-fix-wave) and v0.41.22.0 (type-unification cathedral) while this branch was shipping. Rebumped from v0.41.21.0 → v0.41.22.1 across VERSION, package.json, CHANGELOG header + in-body refs, and TODOS section header + body refs. No code changes — only the version metadata. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* upstream/master:
  v0.41.26.1 fix: lock-renewal cathedral — closes ~39 worker crashes/day (supersedes garrytan#1567) (garrytan#1572)
  v0.41.26.0 fix: dream --source + ingest junk titles + emoji-crash (supersedes garrytan#1559, garrytan#1561) (garrytan#1571)
  v0.41.25.0 perf(sync): batched deletes + global page-generation clock (supersedes garrytan#1538) (garrytan#1566)
  v0.41.24.0 fix(conversation-parser): threshold gates + bold-paren-time pattern — 20,167 Circleback messages unblocked (closes garrytan#1533) (garrytan#1543)
  v0.41.23.0 feat: extract operator surfaces + pack-driven extractables (garrytan#1541)
  v0.41.22.1 feat: brainstorm/lsd judge fixes (closes garrytan#1540 end-to-end) (garrytan#1562)
  v0.41.22.0 feat: type-unification cathedral — 94 types → 15 canonical (closes garrytan#1479) (garrytan#1542)
  v0.41.21.0 feat(ops): 5 daily-driver pains fixed in one wave (garrytan#1545)
  v0.41.20.0 feat: gbrain status + doctor --scope=brain (fix wave 2: items garrytan#6 + garrytan#7) (garrytan#1544)
  feat: v0.41.19.0 Supavisor Retry Cathedral (garrytan#1537)
  v0.41.18.0: gbrain onboard — the activation surface gbrain didn't have before (garrytan#1521)
  v0.41.17.0 feat: --workers N on every bulk command + facts dim doctor parity (garrytan#1519)
  v0.41.16.0 feat: conversation parser cathedral + progressive-batch primitive (closes garrytan#1461) (garrytan#1510)
  v0.41.15.0 feat(sync): --timeout + --max-age + partial status (closes garrytan#1472 RFC) (garrytan#1506)

garrytan and others added 6 commits May 27, 2026 07:00

chore: bump version and changelog (v0.41.21.0)

9364fde


garrytan mentioned this pull request May 27, 2026

fix: brainstorm/lsd judge truncation + slash-prefix pricing lookup #1540

Closed

garrytan changed the title ~~v0.41.21.0 feat: brainstorm/lsd judge fixes (closes #1540 end-to-end)~~ v0.41.22.1 feat: brainstorm/lsd judge fixes (closes #1540 end-to-end) May 27, 2026

garrytan merged commit 127842e into master May 27, 2026
21 checks passed

garrytan merged 7 commits into master from garrytan/brainstorm-judge-fixes
