v0.41.22.1 feat: brainstorm/lsd judge fixes (closes #1540 end-to-end) #1562
Merged
…sing
New pure helper in src/core/model-id.ts that splits provider:model,
provider/model, and bare model strings into a {provider, model} pair.
Defensive contract: null/undefined/empty/whitespace returns
{provider: null, model: ''}.
Will be wired into the 5 pricing/budget sites in the next commit.
Named splitProviderModelId (not parseModelId) to avoid the in-project
collision with the gateway-side src/core/ai/model-resolver.ts:parseModelId
which has a different bare-name contract.
Pinned by 16 cases in test/model-id.test.ts covering all separator
forms plus defensive + edge inputs.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
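Below is a minimal sketch of the defensive contract described above. It is not the shipped src/core/model-id.ts. The ProviderModel type name, the rule that a colon takes precedence over a slash, and the handling of bare names (provider: null, model kept whole) are assumptions made for illustration.

```typescript
// Hypothetical result shape; the real type in src/core/model-id.ts may differ.
interface ProviderModel {
  provider: string | null;
  model: string;
}

// Sketch of splitProviderModelId. Assumption: a colon takes precedence over a
// slash, so "openrouter:anthropic/claude-sonnet-4.6" keeps its inner slash.
function splitProviderModelId(id: string | null | undefined): ProviderModel {
  // Defensive contract: null, undefined, empty and whitespace-only inputs.
  if (id == null || id.trim() === '') {
    return { provider: null, model: '' };
  }
  const trimmed = id.trim();
  const colon = trimmed.indexOf(':');
  const sep = colon !== -1 ? colon : trimmed.indexOf('/');
  // No separator (or a leading one): treat the whole string as a bare model name.
  if (sep <= 0) {
    return { provider: null, model: trimmed };
  }
  return { provider: trimmed.slice(0, sep), model: trimmed.slice(sep + 1) };
}
```

Under these assumptions, 'anthropic/claude-sonnet-4-6' and 'anthropic:claude-sonnet-4-6' both yield { provider: 'anthropic', model: 'claude-sonnet-4-6' }, a bare 'claude-sonnet-4-6' yields { provider: null, model: 'claude-sonnet-4-6' }, and '   ' yields { provider: null, model: '' }.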
src/core/ai/model-resolver.ts:parseModelId now accepts both
provider:model (colon) and provider/model (slash) forms. Colon wins
when both separators present so OpenRouter nested ids like
openrouter:anthropic/claude-sonnet-4.6 route as
{providerId: 'openrouter', modelId: 'anthropic/claude-sonnet-4.6'}.
Pre-fix: every gateway entry point (chat / embed / rerank) threw
AIConfigError 'missing a provider prefix' on slash-form ids. That
meant CLI users running
gbrain brainstorm --judge-model anthropic/claude-sonnet-4-6
would still fail mid-judge with AIConfigError even after pricing
was relaxed to accept slash form. Closes the end-to-end bug class.
Bare names without ANY separator still throw — gateway routing
always needs an explicit provider. Existing tests pinning that
throw (test/ai/capabilities.test.ts:43) stay green.
Pinned by 10 cases in test/ai/model-resolver-slash.test.ts,
including a resolveRecipe round-trip asserting that slash and colon
forms land on the same recipe.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
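A minimal sketch of the gateway-side rule described above: colon wins when present, otherwise the first slash splits provider from model, and bare names still throw. The AIConfigError class here is a local stand-in, and the ParsedModelId name and error wording are assumptions, not the real model-resolver.ts code.

```typescript
// Local stand-in for the gateway's AIConfigError.
class AIConfigError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'AIConfigError';
  }
}

// Hypothetical name for the return shape described in the commit message.
interface ParsedModelId {
  providerId: string;
  modelId: string;
}

// Colon wins whenever present, so OpenRouter nested ids keep their inner slash.
function parseModelId(id: string): ParsedModelId {
  const colon = id.indexOf(':');
  const sep = colon !== -1 ? colon : id.indexOf('/');
  // Bare names (no separator) still throw: gateway routing needs a provider.
  if (sep <= 0) {
    throw new AIConfigError(`model id "${id}" is missing a provider prefix`);
  }
  const modelId = id.slice(sep + 1);
  if (modelId === '') {
    throw new AIConfigError(`model id "${id}" is missing a model name`);
  }
  return { providerId: id.slice(0, sep), modelId };
}
```

In this sketch parseModelId('anthropic:claude-sonnet-4-6') and parseModelId('anthropic/claude-sonnet-4-6') return identical objects, which is the property a resolveRecipe round-trip test would rely on.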
Five sites had inline ':'-only provider-prefix splits that silently
missed slash-form ids. Centralizing through splitProviderModelId
closes the bug class:
- src/core/anthropic-pricing.ts:estimateMaxCostUsd
- src/core/budget/budget-tracker.ts:lookupPricing (closes the
headline BudgetExhausted no_pricing failure on --max-cost +
slash-form --judge-model)
- src/core/eval-contradictions/cost-tracker.ts:pricingFor
(legacy silent-Haiku fallback preserved per plan D9)
- src/core/minions/batch-projection.ts (deleted bareModel inline
helper; inlined splitProviderModelId at 2 call sites)
- src/core/model-config.ts:isAnthropicProvider (silently fixed
v0.31.12 subagent-guard bypass for slash-form Anthropic ids)
Test gates land in the same commit so every bisect step stays green:
- NEW test/anthropic-pricing.test.ts (7 cases including structural
regression guard: every ANTHROPIC_PRICING key reachable via all
three forms)
- NEW test/eval-contradictions/cost-tracker-slash.test.ts (6 cases
including legacy-Haiku-fallback pin)
- EXTENDED test/batch-projection.test.ts (slash + double-separator
cases)
- EXTENDED test/model-config.serial.test.ts (2 slash-form
isAnthropicProvider cases)
- EXTENDED test/core/budget/budget-tracker.test.ts (2 slash + colon
reserve() cases)
Behavior changes for slash-prefix ids only; bare and colon ids
unchanged.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
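A minimal sketch of why the ':'-only splits missed slash-form ids, and of a structural guard in the spirit of the new pricing test (every key reachable via bare, colon and slash forms). The PRICING table is hypothetical: its key spelling is assumed and its numbers are placeholders, not real Anthropic prices. The splitter is restated compactly so the block stands alone, under the same colon-wins assumption.

```typescript
interface ProviderModel { provider: string | null; model: string; }

// Compact restatement of the shared splitter (assumption: colon wins over slash).
function splitProviderModelId(id: string | null | undefined): ProviderModel {
  const t = (id ?? '').trim();
  if (t === '') return { provider: null, model: '' };
  const colon = t.indexOf(':');
  const sep = colon !== -1 ? colon : t.indexOf('/');
  return sep <= 0 ? { provider: null, model: t } : { provider: t.slice(0, sep), model: t.slice(sep + 1) };
}

// Hypothetical table: assumed keys, placeholder numbers (not real prices).
const PRICING: Record<string, { inputPerMTok: number; outputPerMTok: number }> = {
  'claude-sonnet-4-6': { inputPerMTok: 1, outputPerMTok: 2 },
  'claude-haiku-4-5': { inputPerMTok: 0.5, outputPerMTok: 1 },
};

// Pre-fix shape: a ':'-only split leaves "anthropic/..." intact, so the lookup misses.
function lookupPricingColonOnly(id: string) {
  const model = id.includes(':') ? id.slice(id.indexOf(':') + 1) : id;
  return PRICING[model] ?? null;
}

// Post-fix shape: every site goes through the shared splitter.
function lookupPricing(id: string) {
  return PRICING[splitProviderModelId(id).model] ?? null;
}

// Structural guard: every key must resolve via bare, colon and slash forms.
function everyKeyReachable(): boolean {
  return Object.keys(PRICING).every((key) =>
    [key, `anthropic:${key}`, `anthropic/${key}`].every((id) => lookupPricing(id) !== null),
  );
}
```

A null from the colon-only lookup is what a budget tracker would report as no_pricing; routing the lookup through one splitter removes that failure for every site at once.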
Replace the hard-coded maxTokens: 4000 with computeJudgeMaxTokens
that scales with idea count and respects each model's actual output
cap.
Pre-fix: any judge call with 36+ ideas needed ~100 tokens/idea of
JSON, which overran the fixed 4000-token budget and was truncated
mid-output. parseJudgeJSON threw, the orchestrator surfaced
judge_failed: true, and all ideas were saved unscored. Verified on a
72-idea fixture: 0/72 passing before, 39/72 after.
Formula: min(modelCap, max(LEGACY_MIN_MAX_TOKENS, ideaCount*150+500))
Named constants extracted at top of judges.ts:
- TOKEN_BUDGET_PER_IDEA = 150 (1.5x headroom over observed ~100/idea)
- TOKEN_BUDGET_ENVELOPE = 500 (JSON wrapper)
- LEGACY_MIN_MAX_TOKENS = 4000 (pre-fix floor preserved for 1-idea)
- MAX_OUTPUT_TOKENS_CEIL = 32_000 (fallback when model unknown)
- ANTHROPIC_OUTPUT_CAPS (per-model: Opus 4.7 = 32K, Sonnet 4.6 /
Haiku 4.5 = 64K, legacy 3.5 = 8K)
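The formula and constants above can be sketched as follows. The four scalar constants come from the commit message; the ANTHROPIC_OUTPUT_CAPS key spelling is assumed, the 32K/64K/8K caps are written as round thousands, and the signature (an already-split bare model name) is an assumption rather than the real judges.ts one.

```typescript
const TOKEN_BUDGET_PER_IDEA = 150; // 1.5x headroom over the observed ~100 tokens/idea
const TOKEN_BUDGET_ENVELOPE = 500; // JSON wrapper
const LEGACY_MIN_MAX_TOKENS = 4000; // pre-fix floor, preserved for small calls
const MAX_OUTPUT_TOKENS_CEIL = 32_000; // fallback when the model is unknown

// Assumed key spelling; caps written as round thousands.
const ANTHROPIC_OUTPUT_CAPS: Record<string, number> = {
  'claude-opus-4-7': 32_000,
  'claude-sonnet-4-6': 64_000,
  'claude-haiku-4-5': 64_000,
  'claude-3-5-haiku': 8_000,
};

// min(modelCap, max(LEGACY_MIN_MAX_TOKENS, ideaCount * 150 + 500))
function computeJudgeMaxTokens(ideaCount: number, model: string): number {
  const modelCap = ANTHROPIC_OUTPUT_CAPS[model] ?? MAX_OUTPUT_TOKENS_CEIL;
  const wanted = ideaCount * TOKEN_BUDGET_PER_IDEA + TOKEN_BUDGET_ENVELOPE;
  return Math.min(modelCap, Math.max(LEGACY_MIN_MAX_TOKENS, wanted));
}
```

With these values, 1 idea gets the 4000 floor, 36 ideas get 5900, 96 get 14900, 300 ideas get 45500 on claude-sonnet-4-6 but 32000 on an unknown model, and 72 ideas on the 8K legacy model are clamped to 8000.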
When the caller passes no modelOverride, the cap routes through the
gateway's configured chat model via getChatModel(), so the cap matches
the model chat() will actually call. Pre-fix, the undefined-override
case fell back to the 32K ceiling even when the configured default
was a legacy 8K model.
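A minimal sketch of the override-versus-configured-model routing described above. The getChatModel lookup is injected as a plain function here, and resolveJudgeOutputCap, bareModel and the OUTPUT_CAPS keys are hypothetical names chosen for illustration, not the real judges.ts code.

```typescript
// Hypothetical stand-in for the gateway's getChatModel(): returns the configured model id.
type GetChatModel = () => string;

const MAX_OUTPUT_TOKENS_CEIL = 32_000;
// Assumed key spelling and round-number caps.
const OUTPUT_CAPS: Record<string, number> = {
  'claude-sonnet-4-6': 64_000,
  'claude-3-5-haiku': 8_000,
};

// Strip a provider prefix (colon wins over slash) to get the bare model name.
function bareModel(id: string): string {
  const colon = id.indexOf(':');
  const sep = colon !== -1 ? colon : id.indexOf('/');
  return sep === -1 ? id : id.slice(sep + 1);
}

// With no override, read the cap from the model chat() will actually use
// instead of falling straight back to the 32K ceiling.
function resolveJudgeOutputCap(modelOverride: string | undefined, getChatModel: GetChatModel): number {
  const effectiveModel = modelOverride ?? getChatModel();
  return OUTPUT_CAPS[bareModel(effectiveModel)] ?? MAX_OUTPUT_TOKENS_CEIL;
}
```

Injecting the lookup keeps the sketch testable without a configured gateway, in the same spirit as the chatFn seam the PR's tests use.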
Pinned by 16 cases in test/brainstorm/judges-maxtokens.test.ts:
formula at 1/10/36/96/200/300 ideas, per-model cap binding (Haiku 3.5
8K, Opus 4.7 32K, Sonnet 4.6 64K), and integration via runJudge with
a stubbed chatFn that captures ChatOpts.maxTokens.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Brainstorm judge fix-wave: closes #1540 end-to-end. splitProviderModelId centralizer + gateway resolver slash-form acceptance + per-model maxTokens cap.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
CLAUDE.md: add v0.41.21.0 annotations to the brainstorm/judges + model-config entries; add new key-files entries for src/core/model-id.ts (the shared splitProviderModelId centralizer) and the src/core/ai/model-resolver.ts slash-form extension.
README.md: add a user-facing callout for the brainstorm judge_failed + slash-form pricing fix, mirroring the v0.41.19.0 callout shape.
llms-full.txt: regenerated to absorb the CLAUDE.md + README changes (passes the test/build-llms.test.ts drift guard).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Master claimed both v0.41.21.0 (ops-fix-wave) and v0.41.22.0 (type-unification cathedral) while this branch was shipping, so it was rebumped from v0.41.21.0 → v0.41.22.1 across VERSION, package.json, the CHANGELOG header and in-body refs, and the TODOS section header and body refs. No code changes, only version metadata.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
mgunnin added a commit to mgunnin/gbrain that referenced this pull request on May 28, 2026:
* upstream/master:
- v0.41.26.1 fix: lock-renewal cathedral — closes ~39 worker crashes/day (supersedes garrytan#1567) (garrytan#1572)
- v0.41.26.0 fix: dream --source + ingest junk titles + emoji-crash (supersedes garrytan#1559, garrytan#1561) (garrytan#1571)
- v0.41.25.0 perf(sync): batched deletes + global page-generation clock (supersedes garrytan#1538) (garrytan#1566)
- v0.41.24.0 fix(conversation-parser): threshold gates + bold-paren-time pattern — 20,167 Circleback messages unblocked (closes garrytan#1533) (garrytan#1543)
- v0.41.23.0 feat: extract operator surfaces + pack-driven extractables (garrytan#1541)
- v0.41.22.1 feat: brainstorm/lsd judge fixes (closes garrytan#1540 end-to-end) (garrytan#1562)
- v0.41.22.0 feat: type-unification cathedral — 94 types → 15 canonical (closes garrytan#1479) (garrytan#1542)
- v0.41.21.0 feat(ops): 5 daily-driver pains fixed in one wave (garrytan#1545)
- v0.41.20.0 feat: gbrain status + doctor --scope=brain (fix wave 2: items garrytan#6 + garrytan#7) (garrytan#1544)
- feat: v0.41.19.0 Supavisor Retry Cathedral (garrytan#1537)
- v0.41.18.0: gbrain onboard — the activation surface gbrain didn't have before (garrytan#1521)
- v0.41.17.0 feat: --workers N on every bulk command + facts dim doctor parity (garrytan#1519)
- v0.41.16.0 feat: conversation parser cathedral + progressive-batch primitive (closes garrytan#1461) (garrytan#1510)
- v0.41.15.0 feat(sync): --timeout + --max-age + partial status (closes garrytan#1472 RFC) (garrytan#1506)
Summary
Closes the brainstorm/lsd judge failure that has been silently returning
judge_failed: true since the calibration cold-start landed. Productionizes PR #1540 (closed as superseded), expands the scope from 2 patched sites to 5, and extends the gateway resolver so the fix actually works end-to-end. Per-commit:
- feat(core): new splitProviderModelId centralizer for pricing-side parsing
- feat(gateway): model-resolver.ts:parseModelId extended to accept slash form (closes the load-bearing gap Codex caught: the pricing fix alone left the gateway throwing on slash-form ids mid-judge)
- refactor: 5 pricing/config sites routed through the centralizer
- feat(brainstorm): judge maxTokens scales with idea count and respects each model's actual output cap via ANTHROPIC_OUTPUT_CAPS (closes the headline truncation bug)
- chore: VERSION + CHANGELOG + TODOS
- docs: README + CLAUDE.md + llms-full.txt sync
Test Coverage
Targeted suite: 190/190 pass. Full unit suite: only pre-existing flakes from parallel cross-contamination (schema-cli plus 2 autopilot-cycle-handler tests; all pass standalone and are also present on master).
Pre-Landing Review
No issues found in code review. The Codex adversarial review caught 4 substantive issues during /ship; all were absorbed into the shipped version:
- ANTHROPIC_OUTPUT_CAPS per-model map
- splitProviderModelId defensive contract was test-only → moved into the type signature
- model-resolver.ts:parseModelId to also accept slash form
- computeJudgeMaxTokens cap looked at modelOverride, not the actual chat model → now routes through getChatModel() when the override is undefined
Plan Completion
All 13 plan-eng-review decisions (D1-D13) implemented. 3 follow-up TODOs filed in
TODOS.md for the explicitly deferred items (config-write normalization, non-Anthropic pricing tables, eval-contradictions duplicate pricing table consolidation).
Plan file: ~/.claude/plans/system-instruction-you-are-working-joyful-river.md
Verification Results
E2E suite NOT required (no DB-shape changes; the unit suite covers the judge integration via the chatFn DI seam, and the new gateway resolver tests pin the resolveRecipe round-trip).
Documentation
- judge_failed: true + slash-form --judge-model fixes, mirroring the v0.41.19.0 callout shape.
- src/core/model-config.ts:isAnthropicProvider now routes through splitProviderModelId; src/core/brainstorm/{...,judges}.ts now scales maxTokens per-model via ANTHROPIC_OUTPUT_CAPS). Added NEW key-files entries for src/core/model-id.ts and the src/core/ai/model-resolver.ts:parseModelId slash-form extension.
- test/build-llms.test.ts drift guard).
TODOS
3 new P2/P3 items filed under "v0.41.21.0 brainstorm judge fix-wave follow-ups":
:form on config write)
These were explicitly deferred per the plan's Step 0 scope decision (Option A: ship the centralizer + 5 sites cleanly, defer the full pricing-system DRY).
Credit
Thanks to @garrytan-agents whose original bug report and first-pass diff in PR #1540 drove the whole investigation. Their pricing-side patch is incorporated in spirit (centralized + extended to 3 more sites); commits route through the new centralizer so their literal lines aren't in this PR's history, but credit lives in the CHANGELOG entry and this PR body per the attribution discussion in plan-eng-review D10.
Test plan
- bun run verify (28 checks green)
- bun run typecheck clean
- gbrain brainstorm "topic" --judge-model anthropic/claude-sonnet-4-6 --max-cost 1
🤖 Generated with Claude Code