fix: brainstorm/lsd judge truncation + slash-prefix pricing lookup by garrytan-agents · Pull Request #1540 · garrytan/gbrain

garrytan-agents · 2026-05-27T06:32:24Z

Two bugs causing judge_failed on every brainstorm/lsd run:

Bug 1: maxTokens truncation

maxTokens hard-coded at 4000 in judges.ts. With 36-96 ideas per batch, each producing ~100 tokens of JSON output (id, 5 axis scores, note), the response was consistently truncated mid-JSON → parseJudgeJSON failure.

Fix: Scale maxTokens to ideas.length * 150 + 500 (min 4000).
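A minimal sketch of the scaling rule, for illustration only. The function and constant names here are hypothetical and not taken from judges.ts; only the formula (ideas.length * 150 + 500, floor of 4000) comes from this description.

```typescript
// Hypothetical names; only the formula itself comes from the PR description.
const PER_IDEA_TOKENS = 150; // ~100 observed tokens per idea, plus headroom
const WRAPPER_TOKENS = 500; // fixed allowance on top of the per-idea budget
const FLOOR_TOKENS = 4000; // the previous hard-coded value, kept as a minimum

function scaledMaxTokens(ideaCount: number): number {
  return Math.max(FLOOR_TOKENS, ideaCount * PER_IDEA_TOKENS + WRAPPER_TOKENS);
}

console.log(scaledMaxTokens(72)); // 11300
console.log(scaledMaxTokens(10)); // 4000 (the floor applies)
```

At ~100 tokens per idea, a 72-idea batch needs roughly 7200 output tokens, which is why the old 4000 limit cut the JSON off.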

Bug 2: Slash-prefix pricing lookup

Pricing lookup in anthropic-pricing.ts and budget-tracker.ts only split on : (anthropic:claude-sonnet-4-6) but not / (anthropic/claude-sonnet-4-6). CLI --judge-model passes slash-separated IDs → no pricing match → BudgetExhausted with reason no_pricing when --max-cost is set.

Fix: Fall through to slash-split when colon-split finds nothing.
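The fall-through could look roughly like the sketch below. The table, type, and function names are hypothetical, and the rates are placeholders rather than real prices; the actual lookups in anthropic-pricing.ts and budget-tracker.ts may be structured differently.

```typescript
// Hypothetical pricing shape and table; the numbers are placeholders.
type Pricing = { inputPerMTok: number; outputPerMTok: number };

const PRICING_SKETCH = new Map<string, Pricing>([
  ["claude-sonnet-4-6", { inputPerMTok: 1, outputPerMTok: 2 }],
]);

// Returns the text after the first occurrence of sep, or undefined if absent.
function afterFirst(id: string, sep: string): string | undefined {
  const i = id.indexOf(sep);
  return i === -1 ? undefined : id.slice(i + 1);
}

// Try the id as given, then the colon-split form, then the slash-split form.
function lookupPricingSketch(modelId: string): Pricing | undefined {
  const candidates = [modelId, afterFirst(modelId, ":"), afterFirst(modelId, "/")];
  for (const key of candidates) {
    if (key === undefined) continue;
    const hit = PRICING_SKETCH.get(key);
    if (hit !== undefined) return hit;
  }
  return undefined; // caller decides how to report a missing price
}
```

With this shape, anthropic:claude-sonnet-4-6, anthropic/claude-sonnet-4-6, and the bare claude-sonnet-4-6 all resolve to the same entry.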

Tested

Brainstorm run with 72 ideas now scores 39 passing (was 0 of 72, every run since calibration cold-start).


garrytan · 2026-05-27T14:03:33Z

Superseded by #1562: brought the fix into a production-ready wave that centralizes model-id parsing across all 5 lookup sites (was 2) + extends the gateway resolver (src/core/ai/model-resolver.ts:parseModelId) to also accept slash form + adds per-model maxTokens caps.

Codex adversarial review during /ship caught that the pricing fix alone wouldn't close the bug class — it would just shift the failure from BudgetExhausted no_pricing to AIConfigError 'missing a provider prefix' mid-judge. The shipped version extends the gateway resolver to match.
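A rough sketch of the resolver behaviour the #1562 commit message describes: colon and slash forms are both accepted, colon wins when both are present, and bare names still throw. The names below are hypothetical stand-ins, not the real parseModelId or AIConfigError from src/core/ai/model-resolver.ts, and treating an empty provider or model half as an error is an assumption.

```typescript
// Hypothetical stand-in for the project's AIConfigError.
class AIConfigErrorSketch extends Error {}

interface ParsedModelIdSketch {
  providerId: string;
  modelId: string;
}

function parseModelIdSketch(id: string): ParsedModelIdSketch {
  const colon = id.indexOf(":");
  // Colon takes precedence, so nested ids keep their slash inside modelId.
  const sep = colon !== -1 ? colon : id.indexOf("/");
  // Assumption: no separator, or an empty half on either side, is rejected.
  if (sep <= 0 || sep === id.length - 1) {
    throw new AIConfigErrorSketch(`model id "${id}" is missing a provider prefix`);
  }
  return { providerId: id.slice(0, sep), modelId: id.slice(sep + 1) };
}

console.log(parseModelIdSketch("openrouter:anthropic/claude-sonnet-4.6"));
// { providerId: 'openrouter', modelId: 'anthropic/claude-sonnet-4.6' }
```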

Thanks @garrytan-agents — your bug report + first-pass diff drove the whole investigation. Credit lives in the v0.41.21.0 CHANGELOG entry and PR #1562 body.

…#1562)

* feat(core): add splitProviderModelId centralizer for pricing-side parsing

New pure helper in src/core/model-id.ts that splits provider:model, provider/model, and bare model strings into a {provider, model} pair. Defensive contract: null/undefined/empty/whitespace returns {provider: null, model: ''}. Will be wired into the 5 pricing/budget sites in the next commit.

Named splitProviderModelId (not parseModelId) to avoid the in-project collision with the gateway-side src/core/ai/model-resolver.ts:parseModelId which has a different bare-name contract.

Pinned by 16 cases in test/model-id.test.ts covering all separator forms plus defensive + edge inputs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(gateway): accept slash-form provider id in model-resolver

src/core/ai/model-resolver.ts:parseModelId now accepts both provider:model (colon) and provider/model (slash) forms. Colon wins when both separators present so OpenRouter nested ids like openrouter:anthropic/claude-sonnet-4.6 route as {providerId: 'openrouter', modelId: 'anthropic/claude-sonnet-4.6'}.

Pre-fix: every gateway entry point (chat / embed / rerank) threw AIConfigError 'missing a provider prefix' on slash form ids. That meant CLI users running gbrain brainstorm --judge-model anthropic/claude-sonnet-4-6 would still fail mid-judge with AIConfigError even after pricing was relaxed to accept slash form. Closes the end-to-end bug class.

Bare names without ANY separator still throw — gateway routing always needs an explicit provider. Existing tests pinning that throw (test/ai/capabilities.test.ts:43) stay green.

Pinned by 10 cases in test/ai/model-resolver-slash.test.ts including a resolveRecipe round-trip that slash and colon forms land on the same recipe.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* refactor: route 5 pricing/config sites through splitProviderModelId

Five sites had inline ':'-only provider-prefix splits that silently missed slash-form ids. Centralizing through splitProviderModelId closes the bug class:

- src/core/anthropic-pricing.ts:estimateMaxCostUsd
- src/core/budget/budget-tracker.ts:lookupPricing (closes the headline BudgetExhausted no_pricing failure on --max-cost + slash-form --judge-model)
- src/core/eval-contradictions/cost-tracker.ts:pricingFor (legacy silent-Haiku fallback preserved per plan D9)
- src/core/minions/batch-projection.ts (deleted bareModel inline helper; inlined splitProviderModelId at 2 call sites)
- src/core/model-config.ts:isAnthropicProvider (silently fixed v0.31.12 subagent-guard bypass for slash-form Anthropic ids)

Test gates land together so any bisect step is green:

- NEW test/anthropic-pricing.test.ts (7 cases including structural regression guard: every ANTHROPIC_PRICING key reachable via all three forms)
- NEW test/eval-contradictions/cost-tracker-slash.test.ts (6 cases including legacy-Haiku-fallback pin)
- EXTENDED test/batch-projection.test.ts (slash + double-separator cases)
- EXTENDED test/model-config.serial.test.ts (2 slash-form isAnthropicProvider cases)
- EXTENDED test/core/budget/budget-tracker.test.ts (2 slash + colon reserve() cases)

Behavior changes for slash-prefix ids only; bare and colon ids unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(brainstorm): scale judge maxTokens with per-model output cap

Replace the hard-coded maxTokens: 4000 with computeJudgeMaxTokens that scales with idea count and respects each model's actual output cap.

Pre-fix: any judge call with 36+ ideas produced ~100 tokens/idea of JSON that got truncated mid-output. parseJudgeJSON threw, orchestrator surfaced judge_failed: true, all ideas saved unscored. Verified failure mode on 72-idea fixture: 0/72 passing before, 39/72 after.

Formula: min(modelCap, max(LEGACY_MIN_MAX_TOKENS, ideaCount*150+500))

Named constants extracted at top of judges.ts:

- TOKEN_BUDGET_PER_IDEA = 150 (1.5x headroom over observed ~100/idea)
- TOKEN_BUDGET_ENVELOPE = 500 (JSON wrapper)
- LEGACY_MIN_MAX_TOKENS = 4000 (pre-fix floor preserved for 1-idea)
- MAX_OUTPUT_TOKENS_CEIL = 32_000 (fallback when model unknown)
- ANTHROPIC_OUTPUT_CAPS (per-model: Opus 4.7 = 32K, Sonnet 4.6 / Haiku 4.5 = 64K, legacy 3.5 = 8K)

When the caller passes no modelOverride, the cap routes through the gateway's actual configured chat model via getChatModel() so the formula matches what chat() will use, not whatever the override hints at. Pre-fix the undefined-override case fell back to 32K even if the configured default was a legacy 8K model.

Pinned by 16 cases in test/brainstorm/judges-maxtokens.test.ts: formula at 1/10/36/96/200/300 ideas, per-model cap binding (Haiku 3.5 8K, Opus 4.7 32K, Sonnet 4.6 64K), and integration via runJudge with a stubbed chatFn that captures ChatOpts.maxTokens.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore: bump version and changelog (v0.41.21.0)

Brainstorm judge fix-wave: closes #1540 end-to-end. parseModelId centralizer + gateway resolver slash-form acceptance + per-model maxTokens cap.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: update project documentation for v0.41.21.0

CLAUDE.md: add v0.41.21.0 annotations to brainstorm/judges + model-config entries; add new key-files entry for src/core/model-id.ts (the shared splitProviderModelId centralizer) and src/core/ai/model-resolver.ts slash-form extension.

README.md: add user-facing callout for the brainstorm judge_failed + slash-form pricing fix, mirroring the v0.41.19.0 callout shape.

llms-full.txt: regenerated to absorb the CLAUDE.md + README changes (passes test/build-llms.test.ts drift guard).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
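The capped formula from the commit message can be sketched as follows. The four scalar constants use the names and values quoted above; everything else is an assumption: the function name, the cap-table keys, and the exact numbers behind "64K" and "8K" are illustrative, and the getChatModel() routing for the no-override case is left out (the sketch simply falls back to the 32_000 ceiling).

```typescript
const TOKEN_BUDGET_PER_IDEA = 150;
const TOKEN_BUDGET_ENVELOPE = 500;
const LEGACY_MIN_MAX_TOKENS = 4000;
const MAX_OUTPUT_TOKENS_CEIL = 32_000;

// Assumed keys and exact values; the source only gives "32K", "64K", "8K".
const OUTPUT_CAPS_SKETCH = new Map<string, number>([
  ["claude-opus-4-7", 32_000],
  ["claude-sonnet-4-6", 64_000],
  ["claude-haiku-4-5", 64_000],
  ["claude-3-5-haiku", 8_000],
]);

// min(modelCap, max(LEGACY_MIN_MAX_TOKENS, ideaCount * 150 + 500))
function computeJudgeMaxTokensSketch(ideaCount: number, model?: string): number {
  const known = model !== undefined ? OUTPUT_CAPS_SKETCH.get(model) : undefined;
  const modelCap = known ?? MAX_OUTPUT_TOKENS_CEIL; // unknown model: use ceiling
  const wanted = Math.max(
    LEGACY_MIN_MAX_TOKENS,
    ideaCount * TOKEN_BUDGET_PER_IDEA + TOKEN_BUDGET_ENVELOPE,
  );
  return Math.min(modelCap, wanted);
}

console.log(computeJudgeMaxTokensSketch(72, "claude-sonnet-4-6")); // 11300
console.log(computeJudgeMaxTokensSketch(300, "claude-opus-4-7")); // 32000 (cap binds)
```

Under these assumed caps, a 96-idea batch on an 8K model would be limited to 8000 tokens even though the formula asks for 14900, so a cap-bound run may still need smaller batches.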

… fixes) Master shipped v0.41.22.1 (brainstorm/lsd judge fixes, closes #1540). Trio resolved with v0.41.23.0 staying at top, v0.41.22.1 entry preserved below. No source-code conflicts — only the standard VERSION + package.json + CHANGELOG trio. Verified: typecheck clean, bun run verify 28/28. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* upstream/master:
  v0.41.26.1 fix: lock-renewal cathedral — closes ~39 worker crashes/day (supersedes garrytan#1567) (garrytan#1572)
  v0.41.26.0 fix: dream --source + ingest junk titles + emoji-crash (supersedes garrytan#1559, garrytan#1561) (garrytan#1571)
  v0.41.25.0 perf(sync): batched deletes + global page-generation clock (supersedes garrytan#1538) (garrytan#1566)
  v0.41.24.0 fix(conversation-parser): threshold gates + bold-paren-time pattern — 20,167 Circleback messages unblocked (closes garrytan#1533) (garrytan#1543)
  v0.41.23.0 feat: extract operator surfaces + pack-driven extractables (garrytan#1541)
  v0.41.22.1 feat: brainstorm/lsd judge fixes (closes garrytan#1540 end-to-end) (garrytan#1562)
  v0.41.22.0 feat: type-unification cathedral — 94 types → 15 canonical (closes garrytan#1479) (garrytan#1542)
  v0.41.21.0 feat(ops): 5 daily-driver pains fixed in one wave (garrytan#1545)
  v0.41.20.0 feat: gbrain status + doctor --scope=brain (fix wave 2: items garrytan#6 + garrytan#7) (garrytan#1544)
  feat: v0.41.19.0 Supavisor Retry Cathedral (garrytan#1537)
  v0.41.18.0: gbrain onboard — the activation surface gbrain didn't have before (garrytan#1521)
  v0.41.17.0 feat: --workers N on every bulk command + facts dim doctor parity (garrytan#1519)
  v0.41.16.0 feat: conversation parser cathedral + progressive-batch primitive (closes garrytan#1461) (garrytan#1510)
  v0.41.15.0 feat(sync): --timeout + --max-age + partial status (closes garrytan#1472 RFC) (garrytan#1506)

garrytan mentioned this pull request May 27, 2026

v0.41.22.1 feat: brainstorm/lsd judge fixes (closes #1540 end-to-end) #1562

Merged

6 tasks

garrytan closed this May 27, 2026

garrytan-agents wants to merge 1 commit into garrytan:master from garrytan-agents:fix/brainstorm-judge-truncation
