fix: brainstorm/lsd judge truncation + slash-prefix pricing lookup #1540
Closed
garrytan-agents wants to merge 1 commit into
Two bugs causing judge_failed on every brainstorm/lsd run:

1. maxTokens hard-coded at 4000 in judges.ts. With 36-96 ideas per batch, each producing ~100 tokens of JSON output (id, 5 axis scores, note), the response was consistently truncated mid-JSON → parseJudgeJSON failure. Fix: scale maxTokens to ideas.length * 150 + 500 (min 4000).

2. Pricing lookup in anthropic-pricing.ts and budget-tracker.ts only split on ':' (anthropic:claude-sonnet-4-6) but not '/' (anthropic/claude-sonnet-4-6). CLI --judge-model passes slash-separated IDs → no pricing match → BudgetExhausted with reason 'no_pricing' when --max-cost is set. Fix: fall through to slash-split when colon-split finds nothing.

Tested: brainstorm run with 72 ideas now scores 39 passing (was 0 of 72).
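To make the first failure mode concrete, here is a minimal sketch of why a response cut off mid-JSON cannot be parsed. Everything in it is hypothetical: the `JudgeScore` shape is only a guess based on the "id, 5 axis scores, note" description, `fakeJudgeResponse`, `truncate`, and `parses` are illustration helpers, and characters stand in for tokens. It is not the project's `parseJudgeJSON`.

```typescript
// Hypothetical shape of one judge result: an id, five axis scores, and a note.
interface JudgeScore {
  id: string;
  scores: [number, number, number, number, number];
  note: string;
}

// Build a stand-in judge response for `count` ideas.
function fakeJudgeResponse(count: number): string {
  const results: JudgeScore[] = Array.from(
    { length: count },
    (_, i): JudgeScore => ({
      id: `idea-${i}`,
      scores: [3, 4, 2, 5, 3],
      note: "placeholder note",
    }),
  );
  return JSON.stringify({ results });
}

// Simulate an output cap by cutting the string short.
// Assumption: characters stand in for tokens; a real cap counts tokens.
function truncate(text: string, cap: number): string {
  return text.slice(0, cap);
}

// Returns true when the text parses as JSON, false when JSON.parse throws.
function parses(text: string): boolean {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}

const full = fakeJudgeResponse(72);
const cut = truncate(full, Math.floor(full.length / 2));

console.log(parses(full)); // true
console.log(parses(cut)); // false
```

Because the top-level object only closes at its final character, any response that stops early is invalid as a whole, which is consistent with the reported 0 of 72 ideas being scored rather than a partial result.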
Owner
Superseded by #1562: brought the fix into a production-ready wave that centralizes model-id parsing across all 5 lookup sites (was 2) + extends the gateway resolver (

Codex adversarial review during /ship caught that the pricing fix alone wouldn't close the bug class; it would just shift the failure from

Thanks @garrytan-agents: your bug report + first-pass diff drove the whole investigation. Credit lives in the v0.41.21.0 CHANGELOG entry and PR #1562 body.
garrytan added a commit that referenced this pull request May 27, 2026
…#1562)

* feat(core): add splitProviderModelId centralizer for pricing-side parsing

New pure helper in src/core/model-id.ts that splits provider:model, provider/model, and bare model strings into a {provider, model} pair. Defensive contract: null/undefined/empty/whitespace returns {provider: null, model: ''}. Will be wired into the 5 pricing/budget sites in the next commit.

Named splitProviderModelId (not parseModelId) to avoid the in-project collision with the gateway-side src/core/ai/model-resolver.ts:parseModelId which has a different bare-name contract.

Pinned by 16 cases in test/model-id.test.ts covering all separator forms plus defensive + edge inputs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(gateway): accept slash-form provider id in model-resolver

src/core/ai/model-resolver.ts:parseModelId now accepts both provider:model (colon) and provider/model (slash) forms. Colon wins when both separators present so OpenRouter nested ids like openrouter:anthropic/claude-sonnet-4.6 route as {providerId: 'openrouter', modelId: 'anthropic/claude-sonnet-4.6'}.

Pre-fix: every gateway entry point (chat / embed / rerank) threw AIConfigError 'missing a provider prefix' on slash form ids. That meant CLI users running gbrain brainstorm --judge-model anthropic/claude-sonnet-4-6 would still fail mid-judge with AIConfigError even after pricing was relaxed to accept slash form. Closes the end-to-end bug class.

Bare names without ANY separator still throw: gateway routing always needs an explicit provider. Existing tests pinning that throw (test/ai/capabilities.test.ts:43) stay green.

Pinned by 10 cases in test/ai/model-resolver-slash.test.ts including a resolveRecipe round-trip that slash and colon forms land on the same recipe.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* refactor: route 5 pricing/config sites through splitProviderModelId

Five sites had inline ':'-only provider-prefix splits that silently missed slash-form ids. Centralizing through splitProviderModelId closes the bug class:

- src/core/anthropic-pricing.ts:estimateMaxCostUsd
- src/core/budget/budget-tracker.ts:lookupPricing (closes the headline BudgetExhausted no_pricing failure on --max-cost + slash-form --judge-model)
- src/core/eval-contradictions/cost-tracker.ts:pricingFor (legacy silent-Haiku fallback preserved per plan D9)
- src/core/minions/batch-projection.ts (deleted bareModel inline helper; inlined splitProviderModelId at 2 call sites)
- src/core/model-config.ts:isAnthropicProvider (silently fixed v0.31.12 subagent-guard bypass for slash-form Anthropic ids)

Test gates land together so any bisect step is green:

- NEW test/anthropic-pricing.test.ts (7 cases including structural regression guard: every ANTHROPIC_PRICING key reachable via all three forms)
- NEW test/eval-contradictions/cost-tracker-slash.test.ts (6 cases including legacy-Haiku-fallback pin)
- EXTENDED test/batch-projection.test.ts (slash + double-separator cases)
- EXTENDED test/model-config.serial.test.ts (2 slash-form isAnthropicProvider cases)
- EXTENDED test/core/budget/budget-tracker.test.ts (2 slash + colon reserve() cases)

Behavior changes for slash-prefix ids only; bare and colon ids unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(brainstorm): scale judge maxTokens with per-model output cap

Replace the hard-coded maxTokens: 4000 with computeJudgeMaxTokens that scales with idea count and respects each model's actual output cap.

Pre-fix: any judge call with 36+ ideas produced ~100 tokens/idea of JSON that got truncated mid-output. parseJudgeJSON threw, orchestrator surfaced judge_failed: true, all ideas saved unscored. Verified failure mode on 72-idea fixture: 0/72 passing before, 39/72 after.

Formula: min(modelCap, max(LEGACY_MIN_MAX_TOKENS, ideaCount*150+500))

Named constants extracted at top of judges.ts:

- TOKEN_BUDGET_PER_IDEA = 150 (1.5x headroom over observed ~100/idea)
- TOKEN_BUDGET_ENVELOPE = 500 (JSON wrapper)
- LEGACY_MIN_MAX_TOKENS = 4000 (pre-fix floor preserved for 1-idea)
- MAX_OUTPUT_TOKENS_CEIL = 32_000 (fallback when model unknown)
- ANTHROPIC_OUTPUT_CAPS (per-model: Opus 4.7 = 32K, Sonnet 4.6 / Haiku 4.5 = 64K, legacy 3.5 = 8K)

When the caller passes no modelOverride, the cap routes through the gateway's actual configured chat model via getChatModel() so the formula matches what chat() will use, not whatever the override hints at. Pre-fix the undefined-override case fell back to 32K even if the configured default was a legacy 8K model.

Pinned by 16 cases in test/brainstorm/judges-maxtokens.test.ts: formula at 1/10/36/96/200/300 ideas, per-model cap binding (Haiku 3.5 8K, Opus 4.7 32K, Sonnet 4.6 64K), and integration via runJudge with a stubbed chatFn that captures ChatOpts.maxTokens.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore: bump version and changelog (v0.41.21.0)

Brainstorm judge fix-wave: closes #1540 end-to-end. parseModelId centralizer + gateway resolver slash-form acceptance + per-model maxTokens cap.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: update project documentation for v0.41.21.0

CLAUDE.md: add v0.41.21.0 annotations to brainstorm/judges + model-config entries; add new key-files entry for src/core/model-id.ts (the shared splitProviderModelId centralizer) and src/core/ai/model-resolver.ts slash-form extension.

README.md: add user-facing callout for the brainstorm judge_failed + slash-form pricing fix, mirroring the v0.41.19.0 callout shape.

llms-full.txt: regenerated to absorb the CLAUDE.md + README changes (passes test/build-llms.test.ts drift guard).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
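The maxTokens formula in the commit message above can be sketched as follows. The four scalar constants and the formula come from the commit message; everything else is an assumption. The `OUTPUT_CAPS` keys are illustrative model ids rather than the project's real `ANTHROPIC_OUTPUT_CAPS` keys, "32K / 64K / 8K" are read as round thousands, and an unknown or missing model simply falls back to the ceiling instead of consulting `getChatModel()` as the real code is described as doing.

```typescript
// Constants as named in the commit message.
const TOKEN_BUDGET_PER_IDEA = 150;
const TOKEN_BUDGET_ENVELOPE = 500;
const LEGACY_MIN_MAX_TOKENS = 4000;
const MAX_OUTPUT_TOKENS_CEIL = 32_000;

// Assumption: illustrative keys and round-thousand values, not the real table.
const OUTPUT_CAPS: Record<string, number> = {
  "claude-opus-4-7": 32_000,
  "claude-sonnet-4-6": 64_000,
  "claude-haiku-4-5": 64_000,
  "claude-3-5-haiku": 8_000,
};

// min(modelCap, max(LEGACY_MIN_MAX_TOKENS, ideaCount * 150 + 500))
function computeJudgeMaxTokens(ideaCount: number, model?: string): number {
  // Simplification: unknown or missing models use the fallback ceiling.
  const modelCap =
    (model !== undefined ? OUTPUT_CAPS[model] : undefined) ??
    MAX_OUTPUT_TOKENS_CEIL;
  const wanted = Math.max(
    LEGACY_MIN_MAX_TOKENS,
    ideaCount * TOKEN_BUDGET_PER_IDEA + TOKEN_BUDGET_ENVELOPE,
  );
  return Math.min(modelCap, wanted);
}

console.log(computeJudgeMaxTokens(1, "claude-sonnet-4-6")); // 4000 (floor binds)
console.log(computeJudgeMaxTokens(72, "claude-sonnet-4-6")); // 11300
console.log(computeJudgeMaxTokens(300, "claude-opus-4-7")); // 32000 (cap binds)
console.log(computeJudgeMaxTokens(72, "claude-3-5-haiku")); // 8000 (cap binds)
```

Under these assumptions the 72-idea fixture asks for 72 * 150 + 500 = 11300 tokens, well above the roughly 7200 tokens implied by ~100 tokens per idea. A legacy 8K model would still be capped below that estimate, so the cap protects against invalid requests rather than guaranteeing a complete response.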
garrytan added a commit that referenced this pull request May 27, 2026
… fixes)

Master shipped v0.41.22.1 (brainstorm/lsd judge fixes, closes #1540). Trio resolved with v0.41.23.0 staying at top, v0.41.22.1 entry preserved below. No source-code conflicts, only the standard VERSION + package.json + CHANGELOG trio.

Verified: typecheck clean, bun run verify 28/28.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mgunnin added a commit to mgunnin/gbrain that referenced this pull request May 28, 2026
* upstream/master:
  v0.41.26.1 fix: lock-renewal cathedral - closes ~39 worker crashes/day (supersedes garrytan#1567) (garrytan#1572)
  v0.41.26.0 fix: dream --source + ingest junk titles + emoji-crash (supersedes garrytan#1559, garrytan#1561) (garrytan#1571)
  v0.41.25.0 perf(sync): batched deletes + global page-generation clock (supersedes garrytan#1538) (garrytan#1566)
  v0.41.24.0 fix(conversation-parser): threshold gates + bold-paren-time pattern - 20,167 Circleback messages unblocked (closes garrytan#1533) (garrytan#1543)
  v0.41.23.0 feat: extract operator surfaces + pack-driven extractables (garrytan#1541)
  v0.41.22.1 feat: brainstorm/lsd judge fixes (closes garrytan#1540 end-to-end) (garrytan#1562)
  v0.41.22.0 feat: type-unification cathedral - 94 types → 15 canonical (closes garrytan#1479) (garrytan#1542)
  v0.41.21.0 feat(ops): 5 daily-driver pains fixed in one wave (garrytan#1545)
  v0.41.20.0 feat: gbrain status + doctor --scope=brain (fix wave 2: items garrytan#6 + garrytan#7) (garrytan#1544)
  feat: v0.41.19.0 Supavisor Retry Cathedral (garrytan#1537)
  v0.41.18.0: gbrain onboard - the activation surface gbrain didn't have before (garrytan#1521)
  v0.41.17.0 feat: --workers N on every bulk command + facts dim doctor parity (garrytan#1519)
  v0.41.16.0 feat: conversation parser cathedral + progressive-batch primitive (closes garrytan#1461) (garrytan#1510)
  v0.41.15.0 feat(sync): --timeout + --max-age + partial status (closes garrytan#1472 RFC) (garrytan#1506)
Two bugs causing judge_failed on every brainstorm/lsd run:

Bug 1: maxTokens truncation

maxTokens hard-coded at 4000 in judges.ts. With 36-96 ideas per batch, each producing ~100 tokens of JSON output (id, 5 axis scores, note), the response was consistently truncated mid-JSON → parseJudgeJSON failure.

Fix: Scale maxTokens to ideas.length * 150 + 500 (min 4000).

Bug 2: Slash-prefix pricing lookup

Pricing lookup in anthropic-pricing.ts and budget-tracker.ts only split on : (anthropic:claude-sonnet-4-6) but not / (anthropic/claude-sonnet-4-6). CLI --judge-model passes slash-separated IDs → no pricing match → BudgetExhausted with reason no_pricing when --max-cost is set.

Fix: Fall through to slash-split when colon-split finds nothing.

Tested

Brainstorm run with 72 ideas now scores 39 passing (was 0 of 72, every run since calibration cold-start).
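The slash-prefix fix can be illustrated with a small sketch of a colon-then-slash split feeding a pricing lookup. All names here (`splitId`, `lookupPrice`, `PRICING`, `Price`) are hypothetical, the price values are placeholders rather than real rates, and the code is not the project's `splitProviderModelId` or `lookupPricing`. It only assumes the behavior described in this thread: try the colon first, fall through to the slash, and treat anything else as a bare model name.

```typescript
interface SplitId {
  provider: string | null;
  model: string;
}

// Split "provider:model", "provider/model", or a bare model name.
// The colon is tried first; the slash is used only when no colon is present.
function splitId(id: string | null | undefined): SplitId {
  const trimmed = (id ?? "").trim();
  if (trimmed === "") return { provider: null, model: "" };
  const colon = trimmed.indexOf(":");
  const sep = colon >= 0 ? colon : trimmed.indexOf("/");
  if (sep < 0) return { provider: null, model: trimmed };
  return { provider: trimmed.slice(0, sep), model: trimmed.slice(sep + 1) };
}

interface Price {
  inputPerMTok: number;
  outputPerMTok: number;
}

// Placeholder numbers for illustration only; not real pricing.
const PRICING = new Map<string, Price>([
  ["claude-sonnet-4-6", { inputPerMTok: 1, outputPerMTok: 2 }],
]);

// Look up pricing by the model part, whatever separator the caller used.
function lookupPrice(id: string): Price | undefined {
  return PRICING.get(splitId(id).model);
}

console.log(lookupPrice("anthropic:claude-sonnet-4-6") !== undefined); // true
console.log(lookupPrice("anthropic/claude-sonnet-4-6") !== undefined); // true
console.log(lookupPrice("claude-sonnet-4-6") !== undefined); // true
console.log(splitId("openrouter:anthropic/claude-sonnet-4.6").provider); // openrouter
```

Trying the colon first keeps a nested id such as openrouter:anthropic/claude-sonnet-4.6 from being split at the inner slash, which matches the "colon wins" rule described for the gateway resolver earlier in the thread.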