fix: brainstorm/lsd judge truncation + slash-prefix pricing lookup #1540

Closed
garrytan-agents wants to merge 1 commit into garrytan:master from garrytan-agents:fix/brainstorm-judge-truncation

Conversation

@garrytan-agents

Contributor

Two bugs causing judge_failed on every brainstorm/lsd run:

Bug 1: maxTokens truncation

maxTokens was hard-coded at 4000 in judges.ts. With 36-96 ideas per batch, each producing ~100 tokens of JSON output (id, 5 axis scores, note), the response was consistently truncated mid-JSON → parseJudgeJSON failure.

Fix: Scale maxTokens to ideas.length * 150 + 500 (min 4000).
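
A rough arithmetic sketch of why the old limit truncates and what the scaled budget works out to. The constant and function names here are hypothetical; only the numbers (4000, ~100 tokens per idea, 150 per idea plus 500) come from the description above.

```typescript
// Hypothetical names; the numbers are taken from the PR description.
const OLD_MAX_TOKENS = 4000;
const OBSERVED_TOKENS_PER_IDEA = 100; // approximate figure quoted above

// Proposed budget: ideas * 150 + 500, never below the old 4000.
function scaledMaxTokens(ideaCount: number): number {
  return Math.max(OLD_MAX_TOKENS, ideaCount * 150 + 500);
}

for (const ideaCount of [72, 96]) {
  const estimatedOutput = ideaCount * OBSERVED_TOKENS_PER_IDEA;
  console.log({
    ideaCount,
    estimatedOutput, // 7200 and 9600
    exceedsOldLimit: estimatedOutput > OLD_MAX_TOKENS, // true for both
    scaledBudget: scaledMaxTokens(ideaCount), // 11300 and 14900
  });
}
```

For the 72-idea run mentioned under Tested, the estimate is about 7,200 output tokens against a 4,000 limit, while the scaled budget is 11,300.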

Bug 2: Slash-prefix pricing lookup

The pricing lookup in anthropic-pricing.ts and budget-tracker.ts split only on : (anthropic:claude-sonnet-4-6) and not on / (anthropic/claude-sonnet-4-6). CLI --judge-model passes slash-separated IDs → no pricing match → BudgetExhausted with reason no_pricing when --max-cost is set.

Fix: Fall through to slash-split when colon-split finds nothing.
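
A minimal sketch of the fall-through, assuming a pricing table keyed by bare model name. The table contents, the ModelPrice shape, and the function name are placeholders for illustration, not the code in anthropic-pricing.ts or budget-tracker.ts.

```typescript
interface ModelPrice {
  inputPerMTok: number;
  outputPerMTok: number;
}

// Placeholder table: the key mirrors the id used above, the prices are made up.
const PRICING_SKETCH = new Map<string, ModelPrice>([
  ["claude-sonnet-4-6", { inputPerMTok: 1, outputPerMTok: 2 }],
]);

function lookupPricingSketch(modelId: string): ModelPrice | undefined {
  // 1. Bare model name, e.g. "claude-sonnet-4-6".
  const direct = PRICING_SKETCH.get(modelId);
  if (direct) return direct;

  // 2. Colon form, the only prefixed form handled before the fix.
  const colon = modelId.indexOf(":");
  if (colon >= 0) {
    const hit = PRICING_SKETCH.get(modelId.slice(colon + 1));
    if (hit) return hit;
  }

  // 3. Fall through to slash form when the colon split found nothing.
  const slash = modelId.indexOf("/");
  if (slash >= 0) {
    return PRICING_SKETCH.get(modelId.slice(slash + 1));
  }
  return undefined;
}

console.log(lookupPricingSketch("anthropic:claude-sonnet-4-6"));
console.log(lookupPricingSketch("anthropic/claude-sonnet-4-6"));
```

Both calls resolve to the same table entry, so a slash-form --judge-model no longer ends in a missing-pricing result in this sketch.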

Tested

Brainstorm run with 72 ideas now scores 39 passing (was 0 of 72, every run since calibration cold-start).

@garrytan

Owner

Superseded by #1562: brought the fix into a production-ready wave that centralizes model-id parsing across all 5 lookup sites (was 2) + extends the gateway resolver (src/core/ai/model-resolver.ts:parseModelId) to also accept slash form + adds per-model maxTokens caps.

Codex adversarial review during /ship caught that the pricing fix alone wouldn't close the bug class — it would just shift the failure from BudgetExhausted no_pricing to AIConfigError 'missing a provider prefix' mid-judge. The shipped version extends the gateway resolver to match.

Thanks @garrytan-agents — your bug report + first-pass diff drove the whole investigation. Credit lives in the v0.41.21.0 CHANGELOG entry and PR #1562 body.

@garrytan garrytan closed this May 27, 2026
garrytan added a commit that referenced this pull request May 27, 2026
…#1562)

* feat(core): add splitProviderModelId centralizer for pricing-side parsing

New pure helper in src/core/model-id.ts that splits provider:model,
provider/model, and bare model strings into a {provider, model} pair.
Defensive contract: null/undefined/empty/whitespace returns
{provider: null, model: ''}.

Will be wired into the 5 pricing/budget sites in the next commit.
Named splitProviderModelId (not parseModelId) to avoid the in-project
collision with the gateway-side src/core/ai/model-resolver.ts:parseModelId
which has a different bare-name contract.

Pinned by 16 cases in test/model-id.test.ts covering all separator
forms plus defensive + edge inputs.
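
A sketch of the contract described above, not the helper in
src/core/model-id.ts. The function name carries a Sketch suffix to mark it
as hypothetical. Two behaviors are assumptions the message does not state:
colon takes precedence when both separators are present, and a string that
starts with a separator is treated as a bare model.

```typescript
interface ProviderModel {
  provider: string | null;
  model: string;
}

function splitProviderModelIdSketch(id: string | null | undefined): ProviderModel {
  // Defensive contract: null/undefined/empty/whitespace yields an empty pair.
  const trimmed = (id ?? "").trim();
  if (trimmed === "") return { provider: null, model: "" };

  // Assumption: prefer ':' over '/', so nested ids keep their slash in the model part.
  const colon = trimmed.indexOf(":");
  const sep = colon > 0 ? colon : trimmed.indexOf("/");

  // No usable separator (or a leading one): treat the whole string as a bare model.
  if (sep <= 0) return { provider: null, model: trimmed };

  return { provider: trimmed.slice(0, sep), model: trimmed.slice(sep + 1) };
}

console.log(splitProviderModelIdSketch("anthropic:claude-sonnet-4-6"));
console.log(splitProviderModelIdSketch("anthropic/claude-sonnet-4-6"));
console.log(splitProviderModelIdSketch("claude-sonnet-4-6"));
console.log(splitProviderModelIdSketch("   "));
```

Returning a pair with a null provider instead of throwing lets pricing
callers look up a bare model name directly.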

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(gateway): accept slash-form provider id in model-resolver

src/core/ai/model-resolver.ts:parseModelId now accepts both
provider:model (colon) and provider/model (slash) forms. Colon wins
when both separators present so OpenRouter nested ids like
openrouter:anthropic/claude-sonnet-4.6 route as
{providerId: 'openrouter', modelId: 'anthropic/claude-sonnet-4.6'}.

Pre-fix: every gateway entry point (chat / embed / rerank) threw
AIConfigError 'missing a provider prefix' on slash form ids. That
meant CLI users running

  gbrain brainstorm --judge-model anthropic/claude-sonnet-4-6

would still fail mid-judge with AIConfigError even after pricing
was relaxed to accept slash form. Closes the end-to-end bug class.

Bare names without ANY separator still throw — gateway routing
always needs an explicit provider. Existing tests pinning that
throw (test/ai/capabilities.test.ts:43) stay green.

Pinned by 10 cases in test/ai/model-resolver-slash.test.ts
including a resolveRecipe round-trip that slash and colon forms
land on the same recipe.
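
A sketch of the gateway-side contract described above, assuming the
separator is found by first occurrence. The class and function names carry a
Sketch suffix because they stand in for the real AIConfigError and
parseModelId, whose exact signatures are not shown in this message.

```typescript
// Stand-in for the project's AIConfigError.
class AIConfigErrorSketch extends Error {}

interface GatewayModelId {
  providerId: string;
  modelId: string;
}

function parseModelIdSketch(id: string): GatewayModelId {
  // Colon wins when both separators are present, so
  // "openrouter:anthropic/claude-sonnet-4.6" keeps the slash in modelId.
  const colon = id.indexOf(":");
  const sep = colon > 0 ? colon : id.indexOf("/");

  // Unlike the pricing-side helper, a bare name is an error here:
  // gateway routing always needs an explicit provider.
  if (sep <= 0) {
    throw new AIConfigErrorSketch(`model id "${id}" is missing a provider prefix`);
  }
  return { providerId: id.slice(0, sep), modelId: id.slice(sep + 1) };
}

console.log(parseModelIdSketch("anthropic/claude-sonnet-4-6"));
console.log(parseModelIdSketch("openrouter:anthropic/claude-sonnet-4.6"));
```

The difference from the pricing-side split is the bare-name case: pricing
tolerates it, the gateway rejects it.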

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* refactor: route 5 pricing/config sites through splitProviderModelId

Five sites had inline ':'-only provider-prefix splits that silently
missed slash-form ids. Centralizing through splitProviderModelId
closes the bug class:

  - src/core/anthropic-pricing.ts:estimateMaxCostUsd
  - src/core/budget/budget-tracker.ts:lookupPricing (closes the
    headline BudgetExhausted no_pricing failure on --max-cost +
    slash-form --judge-model)
  - src/core/eval-contradictions/cost-tracker.ts:pricingFor
    (legacy silent-Haiku fallback preserved per plan D9)
  - src/core/minions/batch-projection.ts (deleted bareModel inline
    helper; inlined splitProviderModelId at 2 call sites)
  - src/core/model-config.ts:isAnthropicProvider (silently fixed
    v0.31.12 subagent-guard bypass for slash-form Anthropic ids)

Test gates land together so any bisect step is green:

  - NEW test/anthropic-pricing.test.ts (7 cases including structural
    regression guard: every ANTHROPIC_PRICING key reachable via all
    three forms)
  - NEW test/eval-contradictions/cost-tracker-slash.test.ts (6 cases
    including legacy-Haiku-fallback pin)
  - EXTENDED test/batch-projection.test.ts (slash + double-separator
    cases)
  - EXTENDED test/model-config.serial.test.ts (2 slash-form
    isAnthropicProvider cases)
  - EXTENDED test/core/budget/budget-tracker.test.ts (2 slash + colon
    reserve() cases)

Behavior changes for slash-prefix ids only; bare and colon ids
unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(brainstorm): scale judge maxTokens with per-model output cap

Replace the hard-coded maxTokens: 4000 with computeJudgeMaxTokens
that scales with idea count and respects each model's actual output
cap.

Pre-fix: any judge call with 36+ ideas produced ~100 tokens/idea of
JSON that got truncated mid-output. parseJudgeJSON threw, orchestrator
surfaced judge_failed: true, all ideas saved unscored. Verified
failure mode on 72-idea fixture: 0/72 passing before, 39/72 after.

Formula: min(modelCap, max(LEGACY_MIN_MAX_TOKENS, ideaCount*150+500))

Named constants extracted at top of judges.ts:

  - TOKEN_BUDGET_PER_IDEA = 150 (1.5x headroom over observed ~100/idea)
  - TOKEN_BUDGET_ENVELOPE = 500 (JSON wrapper)
  - LEGACY_MIN_MAX_TOKENS = 4000 (pre-fix floor preserved for 1-idea)
  - MAX_OUTPUT_TOKENS_CEIL = 32_000 (fallback when model unknown)
  - ANTHROPIC_OUTPUT_CAPS (per-model: Opus 4.7 = 32K, Sonnet 4.6 /
    Haiku 4.5 = 64K, legacy 3.5 = 8K)

When the caller passes no modelOverride, the cap routes through the
gateway's actual configured chat model via getChatModel() so the
formula matches what chat() will use, not whatever the override
hints at. Pre-fix the undefined-override case fell back to 32K even
if the configured default was a legacy 8K model.

Pinned by 16 cases in test/brainstorm/judges-maxtokens.test.ts:
formula at 1/10/36/96/200/300 ideas, per-model cap binding (Haiku 3.5
8K, Opus 4.7 32K, Sonnet 4.6 64K), and integration via runJudge with
a stubbed chatFn that captures ChatOpts.maxTokens.
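
A sketch of the formula with the named constants listed above. The function
name, the cap-table keys, and the exact cap values (32,000 / 64,000 / 8,000
for "32K / 64K / 8K") are assumptions for illustration. The caller is
assumed to pass a bare model name, resolved from the override or from the
configured chat model.

```typescript
const TOKEN_BUDGET_PER_IDEA = 150;
const TOKEN_BUDGET_ENVELOPE = 500;
const LEGACY_MIN_MAX_TOKENS = 4000;
const MAX_OUTPUT_TOKENS_CEIL = 32_000; // fallback when the model is unknown

// Assumed keys and exact values; the message only states 32K / 64K / 8K.
const OUTPUT_CAPS_SKETCH = new Map<string, number>([
  ["claude-opus-4-7", 32_000],
  ["claude-sonnet-4-6", 64_000],
  ["claude-haiku-4-5", 64_000],
  ["claude-3-5-haiku", 8_000],
]);

function computeJudgeMaxTokensSketch(ideaCount: number, bareModelId?: string): number {
  // Unknown or missing model falls back to the generic ceiling.
  const known = bareModelId !== undefined ? OUTPUT_CAPS_SKETCH.get(bareModelId) : undefined;
  const modelCap = known ?? MAX_OUTPUT_TOKENS_CEIL;

  // min(modelCap, max(LEGACY_MIN_MAX_TOKENS, ideaCount * 150 + 500))
  const wanted = Math.max(
    LEGACY_MIN_MAX_TOKENS,
    ideaCount * TOKEN_BUDGET_PER_IDEA + TOKEN_BUDGET_ENVELOPE,
  );
  return Math.min(modelCap, wanted);
}

console.log(computeJudgeMaxTokensSketch(1, "claude-sonnet-4-6"));   // 4000
console.log(computeJudgeMaxTokensSketch(72, "claude-sonnet-4-6"));  // 11300
console.log(computeJudgeMaxTokensSketch(300, "claude-opus-4-7"));   // 32000
console.log(computeJudgeMaxTokensSketch(96, "claude-3-5-haiku"));   // 8000
```

Under these assumed values, the last call shows the cap binding below the
roughly 9,600 tokens a 96-idea batch is estimated to need, so the cap keeps
the request valid but does not by itself guarantee a complete response on an
8K model.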

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore: bump version and changelog (v0.41.21.0)

Brainstorm judge fix-wave: closes #1540 end-to-end. parseModelId
centralizer + gateway resolver slash-form acceptance + per-model
maxTokens cap.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: update project documentation for v0.41.21.0

CLAUDE.md: add v0.41.21.0 annotations to brainstorm/judges + model-config
entries; add new key-files entry for src/core/model-id.ts (the shared
splitProviderModelId centralizer) and src/core/ai/model-resolver.ts
slash-form extension.

README.md: add user-facing callout for the brainstorm judge_failed +
slash-form pricing fix, mirroring the v0.41.19.0 callout shape.

llms-full.txt: regenerated to absorb the CLAUDE.md + README changes
(passes test/build-llms.test.ts drift guard).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
garrytan added a commit that referenced this pull request May 27, 2026
… fixes)

Master shipped v0.41.22.1 (brainstorm/lsd judge fixes, closes #1540).
Trio resolved with v0.41.23.0 staying at top, v0.41.22.1 entry
preserved below.

No source-code conflicts — only the standard VERSION + package.json +
CHANGELOG trio. Verified: typecheck clean, bun run verify 28/28.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mgunnin added a commit to mgunnin/gbrain that referenced this pull request May 28, 2026
* upstream/master:
  v0.41.26.1 fix: lock-renewal cathedral — closes ~39 worker crashes/day (supersedes garrytan#1567) (garrytan#1572)
  v0.41.26.0 fix: dream --source + ingest junk titles + emoji-crash (supersedes garrytan#1559, garrytan#1561) (garrytan#1571)
  v0.41.25.0 perf(sync): batched deletes + global page-generation clock (supersedes garrytan#1538) (garrytan#1566)
  v0.41.24.0 fix(conversation-parser): threshold gates + bold-paren-time pattern — 20,167 Circleback messages unblocked (closes garrytan#1533) (garrytan#1543)
  v0.41.23.0 feat: extract operator surfaces + pack-driven extractables (garrytan#1541)
  v0.41.22.1 feat: brainstorm/lsd judge fixes (closes garrytan#1540 end-to-end) (garrytan#1562)
  v0.41.22.0 feat: type-unification cathedral — 94 types → 15 canonical (closes garrytan#1479) (garrytan#1542)
  v0.41.21.0 feat(ops): 5 daily-driver pains fixed in one wave (garrytan#1545)
  v0.41.20.0 feat: gbrain status + doctor --scope=brain (fix wave 2: items garrytan#6 + garrytan#7) (garrytan#1544)
  feat: v0.41.19.0 Supavisor Retry Cathedral (garrytan#1537)
  v0.41.18.0: gbrain onboard — the activation surface gbrain didn't have before (garrytan#1521)
  v0.41.17.0 feat: --workers N on every bulk command + facts dim doctor parity (garrytan#1519)
  v0.41.16.0 feat: conversation parser cathedral + progressive-batch primitive (closes garrytan#1461) (garrytan#1510)
  v0.41.15.0 feat(sync): --timeout + --max-age + partial status (closes garrytan#1472 RFC) (garrytan#1506)