v0.37.2.0: takes_resolution_consistency CHECK accepts 'unresolvable' #1211
Merged
…vable'
Unblocks production grading scripts that write the judge's 4th verdict
type. Before this fix, every quality='unresolvable' INSERT/UPDATE hit
a CHECK violation — 0 of 34 writes landed in a recent prod run.
Migration v74 widens BOTH:
- takes_resolution_consistency (table-level CHECK) — admits the
('unresolvable', NULL) pair alongside the existing 4 legal shapes
- resolved_quality column-level CHECK — drops the auto-generated
name from v37, re-adds as takes_resolved_quality_values with the
4-state enum
Backward compatible. Existing rows with quality IN (NULL, 'correct',
'incorrect', 'partial') all satisfy the new CHECKs unchanged.
TakesScorecard gains sibling fields unresolvable_count + unresolvable_rate;
the existing `resolved` field deliberately keeps its 3-state meaning
so historical scorecards compare apples-to-apples (T1c sibling-field
design from the eng review).
Pinned by:
- test/takes-resolution.test.ts — R1-R5 round-trip
- test/migrate.test.ts — v74 structural assertions + PGLite E2E
suite exercising all valid + invalid (quality, outcome) shapes
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
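The widened table-level CHECK described in the commit above can be read as a predicate over `(quality, outcome)` pairs. The sketch below is a TypeScript mirror of that rule, not the migration itself: the function name `isLegalResolution` and the type aliases are hypothetical, and the five shapes are taken from the pairs listed in this PR (`(NULL, NULL)`, `('correct', true)`, `('incorrect', false)`, `('partial', NULL)`, plus the new `('unresolvable', NULL)`).

```typescript
// Hypothetical names for illustration; only the legal shapes come from the PR text.
type Quality = "correct" | "incorrect" | "partial" | "unresolvable" | null;
type Outcome = boolean | null;

// Returns true when (quality, outcome) is one of the five shapes the
// widened CHECK is described as admitting.
function isLegalResolution(quality: Quality, outcome: Outcome): boolean {
  switch (quality) {
    case null:
      return outcome === null; // unresolved take
    case "correct":
      return outcome === true;
    case "incorrect":
      return outcome === false;
    case "partial":
    case "unresolvable":
      return outcome === null; // no boolean outcome for these verdicts
    default:
      return false;
  }
}
```

Under this reading, `isLegalResolution("unresolvable", null)` is accepted while `isLegalResolution("unresolvable", true)` is rejected, which matches the R4 "unresolvable+outcome rejection" case named in the test plan.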
Master shipped v0.36.1.1 + v0.36.2.0 + v0.36.3.0 + autonomous-remediation
wave through v0.37.0.0 in parallel with this hotfix. Renumbered:
- VERSION: 0.36.1.1 → 0.37.0.1 (next-MICRO above master)
- Migration v74 → v79 (master claimed v68-v78)
- Migration name: takes_unresolvable_quality_v0_36_1_1 → takes_unresolvable_quality_v0_37_0_1
- All test references + CLAUDE.md annotations updated
CHECK widening unchanged: master's takes_resolution_consistency is still
narrow (3-state), so this hotfix remains needed. The full feature wave
(falsifiability + per-category calibration) follows separately as the
next minor.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rom caller bootstrap state
Previously the test trusted caller-provided DATABASE_URL to point at a
fresh database. CLAUDE.md's E2E lifecycle prescribes 'gbrain doctor --json'
as the bootstrap step (needed by oauth-related tests for table creation),
but doctor configures the gateway and bakes the configured embedding model
into content_chunks.model DEFAULT during the initial CREATE TABLE. On
re-run, CREATE TABLE IF NOT EXISTS is a no-op and the bootstrapped default
sticks. PGLite (always fresh-in-memory) gets the unconfigured-gateway
fallback 'text-embedding-3-large'. The test reported phantom drift:
pg.default="'zembed-1'::text"
pglite.default="'text-embedding-3-large'::text"
Fix: DROP SCHEMA public CASCADE + CREATE SCHEMA public before
pg.initSchema. Resets every table/index/sequence/constraint added by prior
tooling. The PGLite side is already fresh-per-test by construction.
Verified order-independent:
- Fresh DB → 6/0 pass
- After 'gbrain doctor' bootstrap → 6/0 pass
- Full E2E suite (mechanical + schema-drift) → 84/0 pass
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…erface
Two findings from Codex adversarial review on the v0.37.0.1 hotfix:
1. **DROP SCHEMA safety gate (P0).** test/e2e/schema-drift.test.ts had an
unguarded DROP SCHEMA public CASCADE. A developer running the E2E with
DATABASE_URL pointing at a real brain or staging DB would lose the entire
public schema. The fix: triple-check before destruction.
- Parse the DATABASE_URL hostname + db name
- Allow reset only when: explicit GBRAIN_TEST_DB=1 OR (localhost host AND
test-shaped db name like gbrain_test, *_test, test_*, *_e2e)
- Refuse otherwise with a loud paste-ready warning
- The test still proceeds (the parity check is the fail-safe — if the
caller already had a fresh DB, parity passes; if not, parity fails
LOUDLY instead of nuking their data)
Verified all three branches: localhost+gbrain_test resets (6/0 pass);
localhost+production_brain refuses + warns (6/0 pass against pre-existing
schema); GBRAIN_TEST_DB=1 override on production_brain name allows reset.
2. **TakesScorecard interface compat.** Making `unresolvable_count` +
`unresolvable_rate` required fields on the public TakesScorecard
interface broke downstream SDK consumers who construct scorecard
fixtures (gbrain-evals, custom engines). The hotfix shouldn't impose
a compile-break on hotfix users.
Fix: make both fields optional (`?: number` / `?: number | null`).
`finalizeScorecard` still always populates them, so all internal code
sees the real values. External fixtures that omit them compile cleanly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
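The interface-compat fix above can be illustrated with a trimmed-down sketch. Everything here is an assumption for illustration: the real `TakesScorecard` has more fields, and `finalizeScorecardSketch` is a hypothetical stand-in whose signature is not taken from the codebase. The rate formula follows the PR description: the denominator for `unresolvable_rate` is `resolved + unresolvable_count`, and the rate is NULL when both are 0.

```typescript
// Hypothetical, reduced shape; the real interface carries more fields.
interface TakesScorecard {
  resolved: number; // keeps its 3-state meaning: correct + incorrect + partial
  unresolvable_count?: number; // optional so external fixtures still compile
  unresolvable_rate?: number | null;
}

type ResolvedQuality = "correct" | "incorrect" | "partial" | "unresolvable";

// Stand-in for the finalizer: always populates the sibling fields,
// so internal callers see real values even though the fields are optional.
function finalizeScorecardSketch(
  qualities: ResolvedQuality[],
): Required<TakesScorecard> {
  const unresolvableCount = qualities.filter((q) => q === "unresolvable").length;
  const resolved = qualities.length - unresolvableCount;
  const denominator = resolved + unresolvableCount;
  return {
    resolved,
    unresolvable_count: unresolvableCount,
    unresolvable_rate: denominator === 0 ? null : unresolvableCount / denominator,
  };
}

// A downstream fixture written before the new fields existed still type-checks.
const legacyFixture: TakesScorecard = { resolved: 3 };
```

Keeping `resolved` untouched and adding siblings means a pre-existing scorecard and a new one can be compared on the same `resolved` count, which is the apples-to-apples property the first commit calls out.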
…orecard CLI
Second codex adversarial pass on v0.37.0.1 surfaced two residual findings.
**P0: Safety gate still bypassable.** First-pass safety gate used
`explicitOptIn || (isLocalhost && looksLikeTestDb)`, meaning
`GBRAIN_TEST_DB=1` bypassed BOTH the host check AND the db-name check.
Someone running the E2E with that env set against a production
DATABASE_URL would still nuke their schema. Codex re-flagged it as P0.
Tightened logic: `looksLikeTestDb && (isLocalhost || ciOptIn)`. The db-name
pattern is now the hard floor: `gbrain_test`, `*_test`, `test_*`, `*_e2e`.
GBRAIN_TEST_DB=1 only relaxes the localhost requirement (for CI
service-name hosts). Setting the env on a DATABASE_URL pointing at
`production_data` is explicitly refused with a paste-ready message naming
the failed check.
Verified 3 ways:
- gbrain_test + localhost → resets (6/0 pass)
- production_data + GBRAIN_TEST_DB=1 → REFUSES with clear message
- foo_e2e + GBRAIN_TEST_DB=1 → resets (test-shaped name passes)
**P2: gbrain takes scorecard hides the unresolvable signal.** Early-return
on `resolved === 0` was triggered before the new sibling fields rendered.
A brain with only `quality='unresolvable'` verdicts (the spec's whole
production case) printed "No resolved bets yet" and exited. The
unresolvable_rate field was unreachable from the human CLI unless the user
knew to pass `--json`.
Fix: gate the early-return on `resolved === 0 AND unresolvable_count === 0`.
Render `unresolvable` count + `unresolvable_rate` alongside `partial_rate`
when present. Threshold warn at 30% (mirrors
PARTIAL_RATE_WARNING_THRESHOLD) pointing at retrieval coverage, not
prediction accuracy, which is the actionable read for high-unresolvable
brains.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
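The tightened gate, `looksLikeTestDb && (isLocalhost || ciOptIn)`, can be sketched as a pure function over the connection string and environment. This is a minimal sketch under stated assumptions, not the code in `test/e2e/schema-drift.test.ts`: `canResetSchema`, `ResetDecision`, the regular expression, and the choice of which hostnames count as local (`localhost`, `127.0.0.1`) are all hypothetical. Only the boolean logic and the name patterns come from the commit message.

```typescript
// Hypothetical helper; names and the exact regex are assumptions.
interface ResetDecision {
  allowed: boolean;
  reason: string; // names the check that failed, or why the reset is allowed
}

// Mirrors the patterns named in the commit: gbrain_test, *_test, test_*, *_e2e.
const TEST_DB_NAME = /^(gbrain_test|.+_test|test_.+|.+_e2e)$/;
// Assumption: which hosts count as "localhost" is not spelled out in the PR.
const LOCAL_HOSTS = new Set(["localhost", "127.0.0.1"]);

function canResetSchema(
  databaseUrl: string,
  env: Record<string, string | undefined>,
): ResetDecision {
  const url = new URL(databaseUrl);
  const dbName = decodeURIComponent(url.pathname.replace(/^\//, ""));
  const looksLikeTestDb = TEST_DB_NAME.test(dbName);
  const isLocalhost = LOCAL_HOSTS.has(url.hostname);
  const ciOptIn = env.GBRAIN_TEST_DB === "1";

  // Hard floor: the env var can never rescue a production-shaped name.
  if (!looksLikeTestDb) {
    return { allowed: false, reason: `db name "${dbName}" is not test-shaped` };
  }
  // The opt-in only relaxes the host requirement (e.g. CI service names).
  if (!isLocalhost && !ciOptIn) {
    return {
      allowed: false,
      reason: `host "${url.hostname}" is not localhost and GBRAIN_TEST_DB=1 is not set`,
    };
  }
  return { allowed: true, reason: "test-shaped db name on localhost or CI opt-in" };
}
```

For example, `canResetSchema("postgres://u:p@localhost:5432/production_data", { GBRAIN_TEST_DB: "1" })` is refused on the name check, while `canResetSchema("postgres://u:p@postgres:5432/foo_e2e", { GBRAIN_TEST_DB: "1" })` is allowed, matching the three verified cases listed in the commit.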
Master shipped v0.37.1.0 (brainstorm/lsd, PR #1214) claiming migration v79
for `pages_last_retrieved_at`. Renumbered this hotfix's migration v79→v80
to land cleanly on top. v0.37.2.0 remains the version (v0.37.1.0 +
v0.37.0.0 already taken on master).
Conflict resolution:
- VERSION/package.json: kept 0.37.2.0 (ours, higher semver)
- CHANGELOG.md: both entries preserved, ours on top
- migrate.ts: kept master's v79 unchanged, added ours as v80
- All references in test/migrate.test.ts, CHANGELOG, CLAUDE.md, spec doc, engine.ts updated from v79→v80
Verified: typecheck clean, migrate.test.ts passes (130/130, both v79 and
v80 migrations apply in order).
garrytan added a commit that referenced this pull request on May 20, 2026
Master landed PR #1211 (takes_resolution_consistency CHECK accepts 'unresolvable') as v0.37.2.0. Wave's v0.37.3.0 claim still sits one slot above. CHANGELOG preserves both entries with master's v0.37.2.0 slotting in as the second-most-recent release. llms-full.txt regenerated against merged docs. Verify gate + 114 brain-first + adjacent unit/E2E cases green post-merge under CI-simulated GBRAIN_HOME=/tmp/empty-... env. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
garrytan added a commit that referenced this pull request on May 21, 2026
…(cherry-pick #1210) (#1246)
* feat(ai): add default_headers / resolveDefaultHeaders seam to Recipe
Generalizes per-recipe header attachment so attribution headers
(OpenRouter's HTTP-Referer + X-OpenRouter-Title) ride alongside Bearer auth
on every openai-compatible touchpoint. Two safety guards fire at
applyResolveAuth time: declaring both default_headers AND
resolveDefaultHeaders throws AIConfigError (mutual exclusion); a default
header whose key shadows the resolved auth header (Authorization, the
resolver's custom header) also throws.
Reranker HTTP path at gateway.ts:2281 now merges both Authorization Bearer
AND auth.headers (where default_headers flow) into the request Headers map.
Pre-fix the ternary picked one or the other; default_headers would have
been silently dropped on the manual rerank path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(ai): add OpenRouter provider recipe
One key, many hosted models. Configures openrouter:<provider>/<model> for
chat (GPT-5.2 family, Claude 4.5/4.6/4.7, Gemini 3 Flash Preview, DeepSeek)
and embedding (OpenAI text-embedding-3-small with Matryoshka dims_options).
max_batch_tokens=300_000 (OpenAI's aggregate per-request token cap, not the
per-input 8192 the original PR conflated).
resolveDefaultHeaders returns HTTP-Referer + X-OpenRouter-Title + X-Title
(back-compat alias) so traffic is attributed to gbrain on OR's leaderboard.
Forks override via OPENROUTER_REFERER / OPENROUTER_TITLE env vars.
supports_subagent_loop: false is informational; gbrain's subagent infra is
hard-pinned to Anthropic-direct via isAnthropicProvider() upstream
regardless of this flag. Filed as TODO to verify tool_use_id stability
through OR.
Cherry-picked from PR #1210. Contributed by @davemorin.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(cli): export buildGatewayConfig + thread OPENROUTER_BASE_URL
Exports buildGatewayConfig for unit-test access. Adds one-line passthrough
for OPENROUTER_BASE_URL matching the existing
LITELLM/OLLAMA/LMSTUDIO/LLAMA_SERVER pattern so users can point at a
self-hosted OR-compatible proxy.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(ai): cover OpenRouter recipe + default_headers seam + wire-level headers
Four test additions:
- test/ai/recipe-openrouter.test.ts (11 cases): recipe shape, Matryoshka dims_options, max_batch_tokens=300K, arbitrary-ID acceptance via assertTouchpoint, defaultResolveAuth happy/error, resolveDefaultHeaders defaults + fork-override path, setup_hint coverage. Shape regression on every chat/embedding model ID (catches typos without pinning the dynamic catalog).
- test/ai/recipes-existing-regression.test.ts (+6 cases): IRON RULE preserved; adds default_headers contract: Bearer+defaults returns both apiKey AND headers, custom-header+defaults merges with resolver winning, mutual-exclusion guard, Authorization-shadow guard, custom-auth-shadow guard, cross-touchpoint parity for all four (embedding/expansion/chat/reranker).
- test/ai/header-transport.test.ts (3 cases): proves headers actually reach the wire. Synthetic recipes with resolveOpenAICompatConfig fetch wrappers capture outgoing Headers on embed/chat/rerank. Asserts Authorization + HTTP-Referer + X-OpenRouter-Title + X-Title all present. Codex flagged the return-shape-only coverage gap during plan review.
- test/ai/build-gateway-config.test.ts (7 cases): 5-way env-baseURL passthrough sweep through the now-exported buildGatewayConfig. Uses withEnv() from test/helpers/with-env.ts for isolation compliance. Mops up pre-existing untested drift on LLAMA_SERVER/OLLAMA/LMSTUDIO/LITELLM in the same pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: add OpenRouter to embedding-providers + bump recipe count
15 -> 16 recipes. Adds OpenRouter row to the TL;DR table, a setup section
covering the value-prop (one key, many hosted models), env-var overrides
(OPENROUTER_BASE_URL, OPENROUTER_REFERER, OPENROUTER_TITLE), the
subagent-loop limitation (isAnthropicProvider() gate), and a "One key for
many hosted models" bullet under the decision tree. README updated to
match.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: bump v0.37.2.0 version refs to v0.37.4.0 across in-tree comments
v0.37.2.0 was claimed by master's takes_resolution_consistency hotfix
(#1211) before this branch could land. This commit re-stamps the source
comments that reference the OpenRouter recipe / default_headers seam to
v0.37.4.0 so the in-tree version markers match the actual landing version.
No behavior change; comments only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.37.4.0)
One key, many hosted models: OpenRouter recipe lands. Cherry-picked from
#1210 (@davemorin), with codex review corrections folded in:
- recipe count math (16 not 17)
- current OR attribution header name (X-OpenRouter-Title, X-Title back-compat)
- max_batch_tokens semantic (300K aggregate per-request, not 8192 per-input)
- Matryoshka dims_options for text-embedding-3-small
- auth-shadow guard at applyResolveAuth
Adds the generic Recipe.default_headers / resolveDefaultHeaders seam so
attribution headers ride alongside Bearer auth. Future Together/Groq
adoption tracked in TODOS.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: rebump to v0.37.6.0 (queue moved past v0.37.4/v0.37.5)
VERSION + package.json + CHANGELOG header + CLAUDE.md + TODOS.md + in-tree
source comments + llms regen. No code-behavior change.
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mgunnin added a commit to mgunnin/gbrain that referenced this pull request on May 28, 2026
* upstream/master:
v0.38.2.0 fix(doctor): bounded frontmatter scan + partial-state surfacing (supersedes garrytan#1287) (garrytan#1297)
v0.38.1.0 feat(agents): provider-agnostic subagent loop + remote MCP dispatch + budget meter (garrytan#1289)
v0.38.0.0 ingestion cathedral — gbrain capture + write-through + IngestionSource contract (garrytan#1275)
v0.37.11.0: fresh-install PGLite embedding setup fix wave (garrytan#1286)
v0.37.10.0 feat(init): env-detection + interactive picker + preflight invariants (garrytan#1278)
v0.37.9.0 fix(frontmatter): canonical-style normalization for tag arrays (garrytan#1252)
v0.37.8.0 feat: voyage-code-3 discoverability + reindex-code cost-preview fix (garrytan#1267)
v0.37.7.0 fix wave: federated brains + autopilot safety + OAuth confidential clients (garrytan#1253)
v0.37.6.0 feat(ai): OpenRouter recipe + generic default_headers seam (cherry-pick garrytan#1210) (garrytan#1246)
v0.37.5.0 fix(markdown): YAML-aware NESTED_QUOTES validator (stops flagging valid YAML) (garrytan#1229)
feat: pgGraph-inspired CI scaffolding wave (v0.37.4.0) (garrytan#1228)
v0.37.3.0 feat: skill_brain_first doctor check + auto-fix + declarative opt-out (supersedes garrytan#1206) (garrytan#1215)
v0.37.2.0: takes_resolution_consistency CHECK accepts 'unresolvable' (garrytan#1211)
v0.37.1.0 feat: brainstorm + lsd — bisociation idea generator grounded in your own brain (garrytan#1214)
v0.37.0.0 feat(skillpack): registry cathedral — third-party publish + install + 10/10 quality bar (garrytan#1208)
v0.36.6.0 feat: cross-modal search wave (text↔image + unified column + LLM intent) (garrytan#1165)
Summary
- Migration name: `takes_unresolvable_quality_v0_36_1_1` → `takes_unresolvable_quality_v0_37_0_1`.
- Widens the table-level `takes_resolution_consistency` CHECK and the column-level `takes_resolved_quality_values` CHECK to accept `quality='unresolvable' AND outcome=NULL` as the 4th valid resolution state.
- `TakesScorecard` gains optional sibling fields `unresolvable_count?` + `unresolvable_rate?`. The existing `resolved` field deliberately keeps its 3-state meaning so historical scorecards stay valid for apples-to-apples comparison.
- `gbrain takes scorecard` now surfaces unresolvable. Previous behavior early-returned with "No resolved bets yet" when `resolved=0`, hiding the new sibling fields even on a brain with only unresolvable verdicts (the spec's whole production case). Now the gate is `resolved=0 AND unresolvable_count=0`; the human CLI renders both counts + rate when either is non-zero, with a >30% warn pointing at retrieval coverage, not prediction accuracy.
- `test/e2e/schema-drift.test.ts` resets the `public` schema in `beforeAll` so the parity gate doesn't depend on caller bootstrap state. Gate logic tightened after a second codex pass: `looksLikeTestDb && (isLocalhost || GBRAIN_TEST_DB=1)`. The test-shaped db-name pattern (`gbrain_test`, `*_test`, `test_*`, `*_e2e`) is the hard floor: even with the env-var opt-in, a production-named DB is refused with a clear message. The env-var only relaxes the localhost requirement for CI environments where the host is a service name (e.g. `postgres`).
What changes for users
- `gbrain takes resolve <slug> --row N --quality unresolvable [--evidence "..."]` works through the CLI.
- `engine.resolveTake({ quality: 'unresolvable', resolvedBy: '...' })` works through the SDK.
- `gbrain takes scorecard` now displays `unresolvable` + `unresolvable_rate` alongside `partial_rate` when either is populated. A brain with only unresolvable verdicts no longer looks empty.
- `engine.getScorecard(...)` returns two new optional sibling fields. The denominator for `unresolvable_rate` is `resolved + unresolvable_count`; NULL when both are 0.
- `gbrain upgrade` runs migration v79 automatically. Existing rows with `(NULL, NULL)`, `('correct', true)`, `('incorrect', false)`, `('partial', NULL)` shapes survive the widened CHECKs unchanged.
Backward compatibility
- Existing `(resolved_quality, resolved_outcome)` pairs remain legal.
- `accuracy`, `brier`, `partial_rate` formulas unchanged.
- `resolved` count semantics preserved (3-state).
- `TakesScorecard.unresolvable_count` / `unresolvable_rate` are optional on the TS interface, so downstream SDK fixtures that omit them compile cleanly. `finalizeScorecard` always emits them.
Adversarial review (Codex, two passes)
Pass 1: 5 findings; 2 fixed inline (P0 unguarded DROP SCHEMA, SDK interface compat).
Pass 2 (re-review after fixes): codex confirmed migration + engine widening correct, but caught two residual issues:
- `GBRAIN_TEST_DB=1` overrode both host AND db-name checks. Tightened: db-name pattern is now the hard floor; env-var only relaxes localhost requirement.
- `gbrain takes scorecard` early-returned on `resolved=0`, hiding the new unresolvable fields. Fixed: gate now considers both counts.
3 findings deferred to v0.37.x follow-up (pre-existing issues, not introduced by this hotfix):
- `extract-takes.ts` re-import path drops all resolution fields (affects all 4 quality states, not just unresolvable)
- `resolved_quality='unresolvable' AND resolved_at IS NULL` is internally inconsistent across `getScorecard` (counts it) vs `listTakes(resolved:false)` (returns it). Pre-existing: same issue for all quality states. Fix requires tightening the CHECK to require `resolved_at IS NOT NULL` when `resolved_quality IS NOT NULL`.
Test plan
- `test/takes-resolution.test.ts`: R1 unresolvable round-trip, R4 unresolvable+outcome rejection, R5 sibling-field math
- `test/migrate.test.ts`: v79 structural assertions + PGLite E2E suite covering all 5 valid + 2 invalid quality/outcome shapes
- `bun run typecheck` clean
3-line audit
Commits
- `6e0964fc`: original v0.36.1.1 implementation (CHECK widen + R1-R5 + scorecard sibling fields)
- `b5435d59`: merge from master + rebump to v0.37.0.1, renumber migration v74 → v79
- `5988790b`: schema-drift test isolation (DROP SCHEMA public CASCADE before initSchema)
- `812319c5`: codex pass 1 fix: DROP SCHEMA safety gate v1 + TakesScorecard fields optional
- `6683507a`: codex pass 2 fix: safety gate v2 (db-name as hard floor) + scorecard CLI surfaces unresolvable
🤖 Generated with Claude Code