v0.37.2.0: takes_resolution_consistency CHECK accepts 'unresolvable' by garrytan · Pull Request #1211 · garrytan/gbrain

garrytan · 2026-05-20T01:26:56Z

Summary

v0.37.0.1 rebased on top of master's v0.37.0.0. Originally cut as v0.36.1.1, but master shipped v0.36.1.1 + v0.36.2.0 + v0.36.3.0 + autonomous-remediation through v0.37.0.0 in parallel with this hotfix. Renumbered: VERSION 0.36.1.1 → 0.37.0.1, migration v74 → v79, name takes_unresolvable_quality_v0_36_1_1 → takes_unresolvable_quality_v0_37_0_1.
Migration v79 widens both the table-level takes_resolution_consistency CHECK and the column-level takes_resolved_quality_values CHECK to accept quality='unresolvable' AND outcome=NULL as the 4th valid resolution state.
TakesScorecard gains optional sibling fields unresolvable_count? + unresolvable_rate?. The existing resolved field deliberately keeps its 3-state meaning so historical scorecards stay valid for apples-to-apples comparison.
gbrain takes scorecard now surfaces unresolvable. Previous behavior early-returned with "No resolved bets yet" when resolved=0, hiding the new sibling fields even on a brain with only unresolvable verdicts (the spec's whole production case). Now the gate is resolved=0 AND unresolvable_count=0; the human CLI renders both counts + rate when either is non-zero, with a >30% warn pointing at retrieval coverage, not prediction accuracy.
Schema-drift test isolation + tightened safety gate: test/e2e/schema-drift.test.ts resets the public schema in beforeAll so the parity gate doesn't depend on caller bootstrap state. Gate logic tightened after a second codex pass: looksLikeTestDb && (isLocalhost || GBRAIN_TEST_DB=1). The test-shaped db-name pattern (gbrain_test, *_test, test_*, *_e2e) is the hard floor — even with the env-var opt-in, a production-named DB is refused with a clear message. The env-var only relaxes the localhost requirement for CI environments where the host is a service name (e.g. postgres).
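The tightened gate described above can be sketched as follows. This is a minimal illustration, not the code in test/e2e/schema-drift.test.ts: the function names, the `ResetDecision` shape, the exact regular expression, and the set of loopback hostnames are all assumptions. Only the boolean shape `looksLikeTestDb && (isLocalhost || GBRAIN_TEST_DB=1)` and the four name patterns come from the summary.

```typescript
// Sketch only: helper names are hypothetical; the boolean shape
// looksLikeTestDb && (isLocalhost || ciOptIn) is taken from the PR text.
interface ResetDecision {
  allowed: boolean;
  reason: string;
}

// Test-shaped database names: gbrain_test, *_test, test_*, *_e2e.
function looksLikeTestDb(dbName: string): boolean {
  return /^(gbrain_test|.+_test|test_.+|.+_e2e)$/.test(dbName);
}

// Assumed set of loopback hostnames; the real check may differ.
function isLocalhost(hostname: string): boolean {
  return hostname === "localhost" || hostname === "127.0.0.1" || hostname === "[::1]";
}

function canResetSchema(
  databaseUrl: string,
  env: Record<string, string | undefined>,
): ResetDecision {
  const url = new URL(databaseUrl);
  const dbName = decodeURIComponent(url.pathname.replace(/^\//, ""));
  const ciOptIn = env.GBRAIN_TEST_DB === "1";

  // Hard floor: the env var can never rescue a production-shaped name.
  if (!looksLikeTestDb(dbName)) {
    return { allowed: false, reason: `db name "${dbName}" is not test-shaped` };
  }
  // The env var only relaxes the localhost requirement (CI service hosts).
  if (!isLocalhost(url.hostname) && !ciOptIn) {
    return {
      allowed: false,
      reason: `host "${url.hostname}" is not localhost and GBRAIN_TEST_DB is not 1`,
    };
  }
  return { allowed: true, reason: "test-shaped db name on an allowed host" };
}
```

Checking the name before the host keeps the refusal message specific about which check failed, which matches the "clear message" behavior the summary describes.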

What changes for users

gbrain takes resolve <slug> --row N --quality unresolvable [--evidence "..."] works through the CLI.
engine.resolveTake({ quality: 'unresolvable', resolvedBy: '...' }) works through the SDK.
gbrain takes scorecard now displays unresolvable + unresolvable_rate alongside partial_rate when either is populated. A brain with only unresolvable verdicts no longer looks empty.
engine.getScorecard(...) returns two new optional sibling fields. The denominator for unresolvable_rate is resolved + unresolvable_count; NULL when both are 0.
gbrain upgrade runs migration v79 automatically. Existing rows with (NULL, NULL), ('correct', true), ('incorrect', false), ('partial', NULL) shapes survive the widened CHECKs unchanged.
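The rate and the new empty-scorecard gate can be worked through with a small sketch. The function and constant names below are hypothetical and are not taken from the gbrain source. Only the denominator (resolved + unresolvable_count), the null-when-both-zero rule, the "both counts are zero" gate, and the >30% warning come from the description above.

```typescript
// Sketch only: names are hypothetical, the arithmetic follows the PR text.
interface UnresolvableSummary {
  empty: boolean;                    // true when the CLI would print "No resolved bets yet"
  unresolvable_rate: number | null;  // null when the denominator is 0
  warn: boolean;                     // true when the rate exceeds 30%
}

// Assumed constant name; the PR says the threshold mirrors
// PARTIAL_RATE_WARNING_THRESHOLD.
const UNRESOLVABLE_RATE_WARNING_THRESHOLD = 0.3;

function summarizeUnresolvable(
  resolved: number,
  unresolvableCount: number,
): UnresolvableSummary {
  const denominator = resolved + unresolvableCount;
  const rate = denominator === 0 ? null : unresolvableCount / denominator;
  return {
    // Gate considers both counts, so an unresolvable-only brain is not "empty".
    empty: resolved === 0 && unresolvableCount === 0,
    unresolvable_rate: rate,
    warn: rate !== null && rate > UNRESOLVABLE_RATE_WARNING_THRESHOLD,
  };
}
```

For a brain holding only unresolvable verdicts, for example `summarizeUnresolvable(0, 34)`, the denominator is 34, the rate is 1, and the scorecard is no longer treated as empty.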

Backward compatibility

All four pre-v79 legal (resolved_quality, resolved_outcome) pairs remain legal.
accuracy, brier, partial_rate formulas unchanged. resolved count semantics preserved (3-state).
TakesScorecard.unresolvable_count / unresolvable_rate are optional on the TS interface — downstream SDK fixtures that omit them compile cleanly. finalizeScorecard always emits them.
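The compatibility point about optional fields can be illustrated with a trimmed-down interface. This is an assumed shape for illustration: the real TakesScorecard has more fields, and the type of `partial_rate` and the helper name are guesses. The `?: number` and `?: number | null` declarations are the ones the PR describes.

```typescript
// Sketch only: a reduced, assumed version of the public interface.
interface TakesScorecard {
  resolved: number;
  partial_rate: number | null;        // assumed type
  unresolvable_count?: number;        // optional, per the PR
  unresolvable_rate?: number | null;  // optional, per the PR
}

// A downstream fixture written before the hotfix still type-checks,
// because it is allowed to omit the two sibling fields.
const legacyFixture: TakesScorecard = { resolved: 12, partial_rate: 0.25 };

// Hypothetical consumer-side helper: normalize the optional field so
// callers do not have to handle undefined.
function unresolvableCountOf(card: TakesScorecard): number {
  return card.unresolvable_count ?? 0;
}
```

Had the fields been required, `legacyFixture` would fail to compile, which is the break the optional declaration avoids.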

Adversarial review (Codex, two passes)

Pass 1: 5 findings; 2 fixed inline (P0 unguarded DROP SCHEMA, SDK interface compat).

Pass 2 (re-review after fixes): codex confirmed migration + engine widening correct, but caught two residual issues:

P0 — Safety gate still bypassable: GBRAIN_TEST_DB=1 overrode both host AND db-name checks. Tightened: db-name pattern is now the hard floor; env-var only relaxes localhost requirement.
P2 — gbrain takes scorecard early-returned on resolved=0, hiding the new unresolvable fields. Fixed: gate now considers both counts.

3 findings deferred to v0.37.x follow-up (pre-existing issues, not introduced by this hotfix):

Codex #2: extract-takes.ts re-import path drops all resolution fields (affects all 4 quality states, not just unresolvable)
Codex #4: resolved_quality='unresolvable' AND resolved_at IS NULL is internally inconsistent across getScorecard (counts it) vs listTakes(resolved:false) (returns it). Pre-existing: same issue for all quality states. Fix requires tightening the CHECK to require resolved_at IS NOT NULL when resolved_quality IS NOT NULL.

Test plan

Unit suite full pass — 8014 tests, 0 failures
test/takes-resolution.test.ts — R1 unresolvable round-trip, R4 unresolvable+outcome rejection, R5 sibling-field math
test/migrate.test.ts — v79 structural assertions + PGLite E2E suite covering all 5 valid + 2 invalid quality/outcome shapes
E2E real Postgres — 84 pass, 0 fail (mechanical + schema-drift)
Safety gate verified 3 ways: localhost+gbrain_test resets; localhost+production_data+GBRAIN_TEST_DB=1 REFUSES with clear message; localhost+foo_e2e+GBRAIN_TEST_DB=1 resets
bun run typecheck clean
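The valid shapes exercised by the migration tests can be restated as a predicate over (resolved_quality, resolved_outcome). This is a TypeScript mirror written for illustration, not the SQL in the migration, and the function name is hypothetical. The five accepted pairs are the ones listed in this PR.

```typescript
// Sketch only: a TypeScript restatement of the widened CHECK, not the SQL.
type Quality = "correct" | "incorrect" | "partial" | "unresolvable";

function isLegalResolution(
  quality: Quality | null,
  outcome: boolean | null,
): boolean {
  if (quality === null) return outcome === null;        // unresolved row
  if (quality === "correct") return outcome === true;
  if (quality === "incorrect") return outcome === false;
  // "partial" and the new "unresolvable" state both require a NULL outcome.
  return outcome === null;
}
```

Under this reading, `isLegalResolution("unresolvable", null)` is accepted while `isLegalResolution("unresolvable", true)` is rejected, which corresponds to the R4 unresolvable+outcome rejection test.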

3-line audit

VERSION:     0.37.0.1
package.json: 0.37.0.1
## [0.37.0.1] - 2026-05-19

Commits

6e0964fc — original v0.36.1.1 implementation (CHECK widen + R1-R5 + scorecard sibling fields)
b5435d59 — merge from master + rebump to v0.37.0.1, renumber migration v74 → v79
5988790b — schema-drift test isolation (DROP SCHEMA public CASCADE before initSchema)
812319c5 — codex pass 1 fix: DROP SCHEMA safety gate v1 + TakesScorecard fields optional
6683507a — codex pass 2 fix: safety gate v2 (db-name as hard floor) + scorecard CLI surfaces unresolvable

🤖 Generated with Claude Code

…vable'
Unblocks production grading scripts that write the judge's 4th verdict type. Before this fix, every quality='unresolvable' INSERT/UPDATE hit a CHECK violation: 0 of 34 writes landed in a recent prod run.
Migration v74 widens BOTH:
- takes_resolution_consistency (table-level CHECK): admits the ('unresolvable', NULL) pair alongside the existing 4 legal shapes
- resolved_quality column-level CHECK: drops the auto-generated name from v37, re-adds as takes_resolved_quality_values with the 4-state enum
Backward compatible. Existing rows with quality IN (NULL, 'correct', 'incorrect', 'partial') all satisfy the new CHECKs unchanged.
TakesScorecard gains sibling fields unresolvable_count + unresolvable_rate; the existing `resolved` field deliberately keeps its 3-state meaning so historical scorecards compare apples-to-apples (T1c sibling-field design from the eng review).
Pinned by:
- test/takes-resolution.test.ts: R1-R5 round-trip
- test/migrate.test.ts: v74 structural assertions + PGLite E2E suite exercising all valid + invalid (quality, outcome) shapes
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Master shipped v0.36.1.1 + v0.36.2.0 + v0.36.3.0 + autonomous-remediation wave through v0.37.0.0 in parallel with this hotfix.
Renumbered:
- VERSION: 0.36.1.1 → 0.37.0.1 (next-MICRO above master)
- Migration v74 → v79 (master claimed v68-v78)
- Migration name: takes_unresolvable_quality_v0_36_1_1 → takes_unresolvable_quality_v0_37_0_1
- All test references + CLAUDE.md annotations updated
CHECK widening unchanged: master's takes_resolution_consistency is still narrow (3-state), so this hotfix remains needed. The full feature wave (falsifiability + per-category calibration) follows separately as the next minor.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…rom caller bootstrap state
Previously the test trusted caller-provided DATABASE_URL to point at a fresh database. CLAUDE.md's E2E lifecycle prescribes 'gbrain doctor --json' as the bootstrap step (needed by oauth-related tests for table creation), but doctor configures the gateway and bakes the configured embedding model into content_chunks.model DEFAULT during the initial CREATE TABLE. On re-run, CREATE TABLE IF NOT EXISTS is a no-op and the bootstrapped default sticks. PGLite (always fresh-in-memory) gets the unconfigured-gateway fallback 'text-embedding-3-large'.
The test reported phantom drift:
pg.default="'zembed-1'::text" pglite.default="'text-embedding-3-large'::text"
Fix: DROP SCHEMA public CASCADE + CREATE SCHEMA public before pg.initSchema. Resets every table/index/sequence/constraint added by prior tooling. The PGLite side is already fresh-per-test by construction.
Verified order-independent:
- Fresh DB → 6/0 pass
- After 'gbrain doctor' bootstrap → 6/0 pass
- Full E2E suite (mechanical + schema-drift) → 84/0 pass
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…erface
Two findings from Codex adversarial review on the v0.37.0.1 hotfix:
1. **DROP SCHEMA safety gate (P0).** test/e2e/schema-drift.test.ts had an unguarded DROP SCHEMA public CASCADE. A developer running the E2E with DATABASE_URL pointing at a real brain or staging DB would lose the entire public schema. The fix: triple-check before destruction.
- Parse the DATABASE_URL hostname + db name
- Allow reset only when: explicit GBRAIN_TEST_DB=1 OR (localhost host AND test-shaped db name like gbrain_test, *_test, test_*, *_e2e)
- Refuse otherwise with a loud paste-ready warning
- The test still proceeds (the parity check is the fail-safe: if the caller already had a fresh DB, parity passes; if not, parity fails LOUDLY instead of nuking their data)
Verified all three branches: localhost+gbrain_test resets (6/0 pass); localhost+production_brain refuses + warns (6/0 pass against pre-existing schema); GBRAIN_TEST_DB=1 override on production_brain name allows reset.
2. **TakesScorecard interface compat.** Making `unresolvable_count` + `unresolvable_rate` required fields on the public TakesScorecard interface broke downstream SDK consumers who construct scorecard fixtures (gbrain-evals, custom engines). The hotfix shouldn't impose a compile-break on hotfix users. Fix: make both fields optional (`?: number` / `?: number | null`). `finalizeScorecard` still always populates them, so all internal code sees the real values. External fixtures that omit them compile cleanly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…orecard CLI
Second codex adversarial pass on v0.37.0.1 surfaced two residual findings.
**P0: Safety gate still bypassable.** First-pass safety gate used `explicitOptIn || (isLocalhost && looksLikeTestDb)`, meaning `GBRAIN_TEST_DB=1` bypassed BOTH the host check AND the db-name check. Someone running the E2E with that env set against a production DATABASE_URL would still nuke their schema. Codex re-flagged it as P0.
Tightened logic: `looksLikeTestDb && (isLocalhost || ciOptIn)`. The db-name pattern is now the hard floor: `gbrain_test`, `*_test`, `test_*`, `*_e2e`. GBRAIN_TEST_DB=1 only relaxes the localhost requirement (for CI service-name hosts). Setting the env on a DATABASE_URL pointing at `production_data` is explicitly refused with a paste-ready message naming the failed check.
Verified 3 ways:
- gbrain_test + localhost → resets (6/0 pass)
- production_data + GBRAIN_TEST_DB=1 → REFUSES with clear message
- foo_e2e + GBRAIN_TEST_DB=1 → resets (test-shaped name passes)
**P2: gbrain takes scorecard hides the unresolvable signal.** Early-return on `resolved === 0` was triggered before the new sibling fields rendered. A brain with only `quality='unresolvable'` verdicts (the spec's whole production case) printed "No resolved bets yet" and exited. The unresolvable_rate field was unreachable from the human CLI unless the user knew to pass `--json`.
Fix: gate the early-return on `resolved === 0 AND unresolvable_count === 0`. Render `unresolvable` count + `unresolvable_rate` alongside `partial_rate` when present. Threshold warn at 30% (mirrors PARTIAL_RATE_WARNING_THRESHOLD) pointing at retrieval coverage, not prediction accuracy, which is the actionable read for high-unresolvable brains.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

PRs #1214 and #1215 both claim v0.37.1.0; bumping past to the next free slot. Migration v79 renamed `takes_unresolvable_quality_v0_37_0_1` → `takes_unresolvable_quality_v0_37_2_0`. VERSION + package.json + CHANGELOG + llms bundles + inline doc references all swept.

Master shipped v0.37.1.0 (brainstorm/lsd, PR #1214) claiming migration v79 for `pages_last_retrieved_at`. Renumbered this hotfix's migration v79→v80 to land cleanly on top. v0.37.2.0 remains the version (v0.37.1.0 + v0.37.0.0 already taken on master).
Conflict resolution:
- VERSION/package.json: kept 0.37.2.0 (ours, higher semver)
- CHANGELOG.md: both entries preserved, ours on top
- migrate.ts: kept master's v79 unchanged, added ours as v80
- All references in test/migrate.test.ts, CHANGELOG, CLAUDE.md, spec doc, engine.ts updated from v79→v80
Verified: typecheck clean, migrate.test.ts passes (130/130, both v79 and v80 migrations apply in order).

Master landed PR #1211 (takes_resolution_consistency CHECK accepts 'unresolvable') as v0.37.2.0. Wave's v0.37.3.0 claim still sits one slot above. CHANGELOG preserves both entries with master's v0.37.2.0 slotting in as the second-most-recent release. llms-full.txt regenerated against merged docs. Verify gate + 114 brain-first + adjacent unit/E2E cases green post-merge under CI-simulated GBRAIN_HOME=/tmp/empty-... env. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@davemorin

…(cherry-pick #1210) (#1246)
* feat(ai): add default_headers / resolveDefaultHeaders seam to Recipe
Generalizes per-recipe header attachment so attribution headers (OpenRouter's HTTP-Referer + X-OpenRouter-Title) ride alongside Bearer auth on every openai-compatible touchpoint. Two safety guards fire at applyResolveAuth time: declaring both default_headers AND resolveDefaultHeaders throws AIConfigError (mutual exclusion); a default header whose key shadows the resolved auth header (Authorization, the resolver's custom header) also throws. Reranker HTTP path at gateway.ts:2281 now merges both Authorization Bearer AND auth.headers (where default_headers flow) into the request Headers map. Pre-fix the ternary picked one or the other; default_headers would have been silently dropped on the manual rerank path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(ai): add OpenRouter provider recipe
One key, many hosted models. Configures openrouter:<provider>/<model> for chat (GPT-5.2 family, Claude 4.5/4.6/4.7, Gemini 3 Flash Preview, DeepSeek) and embedding (OpenAI text-embedding-3-small with Matryoshka dims_options). max_batch_tokens=300_000 (OpenAI's aggregate per-request token cap, not the per-input 8192 the original PR conflated). resolveDefaultHeaders returns HTTP-Referer + X-OpenRouter-Title + X-Title (back-compat alias) so traffic is attributed to gbrain on OR's leaderboard. Forks override via OPENROUTER_REFERER / OPENROUTER_TITLE env vars. supports_subagent_loop: false is informational: gbrain's subagent infra is hard-pinned to Anthropic-direct via isAnthropicProvider() upstream regardless of this flag. Filed as TODO to verify tool_use_id stability through OR. Cherry-picked from PR #1210. Contributed by @davemorin.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(cli): export buildGatewayConfig + thread OPENROUTER_BASE_URL
Exports buildGatewayConfig for unit-test access. Adds one-line passthrough for OPENROUTER_BASE_URL matching the existing LITELLM/OLLAMA/LMSTUDIO/LLAMA_SERVER pattern so users can point at a self-hosted OR-compatible proxy.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(ai): cover OpenRouter recipe + default_headers seam + wire-level headers
Four test additions:
- test/ai/recipe-openrouter.test.ts (11 cases): recipe shape, Matryoshka dims_options, max_batch_tokens=300K, arbitrary-ID acceptance via assertTouchpoint, defaultResolveAuth happy/error, resolveDefaultHeaders defaults + fork-override path, setup_hint coverage. Shape regression on every chat/embedding model ID (catches typos without pinning the dynamic catalog).
- test/ai/recipes-existing-regression.test.ts (+6 cases): IRON RULE preserved; adds default_headers contract: Bearer+defaults returns both apiKey AND headers, custom-header+defaults merges with resolver winning, mutual-exclusion guard, Authorization-shadow guard, custom-auth-shadow guard, cross-touchpoint parity for all four (embedding/expansion/chat/reranker).
- test/ai/header-transport.test.ts (3 cases): proves headers actually reach the wire. Synthetic recipes with resolveOpenAICompatConfig fetch wrappers capture outgoing Headers on embed/chat/rerank. Asserts Authorization + HTTP-Referer + X-OpenRouter-Title + X-Title all present. Codex flagged the return-shape-only coverage gap during plan review.
- test/ai/build-gateway-config.test.ts (7 cases): 5-way env-baseURL passthrough sweep through the now-exported buildGatewayConfig. Uses withEnv() from test/helpers/with-env.ts for isolation compliance. Mops up pre-existing untested drift on LLAMA_SERVER/OLLAMA/LMSTUDIO/LITELLM in the same pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: add OpenRouter to embedding-providers + bump recipe count
15 -> 16 recipes. Adds OpenRouter row to the TL;DR table, a setup section covering the value-prop (one key, many hosted models), env-var overrides (OPENROUTER_BASE_URL, OPENROUTER_REFERER, OPENROUTER_TITLE), the subagent-loop limitation (isAnthropicProvider() gate), and a "One key for many hosted models" bullet under the decision tree. README updated to match.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: bump v0.37.2.0 version refs to v0.37.4.0 across in-tree comments
v0.37.2.0 was claimed by master's takes_resolution_consistency hotfix (#1211) before this branch could land. This commit re-stamps the source comments that reference the OpenRouter recipe / default_headers seam to v0.37.4.0 so the in-tree version markers match the actual landing version. No behavior change, comments only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.37.4.0)
One key, many hosted models: OpenRouter recipe lands. Cherry-picked from #1210 (@davemorin), with codex review corrections folded in:
- recipe count math (16 not 17)
- current OR attribution header name (X-OpenRouter-Title, X-Title back-compat)
- max_batch_tokens semantic (300K aggregate per-request, not 8192 per-input)
- Matryoshka dims_options for text-embedding-3-small
- auth-shadow guard at applyResolveAuth
Adds the generic Recipe.default_headers / resolveDefaultHeaders seam so attribution headers ride alongside Bearer auth. Future Together/Groq adoption tracked in TODOS.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: rebump to v0.37.6.0 (queue moved past v0.37.4/v0.37.5)
VERSION + package.json + CHANGELOG header + CLAUDE.md + TODOS.md + in-tree source comments + llms regen. No code-behavior change.
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* upstream/master:
v0.38.2.0 fix(doctor): bounded frontmatter scan + partial-state surfacing (supersedes garrytan#1287) (garrytan#1297)
v0.38.1.0 feat(agents): provider-agnostic subagent loop + remote MCP dispatch + budget meter (garrytan#1289)
v0.38.0.0 ingestion cathedral - gbrain capture + write-through + IngestionSource contract (garrytan#1275)
v0.37.11.0: fresh-install PGLite embedding setup fix wave (garrytan#1286)
v0.37.10.0 feat(init): env-detection + interactive picker + preflight invariants (garrytan#1278)
v0.37.9.0 fix(frontmatter): canonical-style normalization for tag arrays (garrytan#1252)
v0.37.8.0 feat: voyage-code-3 discoverability + reindex-code cost-preview fix (garrytan#1267)
v0.37.7.0 fix wave: federated brains + autopilot safety + OAuth confidential clients (garrytan#1253)
v0.37.6.0 feat(ai): OpenRouter recipe + generic default_headers seam (cherry-pick garrytan#1210) (garrytan#1246)
v0.37.5.0 fix(markdown): YAML-aware NESTED_QUOTES validator (stops flagging valid YAML) (garrytan#1229)
feat: pgGraph-inspired CI scaffolding wave (v0.37.4.0) (garrytan#1228)
v0.37.3.0 feat: skill_brain_first doctor check + auto-fix + declarative opt-out (supersedes garrytan#1206) (garrytan#1215)
v0.37.2.0: takes_resolution_consistency CHECK accepts 'unresolvable' (garrytan#1211)
v0.37.1.0 feat: brainstorm + lsd - bisociation idea generator grounded in your own brain (garrytan#1214)
v0.37.0.0 feat(skillpack): registry cathedral - third-party publish + install + 10/10 quality bar (garrytan#1208)
v0.36.6.0 feat: cross-modal search wave (text↔image + unified column + LLM intent) (garrytan#1165)

garrytan mentioned this pull request May 20, 2026

feat: calibration quality gate — falsifiability filter + category classification #1191

Closed

garrytan and others added 2 commits May 19, 2026 18:55

garrytan changed the title ~~v0.36.1.1: takes_resolution_consistency CHECK accepts 'unresolvable'~~ v0.37.0.1: takes_resolution_consistency CHECK accepts 'unresolvable' May 20, 2026

garrytan and others added 3 commits May 19, 2026 19:34

garrytan changed the title ~~v0.37.0.1: takes_resolution_consistency CHECK accepts 'unresolvable'~~ v0.37.2.0: takes_resolution_consistency CHECK accepts 'unresolvable' May 20, 2026

garrytan merged commit 9a4ae09 into master May 20, 2026
7 checks passed

garrytan merged 7 commits into master from garrytan/santa-fe
