
v0.37.2.0: takes_resolution_consistency CHECK accepts 'unresolvable' #1211

Merged
garrytan merged 7 commits into master from garrytan/santa-fe on May 20, 2026

garrytan (Owner) commented May 20, 2026

Summary

  • v0.37.0.1 rebased on top of master's v0.37.0.0. Originally cut as v0.36.1.1, but master shipped v0.36.1.1, v0.36.2.0, v0.36.3.0, and the autonomous-remediation wave through v0.37.0.0 in parallel with this hotfix. Renumbered: VERSION 0.36.1.1 → 0.37.0.1, migration v74 → v79, migration name takes_unresolvable_quality_v0_36_1_1 → takes_unresolvable_quality_v0_37_0_1.
  • Migration v79 widens both the table-level takes_resolution_consistency CHECK and the column-level takes_resolved_quality_values CHECK to accept quality='unresolvable' AND outcome=NULL as the 4th valid resolution state.
  • TakesScorecard gains optional sibling fields unresolvable_count? + unresolvable_rate?. The existing resolved field deliberately keeps its 3-state meaning so historical scorecards stay valid for apples-to-apples comparison.
  • gbrain takes scorecard now surfaces unresolvable. Previously the command returned early with "No resolved bets yet" whenever resolved=0, hiding the new sibling fields even on a brain whose only verdicts are unresolvable (exactly the production case the spec targets). The gate is now resolved=0 AND unresolvable_count=0; the human CLI renders both counts plus the rate when either is non-zero, and warns above 30% with a pointer to retrieval coverage rather than prediction accuracy.
  • Schema-drift test isolation + tightened safety gate: test/e2e/schema-drift.test.ts resets the public schema in beforeAll so the parity gate doesn't depend on caller bootstrap state. Gate logic tightened after a second codex pass: looksLikeTestDb && (isLocalhost || GBRAIN_TEST_DB=1). The test-shaped db-name pattern (gbrain_test, *_test, test_*, *_e2e) is the hard floor — even with the env-var opt-in, a production-named DB is refused with a clear message. The env-var only relaxes the localhost requirement for CI environments where the host is a service name (e.g. postgres).
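To make the widened takes_resolution_consistency CHECK concrete, here is a TypeScript predicate restating which (resolved_quality, resolved_outcome) pairs are legal after the migration. It is a sketch only: the authoritative rule is the SQL constraint, and the names `Quality` and `isLegalResolution` are hypothetical, not gbrain's API.

```typescript
// Hypothetical TS mirror of the widened takes_resolution_consistency CHECK.
// The real rule lives in Postgres; this only restates the legal pairs.
type Quality = 'correct' | 'incorrect' | 'partial' | 'unresolvable';

function isLegalResolution(quality: Quality | null, outcome: boolean | null): boolean {
  switch (quality) {
    case null:
      return outcome === null; // unresolved take: (NULL, NULL)
    case 'correct':
      return outcome === true;
    case 'incorrect':
      return outcome === false;
    case 'partial':
    case 'unresolvable': // new 4th resolution state: outcome must stay NULL
      return outcome === null;
    default:
      return false;
  }
}
```

The four pre-existing shapes keep their meaning; the only new legal pair is ('unresolvable', NULL), which is why existing rows pass the widened CHECK unchanged.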

What changes for users

  • gbrain takes resolve <slug> --row N --quality unresolvable [--evidence "..."] works through the CLI.
  • engine.resolveTake({ quality: 'unresolvable', resolvedBy: '...' }) works through the SDK.
  • gbrain takes scorecard now displays unresolvable + unresolvable_rate alongside partial_rate when either is populated. A brain with only unresolvable verdicts no longer looks empty.
  • engine.getScorecard(...) returns two new optional sibling fields. The denominator for unresolvable_rate is resolved + unresolvable_count; NULL when both are 0.
  • gbrain upgrade runs migration v79 automatically. Existing rows with (NULL, NULL), ('correct', true), ('incorrect', false), ('partial', NULL) shapes survive the widened CHECKs unchanged.
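The unresolvable_rate denominator described above can be sketched as follows. This is an illustration assuming the formula exactly as stated (resolved + unresolvable_count, NULL when both are 0); the function name `unresolvableRate` is hypothetical and this is not the body of gbrain's finalizeScorecard.

```typescript
// Sketch of the unresolvable_rate math (assumed, not copied from gbrain).
// `resolved` keeps its 3-state meaning (correct + incorrect + partial);
// unresolvable verdicts are counted separately and only enter this ratio.
function unresolvableRate(resolved: number, unresolvableCount: number): number | null {
  const denominator = resolved + unresolvableCount;
  if (denominator === 0) return null; // NULL when both counts are 0
  return unresolvableCount / denominator;
}
```

For example, 6 resolved takes plus 2 unresolvable verdicts gives 2 / 8 = 0.25, and a brain with only unresolvable verdicts gives 1.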

Backward compatibility

  • All four pre-v79 legal (resolved_quality, resolved_outcome) pairs remain legal.
  • accuracy, brier, partial_rate formulas unchanged. resolved count semantics preserved (3-state).
  • TakesScorecard.unresolvable_count / unresolvable_rate are optional on the TS interface — downstream SDK fixtures that omit them compile cleanly. finalizeScorecard always emits them.
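A minimal sketch of why optional fields keep downstream fixtures compiling. The interface below is a hypothetical, trimmed stand-in for TakesScorecard (the real one has more fields), and `describeUnresolvable` is an illustrative consumer, not part of the SDK.

```typescript
// Trimmed, hypothetical stand-in for TakesScorecard: only the fields this PR
// discusses are modeled. The new fields are optional on the public interface.
interface TakesScorecard {
  resolved: number;                  // 3-state: correct + incorrect + partial
  unresolvable_count?: number;       // optional so older fixtures still type-check
  unresolvable_rate?: number | null; // optional; null when both counts are 0
}

// A fixture written before the hotfix omits the new fields and still compiles.
const legacyFixture: TakesScorecard = { resolved: 3 };

// Consumers treat a missing field as "not reported" rather than as zero.
function describeUnresolvable(s: TakesScorecard): string {
  if (s.unresolvable_count === undefined) return 'unresolvable: not reported';
  const rate = s.unresolvable_rate == null ? 'n/a' : `${Math.round(s.unresolvable_rate * 100)}%`;
  return `unresolvable: ${s.unresolvable_count} (${rate})`;
}
```

Internal callers never hit the "not reported" branch, since finalizeScorecard always emits both fields; the branch only matters for external fixtures.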

Adversarial review (Codex, two passes)

Pass 1: 5 findings; 2 fixed inline (P0 unguarded DROP SCHEMA, SDK interface compat).

Pass 2 (re-review after fixes): codex confirmed migration + engine widening correct, but caught two residual issues:

  • P0: Safety gate still bypassable. GBRAIN_TEST_DB=1 overrode both the host AND db-name checks. Tightened: the db-name pattern is now the hard floor; the env var only relaxes the localhost requirement.
  • P2: gbrain takes scorecard early-returned on resolved=0, hiding the new unresolvable fields. Fixed: the gate now considers both counts.

3 findings deferred to v0.37.x follow-up (pre-existing issues, not introduced by this hotfix):

Test plan

  • Unit suite full pass — 8014 tests, 0 failures
  • test/takes-resolution.test.ts — R1 unresolvable round-trip, R4 unresolvable+outcome rejection, R5 sibling-field math
  • test/migrate.test.ts — v79 structural assertions + PGLite E2E suite covering all 5 valid + 2 invalid quality/outcome shapes
  • E2E real Postgres — 84 pass, 0 fail (mechanical + schema-drift)
  • Safety gate verified 3 ways: localhost+gbrain_test resets; localhost+production_data+GBRAIN_TEST_DB=1 REFUSES with clear message; localhost+foo_e2e+GBRAIN_TEST_DB=1 resets
  • bun run typecheck clean

3-line audit

VERSION:     0.37.0.1
package.json: 0.37.0.1
## [0.37.0.1] - 2026-05-19

Commits

  • 6e0964fc — original v0.36.1.1 implementation (CHECK widen + R1-R5 + scorecard sibling fields)
  • b5435d59 — merge from master + rebump to v0.37.0.1, renumber migration v74 → v79
  • 5988790b — schema-drift test isolation (DROP SCHEMA public CASCADE before initSchema)
  • 812319c5 — codex pass 1 fix: DROP SCHEMA safety gate v1 + TakesScorecard fields optional
  • 6683507a — codex pass 2 fix: safety gate v2 (db-name as hard floor) + scorecard CLI surfaces unresolvable

🤖 Generated with Claude Code

…vable'

Unblocks production grading scripts that write the judge's 4th verdict
type. Before this fix, every quality='unresolvable' INSERT/UPDATE hit
a CHECK violation — 0 of 34 writes landed in a recent prod run.

Migration v74 widens BOTH:
  - takes_resolution_consistency (table-level CHECK) — admits the
    ('unresolvable', NULL) pair alongside the existing 4 legal shapes
  - resolved_quality column-level CHECK — drops the auto-generated
    name from v37, re-adds as takes_resolved_quality_values with the
    4-state enum

Backward compatible. Existing rows with quality IN (NULL, 'correct',
'incorrect', 'partial') all satisfy the new CHECKs unchanged.

TakesScorecard gains sibling fields unresolvable_count + unresolvable_rate;
the existing `resolved` field deliberately keeps its 3-state meaning
so historical scorecards compare apples-to-apples (T1c sibling-field
design from the eng review).

Pinned by:
  - test/takes-resolution.test.ts — R1-R5 round-trip
  - test/migrate.test.ts — v74 structural assertions + PGLite E2E
    suite exercising all valid + invalid (quality, outcome) shapes

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
garrytan and others added 2 commits May 19, 2026 18:55
Master shipped v0.36.1.1 + v0.36.2.0 + v0.36.3.0 + autonomous-remediation
wave through v0.37.0.0 in parallel with this hotfix. Renumbered:
  - VERSION: 0.36.1.1 → 0.37.0.1 (next-MICRO above master)
  - Migration v74 → v79 (master claimed v68-v78)
  - Migration name: takes_unresolvable_quality_v0_36_1_1 → takes_unresolvable_quality_v0_37_0_1
  - All test references + CLAUDE.md annotations updated

CHECK widening unchanged: master's takes_resolution_consistency
is still narrow (3-state), so this hotfix remains needed.
The full feature wave (falsifiability + per-category calibration)
follows separately as the next minor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rom caller bootstrap state

Previously the test trusted caller-provided DATABASE_URL to point at a fresh
database. CLAUDE.md's E2E lifecycle prescribes 'gbrain doctor --json' as the
bootstrap step (needed by oauth-related tests for table creation), but doctor
configures the gateway and bakes the configured embedding model into
content_chunks.model DEFAULT during the initial CREATE TABLE.

On re-run, CREATE TABLE IF NOT EXISTS is a no-op and the bootstrapped default
sticks. PGLite (always fresh-in-memory) gets the unconfigured-gateway fallback
'text-embedding-3-large'. The test reported phantom drift:
  pg.default="'zembed-1'::text"  pglite.default="'text-embedding-3-large'::text"

Fix: DROP SCHEMA public CASCADE + CREATE SCHEMA public before pg.initSchema.
Resets every table/index/sequence/constraint added by prior tooling. The PGLite
side is already fresh-per-test by construction.

Verified order-independent:
  - Fresh DB → 6/0 pass
  - After 'gbrain doctor' bootstrap → 6/0 pass
  - Full E2E suite (mechanical + schema-drift) → 84/0 pass

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
garrytan changed the title v0.36.1.1: takes_resolution_consistency CHECK accepts 'unresolvable' v0.37.0.1: takes_resolution_consistency CHECK accepts 'unresolvable' May 20, 2026
garrytan and others added 3 commits May 19, 2026 19:34
…erface

Two findings from Codex adversarial review on the v0.37.0.1 hotfix:

1. **DROP SCHEMA safety gate (P0).** test/e2e/schema-drift.test.ts had an
   unguarded DROP SCHEMA public CASCADE. A developer running the E2E with
   DATABASE_URL pointing at a real brain or staging DB would lose the entire
   public schema. The fix: triple-check before destruction.
   - Parse the DATABASE_URL hostname + db name
   - Allow reset only when: explicit GBRAIN_TEST_DB=1 OR (localhost host AND
     test-shaped db name like gbrain_test, *_test, test_*, *_e2e)
   - Refuse otherwise with a loud paste-ready warning
   - The test still proceeds (the parity check is the fail-safe — if the
     caller already had a fresh DB, parity passes; if not, parity fails
     LOUDLY instead of nuking their data)

   Verified all three branches: localhost+gbrain_test resets (6/0 pass);
   localhost+production_brain refuses + warns (6/0 pass against pre-existing
   schema); GBRAIN_TEST_DB=1 override on production_brain name allows reset.

2. **TakesScorecard interface compat.** Making `unresolvable_count` +
   `unresolvable_rate` required fields on the public TakesScorecard
   interface broke downstream SDK consumers who construct scorecard
   fixtures (gbrain-evals, custom engines). A hotfix shouldn't force a
   compile break on its users.

   Fix: make both fields optional (`?: number` / `?: number | null`).
   `finalizeScorecard` still always populates them, so all internal code
   sees the real values. External fixtures that omit them compile cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…orecard CLI

Second codex adversarial pass on v0.37.0.1 surfaced two residual findings.

**P0 — Safety gate still bypassable.** First-pass safety gate used
`explicitOptIn || (isLocalhost && looksLikeTestDb)` — meaning
`GBRAIN_TEST_DB=1` bypassed BOTH the host check AND the db-name check.
Someone running the E2E with that env set against a production DATABASE_URL
would still nuke their schema. Codex re-flagged it as P0.

Tightened logic: `looksLikeTestDb && (isLocalhost || ciOptIn)`. The db-name
pattern is now the hard floor — `gbrain_test`, `*_test`, `test_*`, `*_e2e`.
GBRAIN_TEST_DB=1 only relaxes the localhost requirement (for CI service-name
hosts). Setting the env on a DATABASE_URL pointing at `production_data` is
explicitly refused with a paste-ready message naming the failed check.

Verified 3 ways:
  - gbrain_test + localhost → resets (6/0 pass)
  - production_data + GBRAIN_TEST_DB=1 → REFUSES with clear message
  - foo_e2e + GBRAIN_TEST_DB=1 → resets (test-shaped name passes)
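The tightened rule `looksLikeTestDb && (isLocalhost || ciOptIn)` can be sketched as a pure function over DATABASE_URL and the environment. The helper names, the localhost set, and the exact regex are assumptions; the PR only lists the accepted name shapes (gbrain_test, *_test, test_*, *_e2e). Per the first-pass commit, a refusal does not abort the test: it skips the reset and lets the parity check fail loudly instead.

```typescript
// Sketch of the tightened reset gate. Regex, host list, and names are assumed.
const TEST_DB_NAME = /^(gbrain_test|.+_test|test_.+|.+_e2e)$/;
const LOCAL_HOSTS = new Set(['localhost', '127.0.0.1', '::1', '[::1]']);

type GateResult = { allowReset: true } | { allowReset: false; reason: string };

function checkResetGate(databaseUrl: string, env: Record<string, string | undefined>): GateResult {
  const url = new URL(databaseUrl);
  const dbName = decodeURIComponent(url.pathname.replace(/^\//, ''));
  const looksLikeTestDb = TEST_DB_NAME.test(dbName);
  const isLocalhost = LOCAL_HOSTS.has(url.hostname);
  const ciOptIn = env.GBRAIN_TEST_DB === '1';

  // Hard floor: a production-shaped db name is refused even with the opt-in.
  if (!looksLikeTestDb) {
    return { allowReset: false, reason: `db name "${dbName}" is not test-shaped` };
  }
  // The env var only relaxes the host check (CI service hosts like "postgres").
  if (!isLocalhost && !ciOptIn) {
    return { allowReset: false, reason: `host "${url.hostname}" is not localhost; set GBRAIN_TEST_DB=1 in CI` };
  }
  return { allowReset: true };
}
```

Checking the name first means no combination of host and env var can unlock a reset on a database named like production_data.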

**P2 — gbrain takes scorecard hides the unresolvable signal.** Early-return
on `resolved === 0` was triggered before the new sibling fields rendered.
A brain with only `quality='unresolvable'` verdicts — the spec's whole
production case — printed "No resolved bets yet" and exited. The
unresolvable_rate field was unreachable from the human CLI unless the user
knew to pass `--json`.

Fix: gate the early-return on `resolved === 0 AND unresolvable_count === 0`.
Render `unresolvable` count + `unresolvable_rate` alongside `partial_rate`
when present. Threshold warn at 30% (mirrors PARTIAL_RATE_WARNING_THRESHOLD)
pointing at retrieval coverage, not prediction accuracy — the actionable
read for high-unresolvable brains.
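The fixed CLI gate and the 30% warning can be sketched like this. Only the gating rule and the threshold come from the PR; the function name, the `ScorecardCounts` shape, the constant name, and the output strings are hypothetical.

```typescript
// Sketch of the fixed scorecard gate. Names and output strings are assumed.
const UNRESOLVABLE_RATE_WARNING_THRESHOLD = 0.3;

interface ScorecardCounts {
  resolved: number;
  unresolvable_count?: number;
  unresolvable_rate?: number | null;
}

function renderScorecard(s: ScorecardCounts): string[] {
  const unresolvable = s.unresolvable_count ?? 0;
  // The old gate was `resolved === 0`, which hid unresolvable-only brains.
  if (s.resolved === 0 && unresolvable === 0) return ['No resolved bets yet'];

  const lines = [`resolved: ${s.resolved}`];
  if (unresolvable > 0) {
    const rate = s.unresolvable_rate ?? 0;
    lines.push(`unresolvable: ${unresolvable} (${(rate * 100).toFixed(1)}%)`);
    if (rate > UNRESOLVABLE_RATE_WARNING_THRESHOLD) {
      lines.push('warning: high unresolvable rate; check retrieval coverage, not prediction accuracy');
    }
  }
  return lines;
}
```

A brain with `{ resolved: 0, unresolvable_count: 5, unresolvable_rate: 1 }` now prints its counts and the warning instead of the empty-state message.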

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PRs #1214 and #1215 both claim v0.37.1.0; bumping past to the next free
slot. Migration v79 renamed `takes_unresolvable_quality_v0_37_0_1` →
`takes_unresolvable_quality_v0_37_2_0`. VERSION + package.json +
CHANGELOG + llms bundles + inline doc references all swept.
garrytan changed the title v0.37.0.1: takes_resolution_consistency CHECK accepts 'unresolvable' v0.37.2.0: takes_resolution_consistency CHECK accepts 'unresolvable' May 20, 2026
Master shipped v0.37.1.0 (brainstorm/lsd, PR #1214) claiming migration v79
for `pages_last_retrieved_at`. Renumbered this hotfix's migration v79→v80
to land cleanly on top. v0.37.2.0 remains the version (v0.37.1.0 + v0.37.0.0
already taken on master).

Conflict resolution:
- VERSION/package.json: kept 0.37.2.0 (ours, higher semver)
- CHANGELOG.md: both entries preserved, ours on top
- migrate.ts: kept master's v79 unchanged, added ours as v80
- All references in test/migrate.test.ts, CHANGELOG, CLAUDE.md, spec doc,
  engine.ts updated from v79→v80

Verified: typecheck clean, migrate.test.ts passes (130/130, both v79 and
v80 migrations apply in order).
garrytan merged commit 9a4ae09 into master May 20, 2026
7 checks passed
garrytan added a commit that referenced this pull request May 20, 2026
Master landed PR #1211 (takes_resolution_consistency CHECK accepts
'unresolvable') as v0.37.2.0. Wave's v0.37.3.0 claim still sits one
slot above. CHANGELOG preserves both entries with master's v0.37.2.0
slotting in as the second-most-recent release. llms-full.txt
regenerated against merged docs.

Verify gate + 114 brain-first + adjacent unit/E2E cases green
post-merge under CI-simulated GBRAIN_HOME=/tmp/empty-... env.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
garrytan added a commit that referenced this pull request May 21, 2026
…(cherry-pick #1210) (#1246)

* feat(ai): add default_headers / resolveDefaultHeaders seam to Recipe

Generalizes per-recipe header attachment so attribution headers (OpenRouter's
HTTP-Referer + X-OpenRouter-Title) ride alongside Bearer auth on every
openai-compatible touchpoint. Two safety guards fire at applyResolveAuth time:
declaring both default_headers AND resolveDefaultHeaders throws AIConfigError
(mutual exclusion); a default header whose key shadows the resolved auth
header (Authorization, the resolver's custom header) also throws.
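The two guards described above can be sketched as follows. This models a hypothetical, trimmed Recipe with only the two header fields, and redefines AIConfigError locally. Two details are assumptions not stated in the commit: header names are compared case-insensitively, and Authorization stays reserved even when the resolver uses a custom header.

```typescript
// Sketch of the default_headers guards. Types and AIConfigError are local stand-ins.
class AIConfigError extends Error {}

interface RecipeHeaders {
  default_headers?: Record<string, string>;
  resolveDefaultHeaders?: () => Record<string, string>;
}

// authHeaderName: the header the auth resolver sets ("Authorization" for Bearer,
// or a recipe's custom header). Comparison is case-insensitive (assumption).
function resolveRecipeHeaders(recipe: RecipeHeaders, authHeaderName: string): Record<string, string> {
  if (recipe.default_headers && recipe.resolveDefaultHeaders) {
    throw new AIConfigError('declare default_headers OR resolveDefaultHeaders, not both');
  }
  const headers = recipe.resolveDefaultHeaders?.() ?? recipe.default_headers ?? {};
  const reserved = new Set(['authorization', authHeaderName.toLowerCase()]);
  for (const key of Object.keys(headers)) {
    if (reserved.has(key.toLowerCase())) {
      throw new AIConfigError(`default header "${key}" would shadow the auth header`);
    }
  }
  return headers;
}
```

Failing at config-resolution time surfaces a misconfigured recipe before any request is sent, rather than silently sending the wrong credentials.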

Reranker HTTP path at gateway.ts:2281 now merges both Authorization Bearer AND
auth.headers (where default_headers flow) into the request Headers map.
Pre-fix the ternary picked one or the other; default_headers would have been
silently dropped on the manual rerank path.
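The merge fix amounts to building one Headers map that carries both the Bearer token and the extra auth headers, rather than a ternary that picks one. A minimal sketch using the standard `Headers` class; the `ResolvedAuth` shape and `buildRerankHeaders` name are hypothetical, not the code at gateway.ts:2281.

```typescript
// Sketch of merging Bearer auth AND default headers on a manual HTTP path.
// The ResolvedAuth shape is assumed for illustration.
interface ResolvedAuth {
  apiKey?: string;
  headers?: Record<string, string>; // where default_headers flow
}

function buildRerankHeaders(auth: ResolvedAuth): Headers {
  const h = new Headers({ 'Content-Type': 'application/json' });
  for (const [k, v] of Object.entries(auth.headers ?? {})) h.set(k, v);
  // Bearer is added in addition to, not instead of, the headers above.
  if (auth.apiKey) h.set('Authorization', `Bearer ${auth.apiKey}`);
  return h;
}
```

Because `Headers` lookups are case-insensitive, `h.get('authorization')` and `h.get('http-referer')` both succeed regardless of how the recipe spelled the keys.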

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ai): add OpenRouter provider recipe

One key, many hosted models. Configures openrouter:<provider>/<model> for
chat (GPT-5.2 family, Claude 4.5/4.6/4.7, Gemini 3 Flash Preview, DeepSeek)
and embedding (OpenAI text-embedding-3-small with Matryoshka dims_options).
max_batch_tokens=300_000 (OpenAI's aggregate per-request token cap, not the
per-input 8192 the original PR conflated).
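To illustrate what an aggregate per-request cap like max_batch_tokens=300_000 means, here is a greedy packer that groups input indices so each request stays under the cap. It is a sketch, not gbrain's batcher: token counts are supplied by the caller, and the separate per-input limit (8192) is deliberately not enforced here, since the commit notes the two should not be conflated.

```typescript
// Illustrative greedy packer for an aggregate per-request token cap.
// Token estimation and the per-input limit are out of scope (assumed elsewhere).
function packBatches(tokenCounts: number[], maxBatchTokens: number): number[][] {
  const batches: number[][] = [];
  let current: number[] = [];
  let currentTokens = 0;
  tokenCounts.forEach((tokens, index) => {
    if (tokens > maxBatchTokens) {
      throw new Error(`input ${index} alone (${tokens} tokens) exceeds the batch cap`);
    }
    // Start a new request when adding this input would exceed the aggregate cap.
    if (currentTokens + tokens > maxBatchTokens && current.length > 0) {
      batches.push(current);
      current = [];
      currentTokens = 0;
    }
    current.push(index);
    currentTokens += tokens;
  });
  if (current.length > 0) batches.push(current);
  return batches;
}
```

With inputs of 100K, 150K, 100K, and 50K tokens and a 300K cap, this yields two requests: indices [0, 1] (250K) and [2, 3] (150K).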

resolveDefaultHeaders returns HTTP-Referer + X-OpenRouter-Title + X-Title
(back-compat alias) so traffic is attributed to gbrain on OR's leaderboard.
Forks override via OPENROUTER_REFERER / OPENROUTER_TITLE env vars.
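The env-override behavior can be sketched as below. The header names and env var names come from the commit message; the function name is hypothetical, and the fallback values are passed in as parameters because gbrain's real defaults are not shown here.

```typescript
// Sketch of env-overridable OpenRouter attribution headers.
// Fallback values are placeholders supplied by the caller, not gbrain's defaults.
function openRouterAttributionHeaders(
  env: Record<string, string | undefined>,
  fallback: { referer: string; title: string },
): Record<string, string> {
  // `||` also treats an empty env var as unset.
  const referer = env.OPENROUTER_REFERER || fallback.referer;
  const title = env.OPENROUTER_TITLE || fallback.title;
  return {
    'HTTP-Referer': referer,
    'X-OpenRouter-Title': title,
    'X-Title': title, // back-compat alias carrying the same value
  };
}
```

A fork that sets only OPENROUTER_TITLE keeps the default referer but changes both title headers together.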

supports_subagent_loop: false is informational — gbrain's subagent infra is
hard-pinned to Anthropic-direct via isAnthropicProvider() upstream regardless
of this flag. Filed as TODO to verify tool_use_id stability through OR.

Cherry-picked from PR #1210. Contributed by @davemorin.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(cli): export buildGatewayConfig + thread OPENROUTER_BASE_URL

Exports buildGatewayConfig for unit-test access. Adds one-line passthrough
for OPENROUTER_BASE_URL matching the existing LITELLM/OLLAMA/LMSTUDIO/
LLAMA_SERVER pattern so users can point at a self-hosted OR-compatible
proxy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(ai): cover OpenRouter recipe + default_headers seam + wire-level headers

Four test additions:

- test/ai/recipe-openrouter.test.ts (11 cases) — recipe shape, Matryoshka
  dims_options, max_batch_tokens=300K, arbitrary-ID acceptance via
  assertTouchpoint, defaultResolveAuth happy/error, resolveDefaultHeaders
  defaults + fork-override path, setup_hint coverage. Shape regression on
  every chat/embedding model ID (catches typos without pinning the dynamic
  catalog).

- test/ai/recipes-existing-regression.test.ts (+6 cases) — IRON RULE
  preserved; adds default_headers contract: Bearer+defaults returns both
  apiKey AND headers, custom-header+defaults merges with resolver winning,
  mutual-exclusion guard, Authorization-shadow guard, custom-auth-shadow
  guard, cross-touchpoint parity for all four (embedding/expansion/chat/
  reranker).

- test/ai/header-transport.test.ts (3 cases) — proves headers actually reach
  the wire. Synthetic recipes with resolveOpenAICompatConfig fetch wrappers
  capture outgoing Headers on embed/chat/rerank. Asserts Authorization +
  HTTP-Referer + X-OpenRouter-Title + X-Title all present. Codex flagged
  the return-shape-only coverage gap during plan review.

- test/ai/build-gateway-config.test.ts (7 cases) — 5-way env-baseURL
  passthrough sweep through the now-exported buildGatewayConfig. Uses
  withEnv() from test/helpers/with-env.ts for isolation compliance. Mops
  up pre-existing untested drift on LLAMA_SERVER/OLLAMA/LMSTUDIO/LITELLM
  in the same pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: add OpenRouter to embedding-providers + bump recipe count

15 -> 16 recipes. Adds OpenRouter row to the TL;DR table, a setup section
covering the value-prop (one key, many hosted models), env-var overrides
(OPENROUTER_BASE_URL, OPENROUTER_REFERER, OPENROUTER_TITLE), the subagent-
loop limitation (isAnthropicProvider() gate), and a "One key for many
hosted models" bullet under the decision tree. README updated to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump v0.37.2.0 version refs to v0.37.4.0 across in-tree comments

v0.37.2.0 was claimed by master's takes_resolution_consistency hotfix
(#1211) before this branch could land. This commit re-stamps the source
comments that reference the OpenRouter recipe / default_headers seam to
v0.37.4.0 so the in-tree version markers match the actual landing version.

No behavior change — comments only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.37.4.0)

One key, many hosted models — OpenRouter recipe lands. Cherry-picked from
#1210 (@davemorin), with codex review corrections folded in:
- recipe count math (16 not 17)
- current OR attribution header name (X-OpenRouter-Title, X-Title back-compat)
- max_batch_tokens semantic (300K aggregate per-request, not 8192 per-input)
- Matryoshka dims_options for text-embedding-3-small
- auth-shadow guard at applyResolveAuth

Adds the generic Recipe.default_headers / resolveDefaultHeaders seam so
attribution headers ride alongside Bearer auth. Future Together/Groq
adoption tracked in TODOS.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: rebump to v0.37.6.0 (queue moved past v0.37.4/v0.37.5)

VERSION + package.json + CHANGELOG header + CLAUDE.md + TODOS.md + in-tree
source comments + llms regen. No code-behavior change.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mgunnin added a commit to mgunnin/gbrain that referenced this pull request May 28, 2026
* upstream/master:
  v0.38.2.0 fix(doctor): bounded frontmatter scan + partial-state surfacing (supersedes garrytan#1287) (garrytan#1297)
  v0.38.1.0 feat(agents): provider-agnostic subagent loop + remote MCP dispatch + budget meter (garrytan#1289)
  v0.38.0.0 ingestion cathedral — gbrain capture + write-through + IngestionSource contract (garrytan#1275)
  v0.37.11.0: fresh-install PGLite embedding setup fix wave (garrytan#1286)
  v0.37.10.0 feat(init): env-detection + interactive picker + preflight invariants (garrytan#1278)
  v0.37.9.0 fix(frontmatter): canonical-style normalization for tag arrays (garrytan#1252)
  v0.37.8.0 feat: voyage-code-3 discoverability + reindex-code cost-preview fix (garrytan#1267)
  v0.37.7.0 fix wave: federated brains + autopilot safety + OAuth confidential clients (garrytan#1253)
  v0.37.6.0 feat(ai): OpenRouter recipe + generic default_headers seam (cherry-pick garrytan#1210) (garrytan#1246)
  v0.37.5.0 fix(markdown): YAML-aware NESTED_QUOTES validator (stops flagging valid YAML) (garrytan#1229)
  feat: pgGraph-inspired CI scaffolding wave (v0.37.4.0) (garrytan#1228)
  v0.37.3.0 feat: skill_brain_first doctor check + auto-fix + declarative opt-out (supersedes garrytan#1206) (garrytan#1215)
  v0.37.2.0: takes_resolution_consistency CHECK accepts 'unresolvable' (garrytan#1211)
  v0.37.1.0 feat: brainstorm + lsd — bisociation idea generator grounded in your own brain (garrytan#1214)
  v0.37.0.0 feat(skillpack): registry cathedral — third-party publish + install + 10/10 quality bar (garrytan#1208)
  v0.36.6.0 feat: cross-modal search wave (text↔image + unified column + LLM intent) (garrytan#1165)