v0.31.4 feat: takes v2 — lessons from 100K-take production extraction #795

Merged: garrytan merged 11 commits into garrytan:master from garrytan-agents:feat/takes-v2-lessons on May 10, 2026

garrytan-agents (Contributor) commented May 10, 2026

What

Consolidates everything learned from the first full takes extraction run (28,256 pages, 100,720 takes, $361 on Azure GPT-5.5) and subsequent cross-modal eval (GPT-5.5 + Opus 4.6, scored 6.8/10 overall).

Supersedes #764, #792, #793 (all closed).

Changes

Fixes

  • fix(cli): recall and forget missing from CLI_ONLY set — both commands returned "Unknown command"
  • feat(synthesize): auto-enable when session_corpus_dir is configured (eliminates the footgun where you configure a corpus dir and nothing happens)
  • feat(engine): round takes weights to 0.05 increments on insert. The cross-modal eval found that falsely precise weights (0.74, 0.82) imply a calibration accuracy that doesn't exist. Applies to both the postgres and pglite engines.

Documentation

  • docs/takes-vs-facts.md: New architectural doc — the two epistemological layers, why they must never be conflated, how the dream cycle consolidate phase bridges them, production extraction data, model selection guide, eval methodology
  • docs(takes-fence.ts): Expanded holder JSDoc with concrete right/wrong examples from the eval. Holder = who HOLDS the belief, NOT who it's ABOUT.

Tests (17 new)

  • 5 synthesize-enabled-default tests
  • 6 takes-holder-semantics tests (codify the holder contract)
  • 6 takes-weight-rounding tests

Cross-Modal Eval Results

100K takes scored by GPT-5.5 and Opus 4.6 independently:

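| Dimension           | GPT-5.5 | Opus 4.6 | Avg |
|---------------------|---------|----------|-----|
| Accuracy            | 7       | 8        | 7.5 |
| Attribution         | 6       | 7        | 6.5 |
| Weight calibration  | 7       | 7        | 7.0 |
| Kind classification | 6       | 7        | 6.5 |
| Signal density      | 7       | 6        | 6.5 |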

Key Learnings (all addressed)

  1. Holder ≠ subject — "Garry has a hero/rescuer pattern" → holder=brain, NOT people/garry-tan (docs + tests)
  2. Weight false precision — 0.74 → 0.75 at engine layer (runtime enforcement)
  3. Takes ≠ facts — never dump takes into facts table; they're different epistemological layers (new doc)
  4. Amplification ≠ endorsement — retweet-only → max weight 0.55 (documented in holder JSDoc)
  5. Self-reported ≠ verified — "reports 7 figures" → holder=person, NOT world/1.0 (documented)
  6. Synthesis footgun — enabled now defaults to true when corpus dir is set (runtime fix)

Production Run Data

  • Model: Azure GPT-5.5 (Sweden) — ties Opus quality at 1/8th cost ($0.033 vs $0.260/page)
  • 28,256 pages → 100,720 takes (70,960 takes / 24,342 facts / 2,875 bets / 2,649 hunches)
  • 6,239 unique holders, 83 errors (0.3%), $361.49 total

Testing

bun test test/synth-enabled-default.test.ts test/takes-holder-semantics.test.ts test/takes-weight-rounding.test.ts
# 17 pass, 0 fail, 37 expect() calls

Consolidates everything learned from the first full takes extraction run
(28,256 pages, 100,720 takes, $361 on Azure GPT-5.5) and subsequent
cross-modal eval (GPT-5.5 + Opus 4.6, scored 6.8/10 overall).

## Fixes

**fix(cli): add recall and forget to CLI_ONLY set**
v0.31 added these commands to handleCliOnly() but forgot the gate set.
Both fell through to cliOps.get() → 'Unknown command'.
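
A minimal sketch of the failure mode described above, assuming a dispatcher that consults a gate set before calling the handler. Only the names `CLI_ONLY` and `handleCliOnly` come from the commit message; the command list, return strings, and `dispatch` function are hypothetical.

```typescript
// Hypothetical gate set; the real CLI_ONLY contents are not shown in this PR.
const CLI_ONLY = new Set<string>(["init", "doctor", "recall", "forget"]);

// Stand-in for handleCliOnly(): it knows how to run the command.
function handleCliOnly(command: string): string {
  return `${command} handled`;
}

// Hypothetical dispatcher: the gate is checked first, so a command that has
// a handler but is missing from the gate set is never reached.
function dispatch(command: string, gate: Set<string> = CLI_ONLY): string {
  if (gate.has(command)) return handleCliOnly(command);
  return "Unknown command";
}
```

With `recall` in the gate set, `dispatch("recall")` reaches the handler. With a gate set that omits it, the same call falls through to the unknown-command branch even though the handler exists.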

**feat(synthesize): auto-enable when corpus dir is configured**
Setting session_corpus_dir is now sufficient — enabled defaults to true
when a corpus dir is set. Explicit enabled=false still wins. Eliminates
the footgun where users configure a corpus dir and nothing happens.
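
The defaulting rule can be sketched as follows. This is an illustration under assumptions, not the code in synthesize.ts: the `SynthesizeConfig` shape and `isSynthesizeEnabled` name are hypothetical, and only the `enabled` and `session_corpus_dir` keys come from the commit message.

```typescript
// Hypothetical config shape; only the two key names are taken from the PR.
interface SynthesizeConfig {
  enabled?: boolean;
  session_corpus_dir?: string;
}

function isSynthesizeEnabled(cfg: SynthesizeConfig): boolean {
  // An explicit setting always wins, including enabled=false.
  if (cfg.enabled !== undefined) return cfg.enabled;
  // Otherwise, enabled defaults to true only when a corpus dir is set.
  return typeof cfg.session_corpus_dir === "string" && cfg.session_corpus_dir.length > 0;
}
```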

**feat(engine): round takes weights to 0.05 increments**
The cross-modal eval found that falsely precise weights (0.74, 0.82) imply
a calibration accuracy that doesn't exist. Both postgres and pglite engines
now round on insert. 1.0 and 0.0 are preserved exactly.

## Documentation

**docs: takes-vs-facts architectural distinction**
New doc explaining the two epistemological layers, why they must never be
conflated, how the dream cycle consolidate phase bridges them, and
production extraction data (model selection, eval dimensions, key
learnings for extraction prompts).

**docs(takes-fence): clarify holder semantics with eval examples**
Holder = who HOLDS the belief, NOT who it's ABOUT. Expanded JSDoc with
concrete right/wrong examples from the cross-modal eval. Additional
rules: amplification ≠ endorsement, self-reported ≠ verified, founder
describing company → people/founder not companies/slug.

## Tests (17 new, all passing)

- 5 synthesize-enabled-default tests
- 6 takes-holder-semantics tests
- 6 takes-weight-rounding tests

## Cross-Modal Eval Context

| Dimension         | GPT-5.5 | Opus 4.6 | Avg  |
|-------------------|---------|----------|------|
| Accuracy          | 7       | 8        | 7.5  |
| Attribution       | 6       | 7        | 6.5  |
| Weight calibration| 7       | 7        | 7.0  |
| Kind classification| 6      | 7        | 6.5  |
| Signal density    | 7       | 6        | 6.5  |

Top improvements addressed in this PR:
1. Holder vs subject confusion (docs + tests)
2. Weight false precision (runtime enforcement)
3. Takes ≠ facts distinction (architectural doc)
4. Synthesis auto-enable (runtime fix)
5. recall/forget CLI routing (bug fix)
garrytan changed the title from "feat: takes v2 — lessons from 100K-take production extraction" to "v0.31.4 feat: takes v2 — lessons from 100K-take production extraction" on May 10, 2026
garrytan and others added 10 commits May 9, 2026 21:52
Adds a "Takes attribution" section to skills/_brain-filing-rules.md
distilling the 6 rules from docs/takes-vs-facts.md into a terse
contract that downstream agents (OpenClaw, Wintermute) can read as
their canonical filing surface.

Documentation only — no in-repo runtime consumer (synthesize.ts reads
the .json file, not the .md). EXP-4 lands the runtime parser-level
holder validation.

Codex review #9: relabels EXP-3 as documentation, not quality work.
The runtime check is EXP-4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ardening)

Migration v46 (takes_weight_round_to_grid): backfills pre-v0.32 takes.weight
to the 0.05 grid the engine layer (PR garrytan#795) enforces on insert. Cross-modal
eval over 100K production takes flagged 0.74, 0.82-style values as false
precision; this brings existing data to the same grid that all new writes
already use.

Tolerance-based comparison (abs > 0.001) avoids the float32-noise re-touch
loop that the naive `weight <> ROUND(...)` form would create — REAL/NUMERIC
comparison promotes weight to DOUBLE PRECISION first, surfacing ~1e-7
representation noise as inequality. The 0.05 grid is 5e-2, so any genuine
off-grid value clears the 1e-3 threshold cleanly.
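
The float32 effect can be reproduced outside SQL. The sketch below is a TypeScript analogy, not the migration itself: `Float32Array` stands in for a REAL column, and all function names are hypothetical. It shows why an exact inequality flags a value that is already on the grid, while the 1e-3 tolerance does not.

```typescript
const TOLERANCE = 1e-3;

// Nearest point on the 0.05 grid, computed in double precision.
function nearestGridValue(w: number): number {
  return Math.round(w * 20) / 20;
}

// Analogue of the naive `weight <> ROUND(...)` comparison.
function naiveOffGrid(w: number): boolean {
  return w !== nearestGridValue(w);
}

// Analogue of the tolerance-based comparison (abs > 0.001).
function tolerantOffGrid(w: number): boolean {
  return Math.abs(w - nearestGridValue(w)) > TOLERANCE;
}

// Simulates storing a value in a 32-bit float and reading it back as a double.
function asFloat32(w: number): number {
  return new Float32Array([w])[0];
}
```

Here `asFloat32(0.35)` differs from 0.35 by a few parts in a billion, so the naive check reports it as off-grid on every run, while the tolerant check treats it as on-grid. A genuinely off-grid value such as 0.74 is 0.01 away from the grid and is flagged by both.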

`transaction: false` (codex review #2 correction): not for mid-statement
resume (a single SQL statement either completes or rolls back). What it
actually buys is freeing the migration runner from holding a long
transaction so other gbrain processes can interleave.

NaN hardening (codex review #8): extracts `normalizeWeightForStorage()` to
takes-fence.ts as a single source of truth used by all 4 takes write sites:
  - pglite-engine.ts addTakesBatch
  - pglite-engine.ts updateTake (was missed in original PR — only clamped,
    didn't round; now rounds AND guards NaN)
  - postgres-engine.ts addTakesBatch
  - postgres-engine.ts updateTake (same fix)

The helper guards `!Number.isFinite()` BEFORE the [0,1] range check (NaN
comparisons are always false, so NaN survived the prior clamp and reached
Math.round(NaN * 20) / 20 = NaN, written through to the DB).
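
A minimal sketch of the ordering described above, assuming a 0.5 fallback for non-finite and missing input. The function name `normalizeWeightSketch` and the fallback constant are hypothetical; the actual `normalizeWeightForStorage()` in takes-fence.ts may differ in signature and defaults.

```typescript
// Assumption: 0.5 is the fallback for NaN, Infinity, null, and undefined.
const FALLBACK_WEIGHT = 0.5;

function normalizeWeightSketch(weight: number | null | undefined): number {
  // Guard non-finite values first: NaN fails every range comparison,
  // so a clamp alone would let it through.
  if (weight === null || weight === undefined || !Number.isFinite(weight)) {
    return FALLBACK_WEIGHT;
  }
  // Clamp to [0, 1], then snap to the 0.05 grid.
  const clamped = Math.min(1, Math.max(0, weight));
  return Math.round(clamped * 20) / 20;
}
```

Under these assumptions, 0.74 becomes 0.75, 0.82 becomes 0.8, and 0.0 and 1.0 pass through unchanged.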

Tests:
- test/migrations-v46-takes-weight-backfill.test.ts: behavioral PGLite test
  (rounding fixture + Codex #2 re-run idempotency + on-grid preservation).
- test/takes-weight-rounding.test.ts: imports the real helper, adds NaN /
  Infinity / -Infinity / null / undefined / updateTake-shape coverage.
- test/migrate.test.ts: structural assertions for v46 SQL shape.

All 52 tests pass; typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds doctor's `takes_weight_grid` slice — the post-migration drift detector
for the 0.05 weight grid v0.31 enforces on insert and v46 backfilled.

Codex review #7 corrected the original plan's "extend test/doctor.test.ts
with 3 cases" estimate. runDoctor() is a side-effectful command with
process.exit branches, and the existing tests are mostly source-structure
assertions. The fix: extract `takesWeightGridCheck(engine: BrainEngine)`
as a pure exported function. runDoctor calls it. Tests target the helper
directly with stubbed engines for the missing-table branch and against
real PGLite for the 4 ratio bands.

Branches:
  - 0 takes total → ok ("No takes yet")
  - off_grid / total > 10% → fail (with apply-migrations fix hint)
  - 1% < off_grid / total ≤ 10% → warn (same fix hint)
  - else → ok
  - takes table missing (pre-v37) → warn, graceful skip

Tolerance comparison matches migration v46 (abs > 1e-3) so float32 noise
doesn't make a healthy brain look broken.
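
The ratio bands above reduce to a small pure function. This sketch covers only the classification step, with a hypothetical name and return type; the real `takesWeightGridCheck` also queries the engine and handles the missing-table branch.

```typescript
type CheckStatus = "ok" | "warn" | "fail";

// offGrid and total are counts of takes; total may be zero.
function classifyGridDrift(offGrid: number, total: number): CheckStatus {
  if (total === 0) return "ok"; // "No takes yet"; also avoids divide-by-zero
  const ratio = offGrid / total;
  if (ratio > 0.10) return "fail";
  if (ratio > 0.01) return "warn";
  return "ok";
}
```

Both thresholds are strict, so exactly 10% off-grid is a warn and exactly 1% is ok, matching the band boundaries listed in the commit message.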

Tests (test/doctor.test.ts):
  - takesWeightGridCheck export shape
  - 0-takes branch (avoids divide-by-zero)
  - 100% on-grid via engine.addTakesBatch (which now normalizes)
  - 8/10 off-grid → fail
  - 5/100 off-grid → warn
  - missing-table branch via stub engine

All 21 doctor tests pass; typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds parser-level holder grammar enforcement so cross-modal eval's #1
attribution error (holder/subject confusion, scored 6.5/10 across 100K
production takes) shows up as a sync-failure record an operator can see.

Changes:

- src/core/sync.ts: exports SLUG_SEGMENT_PATTERN, the actual character
  class slugifySegment() produces ([a-z0-9._-]). Codex review #3: the
  initial plan's stricter regex would have warned on legitimate slugs
  like `companies/acme.io` and `people/foo_bar`. HOLDER_REGEX now wraps
  this shared pattern instead of inventing a parallel grammar.

- src/core/takes-fence.ts: HOLDER_REGEX + isValidHolder() helper.
  parseTakesFence() emits TAKES_HOLDER_INVALID warnings for non-matching
  holders. Row preserved (markdown source-of-truth contract).

  Catches the eval's failure modes — `Garry`, `people/Garry-Tan`,
  `world/garry-tan`, `users/garry`, whitespace-only — while keeping
  `companies/acme.io`, `people/foo_bar`, `notes/v1.0.0`-style dotted
  slugs valid. Bare-slug form (`garry`, `alice`) accepted as v0.32 legacy
  compat — production brains shipped with bare-slug holders before the
  namespaced JSDoc landed in PR garrytan#795. Reserved for v0.33 promotion.

- src/core/cycle/extract-takes.ts (codex review #4 producer seam): adds
  `failedFiles: Array<{path, error}>` to ExtractTakesResult. Both fs
  and db extraction paths populate it from TAKES_HOLDER_INVALID warnings
  so the migration orchestrator can hand it to recordSyncFailures().
  Without this seam, extending classifyErrorCode would do nothing
  (the regex would have nothing to classify).

- src/commands/migrations/v0_28_0.ts: phaseBBackfill calls
  recordSyncFailures(result.failedFiles, 'migration:v0.28.0-backfill')
  after extractTakes completes. Best-effort — persistence failure
  doesn't fail the backfill phase. Doctor's `sync_failures` check now
  shows TAKES_HOLDER_INVALID=N breakdown after upgrade.

- src/core/sync.ts:classifyErrorCode: extends with TAKES_HOLDER_INVALID
  + TAKES_TABLE_MALFORMED / TAKES_ROW_NUM_COLLISION / TAKES_FENCE_UNBALANCED
  bucket. Previously these warnings bucketed to UNKNOWN.
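
One way to read the holder grammar described above is as a single anchored regex built from the shared segment class. The sketch below is an assumption, not the shipped HOLDER_REGEX: it accepts `world`, `brain`, `people/<slug>`, `companies/<slug>`, and legacy bare slugs, and the names `SLUG_SEGMENT`, `HOLDER_SKETCH`, and `isValidHolderSketch` are hypothetical. Whether other namespaces are accepted is not specified here, so this sketch rejects them.

```typescript
// Character class quoted in the commit message: [a-z0-9._-]
const SLUG_SEGMENT = "[a-z0-9._-]+";

// Anchored at both ends so partial matches inside a longer string don't pass.
const HOLDER_SKETCH = new RegExp(
  `^(?:world|brain|(?:people|companies)/${SLUG_SEGMENT}|${SLUG_SEGMENT})$`
);

function isValidHolderSketch(holder: string): boolean {
  return HOLDER_SKETCH.test(holder);
}
```

Under this reading, `people/foo_bar` and `companies/acme.io` pass, while `Garry`, `people/Garry-Tan`, `world/garry-tan`, `users/garry`, and whitespace-only strings are rejected.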

Tests (test/takes-holder-validation.test.ts — 26 cases):
- Canonical forms (world / brain / people-namespace / companies-namespace)
- Codex #3 dotted-slug + underscore-slug positives
- Legacy bare-slug compat positives
- Eval-flagged error mode rejections (uppercase, mixed case, world/<slug>,
  unrecognized prefix, whitespace, embedded slash)
- HOLDER_REGEX anchoring guard
- SLUG_SEGMENT_PATTERN export shape + drift guard against the wrapping regex
- parseTakesFence end-to-end emission contract
- classifyErrorCode regex coverage

127 tests pass across affected files; typecheck clean. No existing fixtures
broken (legacy bare-slug compat preserves old `garry`-style holders during
the v0.32 transition window).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… (EXP-5)

Reproducible cross-modal quality eval for the takes layer. Three frontier
models score a sample against the 5-dim rubric, the runner aggregates to
PASS/FAIL/INCONCLUSIVE, the receipt persists to eval_takes_quality_runs.
Trend mode segregates by rubric_version; regress mode is a CI gate that
exits 1 when any dim regresses past --threshold.

Subcommands:
  run     [--limit N --cycles N --budget-usd N --slug-prefix P --models a,b,c]
  replay  <receipt-path> [--json]                 # NO BRAIN required
  trend   [--limit N --rubric-version V --json]
  regress --against <receipt> [--threshold T --json]

Codex review integrations (D7 — all 10 findings landed):

  #1 json-repair shim re-exports BOTH parseModelJSON AND the
     ParsedScore + ParsedModelResult types. The original plan only
     re-exported the function, which would have compile-broken
     cross-modal-eval/aggregate.ts:19's type import.

  #3 Receipt name binds (corpus_sha8, prompt_sha8, models_sha8,
     rubric_sha8) so a future rubric tweak segregates trend rows
     instead of silently corrupting the quality-over-time graph.
     RUBRIC_VERSION + rubric_sha8 are persisted in every receipt.

  #4 Pricing fail-closed: any model not in pricing.ts produces an
     actionable PricingNotFoundError before any HTTP call fires.
     Same drift problem as cross-modal-eval/runner.ts:estimateCost(),
     but explicit instead of silent zero.

  #5 Aggregate requires ALL 5 declared rubric dimensions per model.
     Cross-modal-eval v1's union-of-whatever-parsed pattern allowed a
     model to omit a dim and still PASS — that's a regression-gate
     hole. Now: missing-dim drops the contribution, treated identically
     to a parse failure. Empty-scores PASS regression guard preserved.

  #6 DB-authoritative receipt persistence. Original two-phase plan had
     a split-brain reconciliation gap (disk-success/DB-fail vanishes
     from trend; DB-success/disk-fail unreplayable). Now DB row is the
     source of truth (carries full receipt JSON in a JSONB column);
     disk artifact is best-effort. replay reads disk first; loadReceiptFromDb
     reconstructs from DB when the disk file is missing.

  #10 Brain-routing: replay is the only sub-subcommand that doesn't
      need a brain. cli.ts no-DB bypass routes "eval takes-quality replay"
      directly to runReplayNoBrain, which exits 0/1/2 cleanly without
      ever touching the engine. Other modes go through connectEngine.
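
The all-dimensions rule from finding #5 can be sketched as below. Several details are assumptions and are not confirmed by this PR: the dimension keys (derived from the rubric table), the pass threshold of 7, and the function names. The requirement of at least two contributing models follows the runner test description elsewhere in this PR.

```typescript
const DIMS = [
  "accuracy",
  "attribution",
  "weight_calibration",
  "kind_classification",
  "signal_density",
] as const;
type Dim = (typeof DIMS)[number];
type Verdict = "PASS" | "FAIL" | "INCONCLUSIVE";
type ModelScores = Partial<Record<Dim, number>>;

const PASS_THRESHOLD = 7;   // hypothetical; the real threshold is not stated here
const MIN_CONTRIBUTING = 2; // need >=2 contributing models

// A model contributes only if it scored every declared dimension.
function hasAllDims(scores: ModelScores): boolean {
  return DIMS.every((d) => {
    const v = scores[d];
    return typeof v === "number" && Number.isFinite(v);
  });
}

function aggregateSketch(perModel: ModelScores[]): Verdict {
  // A missing dimension drops the whole contribution, like a parse failure.
  const contributing = perModel.filter(hasAllDims);
  if (contributing.length < MIN_CONTRIBUTING) return "INCONCLUSIVE";
  for (const d of DIMS) {
    const mean =
      contributing.reduce((sum, s) => sum + (s[d] as number), 0) / contributing.length;
    if (mean < PASS_THRESHOLD) return "FAIL";
  }
  return "PASS";
}
```

Because an empty or partial score set never counts as contributing, an empty input cannot produce PASS, which is the regression the finding describes.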

Files added:
  src/core/eval-shared/json-repair.ts (hoisted from cross-modal-eval)
  src/core/takes-quality-eval/{rubric,pricing,aggregate,receipt-name,
                                receipt-write,receipt,replay,regress,trend,runner}.ts
  src/commands/eval-takes-quality.ts
  docs/eval-takes-quality.md (stable schema_version: 1 contract)
  10 test files (83 cases — aggregate / receipt-name / shim / pricing /
                 rubric / receipt-write / replay / trend / regress / cli)

Files modified:
  src/cli.ts: replay no-DB bypass + engine-required dispatch
  src/core/cross-modal-eval/json-repair.ts → re-export shim
  src/core/migrate.ts: append v47 (eval_takes_quality_runs table)
  src/core/pglite-schema.ts + src/schema.sql: mirror the v47 table for
    fresh-install path. RLS toggled on the new table.
  src/core/schema-embedded.ts: regenerated via build:schema
  test/migrate.test.ts: 6 structural cases for v47

186 tests pass; typecheck clean. Replay verified working end-to-end
(reads receipt JSON file without DATABASE_URL, exits with the verdict
code, prints actionable error on missing file).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three additions identified during the test-gap audit:

  1. test/eval-takes-quality-boundaries.test.ts (4 cases):
     - empty corpus → "no takes to evaluate" (pre-LLM)
     - source=fs reserved for v0.33 → clear refusal
     - --budget-usd + unknown model → PricingNotFoundError BEFORE any
       network call (codex review #4 fail-closed contract)
     - --budget-usd null + unknown model → no pre-flight pricing error
       (proves pricing pre-flight gates ONLY when budget is set)

  2. test/eval-takes-quality-runner.serial.test.ts (7 cases):
     End-to-end runner integration with mock.module-stubbed gateway.chat.
     Quarantined as *.serial.test.ts because mock.module leaks across
     files in the same shard process (R2 in check-test-isolation.sh).
     Covers:
       - 3 PASS scores → verdict=pass with all dim scores in receipt
       - all model errors → INCONCLUSIVE
       - 1 success + 2 errors → INCONCLUSIVE (need >=2 contributing)
       - 3 successes with low scores → FAIL
       - budget cap fires before cycle 1 (no chat() ever called)
       - budget cap allows cycle when projection fits

  3. test/eval-takes-quality-receipt-write.test.ts: refactored to use
     withEnv() helper for GBRAIN_HOME mutation instead of direct
     process.env writes. The original beforeAll mutation tripped the
     check-test-isolation.sh R1 lint. withEnv() saves/restores via
     try/finally per-test so other shard files don't see the override.
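
The budget pre-flight behavior covered by these tests might look like the sketch below. It is an illustration only: the pricing table, its per-cycle USD figures, the result shape, and the function name are hypothetical, and the `PricingNotFoundError` class here is a local stand-in for the one in pricing.ts.

```typescript
class PricingNotFoundError extends Error {
  constructor(model: string) {
    super(`No pricing entry for model "${model}"; add one before setting a budget.`);
    this.name = "PricingNotFoundError";
  }
}

// Hypothetical per-cycle cost estimates in USD; not real prices.
const PRICING_SKETCH: Record<string, number> = { "model-a": 0.5, "model-b": 0.25 };

interface PreflightResult {
  checked: boolean;            // false when no budget was set
  allowed: boolean;            // whether the projected spend fits the budget
  projectedUsd: number | null;
}

function preflightSketch(
  models: string[],
  budgetUsd: number | null,
  cycles: number
): PreflightResult {
  // Pricing is consulted only when a budget is set.
  if (budgetUsd === null) return { checked: false, allowed: true, projectedUsd: null };
  let perCycle = 0;
  for (const model of models) {
    const cost = PRICING_SKETCH[model];
    // Fail closed: an unknown model is an error, not a silent zero.
    if (cost === undefined) throw new PricingNotFoundError(model);
    perCycle += cost;
  }
  const projectedUsd = perCycle * cycles;
  return { checked: true, allowed: projectedUsd <= budgetUsd, projectedUsd };
}
```

Since the function runs before any model call, an unknown model or an over-budget projection is caught without a network request being made.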

Verification:
  bun run test       → 4977 pass / 0 fail
  bun run test:serial → 179 pass / 0 fail
  bun run verify     → clean (typecheck + 9 pre-checks pass)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pure-PGLite tests already cover the receipt-write contract; this E2E
verifies the same code path against actual Postgres so the postgres.js
JSONB encoding and the v47 migration apply cleanly under production
conditions.

Coverage (8 cases):
  - migration v47 created the table with all expected columns
  - writeReceiptToDb persists full receipt_json on Postgres
  - 4-sha UNIQUE constraint enforces ON CONFLICT DO NOTHING idempotency
    (3 inserts → 1 row)
  - rubric_version segregation: distinct rubric_sha8 → distinct row
    (codex review #3: rubric epoch separation)
  - loadTrend reads in DESC order on Postgres
  - loadReceiptFromDb reconstructs receipt JSON via the JSONB column
  - writeReceipt (combined) succeeds with disk artifact + DB row
  - trend SELECT plan executes (planner picks index on larger tables)

Skips gracefully when DATABASE_URL is unset (existing hasDatabase()
helper). Uses the canonical setupDB/teardownDB from test/e2e/helpers.ts.
GBRAIN_HOME mutation is wrapped in withEnv() per the v0.32.0 test-isolation
lint contract.

Verification:
  bash scripts/run-e2e.sh → 71 files / 499 tests / 0 fail (full E2E suite)
  bun test test/e2e/eval-takes-quality.test.ts → 8 / 8 pass standalone

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Conflicts:
#	src/cli.ts
#	src/core/migrate.ts
Audit of shipped v0.32 code surfaced 4 wiring gaps that the per-EXP unit
tests didn't cover. Adding direct integration tests for each so a future
refactor can't accidentally bypass the helper or unwire the producer seam.

test/extract-takes-holder-producer-seam.test.ts (7 cases) — codex review
#4 producer seam. Verifies extractTakesFromDb populates ExtractTakesResult.
failedFiles[] when parseTakesFence emits TAKES_HOLDER_INVALID warnings,
and that the entry shape is recordSyncFailures-compatible. Without this
test, the v0_28_0 migration's recordSyncFailures call would have silently
fed it nothing if a refactor accidentally dropped the failedFiles append.
Covers: valid holder (no entry), invalid uppercase, world/<slug>, mixed
valid+invalid, legacy bare-slug compat, malformed-table-only (no leak),
recordSyncFailures shape compatibility.

test/engine-weight-rounding-integration.test.ts (15 cases) — codex review
#8 integration coverage. Helper is unit-tested; this proves both engines'
addTakesBatch + updateTake paths actually call it. PGLite-side coverage
mirrors the test/e2e/takes-weight-rounding-postgres.test.ts E2E for real
Postgres. Covers: 0.74→0.75, 0.82→0.80, on-grid identity, NaN→0.5,
Infinity→0.5, clamp high/low, undefined default, mixed batch order,
updateTake rounds (was unhardened pre-v0.32), updateTake NaN, updateTake
preserves prior weight when undefined.

test/e2e/takes-weight-rounding-postgres.test.ts (6 cases, 14 expects) —
real-Postgres write-path coverage. Specifically tests the postgres.js
unnest() bind path that PGLite doesn't exercise:
  - addTakesBatch rounds via the unnest() bind shape
  - addTakesBatch handles NaN at the postgres.js array marshaling layer
  - 10-row mixed batch (4 off-grid) rounds each independently
  - updateTake rounds on real Postgres
  - updateTake handles NaN
  - migration v48 tolerance matches engine-write tolerance (round-trip
    proof — engine-rounded value is invisible to v48's WHERE clause)

Verification:
  bun run test       → 5166 pass / 0 fail (parallel unit, 128s)
  bun run test:serial → 190 pass / 0 fail
  bun run test:e2e   → 71 / 74 files; 3 pre-existing env-inheritance
                       failures (serve-http-oauth, sources-remote-mcp,
                       thin-client — confirmed identical on master in
                       this environment, documented in CLAUDE.md)
  bun run verify     → clean

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…uites

Real production bug, not just a test-environment issue.
withConfiguredSql in src/commands/auth.ts created a PostgresEngine via
createEngine() but never called engine.connect(). The PostgresEngine.sql
getter falls back to db.getConnection() (the module-level singleton) when
its instance _sql is unset — and db.connect() wasn't called either.

So every `gbrain auth` subcommand (create, list, revoke, register-client,
revoke-client) crashed with the misleading "No database connection:
connect() has not been called" error on real Postgres. Anyone with a
Postgres-backed brain hit this. The error pointed at gbrain init which
made the regression invisible — users assumed they hadn't initialized.

Verified by running `gbrain auth register-client` directly:
  Before: "Error: No database connection: connect() has not been called."
  After:  "OAuth client registered: ..." with credentials printed.

This fix unblocked all 3 previously-failing E2E suites (which all use
register-client in beforeAll):
  serve-http-oauth.test.ts:    0/28 → 28/28 pass
  sources-remote-mcp.test.ts:  0/14 → 14/14 pass
  thin-client.test.ts:         0/7  →  6/7 pass + 1 documented skip

Two surgical test-side fixes also landed:

1. test/e2e/thin-client.test.ts:182 — assertion typo. Test expected
   r.stderr to contain "thin client" (space). Actual refusal message
   says "(thin-client of <url>)" with hyphen. Loosened to /thin[- ]client/
   so a future format tweak doesn't false-fail.

2. test/e2e/thin-client.test.ts:239 — skipped "remote ping triggers
   autopilot-cycle" with a clear TODO. Test asks the wrong question
   against the existing fixture: `gbrain serve --http` deliberately
   does NOT start a job worker (workers run via separate `gbrain jobs
   work` process), so the submitted autopilot-cycle job sits in
   `waiting` forever. Test was supposed to fall back to the self-imposed
   `--timeout`, but `gbrain remote ping --timeout` doesn't honor the cap
   when callRemoteTool hangs (loop only checks elapsed time between
   iterations; a single in-flight callTool with no AbortSignal blocks
   forever). Two real follow-ups would unblock: thread an AbortSignal
   through callRemoteTool's MCP callTool path, OR start a `gbrain jobs
   work` subprocess in beforeAll. Either is its own PR. Wire path
   coverage isn't lost — exercised by every other test in this file
   plus the entire serve-http-oauth.test.ts suite.

Verification:
  bun test test/e2e/serve-http-oauth.test.ts test/e2e/sources-remote-mcp.test.ts test/e2e/thin-client.test.ts
    → 47 pass / 1 skip / 0 fail in 8.4s
  bun run verify → clean

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
garrytan merged commit 7267462 into garrytan:master on May 10, 2026
6 of 7 checks passed
garrytan added a commit that referenced this pull request May 10, 2026
Brings v0.31.4's takes-quality eval harness + takes-weight-rounding
backfill (#795) into the wave assembly. v0.31.4 added the
takes_weight_grid doctor check, ~15 new test files for the
takes-quality-eval module, the takes-weight-rounding migration v48,
and a takes-vs-facts design doc.

One conflict, one resolution:

test/doctor.test.ts — both v0.31.7 and v0.31.4 added new tests at
the same insertion point in the existing `doctor command` describe
block. Resolution: keep BOTH. v0.31.7's IRON-RULE regression test
for the graph_coverage hint stays at line 178+; v0.31.4's 5 new
takes_weight_grid tests follow it. Verified: 22/22 doctor tests
pass.

VERSION + package.json kept at 0.31.7 (highest semver wins).
CHANGELOG: master's v0.31.4 commit (#795) didn't include a CHANGELOG
entry, so my v0.31.7 entry remains at top followed by master's
v0.31.3 entry — chronology preserved.

Zero file overlap between v0.31.7's resolver/doctor surface and
v0.31.4's takes-quality surface beyond the test/doctor.test.ts
insertion-point collision.