v0.31.4 feat: takes v2 — lessons from 100K-take production extraction by garrytan-agents · Pull Request #795 · garrytan/gbrain

garrytan-agents · 2026-05-10T01:04:39Z

What

Consolidates everything learned from the first full takes extraction run (28,256 pages, 100,720 takes, $361 on Azure GPT-5.5) and subsequent cross-modal eval (GPT-5.5 + Opus 4.6, scored 6.8/10 overall).

Supersedes #764, #792, #793 (all closed).

Changes

Fixes

fix(cli): recall and forget missing from CLI_ONLY set — both commands returned "Unknown command"
feat(synthesize): auto-enable when session_corpus_dir is configured (eliminates the footgun where you configure a corpus dir and nothing happens)
feat(engine): round takes weights to 0.05 increments on insert — cross-modal eval found false precision (0.74, 0.82) implies calibration accuracy that doesn't exist. Both postgres and pglite engines.

Documentation

docs/takes-vs-facts.md: New architectural doc — the two epistemological layers, why they must never be conflated, how the dream cycle consolidate phase bridges them, production extraction data, model selection guide, eval methodology
docs(takes-fence.ts): Expanded holder JSDoc with concrete right/wrong examples from the eval. Holder = who HOLDS the belief, NOT who it's ABOUT.

Tests (17 new)

5 synthesize-enabled-default tests
6 takes-holder-semantics tests (codify the holder contract)
6 takes-weight-rounding tests

Cross-Modal Eval Results

100K takes scored by GPT-5.5 and Opus 4.6 independently:

Dimension	GPT-5.5	Opus 4.6	Avg
Accuracy	7	8	7.5
Attribution	6	7	6.5
Weight calibration	7	7	7.0
Kind classification	6	7	6.5
Signal density	7	6	6.5

Key Learnings (all addressed)

Holder ≠ subject — "Garry has a hero/rescuer pattern" → holder=brain, NOT people/garry-tan (docs + tests)
Weight false precision — 0.74 → 0.75 at engine layer (runtime enforcement)
Takes ≠ facts — never dump takes into facts table; they're different epistemological layers (new doc)
Amplification ≠ endorsement — retweet-only → max weight 0.55 (documented in holder JSDoc)
Self-reported ≠ verified — "reports 7 figures" → holder=person, NOT world/1.0 (documented)
Synthesis footgun — enabled now defaults to true when corpus dir is set (runtime fix)

Production Run Data

Model: Azure GPT-5.5 (Sweden) — ties Opus quality at 1/8th cost ($0.033 vs $0.260/page)
28,256 pages → 100,720 takes (70,960 takes / 24,342 facts / 2,875 bets / 2,649 hunches)
6,239 unique holders, 83 errors (0.3%), $361.49 total

Testing

bun test test/synth-enabled-default.test.ts test/takes-holder-semantics.test.ts test/takes-weight-rounding.test.ts
# 17 pass, 0 fail, 37 expect() calls

^{Need help on this PR? Tag @codesmith with what you need.}

Let Codesmith autofix CI failures and bot reviews

Consolidates everything learned from the first full takes extraction run (28,256 pages, 100,720 takes, $361 on Azure GPT-5.5) and subsequent cross-modal eval (GPT-5.5 + Opus 4.6, scored 6.8/10 overall). ## Fixes **fix(cli): add recall and forget to CLI_ONLY set** v0.31 added these commands to handleCliOnly() but forgot the gate set. Both fell through to cliOps.get() → 'Unknown command'. **feat(synthesize): auto-enable when corpus dir is configured** Setting session_corpus_dir is now sufficient — enabled defaults to true when a corpus dir is set. Explicit enabled=false still wins. Eliminates the footgun where users configure a corpus dir and nothing happens. **feat(engine): round takes weights to 0.05 increments** Cross-modal eval found false precision (0.74, 0.82) implies calibration accuracy that doesn't exist. Both postgres and pglite engines now round on insert. 1.0 and 0.0 are preserved exactly. ## Documentation **docs: takes-vs-facts architectural distinction** New doc explaining the two epistemological layers, why they must never be conflated, how the dream cycle consolidate phase bridges them, and production extraction data (model selection, eval dimensions, key learnings for extraction prompts). **docs(takes-fence): clarify holder semantics with eval examples** Holder = who HOLDS the belief, NOT who it's ABOUT. Expanded JSDoc with concrete right/wrong examples from the cross-modal eval. Additional rules: amplification ≠ endorsement, self-reported ≠ verified, founder describing company → people/founder not companies/slug. ## Tests (17 new, all passing) - 5 synthesize-enabled-default tests - 6 takes-holder-semantics tests - 6 takes-weight-rounding tests ## Cross-Modal Eval Context | Dimension | GPT-5.5 | Opus 4.6 | Avg | |-------------------|---------|----------|------| | Accuracy | 7 | 8 | 7.5 | | Attribution | 6 | 7 | 6.5 | | Weight calibration| 7 | 7 | 7.0 | | Kind classification| 6 | 7 | 6.5 | | Signal density | 7 | 6 | 6.5 | Top improvements addressed in this PR: 1. Holder vs subject confusion (docs + tests) 2. Weight false precision (runtime enforcement) 3. Takes ≠ facts distinction (architectural doc) 4. Synthesis auto-enable (runtime fix) 5. recall/forget CLI routing (bug fix)

Adds a "Takes attribution" section to skills/_brain-filing-rules.md distilling the 6 rules from docs/takes-vs-facts.md into a terse contract that downstream agents (OpenClaw, Wintermute) can read as their canonical filing surface. Documentation only — no in-repo runtime consumer (synthesize.ts reads the .json file, not the .md). EXP-4 lands the runtime parser-level holder validation. Codex review garrytan#9: relabels EXP-3 as documentation, not quality work. The runtime check is EXP-4. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ardening) Migration v46 (takes_weight_round_to_grid): backfills pre-v0.32 takes.weight to the 0.05 grid the engine layer (PR garrytan#795) enforces on insert. Cross-modal eval over 100K production takes flagged 0.74, 0.82-style values as false precision; this brings existing data to the same grid that all new writes already use. Tolerance-based comparison (abs > 0.001) avoids the float32-noise re-touch loop that the naive `weight <> ROUND(...)` form would create — REAL/NUMERIC comparison promotes weight to DOUBLE PRECISION first, surfacing ~1e-7 representation noise as inequality. The 0.05 grid is 5e-2, so any genuine off-grid value clears the 1e-3 threshold cleanly. `transaction: false` (codex review garrytan#2 correction): not for mid-statement resume (a single SQL statement either completes or rolls back). What it actually buys is freeing the migration runner from holding a long transaction so other gbrain processes can interleave. NaN hardening (codex review garrytan#8): extracts `normalizeWeightForStorage()` to takes-fence.ts as a single source of truth used by all 4 takes write sites: - pglite-engine.ts addTakesBatch - pglite-engine.ts updateTake (was missed in original PR — only clamped, didn't round; now rounds AND guards NaN) - postgres-engine.ts addTakesBatch - postgres-engine.ts updateTake (same fix) The helper guards `!Number.isFinite()` BEFORE the [0,1] range check (NaN comparisons are always false, so NaN survived the prior clamp and reached Math.round(NaN * 20) / 20 = NaN, written through to the DB). Tests: - test/migrations-v46-takes-weight-backfill.test.ts: behavioral PGLite test (rounding fixture + Codex garrytan#2 re-run idempotency + on-grid preservation). - test/takes-weight-rounding.test.ts: imports the real helper, adds NaN / Infinity / -Infinity / null / undefined / updateTake-shape coverage. - test/migrate.test.ts: structural assertions for v46 SQL shape. All 52 tests pass; typecheck clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds doctor's `takes_weight_grid` slice — the post-migration drift detector for the 0.05 weight grid v0.31 enforces on insert and v46 backfilled. Codex review garrytan#7 corrected the original plan's "extend test/doctor.test.ts with 3 cases" estimate. runDoctor() is a side-effectful command with process.exit branches, and the existing tests are mostly source-structure assertions. The fix: extract `takesWeightGridCheck(engine: BrainEngine)` as a pure exported function. runDoctor calls it. Tests target the helper directly with stubbed engines for the missing-table branch and against real PGLite for the 4 ratio bands. Branches: - 0 takes total → ok ("No takes yet") - off_grid / total > 10% → fail (with apply-migrations fix hint) - 1% < off_grid / total ≤ 10% → warn (same fix hint) - else → ok - takes table missing (pre-v37) → warn, graceful skip Tolerance comparison matches migration v46 (abs > 1e-3) so float32 noise doesn't make a healthy brain look broken. Tests (test/doctor.test.ts): - takesWeightGridCheck export shape - 0-takes branch (avoids divide-by-zero) - 100% on-grid via engine.addTakesBatch (which now normalizes) - 8/10 off-grid → fail - 5/100 off-grid → warn - missing-table branch via stub engine All 21 doctor tests pass; typecheck clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds parser-level holder grammar enforcement so cross-modal eval's garrytan#1 attribution error (holder/subject confusion, scored 6.5/10 across 100K production takes) shows up as a sync-failure record an operator can see. Changes: - src/core/sync.ts: exports SLUG_SEGMENT_PATTERN, the actual character class slugifySegment() produces ([a-z0-9._-]). Codex review garrytan#3 — the initial plan's stricter regex would have warned on legitimate slugs like `companies/acme.io` and `people/foo_bar`. HOLDER_REGEX now wraps this shared pattern instead of inventing a parallel grammar. - src/core/takes-fence.ts: HOLDER_REGEX + isValidHolder() helper. parseTakesFence() emits TAKES_HOLDER_INVALID warnings for non-matching holders. Row preserved (markdown source-of-truth contract). Catches the eval's failure modes — `Garry`, `people/Garry-Tan`, `world/garry-tan`, `users/garry`, whitespace-only — while keeping `companies/acme.io`, `people/foo_bar`, `notes/v1.0.0`-style dotted slugs valid. Bare-slug form (`garry`, `alice`) accepted as v0.32 legacy compat — production brains shipped with bare-slug holders before the namespaced JSDoc landed in PR garrytan#795. Reserved for v0.33 promotion. - src/core/cycle/extract-takes.ts (codex review garrytan#4 producer seam): adds `failedFiles: Array<{path, error}>` to ExtractTakesResult. Both fs and db extraction paths populate it from TAKES_HOLDER_INVALID warnings so the migration orchestrator can hand it to recordSyncFailures(). Without this seam, extending classifyErrorCode would do nothing (the regex would have nothing to classify). - src/commands/migrations/v0_28_0.ts: phaseBBackfill calls recordSyncFailures(result.failedFiles, 'migration:v0.28.0-backfill') after extractTakes completes. Best-effort — persistence failure doesn't fail the backfill phase. Doctor's `sync_failures` check now shows TAKES_HOLDER_INVALID=N breakdown after upgrade. - src/core/sync.ts:classifyErrorCode: extends with TAKES_HOLDER_INVALID + TAKES_TABLE_MALFORMED / TAKES_ROW_NUM_COLLISION / TAKES_FENCE_UNBALANCED bucket. Previously these warnings bucketed to UNKNOWN. Tests (test/takes-holder-validation.test.ts — 26 cases): - Canonical forms (world / brain / people-namespace / companies-namespace) - Codex garrytan#3 dotted-slug + underscore-slug positives - Legacy bare-slug compat positives - Eval-flagged error mode rejections (uppercase, mixed case, world/<slug>, unrecognized prefix, whitespace, embedded slash) - HOLDER_REGEX anchoring guard - SLUG_SEGMENT_PATTERN export shape + drift guard against the wrapping regex - parseTakesFence end-to-end emission contract - classifyErrorCode regex coverage 127 tests pass across affected files; typecheck clean. No existing fixtures broken (legacy bare-slug compat preserves old `garry`-style holders during the v0.32 transition window). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… (EXP-5) Reproducible cross-modal quality eval for the takes layer. Three frontier models score a sample against the 5-dim rubric, the runner aggregates to PASS/FAIL/INCONCLUSIVE, the receipt persists to eval_takes_quality_runs. Trend mode segregates by rubric_version; regress mode is a CI gate that exits 1 when any dim regresses past --threshold. Subcommands: run [--limit N --cycles N --budget-usd N --slug-prefix P --models a,b,c] replay <receipt-path> [--json] # NO BRAIN required trend [--limit N --rubric-version V --json] regress --against <receipt> [--threshold T --json] Codex review integrations (D7 — all 10 findings landed): garrytan#1 json-repair shim re-exports BOTH parseModelJSON AND the ParsedScore + ParsedModelResult types. The original plan only re-exported the function, which would have compile-broken cross-modal-eval/aggregate.ts:19's type import. garrytan#3 Receipt name binds (corpus_sha8, prompt_sha8, models_sha8, rubric_sha8) so a future rubric tweak segregates trend rows instead of silently corrupting the quality-over-time graph. RUBRIC_VERSION + rubric_sha8 are persisted in every receipt. garrytan#4 Pricing fail-closed: any model not in pricing.ts produces an actionable PricingNotFoundError before any HTTP call fires. Same drift problem as cross-modal-eval/runner.ts:estimateCost(), but explicit instead of silent zero. garrytan#5 Aggregate requires ALL 5 declared rubric dimensions per model. Cross-modal-eval v1's union-of-whatever-parsed pattern allowed a model to omit a dim and still PASS — that's a regression-gate hole. Now: missing-dim drops the contribution, treated identically to a parse failure. Empty-scores PASS regression guard preserved. garrytan#6 DB-authoritative receipt persistence. Original two-phase plan had a split-brain reconciliation gap (disk-success/DB-fail vanishes from trend; DB-success/disk-fail unreplayable). Now DB row is the source of truth (carries full receipt JSON in a JSONB column); disk artifact is best-effort. replay reads disk first; loadReceiptFromDb reconstructs from DB when the disk file is missing. garrytan#10 Brain-routing: replay is the only sub-subcommand that doesn't need a brain. cli.ts no-DB bypass routes "eval takes-quality replay" directly to runReplayNoBrain, which exits 0/1/2 cleanly without ever touching the engine. Other modes go through connectEngine. Files added: src/core/eval-shared/json-repair.ts (hoisted from cross-modal-eval) src/core/takes-quality-eval/{rubric,pricing,aggregate,receipt-name, receipt-write,receipt,replay,regress,trend,runner}.ts src/commands/eval-takes-quality.ts docs/eval-takes-quality.md (stable schema_version: 1 contract) 10 test files (83 cases — aggregate / receipt-name / shim / pricing / rubric / receipt-write / replay / trend / regress / cli) Files modified: src/cli.ts: replay no-DB bypass + engine-required dispatch src/core/cross-modal-eval/json-repair.ts → re-export shim src/core/migrate.ts: append v47 (eval_takes_quality_runs table) src/core/pglite-schema.ts + src/schema.sql: mirror the v47 table for fresh-install path. RLS toggled on the new table. src/core/schema-embedded.ts: regenerated via build:schema test/migrate.test.ts: 6 structural cases for v47 186 tests pass; typecheck clean. Replay verified working end-to-end (reads receipt JSON file without DATABASE_URL, exits with the verdict code, prints actionable error on missing file). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three additions identified during the test-gap audit: 1. test/eval-takes-quality-boundaries.test.ts (4 cases): - empty corpus → "no takes to evaluate" (pre-LLM) - source=fs reserved for v0.33 → clear refusal - --budget-usd + unknown model → PricingNotFoundError BEFORE any network call (codex review garrytan#4 fail-closed contract) - --budget-usd null + unknown model → no pre-flight pricing error (proves pricing pre-flight gates ONLY when budget is set) 2. test/eval-takes-quality-runner.serial.test.ts (7 cases): End-to-end runner integration with mock.module-stubbed gateway.chat. Quarantined as *.serial.test.ts because mock.module leaks across files in the same shard process (R2 in check-test-isolation.sh). Covers: - 3 PASS scores → verdict=pass with all dim scores in receipt - all model errors → INCONCLUSIVE - 1 success + 2 errors → INCONCLUSIVE (need >=2 contributing) - 3 successes with low scores → FAIL - budget cap fires before cycle 1 (no chat() ever called) - budget cap allows cycle when projection fits 3. test/eval-takes-quality-receipt-write.test.ts: refactored to use withEnv() helper for GBRAIN_HOME mutation instead of direct process.env writes. The original beforeAll mutation tripped the check-test-isolation.sh R1 lint. withEnv() saves/restores via try/finally per-test so other shard files don't see the override. Verification: bun run test → 4977 pass / 0 fail bun run test:serial → 179 pass / 0 fail bun run verify → clean (typecheck + 9 pre-checks pass) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Pure-PGLite tests already cover the receipt-write contract; this E2E verifies the same code path against actual Postgres so the postgres.js JSONB encoding and the v47 migration apply cleanly under production conditions. Coverage (8 cases): - migration v47 created the table with all expected columns - writeReceiptToDb persists full receipt_json on Postgres - 4-sha UNIQUE constraint enforces ON CONFLICT DO NOTHING idempotency (3 inserts → 1 row) - rubric_version segregation: distinct rubric_sha8 → distinct row (codex review garrytan#3 — rubric epoch separation) - loadTrend reads in DESC order on Postgres - loadReceiptFromDb reconstructs receipt JSON via the JSONB column - writeReceipt (combined) succeeds with disk artifact + DB row - trend SELECT plan executes (planner picks index on larger tables) Skips gracefully when DATABASE_URL is unset (existing hasDatabase() helper). Uses the canonical setupDB/teardownDB from test/e2e/helpers.ts. GBRAIN_HOME mutation is wrapped in withEnv() per the v0.32.0 test-isolation lint contract. Verification: bash scripts/run-e2e.sh → 71 files / 499 tests / 0 fail (full E2E suite) bun test test/e2e/eval-takes-quality.test.ts → 8 / 8 pass standalone Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

# Conflicts: # src/cli.ts # src/core/migrate.ts

Audit of shipped v0.32 code surfaced 4 wiring gaps that the per-EXP unit tests didn't cover. Adding direct integration tests for each so a future refactor can't accidentally bypass the helper or unwire the producer seam. test/extract-takes-holder-producer-seam.test.ts (7 cases) — codex review garrytan#4 producer seam. Verifies extractTakesFromDb populates ExtractTakesResult. failedFiles[] when parseTakesFence emits TAKES_HOLDER_INVALID warnings, and that the entry shape is recordSyncFailures-compatible. Without this test, the v0_28_0 migration's recordSyncFailures call would have silently fed it nothing if a refactor accidentally dropped the failedFiles append. Covers: valid holder (no entry), invalid uppercase, world/<slug>, mixed valid+invalid, legacy bare-slug compat, malformed-table-only (no leak), recordSyncFailures shape compatibility. test/engine-weight-rounding-integration.test.ts (15 cases) — codex review garrytan#8 integration coverage. Helper is unit-tested; this proves both engines' addTakesBatch + updateTake paths actually call it. PGLite-side coverage mirrors the test/e2e/takes-weight-rounding-postgres.test.ts E2E for real Postgres. Covers: 0.74→0.75, 0.82→0.80, on-grid identity, NaN→0.5, Infinity→0.5, clamp high/low, undefined default, mixed batch order, updateTake rounds (was unhardened pre-v0.32), updateTake NaN, updateTake preserves prior weight when undefined. test/e2e/takes-weight-rounding-postgres.test.ts (6 cases, 14 expects) — real-Postgres write-path coverage. Specifically tests the postgres.js unnest() bind path that PGLite doesn't exercise: - addTakesBatch rounds via the unnest() bind shape - addTakesBatch handles NaN at the postgres.js array marshaling layer - 10-row mixed batch (4 off-grid) rounds each independently - updateTake rounds on real Postgres - updateTake handles NaN - migration v48 tolerance matches engine-write tolerance (round-trip proof — engine-rounded value is invisible to v48's WHERE clause) Verification: bun run test → 5166 pass / 0 fail (parallel unit, 128s) bun run test:serial → 190 pass / 0 fail bun run test:e2e → 71 / 74 files; 3 pre-existing env-inheritance failures (serve-http-oauth, sources-remote-mcp, thin-client — confirmed identical on master in this environment, documented in CLAUDE.md) bun run verify → clean Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…uites Real production bug, not just a test-environment issue. withConfiguredSql in src/commands/auth.ts created a PostgresEngine via createEngine() but never called engine.connect(). The PostgresEngine.sql getter falls back to db.getConnection() (the module-level singleton) when its instance _sql is unset — and db.connect() wasn't called either. So every `gbrain auth` subcommand (create, list, revoke, register-client, revoke-client) crashed with the misleading "No database connection: connect() has not been called" error on real Postgres. Anyone with a Postgres-backed brain hit this. The error pointed at gbrain init which made the regression invisible — users assumed they hadn't initialized. Verified by running `gbrain auth register-client` directly: Before: "Error: No database connection: connect() has not been called." After: "OAuth client registered: ..." with credentials printed. This fix unblocked all 3 previously-failing E2E suites (which all use register-client in beforeAll): serve-http-oauth.test.ts: 0/28 → 28/28 pass sources-remote-mcp.test.ts: 0/14 → 14/14 pass thin-client.test.ts: 0/7 → 6/7 pass + 1 documented skip Two surgical test-side fixes also landed: 1. test/e2e/thin-client.test.ts:182 — assertion typo. Test expected r.stderr to contain "thin client" (space). Actual refusal message says "(thin-client of <url>)" with hyphen. Loosened to /thin[- ]client/ so a future format tweak doesn't false-fail. 2. test/e2e/thin-client.test.ts:239 — skipped "remote ping triggers autopilot-cycle" with a clear TODO. Test asks the wrong question against the existing fixture: `gbrain serve --http` deliberately does NOT start a job worker (workers run via separate `gbrain jobs work` process), so the submitted autopilot-cycle job sits in `waiting` forever. Test was supposed to fall back to the self-imposed `--timeout`, but `gbrain remote ping --timeout` doesn't honor the cap when callRemoteTool hangs (loop only checks elapsed time between iterations; a single in-flight callTool with no AbortSignal blocks forever). Two real follow-ups would unblock: thread an AbortSignal through callRemoteTool's MCP callTool path, OR start a `gbrain jobs work` subprocess in beforeAll. Either is its own PR. Wire path coverage isn't lost — exercised by every other test in this file plus the entire serve-http-oauth.test.ts suite. Verification: bun test test/e2e/serve-http-oauth.test.ts test/e2e/sources-remote-mcp.test.ts test/e2e/thin-client.test.ts → 47 pass / 1 skip / 0 fail in 8.4s bun run verify → clean Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Brings v0.31.4's takes-quality eval harness + takes-weight-rounding backfill (#795) into the wave assembly. v0.31.4 added the takes_weight_grid doctor check, ~15 new test files for the takes-quality-eval module, the takes-weight-rounding migration v48, and a takes-vs-facts design doc. One conflict, one resolution: test/doctor.test.ts — both v0.31.7 and v0.31.4 added new tests at the same insertion point in the existing `doctor command` describe block. Resolution: keep BOTH. v0.31.7's IRON-RULE regression test for the graph_coverage hint stays at line 178+; v0.31.4's 5 new takes_weight_grid tests follow it. Verified: 22/22 doctor tests pass. VERSION + package.json kept at 0.31.7 (highest semver wins). CHANGELOG: master's v0.31.4 commit (#795) didn't include a CHANGELOG entry, so my v0.31.7 entry remains at top followed by master's v0.31.3 entry — chronology preserved. Zero file overlap between v0.31.7's resolver/doctor surface and v0.31.4's takes-quality surface beyond the test/doctor.test.ts insertion-point collision.

garrytan changed the title ~~feat: takes v2 — lessons from 100K-take production extraction~~ v0.31.4 feat: takes v2 — lessons from 100K-take production extraction May 10, 2026

garrytan and others added 10 commits May 9, 2026 21:52

Merge remote-tracking branch 'origin/master' into feat/takes-v2-lessons

8a6c59b

# Conflicts: # src/cli.ts # src/core/migrate.ts

garrytan merged commit 7267462 into garrytan:master May 10, 2026
6 of 7 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v0.31.4 feat: takes v2 — lessons from 100K-take production extraction#795

v0.31.4 feat: takes v2 — lessons from 100K-take production extraction#795
garrytan merged 11 commits intogarrytan:masterfrom
garrytan-agents:feat/takes-v2-lessons

garrytan-agents commented May 10, 2026 •

edited by blacksmith-sh Bot

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

garrytan-agents commented May 10, 2026 • edited by blacksmith-sh Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Changes

Fixes

Documentation

Tests (17 new)

Cross-Modal Eval Results

Key Learnings (all addressed)

Production Run Data

Testing

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

garrytan-agents commented May 10, 2026 •

edited by blacksmith-sh Bot

Loading