v0.41-launch: hermetic baseline + qrels for gbrain eval gate by garrytan · Pull Request #13 · garrytan/gbrain-evals

garrytan · 2026-05-24T08:49:37Z

Summary

Adds the first published baseline and qrels for `gbrain eval gate` (the new CI verb landing in gbrain v0.41.0.0). Both files are hermetic-synthetic: they contain only placeholder names, per gbrain's D9 privacy posture.

- `baselines/v0.41-launch.baseline.ndjson` (12 captured rows, `source_hash` 34e88041…) drives the regression gate.
- `qrels/v0.41-launch.qrels.json` (12 hand-curated queries with known-right answers) drives the correctness gate.
- `scripts/generate-v0.41-launch.ts` is the reproducible recipe: the same input plus a fixed `published_at` produces byte-identical output.

Closes the LOOP that gbrain v0.41 ships: users point CI at these files via `gbrain eval gate --baseline X --qrels Y` and fail PRs on retrieval regressions OR correctness drops, without bootstrapping their own baseline.
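The determinism claim for the generator rests on a fixed input and a pinned `published_at`. Below is a minimal sketch of how a generator can emit byte-identical NDJSON under those conditions, assuming key ordering is the only other source of drift. `canonicalize`, `toNdjson`, and the header/row shapes are hypothetical illustrations, not code from `scripts/generate-v0.41-launch.ts`.

```typescript
// Hypothetical sketch of deterministic NDJSON output; not the real
// scripts/generate-v0.41-launch.ts.

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Rebuild objects with their keys inserted in sorted order, so the key
// order JSON.stringify emits depends only on the set of keys, not on how
// the object was originally constructed.
function canonicalize(value: Json): Json {
  if (Array.isArray(value)) {
    // Array order is meaningful (e.g. ranked slugs), so keep it.
    return value.map(canonicalize);
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const key of Object.keys(value).sort()) {
      out[key] = canonicalize(value[key]);
    }
    return out;
  }
  return value;
}

// One canonical JSON document per line, header first, trailing newline.
// publishedAt is a parameter rather than new Date(), so reruns with the
// same input produce the same bytes.
function toNdjson(
  header: { [key: string]: Json },
  rows: Json[],
  publishedAt: string,
): string {
  const docs: Json[] = [{ ...header, published_at: publishedAt }, ...rows];
  return docs.map((doc) => JSON.stringify(canonicalize(doc))).join("\n") + "\n";
}
```

Calling `toNdjson` twice with the same rows and the same `publishedAt` yields identical strings even when the objects were built with keys in a different order; reading `published_at` from the clock instead would break that guarantee.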

What's in the box

| File | Purpose |
| --- | --- |
| `baselines/v0.41-launch.baseline.ndjson` | Regression gate target. NDJSON: a metadata header (`_kind: 'baseline_metadata'` plus thresholds and `source_hash`) followed by 12 captured rows. |
| `baselines/README.md` | Privacy posture, file format, refresh discipline. |
| `qrels/v0.41-launch.qrels.json` | Correctness gate target. JSON object: `{schema_version, queries: [...]}`. Promoted from gbrain's existing `test/fixtures/eval-baselines/qrels-search.json` fixture. |
| `qrels/README.md` | File format docs (legacy and federated `source_id`-aware shapes). |
| `scripts/generate-v0.41-launch.ts` | Deterministic regenerator. Set `GBRAIN_SRC=<path-to-gbrain>` to use a local gbrain checkout instead of the npm dependency. |
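A minimal sketch of reading a file in the baseline layout described in the table: one `baseline_metadata` header line, then one JSON object per captured row. Only `_kind`, `thresholds`, and `source_hash` come from this PR; the numeric threshold values, the opaque row contents, and the `parseBaselineSketch` name are assumptions, and this is not gbrain's `parseBaselineFile`.

```typescript
// Hypothetical shapes: only `_kind`, `thresholds`, and `source_hash` are
// named in this PR. Threshold values are assumed numeric; row contents are
// left opaque because their fields are not documented here.
interface BaselineMetadata {
  _kind: "baseline_metadata";
  source_hash: string;
  thresholds: Record<string, number>;
}

interface ParsedBaseline {
  metadata: BaselineMetadata;
  rows: Record<string, unknown>[];
}

// Illustrative reader; not gbrain's parseBaselineFile.
function parseBaselineSketch(text: string): ParsedBaseline {
  // NDJSON: one JSON document per non-empty line.
  const lines = text.split("\n").filter((line) => line.trim() !== "");
  if (lines.length === 0) throw new Error("empty baseline file");

  // Line 1 must be the metadata header.
  const header = JSON.parse(lines[0]) as Partial<BaselineMetadata>;
  if (header._kind !== "baseline_metadata") {
    throw new Error("first line must be the baseline_metadata header");
  }
  if (typeof header.source_hash !== "string") {
    throw new Error("header is missing source_hash");
  }
  if (header.thresholds == null || typeof header.thresholds !== "object") {
    throw new Error("header is missing thresholds");
  }

  // Every remaining line is one captured row.
  const rows = lines.slice(1).map((line, i) => {
    const row: unknown = JSON.parse(line);
    if (typeof row !== "object" || row === null || Array.isArray(row)) {
      throw new Error(`row ${i + 1} is not a JSON object`);
    }
    return row as Record<string, unknown>;
  });

  return { metadata: header as BaselineMetadata, rows };
}
```

Putting the metadata on line 1 lets a reader reject a malformed file before scanning any rows, while keeping each row independently appendable.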

Privacy posture (gbrain D9)

Every slug in both files is a `*-example` placeholder (`people/alice-example`, `companies/widget-co-example`, etc.) per gbrain's CLAUDE.md privacy rule. Real-user captures stay local in `~/.gbrain/baselines/` on each user's machine and never enter the public benchmark surface.
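A placeholder-only rule like D9 is easy to enforce mechanically. The sketch below flags any slug that is not a two-segment `kind/name-example` slug; the regex, the two-segment assumption, and `findNonPlaceholderSlugs` are illustrative and not part of gbrain.

```typescript
// Illustrative D9 lint, not part of gbrain. Assumes slugs look like
// "<kind>/<name>" (e.g. people/alice-example) with lowercase letters,
// digits, and hyphens, and that <name> must end in "-example".
const PLACEHOLDER_SLUG = /^[a-z0-9-]+\/[a-z0-9-]+-example$/;

// Returns every slug that would violate the placeholder-only rule.
function findNonPlaceholderSlugs(slugs: readonly string[]): string[] {
  return slugs.filter((slug) => !PLACEHOLDER_SLUG.test(slug));
}
```

In CI, a non-empty result would fail the build before a real name could reach the public files.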

Refresh discipline (gbrain D4)

When a ranking change intentionally moves expected slugs, edit the qrels or regenerate the baseline, then include a `Why:` line in the commit body so future maintainers can audit the trail. Without that discipline, the gate degrades into a rubber stamp within months.
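One way to keep the `Why:` discipline from eroding is a CI check. This is a hypothetical sketch: the file patterns (`baselines/*.ndjson`, `qrels/*.json`), the rule that `Why:` must appear in the body rather than the subject, and every function name are assumptions, not an existing gbrain or gbrain-evals check.

```typescript
// Hypothetical CI check for the D4 "Why:" rule. File patterns, the
// body-only rule, and all names here are assumptions.

// Data files whose edits move expected slugs; README edits are ignored.
function isGatedFile(path: string): boolean {
  return (
    (path.startsWith("baselines/") && path.endsWith(".ndjson")) ||
    (path.startsWith("qrels/") && path.endsWith(".json"))
  );
}

// True if some line after the subject starts with "Why:" and has text.
function hasWhyLine(commitMessage: string): boolean {
  return commitMessage
    .split("\n")
    .slice(1)
    .some((line) => /^Why:\s*\S/.test(line.trim()));
}

// Returns an error message, or null if the commit is acceptable.
function checkRefreshDiscipline(
  changedPaths: readonly string[],
  commitMessage: string,
): string | null {
  if (!changedPaths.some(isGatedFile)) return null;
  return hasWhyLine(commitMessage)
    ? null
    : "commit changes baseline/qrels data but has no 'Why:' line in its body";
}
```

Scoping the check to data files keeps documentation-only commits (such as README edits) from tripping it.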

Test plan

- Both files parse cleanly through gbrain v0.41's `parseBaselineFile` and `parseQrelsFile` (verified locally).
- The generator is deterministic: re-running with the same `GBRAIN_SRC` produces byte-identical output.
- When gbrain v0.41.0.0 lands on master, run `gbrain eval gate --baseline baselines/v0.41-launch.baseline.ndjson --qrels qrels/v0.41-launch.qrels.json` against a known-good gbrain build and confirm it exits 0.

🤖 Generated with Claude Code

Coordinated drop alongside gbrain v0.41.0.0. Both files are hermetic-synthetic: placeholder names only, per gbrain's D9 privacy posture. No real user queries, people, or companies.

- `baselines/v0.41-launch.baseline.ndjson`: 12 captured rows from a fixture-seeded brain (`source_hash` 34e88041..., mean latency 27ms). Consumed by `gbrain eval gate --baseline`. Catches retrieval REGRESSIONS during refactors.
- `qrels/v0.41-launch.qrels.json`: 12 hand-curated queries with known-right answers (promoted from gbrain's existing `test/fixtures/eval-baselines/qrels-search.json`). Consumed by `gbrain eval gate --qrels`. Catches retrieval QUALITY drops via recall@K, first-relevant-hit-rate, and expected_top1-hit-rate.
- `scripts/generate-v0.41-launch.ts`: reproducible regenerator. Deterministic: the same input plus a fixed `published_at` timestamp yields byte-identical output. The same recipe is reusable for future v0.42+ baselines.
- `baselines/README.md` + `qrels/README.md`: privacy posture, file format, refresh discipline (D4: include a "Why:" line in any commit body that intentionally moves expected slugs).

This closes the LOOP gbrain v0.41 ships: users can now point CI at these files via `gbrain eval gate --baseline X --qrels Y` and fail PRs on retrieval regressions OR correctness drops without bootstrapping their own baseline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
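The commit message names three correctness metrics. The sketch below shows one plausible way to compute them from a qrels file and a set of ranked search results. The per-query fields (`query`, `relevant`, `expected_top1`), the numeric `schema_version`, and the exact metric definitions are assumptions for illustration; gbrain's `parseQrelsFile` and `gbrain eval gate` may define them differently.

```typescript
// Assumed qrels shape. The PR only names `schema_version` and `queries`;
// the per-query fields are hypothetical.
interface QrelsQuery {
  query: string;
  relevant: string[]; // slugs judged relevant
  expected_top1?: string; // slug that should rank first, if declared
}

interface QrelsFile {
  schema_version: number; // assumed numeric
  queries: QrelsQuery[];
}

interface GateMetrics {
  recallAtK: number;
  firstRelevantHitRate: number;
  expectedTop1HitRate: number;
}

// `results` maps each query string to the ranked slugs search returned.
function scoreQrels(qrels: QrelsFile, results: Map<string, string[]>, k: number): GateMetrics {
  let recallSum = 0;
  let firstRelevantHits = 0;
  let top1Hits = 0;
  let top1Declared = 0;

  for (const q of qrels.queries) {
    const ranked = results.get(q.query) ?? [];
    const relevant = new Set(q.relevant);

    // recall@K: share of this query's relevant slugs found in the top K
    // (deduplicated so a repeated slug is not counted twice).
    const topK = Array.from(new Set(ranked.slice(0, k)));
    const found = topK.filter((slug) => relevant.has(slug)).length;
    recallSum += relevant.size === 0 ? 0 : found / relevant.size;

    // Assumed definition: the rank-1 result is one of the relevant slugs.
    if (ranked.length > 0 && relevant.has(ranked[0])) firstRelevantHits += 1;

    // Only queries that declare expected_top1 count toward this rate.
    if (q.expected_top1 !== undefined) {
      top1Declared += 1;
      if (ranked[0] === q.expected_top1) top1Hits += 1;
    }
  }

  const n = qrels.queries.length;
  return {
    recallAtK: n === 0 ? 0 : recallSum / n,
    firstRelevantHitRate: n === 0 ? 0 : firstRelevantHits / n,
    expectedTop1HitRate: top1Declared === 0 ? 0 : top1Hits / top1Declared,
  };
}
```

A correctness gate would then compare these numbers against minimums; where those minimums live (qrels file, CLI flags, or baseline thresholds) is not specified here.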

garrytan mentioned this pull request May 24, 2026

v0.41.1.0 feat: eval-loop wave — gbrain bench publish + gbrain eval gate close the LOOP garrytan/gbrain#1352

Merged


garrytan wants to merge 1 commit into `main` from `garrytan/v0.41-launch-baselines`
