docs: Eval & Grader Registry design doc (#13) #337

Merged: spboyer merged 7 commits into main from spboyer/issue-13-eval-registry-design on Jun 23, 2026

spboyer (Member) commented Jun 19, 2026

Closes #13. Refs #15, #17, #18.

Design-only doc for a shared eval & grader registry — waza's #1 competitive gap vs. OpenAI Evals. No code changes.

What's inside

docs/design/13-eval-registry.md covers:

  • #15 (feat: Go-module-style grader/eval references): Go-module-style refs with host/path@version#subpath syntax, SemVer + eval.lock.yaml for reproducibility, content-addressed cache, gh auth token / env-var auth, flat transitive-dep resolution.
  • #17 (feat: Composable eval construction from registry graders): composable construction via the waza registry search/add/get/sync/list CLI, federated index file, deep-merge override rules, waza init --grader scaffolding.
  • #18 (feat: Grader plugin extensibility (WASM/external programs)): plugin extensibility through WGP/1 (Waza Grader Protocol) over two runtimes: WASM (sandboxed via wasmtime, preferred for registry-distributed graders) and program (formalized bring-your-own-binary). Go plugins and embedded scripting rejected with rationale.
  • Security model (digest pinning, sandbox limits, no-secret lockfile), backward-compat impact (additive ref: field on GraderConfig, schema update, results.json source field), and a 5-phase rollout where each phase is independently shippable:
    1. Spec & schema
    2. Local-only resolver + program runtime (validates the contract without picking a backend)
    3. Git backend + auth + cache
    4. WASM runtime + sandbox
    5. Hardening (sigstore, vet, OCI backend)
  • Decision matrix (D1–D10), open questions, rejected alternatives, example end-state eval.yaml and eval.lock.yaml.
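
To make the host/path@version#subpath shape concrete, here is a minimal Python sketch of how such a ref could be split. It is an illustration only: the authoritative grammar lives in the design doc, and the `Ref` type, the `parse_ref` helper, and the example ref `github.com/acme/graders@v1.2.0#regex` are hypothetical names. It assumes the host is the first path segment and that the version and subpath are both optional.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ref:
    """Parts of a registry ref (hypothetical type, for illustration)."""
    host: str
    path: str
    version: Optional[str]
    subpath: Optional[str]


def parse_ref(ref: str) -> Ref:
    """Split host/path@version#subpath into its parts.

    Assumed grammar, not the one from the design doc: '#subpath' and
    '@version' are optional, and the host is the first path segment.
    """
    # Peel off the optional '#subpath' first, then the optional '@version'.
    base, hash_sep, subpath = ref.partition("#")
    if hash_sep and not subpath:
        raise ValueError("empty subpath after '#'")
    location, at_sep, version = base.partition("@")
    if at_sep and not version:
        raise ValueError("empty version after '@'")
    host, slash, path = location.partition("/")
    if not host or not slash or not path:
        raise ValueError("expected host/path before '@'")
    return Ref(host, path, version or None, subpath or None)
```

For example, `parse_ref("github.com/acme/graders@v1.2.0#regex")` yields host `github.com`, path `acme/graders`, version `v1.2.0`, and subpath `regex`. SemVer validation and lockfile resolution are left out of this sketch.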

Path note

Issue body specified docs/research/waza-eval-registry-design.md, but docs/research/ is gitignored (.gitignore line 116, "Internal research docs"). The validation comment on #13 explicitly asked which location is canonical (docs/design/ vs docs/plans/). I placed the doc at docs/design/13-eval-registry.md to match existing convention (135-improve-concurrency.md, 194-baseline-skill-impact.md). Happy to move if the team prefers docs/plans/.

Out of scope (intentional)

Review asks

  • Sanity-check the WASM-vs-program split (D9, §7.1). The rationale for rejecting Go plugins and embedded scripting is in §13.
  • Confirm the lockfile + --frozen for CI approach matches how teams want to consume registry graders.
  • Confirm the auth model (D4): gh auth token for GitHub by default, env-var overrides per host, never store secrets in ~/.waza/credentials.yaml.
  • Decide canonical doc location (docs/design/ is my recommendation).
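
For reviewers weighing the auth model, the lookup order could look roughly like the Python sketch below. It rests on assumptions: the `WAZA_TOKEN_<HOST>` variable naming and the `env_var_for`, `gh_token`, and `resolve_token` helpers are hypothetical, and the design doc may define a different scheme. The only external call is the GitHub CLI's `gh auth token --hostname`.

```python
import subprocess
from typing import Callable, Mapping, Optional


def env_var_for(host: str) -> str:
    """Hypothetical naming scheme: WAZA_TOKEN_ plus the upper-cased host."""
    return "WAZA_TOKEN_" + host.upper().replace(".", "_").replace("-", "_")


def gh_token(host: str) -> Optional[str]:
    """Ask the GitHub CLI for a token; None if gh is missing or logged out."""
    try:
        result = subprocess.run(
            ["gh", "auth", "token", "--hostname", host],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip() or None


def resolve_token(
    host: str,
    env: Mapping[str, str],
    gh: Callable[[str], Optional[str]] = gh_token,
) -> Optional[str]:
    """Per-host env var wins; GitHub falls back to gh; otherwise no token."""
    override = env.get(env_var_for(host))
    if override:
        return override
    if host == "github.com":
        return gh(host)
    return None
```

Passing `os.environ` as `env` gives the real behavior. The token is only returned to the caller and never written to disk, which keeps the sketch in line with the no-stored-secrets goal.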

Adds docs/design/13-eval-registry.md covering the design for a
shared eval and grader registry. Design-only; no implementation.

Note: issue #13 asked for docs/research/, but that path is
gitignored. Placed in docs/design/ to match existing convention
(135-improve-concurrency.md, 194-baseline-skill-impact.md) and
to answer the open question from the issue validation comment.

Decisions cover sub-issues:
- #15 Go-module-style refs: ref syntax, SemVer + lockfile,
  content-addressed cache, gh/env auth, flat transitive deps.
- #17 Composable eval construction: registry search/add/get/sync,
  deep-merge override rules, waza init --grader scaffolding.
- #18 Plugin extensibility: WGP/1 protocol over WASM (sandboxed)
  and program (bring-your-own-binary); Go plugins and embedded
  scripting rejected with rationale.

Includes security model, backward-compat impact, a 5-phase
rollout (spec, local resolver, git backend, WASM runtime,
hardening), open questions, rejected alternatives, and example
end-state YAML + lockfile. Backend selection (#16) deferred to
start of Phase 2.

Refs #13 #15 #17 #18
Copilot AI review requested due to automatic review settings June 19, 2026 13:11

Copilot AI (Contributor) left a comment

Pull request overview

Adds a design document proposing a shared eval & grader registry for waza, covering reference syntax/lockfiles, registry discovery & composition UX, and an extensibility model (WASM + external program protocol) intended to close the “shared registry” competitive gap vs. OpenAI Evals.

Changes:

  • Introduces a full design doc for registry refs (host/path@version#subpath), caching, lockfiles, auth, and transitive deps (#15).
  • Specifies CLI UX for discovery/composition (waza registry search/add/get/sync/list) and deep-merge override rules (#17).
  • Proposes a plugin model and security posture (WGP/1 + WASM sandbox) with a phased rollout plan (#18).
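
As a rough picture of how a content-addressed cache and lockfile pinning fit together, consider the Python sketch below. It is not the layout from the design doc: the `sha256:<hex>` digest format, the `<cache>/sha256/<hex>` directory layout, and the `digest`, `store`, and `load_pinned` helpers are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def digest(content: bytes) -> str:
    """Return an assumed 'sha256:<hex>' digest string for the content."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


def store(cache_dir: Path, content: bytes) -> str:
    """Write content under a path derived from its own digest."""
    pinned = digest(content)
    algo, hexval = pinned.split(":", 1)
    target = cache_dir / algo / hexval
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(content)
    return pinned  # the value a lockfile entry would record


def load_pinned(cache_dir: Path, pinned: str) -> bytes:
    """Read a cache entry and re-verify it against the pinned digest."""
    algo, hexval = pinned.split(":", 1)
    content = (cache_dir / algo / hexval).read_bytes()
    if digest(content) != pinned:
        raise ValueError("cache entry does not match pinned digest")
    return content
```

Because the path is derived from the content, identical graders share one cache entry, and a tampered entry fails verification on load instead of being used silently.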
Summary per file:

| File | Description |
| --- | --- |
| docs/design/13-eval-registry.md | New design doc describing the eval/grader registry architecture, UX, security model, and rollout phases. |

Copilot's findings

  • Files reviewed: 1/1 changed files
  • Comments generated: 4

Comment threads on docs/design/13-eval-registry.md: 4 (outdated)
spboyer and others added 2 commits June 19, 2026 18:58
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 19, 2026 17:59
spboyer and others added 2 commits June 19, 2026 18:59
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Copilot AI (Contributor) left a comment

Copilot's findings

  • Files reviewed: 1/1 changed files
  • Comments generated: 2

Comment threads on docs/design/13-eval-registry.md: 2 (outdated)

spboyer (Member, Author) left a comment

Adds a registry design doc; the structure is solid, but several spec details would mislead or weaken implementation.

Issues to address:

  • docs/design/13-eval-registry.md:119 - subpath grammar lacks explicit traversal and symlink-escape rejection
  • docs/design/13-eval-registry.md:441 - compromised-index mitigation overstates first-use integrity guarantees
  • docs/design/13-eval-registry.md:158 - cache path is Linux/XDG-only instead of OS cache-dir based
  • docs/design/13-eval-registry.md:176 - GitLab credential command does not return a token
  • docs/design/13-eval-registry.md:348 - wasmtime-go static dependency claim misses CGO/cross-compile tradeoffs
  • docs/design/13-eval-registry.md:477 - GitHub refs rely on remote git archive, which GitHub does not support
  • docs/design/13-eval-registry.md:309 - summary says one runtime but design uses WASM plus program runtimes
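
The first item, explicit traversal and symlink-escape rejection for subpaths, could be enforced along the lines of the Python sketch below. The `safe_subpath` helper is hypothetical and the rules are a guess at what the spec should state: reject empty, absolute, backslash-containing, and `..` subpaths lexically, then resolve symlinks and confirm the result is still inside the checked-out root.

```python
import os
from pathlib import PurePosixPath


def safe_subpath(root: str, subpath: str) -> str:
    """Resolve subpath under root, rejecting traversal and symlink escapes."""
    parts = PurePosixPath(subpath).parts
    # Lexical checks: no empty, absolute, backslash, or '..' components.
    if (
        not parts
        or subpath.startswith("/")
        or "\\" in subpath
        or ".." in parts
    ):
        raise ValueError("invalid subpath: " + repr(subpath))
    # Filesystem check: follow symlinks, then confirm we are still in root.
    root_real = os.path.realpath(root)
    target = os.path.realpath(os.path.join(root_real, *parts))
    if os.path.commonpath([root_real, target]) != root_real:
        raise ValueError("subpath escapes root: " + repr(subpath))
    return target
```

The lexical check alone is not enough, because a symlink committed inside the repository can point outside it. That is why the resolved path is compared against the resolved root.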

Comment threads on docs/design/13-eval-registry.md: 7 (outdated)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
spboyer marked this pull request as ready for review June 22, 2026 15:20
Copilot AI review requested due to automatic review settings June 22, 2026 15:20

Copilot AI (Contributor) left a comment

Copilot's findings

  • Files reviewed: 1/1 changed files
  • Comments generated: 3

Comment threads on docs/design/13-eval-registry.md: 3 (outdated)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
spboyer merged commit b97f078 into main Jun 23, 2026
8 checks passed
spboyer deleted the spboyer/issue-13-eval-registry-design branch June 23, 2026 15:24

Development

Successfully merging this pull request may close these issues.

feat: Eval & Grader Registry — design doc

3 participants