feat: Eval & Grader Registry — design doc #13

Description

@spboyer

Migrated from spboyer/waza#385

Summary

Design a shared eval and grader registry for waza, inspired by the 800+ community evals in the OpenAI Evals registry but adapted for agent evaluation.

Context

From docs/research/waza-vs-openai-evals.md, the registry gap is waza's #1 competitive disadvantage (Row 10). This epic covers the full design.

Sub-issues

Peter's Ideas (verbatim)

  • The registry of shared evals is interesting. Graders particularly.
  • OpenAI's are all in their repo as YAML files
  • Consuming their format could be interesting
  • Go module style: just point to a repo and that is your grader or eval
  • Being able to construct your eval from a set of known graders is interesting
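
To make the "Go module style" and "construct your eval from known graders" ideas concrete, here is a minimal sketch, not a proposed implementation. Everything in it is an assumption for discussion: the reference format `host/owner/repo[/subpath]@version`, the names `GraderRef`, `parse_ref`, and `compose_eval`, the `github.com/example/graders` repo, and the choice to average grader scores. A real registry would fetch and pin the repo; here the resolver is an in-memory stub.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A grader maps an agent's output to a score in [0, 1] (assumed contract).
Grader = Callable[[str], float]


@dataclass(frozen=True)
class GraderRef:
    """Hypothetical Go-module-style reference to a grader in a repo."""
    repo: str      # host/owner/repo
    subpath: str   # path inside the repo; empty if the repo itself is the grader
    version: str   # tag or commit; "latest" when no @version is given


def parse_ref(ref: str) -> GraderRef:
    """Split 'host/owner/repo[/subpath][@version]' into its parts."""
    path, _, version = ref.partition("@")
    parts = path.split("/")
    if len(parts) < 3 or not all(parts):
        raise ValueError(f"expected host/owner/repo[/subpath], got {ref!r}")
    return GraderRef(
        repo="/".join(parts[:3]),
        subpath="/".join(parts[3:]),
        version=version or "latest",
    )


def compose_eval(refs: List[str], resolve: Callable[[GraderRef], Grader]) -> Grader:
    """Build one eval from several referenced graders by averaging their scores."""
    if not refs:
        raise ValueError("an eval needs at least one grader")
    graders = [resolve(parse_ref(r)) for r in refs]

    def run(output: str) -> float:
        return sum(g(output) for g in graders) / len(graders)

    return run


# Stub resolver: stands in for "clone the repo at that version and load the grader".
KNOWN: Dict[str, Grader] = {
    "github.com/example/graders/contains-hello": lambda out: 1.0 if "hello" in out else 0.0,
    "github.com/example/graders/non-empty": lambda out: 1.0 if out.strip() else 0.0,
}


def resolve_local(ref: GraderRef) -> Grader:
    return KNOWN[f"{ref.repo}/{ref.subpath}"]


score = compose_eval(
    [
        "github.com/example/graders/contains-hello@v1.2.0",
        "github.com/example/graders/non-empty",
    ],
    resolve_local,
)
```

Calling `score(...)` on an agent's output runs both graders and returns their mean. The point of the sketch is the shape: an eval is just a list of repo references, so sharing a grader means sharing a string. How versions are pinned, cached, and trusted is exactly what the design doc would need to settle.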

Deliverable

Design document at docs/research/waza-eval-registry-design.md — design only, no implementation.

Non-goals (for now)

  • Implementation — this is design research only
  • NOT a single JSON file for the registry — needs to be more robust
  • Not building the actual CLI commands yet

Metadata

Labels

  • enhancement - New feature or request
  • epic:evaluation - E3: Evaluation Framework
  • go:yes - Ready to implement
  • question - Further information is requested
  • release:backlog - Not yet targeted
  • squad:copilot - Assigned to @copilot (Coding Agent) for autonomous work
  • type:epic - Parent issue that decomposes into sub-issues
