
Feature request: forbidden_skills for skill_invocation grader #286

Description

@sandersaares

Add a forbidden_skills field to the skill_invocation grader, and relax the requirement that required_skills be non-empty. This would let evaluators express "skill X must not be invoked here, but other skills are fine" — the natural shape for negative-trigger tasks. Today this can only be approximated with behavior + forbidden_tools: [skill], which over-forbids by rejecting every skill invocation regardless of name.

Motivation: negative trigger tasks

We run "trigger-precision" evals: prompts paired with an expectation about whether a particular skill should be invoked. For each skill we have:

  • Positive tasks — the prompt should activate skill S. Expressed today with:
    - type: skill_invocation
      name: S-invoked
      config:
        required_skills: [S]
        mode: any_order
        allow_extra: true
  • Negative tasks — the prompt should not activate skill S. The accurate question is "skill S was not invoked"; we don't actually care whether the agent reached for some other (unrelated) skill, since that might still be the right thing to do for the prompt.

The closest thing today is the behavior grader:

- type: behavior
  name: no-skill-invoked
  config:
    forbidden_tools: [skill]

That works only because:

  1. behavior matches on tool names, not tool arguments, and every skill is invoked through the same skill tool, so forbidden_tools: [skill] forbids all skill invocations.
  2. The eval CWD currently exposes only one discoverable skill, so "any skill" and "this skill" are equivalent.

Both are accidents of our current setup, not a faithful expression of the test. As soon as additional skills become discoverable (our own repo grows, or config.skill_directories adds external skill sets), the grader starts producing false failures on negative tasks: the agent legitimately invokes an unrelated skill, and the negative-trigger task fails.
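To make the over-forbidding concrete, here is a minimal Python sketch contrasting a tool-name-scoped check (what behavior with forbidden_tools: [skill] effectively does) with a skill-name-scoped one. The run record is a hypothetical simplification: the tool_calls field is an assumption for illustration, and only runs[].skill_invocations comes from this issue.

```python
from __future__ import annotations


def tool_name_check(run: dict, forbidden_tools: set[str]) -> bool:
    """Pass iff no forbidden tool *name* was called (tool-name-scoped, like behavior)."""
    return not any(tool in forbidden_tools for tool in run["tool_calls"])


def skill_name_check(run: dict, forbidden_skills: set[str]) -> bool:
    """Pass iff no forbidden skill *name* appears in skill_invocations."""
    return not any(skill in forbidden_skills for skill in run["skill_invocations"])


# Hypothetical run for a negative-trigger task on skill "S": the agent
# legitimately invoked an unrelated skill "T" through the generic skill tool.
run = {
    "tool_calls": ["skill"],     # hypothetical field: tool names only
    "skill_invocations": ["T"],  # skill names, as in runs[].skill_invocations
}

print(tool_name_check(run, {"skill"}))  # False: the task fails although S never fired
print(skill_name_check(run, {"S"}))     # True: S was not invoked
```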

Proposal

Add forbidden_skills and make required_skills optional (default []):

- type: skill_invocation
  name: S-not-invoked
  config:
    forbidden_skills: [S]
    allow_extra: true

Reading: "skill S must not appear in runs[].skill_invocations; any other skill invocations are fine; no invocation at all is also fine."

Semantics with existing fields

| required_skills | forbidden_skills | allow_extra | Status | Meaning |
|---|---|---|---|---|
| [A, B] | [] | true | today | A and B must fire; others are fine. |
| [A, B] | [] | false | today | A and B must fire; no others. |
| [] | [X] | true | new | X must not fire; others (including none) are fine. Negative-trigger case. |
| [A] | [X] | true | new | A must fire, X must not, others are fine. Multi-skill routing tests. |
| [A] | [X] | false | new | A must fire, X must not, nothing else may fire either. |
| [] | [X] | false | new | Arguably meaningless: allow_extra: false with empty required_skills already implies "no skill may fire", which subsumes the prohibition on X. Either reject this combination with a validation error, or treat it as equivalent to [] / [] / false (no skills allowed at all). |
| [] | [] | false | new | Edge case worth specifying: could mean "no skill may fire" (most useful for full-suite hygiene), or could be rejected as under-specified. |
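A minimal Python sketch of a pass/fail decision that reproduces every row of the table above, assuming any_order semantics (the mode field is not modeled). The function name is hypothetical. For the two edge cases it picks one of the listed options: [] / [X] / false falls out naturally as "no skill may fire", and [] / [] is rejected.

```python
from __future__ import annotations

from collections.abc import Iterable


def skill_invocation_passes(
    invoked: Iterable[str],
    required_skills: Iterable[str],
    forbidden_skills: Iterable[str],
    allow_extra: bool,
) -> bool:
    """Pass/fail for one run under the proposed semantics (order not checked)."""
    invoked_set = set(invoked)
    required = set(required_skills)
    forbidden = set(forbidden_skills)

    # [] / []: nothing to check, so this sketch rejects the combination.
    if not required and not forbidden:
        raise ValueError("at least one of required_skills or forbidden_skills must be non-empty")

    # Every forbidden skill must be absent.
    if invoked_set & forbidden:
        return False
    # Every required skill must be present.
    if not required <= invoked_set:
        return False
    # With allow_extra: false, nothing outside required_skills may fire.
    # For [] / [X] / false this reduces to "no skill may fire at all".
    if not allow_extra and not invoked_set <= required:
        return False
    return True


# Negative-trigger task: an unrelated skill "T" fired, S did not.
print(skill_invocation_passes(["T"], [], ["S"], allow_extra=True))  # True
```

Checking forbidden skills before required ones does not change the result; it only mirrors the order in which the table reads.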

Suggested validation: require at least one of required_skills or forbidden_skills to be non-empty; otherwise the grader has nothing to check.
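The suggested validation could look like the following Python sketch, operating on the parsed config mapping. The helper name is hypothetical, and so is the second check (rejecting a skill that is both required and forbidden); that contradiction check is an extra not spelled out in the proposal.

```python
from __future__ import annotations


def validate_skill_invocation_config(config: dict) -> dict:
    """Return a normalized copy of a skill_invocation config (hypothetical helper).

    required_skills and forbidden_skills both default to [].
    """
    cfg = dict(config)  # shallow copy: the caller's mapping is left untouched
    cfg.setdefault("required_skills", [])
    cfg.setdefault("forbidden_skills", [])

    if not cfg["required_skills"] and not cfg["forbidden_skills"]:
        raise ValueError(
            "skill_invocation: set at least one of required_skills or forbidden_skills"
        )

    # Extra check (not in the proposal): a skill cannot be both required and forbidden.
    overlap = set(cfg["required_skills"]) & set(cfg["forbidden_skills"])
    if overlap:
        raise ValueError(f"skill_invocation: skills both required and forbidden: {sorted(overlap)}")
    return cfg


print(validate_skill_invocation_config({"forbidden_skills": ["S"], "allow_extra": True}))
```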

Scoring

A minimal interpretation:

  • Each entry in forbidden_skills is one check; it passes iff that skill is absent from runs[].skill_invocations.
  • These checks combine with the existing scoring (F1 over required_skills, plus the optional allow_extra penalty); the composite score can stay a passed_checks / total_checks ratio or become a weighted average, whichever fits Waza's current scoring model best.
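One possible reading of the weighted-average option, as a Python sketch. It does not reproduce Waza's actual F1 computation: required_score is simply taken as an input standing in for the existing required_skills score, and weighting it by the number of required skills (with each forbidden check weighing 1) is an assumption chosen for illustration.

```python
from __future__ import annotations

from collections.abc import Iterable


def forbidden_checks(invoked: Iterable[str], forbidden_skills: Iterable[str]) -> list[bool]:
    """One check per forbidden skill: True iff that skill is absent from the run."""
    invoked_set = set(invoked)
    return [skill not in invoked_set for skill in forbidden_skills]


def composite_score(
    required_score: float | None,
    n_required: int,
    forbidden_results: list[bool],
) -> float:
    """Weighted average of the existing required-skills score and forbidden checks.

    required_score is a stand-in for the existing score (None when
    required_skills is empty). The weighting scheme is an assumption.
    """
    req_weight = n_required if required_score is not None else 0
    total_weight = req_weight + len(forbidden_results)
    if total_weight == 0:
        raise ValueError("nothing to score")
    score = (required_score or 0.0) * req_weight + sum(forbidden_results)
    return score / total_weight


# Negative-trigger task: only forbidden_skills is set, and S was not invoked.
print(composite_score(None, 0, forbidden_checks(["T"], ["S"])))  # 1.0
# [A] / [X]: A fired (existing score assumed 1.0), but X fired too.
print(composite_score(1.0, 1, forbidden_checks(["A", "X"], ["X"])))  # 0.5
```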

The mode field would stay meaningful only when required_skills is non-empty; when only forbidden_skills is set, mode is ignored (or must be omitted).
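Both options for mode can be sketched in a few lines of Python; the strict flag switches between "ignore" and "must be omitted". The function name is hypothetical, and when required_skills is non-empty but mode is absent, the sketch returns None and leaves the default to Waza (not modeled here).

```python
from __future__ import annotations


def effective_mode(config: dict, strict: bool = False) -> str | None:
    """Return the ordering mode to apply, or None when there is nothing to order."""
    required = config.get("required_skills") or []
    mode = config.get("mode")
    if not required:
        # Only forbidden_skills is set: ordering has no meaning.
        if strict and mode is not None:
            raise ValueError("mode must be omitted when required_skills is empty")
        return None  # lenient policy: ignore mode
    return mode  # None here means "use Waza's default", which is not modeled


print(effective_mode({"forbidden_skills": ["S"], "mode": "any_order"}))  # None
```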

Why this is better than alternatives we considered

  • A second behavior grader entry per task: not viable, because behavior is tool-name-scoped and cannot filter by skill name.
  • An LLM prompt grader: works in theory, but adds judge cost and non-determinism to a tier whose whole point is being cheap and fast.
  • A custom program grader: works (we'd parse runs[].skill_invocations from JSON), but is boilerplate every adopter would re-invent. The semantics belong in the built-in grader.
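For comparison, this is roughly the parsing boilerplate a custom program grader would need today, as a Python sketch. The JSON shape beyond runs[].skill_invocations is an assumption: entries may be plain strings or objects, and the "name" key used for objects is hypothetical. How Waza hands the JSON to a program grader and reads its verdict is not shown.

```python
from __future__ import annotations

import json


def invoked_skill_names(results: dict) -> set[str]:
    """Collect skill names from runs[].skill_invocations across all runs."""
    names: set[str] = set()
    for run in results.get("runs", []):
        for inv in run.get("skill_invocations") or []:
            # Hypothetical entry shape: either "S" or {"name": "S", ...}.
            names.add(inv["name"] if isinstance(inv, dict) else inv)
    return names


def grade_not_invoked(results_json: str, forbidden_skills: list[str]) -> bool:
    """Pass iff none of forbidden_skills appears in any run."""
    invoked = invoked_skill_names(json.loads(results_json))
    return not (invoked & set(forbidden_skills))


doc = '{"runs": [{"skill_invocations": [{"name": "T"}]}, {"skill_invocations": []}]}'
print(grade_not_invoked(doc, ["S"]))  # True: only the unrelated skill T fired
```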

Environment

  • Waza 0.31.0
  • executor: copilot-sdk
