
Comparing changes

base repository: microsoft/waza
base: v0.27.0
...
head repository: microsoft/waza
compare: v0.28.0
  • 8 commits
  • 28 files changed
  • 2 contributors

Commits on Apr 21, 2026

  1. fix: CI integration test allows eval failures with mock executor (#210)

    The integration test step runs `waza run` with the mock executor,
    which produces generic output that won't match output_contains
    expectations. This is expected — the test validates that waza
    completes without crashing, not that mock evals pass.
    
    Root cause: PR #203 (v0.27.0) wired up evaluateExpectations() which
    made output_contains checks actually execute. Before that, these
    fields were defined but never evaluated, so the integration test
    passed silently.
    
    Exit code 1 (eval failures) is now allowed. Exit codes >1 (crashes,
    panics) still fail CI.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    spboyer and Copilot authored Apr 21, 2026
    75b2538
  2. fix: use valid Python expression in test fixture assertion (#197)

    Replace placeholder assertion with 'len(output) > 0' which is valid
    Python syntax for the inline script grader eval_wrapper.py.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    spboyer and Copilot authored Apr 21, 2026
    f9c575c
  3. docs: add Quick Start guide to documentation site (#205)

    - Create focused 5-minute Quick Start page at site/src/content/docs/quick-start.mdx
    - Add installation options (binary, from source, azd extension)
    - Include authentication, first skill creation, minimal eval YAML
    - Add Mermaid workflow diagram
    - Include workflow steps: install → auth → create → write → run → view
    - Place Quick Start as first item in sidebar navigation
    - Update homepage to prominently link Quick Start guide
    - Site builds successfully with 17 pages
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    spboyer and Copilot authored Apr 21, 2026
    2c01028
  4. fix: audit YAML validation and add TestCase unknown field test (#132) (#206)
    
    Audit findings:
    - All primary user config loaders (LoadBenchmarkSpec, LoadTestCase,
      ParseSpec, ProjectConfig.Load, suggest.validateEvalYAML, jsonrpc
      eval validate) already use decoder.KnownFields(true) — strict.
    - Two yaml.Unmarshal calls in cmd_coverage.go and cmd_check.go are
      intentional partial parses (only read subset of fields); making
      them strict would break valid eval.yaml files.
    - internal/generate, internal/skill, internal/validation use
      non-strict parsing by design (frontmatter extensibility, schema
      probing, generic any decode).
    
    Changes:
    - Add TestLoadTestCase_UnknownFieldRejected proving bogus fields
      are rejected by LoadTestCase's KnownFields(true) decoder.
    - Remove broken TestLoadTestCase_FollowUpPrompts that referenced
      non-existent TestStimulus.FollowUps field (prevented package
      compilation on main).
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    spboyer and Copilot authored Apr 21, 2026
    29149f6
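The strict-versus-lenient distinction audited in commit 4 can be illustrated with the standard library alone. The sketch below deliberately swaps in `encoding/json` and its `DisallowUnknownFields` as a stand-in for the YAML decoder's `KnownFields(true)`, so that it runs without third-party modules; the `testCase` struct and the `decode` helper are hypothetical and much smaller than any real loader.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// testCase is a hypothetical, heavily reduced stand-in for an eval test case.
type testCase struct {
	Name   string `json:"name"`
	Prompt string `json:"prompt"`
}

// decode parses doc into a testCase. In strict mode, fields that do not
// exist on the struct cause an error instead of being silently dropped.
func decode(doc string, strict bool) (testCase, error) {
	dec := json.NewDecoder(strings.NewReader(doc))
	if strict {
		dec.DisallowUnknownFields()
	}
	var tc testCase
	err := dec.Decode(&tc)
	return tc, err
}

func main() {
	doc := `{"name": "demo", "prompt": "hi", "bogus_field": true}`

	_, strictErr := decode(doc, true)
	_, lenientErr := decode(doc, false)

	fmt.Println("strict error:", strictErr)
	fmt.Println("lenient error:", lenientErr)
}
```

The lenient path corresponds to the intentional partial parses mentioned in the audit, where only a subset of fields is read and unknown ones must not cause an error.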
  5. feat: allow trigger tests to terminate early on skill invocation (#207)

    * feat: allow trigger tests to terminate early on skill invocation #188
    
    Add CancelOnSkillInvocation flag to ExecutionRequest that cancels the
    execution context as soon as a SkillInvoked event is received. This
    allows trigger tests to return immediately once the target skill fires,
    instead of waiting for the agent to complete its full turn.
    
    Implementation:
    - Add onSkillInvoked callback to SessionEventsCollector
    - Wire up context cancellation in CopilotEngine.Execute when flag is set
    - Trigger runner sets CancelOnSkillInvocation=true on all test prompts
    - Context cancellation from skill invocation is treated as success
    
    Tests:
    - SessionEventsCollector callback fires on skill invocation
    - CopilotEngine cancels SendAndWait early when skill invoked
    - CopilotEngine completes normally when no skill fires (flag is safe)
    - Trigger runner sets the CancelOnSkillInvocation flag
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * fix: address lint and race detector issues
    
    - Rename cancelledForSkill to canceledForSkill (American spelling)
    - Fix 'cancelled' misspelling in comments
    - Add mutex to capturingEngine to prevent data race in trigger tests
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    spboyer and Copilot authored Apr 21, 2026
    956beaa
  6. feat: add waza models command to list available models (#208)

    * docs: add Quick Start guide to documentation site
    
    - Create focused 5-minute Quick Start page at site/src/content/docs/quick-start.mdx
    - Add installation options (binary, from source, azd extension)
    - Include authentication, first skill creation, minimal eval YAML
    - Add Mermaid workflow diagram
    - Include workflow steps: install → auth → create → write → run → view
    - Place Quick Start as first item in sidebar navigation
    - Update homepage to prominently link Quick Start guide
    - Site builds successfully with 17 pages
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * feat: add `waza models` command to list available models (#141)
    
    Add a new `waza models` command that queries the Copilot SDK for
    available models and displays them as a formatted table (or JSON with
    --json flag). The table shows model ID, name, vision support, and
    context window size.
    
    Changes:
    - Add ListModels to CopilotClient interface and copilotClientWrapper
    - Add ListModels method on CopilotEngine
    - Regenerate gomock mocks for both internal and cmd packages
    - Register `models` subcommand in root.go
    - Handle auth errors gracefully ("run copilot login first")
    - Add 7 tests covering table output, JSON, empty list, auth errors,
      backend errors, and token formatting
    - Update CLI reference docs (site/src/content/docs/reference/cli.mdx)
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * fix: gofmt cmd_models.go and cmd_models_test.go
    
    Run gofmt to fix formatting issues flagged by CI.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * fix: address lint errors in models command
    
    Add //nolint:errcheck for fmt.Fprintln/Fprintf calls matching
    existing patterns in cmd_check.go. Run gofmt on both files.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    spboyer and Copilot authored Apr 21, 2026
    47166fa
  7. feat: support pre-written follow-up prompts in eval YAML #189 (#209)

    Add follow_up_prompts field to TestStimulus for multi-turn eval scenarios.
    Follow-up prompts reuse the same session and workspace, preserving
    conversation history and file changes across turns.
    
    Changes:
    - Add FollowUps []string to TestStimulus (yaml: follow_up_prompts)
    - Add WorkspaceDir to ExecutionRequest for workspace reuse
    - Update CopilotEngine to skip setupWorkspace when WorkspaceDir is set
    - Update MockEngine to support workspace reuse and SessionID passthrough
    - Add executeFollowUps() to orchestration runner with result aggregation
    - Add 3 YAML parsing tests and 5 orchestration tests
    - Update JSON schema, eval-yaml guide, and schema reference docs
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    spboyer and Copilot authored Apr 21, 2026
    c37f404
  8. chore: update CODEOWNERS to single owner (#211)

    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    spboyer and Copilot authored Apr 21, 2026
    b1acf61