Comparing changes

The integration test step runs `waza run` with the mock executor, which produces generic output that won't match output_contains expectations. This is expected: the test validates that waza completes without crashing, not that mock evals pass.

Root cause: PR #203 (v0.27.0) wired up evaluateExpectations(), which made output_contains checks actually execute. Before that, these fields were defined but never evaluated, so the integration test passed silently.

Exit code 1 (eval failures) is now allowed. Exit codes >1 (crashes, panics) still fail CI.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Replace placeholder assertion with 'len(output) > 0', which is valid Python syntax for the inline script grader eval_wrapper.py.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Create focused 5-minute Quick Start page at site/src/content/docs/quick-start.mdx
- Add installation options (binary, from source, azd extension)
- Include authentication, first skill creation, minimal eval YAML
- Add Mermaid workflow diagram
- Include workflow steps: install → auth → create → write → run → view
- Place Quick Start as first item in sidebar navigation
- Update homepage to prominently link Quick Start guide
- Site builds successfully with 17 pages

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…#206)

Audit findings:
- All primary user config loaders (LoadBenchmarkSpec, LoadTestCase, ParseSpec, ProjectConfig.Load, suggest.validateEvalYAML, jsonrpc eval validate) already use decoder.KnownFields(true), i.e. strict parsing.
- Two yaml.Unmarshal calls in cmd_coverage.go and cmd_check.go are intentional partial parses (they read only a subset of fields); making them strict would break valid eval.yaml files.
- internal/generate, internal/skill, internal/validation use non-strict parsing by design (frontmatter extensibility, schema probing, generic any decode).

Changes:
- Add TestLoadTestCase_UnknownFieldRejected proving bogus fields are rejected by LoadTestCase's KnownFields(true) decoder.
- Remove broken TestLoadTestCase_FollowUpPrompts that referenced non-existent TestStimulus.FollowUps field (prevented package compilation on main).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
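The difference between the strict loaders and the intentional partial parses can be illustrated with a small sketch. To stay within the standard library, it swaps in encoding/json and its Decoder.DisallowUnknownFields, which plays a role analogous to yaml.v3's decoder.KnownFields(true); it is not the project's YAML code. The `testCase` type and both loader names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// testCase is a hypothetical, trimmed-down config type used only for this sketch.
type testCase struct {
	Name   string `json:"name"`
	Prompt string `json:"prompt"`
}

// loadStrict rejects input containing fields that testCase does not declare.
// This mirrors the intent of decoder.KnownFields(true) in yaml.v3.
func loadStrict(src string) (testCase, error) {
	var tc testCase
	dec := json.NewDecoder(strings.NewReader(src))
	dec.DisallowUnknownFields()
	err := dec.Decode(&tc)
	return tc, err
}

// loadPartial silently ignores undeclared fields, like a plain Unmarshal call
// that only cares about a subset of the document.
func loadPartial(src string) (testCase, error) {
	var tc testCase
	err := json.Unmarshal([]byte(src), &tc)
	return tc, err
}

func main() {
	src := `{"name": "demo", "bogus_field": true}`

	_, err := loadStrict(src)
	fmt.Println("strict error:", err)

	tc, err := loadPartial(src)
	fmt.Println("partial result:", tc.Name, "error:", err)
}
```

A test in the spirit of TestLoadTestCase_UnknownFieldRejected would assert that the strict loader returns a non-nil error for the bogus field.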

* feat: allow trigger tests to terminate early on skill invocation #188

Add CancelOnSkillInvocation flag to ExecutionRequest that cancels the execution context as soon as a SkillInvoked event is received. This allows trigger tests to return immediately once the target skill fires, instead of waiting for the agent to complete its full turn.

Implementation:
- Add onSkillInvoked callback to SessionEventsCollector
- Wire up context cancellation in CopilotEngine.Execute when flag is set
- Trigger runner sets CancelOnSkillInvocation=true on all test prompts
- Context cancellation from skill invocation is treated as success

Tests:
- SessionEventsCollector callback fires on skill invocation
- CopilotEngine cancels SendAndWait early when skill invoked
- CopilotEngine completes normally when no skill fires (flag is safe)
- Trigger runner sets the CancelOnSkillInvocation flag

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: address lint and race detector issues

- Rename cancelledForSkill to canceledForSkill (American spelling)
- Fix 'cancelled' misspelling in comments
- Add mutex to capturingEngine to prevent data race in trigger tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: add Quick Start guide to documentation site

- Create focused 5-minute Quick Start page at site/src/content/docs/quick-start.mdx
- Add installation options (binary, from source, azd extension)
- Include authentication, first skill creation, minimal eval YAML
- Add Mermaid workflow diagram
- Include workflow steps: install → auth → create → write → run → view
- Place Quick Start as first item in sidebar navigation
- Update homepage to prominently link Quick Start guide
- Site builds successfully with 17 pages

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat: add `waza models` command to list available models (#141)

Add a new `waza models` command that queries the Copilot SDK for available models and displays them as a formatted table (or JSON with --json flag). The table shows model ID, name, vision support, and context window size.

Changes:
- Add ListModels to CopilotClient interface and copilotClientWrapper
- Add ListModels method on CopilotEngine
- Regenerate gomock mocks for both internal and cmd packages
- Register `models` subcommand in root.go
- Handle auth errors gracefully ("run copilot login first")
- Add 7 tests covering table output, JSON, empty list, auth errors, backend errors, and token formatting
- Update CLI reference docs (site/src/content/docs/reference/cli.mdx)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: gofmt cmd_models.go and cmd_models_test.go

Run gofmt to fix formatting issues flagged by CI.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: address lint errors in models command

Add //nolint:errcheck for fmt.Fprintln/Fprintf calls matching existing patterns in cmd_check.go. Run gofmt on both files.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add follow_up_prompts field to TestStimulus for multi-turn eval scenarios. Follow-up prompts reuse the same session and workspace, preserving conversation history and file changes across turns.

Changes:
- Add FollowUps []string to TestStimulus (yaml: follow_up_prompts)
- Add WorkspaceDir to ExecutionRequest for workspace reuse
- Update CopilotEngine to skip setupWorkspace when WorkspaceDir is set
- Update MockEngine to support workspace reuse and SessionID passthrough
- Add executeFollowUps() to orchestration runner with result aggregation
- Add 3 YAML parsing tests and 5 orchestration tests
- Update JSON schema, eval-yaml guide, and schema reference docs

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
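The session and workspace reuse described above might look roughly like the sketch below. It is an assumption-laden illustration, not the runner's actual executeFollowUps(): the lower-case types, the `engine` interface, `runWithFollowUps`, and the mock's placeholder session ID and workspace path are all hypothetical.

```go
package main

import "fmt"

// testStimulus mirrors the idea of a first prompt plus follow-up prompts.
type testStimulus struct {
	Prompt    string
	FollowUps []string
}

// executionRequest carries optional session and workspace identifiers;
// empty values mean "create a fresh one".
type executionRequest struct {
	Prompt       string
	SessionID    string
	WorkspaceDir string
}

type executionResult struct {
	SessionID    string
	WorkspaceDir string
	Output       string
}

type engine interface {
	Execute(req executionRequest) (executionResult, error)
}

// mockEngine records requests and passes through any session or workspace it
// is given, inventing placeholder values only when none are supplied.
type mockEngine struct {
	calls []executionRequest
}

func (m *mockEngine) Execute(req executionRequest) (executionResult, error) {
	m.calls = append(m.calls, req)
	res := executionResult{
		SessionID:    req.SessionID,
		WorkspaceDir: req.WorkspaceDir,
		Output:       "echo: " + req.Prompt,
	}
	if res.SessionID == "" {
		res.SessionID = "session-1" // placeholder ID for the sketch
	}
	if res.WorkspaceDir == "" {
		res.WorkspaceDir = "/tmp/workspace-1" // placeholder path, never touched on disk
	}
	return res, nil
}

// runWithFollowUps runs the first prompt, then sends each follow-up with the
// session and workspace returned by the first turn, aggregating all results.
func runWithFollowUps(e engine, s testStimulus) ([]executionResult, error) {
	first, err := e.Execute(executionRequest{Prompt: s.Prompt})
	if err != nil {
		return nil, err
	}
	results := []executionResult{first}
	for _, prompt := range s.FollowUps {
		res, err := e.Execute(executionRequest{
			Prompt:       prompt,
			SessionID:    first.SessionID,
			WorkspaceDir: first.WorkspaceDir,
		})
		if err != nil {
			return results, err
		}
		results = append(results, res)
	}
	return results, nil
}

func main() {
	m := &mockEngine{}
	stim := testStimulus{
		Prompt:    "create a file",
		FollowUps: []string{"now rename it", "now delete it"},
	}
	results, err := runWithFollowUps(m, stim)
	fmt.Println(len(results), err)
	for _, call := range m.calls {
		fmt.Printf("%q session=%q workspace=%q\n", call.Prompt, call.SessionID, call.WorkspaceDir)
	}
}
```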

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Commits on Apr 21, 2026
