feat: add MCP server mocks (closes #363) by spboyer · Pull Request #387 · microsoft/waza

spboyer · 2026-06-28T12:13:57Z

Summary

Add top-level mcp_mocks eval schema support gated to schemaVersion: "1.1".
Launch deterministic waza-managed stdio MCP mock servers for Copilot SDK evals, with inline/fixture-backed tools and exact, JSON Schema, and per-field regex response matching.
Surface clear MCP tool errors for unknown tools and unmatched calls, and document hermetic MCP eval usage across README, site docs, CLI reference, and integration testing docs.

Test plan

go build ./... && go test ./... && go vet ./... && golangci-lint run
cd site && npm run build

⚠️ This task was flagged as "needs review" — please have a squad member review before merging.

Closes #363

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Adds first-class, hermetic MCP mocking to waza evals (gated behind schemaVersion: "1.1"), enabling deterministic Copilot SDK runs without relying on live MCP services (network/auth/state).

Changes:

Extend eval spec + JSON schema with top-level mcp_mocks and validate it requires schema v1.1.
Implement a stdio MCP mock server (spawned via the waza binary) with exact / JSON Schema / per-field regex argument matching.
Document MCP mocks across README, site docs, CLI reference, and integration testing docs; add targeted tests for config wiring.

Summary per file

File	Description
site/src/content/docs/reference/schema.mdx	Documents `mcp_mocks` in the schema reference and points readers away from live `config.mcp_servers` for hermetic evals.
site/src/content/docs/reference/cli.mdx	Adds an example `waza run` invocation for hermetic MCP mock usage.
site/src/content/docs/guides/eval-yaml.mdx	Adds a worked guide section for `mcp_mocks` and response matching semantics.
schemas/eval.schema.json	Adds JSON Schema definitions for `mcp_mocks` / tools / response matchers.
README.md	Documents `mcp_mocks` and the supported matching modes at the repo level.
docs/INTEGRATION-TESTING.md	Documents hermetic MCP mock servers for CI-safe Copilot SDK evals.
internal/models/spec.go	Adds `MCPMocks` to `EvalSpec` and validates schema gating + uniqueness.
internal/models/spec_test.go	Adds tests for schemaVersion gating and duplicate mock name rejection.
internal/copilotconfig/mcp.go	Adds `ConvertMCPServersWithMocks` and wires mocks into Copilot SDK MCP server configs.
internal/copilotconfig/mcp_test.go	Verifies mock conversion produces a hermetic stdio server config and preserves regular servers.
internal/mcpmock/config.go	Resolves `mcp_mocks` entries from inline definitions and/or JSON fixture directories.
internal/mcpmock/server.go	Implements MCP JSON-RPC handling for `initialize`, `tools/list`, and `tools/call` with ordered response matching.
internal/mcpmock/server_test.go	Tests fixture directory loading, matching modes, and error surfacing for unknown/unmatched tool calls.
internal/orchestration/runner.go	Wires `mcp_mocks` into execution requests (Copilot SDK MCP servers).
internal/orchestration/runner_test.go	Adds a test ensuring MCP mocks are included in the built execution request.
internal/trigger/runner.go	Threads `mcp_mocks` through MCP server conversion in the trigger runner.
internal/trigger/runner_test.go	Updates conversion tests for the new `convertMCPServers` signature.
cmd/waza/root.go	Registers hidden `__mcp-mock` command used to run the mock server over stdio.
cmd/waza/cmd_mcp_mock.go	Implements the hidden `__mcp-mock` command to decode config and serve stdio MCP.
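For readers unfamiliar with the three methods named in the `internal/mcpmock/server.go` row, here is a rough sketch of a JSON-RPC 2.0 dispatcher for `initialize`, `tools/list`, and `tools/call`. It is not the PR's implementation: the `handle` function, the `canned` table, the server name, and the protocol version string are illustrative assumptions, and reporting an unknown tool through `isError` is one possible choice rather than the confirmed behavior.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type request struct {
	JSONRPC string          `json:"jsonrpc"`
	ID      json.RawMessage `json:"id,omitempty"`
	Method  string          `json:"method"`
	Params  json.RawMessage `json:"params,omitempty"`
}

type rpcError struct {
	Code    int    `json:"code"`
	Message string `json:"message"`
}

type response struct {
	JSONRPC string          `json:"jsonrpc"`
	ID      json.RawMessage `json:"id"`
	Result  any             `json:"result,omitempty"`
	Error   *rpcError       `json:"error,omitempty"`
}

// canned maps a hypothetical tool name to a fixed text reply.
var canned = map[string]string{"get_issue": "issue 363: add MCP server mocks"}

// handle decodes one JSON-RPC message and builds the reply.
func handle(line []byte) response {
	var req request
	if err := json.Unmarshal(line, &req); err != nil {
		return response{JSONRPC: "2.0", ID: json.RawMessage("null"),
			Error: &rpcError{Code: -32700, Message: "parse error"}}
	}
	resp := response{JSONRPC: "2.0", ID: req.ID}
	switch req.Method {
	case "initialize":
		resp.Result = map[string]any{
			"protocolVersion": "2024-11-05", // placeholder value for the sketch
			"capabilities":    map[string]any{"tools": map[string]any{}},
			"serverInfo":      map[string]any{"name": "mock", "version": "0.0.0"},
		}
	case "tools/list":
		tools := []map[string]any{}
		for name := range canned {
			tools = append(tools, map[string]any{
				"name":        name,
				"inputSchema": map[string]any{"type": "object"},
			})
		}
		resp.Result = map[string]any{"tools": tools}
	case "tools/call":
		var p struct {
			Name string `json:"name"`
		}
		_ = json.Unmarshal(req.Params, &p) // missing params leave Name empty
		text, ok := canned[p.Name]
		if !ok {
			text = fmt.Sprintf("unknown tool %q", p.Name)
		}
		resp.Result = map[string]any{
			"content": []map[string]any{{"type": "text", "text": text}},
			"isError": !ok,
		}
	default:
		resp.Error = &rpcError{Code: -32601, Message: "method not found: " + req.Method}
	}
	return resp
}

func main() {
	for _, line := range []string{
		`{"jsonrpc":"2.0","id":1,"method":"initialize"}`,
		`{"jsonrpc":"2.0","id":2,"method":"tools/list"}`,
		`{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"get_issue","arguments":{}}}`,
	} {
		out, _ := json.Marshal(handle([]byte(line)))
		fmt.Println(string(out))
	}
}
```

A stdio server would wrap `handle` in a loop that reads one message per line from standard input and writes each reply to standard output.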

Review details

Files reviewed: 19/19 changed files
Comments generated: 4
Review effort level: Low


The MCP mocks test in #387 used an empty schemaVersion and expected the 1.1 error path. Because LoadEvalSpec normalizes empty schemaVersion to the current version (1.1), the test passed validation instead of failing. Make the test explicit by setting schemaVersion: '1.0' to actually trigger the gate, then bump to '1.1' in the second half.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

#388)

* feat: per-task tool metrics with structured arg matchers (closes #366)

- Add argmatcher package (equals/regex/contains/range/json_schema)
- Extend tool_calls grader with expect: [{tool, args}] block
- Extend tool_constraint grader with args: matchers on expect_tools
- Add normalized tool_events[] to RunResult (turn, sequence, tool_call_id, tool_name, args, result, success, duration_ms, error) populated from session events; replay-friendly for Wave 3 (#367), OTel-aligned
- Bump results.json schemaVersion to 1.1 (MINOR additive per #368/#382)
- waza compare prints aggregate TOOL USE section (total calls, success rate, avg/task, histogram, selection accuracy) when tool data present
- Unit tests for matchers, builder, both graders, schema round-trip, compare metrics + histogram
- Docs: graders.mdx (expect/args), schema-changes.md (1.1 entry), cli.mdx (compare TOOL USE), README.md

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: address Copilot review feedback on #388

- graders.mdx: matchers are single-key mappings (no kind field); graders evaluate session_digest.tool_calls (not tool_events[]); range matcher uses gte/lte/gt/lt (not [min, max]).
- tool_events.go: stringifyResult comment matches JSON-only behavior.
- cmd_compare.go: histogram bucketed per-run (not truncated per-task avg); added 'Tasks w/ tools' row; renamed 'Tasks w/' to 'Runs w/' labels; use tagged switch on runCalls.
- schema-changes.md / README.md / schema.mdx: missing schemaVersion is interpreted as the current schema version (1.1), not 1.0.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* test: fix MCP mocks schemaVersion test after rebase

The MCP mocks test in #387 used an empty schemaVersion and expected the 1.1 error path. Because LoadEvalSpec normalizes empty schemaVersion to the current version (1.1), the test passed validation instead of failing.
Make the test explicit by setting schemaVersion: '1.0' to actually trigger the gate, then bump to '1.1' in the second half.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: address round-2 Copilot review on #388

- persist compiled matcher in validateToolSpecs (map value semantics)
- capture engine-specific tool args via ToolCallArgs.Extra (mapstructure ',remain')
- bucket call_count_histogram per-task across trials (not per-run)
- rename TOOL USE table label 'Runs w/' -> 'Tasks w/' to match metric
- sync README tool_events[] field list with ToolEvent struct
- add IsCompiled() accessor + tests for persisted compile, extra args, and per-task histogram with trials_per_task > 1

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
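The commit message describes matchers as single-key mappings and a range matcher built from gte/lte/gt/lt bounds. A rough sketch of that shape follows; the `Matcher` and `Range` types are hypothetical and do not reflect the real argmatcher package API, and only three of the five matcher kinds are shown.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// Range is a hypothetical range matcher; a nil bound means "unbounded".
type Range struct {
	Gte, Lte, Gt, Lt *float64
}

// Match checks v against every bound that is set.
func (r Range) Match(v float64) bool {
	if r.Gte != nil && v < *r.Gte {
		return false
	}
	if r.Lte != nil && v > *r.Lte {
		return false
	}
	if r.Gt != nil && v <= *r.Gt {
		return false
	}
	if r.Lt != nil && v >= *r.Lt {
		return false
	}
	return true
}

// Matcher is a single-key mapping: the key names the matcher kind and
// the value carries its operand. There is no separate "kind" field.
type Matcher map[string]any

// Match evaluates the matcher against one argument value.
func (m Matcher) Match(arg any) (bool, error) {
	if len(m) != 1 {
		return false, fmt.Errorf("matcher must have exactly one key, got %d", len(m))
	}
	for kind, want := range m {
		switch kind {
		case "equals":
			return reflect.DeepEqual(arg, want), nil
		case "contains":
			s, okArg := arg.(string)
			sub, okWant := want.(string)
			return okArg && okWant && strings.Contains(s, sub), nil
		case "range":
			// JSON numbers decode to float64, so the sketch expects that type.
			r, okWant := want.(Range)
			v, okArg := arg.(float64)
			return okWant && okArg && r.Match(v), nil
		default:
			return false, fmt.Errorf("unsupported matcher %q", kind)
		}
	}
	return false, nil // not reached: the map has exactly one key
}

func main() {
	lo, hi := 1.0, 10.0
	inRange := Matcher{"range": Range{Gte: &lo, Lt: &hi}}
	for _, v := range []float64{0.5, 1, 9.9, 10} {
		ok, err := inRange.Match(v)
		fmt.Println(v, ok, err)
	}
}
```

Separate inclusive and exclusive bounds let a spec express half-open intervals, which a `[min, max]` pair cannot do without extra flags.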

feat: add MCP server mocks #363

d3ffb78

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings June 28, 2026 12:13

Copilot started reviewing on behalf of spboyer June 28, 2026 12:14

Copilot AI reviewed Jun 28, 2026


Comment thread internal/models/spec.go

Comment thread internal/copilotconfig/mcp.go

Comment thread internal/copilotconfig/mcp.go

Comment thread internal/mcpmock/server.go

fix: harden MCP mock config handling #363

463a673

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

spboyer merged commit ec8cb62 into main Jun 28, 2026
10 checks passed

spboyer deleted the spboyer-squad-363-mcp-server-mocks branch June 28, 2026 12:25

spboyer merged 2 commits into main from spboyer-squad-363-mcp-server-mocks
