Problem
Skills increasingly call MCP tools. Today, evaluating an MCP-using skill in CI requires either:
A real MCP server (network access, auth, side effects, flakiness).
Or hand-rolled stubs per skill (no shared shape).
There's no waza-native way to stand up a deterministic MCP mock for an eval run. Tool-call assertions and record/replay of MCP traffic are handled by sibling issues (see "Non-goals" below).
Proposal
Ship a built-in MCP mock server that waza launches alongside the eval:
Why this matters for agentic-first
MCP is the primary tool surface for agents. Without deterministic MCP mocking, every MCP-using skill is untestable in CI, or worse, "tested" against a live service that mutates state. This is the single biggest blocker to CI-first agentic skill development.
Acceptance criteria
mcp_mocks: field in the eval schema; backward compatible.
Mock server runs in-process and registers as an MCP server the skill can call.
Fixture matching: exact arguments (default), JSON Schema, or regex on individual fields.
Unknown/unmatched calls fail the task with a clear error pointing at the missing fixture.
No live network required in CI; tests in internal/ verify this.
Docs in site/ with a worked MCP eval example.
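The matching rules in these criteria can be sketched as a resolver that tries fixtures in order and, when nothing matches, returns an error naming the tool and arguments. This is a hypothetical Go sketch, not waza's implementation: `MatchMode`, `Fixture`, and `Resolve` are illustrative names, regex patterns are applied to the string form of individual fields, and JSON Schema matching is left out because it would need a schema validator library.

```go
package main

import (
	"fmt"
	"reflect"
	"regexp"
)

// MatchMode is a hypothetical per-fixture setting. The zero value means
// exact-argument matching, mirroring "exact-args (default)".
type MatchMode string

const (
	MatchExact MatchMode = ""
	MatchRegex MatchMode = "regex"
)

// Fixture is a hypothetical mcp_mocks entry.
type Fixture struct {
	Name     string
	Tool     string
	Mode     MatchMode
	Args     map[string]any    // exact mode: compared with reflect.DeepEqual
	Patterns map[string]string // regex mode: field name -> pattern
	Result   string
}

// matches reports whether f answers a call to tool with args. Values on both
// sides are assumed to come from JSON decoding, so their Go types line up.
func (f Fixture) matches(tool string, args map[string]any) (bool, error) {
	if f.Tool != tool {
		return false, nil
	}
	switch f.Mode {
	case MatchExact:
		return reflect.DeepEqual(f.Args, args), nil
	case MatchRegex:
		// Every listed field must be present and match; unlisted fields are ignored.
		// Patterns are unanchored unless they use ^ and $.
		for field, pattern := range f.Patterns {
			re, err := regexp.Compile(pattern)
			if err != nil {
				return false, fmt.Errorf("fixture %q: bad pattern for %q: %w", f.Name, field, err)
			}
			value, ok := args[field]
			if !ok || !re.MatchString(fmt.Sprint(value)) {
				return false, nil
			}
		}
		return true, nil
	default:
		return false, fmt.Errorf("fixture %q: unsupported match mode %q", f.Name, f.Mode)
	}
}

// Resolve tries fixtures in order and returns the first match's result.
// An unmatched call is an error that names the tool and arguments, so the
// missing fixture is easy to write.
func Resolve(fixtures []Fixture, tool string, args map[string]any) (string, error) {
	for _, f := range fixtures {
		ok, err := f.matches(tool, args)
		if err != nil {
			return "", err
		}
		if ok {
			return f.Result, nil
		}
	}
	return "", fmt.Errorf("no mcp_mocks fixture matches tool %q with arguments %v", tool, args)
}

func main() {
	fixtures := []Fixture{
		{Name: "paris", Tool: "get_weather", Args: map[string]any{"city": "Paris"}, Result: "sunny"},
		{Name: "tickets", Tool: "get_ticket", Mode: MatchRegex,
			Patterns: map[string]string{"id": `^TCK-[0-9]+$`}, Result: "open"},
	}
	fmt.Println(Resolve(fixtures, "get_weather", map[string]any{"city": "Paris"}))
	fmt.Println(Resolve(fixtures, "get_ticket", map[string]any{"id": "TCK-42"}))
	fmt.Println(Resolve(fixtures, "get_weather", map[string]any{"city": "Oslo"}))
}
```

Trying fixtures in declaration order keeps resolution deterministic when an exact fixture and a broader regex fixture could both match a call; in this sketch, listing the more specific fixture first is what makes it win.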
Non-goals (filed separately)
Related
config.mcp_servers