Bug: mcp_servers config section does not connect MCP server to agent session
Summary
When configuring an MCP server via the mcp_servers section in eval.yaml, the MCP server is not connected to the agent session. The agent can make regular tool calls (view, edit, powershell, etc.) but cannot use any MCP tools (e.g. azsdk_run_typespec_validation, azsdk_typespec_generate_authoring_plan).
The agent falls back to running tsp compile . via powershell instead of using the MCP tool azsdk_run_typespec_validation, which causes the mcp_workflow grader to fail.
Related Issue
Environment
- waza version: latest
- OS: Windows
- Model: claude-opus-4.6-1m
Eval Config
I followed this document to set up the mcp_servers section: https://github.com/microsoft/waza/blob/main/docs/INTEGRATION-TESTING.md
config:
  trials_per_task: 1
  timeout_seconds: 1000
  parallel: false
  executor: copilot-sdk
  model: claude-opus-4.6-1m
  mcp_servers:
    azure-sdk-mcp:
      type: stdio
      command: pwsh
      args: ["C:\\workspace\\azure-rest-api-specs\\eng\\common\\mcp\\azure-sdk-mcp.ps1", "-Run"]
Observed Behavior
The eval report shows the agent made 19 tool calls but none were MCP tools:
tools_used: [skill, view, view, view, view, view, view, view, view, view, view, web_fetch, view, edit, powershell, powershell, powershell, powershell, view]
Key observations:
- The agent correctly invoked the azure-typespec-author skill (routing_check passed)
- The agent used powershell to run tsp compile . manually instead of using the MCP tool azsdk_run_typespec_validation
- No MCP tools appear in the tools_used list, confirming the MCP server was not connected
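The check behind the last observation can be reproduced with a few lines of Python. This is only a sketch: it assumes MCP tools from this server can be recognized by the azsdk_ prefix (inferred from the two tool names mentioned above), and mcp_tool_calls is a hypothetical helper, not part of waza.

```python
# tools_used as reported in the eval report (19 entries).
tools_used = (
    ["skill"]
    + ["view"] * 10
    + ["web_fetch", "view", "edit"]
    + ["powershell"] * 4
    + ["view"]
)


def mcp_tool_calls(tools, prefix="azsdk_"):
    """Return the tool calls whose name starts with the assumed MCP prefix."""
    return [name for name in tools if name.startswith(prefix)]


print(len(tools_used))             # total number of tool calls
print(mcp_tool_calls(tools_used))  # MCP tool calls found, if any
```

For the run above this finds no azsdk_ tool calls among the 19 entries.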
Grader Results
| Grader | Type | Score | Passed | Feedback |
| --- | --- | --- | --- | --- |
| routing_check | skill_invocation | 1.0 | ✅ | Skill invocation sequence matched |
| mcp_workflow | action_sequence | 0.095 | ❌ | Expected [edit, azsdk_run_typespec_validation] in order, but azsdk_run_typespec_validation never appeared |
| version_enum_check | diff | 0.33 | ❌ | Missing expected fragments (related to #165) |
The mcp_workflow grader details:
{
  "actual_actions": ["skill", "view", "view", "view", "view", "view", "view", "view", "view", "view", "view", "web_fetch", "view", "edit", "powershell", "powershell", "powershell", "powershell", "view"],
  "expected_actions": ["edit", "azsdk_run_typespec_validation"],
  "matching_mode": "in_order_match",
  "precision": 0.053,
  "recall": 0.5
}
The edit action was found (recall 0.5) but azsdk_run_typespec_validation was never called because the MCP server wasn't available.
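The reported numbers are consistent with a simple in-order match followed by an F1 score. The sketch below is my reading of the figures, not waza's actual implementation: it assumes in_order_match counts how many expected actions appear in order within the actual actions, and that the grader score of 0.095 is the harmonic mean (F1) of precision and recall. in_order_matches is a hypothetical helper.

```python
actual_actions = (
    ["skill"]
    + ["view"] * 10
    + ["web_fetch", "view", "edit"]
    + ["powershell"] * 4
    + ["view"]
)
expected_actions = ["edit", "azsdk_run_typespec_validation"]


def in_order_matches(actual, expected):
    """Count how many expected actions occur, in order, within actual."""
    matched = 0
    for action in actual:
        if matched < len(expected) and action == expected[matched]:
            matched += 1
    return matched


matched = in_order_matches(actual_actions, expected_actions)
precision = matched / len(actual_actions)   # matched / number of actual actions
recall = matched / len(expected_actions)    # matched / number of expected actions
f1 = 2 * precision * recall / (precision + recall)

print(matched, round(precision, 3), round(recall, 3), round(f1, 3))
```

Under these assumptions only edit matches, so precision is 1/19, recall is 1/2, and the F1 value rounds to the 0.095 shown in the grader table.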
Expected Behavior
The mcp_servers section should:
- Start the MCP server process (via pwsh ... azure-sdk-mcp.ps1 -Run)
- Connect it to the agent session so MCP tools are available as callable tools
- Let the agent invoke MCP tools like azsdk_run_typespec_validation
Summary Stats
{
  "total_tests": 1,
  "succeeded": 0,
  "failed": 1,
  "success_rate": 0,
  "aggregate_score": 0.476,
  "duration_ms": 171610,
  "usage": {
    "turns": 13,
    "input_tokens": 307342,
    "output_tokens": 3385,
    "cache_read_tokens": 220527,
    "premium_requests": 78
  }
}
Questions
- Is mcp_servers the correct way to provide MCP servers to the agent in evals?
- Does the type: stdio + command: pwsh + args format work correctly on Windows?
- Is there any logging I can enable to see whether the MCP server process is actually spawned and connected?
- Are MCP tools supposed to appear in the tools_used list in the session digest?
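To rule out the server itself, the same command can be probed outside waza. The following is a minimal sketch, assuming the server speaks the standard MCP stdio transport (newline-delimited JSON-RPC 2.0) and accepts protocol version 2024-11-05. The helper names and the mcp-probe client name are hypothetical. As a simplification, it writes the whole handshake up front and then closes stdin, which a strict server may not tolerate.

```python
import json
import subprocess


def jsonrpc(method, params=None, msg_id=None):
    """Serialize one JSON-RPC 2.0 message as a newline-terminated line."""
    msg = {"jsonrpc": "2.0", "method": method}
    if msg_id is not None:
        msg["id"] = msg_id  # requests carry an id; notifications do not
    if params is not None:
        msg["params"] = params
    return json.dumps(msg) + "\n"


def handshake_messages():
    """initialize, then the initialized notification, then tools/list."""
    return [
        jsonrpc("initialize", {
            "protocolVersion": "2024-11-05",
            "capabilities": {},
            "clientInfo": {"name": "mcp-probe", "version": "0.0.1"},
        }, msg_id=1),
        jsonrpc("notifications/initialized"),
        jsonrpc("tools/list", {}, msg_id=2),
    ]


def tool_names_from_output(stdout_text, list_request_id=2):
    """Return tool names from the tools/list response, or None if absent."""
    for line in stdout_text.splitlines():
        try:
            msg = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip banners or log lines that are not JSON
        if isinstance(msg, dict) and msg.get("id") == list_request_id:
            return [tool["name"] for tool in msg.get("result", {}).get("tools", [])]
    return None


def list_mcp_tools(command, args, timeout=60):
    """Spawn the server, send the handshake, and return (tool names, stderr)."""
    proc = subprocess.Popen(
        [command, *args],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    try:
        out, err = proc.communicate("".join(handshake_messages()), timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # server kept running; collect whatever it wrote so far
        out, err = proc.communicate()
    return tool_names_from_output(out), err


# Example call with the command from the eval config above (not run here):
# names, stderr = list_mcp_tools(
#     "pwsh",
#     [r"C:\workspace\azure-rest-api-specs\eng\common\mcp\azure-sdk-mcp.ps1", "-Run"],
# )
```

If this returns a list containing azsdk_run_typespec_validation, the server starts and lists its tools on this machine, which would point to the connection step in the eval run. A None result would mean no tools/list response was seen, in which case the returned stderr text is the first place to look.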
Full report
claude-opus-4.6-1m.json