
mcp_servers config section does not seem to start/connect the MCP server: agent makes zero MCP tool calls #180

Description

@haolingdong-msft

Bug: mcp_servers config section does not connect the MCP server to the agent session

Summary

When configuring an MCP server via the mcp_servers section in eval.yaml, the MCP server is not connected to the agent session. The agent can make regular tool calls (view, edit, powershell, etc.) but cannot use any MCP tools (e.g. azsdk_run_typespec_validation, azsdk_typespec_generate_authoring_plan).

The agent falls back to running tsp compile . via powershell instead of using the MCP tool azsdk_run_typespec_validation, which causes the mcp_workflow grader to fail.

Related Issue

Environment

  • waza version: latest
  • OS: Windows
  • Model: claude-opus-4.6-1m

Eval Config

I configured the mcp_servers section following this document: https://github.com/microsoft/waza/blob/main/docs/INTEGRATION-TESTING.md

config:
  trials_per_task: 1
  timeout_seconds: 1000
  parallel: false
  executor: copilot-sdk
  model: claude-opus-4.6-1m
  mcp_servers:
    azure-sdk-mcp:
      type: stdio
      command: pwsh
      args: ["C:\\workspace\\azure-rest-api-specs\\eng\\common\\mcp\\azure-sdk-mcp.ps1", "-Run"]
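Before digging into waza itself, a quick preflight of this entry on the eval machine can rule out the most basic launch failures (pwsh not on PATH, wrong script path). The sketch below is a hypothetical helper, not part of waza: it transcribes the config entry by hand into a dict instead of parsing eval.yaml, and it treats any argument ending in .ps1 as a script path, which is a heuristic assumption.

```python
import os
import shutil

# The azure-sdk-mcp entry from eval.yaml, transcribed by hand as a dict
# (YAML parsing is omitted to stay within the standard library).
AZURE_SDK_MCP = {
    "type": "stdio",
    "command": "pwsh",
    "args": [r"C:\workspace\azure-rest-api-specs\eng\common\mcp\azure-sdk-mcp.ps1", "-Run"],
}


def preflight(server):
    """Return a list of problems that would stop this stdio server from launching."""
    problems = []
    if server.get("type") != "stdio":
        problems.append(f"unexpected type: {server.get('type')!r}")
    # shutil.which resolves bare names via PATH and checks explicit paths directly.
    if shutil.which(server["command"]) is None:
        problems.append(f"command not found on PATH: {server['command']}")
    for arg in server.get("args", []):
        # Heuristic (assumption): arguments ending in .ps1 are script paths.
        if arg.lower().endswith(".ps1") and not os.path.isfile(arg):
            problems.append(f"script not found: {arg}")
    return problems
```

Calling preflight(AZURE_SDK_MCP) on the Windows machine should return an empty list; any entry it reports would explain the missing tools without involving waza at all.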

Observed Behavior

The eval report shows the agent made 19 tool calls but none were MCP tools:

tools_used: [skill, view, view, view, view, view, view, view, view, view, view, web_fetch, view, edit, powershell, powershell, powershell, powershell, view]

Key observations:

  • The agent correctly invoked the azure-typespec-author skill (routing_check passed)
  • The agent used powershell to run tsp compile . manually instead of using the MCP tool azsdk_run_typespec_validation
  • No MCP tools appear in the tools_used list, confirming the MCP server was not connected
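The last observation can be checked mechanically against the tools_used list. A small sketch, assuming that every tool exposed by azure-sdk-mcp carries the azsdk_ prefix seen in azsdk_run_typespec_validation and azsdk_typespec_generate_authoring_plan (the prefix test is an assumption, not something waza documents):

```python
from collections import Counter

# tools_used copied verbatim from the eval report above
tools_used = [
    "skill", "view", "view", "view", "view", "view", "view", "view", "view",
    "view", "view", "web_fetch", "view", "edit", "powershell", "powershell",
    "powershell", "powershell", "view",
]

# Assumption: azure-sdk-mcp tool names all start with "azsdk_".
MCP_PREFIX = "azsdk_"

mcp_calls = [name for name in tools_used if name.startswith(MCP_PREFIX)]
print(f"total tool calls: {len(tools_used)}")
print(f"MCP tool calls:   {len(mcp_calls)}")
print(Counter(tools_used).most_common())
```

This reports 19 calls in total (12 of them view, 4 powershell) and no MCP calls.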

Grader Results

Grader              Type              Score  Feedback
routing_check       skill_invocation  1.0    Skill invocation sequence matched
mcp_workflow        action_sequence   0.095  Expected [edit, azsdk_run_typespec_validation] in order, but azsdk_run_typespec_validation never appeared
version_enum_check  diff              0.33   Missing expected fragments (related to #165)

The mcp_workflow grader details:

{
  "actual_actions": ["skill", "view", "view", "view", "view", "view", "view", "view", "view", "view", "view", "web_fetch", "view", "edit", "powershell", "powershell", "powershell", "powershell", "view"],
  "expected_actions": ["edit", "azsdk_run_typespec_validation"],
  "matching_mode": "in_order_match",
  "precision": 0.053,
  "recall": 0.5
}

The edit action was found (recall 0.5), but azsdk_run_typespec_validation was never called because the MCP server wasn't available.
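The grader numbers are consistent with a simple reading of in_order_match: count how many expected actions appear as an ordered subsequence of the actual actions, then divide by the length of each list. The sketch below is a reconstruction under that assumption, including the guess that the 0.095 score is the F1 of precision and recall; it is not waza's code, and in_order_match is a hypothetical name.

```python
def in_order_match(actual, expected):
    """Count expected actions found in `actual` in the same order (greedy subsequence)."""
    matched = 0
    pos = 0
    for want in expected:
        if want in actual[pos:]:
            pos = actual.index(want, pos) + 1  # continue searching after this match
            matched += 1
    return matched


actual = [
    "skill", "view", "view", "view", "view", "view", "view", "view", "view",
    "view", "view", "web_fetch", "view", "edit", "powershell", "powershell",
    "powershell", "powershell", "view",
]
expected = ["edit", "azsdk_run_typespec_validation"]

m = in_order_match(actual, expected)
precision = m / len(actual)    # matched / actions taken
recall = m / len(expected)     # matched / actions required
f1 = 2 * precision * recall / (precision + recall) if m else 0.0
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

This prints precision=0.053 recall=0.500 f1=0.095, the same values as the report: a single missing MCP call halves recall, and the 18 unmatched actions push precision down to 1/19.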

Expected Behavior

The mcp_servers section should:

  1. Start the MCP server process (via pwsh ... azure-sdk-mcp.ps1 -Run)
  2. Connect it to the agent session so MCP tools are available as callable tools
  3. Allow the agent to invoke MCP tools such as azsdk_run_typespec_validation
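Steps 1 and 2 can be exercised outside waza with a small stdio probe, which separates "the server does not start or list its tools" from "waza does not attach the server to the session". This is a minimal sketch, not waza's implementation; it assumes the server follows the MCP stdio transport (newline-delimited JSON-RPC 2.0 on stdin/stdout), and the protocolVersion string, probe_mcp_server, and AZURE_SDK_MCP_CMD are assumptions or hypothetical names. It reads only the first page of tools/list and has no timeout.

```python
import json
import subprocess

# Hypothetical command line reconstructed from the eval.yaml entry above.
AZURE_SDK_MCP_CMD = [
    "pwsh",
    r"C:\workspace\azure-rest-api-specs\eng\common\mcp\azure-sdk-mcp.ps1",
    "-Run",
]


def probe_mcp_server(cmd):
    """Spawn `cmd`, perform the MCP initialize handshake over stdio and return
    the tool names from the first page of tools/list (blocks if no answer)."""
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,  # stderr is inherited, so server logs stay visible
        text=True,
        bufsize=1,  # line-buffered text pipes
    )

    def send(msg):
        proc.stdin.write(json.dumps(msg) + "\n")  # one JSON-RPC message per line
        proc.stdin.flush()

    def recv(want_id):
        # Read lines until the response carrying `want_id` arrives.
        while True:
            line = proc.stdout.readline()
            if not line:
                raise RuntimeError(f"server closed stdout (exit code {proc.poll()})")
            try:
                msg = json.loads(line)
            except json.JSONDecodeError:
                print("non-JSON line on stdout:", line.rstrip())
                continue
            if isinstance(msg, dict) and msg.get("id") == want_id:
                if "error" in msg:
                    raise RuntimeError(f"server returned error: {msg['error']}")
                return msg["result"]

    try:
        send({
            "jsonrpc": "2.0", "id": 1, "method": "initialize",
            "params": {
                "protocolVersion": "2024-11-05",  # assumption: server accepts this version
                "capabilities": {},
                "clientInfo": {"name": "mcp-probe", "version": "0.1"},
            },
        })
        recv(1)
        send({"jsonrpc": "2.0", "method": "notifications/initialized"})
        send({"jsonrpc": "2.0", "id": 2, "method": "tools/list"})
        return [tool["name"] for tool in recv(2)["tools"]]
    finally:
        proc.kill()
        proc.wait()
```

If probe_mcp_server(AZURE_SDK_MCP_CMD) returns a list containing azsdk_run_typespec_validation, the server side works and the problem lies in how the session is wired up. Any "non-JSON line on stdout" output is also worth a look, since the MCP stdio transport reserves stdout for protocol messages and a stricter client may reject extra text there.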

Summary Stats

{
  "total_tests": 1,
  "succeeded": 0,
  "failed": 1,
  "success_rate": 0,
  "aggregate_score": 0.476,
  "duration_ms": 171610,
  "usage": {
    "turns": 13,
    "input_tokens": 307342,
    "output_tokens": 3385,
    "cache_read_tokens": 220527,
    "premium_requests": 78
  }
}
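The aggregate_score can be cross-checked against the grader table. Assuming it is the unweighted mean of the three grader scores, that version_enum_check's 0.33 is exactly 1/3, and that mcp_workflow's 0.095 is the F1 of precision 1/19 and recall 1/2 (all three are assumptions, not documented waza behavior), exact fractions reproduce the reported value:

```python
from fractions import Fraction

# Assumption: mcp_workflow's score is the F1 of its precision and recall.
precision, recall = Fraction(1, 19), Fraction(1, 2)
mcp_workflow = 2 * precision * recall / (precision + recall)  # = 2/21

scores = {
    "routing_check": Fraction(1),
    "mcp_workflow": mcp_workflow,
    "version_enum_check": Fraction(1, 3),  # assumption: 0.33 is exactly 1/3
}
# Assumption: aggregate_score is an unweighted mean over graders.
aggregate = sum(scores.values()) / len(scores)
print(f"mcp_workflow = {float(mcp_workflow):.3f}")
print(f"aggregate    = {aggregate} ≈ {float(aggregate):.3f}")
```

The mean works out to 10/21, about 0.476, which matches aggregate_score.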

Questions

  1. Is mcp_servers the correct way to provide MCP servers to the agent in evals?
  2. Does the type: stdio + command: pwsh + args format work correctly on Windows?
  3. Is there any logging I can enable to see whether the MCP server process is actually spawned and connected?
  4. Are MCP tools supposed to appear in the tools_used list in the session digest?

Full report

claude-opus-4.6-1m.json
