Failed MCP servers from coding agent settings should not break unrelated agentic workflows #21813
Description
Problem
An agentic workflow (daily-team-status) hard-failed because an MCP server defined in the repo's .mcp.json failed to launch inside the gh-aw sandbox — even though the workflow never requested that server.
Failed run: https://github.com/RealPage/task-management/actions/runs/23258101607
Root Cause
The repo had a .mcp.json at the root configuring a SonarQube MCP server that launches via Docker. The coding agent auto-discovers .mcp.json and attempts to start all servers defined in it. Inside the gh-aw chroot sandbox:
- Docker is unavailable: the sandbox is a chroot jail, not a full Docker environment. The Docker socket is not mounted, so the `docker run` command fails immediately.
- The image is not pre-pulled: the `Download container images` step only pulls gh-aw infrastructure images, not project-specific MCP images.
- Docker Hub is not on the firewall allowlist: even if Docker were available, `registry-1.docker.io` / `production.cloudflare.docker.com` are not in `--allow-domains`, so image pulls would be blocked.
The failure happens at the Docker binary/socket level before any network call — confirmed by firewall logs showing 0 blocked requests and only 2 unique domains (api.anthropic.com, raw.githubusercontent.com).
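To illustrate why the failure occurs before any network traffic, here is a minimal sketch of a preflight check. It is not gh-aw code. It assumes `.mcp.json` uses a top-level `mcpServers` map whose entries carry a `command` field, and that the Docker socket would live at `/var/run/docker.sock`; the function name `docker_launch_blockers` is hypothetical.

```python
import os
import shutil


def docker_launch_blockers(
    mcp_config,
    which=shutil.which,
    socket_exists=os.path.exists,
    socket_path="/var/run/docker.sock",  # assumed default socket location
):
    """Return {server_name: reason} for Docker-launched servers that cannot start.

    `mcp_config` is the parsed content of .mcp.json (assumed `mcpServers` layout).
    `which` and `socket_exists` are injectable so the check can be tested
    without a real Docker installation.
    """
    blockers = {}
    for name, spec in mcp_config.get("mcpServers", {}).items():
        if spec.get("command") != "docker":
            continue  # only servers launched through the docker CLI are affected
        if which("docker") is None:
            blockers[name] = "docker binary not found on PATH"
        elif not socket_exists(socket_path):
            blockers[name] = f"Docker socket {socket_path} is not mounted"
    return blockers
```

Both checks are purely local, which is consistent with the firewall logs showing no blocked requests: the launch is rejected before an image pull is ever attempted.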
Error
ERR_API: MCP server(s) failed to launch: sonarqube
This error annotation marks the entire job as failed, even though the agent completed successfully and produced all expected outputs.
Workaround
We removed .mcp.json from version control and added it to .gitignore (https://github.com/RealPage/task-management/pull/1129). Developers recreate it locally. This works but sacrifices the "just works on clone" experience for project-specific MCP servers.
Request
MCP server launch failures should not be fatal for servers that are not explicitly declared in the workflow's tools: frontmatter. Two possible approaches:
- Preferred: Make auto-discovered MCP servers (from `.mcp.json`) non-fatal for agentic workflows that don't declare them in `tools:`, treating them as best-effort.
- Alternative: Support an `optional: true` flag per MCP server in `.mcp.json`, so repos can mark servers that should not block workflows on failure.
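The two approaches could be combined in a single decision rule. The following is a rough sketch only, not the gh-aw implementation: the function name `classify_launch_failures` is hypothetical, the `optional` key is the flag proposed above (it does not exist today), and the `mcpServers` layout of `.mcp.json` is assumed.

```python
def classify_launch_failures(failed, declared_tools, mcp_config=None):
    """Split failed MCP servers into (fatal, warnings).

    failed          -- names of servers that failed to launch
    declared_tools  -- server names declared in the workflow's `tools:` frontmatter
    mcp_config      -- parsed .mcp.json, used only to read the proposed `optional` flag
    """
    servers = (mcp_config or {}).get("mcpServers", {})
    fatal, warnings = [], []
    for name in failed:
        optional = bool(servers.get(name, {}).get("optional", False))
        if name in declared_tools and not optional:
            fatal.append(name)  # the workflow asked for this server: keep failing hard
        else:
            warnings.append(name)  # auto-discovered or marked optional: best-effort
    return fatal, warnings
```

Under this rule the run described above would yield `([], ["sonarqube"])` for a workflow that does not declare `sonarqube`, so the job could surface a warning annotation instead of failing.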
Impact
Any repo with a .mcp.json that includes servers requiring Docker, local secrets, or external connectivity will break all scheduled agentic workflows — even workflows that have no dependency on those servers.