fix(mcp): HTTP 4xx auth errors wrapped into McpError::Connection are retried as transient #3579

@bug-ops

Description

HTTP 401/403 responses from remote MCP endpoints are currently wrapped into McpError::Connection by McpClient::connect_url. Because is_retryable_connect_error classifies McpError::Connection as retryable, a misconfigured server with wrong credentials will be retried up to max_connect_attempts times before failing.

This is an operational hazard: misconfigured credentials needlessly slow down startup, and the log output suggests a transient network issue rather than a credentials problem.

Reproduction Steps

  1. Configure an MCP server with a wrong bearer token or revoked credentials
  2. Start Zeph
  3. Observe: the server is retried 3 times with backoff before failing; WARN logs say "retrying", not "auth failed"

Expected Behavior

HTTP 4xx responses (except 408 Request Timeout, 425 Too Early, 429 Too Many Requests) should surface as a non-retryable error with a clear "authentication failed" or "authorization denied" message.
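
The classification described above could be sketched roughly as follows. This is only an illustration: `ConnectDisposition` and `classify_status` are hypothetical names that do not exist in the codebase, and the handling of non-4xx statuses is an assumption (they simply keep today's retry behavior).

```rust
/// How a connect attempt should react to an error status.
/// Hypothetical type, introduced only for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum ConnectDisposition {
    /// Worth another attempt.
    Retry,
    /// Stop immediately and surface this message to the operator.
    Fail(&'static str),
}

/// Classify a non-success HTTP status returned by a remote MCP endpoint.
pub fn classify_status(status: u16) -> ConnectDisposition {
    match status {
        // The three 4xx codes the issue lists as still retryable.
        408 | 425 | 429 => ConnectDisposition::Retry,
        401 => ConnectDisposition::Fail("authentication failed"),
        403 => ConnectDisposition::Fail("authorization denied"),
        // Any other client error will not be fixed by repeating the request.
        400..=499 => ConnectDisposition::Fail("client error"),
        // Assumption: everything else (for example 5xx) keeps retrying as today.
        _ => ConnectDisposition::Retry,
    }
}
```

With this shape, `classify_status(401)` stops the retry loop on the first attempt, while `classify_status(429)` still backs off and tries again.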

Actual Behavior

Auth errors are retried up to max_connect_attempts times with full exponential backoff, adding up to 1.5 s (default) or 47 s (max) to startup before the server is marked unavailable.
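
To make the delay concrete, here is a small sketch of how such a total accumulates. The issue does not state the backoff parameters, so the 500 ms base delay, the doubling factor, and the helper name `total_backoff` are all assumptions, chosen because they reproduce the 1.5 s default figure with 3 attempts. The 47 s maximum depends on settings not given here and is not modeled.

```rust
use std::time::Duration;

/// Total time spent sleeping between connect attempts, assuming the delay
/// doubles after each failed attempt and there is no sleep after the last one.
/// Hypothetical helper, not part of the codebase.
pub fn total_backoff(base: Duration, attempts: u32) -> Duration {
    // `attempts` tries leave `attempts - 1` gaps between them.
    (0..attempts.saturating_sub(1))
        .map(|i| base * 2u32.pow(i))
        .sum()
}
```

Under these assumptions, `total_backoff(Duration::from_millis(500), 3)` is 500 ms + 1000 ms = 1.5 s, all of it spent waiting on an error that cannot succeed on retry.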

Proposed Fix

Add an HttpStatus(u16) variant to McpError::Connection or a dedicated McpError::AuthRequired { server_id, status: u16 } variant. Map 4xx responses (excluding 408/425/429) to None in is_retryable_connect_error.
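
A minimal sketch of the second option (a dedicated `AuthRequired` variant) might look like the following. The real `McpError` has more variants, and the payload of `Connection` is not shown in this issue, so the `String` payload is a stand-in. The real signature of `is_retryable_connect_error` is also not shown; the mention of `None` above suggests an `Option` return, which this sketch simplifies to `bool`.

```rust
use std::fmt;

/// Reduced stand-in for the real error type; only the variants relevant here.
#[derive(Debug)]
pub enum McpError {
    /// Transport-level failure (payload type is an assumption).
    Connection(String),
    /// Proposed variant: the endpoint rejected the credentials.
    AuthRequired { server_id: String, status: u16 },
}

impl fmt::Display for McpError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            McpError::Connection(msg) => write!(f, "connection error: {msg}"),
            McpError::AuthRequired { server_id, status } => {
                // 401 means missing or invalid credentials; other codes are
                // reported as an authorization problem.
                let what = if *status == 401 {
                    "authentication failed"
                } else {
                    "authorization denied"
                };
                write!(f, "MCP server `{server_id}`: {what} (HTTP {status})")
            }
        }
    }
}

/// Simplified predicate: auth failures are never retried.
pub fn is_retryable_connect_error(err: &McpError) -> bool {
    match err {
        McpError::Connection(_) => true,
        McpError::AuthRequired { .. } => false,
    }
}
```

Keeping the status code and `server_id` in the variant lets the log line name both the server and the cause, instead of a generic "retrying" warning.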

Labels

P3, bug, mcp
