fix(mcp): validate remote URLs up-front with a clear error#18132
Closed
teknium1 wants to merge 1 commit into
Closed
fix(mcp): validate remote URLs up-front with a clear error#18132teknium1 wants to merge 1 commit into
teknium1 wants to merge 1 commit into
Conversation
Port from anomalyco/opencode#25019 ("fix: handle invalid mcp urls"). Previously: a typo in `config.yaml` (missing scheme, wrong scheme, empty string, non-string value) slipped past `_is_http()` and hit `httpx.URL(url)` or `streamablehttp_client(url, ...)` deep in the transport layer. That raised a generic exception which went through the reconnect-backoff loop, so a bad URL caused _MAX_INITIAL_CONNECT_RETRIES attempts with doubling backoff — about a minute of pointless retries plus an opaque error — before the server was marked failed. Now: we validate the URL once, at the top of `run()`, before entering the retry loop. A malformed URL raises `InvalidMcpUrlError` (a `ValueError` subclass) with a message that names the offending server and explains exactly what was wrong. `_ready` is set and `_error` is populated, so `start()` re-raises and the server shows up as failed in `hermes mcp list` without any backoff burn. Validation rules: - Must be a string (rejects None, dict, int) - Must be non-empty (rejects '' and whitespace-only) - Scheme must be http or https (rejects file://, ws://, stdio://) - Must have a non-empty host (rejects http:///, http://:8080) Tests (21 new cases in tests/tools/test_mcp_invalid_url.py): - TestValidUrlsAccepted: http, https, IPv6, ports, paths, query strings - TestInvalidUrlsRejected: every rejection path above + clear error text - TestErrorIsValueError: downstream code catching ValueError still works E2E verified: a misconfigured server with `url: not-a-valid-url` now fails in <0.001s with the clear error, instead of minutes of retries. Doesn't touch stdio servers (they use `command`, not `url`) — the validator only fires when `_is_http()` returns True.
Contributor
Author
|
Closing in favor of the merged salvage PR. Conflict resolution: main added MCP image-block helpers between this PR's branch and now, at the same insertion point; both blocks kept since they're independent concerns. 21 new tests pass, 193 regression tests pass, E2E confirms <0.01ms rejection vs the ~60s of pointless retries the old path caused. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
A typo in
config.yaml(missing scheme, wrong scheme, empty string) now fails fast with a clear error naming the offending MCP server, instead of burning_MAX_INITIAL_CONNECT_RETRIES× exponential backoff inside the transport layer with an opaque exception.Ported from anomalyco/opencode#25019 ("fix: handle invalid mcp urls").
Root cause
tools/mcp_tool.py_run_http()passesconfig["url"]verbatim tohttpx.URL(url)and the streamable-http client. A malformed URL raises a generic exception deep in the transport layer which goes through the reconnect-backoff loop (while True: try: ... except Exception:) — so a bad URL costs about a minute of pointless retries with a confusing error before the server is finally marked failed.Changes
tools/mcp_tool.pyInvalidMcpUrlError(ValueError)exception. New_validate_remote_mcp_url(server_name, url)helper (scheme must be http/https, host must be non-empty, must be a string). Wired intoMCPServerTask.run()before the retry loop — malformed URLs short-circuit immediately with_errorset and_readyfired sostart()re-raises the clean error.tests/tools/test_mcp_invalid_url.pyInvalidMcpUrlErroris aValueError, error message names the server.Validation
not-a-valid-url)Invalid MCP URL for 'broken-example': scheme must be http or https…command, nourl)_is_http()is False)scripts/run_tests.sh tests/tools/test_mcp_invalid_url.py→ 21/21 passed.scripts/run_tests.sh tests/tools/test_mcp_stability.py tests/tools/test_mcp_tool.py→ 198/198 passed (no regression in existing MCP tests).HERMES_HOME): bad URL →InvalidMcpUrlErrorin <1ms.Architectural note
OpenCode's fix returned
{ client: undefined, status: { status: "failed", error: "Invalid MCP URL…" } }. Hermes uses the_error/_readypattern for the same outcome — the callerstart()awaits_readyand re-raises_error, sohermes mcp listshows the server as failed with the clear message. No new public API surface.Source
anomalyco/opencode#25019