Skip to content

fix(mcp): forwarded RPC non-2xx responses masked as success (#3365)#3371

Merged
crivetimihai merged 2 commits intoIBM:mainfrom
omorros:fix/forwarded-rpc-non-2xx-masking-3365
Mar 22, 2026
Merged

fix(mcp): forwarded RPC non-2xx responses masked as success (#3365)#3371
crivetimihai merged 2 commits intoIBM:mainfrom
omorros:fix/forwarded-rpc-non-2xx-masking-3365

Conversation

@omorros
Copy link
Copy Markdown
Contributor

@omorros omorros commented Mar 2, 2026

🔗 Related Issue

Closes #3365


📝 Summary

_execute_forwarded_request() now gates on HTTP status code before parsing the response body. Non-2xx
responses with non-JSON-RPC bodies (e.g. 401 {"detail":"..."}) are mapped to JSON-RPC errors instead of
silently returning {"result": {}}. The two session-affinity tests that depended on localhost port state are
now hermetic via mocked HTTP responses, and two new test cases cover the non-2xx error mapping paths.


🏷️ Type of Change

  • Bug fix
  • Feature / Enhancement
  • Documentation
  • Refactor
  • Chore (deps, CI, tooling)
  • Other (describe below)

🧪 Verification

Check Command Status
Lint suite make lint
Unit tests make test
Coverage ≥ 80% make coverage

✅ Checklist

  • Code formatted (make black isort pre-commit)
  • Tests added/updated for changes
  • Documentation updated (if applicable)
  • No secrets or credentials committed

📓 Notes (optional)

Files changed:

  • mcpgateway/services/mcp_session_pool.py — Added HTTP status check before response parsing in
    _execute_forwarded_request()
  • tests/e2e/test_session_pool_e2e.py — Mocked httpx.AsyncClient in existing tests, added
    test_execute_forwarded_request_returns_error_on_non_2xx_non_jsonrpc and
    test_execute_forwarded_request_propagates_jsonrpc_error_on_non_2xx

@crivetimihai crivetimihai changed the title fix: forwarded RPC non-2xx responses masked as success (#3365) fix(mcp): forwarded RPC non-2xx responses masked as success (#3365) Mar 2, 2026
@crivetimihai crivetimihai added bug Something isn't working MUST P1: Non-negotiable, critical requirements without which the product is non-functional or unsafe mcp-protocol Alignment with MCP protocol or specification labels Mar 2, 2026
@crivetimihai crivetimihai added this to the Release 1.0.0-RC2 milestone Mar 2, 2026
@crivetimihai
Copy link
Copy Markdown
Member

Thanks @omorros. This correctly addresses #3365 — the HTTP status gating before JSON parsing is the right approach, and the error mapping logic (propagate JSON-RPC errors vs. map non-JSON-RPC bodies) is clean. Tests are well-structured. Looks good.

@crivetimihai crivetimihai added the merge-queue Rebased and ready to merge label Mar 3, 2026
@ja8zyjits
Copy link
Copy Markdown
Member

ja8zyjits commented Mar 3, 2026

Hi @omorros,
have you rebased with the main and ran all the tests?
It seems that some tests are failing.
Always run before you create a PR.

$ make lint-web flake8 bandit interrogate pylint verify
$ make autoflake isort black pre-commit
$ make doctest test htmlcov

Regards,
Jitesh

@ja8zyjits ja8zyjits added the do-not-merge Do NOT merge. This requires manual effort, with multiple checks! label Mar 3, 2026
@ja8zyjits
Copy link
Copy Markdown
Member

Hi these changes are needed to be incorporated for all the tests to pass.

diff --git a/tests/unit/mcpgateway/services/test_mcp_session_pool_coverage.py b/tests/unit/mcpgateway/services/test_mcp_session_pool_coverage.py
index b1243307a..90a4d2a2f 100644
--- a/tests/unit/mcpgateway/services/test_mcp_session_pool_coverage.py
+++ b/tests/unit/mcpgateway/services/test_mcp_session_pool_coverage.py
@@ -1495,8 +1495,9 @@ class TestExecuteForwardedRequest:
         pool = MCPSessionPool()

         class DummyResponse:
-            def __init__(self, data):
+            def __init__(self, data, status_code):
                 self._data = data
+                self.status_code = status_code
             def json(self):
                 return self._data

@@ -1512,7 +1513,7 @@ class TestExecuteForwardedRequest:
                 self.post_calls.append({"url": url, "json": json, "headers": headers, "timeout": timeout})
                 return self._response

-        dummy_client = DummyClient(DummyResponse({"jsonrpc": "2.0", "result": {"x": 1}, "id": 1}))
+        dummy_client = DummyClient(DummyResponse({"jsonrpc": "2.0", "result": {"x": 1}, "id": 1}, 200))

         with patch("mcpgateway.services.mcp_session_pool.settings") as mock_settings:
             mock_settings.port = 4444
@@ -1532,8 +1533,9 @@ class TestExecuteForwardedRequest:
         pool = MCPSessionPool()

         class DummyResponse:
-            def __init__(self, data):
+            def __init__(self, data, status_code):
                 self._data = data
+                self.status_code = status_code
             def json(self):
                 return self._data

@@ -1547,7 +1549,7 @@ class TestExecuteForwardedRequest:
             async def post(self, *_args, **_kwargs):
                 return self._response

-        dummy_client = DummyClient(DummyResponse({"jsonrpc": "2.0", "error": {"code": -32601, "message": "nope"}, "id": 1}))
+        dummy_client = DummyClient(DummyResponse({"jsonrpc": "2.0", "error": {"code": -32601, "message": "nope"}, "id": 1}, status_code=200))

         with patch("mcpgateway.services.mcp_session_pool.settings") as mock_settings:
             mock_settings.port = 4444

@ja8zyjits ja8zyjits self-requested a review March 3, 2026 02:20
@omorros
Copy link
Copy Markdown
Contributor Author

omorros commented Mar 3, 2026

Hi @ja8zyjits, thanks for catching that and for providing the fix, I've applied your suggested changes and pushed the update. All tests are passing now. Let me know if there's anything else that needs to be done! ;)

@ja8zyjits
Copy link
Copy Markdown
Member

Hi @omorros ,
Please fix the issue here.
I think the code needs more test coverage?

Regards,
Jitesh

@omorros
Copy link
Copy Markdown
Contributor Author

omorros commented Mar 3, 2026

Hi @ja8zyjits, I've added unit tests covering all the new non-2xx response handling branches (JSON-RPC error propagation, non-JSON-RPC body mapping, and unparseable body fallback). All tests are passing.

@ja8zyjits
Copy link
Copy Markdown
Member

Hi @omorros ,

This looks good, Iam trying to reproduce output with and without the changes.
Have you had any success in doing that?
i.e Do you have any integration testing strategy?

BR
Jitesh

@omorros
Copy link
Copy Markdown
Contributor Author

omorros commented Mar 5, 2026

Hi @ja8zyjits,

Yes, I verified it locally. The unit and e2e tests call the fixed method directly with simulated 401, 403, and 502 responses, all pass and return proper errors. Before the fix, those would silently return {"result": {}}. A full Docker repro is tricky since you can't control which worker nginx routes to, but the tests hit the exact code path where the bug was. Let me know if anything else is needed!

@ja8zyjits
Copy link
Copy Markdown
Member

ja8zyjits commented Mar 19, 2026

Hi @omorros ,
The code changes are appropriate. But it seems this issue is bit more nuanced.
I was not able to run this end to end. Or may be Iam missing something here.
There is a good chance that there are issues in the implementation of forwarded RPC request, in which case, we can create a followup issue and close this issue while accepting your changes.

The following are the steps I carried out to reproduce the issue:

In Terminal-1

Run 401 error server:

from fastapi import FastAPI, HTTPException
import uvicorn

app = FastAPI()

@app.post("/rpc")
async def rpc_endpoint():
    # Always return a 401 with a simple FastAPI-style error body
    raise HTTPException(
        status_code=401,
        detail="Unauthorized request"  # Your message here
    )

if __name__ == "__main__":
    uvicorn.run(app, host="127.0.0.1", port=4445)

In Terminal-2:

Create .env file by cp .env.example .env
Update the .env file

571c571
< # CACHE_TYPE=database
---
> CACHE_TYPE=redis
597c597
< # REDIS_URL=redis://localhost:6379/0
---
> REDIS_URL=redis://localhost:6379/0
1570c1570
< # LOG_LEVEL=ERROR
---
> LOG_LEVEL=INFO
2005c2005
< # MCPGATEWAY_SESSION_AFFINITY_ENABLED=false
---
> MCPGATEWAY_SESSION_AFFINITY_ENABLED=true
2591c2591
< # GUNICORN_WORKERS=auto
---
> GUNICORN_WORKERS=2
2612c2612
< # GUNICORN_PRELOAD_APP=true
---
> GUNICORN_PRELOAD_APP=true
2618c2618
< # GUNICORN_DEV_MODE=false
---
> GUNICORN_DEV_MODE=true
2803c2803
< # DEBUG=false
---
> DEBUG=true
2879c2879

Run 2 worker server with the following command GUNICORN_DEV_MODE=true ./run-gunicorn.sh

Register the 401 server as a tool with name dummy for future tool-call

In Terminal-3:

Run a single Redis server.

In Terminal-4:

Create a token for access

export TOKEN=$(python -m mcpgateway.utils.create_jwt_token --username admin@example.com --exp 60 --secret my-test-key)

In the same terminal run the following code as sse-client.py
Simulate the multi worker toolcall scenario:

import os
import requests
from sseclient import SSEClient
import threading
import time


# Load token from environment variable
TOKEN = os.getenv("TOKEN")


SSE_URL = "http://127.0.0.1:4444/sse"
SEND_URL = "http://127.0.0.1:4444/message"
SESSION_ID = None


def listen_sse():
    global SESSION_ID
    print("Connecting to SSE with Bearer token...")

    headers = {
        "Authorization": f"Bearer {TOKEN}",
        "Accept": "text/event-stream",
    }

    session = requests.Session()
    response = session.get(SSE_URL, stream=True, headers=headers)

    client = SSEClient(response)
    for msg in client.events():
        if "session_id" in msg.data:
            SESSION_ID = msg.data.split("=")[-1]
        print("SSE received:", msg.data)


def invoke_tool():
    headers = {
        "Authorization": f"Bearer {TOKEN}",
        "Content-Type": "application/json"
    }

    r = requests.post(
	SEND_URL,
	params={"session_id":SESSION_ID},
        json={"jsonrpc":"2.0","id":2, "method":"tools/call", "params":{"name":"dummy"}},
	headers=headers)
    print("POST /invoke_tool:", r.json())



# Start SSE listener in background
thread = threading.Thread(target=listen_sse, daemon=True)
thread.start()

# Give it a second to connect
while not SESSION_ID:
    time.sleep(1)

# Send a message to the server
invoke_tool()
time.sleep(4)
invoke_tool()   #  calling the invoke_tool twice since only then we can simulate cache storage. Only then request-forwarding is happening.
# if the cache conditions execute well we reach to `_execute_forwarded_request`


# Keep main thread alive
while True:
    time.sleep(1)

All these are heavily influenced by docs/docs/architecture/adr/038-multi-worker-session-affinity.md

@omorros
Copy link
Copy Markdown
Contributor Author

omorros commented Mar 21, 2026

Hi @ja8zyjits, I tested the forwarding pipeline locally with Redis and confirmed ownership registration, pub/sub forwarding, and the non-2xx error handling all work when the code paths are triggered.

I agree it's hard to trigger end-to-end with gunicorn since it depends on which worker the internal /rpc call lands on.

Happy to open a follow-up issue for improving integration test coverage on this area if that works for you.

@omorros omorros force-pushed the fix/forwarded-rpc-non-2xx-masking-3365 branch from f85399c to 0ae28a5 Compare March 21, 2026 22:31
omorros and others added 2 commits March 22, 2026 12:44
…rror masking

_execute_forwarded_request() now inspects the response body before
returning a generic error on non-2xx HTTP status. If the body contains
a JSON-RPC error ({"error": {...}}), it is propagated directly. If
the body is non-JSON-RPC (e.g. {"detail": "Authorization token
required"}), the detail is included in the mapped JSON-RPC error
instead of being discarded. Unparseable bodies fall back to
response.text (truncated to 200 chars).

Closes IBM#3365

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
Co-authored-by: Oriol Morros Vilaseca <OM368@student.aru.ac.uk>
Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
…ON bodies

Guard against response.json() returning None (JSON null) or a non-dict
type, which would crash with TypeError at the `"error" in response_data`
check. Also narrow exception catch from Exception to ValueError (the
actual type raised by json parsing failures), consistent with codebase
patterns.

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
@crivetimihai crivetimihai self-assigned this Mar 22, 2026
@crivetimihai crivetimihai force-pushed the fix/forwarded-rpc-non-2xx-masking-3365 branch from 0ae28a5 to 7cabc5e Compare March 22, 2026 13:44
@crivetimihai
Copy link
Copy Markdown
Member

Maintainer rebase and review

Rebased onto current main and reworked the PR to keep only the scoped bug fix. Here's what changed:

What was kept (the fix)

_execute_forwarded_request() in mcp_session_pool.py now parses non-2xx response bodies before returning an error:

  • JSON-RPC error bodies ({"error": {...}}) are propagated directly with their original error code
  • Non-JSON-RPC bodies ({"detail": "..."} from FastAPI) include the detail in the error message
  • Unparseable bodies fall back to response.text[:200]

What was added during review

  • Guard against response.json() returning None (JSON null) or non-dict types — would have caused TypeError at runtime
  • Narrowed exception catch from Exception to ValueError (matches codebase patterns for JSON parse failures)
  • Test for the null-body edge case

What was removed

The original PR included a large refactoring that reverted commit bbc1792aa (#3739 — background task ownership for cancel scope isolation). That commit fixed 20-45% tool call failures under load and was merged after this branch was forked. The refactoring has been stripped — only the scoped bug fix remains.

Test coverage (6 new tests)

  • Non-2xx + JSON-RPC error body → propagated (unit + e2e)
  • Non-2xx + non-JSON-RPC body → mapped with detail (unit + e2e)
  • Non-2xx + unparseable body → fallback to text (unit)
  • Non-2xx + JSON null body → no crash (unit)

Verification

  • 246 session pool unit/e2e tests pass
  • 65 affinity/forwarding-specific tests pass
  • Live testing against deployed 3-worker cluster (RPC, SSE, Streamable HTTP, tool calls, admin UI)
  • Playwright: 67 passed (2 pre-existing flaky failures unrelated to this change)
  • Docker logs: 0 errors across all workers
  • Codex cross-review: no blocking findings

Commits:

  • 486d999 — fix(mcp): parse non-2xx response bodies in forwarded RPC to prevent error masking (author: @omorros)
  • 7cabc5e — fix(mcp): harden forwarded RPC error parsing against null/non-dict JSON bodies

Copy link
Copy Markdown
Member

@crivetimihai crivetimihai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebased, reviewed, hardened, and tested. The fix correctly addresses #3365 — forwarded RPC non-2xx responses now preserve error details instead of masking them. All tests pass, no regressions.

@crivetimihai crivetimihai merged commit 0876b15 into IBM:main Mar 22, 2026
36 checks passed
@ja8zyjits
Copy link
Copy Markdown
Member

HI @omorros
Thanks for the response. If possible do create a followup testing/chore issue to do a end-to-end verification to make it a robust feature.

Regrads,
Jitesh

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working do-not-merge Do NOT merge. This requires manual effort, with multiple checks! mcp-protocol Alignment with MCP protocol or specification merge-queue Rebased and ready to merge MUST P1: Non-negotiable, critical requirements without which the product is non-functional or unsafe

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG][API]: Forwarded RPC non-2xx responses masked as success — affinity tests non-hermetic

3 participants