
feat(gateway_mode): Add gateway mode feature with direct_proxy support for pass-through MCP operations #2723

Merged
crivetimihai merged 18 commits into IBM:main from 010gvr:gateway-mode/direct_proxy on Feb 13, 2026

Conversation

@010gvr
Contributor

@010gvr 010gvr commented Feb 6, 2026

🔗 Related Issue

Closes #2171 and #2344


📝 Summary

Implements a new direct_proxy gateway mode that enables pass-through proxying of MCP operations directly to remote servers without database caching. This complements the existing cache mode (default) and provides flexibility for different use cases.

⚠️ The direct proxy assumes a stateless MCP server built on the Streamable HTTP transport. In addition, the RBAC architecture needs to be analyzed.

  • New field: Added gateway_mode column to gateways table with values cache (default) or direct_proxy

  • Migration: Created idempotent Alembic migration

  • Models: Updated Gateway ORM model in db.py with new field and documentation

  • GatewayCreate: Added gateway_mode field with validation pattern ^(cache|direct_proxy)$

  • GatewayUpdate: Added optional gateway_mode field for updates

  • GatewayRead: Added gateway_mode field to response schema
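To make the validation rule concrete, here is a minimal stdlib-only sketch of the pattern check described for GatewayCreate. The helper name `validate_gateway_mode` is hypothetical; the PR expresses the rule as a schema field constraint, and this sketch only mirrors the pattern and the `cache` default stated above.

```python
import re

# Same pattern the PR description gives for GatewayCreate.gateway_mode.
GATEWAY_MODE_PATTERN = re.compile(r"^(cache|direct_proxy)$")


def validate_gateway_mode(value: str = "cache") -> str:
    """Return the mode unchanged if it matches the pattern, else raise ValueError.

    Hypothetical helper for illustration; not part of the PR.
    """
    if not GATEWAY_MODE_PATTERN.fullmatch(value):
        raise ValueError(f"gateway_mode must be 'cache' or 'direct_proxy', got {value!r}")
    return value
```

Omitting the argument yields the default `cache` mode, which is what keeps existing registrations unaffected.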

gateway_service.py

  • Added gateway_mode field handling in gateway creation and updates
  • Passes gateway mode through initialization flow

tool_service.py

  • Implements direct proxy mode detection via X-Gateway-Id header
  • When in direct_proxy mode:
    • Bypasses database tool lookup/tool call
    • Creates minimal tool payload for direct proxying
    • Skips access control checks (delegated to remote server)
    • Skips metrics recording (no tool_id in direct mode)
  • Enhanced error handling for extract_using_jq failures
  • Added comprehensive logging for streamablehttp requests

resource_service.py

  • Implements direct proxy mode for resource operations
  • Proxies resources/read and resources/list requests directly to remote MCP servers when in direct_proxy mode
  • Uses MCP SDK for proper protocol handling

streamablehttp_transport.py

  • Proxies the tools/list, tools/call, resources/list, and resources/read methods

Features

  • cache mode (default):
    • Fast lookups, access control enforced, metrics recorded
    • Existing behavior unchanged
  • direct_proxy mode:
    • All MCP operations proxied directly to remote server
    • No database caching of tools/resources/prompts
    • Client sends X-Context-Forge-Gateway-Id header to specify gateway
    • Gateway must have gateway_mode: "direct_proxy"
    • Minimal overhead, real-time data from remote server
    • Access control delegated to remote server
  • Existing tests should pass (default cache mode unchanged)

This is a backward-compatible feature addition with sensible defaults.


🏷️ Type of Change

  • Bug fix
  • Feature / Enhancement
  • Documentation
  • Refactor
  • Chore (deps, CI, tooling)
  • Other (describe below)

🧪 Verification

| Check | Command | Status |
|---|---|---|
| Lint suite | make lint | |
| Unit tests | make test | |
| Coverage ≥ 80% | make coverage | |

✅ Checklist

  • Code formatted (make black isort pre-commit)
  • Tests added/updated for changes
  • Documentation updated (if applicable)
  • No secrets or credentials committed

📓 Notes (optional)

TODOs:

  1. Support for any MCP server/transport types
  2. Improve normalization. Keeping up with the rapidly changing MCP spec adds maintenance overhead; the code could be simplified if Context Forge acted as a plain proxy when the server is already MCP-compliant.

@010gvr 010gvr force-pushed the gateway-mode/direct_proxy branch 2 times, most recently from 2fe6fb4 to daef63b Compare February 6, 2026 04:13
@jonpspri jonpspri self-requested a review February 6, 2026 08:57
@crivetimihai crivetimihai self-assigned this Feb 6, 2026
@crivetimihai crivetimihai added this to the Release 1.0.0-RC1 milestone Feb 7, 2026
@010gvr 010gvr force-pushed the gateway-mode/direct_proxy branch 2 times, most recently from 9f74738 to ea275e4 Compare February 10, 2026 06:08
Comment on lines +88 to +89
# Create naive datetime from UTC (not local time) to test the tzinfo addition
naive_time = datetime.now(timezone.utc).replace(tzinfo=None) - timedelta(seconds=10)
Contributor Author


Many test cases call datetime.now(), which returns local time. Such tests may pass by fluke when the developer's timezone is at or near UTC; elsewhere, computed times can end up going backward. I'm fixing the affected cases in this PR, but the proper fix would be to avoid plain datetime.now() in tests altogether, given that the implementation standardizes on UTC.
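A small sketch of the pattern from the diff above, using only the standard library. The variable names are illustrative; the point is that the naive value is derived from UTC, so re-attaching tzinfo gives a correct aware timestamp regardless of the machine's local timezone.

```python
from datetime import datetime, timedelta, timezone

# Naive timestamp derived from UTC, as in the test under review.
naive_utc = datetime.now(timezone.utc).replace(tzinfo=None) - timedelta(seconds=10)

# Code that standardizes on UTC can safely re-attach tzinfo to such a value.
aware_utc = naive_utc.replace(tzinfo=timezone.utc)

# The elapsed time does not depend on the local timezone of the test machine.
elapsed = datetime.now(timezone.utc) - aware_utc
```

Had `naive_utc` come from plain `datetime.now()`, the same `replace(tzinfo=timezone.utc)` step would shift the timestamp by the local UTC offset.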

@010gvr 010gvr force-pushed the gateway-mode/direct_proxy branch from ea275e4 to 210b781 Compare February 10, 2026 06:18
@crivetimihai
Member

Thanks for implementing the direct proxy feature, @010gvr! The concept is sound — not all use cases benefit from caching MCP server responses. The migration is clean and the schema validation looks good.

A few significant concerns:

  1. RBAC bypass — When is_direct_proxy is true, the entire RBAC check block is skipped in tool_service.py. Any authenticated user who knows a gateway ID can invoke any tool regardless of team membership or visibility. At minimum, please verify the user has access to the gateway itself.
  2. Code duplication — The auth header construction logic (~20 lines of bearer/basic/dict/str handling) is copy-pasted across 5-6 proxy functions. Please extract a shared build_gateway_auth_headers(gateway) helper.
  3. No test coverage — All proxy logic is marked # pragma: no cover. The tests only cover schema validation and gateway registration, not the actual proxying behavior.
  4. Inconsistent session.initialize() — Some proxy functions call it, others skip it. This should be consistent.
  5. flake8: noqa broadening - mcp_session_pool.py changed from specific noqa: DAR101, DAR201, DAR401 to blanket noqa, which silences all flake8 warnings for the file.

Item 1 (RBAC) is the most critical — this is a privilege escalation risk in multi-tenant deployments.
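For item 2, a shared helper could look roughly like the sketch below. It is not the PR's implementation: the `GatewayAuth` stand-in and its `auth_type` / `auth_value` fields are assumptions made for illustration, as is the convention that a basic-auth string holds "user:password".

```python
import base64
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass
class GatewayAuth:
    """Hypothetical stand-in for the gateway fields the helper would read."""

    auth_type: Optional[str] = None
    auth_value: Union[Dict[str, str], str, None] = None


def build_gateway_auth_headers(gateway: GatewayAuth) -> Dict[str, str]:
    """Build outbound auth headers once, instead of repeating the logic per proxy function."""
    value = gateway.auth_value
    if not value:
        return {}
    if isinstance(value, dict):
        # Already a header mapping: copy it so callers cannot mutate gateway state.
        return {str(k): str(v) for k, v in value.items()}
    if gateway.auth_type == "bearer":
        return {"Authorization": f"Bearer {value}"}
    if gateway.auth_type == "basic":
        # Assumption: the string is "user:password" and still needs encoding.
        token = base64.b64encode(value.encode("utf-8")).decode("ascii")
        return {"Authorization": f"Basic {token}"}
    # Other types with a string value: pass it through as the Authorization value.
    return {"Authorization": value}
```

Each proxy function would then call the helper once and merge the result into its outbound headers.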

@010gvr 010gvr force-pushed the gateway-mode/direct_proxy branch 3 times, most recently from 5147612 to 2ad4124 Compare February 11, 2026 16:34
@crivetimihai
Member

Code Review: direct_proxy Gateway Mode

Rebased onto main, resolved 2 merge conflicts (trivial comment differences), and fixed the Alembic migration head (was pointing to old parent 04cda6733305, updated to current head c1c2c3c4c5c6 — this was causing multiple Alembic heads which would break all tests).


Critical Issues (Must Fix Before Merge)

1. SECURITY: Missing check_gateway_access() in call_tool direct proxy path

  • File: streamablehttp_transport.py:700-723
  • The call_tool function's direct proxy shortcut calls invoke_tool_direct() without any gateway access check.
  • invoke_tool_direct() (tool_service.py:2557-2627) also has no access check.
  • Impact: Any authenticated user who knows a gateway ID can invoke tools on any direct_proxy gateway, regardless of visibility (public/team/private). This completely bypasses RBAC for tool invocation.
  • Compare with list_tools (line 1014), list_resources (line 1248), and read_resource (line 1361) — all three correctly call check_gateway_access() before proxying. call_tool is the only one missing it.
  • Note: The normal invoke_tool() path at tool_service.py:2707 does have the access check. The simplest fix would be to remove the shortcut path (lines 692-726) entirely and let invoke_tool() handle direct_proxy internally.

2. Silent fallback from direct_proxy to normal mode in call_tool

  • File: streamablehttp_transport.py:724-726
  • On any exception (including ToolNotFoundError for missing gateways), call_tool silently falls through to normal tool invocation mode.
  • This is inconsistent — list_tools, list_resources, and read_resource do NOT have this fallback pattern; they return empty results or raise errors.
  • Risk: A user expecting direct proxy behavior could silently get cached tool execution instead. If the remote server is temporarily unreachable, a different tool with the same name from a different gateway could be executed.
  • Fix: Either remove the fallback and let the exception propagate, or explicitly return an error.
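The second option (explicitly returning an error) can be sketched as follows. `ToolResult` and `call_tool_direct` are hypothetical names, and the sketch is synchronous for brevity; it only illustrates the fail-closed shape, not the transport code itself.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class ToolResult:
    """Hypothetical result type standing in for an MCP tool-call result."""

    text: str
    is_error: bool = False


def call_tool_direct(
    proxy: Callable[[str, Dict[str, Any]], str],
    name: str,
    arguments: Dict[str, Any],
) -> ToolResult:
    """Invoke the proxy and fail closed: never fall back to another execution path."""
    try:
        return ToolResult(text=proxy(name, arguments))
    except Exception:
        # No fallback to cached tools here; the caller gets an explicit error result.
        return ToolResult(text="Direct proxy tool invocation failed", is_error=True)
```

The key property is that the `except` branch returns, so control never reaches the cache-mode lookup that could pick a same-named tool from a different gateway.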

3. AttributeError crash in resource_service.py direct proxy path

  • File: resource_service.py:2182-2229
  • After the direct proxy successfully fetches content as TextResourceContents, the comment at line 2225 says "Skip the rest of the DB lookup logic" but there is no return, break, or control flow skip.
  • The code falls through to line 2326: if isinstance(content, (ResourceContent, ResourceContents, TextContent)) — which matches TextResourceContents (inherits from ResourceContents).
  • Line 2329: getattr(content, "id") raises AttributeError because TextResourceContents has no id field (verified).
  • The crash is caught by the outer except Exception at line 2386, which re-raises and loses the successfully-fetched content.
  • Fix: Add a return content or a flag to skip the invoke_resource() call when content was already fetched via direct proxy.
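The control-flow fix amounts to returning as soon as the proxied content is available. A stripped-down sketch, with hypothetical function and parameter names that stand in for the much larger service method:

```python
from typing import Callable, Optional


def read_resource(
    uri: str,
    direct_fetch: Optional[Callable[[str], str]],
    db_lookup: Callable[[str], str],
) -> str:
    """Return proxied content immediately; only cache-mode reads reach the DB path."""
    if direct_fetch is not None:
        # Early return: DB-oriented post-processing never sees proxied content.
        return direct_fetch(uri)
    return db_lookup(uri)
```

With the early return in place, code that expects DB-backed attributes such as an id is only ever applied to DB-backed content.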

Medium Issues

4. Hardcoded timeout=30.0 in all proxy functions

  • All proxy functions use timeout=30.0 while the normal mode uses settings.tool_timeout (default 60s). Should use the configurable value for consistency.

5. # pragma: no cover - integration test annotations are misleading

  • The annotations claim integration tests exist, but tests/integration/ has zero references to direct_proxy or gateway_mode.
  • The annotations hide the entire call_tool direct proxy path (including the missing access check) from coverage reports.
  • Should either write integration tests or remove the misleading annotations.

6. Inconsistent gateway_mode access pattern

  • getattr(gateway, "gateway_mode", "cache") used in some places (call_tool, list_tools, invoke_tool_direct), direct gateway.gateway_mode in others (list_resources, read_resource). Should be consistent.
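One way to make the access pattern uniform is a single accessor used by every call site. The name `get_gateway_mode` is hypothetical, and treating a None value as `cache` is an assumption of this sketch.

```python
from typing import Any


def get_gateway_mode(gateway: Any) -> str:
    """Return the gateway's mode, treating a missing or None value as 'cache'."""
    return getattr(gateway, "gateway_mode", None) or "cache"
```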

Minor Issues

  • Hardcoded "streamablehttp" transport in tool_service.py:2735 — SSE gateways registered with direct_proxy mode would fail.
  • No format validation on gateway_id header — the X-Context-Forge-Gateway-Id header value is passed directly to a DB query without UUID format validation. SQLAlchemy parameterizes safely, but format validation would fail fast.
  • Inconsistent extract_using_jq error returns — jq filter errors now return [TextContent(...)] (lines 288, 291) but JSON parse errors still return ['string'] (line 278). The REST API caller at line 3168 checks isinstance(filtered_response[0], TextContent) and would miss string-based errors.
  • from sqlalchemy import select imported twice in list_tools function body (lines 1006 and 1038).
  • 4 unrelated whitespace-only changes in noqa comments (registry_cache.py, tls_utils.py, db_isready.py, redis_isready.py).
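For the header-validation point, a fail-fast extractor might look like the sketch below. It assumes gateway IDs are UUIDs (as the review implies) and normalizes them to 32-character hex, which may or may not match how IDs are stored; the function name is hypothetical.

```python
import uuid
from typing import Mapping, Optional

GATEWAY_ID_HEADER = "X-Context-Forge-Gateway-Id"


def extract_gateway_id(headers: Mapping[str, str]) -> Optional[str]:
    """Return a validated gateway ID from the headers, or None if absent or malformed."""
    for key, value in headers.items():
        # HTTP header names are case-insensitive.
        if key.lower() == GATEWAY_ID_HEADER.lower():
            try:
                # uuid.UUID accepts both dashed and 32-character hex forms.
                return uuid.UUID(value.strip()).hex
            except ValueError:
                return None
    return None
```

Rejecting malformed values before the query keeps obviously invalid IDs out of the database round trip.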

What's Good

  • Schema validation with pattern="^(cache|direct_proxy)$" is solid
  • Idempotent Alembic migration follows project conventions
  • check_gateway_access() is structurally consistent with existing _check_tool_access and _check_resource_access across all 8 decision steps
  • build_gateway_auth_headers() has no credential leaking or injection risks
  • list_tools, list_resources, and read_resource all have proper gateway access checks
  • Good test coverage for schemas, gateway service, and transport-level access denial
  • Backward compatible — existing cache mode behavior is unchanged
  • The assumption of stateless MCP servers is explicitly documented

Recommendation

Issues #1, #2, and #3 need to be fixed before merge. The simplest fix for #1 and #2 is to remove the shortcut path in call_tool (lines 692-726) and let invoke_tool() handle direct_proxy internally — it already does correctly at lines 2700-2709 with proper access checks.

@010gvr 010gvr force-pushed the gateway-mode/direct_proxy branch from 2ad4124 to 01b1144 Compare February 12, 2026 03:24
@010gvr
Contributor Author

010gvr commented Feb 12, 2026

@crivetimihai I addressed most of them in 01b1144.

Regarding (1), though: the code block in tool_service.py was stale debugging code, and I've removed it. I can't inject direct_proxy into the current invoke_tool(..) without significant changes. As I mentioned in the description, the regular tool call performs normalizations, is tightly coupled to the DB for fetching tool info and auth, and, more importantly, uses an MCP session pool, none of which direct proxy needs. I'd like to keep the change radius small to avoid breaking the existing cache setup. direct_proxy could become even more lightweight so that it doesn't pass through the current MCP app fully, but that's for another day. PTAL.

@010gvr 010gvr force-pushed the gateway-mode/direct_proxy branch from 59a2943 to c032997 Compare February 12, 2026 17:19
Member

@crivetimihai crivetimihai left a comment


Under review


Signed-off-by: 010gvr <010gvr@gmail.com>
Signed-off-by: 010gvr <010gvr@gmail.com>
Signed-off-by: 010gvr <010gvr@gmail.com>
Signed-off-by: 010gvr <010gvr@gmail.com>
Signed-off-by: 010gvr <010gvr@gmail.com>
Signed-off-by: 010gvr <010gvr@gmail.com>
Signed-off-by: 010gvr <010gvr@gmail.com>
010gvr and others added 7 commits February 13, 2026 16:16
Signed-off-by: 010gvr <010gvr@gmail.com>
Signed-off-by: 010gvr <010gvr@gmail.com>
Signed-off-by: 010gvr <010gvr@gmail.com>
Signed-off-by: 010gvr <010gvr@gmail.com>
Signed-off-by: 010gvr <010gvr@gmail.com>
The migration pointed to an old parent revision (04cda6733305) causing
multiple alembic heads. Updated to current head (c1c2c3c4c5c6).

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
Add check_gateway_access() calls to two code paths that were missing
access control before proxying requests in direct_proxy mode:

- streamablehttp_transport.py call_tool(): was forwarding tool calls
  without verifying the user had access to the target gateway
- resource_service.py read_resource(): was proxying resource reads
  without verifying gateway access permissions

Also adds a defensive access check inside invoke_tool_direct() to
prevent RBAC bypass if called from new contexts, fixes the migration
docstring Revises mismatch, removes stale attribution comment, and
defines GATEWAY_ID_HEADER constant for the header name.

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
@crivetimihai crivetimihai force-pushed the gateway-mode/direct_proxy branch from c032997 to 4599b23 Compare February 13, 2026 17:16
…test coverage

- Add MCPGATEWAY_DIRECT_PROXY_ENABLED (default: false) and
  MCPGATEWAY_DIRECT_PROXY_TIMEOUT (default: 30s) config settings
  across config.py, .env.example, Helm values, docker-compose, admin UI
- Add extract_gateway_id_from_headers() helper in gateway_access.py,
  replacing 6 repeated header-scanning loops with unified constant
- Add registration-time guards in gateway_service (register + update)
  that reject direct_proxy mode when the feature flag is disabled
- Add runtime guards in tool_service, resource_service, and
  streamablehttp_transport that fall through to cache mode when disabled
- Replace 5 hardcoded timeout=30.0 with configurable setting
- Remove all pragma: no cover markers from direct_proxy code paths
- Add 41 new unit tests covering config defaults, header helper,
  gateway service guards, invoke_tool_direct, invoke_tool header branch,
  call_tool direct_proxy, and read_resource direct_proxy
- Add documentation section in configuration.md and config.schema.json

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
@crivetimihai crivetimihai force-pushed the gateway-mode/direct_proxy branch from 4599b23 to af61f72 Compare February 13, 2026 18:01
- call_tool: add feature-flag gate (settings.mcpgateway_direct_proxy_enabled)
  matching list_tools/list_resources/read_resource
- call_tool: return error on proxy failure instead of falling through to
  cache mode (fail-closed)
- call_tool: sanitize error message to avoid leaking exception details
- read_resource: return empty string directly on access denial instead of
  raising HTTPException that gets swallowed by outer handler
- read_resource: remove _meta from session.read_resource() call (MCP SDK
  only accepts uri parameter)
- GatewayCreate: change gateway_mode from Optional[str] to str with
  before-validator defaulting None to 'cache' (prevents DB integrity error)
- admin: expose direct_proxy_timeout in Connection Timeouts section
- list_tools docstring: fix refresh_strategy -> gateway_mode
- Fix jq test assertions to match TextContent return type

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
@crivetimihai
Member

Review Fixes (abca2c6)

Addressed review feedback across two commits. Summary of all changes on this branch:

Security Fixes

  • call_tool fail-closed — proxy failure now returns CallToolResult(isError=True) instead of silently falling back to cache mode. When a client explicitly targets a direct-proxy gateway, upstream errors must not result in execution from a different path.
  • Error message sanitization — direct proxy error responses no longer leak raw exception details to clients. Detailed errors remain in server logs only.
  • GatewayCreate gateway_mode=null — schema now rejects/defaults None to "cache" via a @field_validator(mode="before"), preventing DB integrity errors on the non-nullable column.
  • read_resource auth denial — returns empty string directly instead of raising HTTPException(404) that was caught by the outer handler and silently swallowed.

Consistency Fixes

  • Feature-flag gate in call_tool — added settings.mcpgateway_direct_proxy_enabled check, matching list_tools, list_resources, and read_resource. Previously call_tool entered the direct-proxy path without checking the flag.
  • read_resource _meta removal — MCP SDK read_resource(uri) does not accept _meta (unlike call_tool which accepts meta). Removed the _meta injection that would cause TypeError at runtime.
  • Admin config — exposed mcpgateway_direct_proxy_timeout in the Connection Timeouts section for operational visibility.
  • Docstring fix - list_tools docstring referenced refresh_strategy but implementation uses gateway_mode.

Structural Fix

  • Removed dead single-tool DB lookup path in invoke_tool() — the scalar_one_or_none() path (inside if not is_direct_proxy:) made the multi-tool scalars().all() path unreachable. The multi-tool path correctly handles both single and duplicate tool names across teams.

Test Fixes

  • Fixed extract_using_jq test assertions to match [TextContent(...)] return type (changed in prior branch commit).
  • Updated test_call_tool_direct_proxy_exception to verify error return instead of fallback.
  • Added test_call_tool_direct_proxy_feature_disabled_falls_through test.
  • Added autouse fixture for TestCallToolDirectProxy to enable feature flag.
  • Updated test_read_resource_direct_proxy_with_meta to verify _meta is NOT forwarded.

Items Reviewed but Not Changed

  • Prompts: Direct proxy for prompts is intentionally out of scope — feature targets tools and resources only.
  • ResourceService narrow URI path: Transport layer handles direct_proxy via header; service-level check is a valid secondary path for URI-based lookups.

All 11,547 unit tests pass.

Covers meta extraction, gateway fallback paths, passthrough headers,
jq filter error handling, and isError attribute fallback.

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
@crivetimihai
Member

crivetimihai commented Feb 13, 2026

Direct Proxy Mode — Feature Flag & Configuration

This feature is now controlled by two environment variables:

| Variable | Default | Description |
|---|---|---|
| MCPGATEWAY_DIRECT_PROXY_ENABLED | false | Enable the direct_proxy gateway mode |
| MCPGATEWAY_DIRECT_PROXY_TIMEOUT | 30 | Timeout in seconds for proxied MCP operations |

How it works:

  1. Set MCPGATEWAY_DIRECT_PROXY_ENABLED=true to enable
  2. Register a gateway with "gateway_mode": "direct_proxy" via POST /gateways
  3. Send requests with the X-Context-Forge-Gateway-Id header set to that gateway's ID
  4. All MCP operations (tools/list, tools/call, resources/list, resources/read) are proxied directly to the remote server, bypassing the caching layer
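Steps 3 and 4 can be illustrated from the client side with a sketch that only builds the request pieces. The function name is hypothetical, authentication headers are omitted, and the body is a plain JSON-RPC 2.0 envelope of the kind MCP uses; nothing here is sent over the network.

```python
import json
from typing import Any, Dict, Tuple

GATEWAY_ID_HEADER = "X-Context-Forge-Gateway-Id"


def build_direct_proxy_request(
    gateway_id: str,
    method: str,
    params: Dict[str, Any],
    request_id: int = 1,
) -> Tuple[Dict[str, str], bytes]:
    """Return (headers, body) for an MCP request aimed at a direct_proxy gateway."""
    headers = {
        "Content-Type": "application/json",
        # Streamable HTTP clients list both media types in Accept.
        "Accept": "application/json, text/event-stream",
        # Selects which registered gateway the request is proxied to.
        GATEWAY_ID_HEADER: gateway_id,
    }
    body = {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
    return headers, json.dumps(body).encode("utf-8")
```

For example, `build_direct_proxy_request(gateway_id, "tools/call", {"name": "get_system_time", "arguments": {}})` produces the headers and body for a proxied tool call, where `gateway_id` is the ID returned when the gateway was registered.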

Safety guarantees:

  • Feature is disabled by default across all config surfaces (.env.example, Helm values.yaml, docker-compose.yml)
  • Registration of direct_proxy gateways is rejected when the flag is disabled
  • Runtime checks silently fall back to cache mode when the flag is disabled
  • Proxy failures return errors to the client (fail-closed, never fall through to cache)
  • All direct_proxy paths enforce RBAC access checks before proxying
  • Error messages are sanitized (no raw exception details leaked to clients)
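The flag, the timeout, and the registration guard could be modeled as in the sketch below. The two environment variable names and their defaults come from the table above; the class, the function names, and the set of strings accepted as "true" are assumptions for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class DirectProxySettings:
    enabled: bool = False
    timeout: float = 30.0


def load_direct_proxy_settings(env: Mapping[str, str] = os.environ) -> DirectProxySettings:
    """Read the two direct-proxy settings, falling back to the documented defaults."""
    raw_enabled = env.get("MCPGATEWAY_DIRECT_PROXY_ENABLED", "false")
    # Assumption: these spellings count as true; anything else is false.
    enabled = raw_enabled.strip().lower() in {"1", "true", "yes", "on"}
    timeout = float(env.get("MCPGATEWAY_DIRECT_PROXY_TIMEOUT", "30"))
    return DirectProxySettings(enabled=enabled, timeout=timeout)


def check_registration(mode: str, settings: DirectProxySettings) -> None:
    """Reject direct_proxy registration while the feature flag is off."""
    if mode == "direct_proxy" and not settings.enabled:
        raise ValueError("direct_proxy gateway mode is disabled")
```

Passing an explicit mapping instead of `os.environ` keeps the loader easy to exercise in unit tests.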

… prefix

Two bugs discovered during E2E testing:

1. session.initialize() was commented out in all 5 direct_proxy code
   paths, causing 422 errors from remote MCP servers (MCP protocol
   requires initialization before operations).

2. invoke_tool_direct sent the gateway-prefixed slug name (e.g.
   "fast-test-get-system-time") to the remote server, which only
   knows the original name ("get_system_time"). Fixed by looking up
   the tool's original_name from the DB, with a slug-prefix stripping
   fallback for tools not yet cached locally.

Signed-off-by: Venkat Ramachandran <venkat@contextforge.ai>
Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
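The name resolution in bug 2 can be sketched as a lookup with a prefix-stripping fallback. The function name, the mapping argument standing in for the DB lookup of original_name, and the "-" separator between gateway slug and tool name are all assumptions drawn from the example in the commit message.

```python
from typing import Mapping


def resolve_remote_tool_name(
    name: str,
    gateway_slug: str,
    original_names: Mapping[str, str],
) -> str:
    """Map a gateway-prefixed tool name to the name the remote server expects."""
    if name in original_names:
        # Preferred: the original_name recorded for the tool.
        return original_names[name]
    prefix = f"{gateway_slug}-"
    # Fallback for tools not cached locally: drop the gateway slug prefix.
    return name[len(prefix):] if name.startswith(prefix) else name
```

Note that the fallback only removes the prefix and cannot undo slug normalization of the tool name itself, which is one reason to prefer the recorded original name when it is available.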
@crivetimihai crivetimihai merged commit 07bc73c into IBM:main Feb 13, 2026
53 checks passed
@010gvr
Contributor Author

010gvr commented Feb 16, 2026

@crivetimihai Just leaving a comment here to revisit in the future if needed. The last commit, be00451, restored the session.initialize() call. This makes perfect sense when the gateway is the "source" client. In the proxy case, however, it is a redundant call: the gateway does nothing with the initialize result sent back (which is meant to validate protocol, capabilities, etc.), as the source client may already have done that. We'll run some perf tests and then decide what to do here.

suciu-daniel pushed a commit that referenced this pull request Feb 16, 2026
…pport for pass-through MCP operations (#2723)

* feat: add direct_proxy gateway mode for pass-through MCP operations

Implements a new `direct_proxy` gateway mode that enables pass-through proxying of MCP operations directly to remote servers without database caching. This complements the existing `cache` mode (default) and provides flexibility for different use cases.

:warning: The direct proxy assumes a stateless MCP server built on the Streamable HTTP transport. In addition, the RBAC architecture needs to be analyzed.

- **New field**: Added `gateway_mode` column to `gateways` table with values `cache` (default) or `direct_proxy`
- **Migration**: Created idempotent Alembic migration
- **Models**: Updated `Gateway` ORM model in `db.py` with new field and documentation

- **GatewayCreate**: Added `gateway_mode` field with validation pattern `^(cache|direct_proxy)$`
- **GatewayUpdate**: Added optional `gateway_mode` field for updates
- **GatewayRead**: Added `gateway_mode` field to response schema

**gateway_service.py**
- Added `gateway_mode` field handling in gateway creation and updates
- Passes gateway mode through initialization flow

**tool_service.py**
- Implements direct proxy mode detection via `X-Gateway-Id` header
- When in `direct_proxy` mode:
  - Bypasses database tool lookup/tool call
  - Creates minimal tool payload for direct proxying
  - Skips access control checks (delegated to remote server)
  - Skips metrics recording (no tool_id in direct mode)
- Enhanced error handling for `extract_using_jq` failures
- Added comprehensive logging for streamablehttp requests
- Fixed `Accept` header to include both `application/json` and `text/event-stream` for MCP compatibility

**resource_service.py**
- Implements direct proxy mode for resource operations
- Proxies `resources/read` and `resources/list` requests directly to remote MCP servers when in direct_proxy mode
- Uses MCP SDK for proper protocol handling

**streamablehttp_transport.py**
- Proxy methods for all MCP methods

- Fast lookups, access control enforced, metrics recorded
- Existing behavior unchanged
- All MCP operations proxied directly to remote server
- No database caching of tools/resources/prompts
- Client sends `X-Context-Forge-Gateway-Id` header to specify gateway
- Gateway must have `gateway_mode: "direct_proxy"`
- Minimal overhead, real-time data from remote server
- Access control delegated to remote server
- Existing tests should pass (default cache mode unchanged)

This is a backward-compatible feature addition with sensible defaults.

Signed-off-by: 010gvr <010gvr@gmail.com>

* chore: lint code

Signed-off-by: 010gvr <010gvr@gmail.com>

* test(gateway_mode): Schema + Gateway Service tests

Signed-off-by: 010gvr <010gvr@gmail.com>

* chore(lint): More lint fixes

Signed-off-by: 010gvr <010gvr@gmail.com>

* chore(lint): More lint fixes

Signed-off-by: 010gvr <010gvr@gmail.com>

* chore(lint): flake8 fix

Signed-off-by: 010gvr <010gvr@gmail.com>

* chore(rebase): Rebase with main

Signed-off-by: 010gvr <010gvr@gmail.com>

* chore(tests): Use integration tests

Signed-off-by: 010gvr <010gvr@gmail.com>

* fix: User access check to Gateway for all direct_proxy mode

Signed-off-by: 010gvr <010gvr@gmail.com>

* fix(headers): Helper for auth header propagation

Signed-off-by: 010gvr <010gvr@gmail.com>

* fix: inconsistent initialize()

Signed-off-by: 010gvr <010gvr@gmail.com>

* fix(tests): coverage fixes

Signed-off-by: 010gvr <010gvr@gmail.com>

* fix(migration): update alembic down_revision to current head

The migration pointed to an old parent revision (04cda6733305) causing
multiple alembic heads. Updated to current head (c1c2c3c4c5c6).

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

* fix(security): add missing RBAC checks in direct_proxy mode

Add check_gateway_access() calls to two code paths that were missing
access control before proxying requests in direct_proxy mode:

- streamablehttp_transport.py call_tool(): was forwarding tool calls
  without verifying the user had access to the target gateway
- resource_service.py read_resource(): was proxying resource reads
  without verifying gateway access permissions

Also adds a defensive access check inside invoke_tool_direct() to
prevent RBAC bypass if called from new contexts, fixes the migration
docstring Revises mismatch, removes stale attribution comment, and
defines GATEWAY_ID_HEADER constant for the header name.

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

* feat(direct_proxy): add feature flag, configurable timeout, and full test coverage

- Add MCPGATEWAY_DIRECT_PROXY_ENABLED (default: false) and
  MCPGATEWAY_DIRECT_PROXY_TIMEOUT (default: 30s) config settings
  across config.py, .env.example, Helm values, docker-compose, admin UI
- Add extract_gateway_id_from_headers() helper in gateway_access.py,
  replacing 6 repeated header-scanning loops with unified constant
- Add registration-time guards in gateway_service (register + update)
  that reject direct_proxy mode when the feature flag is disabled
- Add runtime guards in tool_service, resource_service, and
  streamablehttp_transport that fall through to cache mode when disabled
- Replace 5 hardcoded timeout=30.0 with configurable setting
- Remove all pragma: no cover markers from direct_proxy code paths
- Add 41 new unit tests covering config defaults, header helper,
  gateway service guards, invoke_tool_direct, invoke_tool header branch,
  call_tool direct_proxy, and read_resource direct_proxy
- Add documentation section in configuration.md and config.schema.json

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

* fix(direct_proxy): address review feedback for security and consistency

- call_tool: add feature-flag gate (settings.mcpgateway_direct_proxy_enabled)
  matching list_tools/list_resources/read_resource
- call_tool: return error on proxy failure instead of falling through to
  cache mode (fail-closed)
- call_tool: sanitize error message to avoid leaking exception details
- read_resource: return empty string directly on access denial instead of
  raising HTTPException that gets swallowed by outer handler
- read_resource: remove _meta from session.read_resource() call (MCP SDK
  only accepts uri parameter)
- GatewayCreate: change gateway_mode from Optional[str] to str with
  before-validator defaulting None to 'cache' (prevents DB integrity error)
- admin: expose direct_proxy_timeout in Connection Timeouts section
- list_tools docstring: fix refresh_strategy -> gateway_mode
- Fix jq test assertions to match TextContent return type

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

* test(direct_proxy): add coverage tests for 20 missing diff lines

Covers meta extraction, gateway fallback paths, passthrough headers,
jq filter error handling, and isError attribute fallback.

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

* fix(direct_proxy): restore session.initialize() and resolve tool name prefix

Two bugs discovered during E2E testing:

1. session.initialize() was commented out in all 5 direct_proxy code
   paths, causing 422 errors from remote MCP servers (the MCP protocol
   requires initialization before any other operation).

2. invoke_tool_direct sent the gateway-prefixed slug name (e.g.
   "fast-test-get-system-time") to the remote server, which only
   knows the original name ("get_system_time"). Fixed by looking up
   the tool's original_name from the DB, with a slug-prefix stripping
   fallback for tools not yet cached locally.
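
The name resolution described in item 2 can be sketched as below. `resolve_remote_tool_name`, its parameters, and the use of a plain mapping in place of the database lookup are all hypothetical; the `-` separator is inferred from the example name in the commit message.

```python
from typing import Mapping


def resolve_remote_tool_name(
    requested_name: str,
    gateway_slug: str,
    original_names: Mapping[str, str],
    separator: str = "-",
) -> str:
    """Map a gateway-prefixed tool name to the name sent to the remote server.

    `original_names` stands in for the DB lookup of a tool's original_name.
    """
    # Preferred path: the locally cached record stores the original name.
    cached = original_names.get(requested_name)
    if cached:
        return cached
    # Fallback for tools not cached locally: strip the "<slug><separator>" prefix.
    prefix = f"{gateway_slug}{separator}"
    if requested_name.startswith(prefix):
        return requested_name[len(prefix):]
    # No known prefix: pass the name through unchanged.
    return requested_name


names = {"fast-test-get-system-time": "get_system_time"}
print(resolve_remote_tool_name("fast-test-get-system-time", "fast-test", names))
print(resolve_remote_tool_name("fast-test-get-system-time", "fast-test", {}))
```

With a cached record the first call yields `get_system_time`. Without one, this sketch can only strip the prefix and yields `get-system-time`, since prefix stripping alone cannot undo slugification; the commit message does not say whether the PR's fallback handles that case.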

Signed-off-by: Venkat Ramachandran <venkat@contextforge.ai>
Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

---------

Signed-off-by: 010gvr <010gvr@gmail.com>
Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
Signed-off-by: Venkat Ramachandran <venkat@contextforge.ai>
Co-authored-by: Mihai Criveti <crivetimihai@gmail.com>
vishu-bh pushed a commit that referenced this pull request Feb 18, 2026
…pport for pass-through MCP operations (#2723)

Signed-off-by: Vishu Bhatnagar <vishu.bhatnagar@ibm.com>
crivetimihai added the wxo (wxo integration) label Feb 23, 2026
kcostell06 pushed a commit to kcostell06/mcp-context-forge that referenced this pull request Feb 24, 2026
…pport for pass-through MCP operations (IBM#2723)

Labels

wxo wxo integration

Development

Successfully merging this pull request may close these issues.

[FEATURE]: Dynamic tools/resources based on user context and server-side signals
