fix(ui): replace tojson with single-quoted literals in Fetch Tools onclick by omorros · Pull Request #3100 · IBM/mcp-context-forge

omorros · 2026-02-22T02:35:42Z

🔗 Related Issue

📝 Summary

The "Fetch Tools" button in gateways_partial.html was broken because |tojson outputs double-quoted JSON strings inside a
double-quoted onclick attribute, corrupting HTML parsing. Replaced with single-quoted JS literals to match the working
pattern used everywhere else in the templates.
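
The quoting problem can be reproduced without the template engine. The sketch below is illustrative only: `fetchTools` and the gateway ID are hypothetical stand-ins for whatever the real template emits, and `json.dumps` stands in for the `|tojson` filter. It feeds both variants of the button to Python's standard HTML parser and reads back what the parser takes the `onclick` value to be.

```python
import json
from html.parser import HTMLParser


class OnclickCollector(HTMLParser):
    """Collect the value of every onclick attribute the parser sees."""

    def __init__(self):
        super().__init__()
        self.onclick_values = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "onclick":
                self.onclick_values.append(value)


def parsed_onclick(html_text):
    """Return the first onclick value as an HTML parser understands it."""
    parser = OnclickCollector()
    parser.feed(html_text)
    parser.close()
    return parser.onclick_values[0]


gateway_id = "abc123"  # hypothetical ID

# Double-quoted JSON string inside a double-quoted attribute: the attribute
# ends at the first inner double quote.
broken = '<button onclick="fetchTools(' + json.dumps(gateway_id) + ')">Fetch Tools</button>'

# Single-quoted JS literal inside a double-quoted attribute: parses cleanly.
fixed = "<button onclick=\"fetchTools('" + gateway_id + "')\">Fetch Tools</button>"

print(parsed_onclick(broken))
print(parsed_onclick(fixed))
```

In the broken variant the handler is cut off after the opening parenthesis, so the click does nothing useful.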

🏷️ Type of Change

Bug fix

🧪 Verification

| Check | Command | Status |
|---|---|---|
| Lint suite | `make lint` | ✅ |
| Unit tests | `make test` | ✅ |
| Coverage ≥ 80% | `make coverage` | ✅ |

✅ Checklist

Code formatted (make black isort pre-commit)
Tests added/updated for changes
Documentation updated (if applicable)
No secrets or credentials committed

📓 Notes (optional)

Single line change in gateways_partial.html:41. The same button works correctly in admin.html (initial page load) which
already uses single quotes — this fix aligns the HTMX partial to match.

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

Precompile all regex patterns at module or configuration initialization time across 14 plugins, eliminating per-request compilation overhead. Closes IBM#1834 Signed-off-by: Shoumi <shoumimukherjee@gmail.com> Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
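
As a rough illustration of the precompilation pattern (the patterns and the `redact` helper below are hypothetical, not taken from any of the 14 plugins), the regexes are compiled once at import time and only reused on the request path:

```python
import re

# Compiled once at module import; hypothetical patterns for illustration.
_EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
_SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact(text: str) -> str:
    """Per-request work only applies the precompiled patterns."""
    text = _EMAIL_RE.sub("[EMAIL]", text)
    return _SSN_RE.sub("[SSN]", text)


print(redact("mail bob@example.com, ssn 123-45-6789"))
```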

* update jwt cli with more inputs
  Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>
* fix: prevent non-expiring tokens from invalid expires_in_days
  - Add ge=1 validation to TokenCreateRequest.expires_in_days schema
  - Add guard in _generate_token to reject expires_at in the past
  - Use math.ceil() and max(1, ...) to ensure exp is always set for sub-minute expirations (prevents rounding to 0)
  - Mark --secret and --algo CLI args as deprecated (always uses config)
  - Add tests for past expiry rejection and ceiling behavior

  This fixes a security regression where negative/zero expires_in_days could create permanent tokens instead of expired ones.
  Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
* fix: restore --secret and --algo CLI options
  The --secret and --algo CLI parameters now work as optional overrides:
  - When provided, they override the configuration values
  - When not provided, JWT_SECRET_KEY and JWT_ALGORITHM from config are used

  This preserves backward compatibility while still defaulting to configuration-based signing for consistency.
  Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
* fix: require --secret when --algo is specified
  Prevent invalid token generation by requiring --secret when --algo is provided. Using --algo alone would mix config-based keys with a different algorithm, potentially producing tokens that fail validation.
  Also fixes stale docstring that still referenced DEFAULT_SECRET/DEFAULT_ALGO instead of the new empty-string defaults with config fallback.
  Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

---------

Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>
Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
Co-authored-by: Mihai Criveti <crivetimihai@gmail.com>
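
The expiry guard described above can be sketched as follows. This is a minimal illustration under assumptions: `expiry_minutes` is a hypothetical helper, not the project's `_generate_token`, and it only shows how rejecting past timestamps combines with `math.ceil()` and `max(1, ...)` so that a sub-minute lifetime never rounds down to zero.

```python
import math
from datetime import datetime, timedelta, timezone


def expiry_minutes(expires_at: datetime, now: datetime) -> int:
    """Whole minutes until expires_at; never 0, and past expiries are rejected."""
    remaining_seconds = (expires_at - now).total_seconds()
    if remaining_seconds <= 0:
        # A zero or negative lifetime must fail instead of yielding a
        # token without an expiry claim.
        raise ValueError("expires_at is in the past")
    # ceil() rounds 20 s up to 1 minute; max(1, ...) is a second safety net.
    return max(1, math.ceil(remaining_seconds / 60))


now = datetime(2026, 1, 1, tzinfo=timezone.utc)
print(expiry_minutes(now + timedelta(seconds=20), now))
print(expiry_minutes(now + timedelta(seconds=61), now))
```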

* Initial commit for filesystem server
  - added stdio simple server
  - implemented list_directory tool
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Improved main runner
  - Added argument handler
  - improve tracing
  - implemented streamable-http
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* added tracing info for each call
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Implemented search files recursively
  - use glob patterns
  - Added search to tools
  - Handle errors
  - Improve descriptions
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Added files for new functions
  - Updated cargo file for file search
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* implemented case insensitive search: Lowercase filenames and patterns
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Implemented read_file function and added to a server
  - check for max file size 1 MB
  - check if the path is a file
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Added tracing for read file, improved error description
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Added get_file_info
  - get size, created, modified and permissions
  - Added to the server
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Added Read multiple files
  - use read file for each content
  - read async
  - added to server tools
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Added write_file tool func
  - Write to a tempfile uuid
  - rename file to actual name
  - remove tempfile
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Added create_directory tool
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Added create_directory to server and implemented placeholder for list_allowed_directories
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Changed release config to reduce bin size
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Implemented move file function.
  - fails if destination exists
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Added move_file to the server
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Added edit file to the server
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Added edit_file
  - support dry_run
  - use similar to get diffs
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Adding sandbox for path check
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Apply fix to reduce TOCTOU vulnerability
  - atomic write, no checks and write
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* improve read to reduce TOCTOU vulnerability
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Stopped following symlinks on search (security)
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* removed unused import
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Improved Sandbox for TOCTOU safety
  - initialize sandbox once
  - check if new folders are inside root
  - resolve path inside root
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Added get_roots for server list_allowed_directories
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Using sandbox resolve path before get file info
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Added sandbox to write_file and create_directory
  - validate parent folder
  - clean tempfile after
  - check new folder inside root
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Added sandbox to write file
  - canonicalize and check new folders
  - check parent folders
  - on create directory, check if exists
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Added sandbox check for list_directory
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Added sandbox check for edit_file and move_file
  - check and canonicalize destination parent
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Apply sandbox checks for read_file and read_multiple_files
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Changed sandbox initialization from global to context
  - removed global sandbox
  - initialize sandbox in main and pass to each function
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Added tests for search_files and list_directories
  - test for symlinks
  - test for path outside roots
  - > 95% coverage
  - formatted using fmt
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Formatted files using cargo fmt
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Added tests for get_file_info
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Added test coverage for read_file and read_multiple_files
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Added unit test coverage for write_file and create_directory
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Improve test coverage for edit_file and move_file
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* format edit.rs
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Added test coverage for sandbox
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Improve server runner
  - added graceful server shutdown
  - Declare Config values in main
  - Improve server logs
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Improved logs for list_allowed_directories and linted file
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Improve test coverage for server
  - improved logs in server
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Linted using cargo clippy
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Simplified main.rs and moved server code to lib.rs for integration tests
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Added integration tests to simulate workflows
  - file and folder manipulation workflow
  - permission and metadata workflow
  - search and organise workflow
  - server tests
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* formatted and clipped integration tests
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* added result where it was missing in tool call result
  - rename all outputs to result
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Normalized write file and create directory output
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Structured search and write outputs
  - update tests
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Fixed write result not showing errors correctly
  - server will return WriteResult
  - create_directory returns err or string
  - fixed create_directory tests
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Normalized output result for write_file
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Normalized tool result outputs
  - Keep consistency between tools
  - Return MCP error or success
  - reorganised tools between files
  - updated tests
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Added server start banner
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* fixed main receiving multiple roots
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Clipped and formatted
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Removed justfile and added Makefile
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Added correct readme
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Added dockerfile
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Improved tracing logs
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Update test coverage and binary size in readme
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Removed unused dependency
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Updated cargo dependencies
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* Updated deprecated InitializedRequestParam
  Signed-off-by: cafalchio <maolivei@tcd.ie>
* fix: correct error messages and documentation in filesystem server
  - Fix misleading error messages in server.rs that said "Error writing file" when the actual operation was read, move, or get_file_info
  - Update README to accurately reflect that move_file overwrites destination (previously incorrectly stated it fails)
  Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

---------

Signed-off-by: cafalchio <maolivei@tcd.ie>
Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
Co-authored-by: Mihai Criveti <crivetimihai@gmail.com>

…ts (IBM#2345)
* Fix proxy authentication
  Signed-off-by: Mohan Lakshmaiah <mohalaks@in.ibm.com>
* Fix pylint errors
  Signed-off-by: Mohan Lakshmaiah <mohalaks@in.ibm.com>
* fix: Correct lint issues in proxy auth tests
  - Add missing blank line between test classes
  - Remove unused jwt import
  - Fix excess blank lines
  Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
* fix: Include plugin context in proxy auth for cross-hook sharing
  Add plugin_context_table and plugin_global_context to proxy authentication paths, matching the JWT authentication path. This ensures HTTP_AUTH_CHECK_PERMISSION hooks can access context set by HTTP_PRE_REQUEST hooks when using proxy authentication.
  Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
* fix: Address security concerns in proxy authentication
  1. RBAC now checks auth_required when proxy header missing
     - Returns 401 for API requests, 302 redirect for browsers
     - Aligns HTTP behavior with WebSocket auth
  2. Block anonymous users from token management
     - Add auth_method=="anonymous" to _require_interactive_session
     - Prevents token access when proxy header missing
  3. Lookup proxy user admin status from database
     - Check platform_admin_email for admin match
     - Query EmailUser table for is_admin status
     - Enables plugin permission hooks to work correctly
  Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
* fix: Align require_auth with RBAC proxy enforcement
  Update require_auth to check auth_required when proxy header is missing, matching the RBAC/WebSocket behavior. Previously returned "anonymous" even when auth_required=true.
  - Raise 401 when mcp_client_auth_enabled=false and no proxy header if auth_required=true
  - Update tests to cover both auth_required=true and false cases
  Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

---------

Signed-off-by: Mohan Lakshmaiah <mohalaks@in.ibm.com>
Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
Co-authored-by: Mohan Lakshmaiah <mohalaks@in.ibm.com>
Co-authored-by: Mihai Criveti <crivetimihai@gmail.com>

Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>

* fix: FastMCP compatibility
* fix: normalize issuer URL for metadata validation and caching
  The original trailing slash fix introduced a bug where the issuer validation would fail when the server returned an issuer without trailing slash but the client passed one (or vice versa).
  Changes:
  - Normalize both the input issuer and metadata issuer for comparison
  - Use normalized issuer as cache key for consistent cache lookup
  - Add tests for trailing slash normalization scenarios
  - Update test to expect refresh_token in grant_types
  Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
* fix: complete issuer normalization and conditional refresh_token
  Address review feedback:
  1. Normalize issuer consistently across the entire DCR flow:
     - Allowlist validation uses normalized comparison
     - Storage uses normalized issuer
     - Lookup uses normalized issuer
  2. Make refresh_token conditional on AS support:
     - Check grant_types_supported in AS metadata
     - Only request refresh_token if AS advertises support
  3. Fix grant_types fallback:
     - Use requested grant_types as fallback when AS response omits them
     - Previously hardcoded to ["authorization_code"] which dropped refresh_token
  4. Add comprehensive tests:
     - Test refresh_token inclusion when AS supports it
     - Test grant_types fallback behavior
     - Test allowlist trailing slash normalization
  Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
* fix: handle null grant_types_supported and add issuer normalization migration
  Address additional review findings:
  1. Fix TypeError when grant_types_supported is explicit null:
     - Use `metadata.get("grant_types_supported") or []` instead of `metadata.get("grant_types_supported", [])`
     - The latter returns None when key exists with null value
  2. Add configurable permissive refresh_token mode:
     - New setting: dcr_request_refresh_token_when_unsupported
     - Default: False (strict mode - only request if AS advertises support)
     - When True: request refresh_token if AS omits grant_types_supported
  3. Add Alembic migration to normalize legacy issuer values:
     - Strips trailing slashes from registered_oauth_clients.issuer
     - Idempotent and works with SQLite and PostgreSQL
     - Prevents duplicate registrations from legacy rows
  4. Add comprehensive tests:
     - Test explicit null grant_types_supported handling
     - Test permissive refresh_token mode
  Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
* docs: add DCR_REQUEST_REFRESH_TOKEN_WHEN_UNSUPPORTED to documentation
  Update documentation for new DCR refresh token configuration option:
  - README.md: Add to DCR settings table
  - charts/mcp-stack/values.yaml: Add with comment
  - charts/mcp-stack/README.md: Regenerated via helm-docs
  - docs/docs/manage/dcr.md: Add env var and behavior note
  - docs/docs/config.schema.json: Add schema definition
  Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

---------

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
Co-authored-by: Mihai Criveti <crivetimihai@gmail.com>
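
Two of the fixes above are small enough to show in isolation. The helpers below are hypothetical names used only for illustration, and the sketch assumes that issuer normalization means stripping trailing slashes (which is what the migration notes describe). The second helper shows why `dict.get(key, default)` is not enough when the key is present with an explicit null.

```python
def normalize_issuer(issuer: str) -> str:
    """Assumed normalization: drop trailing slashes so both forms compare equal."""
    return issuer.rstrip("/")


def supports_refresh_token(metadata: dict) -> bool:
    """True only if the AS metadata advertises the refresh_token grant."""
    # `or []` covers both a missing key and an explicit null value;
    # metadata.get(key, []) would return None for the latter.
    supported = metadata.get("grant_types_supported") or []
    return "refresh_token" in supported


null_metadata = {"grant_types_supported": None}
print(null_metadata.get("grant_types_supported", []))  # None, not []
print(supports_refresh_token(null_metadata))
print(normalize_issuer("https://as.example.com/") == normalize_issuer("https://as.example.com"))
```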

* feat: add ARM64 load testing support
  - Add build section for fast-time-server to support ARM64 architecture
  - Use pre-built ghcr.io image by default for x86_64 performance
  - ARM64 users can build locally via environment variable override
  - Fix Dockerfile to use TARGETARCH for proper cross-compilation
  Signed-off-by: Jonathan Springer <jps@s390x.com>
  Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
* Lint
  Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

---------

Signed-off-by: Jonathan Springer <jps@s390x.com>
Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
Co-authored-by: Mihai Criveti <crivetimihai@gmail.com>

…BM#2314) Signed-off-by: Satya <tsp.0713@gmail.com>

…ched Templates (IBM#2333)
* Optimize SQLite JSON tag filtering with deterministic binds and cached templates
  Signed-off-by: Satya <tsp.0713@gmail.com>
* feat: add tag filtering support to list resources template in main apis(non-template)
  Signed-off-by: Satya <tsp.0713@gmail.com>
* removed unused fields - page, limit from list resource template from resource services
  Signed-off-by: Satya <tsp.0713@gmail.com>
* fix: remove debug print statements from tool_service.py
  Remove debugging print statements that were accidentally left in the tag filtering code path. These were outputting query details to stdout which is not appropriate for production code.
  Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
* Lint
  Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
* fix: use column-specific bind prefixes to prevent parameter collision
  When multiple json_contains_tag_expr calls are combined in the same query (e.g., filtering on tags from different columns), the fixed bind names (:p0, :p1) would collide and overwrite parameters.
  This fix adds column-specific prefixes to bind parameter names (e.g., :tools_tags_p0, :resources_tags_p0) to ensure uniqueness when composing multiple tag filter predicates.
  Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
* test: add coverage for json_contains_tag_expr and resource template filters
  Add comprehensive tests for:
  - _sanitize_col_prefix helper function
  - json_contains_tag_expr for SQLite with match_any and match_all
  - Bind parameter collision prevention when combining multiple tag filters
  - LRU caching of SQL templates
  - New list_resource_templates filtering parameters (tags, visibility, include_inactive)
  Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
* fix: use thread-safe counter for fully unique bind prefixes
  Address edge cases where bind parameters could still collide:
  1. Same column filtered multiple times in one query
  2. Different column refs that sanitize to identical strings (e.g., "a_b.c" and "a.b_c" both become "a_b_c")

  Replace static column-based prefix with a thread-safe counter that generates truly unique prefixes per call (e.g., "tools_tags_42_p0"). This removes the LRU caching of templates since each call now has a unique prefix, but ensures correctness in all edge cases.
  Add test for same-column collision scenario.
  Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

---------

Signed-off-by: Satya <tsp.0713@gmail.com>
Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
Co-authored-by: Mihai Criveti <crivetimihai@gmail.com>
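
The counter-based prefix scheme can be sketched roughly like this. The function names are hypothetical and the SQL generation itself is omitted; the point is only that a lock-protected counter makes every call's bind names distinct, even for the same column or for column references that sanitize to the same string.

```python
import itertools
import re
import threading

_counter = itertools.count()
_counter_lock = threading.Lock()


def sanitize(col_ref: str) -> str:
    """Replace anything that is not a letter, digit, or underscore."""
    return re.sub(r"[^A-Za-z0-9_]", "_", col_ref)


def unique_bind_prefix(col_ref: str) -> str:
    """Sanitized column name plus a process-wide unique number."""
    with _counter_lock:
        n = next(_counter)
    return f"{sanitize(col_ref)}_{n}"


def bind_params(col_ref: str, tags: list) -> dict:
    """Map unique bind names to tag values for one filter expression."""
    prefix = unique_bind_prefix(col_ref)
    return {f"{prefix}_p{i}": tag for i, tag in enumerate(tags)}


first = bind_params("tools.tags", ["prod", "beta"])
second = bind_params("tools.tags", ["prod"])  # same column, same query
print(first)
print(second)
```

Merging `first` and `second` into one parameter dict loses nothing, because no key appears in both.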

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

* optimize response_cache_by_prompt lookup with inverted index
  Signed-off-by: Shoumi <shoumimukherjee@gmail.com>
* fix type hint
  Signed-off-by: Shoumi <shoumimukherjee@gmail.com>
* flake8 fixes
  Signed-off-by: Shoumi <shoumimukherjee@gmail.com>
* test: add unit tests for response_cache_by_prompt inverted index
  Add comprehensive test coverage for the inverted index optimization:
  - Tokenization and vectorization functions
  - Basic cache store and hit functionality
  - Inverted index population and candidate filtering
  - Eviction and index rebuild scenarios
  - Max entries cap with index consistency
  Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

---------

Signed-off-by: Shoumi <shoumimukherjee@gmail.com>
Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
Co-authored-by: Mihai Criveti <crivetimihai@gmail.com>
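
For readers unfamiliar with the technique, an inverted index maps each token to the set of cache entries containing it, so a lookup only scores entries that share at least one token with the query instead of scanning the whole cache. The class below is a generic sketch with hypothetical names and whitespace tokenization; it is not the plugin's actual data structure.

```python
from collections import defaultdict


class PromptIndex:
    """Token -> entry-id inverted index with removal support."""

    def __init__(self):
        self._entry_tokens = {}          # entry_id -> set of tokens
        self._index = defaultdict(set)   # token -> set of entry_ids

    def add(self, entry_id, prompt: str) -> None:
        tokens = set(prompt.lower().split())
        self._entry_tokens[entry_id] = tokens
        for token in tokens:
            self._index[token].add(entry_id)

    def remove(self, entry_id) -> None:
        """Drop an evicted entry and prune tokens that become empty."""
        for token in self._entry_tokens.pop(entry_id, set()):
            ids = self._index[token]
            ids.discard(entry_id)
            if not ids:
                del self._index[token]

    def candidates(self, prompt: str) -> set:
        """Entries sharing at least one token with the prompt."""
        found = set()
        for token in set(prompt.lower().split()):
            found |= self._index.get(token, set())
        return found


index = PromptIndex()
index.add(1, "list all tools")
index.add(2, "weather in Paris")
print(index.candidates("show tools"))
```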

* feat: Add Gateway permission constants
  Add GATEWAYS_CREATE, GATEWAYS_READ, GATEWAYS_UPDATE, and GATEWAYS_DELETE permission constants to the Permissions class for consistency with other resource types (tools, resources, prompts, servers).
  Note: The original PR IBM#2186 attempted to fix issue IBM#2185 by modifying the visibility query logic, but that change was incorrect. The team filter should only show resources BELONGING to the filtered team, not all public resources globally. See todo/rbac.md for documentation.
  Issue IBM#2185 needs further investigation - the reported bug may have a different root cause.
  Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
* feat: Add gateway permission patterns to token scoping middleware
  Add gateway routes to token scoping middleware for consistent permission enforcement:
  - Add gateway pattern to _RESOURCE_PATTERNS for ID extraction
  - Add gateway CRUD patterns to _PERMISSION_PATTERNS:
    - POST /gateways (exact) -> gateways.create
    - POST /gateways/{id}/... (sub-resources) -> gateways.update
    - PUT/DELETE -> gateways.update/delete
  - Add gateway handling in _check_resource_team_ownership:
    - Public: accessible by all
    - Team: accessible by team members
    - Private: owner-only access (per RBAC doc)
  Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
* fix: Enforce owner-only access for private visibility across all resources
  Per RBAC doc, private visibility means "owner only" - not "team members". Fixed private visibility checks for all resource types to validate owner_email == requester instead of team membership:
  - Servers
  - Tools
  - Resources
  - Prompts
  - Gateways (already correct from previous commit)

  This aligns token scoping middleware with the documented RBAC model.
  Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
* test: Add tests for gateway permissions and visibility RBAC
  Add unit tests covering:
  - Gateway permission patterns (POST create vs POST update sub-resources)
  - Private visibility enforces owner-only access
  - Team visibility allows team members only
  - Public visibility allows all authenticated users

  These tests validate the RBAC fixes in token scoping middleware.
  Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

---------

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
Co-authored-by: Mihai Criveti <crivetimihai@gmail.com>

* feat-2187: add additional default roles while bootstrap
  Signed-off-by: Nithin Katta <Nithin.Katta@ibm.com>
* feat-2187: fix lint issues
  Signed-off-by: Nithin Katta <Nithin.Katta@ibm.com>
* feat-2187: fixing review comments
  Signed-off-by: Nithin Katta <Nithin.Katta@ibm.com>
* feat-2187: fixing review comments
  Signed-off-by: Nithin Katta <Nithin.Katta@ibm.com>
* feat-2187: test fix
  Signed-off-by: Nithin Katta <Nithin.Katta@ibm.com>
* fix: Improve bootstrap roles validation and documentation
  Fixes identified by code review:
  1. Path resolution: Fixed parent.parent.parent -> parent.parent to correctly resolve project root from mcpgateway/bootstrap_db.py
  2. JSON validation: Added validation that loaded JSON is a list of dicts with required keys (name, scope, permissions). Invalid entries are skipped with warnings instead of crashing bootstrap.
  3. Improved logging: Log all attempted paths when file not found

  Added tests:
  - test_bootstrap_roles_with_dict_instead_of_list: Validates error when JSON is a dict instead of array
  - test_bootstrap_roles_with_missing_required_keys: Validates warning when roles are missing required fields

  Added documentation:
  - docs/docs/manage/rbac.md: New "Bootstrap Custom Roles" section with configuration examples for Docker Compose and Kubernetes
  - docs/docs/architecture/adr/036-bootstrap-custom-roles.md: ADR documenting the feature design, error handling, and security considerations
  Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
* fix: Make description and is_system_role optional for bootstrap roles
  ChatGPT review identified that description and is_system_role were accessed unconditionally via role_def["key"], causing KeyError for minimal roles.
  Fix:
  - Use role_def.get("description", "") with empty string default
  - Use role_def.get("is_system_role", False) with False default

  Added test:
  - test_bootstrap_roles_with_minimal_valid_role: Verifies a role with only required fields (name, scope, permissions) is created successfully with correct defaults for optional fields
  Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

---------

Signed-off-by: Nithin Katta <Nithin.Katta@ibm.com>
Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
Co-authored-by: Nithin Katta <Nithin.Katta@ibm.com>
Co-authored-by: Mihai Criveti <crivetimihai@gmail.com>
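
The validation rules described above might look roughly like the following. This is a sketch under assumptions: `parse_bootstrap_roles` is a hypothetical function, and returning an empty list for a non-list document is one possible reading of "validates error"; the actual bootstrap code may report that case differently.

```python
import logging

logger = logging.getLogger("bootstrap_roles_sketch")

REQUIRED_KEYS = ("name", "scope", "permissions")


def parse_bootstrap_roles(data) -> list:
    """Validate loaded JSON and fill defaults for optional role fields."""
    if not isinstance(data, list):
        # Assumption: a non-list document is logged and ignored.
        logger.error("roles file must contain a JSON array, got %s", type(data).__name__)
        return []

    roles = []
    for role_def in data:
        if not isinstance(role_def, dict) or any(k not in role_def for k in REQUIRED_KEYS):
            logger.warning("skipping invalid role definition: %r", role_def)
            continue
        roles.append(
            {
                "name": role_def["name"],
                "scope": role_def["scope"],
                "permissions": role_def["permissions"],
                # Optional fields use .get() so minimal roles do not raise KeyError.
                "description": role_def.get("description", ""),
                "is_system_role": role_def.get("is_system_role", False),
            }
        )
    return roles


roles = parse_bootstrap_roles(
    [
        {"name": "auditor", "scope": "team", "permissions": ["tools.read"]},
        {"name": "incomplete"},  # missing scope and permissions: skipped
    ]
)
print(roles)
```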

…y blockers (IBM#2394)
* Remove last 2 security issues from Sonarqube
  Signed-off-by: Brian Hussey <brian.hussey@ie.ibm.com>
* Remove 5 of 8 blocker maintainability issues
  Signed-off-by: Brian Hussey <brian.hussey@ie.ibm.com>
* Correct linting errors
  Signed-off-by: Brian Hussey <brian.hussey@ie.ibm.com>

---------

Signed-off-by: Brian Hussey <brian.hussey@ie.ibm.com>

…ad (IBM#2157)
* perf(crypto): offload Argon2/Fernet to threadpool via asyncio.to_thread
  Add async wrappers (hash_password_async, verify_password_async, encrypt_secret_async, decrypt_secret_async) and update all call sites to use them, preventing event loop blocking during CPU-intensive crypto operations.
  Closes IBM#1836
  Signed-off-by: ESnark <31977180+ESnark@users.noreply.github.com>
* fix(tests): update tests for async crypto operations
  Update test mocks to use async versions of password service and encryption service methods (hash_password_async, verify_password_async, encrypt_secret_async) following the changes in the crypto offload PR.
  Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
* fix(sso): add missing await for async create/update provider methods
  The crypto offload PR made SSOService.create_provider() and update_provider() async, but forgot to update call sites:
  - mcpgateway/routers/sso.py: add await in admin endpoints
  - mcpgateway/utils/sso_bootstrap.py: convert to async, add awaits
  - mcpgateway/main.py: make attempt_to_bootstrap_sso_providers async

  Without this fix, the router endpoints would return coroutine objects instead of provider objects, causing runtime errors (500) when accessing provider.id. The bootstrap would silently skip provider creation with "coroutine was never awaited" warnings.
  Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
* test(crypto): add tests for async crypto wrappers and SSO bootstrap
  Add test coverage for the async crypto operations introduced by the crypto offload PR:
  - test_async_crypto_wrappers.py: Tests for hash_password_async, verify_password_async, encrypt_secret_async, decrypt_secret_async including roundtrip verification and sync/async compatibility
  - test_sso_bootstrap.py: Tests for async SSO bootstrap ensuring create_provider and update_provider are properly awaited
  Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

---------

Signed-off-by: ESnark <31977180+ESnark@users.noreply.github.com>
Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
Co-authored-by: Mihai Criveti <crivetimihai@gmail.com>
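
The offloading pattern itself is compact. In the sketch below, PBKDF2 from the standard library stands in for Argon2 so the example has no third-party dependencies, and the function signatures are assumptions rather than the project's real ones. The blocking hash runs in a worker thread via `asyncio.to_thread`, so the event loop stays free to serve other requests.

```python
import asyncio
import hashlib
import hmac


def hash_password(password: str, salt: bytes) -> bytes:
    """Blocking, CPU-heavy hash (PBKDF2 as a stand-in for Argon2)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


async def hash_password_async(password: str, salt: bytes) -> bytes:
    """Run the blocking hash in the default thread pool."""
    return await asyncio.to_thread(hash_password, password, salt)


async def verify_password_async(password: str, salt: bytes, expected: bytes) -> bool:
    digest = await hash_password_async(password, salt)
    return hmac.compare_digest(digest, expected)


async def main() -> bool:
    salt = b"0" * 16  # fixed salt for the demo only; use os.urandom in practice
    stored = await hash_password_async("s3cret", salt)
    return await verify_password_async("s3cret", salt, stored)


print(asyncio.run(main()))
```

As the SSO fix above shows, every caller of a newly async function must `await` it; otherwise the caller receives a coroutine object instead of the result.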

* chore-2193: add Rocky Linux setup script
  Add setup script for Rocky Linux and RHEL-compatible distributions. Adapts the Ubuntu setup script with the following changes:
  - Use dnf package manager instead of apt
  - Docker CE installation via RHEL repository
  - OS detection for Rocky, RHEL, CentOS, and AlmaLinux
  - Support for x86_64 and aarch64 architectures

  Closes IBM#2193
  Signed-off-by: Jonathan Springer <jps@s390x.com>
* chore-2193: add Docker login check before compose-up
  Check if Docker is logged in before running docker-compose to avoid image pull failures. If not logged in, prompt user with options:
  - Interactive login (username/password prompts)
  - Username with password from stdin (for automation)
  - Skip login (continue without authentication)

  Supports custom registry URLs for non-Docker Hub registries.
  Signed-off-by: Jonathan Springer <jps@s390x.com>
* fix: add non-interactive mode and git repo check to setup scripts
  Apply to both Rocky and Ubuntu setup scripts:
  - Add -y/--yes flag for fully non-interactive operation
  - Check for .git directory before running git pull
  - Fail fast with clear error if directory exists but isn't a git repo
  - Auto-confirm prompts in non-interactive mode
  - Exit with error on unsupported OS in non-interactive mode
  Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
* Linting
  Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

---------

Signed-off-by: Jonathan Springer <jps@s390x.com>
Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
Co-authored-by: Mihai Criveti <crivetimihai@gmail.com>

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

* fix-2360: prevent asyncio CPU spin loop after SSE client disconnect
  Root cause: Fire-and-forget asyncio.create_task() patterns left orphaned tasks that caused anyio _deliver_cancellation to spin at 100% CPU per worker.
  Changes:
  - Add _respond_tasks dict to track respond tasks by session_id
  - Cancel respond tasks explicitly before session cleanup in remove_session()
  - Cancel all respond tasks during shutdown()
  - Pass disconnect callback to SSE transport for defensive cleanup
  - Convert database backend from fire-and-forget to structured concurrency

  The fix ensures all asyncio tasks are properly tracked, cancelled on disconnect, and awaited to completion, preventing orphaned tasks from spinning the event loop.
  Closes IBM#2360
  Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
* fix-2360: additional fixes for CPU spin loop after SSE disconnect
  Follow-up fixes based on testing and review:
  1. Cancellation timeout escalation (Finding 1):
     - _cancel_respond_task() now escalates on timeout by calling transport.disconnect()
     - Retries cancellation after escalation
     - Always removes task from tracking to prevent buildup
  2. Redis respond loop exit path (Finding 2):
     - Changed from infinite pubsub.listen() to timeout-based get_message() polling
     - Added session existence check - loop exits if session removed
     - Allows loop to exit even without cancellation
  3. Generator finally block cleanup (Finding 3):
     - Added on_disconnect_callback() in event_generator() finally block
     - Covers: CancelledError, GeneratorExit, exceptions, and normal completion
     - Idempotent - safe if callback already ran from on_client_close
  4. Added load-test-spin-detector make target:
     - Spike/drop pattern to stress test session cleanup
     - Docker stats monitoring at each phase
     - Color-coded output with pass/fail indicators
     - Log file output to /tmp

  Closes IBM#2360
  Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
* fix-2360: fix race condition in sse_endpoint and add stuck task tracking
  Finding 1 (HIGH): Fixed race condition in sse_endpoint where respond task was created AFTER create_sse_response(). If client disconnected during response setup, the disconnect callback ran before the task existed, leaving it orphaned. Now matches utility_sse_endpoint ordering:
  1. Compute user_with_token
  2. Create and register respond task
  3. Call create_sse_response()

  Finding 2 (MEDIUM): Added _stuck_tasks dict to track tasks that couldn't be cancelled after escalation. Previously these were dropped from tracking entirely, losing visibility. Now they're moved to _stuck_tasks for monitoring and final cleanup during shutdown().
  Updated tests to verify escalation behavior.
  Closes IBM#2360
  Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
* fix-2360: add SSE failure cleanup, stuck task reaper, and full load test
  Finding 1 (HIGH): Fixed orphaned respond task when create_sse_response() fails. Added try/except around create_sse_response() in both sse_endpoint and utility_sse_endpoint - on failure, calls remove_session() to clean up the task and session before re-raising.
  Finding 2 (MEDIUM): Added stuck task reaper that runs every 30 seconds to:
  - Remove completed tasks from _stuck_tasks
  - Retry cancellation for still-stuck tasks
  - Prevent memory leaks from tasks that eventually complete

  Finding 3 (LOW): Added test for escalation path with fake transport to verify transport.disconnect() is called during escalation. Also added tests for the stuck task reaper lifecycle.
  Also updated load-test-spin-detector to be a full-featured test matching load-test-ui with JWT auth, all user classes, entity ID fetching, and the same 4000-user baseline.
  Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
* fix-2360: improve load-test-spin-detector output and reduce cycle sizes
  - Reduce logging level to WARNING to suppress noisy worker messages
  - Only run entity fetching and cleanup on master/standalone nodes
  - Reduce cycle sizes from 4000 to 1000 peak users for faster iteration
  - Update banner to reflect new cycle pattern (500 -> 750 -> 1000)
  - Remove verbose JWT token generation log
  Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
* fix-2360: address remaining CPU spin loop findings
  Finding 1 (HIGH): Add explicit asyncio.CancelledError handling in SSE endpoints. In Python 3.8+, CancelledError inherits from BaseException, not Exception, so the previous except block wouldn't catch it. Now cleanup runs even when requests are cancelled during SSE handshake.
  Finding 2 (MEDIUM): Add sleep(0.1) when Redis get_message returns None to prevent tight loop. The loop now has guaranteed minimum sleep even when Redis returns immediately in certain states.
  Finding 3 (MEDIUM): Add _closing_sessions set to allow respond loops to exit early. remove_session() now marks the session as closing BEFORE attempting task cancellation, so the respond loop (Redis and DB backends) can exit immediately without waiting for the full cancellation timeout.
  Finding 4 (LOW): Already addressed in previous commit with test test_cancel_respond_task_escalation_calls_transport_disconnect.
  Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
* fix-2360: make load-test-spin-detector run unlimited cycles
  - Cycles now repeat indefinitely instead of stopping after 5
  - Fixed log file path to /tmp/spin_detector.log for easy monitoring
  - Added periodic summary every 5 cycles showing PASS/WARN/FAIL counts
  - Cycle numbering now shows total count and pattern letter (e.g., "CYCLE 6 (A)")
  - Banner shows monitoring command: tail -f /tmp/spin_detector.log
  Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
* fix-2360: add asyncio.CancelledError to SSE endpoint Raises docs
  Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
* Linting
  Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
* fix-2360: remove redundant asyncio.CancelledError handlers
  CancelledError inherits from BaseException in Python 3.8+, so it won't be caught by 'except Exception' handlers. The explicit handlers were unnecessary and triggered pylint W0706 (try-except-raise).
  Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
* fix-2360: restore asyncio.CancelledError in Raises docs for inner handlers
  Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
* fix-2360: add sleep on non-message Redis pubsub types to prevent spin
  Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
* fix(pubsub): replace blocking listen() with timeout-based get_message()
  The blocking `async for message in pubsub.listen()` pattern doesn't respond to asyncio cancellation properly. When anyio's cancel scope tries to cancel tasks using this pattern, the tasks don't respond because the async iterator is blocked waiting for Redis messages. This causes anyio's `_deliver_cancellation` to continuously reschedule itself with `call_soon()`, creating a CPU spin loop that consumes 100% CPU per affected worker.
  Changed to timeout-based polling pattern:
  - Use `get_message(timeout=1.0)` with `asyncio.wait_for()`
  - Loop allows cancellation check every ~1 second
  - Added sleep on None/non-message responses to prevent edge case spins

  Files fixed:
  - mcpgateway/services/cancellation_service.py
  - mcpgateway/services/event_service.py

  Closes IBM#2360 (partial - additional spin sources may exist)
  Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
* fix(cleanup): add timeouts to __aexit__ calls to prevent CPU spin loops
  The MCP session/transport __aexit__ methods can block indefinitely when internal tasks don't respond to cancellation. This causes anyio's _deliver_cancellation to spin in a tight loop, consuming ~800% CPU.
  Root cause: When calling session.__aexit__() or transport.__aexit__(), they attempt to cancel internal tasks (like post_writer waiting on memory streams). If these tasks don't respond to CancelledError, anyio's cancel scope keeps calling call_soon() to reschedule _deliver_cancellation, creating a CPU spin loop.
  Changes:
  - Add SESSION_CLEANUP_TIMEOUT constant (5 seconds) to mcp_session_pool.py
  - Wrap all __aexit__ calls in asyncio.wait_for() with timeout
  - Add timeout to pubsub cleanup in session_registry.py and registry_cache.py
  - Add timeout to streamable HTTP context cleanup in translate.py

  This is a continuation of the fix for issue IBM#2360.
  Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
* feat(config): make session cleanup timeout configurable
  Add MCP_SESSION_POOL_CLEANUP_TIMEOUT setting (default: 5.0 seconds) to control how long cleanup operations wait for session/transport __aexit__ calls to complete.
  Clarification: This timeout does NOT affect tool execution time (which uses TOOL_TIMEOUT). It only affects cleanup of idle/released sessions to prevent CPU spin loops when internal tasks don't respond to cancel.
Changes: - Add mcp_session_pool_cleanup_timeout to config.py - Add MCP_SESSION_POOL_CLEANUP_TIMEOUT to .env.example with docs - Add to charts/mcp-stack/values.yaml - Update mcp_session_pool.py to use _get_cleanup_timeout() helper - Update session_registry.py and registry_cache.py to use config - Update translate.py to use config with fallback When to adjust: - Increase if you see frequent "cleanup timed out" warnings in logs - Decrease for faster shutdown (at risk of resource leaks) Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * fix(sse): add deadline to cancel scope to prevent CPU spin loop Fixes CPU spin loop (anyio#695) where _deliver_cancellation spins at 100% CPU when SSE task group tasks don't respond to cancellation. Root cause: When an SSE connection ends, sse_starlette's task group tries to cancel all tasks. If a task (like _listen_for_disconnect waiting on receive()) doesn't respond to cancellation, anyio's _deliver_cancellation keeps rescheduling itself in a tight loop. Fix: Override EventSourceResponse.__call__ to set a deadline on the cancel scope when cancellation starts. This ensures that if tasks don't respond within SSE_TASK_GROUP_CLEANUP_TIMEOUT (5 seconds), the scope times out instead of spinning indefinitely. References: - agronholm/anyio#695 - anthropics/claude-agent-sdk-python#378 Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * fix(translate): use patched EventSourceResponse to prevent CPU spin translate.py was importing EventSourceResponse directly from sse_starlette, bypassing the patched version in sse_transport.py that prevents the anyio _deliver_cancellation CPU spin loop (anyio#695). This change ensures all SSE connections in the translate module (stdio-to-SSE bridge) also benefit from the cancel scope deadline fix. 
Relates to: IBM#2360 Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * fix(cleanup): reduce cleanup timeouts from 5s to 0.5s With many concurrent connections (691 TCP sockets observed), each cancelled SSE task group spinning for up to 5 seconds caused sustained high CPU usage. Reducing the timeout to 0.5s minimizes CPU waste during spin loops while still allowing normal cleanup to complete. The cleanup timeout only affects cleanup of cancelled/released connections, not normal operation or tool execution time. Changes: - SSE_TASK_GROUP_CLEANUP_TIMEOUT: 5.0 -> 0.5 seconds - mcp_session_pool_cleanup_timeout: 5.0 -> 0.5 seconds - Updated .env.example and charts/mcp-stack/values.yaml Relates to: IBM#2360 Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * refactor(cleanup): make SSE cleanup timeout configurable with safe defaults - Add SSE_TASK_GROUP_CLEANUP_TIMEOUT setting (default: 5.0s) - Make sse_transport.py read timeout from config via lazy loader - Keep MCP_SESSION_POOL_CLEANUP_TIMEOUT at 5.0s default - Override both to 0.5s in docker-compose.yml for testing The 5.0s default is safe for production. The 0.5s override in docker-compose.yml allows testing aggressive cleanup to verify it doesn't affect normal operation. Relates to: IBM#2360 Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * fix(gunicorn): reduce max_requests to recycle stuck workers The MCP SDK's internal anyio task groups don't respond to cancellation properly, causing CPU spin loops in _deliver_cancellation. This spin happens inside the MCP SDK (streamablehttp_client, sse_client) which we cannot patch. Reduce GUNICORN_MAX_REQUESTS from 10M to 5K to ensure workers are recycled frequently, cleaning up any accumulated stuck task groups. Root cause chain observed: 1. PostgreSQL idle transaction timeout 2. Gateway state change failures 3. SSE connections terminated 4. MCP SDK task groups spin (anyio#695) This is a workaround until the MCP SDK properly handles cancellation. 
Relates to: IBM#2360 Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * Linting Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * fix(anyio): monkey-patch _deliver_cancellation to prevent CPU spin Root cause: anyio's _deliver_cancellation has no iteration limit. When tasks don't respond to CancelledError, it schedules call_soon() callbacks indefinitely, causing 100% CPU spin (anyio#695). Solution: - Monkey-patch CancelScope._deliver_cancellation to track iterations - Give up after 100 iterations and log warning - Clear _cancel_handle to stop further call_soon() callbacks Also switched from asyncio.wait_for() to anyio.move_on_after() for MCP session cleanup, which better propagates cancellation through anyio's cancel scope system. Trade-off: If cancellation gives up after 100 iterations, some tasks may not be properly cancelled. However, GUNICORN_MAX_REQUESTS=5000 worker recycling will eventually clean up orphaned tasks. Closes IBM#2360 Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * refactor(anyio): make _deliver_cancellation patch optional and disabled by default The anyio monkey-patch is now feature-flagged and disabled by default: - ANYIO_CANCEL_DELIVERY_PATCH_ENABLED=false (default) - ANYIO_CANCEL_DELIVERY_MAX_ITERATIONS=100 This allows testing performance with and without the patch, and easy rollback if upstream anyio/MCP SDK fixes the issue. Added: - Config settings for enabling/disabling the patch - apply_anyio_cancel_delivery_patch() function for explicit control - remove_anyio_cancel_delivery_patch() to restore original behavior - Documentation in .env.example and docker-compose.yml To enable: set ANYIO_CANCEL_DELIVERY_PATCH_ENABLED=true Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * docs: add comprehensive CPU spin loop mitigation documentation (IBM#2360) Add multi-layered documentation for CPU spin loop mitigation settings across all configuration files. 
This ensures operators understand and can tune the workarounds for anyio#695. Changes: - .env.example: Add Layer 1/2/3 headers with cross-references to docs and issue IBM#2360, document all 6 mitigation variables - README.md: Expand "CPU Spin Loop Mitigation" section with all 3 layers, configuration tables, and tuning tips - docker-compose.yml: Consolidate all mitigation variables into one section with SSE protection (Layer 1), cleanup timeouts (Layer 2), and experimental anyio patch (Layer 3) - charts/mcp-stack/values.yaml: Add comprehensive mitigation section with layer documentation and cross-references - docs/docs/operations/cpu-spin-loop-mitigation.md: NEW - Full guide with root cause analysis, 4-layer defense diagram, configuration tables, diagnostic commands, and tuning recommendations - docs/docs/.pages: Add Operations section to navigation - docs/docs/operations/.pages: Add nav for operations docs Mitigation variables documented: - Layer 1: SSE_SEND_TIMEOUT, SSE_RAPID_YIELD_WINDOW_MS, SSE_RAPID_YIELD_MAX - Layer 2: MCP_SESSION_POOL_CLEANUP_TIMEOUT, SSE_TASK_GROUP_CLEANUP_TIMEOUT - Layer 3: ANYIO_CANCEL_DELIVERY_PATCH_ENABLED, ANYIO_CANCEL_DELIVERY_MAX_ITERATIONS Related: IBM#2360, anyio#695, claude-agent-sdk#378 Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * feat(loadtest): aggressive spin detector with configurable timings Update spin detector load test for faster issue reproduction: - Increase user counts: 4000 → 4000 → 10000 pattern - Fast spawn rate: 1000 users/s - Shorter wait times: 0.01-0.1s between requests - Reduced connection timeouts: 5s (fail fast) Related: IBM#2360 Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * compose mitigation Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * load test Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * Defaults Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * Defaults Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * docs: add docstring to cancel_on_finish for 
interrogate coverage Add docstring to nested cancel_on_finish function in EventSourceResponse.__call__ to achieve 100% interrogate coverage. Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> --------- Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
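A note on the timeout-based polling change described in the commit message above: the idea can be sketched with asyncio alone. This is not the gateway's code and does not use Redis; `FakePubSub` and its `get_message(timeout=...)` method are hypothetical stand-ins, and the timeout and sleep values are chosen only to keep the demo fast. The sketch shows a listener that polls with a bounded wait, sleeps briefly on `None` or non-message results, and exits promptly when cancelled.

```python
import asyncio


class FakePubSub:
    """Hypothetical stand-in for a pubsub client (not the real Redis API)."""

    def __init__(self) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()

    async def publish(self, data: str) -> None:
        await self._queue.put({"type": "message", "data": data})

    async def get_message(self, timeout: float):
        # Return the next message, or None if nothing arrives within `timeout`.
        try:
            return await asyncio.wait_for(self._queue.get(), timeout=timeout)
        except asyncio.TimeoutError:
            return None


async def listen(pubsub: FakePubSub, received: list, poll_timeout: float = 0.05) -> None:
    # Poll with a bounded wait instead of blocking on an async iterator,
    # so the task regularly returns to the loop where cancellation can land.
    while True:
        message = await pubsub.get_message(timeout=poll_timeout)
        if message is None or message.get("type") != "message":
            # Minimum sleep so a get_message() that returns immediately cannot spin.
            await asyncio.sleep(0.01)
            continue
        received.append(message["data"])


async def demo() -> tuple:
    pubsub = FakePubSub()
    received: list = []
    task = asyncio.create_task(listen(pubsub, received))
    await pubsub.publish("hello")
    await asyncio.sleep(0.1)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        # CancelledError derives from BaseException (Python 3.8+),
        # so a plain `except Exception` would not catch it.
        pass
    return received, task.cancelled()


result = asyncio.run(demo())
```

Here `result` holds the received messages and whether the listener task ended as cancelled. The same shape should apply to a real client whose `get_message` accepts a timeout, but that mapping is an assumption to verify against the client in use.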

IBM#2507) Updates unique constraints for Resources and Prompts tables to support Gateway-level namespacing. Previously, these entities enforced uniqueness globally per Team/Owner (team_id, owner_email, uri/name). This prevented users from registering the same Gateway multiple times with different names. Changes: - Add gateway_id to unique constraints for resources and prompts - Add partial unique indexes for local items (where gateway_id IS NULL) - Make migration idempotent with proper existence checks Closes IBM#2352 Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

…BM#2517) * fix(transport): support mixed content types from MCP server tool call response Closes IBM#2512 This fix addresses tool invocation failures for tools that return complex content types (like ResourceLink, ImageContent, AudioContent) or contain Pydantic-specific types like AnyUrl. Root causes fixed: 1. tool_service.py: Usage of model_dump() without mode='json' preserved pydantic.AnyUrl objects, violating internal model's str type constraints. 2. streamablehttp_transport.py: Code blindly assumed types.TextContent, accessing .text on every item, which crashed for ResourceLink or ImageContent. Changes: - Updated tool_service.py to use model_dump(by_alias=True, mode='json'), forcing conversion of AnyUrl to JSON-compatible strings. - Refactored streamablehttp_transport.py to inspect content.type and correctly map to proper MCP SDK types (TextContent, ImageContent, AudioContent, ResourceLink, EmbeddedResource) ensuring full protocol compatibility. - Updated return type annotation to include all MCP content types. Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * fix(transport): preserve metadata in mixed content type conversion Addresses dropped metadata fields identified in PR IBM#2517 review: - Preserve annotations and _meta for TextContent, ImageContent, AudioContent - Preserve size and _meta for ResourceLink (critical for file metadata) - Handle EmbeddedResource via model_validate Add comprehensive regression tests for: - Mixed content types (text, image, audio, resource_link, embedded) - Metadata preservation (annotations, _meta, size) - Unknown content type fallback - Missing optional metadata handling Closes IBM#2512 Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * fix(transport): convert gateway Annotations to dict for MCP SDK compatibility mcpgateway.common.models.Annotations is a different Pydantic class from mcp.types.Annotations. 
Passing gateway Annotations directly to MCP SDK types causes ValidationError at runtime when real MCP responses include annotations. Fix: - Add _convert_annotations() helper to convert gateway Annotations to dict - Add _convert_meta() helper for consistent meta handling - Apply conversion to all content types (text, image, audio, resource_link) Add regression tests using actual gateway model types: - test_call_tool_with_gateway_model_annotations - test_call_tool_with_gateway_model_image_annotations These tests use mcpgateway.common.models.TextContent/ImageContent with mcpgateway.common.models.Annotations to verify the conversion works. Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * test(tool_service): add AnyUrl serialization tests for mode='json' fix Add explicit tests for the AnyUrl serialization fix (Issue IBM#2512 root cause): - test_anyurl_serialization_without_mode_json - demonstrates the problem - test_anyurl_serialization_with_mode_json - verifies the fix - test_resource_link_anyurl_serialization - ResourceLink uri field - test_tool_result_with_resource_link_serialization - ToolResult with ResourceLink - test_mixed_content_with_anyurl_serialization - mixed content types These tests verify that mode='json' in model_dump() correctly serializes AnyUrl objects to strings, preventing validation errors when content is passed to MCP SDK types. Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * docs(transport): add docstrings to _convert_annotations and _convert_meta Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * docs(transport): add Args/Returns to helper function docstrings Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> --------- Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> Co-authored-by: Mihai Criveti <crivetimihai@gmail.com>

Add user information (email, full_name, is_admin) to the plugin global context, enabling plugins like Cedar RBAC to make access control decisions based on user attributes beyond just email. Changes: - Add _inject_userinfo_instate() function to auth.py that populates global_context.user as a dictionary when include_user_info is enabled - Update GlobalContext.user type to Union[str, dict] for backward compat - Add include_user_info config option to plugin_settings (default: false) - Prevent tool_service from overwriting user dict with string email The feature is disabled by default to maintain backward compatibility with existing plugins that expect global_context.user to be a string. Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> Co-authored-by: Mihai Criveti <crivetimihai@gmail.com>

Signed-off-by: Shoumi <shoumimukherjee@gmail.com>

…BM#2529) * Add profiling tools, memray Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * Add profiling tools, memray Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * fix(db): release DB sessions before external HTTP calls to prevent pool exhaustion This commit addresses issue IBM#2518 where DB connection pool exhaustion occurred during A2A and RPC tool calls due to sessions being held during slow upstream HTTP requests. Changes: - tool_service.py: Extract A2A agent data to local variables before calling db.commit(), allowing HTTP calls to proceed without holding the DB session. The A2A tool invocation logic now uses pre-extracted data instead of querying during the HTTP call phase. - rbac.py: Add db.commit() and db.close() calls before returning user context in all authentication paths (proxy, anonymous, disabled auth). This ensures DB sessions are released early and not held during subsequent request processing. - test_rbac.py: Update test to provide mock db parameter and verify that db.commit() and db.close() are called for proper session cleanup. The fix follows the pattern established in other services: extract all needed data from ORM objects, call db.commit() to release the transaction, then proceed with external HTTP calls. This prevents "idle in transaction" states that exhaust PgBouncer's connection pool under high load. 
Load test results (4000 concurrent users, 1M+ requests): - Success rate: 99.81% - 502 errors reduced to 0.02% (edge cases with very slow upstreams) - P50: 450ms, P95: 4300ms Closes IBM#2518 Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * perf(config): tune connection pools for high concurrency Based on profiling with 4000 concurrent users (~2000 RPS): - MCP_SESSION_POOL_MAX_PER_KEY: 50 → 200 (reduce session creation) - IDLE_TRANSACTION_TIMEOUT: 120s → 300s (handle slow MCP calls) - CLIENT_IDLE_TIMEOUT: 120s → 300s (align with transaction timeout) - HTTPX_MAX_CONNECTIONS: 200 → 500 (more outbound capacity) - HTTPX_MAX_KEEPALIVE_CONNECTIONS: 100 → 300 - REDIS_MAX_CONNECTIONS: 150 → 100 (stay under maxclients) Results: - Failure rate: 0.446% → 0.102% (4.4x improvement) - RPC latency: 3,014ms → 1,740ms (42% faster) - CRUD latency: 1,207ms → 508ms (58% faster) See: todo/profile-full.md for detailed analysis Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> --------- Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

* fix(helm): stabilize chart templates and configs Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * fix(helm): align migration job with bootstrap Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * docs(helm): refresh chart README Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> --------- Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

* docs: sync env defaults and references Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * docs: sync env templates and performance tuning Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> --------- Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

* chore: stabilize coverage target Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * chore: reduce test warnings Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * chore: reduce test startup costs Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * chore: resolve bandit warning Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> --------- Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

* test(playwright): handle admin password change Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * test(playwright): stabilize admin UI flows Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> --------- Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

…sconfiguration (IBM#3033) * x-vault-tokens is not propagated even if the system is misconfigured Signed-off-by: popagruia <adrian.popa@ro.ibm.com> * fix(vault): strip X-Vault-Tokens header when system tag is missing and validate parsed token type - Strip vault header even when system cannot be determined from gateway metadata - Validate that parsed vault tokens are a JSON object (dict), not array/string/etc. - Remove misleading type annotation on orjson.loads result - Update test to verify header stripping on missing system tag Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * test: add coverage for no-system-tag and non-dict vault tokens paths Add two new test cases: - test_no_system_tag_no_vault_header_returns_empty: covers the branch where system key cannot be determined and no vault header is present - test_non_dict_json_vault_tokens_stripped: covers the isinstance guard ensuring non-dict JSON (e.g. arrays) strips the vault header without injecting auth headers Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> --------- Signed-off-by: popagruia <adrian.popa@ro.ibm.com> Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> Co-authored-by: popagruia <adrian.popa@ro.ibm.com> Co-authored-by: Mihai Criveti <crivetimihai@gmail.com>
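The type check mentioned in the vault commit above (accept only a JSON object, reject arrays, strings and other values) could look roughly like the following. This is a sketch under assumptions: `parse_vault_tokens` is a hypothetical name, the standard-library `json` module stands in for `orjson`, and the header stripping itself is left out.

```python
import json
from typing import Optional


def parse_vault_tokens(raw_header: Optional[str]) -> Optional[dict]:
    """Return the parsed tokens only when the header holds a JSON object."""
    if not raw_header:
        return None
    try:
        parsed = json.loads(raw_header)
    except ValueError:
        # json.JSONDecodeError is a ValueError subclass: malformed input.
        return None
    # Arrays, strings, numbers and null are valid JSON but not usable token maps.
    return parsed if isinstance(parsed, dict) else None
```

A caller would treat a `None` result as "do not inject auth headers", which matches the behavior the new tests are described as covering.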

…link and config hack (IBM#3022) * fix: allow external plugins to be tested w/o symbolic link and config hack Signed-off-by: habeck <habeck@us.ibm.com> * fix: add None check for get_plugin return value in performance profiler Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> --------- Signed-off-by: habeck <habeck@us.ibm.com> Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> Co-authored-by: Mihai Criveti <crivetimihai@gmail.com>

Git-based plugin sources do not support verification, causing `helm plugin install` to fail with "plugin source does not support verification" even when passing `--verify=false`. Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

* feat: add playwright tests for Users page Signed-off-by: Marek Dano <mk.dano@gmail.com> * fix: address review findings in Playwright Users page tests - Fix user_has_badge() to use exact text matching (get_by_text with exact=True) instead of substring matching that could match "Active" within "Inactive" - Fix pydocstyle D205/D400 in module docstrings - Handle nullable text_content() return in test_edit_user - Assert deactivate response status in test_activate_user setup step - Apply black formatting Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> --------- Signed-off-by: Marek Dano <mk.dano@gmail.com> Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> Co-authored-by: Mihai Criveti <crivetimihai@gmail.com>

…M#3058) Uncommented the .hidden { display: none; } rule in admin.css that was causing a Flash of Unstyled Content (FOUC) on page load. Until Tailwind CSS loads via CDN/JIT, the hidden class had no effect, causing all tab panels to be briefly visible. This was especially noticeable with the ToolOps panel which has hx-trigger="load" and starts fetching content immediately. Closes IBM#2933 Signed-off-by: SuciuDaniel <Daniel.Vasile.Suciu@ibm.com>

…igurable limits (IBM#2985) * feat(auth): add EntraID group overage Graph fallback limits Implement Microsoft Graph fallback for EntraID group overage in SSO flow, with configurable enable/timeout/max-group cap, expanded overage marker detection, bootstrap wiring, docs/config schema updates, and unit coverage. Closes IBM#2201 Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * fix(auth): resolve pylint R1716 in Entra group cap logic Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * test(auth): cover Entra Graph fallback edge branches Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * fix(auth): use securityEnabledOnly and enforce timeout upper bound in Graph fallback - Change getMemberObjects securityEnabledOnly from false to true so only security-enabled groups and directory roles are returned (excludes administrative units that could cause unintended role mappings) - Enforce 120s upper bound on graph_api_timeout from provider_metadata overrides, matching the config schema constraint (ge=1, le=120) Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * fix(auth): use lazy log formatting and remove defensive getattr in Graph fallback Replace f-string concatenation with %s lazy formatting in the overage warning log and drop unnecessary getattr() calls in sso_bootstrap since the settings fields are defined with defaults on the Settings class. Also improve overage troubleshooting docs and fix stale references. Signed-off-by: Jonathan Springer <jps@s390x.com> * fix(auth): reject null/unsupported types in Graph fallback enabled override Replace blind bool() coercion with explicit int handling and a warning for unrecognised types so that provider_metadata.graph_api_enabled=null no longer silently disables the Graph overage fallback. Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> --------- Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> Signed-off-by: Jonathan Springer <jps@s390x.com> Co-authored-by: Jonathan Springer <jps@s390x.com>

* 2986 - Add Read Only Hint to tool rows Signed-off-by: Gabriel Costa <gabrielcg@proton.me> * test(ui): add tests for readOnlyHint annotation icon rendering Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> --------- Signed-off-by: Gabriel Costa <gabrielcg@proton.me> Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> Co-authored-by: Mihai Criveti <crivetimihai@gmail.com>

* Hardening: safer CORS + localhost bind defaults Signed-off-by: Theodor N. Engøy <theodornengoy@Mac.home> * langchain agent: DRY env parsing + Makefile HOST override Signed-off-by: Theodor N. Engøy <theodornengoy@eduroam-193-157-246-146.wlan.uio.no> * fix(security): harden CORS wildcard guard, validate LOG_LEVEL, add tests - Fix CORS wildcard bypass: check parsed origin list instead of raw string so '*,https://example.com' is caught - Validate LOG_LEVEL against allowed uvicorn levels with fallback - Add 44 differential tests for env_utils and CORS configuration - Remove unused pytest import (Ruff F401) Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> --------- Signed-off-by: Theodor N. Engøy <theodornengoy@Mac.home> Signed-off-by: Theodor N. Engøy <theodornengoy@eduroam-193-157-246-146.wlan.uio.no> Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> Co-authored-by: Theodor N. Engøy <theodornengoy@Mac.home> Co-authored-by: Theodor N. Engøy <theodornengoy@eduroam-193-157-246-146.wlan.uio.no> Co-authored-by: Mihai Criveti <crivetimihai@gmail.com>
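To illustrate the CORS wildcard bypass named in the commit above: a guard that compares the raw setting against `"*"` misses a value such as `'*,https://example.com'`, while a guard that inspects the parsed list catches it. The function names below are hypothetical and the comma-separated format is an assumption taken from the example string in the commit message.

```python
def parse_origins(raw: str) -> list:
    """Split a comma-separated origin setting into trimmed, non-empty entries."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def raw_wildcard_check(raw: str) -> bool:
    # Naive guard: only matches when the whole setting is exactly "*".
    return raw.strip() == "*"


def parsed_wildcard_check(raw: str) -> bool:
    # Hardened guard: a wildcard anywhere in the parsed list counts.
    return "*" in parse_origins(raw)


mixed = "*,https://example.com"
naive_result = raw_wildcard_check(mixed)
hardened_result = parsed_wildcard_check(mixed)
```

With the mixed value, the naive check reports no wildcard and the parsed check reports one, which is the gap the fix is described as closing.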

…y paths (IBM#3090) * feat: improve log hygiene across auth and gateway flows - streamline auth/team/sso/gateway log messages for consistency - remove token-derived value details from routine debug/error logs - add regression tests for logging behavior (unit + AST-based checks) - cover oversized SSO callback branch behavior - make middleware overhead timing test more deterministic Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * fix(ci): make helm-unittest install compatible with plugin verification defaults Use --verify=false when installing helm-unittest in linting-helm-unittest to avoid CI failures with plugin sources that do not provide verification metadata. Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> --------- Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

* docs: rebrand to ContextForge AI Gateway consistently across the project Unify all product naming from the inconsistent mix of "MCP Gateway", "Context Forge", "MCP Context Forge", and "ContextForge MCP Gateway" to consistently use "ContextForge" (or "ContextForge AI Gateway" for the full product name). Updated positioning to reflect all supported gateway patterns: - Tools Gateway (MCP, REST, gRPC, TOON) - Agent Gateway (A2A, OpenAI, Anthropic) - Model Gateway (LLM proxy, OpenAI API spec, 8+ providers) - API Gateway (rate limiting, auth, retries, reverse proxy) - Plugin Extensibility (40+ plugins) - Observability (OpenTelemetry) Preserved all code identifiers: mcpgateway (Python module), mcp-contextforge-gateway (PyPI), mcp-context-forge (GitHub/Docker), MCPGATEWAY_* (env vars), mcpContextForge (Helm), mcp.db (database). Closes IBM#2714 Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * fix: address review feedback on ContextForge rebrand - Fix failing playwright test (MCP_Gateway → ContextForge in Swagger title assertion) - Fix "ContextForge (ContextForge)" redundant parenthetical (6 locations) - Fix "ContextForges" wrong plural → "Gateways" or "ContextForge instances" - Fix missed "MCP Context-Forge" hyphenated variant (7 locations) - Fix missed "MCP CONTEXT FORGE" in Makefile header - Fix missed lowercase "MCP context forge" / "Context forge" in toolops, plugins - Drop article "the" before ContextForge (brand names don't take articles) - Fix "the ContextForge's" → "ContextForge's" - Update APP_NAME defaults in run.sh, .env.example, Helm schema, config schema Closes IBM#2714 Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * feat: enable LLMCHAT_ENABLED by default Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * docs: update LLMCHAT_ENABLED default to true in docs and charts Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> --------- Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

…#3092) Replace the manual `mcpgateway-dev` CE secret with an automated flow that builds `.env.deploy` from `.env.example` + GitHub Secrets (CF_* prefix) on every push to main. Secrets are never logged. Key changes: - Add early validation step (fail-fast before build/push) - Generate .env.deploy via Python (safe for special chars in secrets) - Reject secrets containing embedded newlines - Assert all expected keys were replaced in the template - Update-or-create pattern for CE secret (atomic, no data loss) - Cleanup .env.deploy via trap on exit Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
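A rough sketch of the template-rendering step listed above (reject embedded newlines, replace values, assert every expected key was replaced). The workflow's actual script is not shown on this page, so `render_env_file`, its arguments, and the `KEY=value` line handling are all assumptions made for illustration.

```python
def render_env_file(template: str, secrets: dict) -> str:
    """Replace KEY=value lines in an env template with secret values."""
    for key, value in secrets.items():
        # A newline inside a value would silently start a new env entry.
        if "\n" in value or "\r" in value:
            # Report the key only; the value itself is never put in a message.
            raise ValueError(f"secret {key} contains an embedded newline")

    replaced = set()
    rendered = []
    for line in template.splitlines():
        key, sep, _ = line.partition("=")
        key = key.strip()
        if sep and key in secrets:
            rendered.append(f"{key}={secrets[key]}")
            replaced.add(key)
        else:
            # Comments, blank lines and unrelated settings pass through unchanged.
            rendered.append(line)

    missing = sorted(set(secrets) - replaced)
    if missing:
        raise RuntimeError(f"keys not found in template: {missing}")
    return "\n".join(rendered) + "\n"


example = render_env_file(
    "APP_NAME=demo\nJWT_SECRET_KEY=changeme\n",
    {"JWT_SECRET_KEY": "s3cr3t=="},
)
```

Doing the substitution in Python rather than with shell text tools avoids quoting problems when a secret contains characters such as `=`, `&` or `/`, which is presumably what "safe for special chars" refers to.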

…BM#3093) PR IBM#3091 mechanically replaced "MCP Gateway" with "ContextForge" in several places, creating nonsensical text like "Enterprise ContextForge" and fake product names like "Apigee ContextForge". Fixes: - Title: "Enterprise ContextForge" → "Enterprise AI, Agent and MCP Gateway" - Heading: "Why an ContextForge?" → "Why ContextForge?" - Heading: "ContextForge Landscape" → "MCP Gateway Landscape" - Vendor names: restore Apigee MCP Hub, Azure API Management, Docker MCP Toolkit - Mermaid diagram: "ContextForge Options" → "MCP Gateway Options" - Nav label: "Why use an ContextForge" → "Why use ContextForge" - Roadmap IBM#2272: restore original issue title with "MCP Gateway" - Copilot docs: "An ContextForge running" → "ContextForge running" Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

…k handler Signed-off-by: Oriol Morros Vilaseca <OM368@student.aru.ac.uk>

marekdano

@omorros - thanks for the contribution!

✅ Root Cause Correctly Identified & Fixed

The issue was that |tojson filter outputs JSON with double quotes, causing HTML attribute corruption:

Broken output: onclick="fetchToolsForGateway("uuid", "name")"
Browser interpretation: Attribute ends at first internal ", breaking the onclick handler
Fixed output: onclick="fetchToolsForGateway('uuid', 'name')"
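The attribute truncation described above can be reproduced offline with Python's standard-library HTML parser. This is only an approximation of browser behavior, and the `uuid`/`name` values are the placeholders from this comment, not real gateway data.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Record the attributes of the first start tag seen."""

    def __init__(self) -> None:
        super().__init__()
        self.attrs: dict = {}

    def handle_starttag(self, tag, attrs):
        if not self.attrs:
            self.attrs = dict(attrs)


def onclick_of(markup: str):
    """Return the onclick attribute value as the parser sees it."""
    parser = AttrCollector()
    parser.feed(markup)
    parser.close()
    return parser.attrs.get("onclick")


# Double-quoted JSON strings inside a double-quoted attribute (the |tojson output).
broken = '<button onclick="fetchToolsForGateway("uuid", "name")">Fetch Tools</button>'
# Single-quoted JS literals inside the double-quoted attribute (the fix).
fixed = "<button onclick=\"fetchToolsForGateway('uuid', 'name')\">Fetch Tools</button>"

broken_onclick = onclick_of(broken)
fixed_onclick = onclick_of(fixed)
```

For the broken markup the parser ends the attribute at the first inner double quote, leaving `fetchToolsForGateway(` as the handler; for the fixed markup the full call survives. Note that single-quoted literals still assume the rendered values contain no apostrophes; Jinja's documentation for `tojson` suggests either a single-quoted attribute or the `forceescape` filter when the output goes into an HTML attribute.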

✅ Consistency with Working Code

The fix matches the working implementation in admin.html (line 4974), which already uses single quotes correctly:
```html
onclick="fetchToolsForGateway('{{ gateway.id }}', '{{ gateway.name }}')"
```
Tested manually in the UI, and it works as expected.

LGTM 🚀 - This is a clean, focused fix that directly addresses the root cause. The change is consistent with existing working code and follows best practices for Jinja template onclick handlers.

crivetimihai · 2026-02-23T14:23:56Z

Thanks @omorros — clean fix for the broken onclick handler. Matches the pattern used elsewhere in the templates.

crivetimihai · 2026-02-24T09:32:30Z

Reopened as #3179. CI/CD will re-run on the new PR. You are still credited as the author.

araujof and others added 30 commits January 24, 2026 19:33

test: add missing fields required by pydantic validation (IBM#2257)

23b8df8

Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>

make token teams consistent for sso (IBM#2252)

75e2460

Signed-off-by: Madhav Kandukuri <madhav165@gmail.com>

perf: reduce SQLite busy_timeout default to 5s and make configurable (I…

7b6e7f3

…BM#2314) Signed-off-by: Satya <tsp.0713@gmail.com>

Docs update and bug template update

752fbfc

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

Docs update and bug template update

6e80822

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

selecting mcp gateway

d032b16

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

prevent ReDoS in plugin regex patterns (IBM#2513)

c0fd884

Signed-off-by: Shoumi <shoumimukherjee@gmail.com>

update llms.txt (IBM#2540)

0b8a49c

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

popagruia and others added 13 commits February 20, 2026 16:19

fix: replace tojson with single-quoted literals in Fetch Tools onclic…

36e5bcf

…k handler Signed-off-by: Oriol Morros Vilaseca <OM368@student.aru.ac.uk>

omorros requested a review from crivetimihai as a code owner February 22, 2026 02:35

omorros changed the title ~~fix: replace tojson with single-quoted literals in Fetch Tools onclic…~~ fix: replace tojson with single-quoted literals in Fetch Tools onclick handler Feb 22, 2026

marekdano self-requested a review February 23, 2026 13:07

marekdano approved these changes Feb 23, 2026

crivetimihai changed the title ~~fix: replace tojson with single-quoted literals in Fetch Tools onclick handler~~ fix(ui): replace tojson with single-quoted literals in Fetch Tools onclick Feb 23, 2026

crivetimihai added the labels bug (Something isn't working), ui (User Interface), and SHOULD (P2: Important but not vital; high-value items that are not crucial for the immediate release) Feb 23, 2026

crivetimihai added this to the Release 1.0.0-GA milestone Feb 23, 2026

crivetimihai closed this Feb 24, 2026

crivetimihai force-pushed the main branch from ef37827 to 517ced4 February 24, 2026 08:49

crivetimihai mentioned this pull request Feb 24, 2026

fix(ui): replace tojson with single-quoted literals in Fetch Tools onclick #3179

Closed

5 tasks

This was referenced Feb 25, 2026

fix(ui): admin UI "Show" toggle for gateway tokens, passwords, and header values #3201

Closed

feat(ui): persist admin table filters across HTMX pagination and part… #3204

Closed

omorros mentioned this pull request Mar 12, 2026

feat(ui): persist admin table filters across HTMX pagination and partial refresh #3647

Merged

10 tasks

omorros wants to merge 2232 commits into IBM:main from omorros:fix/fetch-tools-button-tojson-escaping

20 participants
