Skip to content

Fix Orphaned Gateway Registration Bug along with Isolating Structured log Session and Handling Flush Errors#3001

Merged
crivetimihai merged 1 commit intomainfrom
2987_gateway_failure_revert
Feb 17, 2026
Merged

Fix Orphaned Gateway Registration Bug along with Isolating Structured log Session and Handling Flush Errors#3001
crivetimihai merged 1 commit intomainfrom
2987_gateway_failure_revert

Conversation

@kevalmahajan
Copy link
Copy Markdown
Member

@kevalmahajan kevalmahajan commented Feb 17, 2026

🐛 Bug-fix PR

Closes #2987

📌 Summary

Fixes a bug where gateway registration failures (e.g., tool schema validation errors) leave orphaned gateway records in the database, blocking subsequent registration attempts with the same name.

The root cause is missing db.rollback() calls in exception handlers, along with two other contributing issues:

  • the structured logger committing the caller's dirty session.
  • the before_commit_handler silently swallowing flush exceptions.

Key changes:

  • Added db.rollback() to 13 exception handlers across 4 methods in gateway_service.py
  • IMPORTANT ONE: Isolated structured logger DB persistence using fresh_db_session()
  • Made before_commit_handler propagate flush exceptions instead of swallowing them
  • Updated all affected tests to match new behavior

🔁 Reproduction Steps

  1. Register a gateway with a tool that has an invalid schema (e.g., missing required inputSchema field)
  2. Registration fails with a validation error (expected)
  3. Attempt to register the same gateway name again
  4. Bug: Second registration fails with "Gateway name already exists" even though the first registration failed
  5. The orphaned gateway record has no tools associated with it

🐞 Root Cause

Three issues combine to cause the bug:

  1. Primary cause: Missing db.rollback() in exception handlers:
    In register_gateway(), after db.add(db_gateway) and db.flush() succeed (persisting the gateway row), tool validation fails during db.flush() of tool records. The except* handlers catch the error but never call db.rollback(), leaving the gateway object in the session's dirty state.

  2. Contributing factor 1: before_commit_handler swallows exceptions:
    db.py's before_commit_handler wraps session.flush() in a bare except Exception: pass, silently discarding any flush failure during commit. This means even when db.commit() is eventually called, the failed flush doesn't cause the commit to fail.

  3. Contributing factor 2: Structured logger commits caller's session:
    structured_logger._persist_to_database() was called with the caller's db session and called db.add(log_entry) + db.commit() on it. This inadvertently committed the orphaned gateway record along with the log entry.

The chain: db.flush() fails on tools -> exception handler skips rollback -> structured_logger.log(db=db) calls db.commit() on same session -> before_commit_handler swallows re-flush failure -> gateway record persisted without tools -> name permanently reserved.

💡 Fix Description

  1. Added db.rollback() to all exception handlers (gateway_service.py)
    Added explicit db.rollback() calls before any logging or re-raising in all exception handlers across 4 methods:

    • register_gateway - 7 except* handlers (validation errors, integrity errors, connection errors, timeout errors, general exceptions)
    • update_gateway - 3 exception handlers (name conflict, not found, integrity error)
    • set_gateway_state - 1 exception handler (permission error)
    • fetch_tools_after_oauth - 2 exception handlers (connection error, general exception)
      db.rollback() is always safe to call - it's a no-op on a clean session with no pending changes.
  2. Isolated structured logger persistence (structured_logger.py)
    Changed _persist_to_database() to use fresh_db_session() instead of the caller's session. This ensures:

    • Log persistence can never accidentally commit the caller's dirty state
    • Logs are persisted even when the business transaction fails
    • A broken caller session doesn't prevent log persistence
    • Complete session lifecycle isolation (fresh_db_session handles commit/rollback/close)
  3. Propagate flush exceptions in before_commit_handler (db.py)
    Removed the bare except Exception: pass from before_commit_handler. The handler now simply calls session.flush() without catching exceptions, so flush failures properly cause the commit to fail, triggering rollback in get_db().

  4. Updated tests

    • test_structured_logger.py - Updated 7 tests to mock fresh_db_session instead of mock_db; replaced 2 tests with fresh-session-aware equivalents
    • test_logging_service_comprehensive.py - Updated 2 tests for fresh_db_session pattern
    • test_db.py - Updated before_commit_handler test to verify exception propagation instead of swallowing

🧪 Verification

Check Command Status
Lint suite make lint
Unit tests make test
Coverage ≥ 80 % make coverage
Manual regression no longer fails steps / screenshots

📐 MCP Compliance (if relevant)

  • Matches current MCP spec
  • No breaking change to MCP clients

✅ Checklist

  • Code formatted (make black isort pre-commit)
  • No secrets/credentials committed

@kevalmahajan kevalmahajan added the wxo wxo integration label Feb 17, 2026
@kevalmahajan kevalmahajan force-pushed the 2987_gateway_failure_revert branch from 6a41c5e to a2e8a6e Compare February 17, 2026 13:41
@crivetimihai crivetimihai force-pushed the 2987_gateway_failure_revert branch from a2e8a6e to 9aa2761 Compare February 17, 2026 14:27
…gger session

- Add db.rollback() to exception handlers in gateway_service.py to
  prevent orphaned gateway records when registration fails after flush
- Isolate structured logger DB persistence using fresh_db_session() to
  avoid accidentally committing caller's dirty session state
- Propagate flush exceptions in before_commit_handler instead of
  swallowing them so commit failures trigger proper rollback
- Remove db parameter from structured_logger.log() signature since
  the logger now manages its own independent session
- Update tests to verify new session isolation and exception propagation

Closes #2987

Signed-off-by: Keval Mahajan <mahajankeval23@gmail.com>
Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
@crivetimihai crivetimihai force-pushed the 2987_gateway_failure_revert branch from 9aa2761 to 5c48bb6 Compare February 17, 2026 14:37
Copy link
Copy Markdown
Member

@crivetimihai crivetimihai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: Approved

The core fix is correct and well-reasoned. The root cause chain — missing db.rollback() → structured logger committing caller's dirty session → before_commit_handler swallowing flush errors — is accurately identified and all three links are properly addressed.

Changes made during rebase

Squashed the 4 original commits into a single clean commit preserving original author, and applied the following fixes:

1. Completed db=db cleanup (12 remaining call sites)

The original PR removed db=db from structured_logger.log() callers in several files but missed 12 call sites across 5 files. Since log() now accepts **kwargs, leftover db=db would leak a SQLAlchemy Session object into the log entry dict via entry.update(kwargs). Removed from:

  • mcpgateway/admin.py — 3 calls (plugin marketplace error handlers)
  • mcpgateway/services/gateway_service.py — 4 calls (get_gateway, delete_gateway success + exception handlers)
  • mcpgateway/services/prompt_service.py — 2 calls (bulk registration, get_prompt)
  • mcpgateway/services/resource_service.py — 2 calls (bulk registration, get_resource)
  • mcpgateway/services/tool_service.py — 1 call (get_tool)

Verified zero remaining via AST scan.

2. Fixed stale rollback test + added missing rollback assertions

  • test_register_gateway_exception_rollback: Updated stale comment ("method doesn't actually call rollback") and changed assert_called()assert_called_once() for strictness.
  • Added test_db.rollback = Mock() setup and test_db.rollback.assert_called_once() assertions to test_register_gateway_value_error, test_register_gateway_runtime_error, and test_register_gateway_integrity_error — these exercise the core bug fix paths.

3. Removed vacuous test assertions

Two tests (test_logging_service_comprehensive.py, test_structured_logger.py) created caller_db = MagicMock() but never passed it to _persist_to_database(), making caller_db.add.assert_not_called() always-true. Replaced with mock_fresh.assert_called_once() to validate fresh session usage.

What was verified

  • All 482+ affected tests pass (gateway_service, structured_logger, logging_service_comprehensive, test_db)
  • AST scan confirms zero structured_logger calls with db=db remain
  • fresh_db_session() correctly auto-commits via get_db() context manager — no explicit commit() needed
  • before_commit_handler exception propagation is safe — get_db() has proper nested try/except/finally with rollback + invalidate fallback

@crivetimihai crivetimihai merged commit ca07c47 into main Feb 17, 2026
54 checks passed
@crivetimihai crivetimihai deleted the 2987_gateway_failure_revert branch February 17, 2026 15:06
vishu-bh pushed a commit that referenced this pull request Feb 18, 2026
…gger session (#3001)

- Add db.rollback() to exception handlers in gateway_service.py to
  prevent orphaned gateway records when registration fails after flush
- Isolate structured logger DB persistence using fresh_db_session() to
  avoid accidentally committing caller's dirty session state
- Propagate flush exceptions in before_commit_handler instead of
  swallowing them so commit failures trigger proper rollback
- Remove db parameter from structured_logger.log() signature since
  the logger now manages its own independent session
- Update tests to verify new session isolation and exception propagation

Closes #2987

Signed-off-by: Keval Mahajan <mahajankeval23@gmail.com>
Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
Signed-off-by: Vishu Bhatnagar <vishu.bhatnagar@ibm.com>
cafalchio pushed a commit that referenced this pull request Feb 26, 2026
…gger session (#3001)

- Add db.rollback() to exception handlers in gateway_service.py to
  prevent orphaned gateway records when registration fails after flush
- Isolate structured logger DB persistence using fresh_db_session() to
  avoid accidentally committing caller's dirty session state
- Propagate flush exceptions in before_commit_handler instead of
  swallowing them so commit failures trigger proper rollback
- Remove db parameter from structured_logger.log() signature since
  the logger now manages its own independent session
- Update tests to verify new session isolation and exception propagation

Closes #2987

Signed-off-by: Keval Mahajan <mahajankeval23@gmail.com>
Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

wxo wxo integration

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG]: When a toolkit import fails for any reason, subsequent attempts to import a toolkit with the same tool name are blocked

2 participants