fix(db): Guard against inactive transaction during async cleanup by crivetimihai · Pull Request #2351 · IBM/mcp-context-forge

crivetimihai · 2026-01-23T14:49:15Z

Summary

This PR fixes two related issues affecting gateway registration with Moody's MCP server:

[BUG]: MCP CF crashes while listing tools from moody's mcp server #2341 - Transaction Inactive Crash: Gateway crashes with InvalidRequestError: This transaction is inactive after successfully fetching tools/resources
[BUG]: Multiple gateway import failing with unique constraint error for resources #2352 - Unique Constraint Violation: Re-adding a gateway fails with UniqueViolation on uq_team_owner_uri_resource

Root Cause Analysis

Issue #2341: Transaction Inactive During Async Cleanup

Timeline from logs:

Timestamp	Event
09:42:54.938Z	`Fetched 10 tools from gateway` ✓
09:42:55.506Z	`Fetched 53 resources from gateway` ✓
09:42:56.024Z	`Fetched 1 resource templates from gateway` ✓
09:42:56.378Z	`DELETE .../mcp "HTTP/1.1 204 No Content"` ✓
09:42:56.416Z	`InvalidRequestError('This transaction is inactive')` ✗

Root Cause: CancelledError during MCP session cleanup (DELETE request) causes the transaction to become inactive. Then get_db() tries to commit() on the inactive transaction.

Fix: Add db.is_active checks before commit() and rollback() in get_db().

Issue #2352: Unique Constraint Violation on Re-Registration

The Constraint:

# Resources unique on (team_id, owner_email, uri) - NOT including gateway_id
UniqueConstraint("team_id", "owner_email", "uri", name="uq_team_owner_uri_resource")

Failure Sequence:

Gateway A registered → resources created with (team_id, owner_email, uri)
Gateway A deleted → but [BUG]: MCP CF crashes while listing tools from moody's mcp server #2341 crash leaves orphaned resources
Gateway B registered (same MCP server) → tries to INSERT same resources → CONFLICT

Fix: Add upsert logic for orphaned resources and prompts only:

Query for existing records where gateway_id IS NULL or points to non-existent gateway
Update orphaned records instead of creating duplicates
Resources belonging to active gateways are NOT touched

Changes

`mcpgateway/main.py`

Added db.is_active guard before commit() in get_db()
Added db.is_active guard before rollback() in exception handler
Added db.invalidate() fallback for broken connections

`mcpgateway/services/gateway_service.py`

Added upsert logic for orphaned resources only (query by team_id, owner_email, uri)
Added upsert logic for orphaned prompts only (query by team_id, owner_email, name)
Uses per-resource team/owner overrides when building lookup key
Only updates records where gateway_id IS NULL or gateway doesn't exist

`scripts/cleanup_orphaned_resources.py` (NEW)

Cleanup script to remove orphaned records from affected databases

`tests/unit/mcpgateway/services/test_gateway_resources_prompts.py`

Added test_register_gateway_updates_orphaned_resources
Added test_register_gateway_does_not_update_active_gateway_resources

Manual Cleanup (For Affected Users)

If you experienced the #2341 crash, you may have orphaned resources/prompts/tools in your database. A cleanup script is provided:

# Dry run (default) - shows what would be deleted
python scripts/cleanup_orphaned_resources.py

# Actually delete orphaned records
python scripts/cleanup_orphaned_resources.py --execute

# Filter by team or owner
python scripts/cleanup_orphaned_resources.py --team-id <team_id>
python scripts/cleanup_orphaned_resources.py --owner-email <email>

Note: The upsert fix will handle orphaned records automatically on re-registration (they get reassigned to the new gateway). The cleanup script is only needed if you want to proactively remove orphaned records without re-registering.

Code Review Findings (Addressed)

Finding	Severity	Status
Upsert updates any existing resource without verifying it's orphaned	High	✅ Fixed - now only updates records where `gateway_id IS NULL` or gateway doesn't exist
Resource lookup uses gateway-level keys but inserts use per-resource overrides	Medium	✅ Fixed - lookup now uses per-resource `team_id` and `owner_email` overrides
No tests for upsert logic	Low	✅ Fixed - added 2 tests for orphaned resource handling

Test Plan

Run gateway service tests: 75 passed
Run gateway resources/prompts tests: 6 passed
Run all service tests: 1795 passed, 34 skipped
Integration test with Moody's MCP server

Closes #2341
Closes #2352

When registering MCP servers with long initialization times (like Moody's), a CancelledError can occur during the MCP session teardown (DELETE request). This causes the database transaction to become inactive before get_db() attempts to commit, resulting in: sqlalchemy.exc.InvalidRequestError: This transaction is inactive Add db.is_active checks before commit() and rollback() to handle cases where the transaction becomes inactive during async context manager cleanup. Closes #2341 Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

crivetimihai · 2026-01-23T14:49:30Z

Root Cause Analysis

Timeline from Logs

Timestamp	Event
09:42:54.938Z	`Fetched 10 tools from gateway` ✓
09:42:55.506Z	`Fetched 53 resources from gateway` ✓
09:42:56.024Z	`Fetched 1 resource templates from gateway` ✓
09:42:56.378Z	`DELETE https://screening-ai-services.npe.moodys.cloud/mcp "204 No Content"` ✓
09:42:56.416Z	`InvalidRequestError('This transaction is inactive')` ✗
09:42:56.418Z	`POST /gateways HTTP/1.1" 500` ✗

The Trigger

The logs show a CancelledError during MCP session cleanup:

receive_response_body.failed exception=CancelledError("Cancelled via cancel scope ... by
<Task pending name='starlette.middleware.base.BaseHTTPMiddleware.__call__.<locals>.call_next.<locals>.coro' ...")

Sequence of Events

Request starts → get_db() creates session, yields it
Gateway registration → Tools/resources fetched successfully
MCP session cleanup → DELETE request sent
CancelledError → During response body reception
Exception propagates → Transaction implicitly rolled back
FastAPI cleanup → get_db() resumes after yield
Commit attempted → Transaction already inactive → CRASH

Moody's Server Details

Tools: 10 (getLegalDisclaimers, getAuthStatus, screenAndWait, etc.)
Resources: 53
Resource Templates: 1
Protocol Version: 2025-11-25

…straint violations When re-registering a gateway (e.g., after deletion or crash), orphaned resources and prompts from previous registrations could cause unique constraint violations on `(team_id, owner_email, uri)` for resources and `(team_id, owner_email, name)` for prompts. This fix adds upsert logic that: 1. Queries for existing resources/prompts matching the unique constraint 2. Updates existing records instead of creating duplicates 3. Creates new records only when no match exists This handles scenarios like: - Gateway deletion that didn't properly clean up resources (issue #2341) - Re-registration of the same MCP server under a new gateway name - Race conditions during concurrent operations Closes #2352 Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

Adds a utility script to identify and remove database records that were left orphaned due to incomplete gateway deletions (e.g., #2341 crash). Usage: # Dry run (default) - shows what would be deleted python scripts/cleanup_orphaned_resources.py # Actually delete orphaned records python scripts/cleanup_orphaned_resources.py --execute # Filter by team or owner python scripts/cleanup_orphaned_resources.py --team-id <id> --owner-email <email> Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

Addresses code review findings: 1. HIGH: Only update truly orphaned records (gateway_id IS NULL or points to non-existent gateway). Resources belonging to active gateways are no longer at risk of being reassigned. 2. MEDIUM: Use per-resource team/owner overrides when building lookup key, matching exactly what would be inserted to avoid constraint mismatches. 3. LOW: Added tests for orphaned resource upsert logic: - test_register_gateway_updates_orphaned_resources - test_register_gateway_does_not_update_active_gateway_resources Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

Addresses remaining code review findings: 1. ROLLBACK GUARD FIX: - REMOVED the `if db.is_active:` check before `db.rollback()` - Empirical testing proved that: * After IntegrityError, is_active becomes False * rollback() when is_active=False SUCCEEDS (doesn't fail!) * rollback() restores is_active to True, cleaning up the session * Skipping rollback when is_active=False leaves session unusable - The is_active guard for commit() is CORRECT (commit fails when False) - The is_active guard for rollback() was WRONG (rollback is always safe) 2. TEST ASSERTIONS: - Rewrote orphaned resource tests with proper assertions - Tests now directly verify: * Orphaned resources are detected and added to map * Resource fields are actually updated during upsert * Resources with deleted gateways are detected as orphaned * Resources with active gateways are NOT touched * Per-resource owner/team overrides are used in lookup key Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

…#2351) * fix(db): Guard against inactive transaction during async cleanup When registering MCP servers with long initialization times (like Moody's), a CancelledError can occur during the MCP session teardown (DELETE request). This causes the database transaction to become inactive before get_db() attempts to commit, resulting in: sqlalchemy.exc.InvalidRequestError: This transaction is inactive Add db.is_active checks before commit() and rollback() to handle cases where the transaction becomes inactive during async context manager cleanup. Closes IBM#2341 Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * fix: Add upsert logic for resources and prompts to prevent unique constraint violations When re-registering a gateway (e.g., after deletion or crash), orphaned resources and prompts from previous registrations could cause unique constraint violations on `(team_id, owner_email, uri)` for resources and `(team_id, owner_email, name)` for prompts. This fix adds upsert logic that: 1. Queries for existing resources/prompts matching the unique constraint 2. Updates existing records instead of creating duplicates 3. Creates new records only when no match exists This handles scenarios like: - Gateway deletion that didn't properly clean up resources (issue IBM#2341) - Re-registration of the same MCP server under a new gateway name - Race conditions during concurrent operations Closes IBM#2352 Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * chore: Add cleanup script for orphaned resources/prompts/tools Adds a utility script to identify and remove database records that were left orphaned due to incomplete gateway deletions (e.g., IBM#2341 crash). Usage: # Dry run (default) - shows what would be deleted python scripts/cleanup_orphaned_resources.py # Actually delete orphaned records python scripts/cleanup_orphaned_resources.py --execute # Filter by team or owner python scripts/cleanup_orphaned_resources.py --team-id <id> --owner-email <email> Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * fix: Only upsert orphaned resources/prompts, add tests Addresses code review findings: 1. HIGH: Only update truly orphaned records (gateway_id IS NULL or points to non-existent gateway). Resources belonging to active gateways are no longer at risk of being reassigned. 2. MEDIUM: Use per-resource team/owner overrides when building lookup key, matching exactly what would be inserted to avoid constraint mismatches. 3. LOW: Added tests for orphaned resource upsert logic: - test_register_gateway_updates_orphaned_resources - test_register_gateway_does_not_update_active_gateway_resources Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * fix: Always call rollback() in exception handler, improve tests Addresses remaining code review findings: 1. ROLLBACK GUARD FIX: - REMOVED the `if db.is_active:` check before `db.rollback()` - Empirical testing proved that: * After IntegrityError, is_active becomes False * rollback() when is_active=False SUCCEEDS (doesn't fail!) * rollback() restores is_active to True, cleaning up the session * Skipping rollback when is_active=False leaves session unusable - The is_active guard for commit() is CORRECT (commit fails when False) - The is_active guard for rollback() was WRONG (rollback is always safe) 2. TEST ASSERTIONS: - Rewrote orphaned resource tests with proper assertions - Tests now directly verify: * Orphaned resources are detected and added to map * Resource fields are actually updated during upsert * Resources with deleted gateways are detected as orphaned * Resources with active gateways are NOT touched * Per-resource owner/team overrides are used in lookup key Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * chore: Add file encoding header to cleanup script Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> --------- Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

crivetimihai added 2 commits January 23, 2026 15:15

crivetimihai requested review from kevalmahajan and madhav165 as code owners January 23, 2026 15:39

crivetimihai added 2 commits January 23, 2026 15:48

chore: Add file encoding header to cleanup script

5b3e9f7

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

crivetimihai merged commit 5f14122 into main Jan 23, 2026
53 checks passed

crivetimihai deleted the 2341-mcp-issue branch January 23, 2026 17:06

kevalmahajan mentioned this pull request Jan 26, 2026

Consistent duplicate constraints/working in tools, prompts and resources #2507

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(db): Guard against inactive transaction during async cleanup#2351

fix(db): Guard against inactive transaction during async cleanup#2351
crivetimihai merged 6 commits intomainfrom
2341-mcp-issue

crivetimihai commented Jan 23, 2026 •

edited

Loading

Uh oh!

crivetimihai commented Jan 23, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

crivetimihai commented Jan 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Root Cause Analysis

Issue #2341: Transaction Inactive During Async Cleanup

Issue #2352: Unique Constraint Violation on Re-Registration

Changes

mcpgateway/main.py

mcpgateway/services/gateway_service.py

scripts/cleanup_orphaned_resources.py (NEW)

tests/unit/mcpgateway/services/test_gateway_resources_prompts.py

Manual Cleanup (For Affected Users)

Code Review Findings (Addressed)

Test Plan

Uh oh!

crivetimihai commented Jan 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Root Cause Analysis

Timeline from Logs

The Trigger

Sequence of Events

Moody's Server Details

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

crivetimihai commented Jan 23, 2026 •

edited

Loading

`mcpgateway/main.py`

`mcpgateway/services/gateway_service.py`

`scripts/cleanup_orphaned_resources.py` (NEW)

`tests/unit/mcpgateway/services/test_gateway_resources_prompts.py`

crivetimihai commented Jan 23, 2026 •

edited

Loading