fix(db): Guard against inactive transaction during async cleanup#2351
Merged
crivetimihai merged 6 commits intomainfrom Jan 23, 2026
Merged
fix(db): Guard against inactive transaction during async cleanup#2351crivetimihai merged 6 commits intomainfrom
crivetimihai merged 6 commits intomainfrom
Conversation
When registering MCP servers with long initialization times (like Moody's), a CancelledError can occur during the MCP session teardown (DELETE request). This causes the database transaction to become inactive before get_db() attempts to commit, resulting in: sqlalchemy.exc.InvalidRequestError: This transaction is inactive Add db.is_active checks before commit() and rollback() to handle cases where the transaction becomes inactive during async context manager cleanup. Closes #2341 Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
Member
Author
Root Cause AnalysisTimeline from Logs
The TriggerThe logs show a Sequence of Events
Moody's Server Details
|
…straint violations When re-registering a gateway (e.g., after deletion or crash), orphaned resources and prompts from previous registrations could cause unique constraint violations on `(team_id, owner_email, uri)` for resources and `(team_id, owner_email, name)` for prompts. This fix adds upsert logic that: 1. Queries for existing resources/prompts matching the unique constraint 2. Updates existing records instead of creating duplicates 3. Creates new records only when no match exists This handles scenarios like: - Gateway deletion that didn't properly clean up resources (issue #2341) - Re-registration of the same MCP server under a new gateway name - Race conditions during concurrent operations Closes #2352 Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
This was referenced Jan 23, 2026
Closed
Adds a utility script to identify and remove database records that were left orphaned due to incomplete gateway deletions (e.g., #2341 crash). Usage: # Dry run (default) - shows what would be deleted python scripts/cleanup_orphaned_resources.py # Actually delete orphaned records python scripts/cleanup_orphaned_resources.py --execute # Filter by team or owner python scripts/cleanup_orphaned_resources.py --team-id <id> --owner-email <email> Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
Addresses code review findings: 1. HIGH: Only update truly orphaned records (gateway_id IS NULL or points to non-existent gateway). Resources belonging to active gateways are no longer at risk of being reassigned. 2. MEDIUM: Use per-resource team/owner overrides when building lookup key, matching exactly what would be inserted to avoid constraint mismatches. 3. LOW: Added tests for orphaned resource upsert logic: - test_register_gateway_updates_orphaned_resources - test_register_gateway_does_not_update_active_gateway_resources Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
Addresses remaining code review findings:
1. ROLLBACK GUARD FIX:
- REMOVED the `if db.is_active:` check before `db.rollback()`
- Empirical testing proved that:
* After IntegrityError, is_active becomes False
* rollback() when is_active=False SUCCEEDS (doesn't fail!)
* rollback() restores is_active to True, cleaning up the session
* Skipping rollback when is_active=False leaves session unusable
- The is_active guard for commit() is CORRECT (commit fails when False)
- The is_active guard for rollback() was WRONG (rollback is always safe)
2. TEST ASSERTIONS:
- Rewrote orphaned resource tests with proper assertions
- Tests now directly verify:
* Orphaned resources are detected and added to map
* Resource fields are actually updated during upsert
* Resources with deleted gateways are detected as orphaned
* Resources with active gateways are NOT touched
* Per-resource owner/team overrides are used in lookup key
Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
4 tasks
kcostell06
pushed a commit
to kcostell06/mcp-context-forge
that referenced
this pull request
Feb 24, 2026
…#2351) * fix(db): Guard against inactive transaction during async cleanup When registering MCP servers with long initialization times (like Moody's), a CancelledError can occur during the MCP session teardown (DELETE request). This causes the database transaction to become inactive before get_db() attempts to commit, resulting in: sqlalchemy.exc.InvalidRequestError: This transaction is inactive Add db.is_active checks before commit() and rollback() to handle cases where the transaction becomes inactive during async context manager cleanup. Closes IBM#2341 Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * fix: Add upsert logic for resources and prompts to prevent unique constraint violations When re-registering a gateway (e.g., after deletion or crash), orphaned resources and prompts from previous registrations could cause unique constraint violations on `(team_id, owner_email, uri)` for resources and `(team_id, owner_email, name)` for prompts. This fix adds upsert logic that: 1. Queries for existing resources/prompts matching the unique constraint 2. Updates existing records instead of creating duplicates 3. Creates new records only when no match exists This handles scenarios like: - Gateway deletion that didn't properly clean up resources (issue IBM#2341) - Re-registration of the same MCP server under a new gateway name - Race conditions during concurrent operations Closes IBM#2352 Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * chore: Add cleanup script for orphaned resources/prompts/tools Adds a utility script to identify and remove database records that were left orphaned due to incomplete gateway deletions (e.g., IBM#2341 crash). Usage: # Dry run (default) - shows what would be deleted python scripts/cleanup_orphaned_resources.py # Actually delete orphaned records python scripts/cleanup_orphaned_resources.py --execute # Filter by team or owner python scripts/cleanup_orphaned_resources.py --team-id <id> --owner-email <email> Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * fix: Only upsert orphaned resources/prompts, add tests Addresses code review findings: 1. HIGH: Only update truly orphaned records (gateway_id IS NULL or points to non-existent gateway). Resources belonging to active gateways are no longer at risk of being reassigned. 2. MEDIUM: Use per-resource team/owner overrides when building lookup key, matching exactly what would be inserted to avoid constraint mismatches. 3. LOW: Added tests for orphaned resource upsert logic: - test_register_gateway_updates_orphaned_resources - test_register_gateway_does_not_update_active_gateway_resources Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * fix: Always call rollback() in exception handler, improve tests Addresses remaining code review findings: 1. ROLLBACK GUARD FIX: - REMOVED the `if db.is_active:` check before `db.rollback()` - Empirical testing proved that: * After IntegrityError, is_active becomes False * rollback() when is_active=False SUCCEEDS (doesn't fail!) * rollback() restores is_active to True, cleaning up the session * Skipping rollback when is_active=False leaves session unusable - The is_active guard for commit() is CORRECT (commit fails when False) - The is_active guard for rollback() was WRONG (rollback is always safe) 2. TEST ASSERTIONS: - Rewrote orphaned resource tests with proper assertions - Tests now directly verify: * Orphaned resources are detected and added to map * Resource fields are actually updated during upsert * Resources with deleted gateways are detected as orphaned * Resources with active gateways are NOT touched * Per-resource owner/team overrides are used in lookup key Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> * chore: Add file encoding header to cleanup script Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> --------- Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR fixes two related issues affecting gateway registration with Moody's MCP server:
InvalidRequestError: This transaction is inactiveafter successfully fetching tools/resourcesUniqueViolationonuq_team_owner_uri_resourceRoot Cause Analysis
Issue #2341: Transaction Inactive During Async Cleanup
Timeline from logs:
Fetched 10 tools from gateway✓Fetched 53 resources from gateway✓Fetched 1 resource templates from gateway✓DELETE .../mcp "HTTP/1.1 204 No Content"✓InvalidRequestError('This transaction is inactive')✗Root Cause:
CancelledErrorduring MCP session cleanup (DELETE request) causes the transaction to become inactive. Thenget_db()tries tocommit()on the inactive transaction.Fix: Add
db.is_activechecks beforecommit()androllback()inget_db().Issue #2352: Unique Constraint Violation on Re-Registration
The Constraint:
Failure Sequence:
(team_id, owner_email, uri)Fix: Add upsert logic for orphaned resources and prompts only:
gateway_id IS NULLor points to non-existent gatewayChanges
mcpgateway/main.pydb.is_activeguard beforecommit()inget_db()db.is_activeguard beforerollback()in exception handlerdb.invalidate()fallback for broken connectionsmcpgateway/services/gateway_service.pyteam_id, owner_email, uri)team_id, owner_email, name)gateway_id IS NULLor gateway doesn't existscripts/cleanup_orphaned_resources.py(NEW)tests/unit/mcpgateway/services/test_gateway_resources_prompts.pytest_register_gateway_updates_orphaned_resourcestest_register_gateway_does_not_update_active_gateway_resourcesManual Cleanup (For Affected Users)
If you experienced the #2341 crash, you may have orphaned resources/prompts/tools in your database. A cleanup script is provided:
Note: The upsert fix will handle orphaned records automatically on re-registration (they get reassigned to the new gateway). The cleanup script is only needed if you want to proactively remove orphaned records without re-registering.
Code Review Findings (Addressed)
gateway_id IS NULLor gateway doesn't existteam_idandowner_emailoverridesTest Plan
Closes #2341
Closes #2352