[TESTING][UPGRADE]: Version Upgrades, Database Migrations, and Rollback Procedures #2474
Labels
- MUST (P1): Non-negotiable, critical requirements without which the product is non-functional or unsafe
- chore: Linting, formatting, dependency hygiene, or project maintenance chores
- manual-testing: Manual testing / test planning issues
- ready: Validated, ready-to-work-on items
- testing: Testing (unit, e2e, manual, automated, etc)
Goal
Produce a comprehensive manual test plan for validating smooth upgrade paths from previous versions to GA, including database migrations, rolling upgrades, and rollback procedures.
Why Now?
Upgrade testing is critical for production operations:
- User Trust: Failed upgrades erode confidence in the platform
- Data Integrity: Migrations must preserve all user data
- Zero Downtime: Kubernetes upgrades must not interrupt service
- Rollback Safety: Must have tested fallback when issues arise
- Migration Chain: All intermediate migrations must apply correctly
User Stories
US-1: Admin - Smooth Upgrade
As an administrator
I want to upgrade from RC1 to GA smoothly
So that I can adopt new features without disruption
Acceptance Criteria:
Feature: Version Upgrade
  Scenario: RC1 to GA upgrade
    Given a running RC1 gateway with data
    When I perform the upgrade to GA
    Then all existing data should be preserved
    And all services should continue functioning
    And new features should be available
US-2: DBA - Safe Migrations
As a database administrator
I want Alembic migrations to be idempotent
So that I can safely retry failed migrations
Acceptance Criteria:
Feature: Database Migrations
  Scenario: Idempotent migration
    Given a migration that was partially applied
    When I run the migration again
    Then it should complete without errors
    And the database should be in the correct state
Architecture
UPGRADE FLOW
+------------------------------------------------------------------------+
|                                                                        |
|    Current State         Migration         Target State                |
|    -------------         ---------         ------------                |
|                                                                        |
|   +-------------+      +-----------+      +-------------+              |
|   |     RC1     |      |  Alembic  |      |     GA      |              |
|   |   Gateway   | ---> | Migration | ---> |   Gateway   |              |
|   |   + Data    |      |   Chain   |      |   + Data    |              |
|   +-------------+      +-----------+      +-------------+              |
|          |                   |                   |                     |
|          v                   v                   v                     |
|     +---------+         +---------+         +---------+                |
|     |  DB v1  | ------> |  DB v2  | ------> |  DB v3  |                |
|     +---------+         +---------+         +---------+                |
|                                                                        |
|   Rollback Path:                                                       |
|   +-------------+      +-----------+      +-------------+              |
|   |     GA      |      |  Alembic  |      |     RC1     |              |
|   |   Gateway   | ---> | Downgrade | ---> |   Gateway   |              |
|   +-------------+      +-----------+      +-------------+              |
|                                                                        |
+------------------------------------------------------------------------+
Test Environment Setup
# Set up test database with RC1 data
export DATABASE_URL="postgresql://user:pass@localhost/upgrade_test"
export RC1_VERSION="1.0.0-rc1"
export GA_VERSION="1.0.0"
# Create backup before upgrade
pg_dump "$DATABASE_URL" > pre_upgrade_backup.sql
# Clone repo at RC1 version
git clone --branch "v$RC1_VERSION" https://github.com/org/mcpgateway rc1_gateway
# Install in the foreground so this shell stays in rc1_gateway, then background only the server
cd rc1_gateway && make install
make serve &
# Create test data
export GATEWAY_URL="http://localhost:8000"
# JWT_SECRET must already be exported in this shell
export TOKEN=$(python -m mcpgateway.utils.create_jwt_token --username admin@example.com --secret "$JWT_SECRET")
Manual Test Cases
| Case | Scenario | From | To | Expected Result |
|---|---|---|---|---|
| UPG-01 | RC1 to GA | 1.0.0-RC1 | 1.0.0-GA | All data preserved |
| UPG-02 | Fresh install | None | 1.0.0-GA | Clean setup works |
| UPG-03 | Populated migration | Full DB | 1.0.0-GA | All entities migrated |
| UPG-04 | Rollback | 1.0.0-GA | 1.0.0-RC1 | Graceful downgrade |
| UPG-05 | Skip version | 0.9.x | 1.0.0-GA | Migration chain works |
| UPG-06 | Rolling K8s upgrade | RC1 pods | GA pods | Zero downtime |
| UPG-07 | Idempotent migration | Partial | Complete | No errors on retry |
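Every case in the table above depends on the variables exported under Test Environment Setup, and an unset `TOKEN` or `DATABASE_URL` tends to surface later as a confusing 401 or connection error. A minimal preflight sketch follows; `require_env` is a hypothetical helper name, not something shipped with the gateway.

```shell
# Fail fast when a variable the test cases rely on is unset or empty.
# require_env is a hypothetical helper, not part of the gateway repository.
require_env() {
    local name value missing=0
    for name in "$@"; do
        # Indirect lookup via eval so the function also works outside bash.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing required variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example:
#   require_env DATABASE_URL RC1_VERSION GA_VERSION GATEWAY_URL JWT_SECRET TOKEN || exit 1
```

The function reports every missing name before returning, so one run is enough to see everything that still needs to be exported.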
UPG-01: RC1 to GA Upgrade
Pre-Upgrade Data Setup:
# Create entities in RC1
curl -s -X POST "$GATEWAY_URL/gateways" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "test-server-1", "url": "http://localhost:9000"}'
curl -s -X POST "$GATEWAY_URL/api/teams" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "upgrade-test-team"}'
# Record counts
GATEWAY_COUNT=$(curl -s "$GATEWAY_URL/gateways" -H "Authorization: Bearer $TOKEN" | jq '.items | length')
TEAM_COUNT=$(curl -s "$GATEWAY_URL/api/teams" -H "Authorization: Bearer $TOKEN" | jq 'length')
echo "Pre-upgrade: $GATEWAY_COUNT gateways, $TEAM_COUNT teams"
Upgrade Steps:
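# Stop RC1 gateway
pkill -f "mcpgateway"
# Switch to GA version (assumes a GA checkout exists at ../ga_gateway)
cd ../ga_gateway # or: git checkout v1.0.0
# Run migrations in a subshell so the working directory stays at the repo root for make
(cd mcpgateway && alembic upgrade head)
# Start GA gateway
make serve &
sleep 10
Post-Upgrade Validation: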
# Verify counts match
NEW_GATEWAY_COUNT=$(curl -s "$GATEWAY_URL/gateways" -H "Authorization: Bearer $TOKEN" | jq '.items | length')
NEW_TEAM_COUNT=$(curl -s "$GATEWAY_URL/api/teams" -H "Authorization: Bearer $TOKEN" | jq 'length')
[ "$GATEWAY_COUNT" = "$NEW_GATEWAY_COUNT" ] && echo "PASS: Gateway count matches" || echo "FAIL"
[ "$TEAM_COUNT" = "$NEW_TEAM_COUNT" ] && echo "PASS: Team count matches" || echo "FAIL"
# Verify functionality
curl -s -X POST "$GATEWAY_URL/mcp/http" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc": "2.0", "id": 1, "method": "tools/list", "params": {}}' | jq '.result.tools | length'
Expected Result:
- All entity counts match pre-upgrade
- All API endpoints respond correctly
- New GA features are available
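The count check in UPG-01 keeps its baseline in shell variables, which are lost if the terminal is closed between the pre-upgrade and post-upgrade steps. One possible way to make the comparison repeatable is to write each set of counts to a small file and diff the two. The sketch below assumes a plain "<entity> <count>" line format; `compare_snapshots` and the file names are hypothetical.

```shell
# Compare two snapshot files made of "<entity> <count>" lines.
# Prints one PASS/FAIL line per entity in the baseline and returns
# non-zero if any count differs or is missing after the upgrade.
# compare_snapshots and the file layout are illustrative assumptions.
compare_snapshots() {
    local before_file="$1" after_file="$2" status=0
    local entity before_count after_count
    while read -r entity before_count; do
        after_count=$(awk -v e="$entity" '$1 == e { print $2 }' "$after_file")
        if [ "$before_count" = "$after_count" ]; then
            echo "PASS: $entity count unchanged ($before_count)"
        else
            echo "FAIL: $entity count was $before_count, now ${after_count:-missing}"
            status=1
        fi
    done < "$before_file"
    return "$status"
}

# Example, reusing the variables recorded in UPG-01:
#   printf 'gateways %s\nteams %s\n' "$GATEWAY_COUNT" "$TEAM_COUNT" > pre_upgrade_counts.txt
#   printf 'gateways %s\nteams %s\n' "$NEW_GATEWAY_COUNT" "$NEW_TEAM_COUNT" > post_upgrade_counts.txt
#   compare_snapshots pre_upgrade_counts.txt post_upgrade_counts.txt
```

Because the function returns a non-zero status on any mismatch, it can gate later steps of a scripted run instead of relying on someone reading the PASS/FAIL output.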
UPG-04: Rollback Procedure
Trigger Rollback:
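# Simulate issue discovered post-upgrade
# Stop GA gateway
pkill -f "mcpgateway"
# Downgrade database (if supported), run from the GA checkout root.
# "-1" steps back a single revision; if GA added several, pass the RC1 revision ID instead
(cd mcpgateway && alembic downgrade -1)
# Start RC1 gateway
cd ../rc1_gateway
make serve &
sleep 10
Validate Rollback: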
# Verify service restored
curl -s "$GATEWAY_URL/health" | jq '.status'
# Verify data accessible
curl -s "$GATEWAY_URL/gateways" -H "Authorization: Bearer $TOKEN" | jq '.items | length'
Expected Result:
- Gateway starts successfully on RC1
- All data remains accessible
- No data corruption
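"Gateway starts successfully" is checked above after a fixed `sleep 10`, which can be too short on a slow machine and needlessly long on a fast one. A small polling helper is one alternative. This is a sketch only: `wait_until` and `gateway_healthy` are hypothetical names, and the probe assumes the `/health` endpoint used elsewhere in this plan.

```shell
# Retry a command roughly once per second until it succeeds or the
# timeout (counted in sleep seconds) is reached.
# Usage: wait_until <timeout_seconds> <command> [args...]
wait_until() {
    local timeout_s="$1" elapsed=0
    shift
    until "$@"; do
        if [ "$elapsed" -ge "$timeout_s" ]; then
            echo "timed out after ${timeout_s}s: $*" >&2
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
}

# Hypothetical probe: curl -f turns HTTP error statuses into a non-zero exit.
gateway_healthy() {
    curl -sf -o /dev/null "${GATEWAY_URL:-http://localhost:8000}/health"
}

# Example:
#   make serve &
#   wait_until 60 gateway_healthy && echo "gateway is up"
```

The same helper works after the GA start in UPG-01, since it takes any command as its probe.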
UPG-06: Kubernetes Rolling Upgrade
Helm Upgrade:
# Update values with new image tag
helm upgrade mcpgateway ./charts/mcpgateway \
  --set image.tag="$GA_VERSION" \
  --set strategy.type=RollingUpdate \
  --set strategy.rollingUpdate.maxUnavailable=0 \
  --set strategy.rollingUpdate.maxSurge=1
# Monitor rollout
kubectl rollout status deployment/mcpgateway -w
Zero-Downtime Validation:
# Run continuous health check during upgrade
while true; do
  STATUS=$(curl -s -o /dev/null -w "%{http_code}" "$GATEWAY_URL/health")
  [ "$STATUS" != "200" ] && echo "DOWNTIME DETECTED at $(date)" && break
  sleep 1
done &
HEALTH_PID=$!
# Perform upgrade (above helm command)
# After upgrade completes, kill the health check loop (it has already exited if downtime was detected)
kill "$HEALTH_PID" 2>/dev/null
Expected Result:
- All health checks return 200 throughout upgrade
- Old pods drain gracefully
- New pods pass readiness probes before receiving traffic
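The health loop above stops at the first non-200 response, so it shows that downtime happened but not how much. If a count of failed probes is wanted for the test report, a bounded loop along these lines might help. It is a sketch under assumptions: `monitor_health`, `probe_status`, and the `PROBE_CMD` override are hypothetical names introduced here so the loop can be dry-run without a live gateway.

```shell
# Default probe: print the HTTP status code for a URL ("000" if unreachable).
probe_status() {
    curl -s -o /dev/null -w "%{http_code}" --max-time 2 "$1"
}

# Probe <url> <attempts> times, sleeping <interval> seconds between probes.
# Logs each non-200 response, prints a summary line, and returns non-zero
# if any probe failed. PROBE_CMD (hypothetical) swaps in a different probe.
monitor_health() {
    local url="$1" attempts="$2" interval="${3:-1}"
    local failures=0 i=0 code
    while [ "$i" -lt "$attempts" ]; do
        code=$("${PROBE_CMD:-probe_status}" "$url")
        if [ "$code" != "200" ]; then
            failures=$((failures + 1))
            echo "$(date -u +%H:%M:%S) non-200 response: ${code:-none}"
        fi
        i=$((i + 1))
        sleep "$interval"
    done
    echo "probes=$attempts failures=$failures"
    [ "$failures" -eq 0 ]
}

# Example: probe for about five minutes while the helm upgrade runs.
#   monitor_health "$GATEWAY_URL/health" 300 1 > health.log &
#   MONITOR_PID=$!
#   helm upgrade ...   # as in the Helm Upgrade step
#   wait "$MONITOR_PID" && echo "PASS: zero downtime" || echo "FAIL: see health.log"
```

Probing once per second only detects outages longer than the interval; a shorter interval tightens that at the cost of more load on the health endpoint.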
UPG-07: Idempotent Migration Test
Test Partial Application:
# Simulate partial migration (manually interrupt)
cd mcpgateway && timeout 2 alembic upgrade head || true
# Retry migration
alembic upgrade head
# Verify success
alembic current
Expected Result:
- Migration completes without errors on retry
- Database is in correct final state
- No duplicate constraints or errors
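"Database is in correct final state" can be checked mechanically by comparing what `alembic current` reports for the database with what `alembic heads` reports for the migration scripts. The sketch below assumes the usual output shape, where each line starts with a revision identifier (for example `ae1027a6acf (head)`) and log lines go to stderr; a project with a customised logging setup may need different filtering. `revision_ids` and `at_head` are hypothetical helper names.

```shell
# Extract revision identifiers (first field of each non-empty line) from
# `alembic current` or `alembic heads` output on stdin, sorted for comparison.
revision_ids() {
    awk 'NF > 0 { print $1 }' | sort
}

# Succeed only when the database revisions equal the script heads.
# Both arguments are captured command output, so the check itself
# does not need a database connection.
at_head() {
    local current_ids head_ids
    current_ids=$(printf '%s\n' "$1" | revision_ids)
    head_ids=$(printf '%s\n' "$2" | revision_ids)
    [ -n "$current_ids" ] && [ "$current_ids" = "$head_ids" ]
}

# Example, run from the directory that contains alembic.ini:
#   at_head "$(alembic current 2>/dev/null)" "$(alembic heads 2>/dev/null)" \
#       && echo "PASS: database is at head" || echo "FAIL: database is not at head"
```

An empty `alembic current` result (no revision stamped yet) is treated as a failure rather than a match.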
Test Matrix
| Upgrade Path | Database | Pre-data | Rolling | Rollback | Pass Criteria |
|---|---|---|---|---|---|
| RC1 -> GA | PostgreSQL | Yes | N/A | N/A | Data preserved |
| RC1 -> GA | SQLite | Yes | N/A | N/A | Data preserved |
| Fresh -> GA | PostgreSQL | No | N/A | N/A | Clean install |
| GA -> RC1 | PostgreSQL | Yes | N/A | Yes | Rollback works |
| RC1 -> GA | PostgreSQL | Yes | K8s | N/A | Zero downtime |
Success Criteria
- RC1 to GA upgrade preserves all data
- Fresh GA install works correctly
- All migrations are idempotent
- Rollback procedure tested and documented
- Kubernetes rolling upgrade has zero downtime
- Upgrade runbook documented with timing
- Skip-version upgrades work (migration chain)
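For the "runbook documented with timing" criterion, step durations have to be captured somewhere during the test runs. One lightweight option is to wrap each step in a timing helper that appends to a log. This is only a sketch: `timed` and the `TIMING_LOG` file name are hypothetical, and the resolution is whole seconds.

```shell
# Run a command, append "<label> <seconds>s" to a timing log, and return
# the command's own exit status so failures still propagate.
# timed and TIMING_LOG are illustrative names, not part of the gateway tooling.
timed() {
    local label="$1" start end rc
    shift
    start=$(date +%s)
    "$@"
    rc=$?
    end=$(date +%s)
    echo "$label $((end - start))s" >> "${TIMING_LOG:-upgrade_timings.log}"
    return "$rc"
}

# Example:
#   timed backup  sh -c 'pg_dump "$DATABASE_URL" > pre_upgrade_backup.sql'
#   timed migrate alembic upgrade head
#   cat upgrade_timings.log
```

The label must be a single word for the log to stay easy to parse with tools such as `awk`.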
Related Files
- mcpgateway/alembic/ - Database migrations
- mcpgateway/alembic/versions/ - Individual migration scripts
- charts/mcpgateway/ - Helm chart for K8s upgrades
- CHANGELOG.md - Version history
Related Issues
- [TESTING][DEPLOYMENT]: Docker, Docker Compose, Kubernetes/Helm, and Bare Metal Installation #2475 - Deployment methods
- [TESTING][OPERATIONS]: Backup and Restore Manual Test Plan (SQLite, PostgreSQL, Disaster Recovery) #2459 - Backup and restore
- [TESTING][RESILIENCE]: PostgreSQL Resilience Manual Test Plan (Connection Loss, Failover, Recovery) #2466 - PostgreSQL resilience