
[TESTING][UPGRADE]: Version Upgrades, Database Migrations, and Rollback Procedures #2474

@crivetimihai

Goal

Produce a comprehensive manual test plan for validating smooth upgrade paths from previous versions to GA, including database migrations, rolling upgrades, and rollback procedures.

Why Now?

Upgrade testing is critical for production operations:

  1. User Trust: Failed upgrades erode confidence in the platform
  2. Data Integrity: Migrations must preserve all user data
  3. Zero Downtime: Kubernetes upgrades must not interrupt service
  4. Rollback Safety: A tested fallback must exist for when issues arise
  5. Migration Chain: All intermediate migrations must apply correctly

User Stories

US-1: Admin - Smooth Upgrade

As an administrator
I want to upgrade from RC1 to GA smoothly
So that I can adopt new features without disruption

Acceptance Criteria:

Feature: Version Upgrade

  Scenario: RC1 to GA upgrade
    Given a running RC1 gateway with data
    When I perform the upgrade to GA
    Then all existing data should be preserved
    And all services should continue functioning
    And new features should be available

US-2: DBA - Safe Migrations

As a database administrator
I want Alembic migrations to be idempotent
So that I can safely retry failed migrations

Acceptance Criteria:

Feature: Database Migrations

  Scenario: Idempotent migration
    Given a migration that was partially applied
    When I run the migration again
    Then it should complete without errors
    And the database should be in the correct state

Architecture

                      UPGRADE FLOW
+------------------------------------------------------------------------+
|                                                                        |
|   Current State         Migration            Target State              |
|   -------------         ---------            ------------              |
|                                                                        |
|   +-------------+      +-----------+      +-------------+              |
|   |  RC1        |      | Alembic   |      |  GA         |              |
|   |  Gateway    | ---> | Migration | ---> |  Gateway    |              |
|   |  + Data     |      | Chain     |      |  + Data     |              |
|   +-------------+      +-----------+      +-------------+              |
|        |                    |                   |                      |
|        v                    v                   v                      |
|   +---------+         +---------+         +---------+                  |
|   | DB v1   |  -----> | DB v2   |  -----> | DB v3   |                  |
|   +---------+         +---------+         +---------+                  |
|                                                                        |
|   Rollback Path:                                                       |
|   +-------------+      +-----------+      +-------------+              |
|   |  GA         |      | Alembic   |      |  RC1        |              |
|   |  Gateway    | ---> | Downgrade | ---> |  Gateway    |              |
|   +-------------+      +-----------+      +-------------+              |
|                                                                        |
+------------------------------------------------------------------------+

Test Environment Setup

# Set up test database with RC1 data
export DATABASE_URL="postgresql://user:pass@localhost/upgrade_test"
export RC1_VERSION="1.0.0-rc1"
export GA_VERSION="1.0.0"

# Create backup before upgrade
pg_dump "$DATABASE_URL" > pre_upgrade_backup.sql

# Clone repo at RC1 version
git clone --branch "v$RC1_VERSION" https://github.com/org/mcpgateway rc1_gateway
# Background only the server; backgrounding the whole chain would leave this shell outside rc1_gateway
cd rc1_gateway && make install
make serve &

# Create test data (JWT_SECRET must already be set to the gateway's signing secret)
export GATEWAY_URL="http://localhost:8000"
export TOKEN=$(python -m mcpgateway.utils.create_jwt_token --username admin@example.com --secret "$JWT_SECRET")
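
The setup above depends on several environment variables, and `JWT_SECRET` is used without being defined in the snippet. A minimal sketch of a guard that fails early when something is missing; `require_env` is a hypothetical helper name, not part of the gateway tooling:

```shell
# Fail early if any named environment variable is unset or empty.
# require_env is a hypothetical helper introduced for this test plan.
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable called $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

It could be called as `require_env DATABASE_URL RC1_VERSION GA_VERSION JWT_SECRET || exit 1` before the backup step, so a missing secret is reported before any data is created.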

Manual Test Cases

Case    Scenario              From       To         Expected Result
------  --------------------  ---------  ---------  ---------------------
UPG-01  RC1 to GA             1.0.0-RC1  1.0.0-GA   All data preserved
UPG-02  Fresh install         None       1.0.0-GA   Clean setup works
UPG-03  Populated migration   Full DB    1.0.0-GA   All entities migrated
UPG-04  Rollback              1.0.0-GA   1.0.0-RC1  Graceful downgrade
UPG-05  Skip version          0.9.x      1.0.0-GA   Migration chain works
UPG-06  Rolling K8s upgrade   RC1 pods   GA pods    Zero downtime
UPG-07  Idempotent migration  Partial    Complete   No errors on retry

UPG-01: RC1 to GA Upgrade

Pre-Upgrade Data Setup:

# Create entities in RC1
curl -s -X POST "$GATEWAY_URL/gateways" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "test-server-1", "url": "http://localhost:9000"}'

curl -s -X POST "$GATEWAY_URL/api/teams" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "upgrade-test-team"}'

# Record counts
GATEWAY_COUNT=$(curl -s "$GATEWAY_URL/gateways" -H "Authorization: Bearer $TOKEN" | jq '.items | length')
TEAM_COUNT=$(curl -s "$GATEWAY_URL/api/teams" -H "Authorization: Bearer $TOKEN" | jq 'length')
echo "Pre-upgrade: $GATEWAY_COUNT gateways, $TEAM_COUNT teams"

Upgrade Steps:

# Stop RC1 gateway
pkill -f "mcpgateway"

# Switch to GA version
cd ../ga_gateway  # or: git checkout v1.0.0

# Run migrations in a subshell so this shell stays in the repo root for `make serve`
(cd mcpgateway && alembic upgrade head)

# Start GA gateway
make serve &
sleep 10
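
A fixed `sleep 10` can be too short on a slow machine and wastes time on a fast one. A minimal sketch of a polling alternative; `wait_until` is a hypothetical helper, and using `/health` as the readiness signal is an assumption based on the rollback validation in this plan:

```shell
# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Re-run COMMAND about once per second until it succeeds or the timeout elapses.
# wait_until is a hypothetical helper introduced for this test plan.
wait_until() {
  timeout_s=$1
  shift
  elapsed=0
  until "$@"; do
    if [ "$elapsed" -ge "$timeout_s" ]; then
      echo "timed out after ${timeout_s}s: $*" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}
```

For example, `wait_until 30 curl -sf --max-time 2 -o /dev/null "$GATEWAY_URL/health"` would return as soon as the GA gateway answers, and fail the test case if it never does.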

Post-Upgrade Validation:

# Verify counts match
NEW_GATEWAY_COUNT=$(curl -s "$GATEWAY_URL/gateways" -H "Authorization: Bearer $TOKEN" | jq '.items | length')
NEW_TEAM_COUNT=$(curl -s "$GATEWAY_URL/api/teams" -H "Authorization: Bearer $TOKEN" | jq 'length')

[ "$GATEWAY_COUNT" = "$NEW_GATEWAY_COUNT" ] && echo "PASS: Gateway count matches" || echo "FAIL: Gateway count changed ($GATEWAY_COUNT -> $NEW_GATEWAY_COUNT)"
[ "$TEAM_COUNT" = "$NEW_TEAM_COUNT" ] && echo "PASS: Team count matches" || echo "FAIL: Team count changed ($TEAM_COUNT -> $NEW_TEAM_COUNT)"

# Verify functionality
curl -s -X POST "$GATEWAY_URL/mcp/http" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc": "2.0", "id": 1, "method": "tools/list", "params": {}}' | jq '.result.tools | length'

Expected Result:

  • All entity counts match pre-upgrade
  • All API endpoints respond correctly
  • New GA features are available
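
A plain string comparison of the counts also passes when both requests failed and both variables are empty. A minimal sketch of a stricter check that treats empty or non-numeric counts as a failure; `compare_count` is a hypothetical helper name:

```shell
# compare_count LABEL BEFORE AFTER
# Print PASS/FAIL and return non-zero unless both counts are numeric and equal.
# compare_count is a hypothetical helper introduced for this test plan.
compare_count() {
  label=$1
  before=$2
  after=$3
  for value in "$before" "$after"; do
    case $value in
      ''|*[!0-9]*)
        echo "FAIL: $label count is not a number (before='$before', after='$after')"
        return 1
        ;;
    esac
  done
  if [ "$before" = "$after" ]; then
    echo "PASS: $label count unchanged ($after)"
  else
    echo "FAIL: $label count changed from $before to $after"
    return 1
  fi
}
```

With the variables recorded in this test case, the calls would be `compare_count gateways "$GATEWAY_COUNT" "$NEW_GATEWAY_COUNT"` and `compare_count teams "$TEAM_COUNT" "$NEW_TEAM_COUNT"`.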

UPG-04: Rollback Procedure

Trigger Rollback:

# Simulate issue discovered post-upgrade
# Stop GA gateway
pkill -f "mcpgateway"

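# Downgrade database (if supported). `-1` reverts only the most recent revision;
# if GA adds several, pass RC1's head revision ID instead.
(cd mcpgateway && alembic downgrade -1)

# Start RC1 gateway (background only the server so the cd applies to this shell)
cd ../rc1_gateway
make serve &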
sleep 10

Validate Rollback:

# Verify service restored
curl -s "$GATEWAY_URL/health" | jq '.status'

# Verify data accessible
curl -s "$GATEWAY_URL/gateways" -H "Authorization: Bearer $TOKEN" | jq '.items | length'

Expected Result:

  • Gateway starts successfully on RC1
  • All data remains accessible
  • No data corruption
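
A healthy gateway does not prove the schema was actually downgraded. A minimal sketch of a revision check, assuming `alembic current` prints the revision ID as the first token on stdout (for example `abc123 (head)`) and sends its log lines to stderr; `at_revision` and `RC1_REVISION` are hypothetical names:

```shell
# at_revision EXPECTED_ID CURRENT_OUTPUT
# Succeed only if the first token of the first non-empty line equals EXPECTED_ID.
# at_revision is a hypothetical helper; the parsed format is an assumption
# about what `alembic current` prints.
at_revision() {
  expected=$1
  current_output=$2
  actual=$(printf '%s\n' "$current_output" | awk 'NF { print $1; exit }')
  [ "$actual" = "$expected" ]
}
```

After the downgrade, `at_revision "$RC1_REVISION" "$(alembic current 2>/dev/null)" && echo "PASS: schema at RC1 revision"` would confirm the database matches the code being started, where `RC1_REVISION` holds the head revision ID recorded on RC1 before the upgrade.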

UPG-06: Kubernetes Rolling Upgrade

Helm Upgrade:

# Update values with new image tag
helm upgrade mcpgateway ./charts/mcpgateway \
  --set image.tag="$GA_VERSION" \
  --set strategy.type=RollingUpdate \
  --set strategy.rollingUpdate.maxUnavailable=0 \
  --set strategy.rollingUpdate.maxSurge=1

# Monitor rollout
kubectl rollout status deployment/mcpgateway -w

Zero-Downtime Validation:

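# Run continuous health check during upgrade (GATEWAY_URL must reach the cluster service or ingress)
while true; do
  STATUS=$(curl -s -o /dev/null --max-time 2 -w "%{http_code}" "$GATEWAY_URL/health")
  [ "$STATUS" != "200" ] && echo "DOWNTIME DETECTED at $(date)" && break
  sleep 1
done &
HEALTH_PID=$!

# Perform upgrade (above helm command)
# After upgrade completes, stop the health check loop (it has already exited if downtime was detected)
kill "$HEALTH_PID" 2>/dev/null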

Expected Result:

  • All health checks return 200 throughout upgrade
  • Old pods drain gracefully
  • New pods pass readiness probes before receiving traffic
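
Stopping at the first failed probe shows that downtime occurred but not how much. A minimal sketch of a probe loop that counts failures over a fixed window instead; `count_failures` is a hypothetical helper, and the window length and interval are arbitrary choices:

```shell
# count_failures ROUNDS INTERVAL_SECONDS COMMAND [ARGS...]
# Run COMMAND ROUNDS times, pausing INTERVAL_SECONDS between runs,
# then print how many runs failed.
# count_failures is a hypothetical helper introduced for this test plan.
count_failures() {
  rounds=$1
  interval=$2
  shift 2
  failures=0
  i=0
  while [ "$i" -lt "$rounds" ]; do
    "$@" >/dev/null 2>&1 || failures=$((failures + 1))
    i=$((i + 1))
    sleep "$interval"
  done
  echo "$failures"
}
```

Started in the background before the Helm upgrade, for example `count_failures 300 1 curl -sf --max-time 2 "$GATEWAY_URL/health" > failures.txt &`, the file should contain `0` for the zero-downtime criterion to pass.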

UPG-07: Idempotent Migration Test

Test Partial Application:

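# Simulate partial migration (manually interrupt). On PostgreSQL, DDL is transactional,
# so an interrupted run may roll back completely rather than leave a partial state.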
cd mcpgateway && timeout 2 alembic upgrade head || true

# Retry migration
alembic upgrade head

# Verify success
alembic current

Expected Result:

  • Migration completes without errors on retry
  • Database is in the correct final state
  • No duplicate constraints or errors
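
One way to make "correct final state" observable is to snapshot the database, re-run the migration, and confirm the snapshot did not change. A minimal sketch under that assumption; `check_idempotent` is a hypothetical helper, and both arguments are command strings run through `sh -c`:

```shell
# check_idempotent SNAPSHOT_CMD ACTION_CMD
# Capture state, re-run the action, capture state again, and compare.
# check_idempotent is a hypothetical helper introduced for this test plan.
check_idempotent() {
  snapshot_cmd=$1
  action_cmd=$2
  before=$(sh -c "$snapshot_cmd") || { echo "FAIL: snapshot command failed"; return 2; }
  sh -c "$action_cmd" || { echo "FAIL: re-run exited non-zero"; return 1; }
  after=$(sh -c "$snapshot_cmd") || { echo "FAIL: snapshot command failed"; return 2; }
  if [ "$before" = "$after" ]; then
    echo "PASS: state unchanged by re-run"
  else
    echo "FAIL: state changed by re-run"
    return 1
  fi
}
```

On PostgreSQL, with `DATABASE_URL` exported, a possible call is `check_idempotent 'psql "$DATABASE_URL" -Atc "SELECT constraint_name FROM information_schema.table_constraints ORDER BY 1"' 'alembic upgrade head'`, run after the first successful upgrade. The constraint listing is only one choice of snapshot; a column listing from `information_schema.columns` would work the same way.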

Test Matrix

Upgrade Path  Database    Pre-data  Rolling  Rollback  Pass Criteria
------------  ----------  --------  -------  --------  --------------
RC1 -> GA     PostgreSQL  Yes       N/A      N/A       Data preserved
RC1 -> GA     SQLite      Yes       N/A      N/A       Data preserved
Fresh -> GA   PostgreSQL  No        N/A      N/A       Clean install
GA -> RC1     PostgreSQL  Yes       N/A      Yes       Rollback works
RC1 -> GA     PostgreSQL  Yes       K8s      N/A       Zero downtime

Success Criteria

  • RC1 to GA upgrade preserves all data
  • Fresh GA install works correctly
  • All migrations are idempotent
  • Rollback procedure tested and documented
  • Kubernetes rolling upgrade has zero downtime
  • Upgrade runbook documented with timing
  • Skip-version upgrades work (migration chain)

Related Files

  • mcpgateway/alembic/ - Database migrations
  • mcpgateway/alembic/versions/ - Individual migration scripts
  • charts/mcpgateway/ - Helm chart for K8s upgrades
  • CHANGELOG.md - Version history

Labels

  • MUST - P1: Non-negotiable, critical requirements without which the product is non-functional or unsafe
  • chore - Linting, formatting, dependency hygiene, or project maintenance chores
  • manual-testing - Manual testing / test planning issues
  • ready - Validated, ready-to-work-on items
  • testing - Testing (unit, e2e, manual, automated, etc)
