-
Notifications
You must be signed in to change notification settings - Fork 614
[TESTING][OPERATIONS]: Backup and Restore Manual Test Plan (SQLite, PostgreSQL, Disaster Recovery) #2459
Description
💾 [TESTING][OPERATIONS]: Backup and Restore Manual Test Plan
Goal
Produce a comprehensive manual test plan for database backup and restore operations, validating data integrity, recovery procedures, and operational runbooks for disaster recovery scenarios.
Why Now?
Backup and restore capabilities are critical for production deployments:
- Disaster Recovery: Validate the ability to recover from data loss or corruption
- Migration Support: Enable safe migration between environments
- Compliance Requirements: Meet data retention and recovery SLAs
- Operational Confidence: Ensure operators can recover from failures
📖 User Stories
US-1: Operations Engineer - Database Backup Procedures
As an Operations Engineer
I want documented and tested backup procedures
So that I can create reliable backups before maintenance windows
Acceptance Criteria:
Feature: Database Backup Operations
Scenario: SQLite database backup
Given a running MCP Gateway with SQLite database
And the database contains servers, tools, resources, and users
When I execute the backup procedure
Then a complete backup file should be created
And the backup should be verifiable
And the gateway should remain operational during backup
Scenario: PostgreSQL database backup
Given a running MCP Gateway with PostgreSQL database
When I execute pg_dump with appropriate flags
Then a complete backup should be created
And include all schemas and data
And the backup file should be compressed
Scenario: Backup verification
Given a database backup file
When I verify the backup integrity
Then the file should pass checksum validation
And schema should be extractable
And row counts should match sourceTechnical Requirements:
- SQLite: File copy or
.backupcommand - PostgreSQL:
pg_dumpwith--format=custom - Backup includes all tables and indexes
- Backup metadata (timestamp, version, row counts)
- Backup file compression and encryption options
US-2: DBA - Database Restore Procedures
As a Database Administrator
I want tested restore procedures
So that I can recover from backups with confidence
Acceptance Criteria:
Feature: Database Restore Operations
Scenario: Restore SQLite from backup
Given a valid SQLite backup file
And a fresh MCP Gateway installation
When I restore the database from backup
Then all data should be recovered
And the gateway should start successfully
And all entities should be accessible
Scenario: Restore PostgreSQL from backup
Given a valid PostgreSQL backup file
And a clean PostgreSQL database
When I execute pg_restore
Then all schemas should be restored
And all data should be recovered
And foreign key constraints should be valid
Scenario: Point-in-time recovery (PostgreSQL)
Given WAL archiving is enabled
And a base backup from yesterday
When I restore to a specific timestamp
Then the database should reflect that point in time
And transactions after that time should not be presentUS-3: SRE - Disaster Recovery Validation
As an SRE
I want end-to-end disaster recovery tests
So that I can validate our RTO/RPO targets
Acceptance Criteria:
Feature: Disaster Recovery
Scenario: Complete environment recovery
Given a backup from production
And a fresh infrastructure deployment
When I restore all data and configuration
Then the gateway should be fully operational
And all integrations should work
And users should be able to authenticate
Scenario: Recovery time validation
Given a 10GB database backup
When I measure the full restore process
Then restore should complete within RTO target
And data loss should be within RPO target🏗 Architecture
Backup and Restore Flow
┌─────────────────────────────────────────────────────────────────────────────┐
│ BACKUP AND RESTORE ARCHITECTURE │
└─────────────────────────────────────────────────────────────────────────────┘
BACKUP FLOW
───────────
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ MCP Gateway │────▶│ Database │────▶│ Backup File │
│ (running) │ │ Dump/Copy │ │ (compressed) │
└──────────────┘ └──────────────┘ └──────────────┘
│ │ │
│ │ │
┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐
│ Continue │ │ Lock/MVCC │ │ Checksum │
│ Operations │ │ Consistent │ │ Verify │
└─────────────┘ └─────────────┘ └─────────────┘
RESTORE FLOW
────────────
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Backup File │────▶│ Validate & │────▶│ Stop Gateway │
│ │ │ Decompress │ │ │
└──────────────┘ └──────────────┘ └──────────────┘
│
┌──────▼──────┐
│ Restore DB │
│ (pg_restore │
│ or copy) │
└──────┬──────┘
│
┌──────────────┐ ┌──────────────┐ ┌──────▼──────┐
│ Validate │◀────│ Start │◀────│ Run │
│ Functionality│ │ Gateway │ │ Migrations │
└──────────────┘ └──────────────┘ └─────────────┘
Database Components
┌─────────────────────────────────────────────────────────────────────────────┐
│ DATABASE SCHEMA OVERVIEW │
└─────────────────────────────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────────────────┐
│ Core Entities │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Servers │ │ Tools │ │Resources │ │ Prompts │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Gateways │ │ A2A │ │ Roots │ │ Tags │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
└────────────────────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────────────────┐
│ Auth & Security Entities │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Users │ │ Teams │ │ Tokens │ │ Sessions │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ OAuth │ │ SSO │ │ Perms │ │
│ └──────────┘ └──────────┘ └──────────┘ │
└────────────────────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────────────────┐
│ Operational Entities │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Metrics │ │ Audit │ │ Logs │ │ Events │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
└────────────────────────────────────────────────────────────────────┘
📋 Test Environment Setup
Prerequisites
# Environment variables
export DATABASE_URL="postgresql://user:pass@localhost:5432/mcpgateway"
export BACKUP_DIR="/var/backups/mcpgateway"
export GATEWAY_URL="http://localhost:8000"
# Create backup directory
mkdir -p $BACKUP_DIR
# Verify database connectivity
python -m mcpgateway.utils.db_isreadyTest Data Setup
# Create test entities for backup validation
curl -X POST "$GATEWAY_URL/servers" \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "backup-test-server", "url": "http://example.com"}'
curl -X POST "$GATEWAY_URL/tools" \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "backup-test-tool", "description": "Test tool for backup validation"}'
# Record entity counts for validation
export SERVER_COUNT=$(curl -s "$GATEWAY_URL/servers" -H "Authorization: Bearer $ADMIN_TOKEN" | jq '.total')
export TOOL_COUNT=$(curl -s "$GATEWAY_URL/tools" -H "Authorization: Bearer $ADMIN_TOKEN" | jq '.total')🧪 Manual Test Cases
Section 1: SQLite Backup Operations
| Case | Operation | Database | Expected | Validation |
|---|---|---|---|---|
| BK-01 | Create backup | SQLite | File created | Size > 0, valid SQLite |
| BK-02 | Backup during operations | SQLite | Consistent | No corruption |
| BK-03 | Backup verification | SQLite | Valid | Schema matches |
BK-01: Create SQLite Backup
Preconditions:
- MCP Gateway running with SQLite
- Database has test data (servers, tools, etc.)
Steps:
# Step 1: Check current database
ls -la ./mcp.db
sqlite3 ./mcp.db "SELECT COUNT(*) FROM servers;"
# Step 2: Create backup using SQLite backup command
BACKUP_FILE="$BACKUP_DIR/mcp_$(date +%Y%m%d_%H%M%S).db"
sqlite3 ./mcp.db ".backup '$BACKUP_FILE'"
# Step 3: Verify backup file
ls -la $BACKUP_FILE
sqlite3 $BACKUP_FILE "PRAGMA integrity_check;"
sqlite3 $BACKUP_FILE "SELECT COUNT(*) FROM servers;"Expected Result:
- Backup file created with same size as source
- Integrity check returns "ok"
- Row counts match source database
BK-02: Backup During Active Operations
Preconditions:
- MCP Gateway actively handling requests
- Background load running
Steps:
# Step 1: Start background load
for i in {1..100}; do
curl -s "$GATEWAY_URL/health" &
done
# Step 2: Create backup while under load
BACKUP_FILE="$BACKUP_DIR/mcp_load_$(date +%Y%m%d_%H%M%S).db"
sqlite3 ./mcp.db ".backup '$BACKUP_FILE'"
# Step 3: Wait for load to complete
wait
# Step 4: Verify backup consistency
sqlite3 $BACKUP_FILE "PRAGMA integrity_check;"
sqlite3 $BACKUP_FILE "PRAGMA foreign_key_check;"Expected Result:
- Backup completes without errors
- No foreign key violations
- Gateway continues operating normally
Section 2: PostgreSQL Backup Operations
| Case | Operation | Method | Expected | Validation |
|---|---|---|---|---|
| PG-01 | Full backup | pg_dump | Complete | All tables |
| PG-02 | Compressed backup | pg_dump -Fc | Smaller | Decompresses |
| PG-03 | Schema-only | pg_dump -s | DDL only | No data |
| PG-04 | Parallel backup | pg_dump -j4 | Faster | Complete |
PG-01: Full PostgreSQL Backup
Preconditions:
- PostgreSQL database with MCP Gateway data
- pg_dump available
Steps:
# Step 1: Check database size
psql $DATABASE_URL -c "SELECT pg_size_pretty(pg_database_size(current_database()));"
# Step 2: Create full backup
BACKUP_FILE="$BACKUP_DIR/mcpgateway_$(date +%Y%m%d_%H%M%S).dump"
pg_dump $DATABASE_URL --format=custom --file=$BACKUP_FILE
# Step 3: Verify backup
pg_restore --list $BACKUP_FILE | head -50
# Step 4: Check backup metadata
pg_restore --list $BACKUP_FILE | grep -c "TABLE DATA"Expected Result:
- Backup file created
- All tables included in backup
- Table data entries match source table count
PG-02: Compressed Parallel Backup
Preconditions:
- Large PostgreSQL database
- Multiple CPU cores available
Steps:
# Step 1: Create parallel backup
BACKUP_DIR_PG="$BACKUP_DIR/pg_parallel_$(date +%Y%m%d_%H%M%S)"
mkdir -p $BACKUP_DIR_PG
pg_dump $DATABASE_URL --format=directory --jobs=4 --file=$BACKUP_DIR_PG
# Step 2: Verify parallel backup
ls -la $BACKUP_DIR_PG/
cat $BACKUP_DIR_PG/toc.dat | head -20
# Step 3: Compare with single-threaded time (optional)
time pg_dump $DATABASE_URL --format=custom --file=/tmp/single.dumpExpected Result:
- Parallel backup creates directory structure
- Multiple data files created
- Faster than single-threaded for large databases
Section 3: Restore Operations
| Case | Source | Target | Method | Expected |
|---|---|---|---|---|
| RS-01 | SQLite backup | Fresh SQLite | File copy | Full data |
| RS-02 | PostgreSQL backup | Fresh PostgreSQL | pg_restore | Full data |
| RS-03 | Old version backup | New version | Migrate | Compatible |
| RS-04 | Corrupted backup | Any | pg_restore | Error |
RS-01: Restore SQLite Database
Preconditions:
- Valid SQLite backup file
- Gateway stopped
Steps:
# Step 1: Stop the gateway
pkill -f mcpgateway || true
# Step 2: Backup current database (safety)
mv ./mcp.db ./mcp.db.pre-restore
# Step 3: Restore from backup
cp $BACKUP_FILE ./mcp.db
# Step 4: Verify restoration
sqlite3 ./mcp.db "PRAGMA integrity_check;"
sqlite3 ./mcp.db "SELECT COUNT(*) FROM servers;"
# Step 5: Start gateway
make dev &
sleep 5
# Step 6: Validate functionality
curl -s "$GATEWAY_URL/health"
curl -s "$GATEWAY_URL/servers" -H "Authorization: Bearer $ADMIN_TOKEN" | jq '.total'Expected Result:
- Database file replaced
- Integrity check passes
- Gateway starts successfully
- Entity counts match backup source
RS-02: Restore PostgreSQL Database
Preconditions:
- Valid PostgreSQL backup file
- Target database exists (empty or drop/recreate)
Steps:
# Step 1: Stop the gateway
pkill -f mcpgateway || true
# Step 2: Drop and recreate database
psql postgres -c "DROP DATABASE IF EXISTS mcpgateway_restore;"
psql postgres -c "CREATE DATABASE mcpgateway_restore;"
# Step 3: Restore from backup
pg_restore --dbname=postgresql://user:pass@localhost:5432/mcpgateway_restore \
--verbose --clean --if-exists $BACKUP_FILE
# Step 4: Verify restoration
psql postgresql://user:pass@localhost:5432/mcpgateway_restore \
-c "SELECT COUNT(*) FROM servers;"
# Step 5: Start gateway with restored database
DATABASE_URL=postgresql://user:pass@localhost:5432/mcpgateway_restore make dev &
sleep 5
# Step 6: Validate functionality
curl -s "$GATEWAY_URL/health"Expected Result:
- Database restored without errors
- All tables and data present
- Gateway operates normally
RS-03: Cross-Version Restore with Migration
Preconditions:
- Backup from older gateway version
- New gateway version installed
Steps:
# Step 1: Restore old backup
pg_restore --dbname=$DATABASE_URL --clean --if-exists $OLD_BACKUP_FILE
# Step 2: Check current migration state
cd mcpgateway && alembic current
# Step 3: Run pending migrations
cd mcpgateway && alembic upgrade head
# Step 4: Verify schema
cd mcpgateway && alembic check
# Step 5: Start gateway and verify
make dev &
sleep 5
curl -s "$GATEWAY_URL/health"Expected Result:
- Old data restored
- Migrations apply successfully
- Gateway starts with new schema
Section 4: Disaster Recovery
| Case | Scenario | RTO Target | RPO Target | Validation |
|---|---|---|---|---|
| DR-01 | Full recovery | 30 min | 1 hour | All systems |
| DR-02 | Partial recovery | 15 min | 15 min | Core only |
| DR-03 | Cross-region recovery | 2 hours | 24 hours | All data |
DR-01: Full Environment Recovery
Preconditions:
- Complete backup available
- Fresh infrastructure deployed
- Runbook available
Steps:
# Record start time
START_TIME=$(date +%s)
# Step 1: Deploy fresh infrastructure
docker-compose up -d postgres redis
# Step 2: Wait for services
sleep 30
python -m mcpgateway.utils.db_isready
python -m mcpgateway.utils.redis_isready
# Step 3: Restore database
pg_restore --dbname=$DATABASE_URL --clean --if-exists $BACKUP_FILE
# Step 4: Run migrations
cd mcpgateway && alembic upgrade head
# Step 5: Start gateway
make serve &
sleep 10
# Step 6: Validate all endpoints
curl -s "$GATEWAY_URL/health" | jq .
curl -s "$GATEWAY_URL/ready" | jq .
curl -s "$GATEWAY_URL/servers" -H "Authorization: Bearer $ADMIN_TOKEN" | jq '.total'
# Calculate recovery time
END_TIME=$(date +%s)
RECOVERY_TIME=$((END_TIME - START_TIME))
echo "Recovery completed in $RECOVERY_TIME seconds"Expected Result:
- All services running
- All data recovered
- Recovery time within RTO target
📊 Test Matrix
| Test Case | SQLite | PostgreSQL | Hot Backup | Migration | DR |
|---|---|---|---|---|---|
| BK-01 | ✓ | ||||
| BK-02 | ✓ | ✓ | |||
| BK-03 | ✓ | ||||
| PG-01 | ✓ | ||||
| PG-02 | ✓ | ||||
| PG-03 | ✓ | ||||
| PG-04 | ✓ | ✓ | |||
| RS-01 | ✓ | ||||
| RS-02 | ✓ | ||||
| RS-03 | ✓ | ✓ | |||
| RS-04 | ✓ | ||||
| DR-01 | ✓ | ✓ | |||
| DR-02 | ✓ | ✓ | |||
| DR-03 | ✓ | ✓ |
✅ Success Criteria
- All 14 test cases documented and executable
- SQLite backup/restore procedures validated
- PostgreSQL backup/restore procedures validated
- Backup during active operations verified safe
- Cross-version migration tested
- Disaster recovery completed within RTO
- Backup verification procedures documented
- Runbooks created for each scenario
📚 Definition of Done
- Test plan reviewed by Operations team
- All test cases executed successfully
- Runbooks published to documentation
- Backup schedule configured in production
- Monitoring alerts configured for backup failures
- Recovery procedures tested quarterly
🔗 Related Files
mcpgateway/db.py- Database modelsmcpgateway/alembic/- Database migrationsmcpgateway/utils/db_isready.py- Database readiness check
🔗 Related Issues
- None identified