Skip to content

[TESTING][OPERATIONS]: Backup and Restore Manual Test Plan (SQLite, PostgreSQL, Disaster Recovery) #2459

@crivetimihai

Description

@crivetimihai

💾 [TESTING][OPERATIONS]: Backup and Restore Manual Test Plan

Goal

Produce a comprehensive manual test plan for database backup and restore operations, validating data integrity, recovery procedures, and operational runbooks for disaster recovery scenarios.

Why Now?

Backup and restore capabilities are critical for production deployments:

  1. Disaster Recovery: Validate the ability to recover from data loss or corruption
  2. Migration Support: Enable safe migration between environments
  3. Compliance Requirements: Meet data retention and recovery SLAs
  4. Operational Confidence: Ensure operators can recover from failures

📖 User Stories

US-1: Operations Engineer - Database Backup Procedures

As an Operations Engineer
I want documented and tested backup procedures
So that I can create reliable backups before maintenance windows

Acceptance Criteria:

Feature: Database Backup Operations

  Scenario: SQLite database backup
    Given a running MCP Gateway with SQLite database
    And the database contains servers, tools, resources, and users
    When I execute the backup procedure
    Then a complete backup file should be created
    And the backup should be verifiable
    And the gateway should remain operational during backup

  Scenario: PostgreSQL database backup
    Given a running MCP Gateway with PostgreSQL database
    When I execute pg_dump with appropriate flags
    Then a complete backup should be created
    And include all schemas and data
    And the backup file should be compressed

  Scenario: Backup verification
    Given a database backup file
    When I verify the backup integrity
    Then the file should pass checksum validation
    And schema should be extractable
    And row counts should match source

Technical Requirements:

  • SQLite: File copy or .backup command
  • PostgreSQL: pg_dump with --format=custom
  • Backup includes all tables and indexes
  • Backup metadata (timestamp, version, row counts)
  • Backup file compression and encryption options
US-2: DBA - Database Restore Procedures

As a Database Administrator
I want tested restore procedures
So that I can recover from backups with confidence

Acceptance Criteria:

Feature: Database Restore Operations

  Scenario: Restore SQLite from backup
    Given a valid SQLite backup file
    And a fresh MCP Gateway installation
    When I restore the database from backup
    Then all data should be recovered
    And the gateway should start successfully
    And all entities should be accessible

  Scenario: Restore PostgreSQL from backup
    Given a valid PostgreSQL backup file
    And a clean PostgreSQL database
    When I execute pg_restore
    Then all schemas should be restored
    And all data should be recovered
    And foreign key constraints should be valid

  Scenario: Point-in-time recovery (PostgreSQL)
    Given WAL archiving is enabled
    And a base backup from yesterday
    When I restore to a specific timestamp
    Then the database should reflect that point in time
    And transactions after that time should not be present
US-3: SRE - Disaster Recovery Validation

As an SRE
I want end-to-end disaster recovery tests
So that I can validate our RTO/RPO targets

Acceptance Criteria:

Feature: Disaster Recovery

  Scenario: Complete environment recovery
    Given a backup from production
    And a fresh infrastructure deployment
    When I restore all data and configuration
    Then the gateway should be fully operational
    And all integrations should work
    And users should be able to authenticate

  Scenario: Recovery time validation
    Given a 10GB database backup
    When I measure the full restore process
    Then restore should complete within RTO target
    And data loss should be within RPO target

🏗 Architecture

Backup and Restore Flow

┌─────────────────────────────────────────────────────────────────────────────┐
│                        BACKUP AND RESTORE ARCHITECTURE                       │
└─────────────────────────────────────────────────────────────────────────────┘

                         BACKUP FLOW
                         ───────────
    ┌──────────────┐     ┌──────────────┐     ┌──────────────┐
    │ MCP Gateway  │────▶│ Database     │────▶│ Backup File  │
    │ (running)    │     │ Dump/Copy    │     │ (compressed) │
    └──────────────┘     └──────────────┘     └──────────────┘
           │                    │                    │
           │                    │                    │
    ┌──────▼──────┐      ┌──────▼──────┐      ┌──────▼──────┐
    │ Continue    │      │ Lock/MVCC   │      │ Checksum    │
    │ Operations  │      │ Consistent  │      │ Verify      │
    └─────────────┘      └─────────────┘      └─────────────┘

                         RESTORE FLOW
                         ────────────
    ┌──────────────┐     ┌──────────────┐     ┌──────────────┐
    │ Backup File  │────▶│ Validate &   │────▶│ Stop Gateway │
    │              │     │ Decompress   │     │              │
    └──────────────┘     └──────────────┘     └──────────────┘
                                                     │
                                              ┌──────▼──────┐
                                              │ Restore DB  │
                                              │ (pg_restore │
                                              │  or copy)   │
                                              └──────┬──────┘
                                                     │
    ┌──────────────┐     ┌──────────────┐     ┌──────▼──────┐
    │ Validate     │◀────│ Start        │◀────│ Run         │
    │ Functionality│     │ Gateway      │     │ Migrations  │
    └──────────────┘     └──────────────┘     └─────────────┘

Database Components

┌─────────────────────────────────────────────────────────────────────────────┐
│                         DATABASE SCHEMA OVERVIEW                             │
└─────────────────────────────────────────────────────────────────────────────┘

    ┌────────────────────────────────────────────────────────────────────┐
    │                        Core Entities                                │
    │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐           │
    │  │ Servers  │  │ Tools    │  │Resources │  │ Prompts  │           │
    │  └──────────┘  └──────────┘  └──────────┘  └──────────┘           │
    │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐           │
    │  │ Gateways │  │ A2A      │  │ Roots    │  │ Tags     │           │
    │  └──────────┘  └──────────┘  └──────────┘  └──────────┘           │
    └────────────────────────────────────────────────────────────────────┘

    ┌────────────────────────────────────────────────────────────────────┐
    │                    Auth & Security Entities                         │
    │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐           │
    │  │ Users    │  │ Teams    │  │ Tokens   │  │ Sessions │           │
    │  └──────────┘  └──────────┘  └──────────┘  └──────────┘           │
    │  ┌──────────┐  ┌──────────┐  ┌──────────┐                         │
    │  │ OAuth    │  │ SSO      │  │ Perms    │                         │
    │  └──────────┘  └──────────┘  └──────────┘                         │
    └────────────────────────────────────────────────────────────────────┘

    ┌────────────────────────────────────────────────────────────────────┐
    │                    Operational Entities                             │
    │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐           │
    │  │ Metrics  │  │ Audit    │  │ Logs     │  │ Events   │           │
    │  └──────────┘  └──────────┘  └──────────┘  └──────────┘           │
    └────────────────────────────────────────────────────────────────────┘

📋 Test Environment Setup

Prerequisites

# Environment variables
export DATABASE_URL="postgresql://user:pass@localhost:5432/mcpgateway"
export BACKUP_DIR="/var/backups/mcpgateway"
export GATEWAY_URL="http://localhost:8000"

# Create backup directory
mkdir -p $BACKUP_DIR

# Verify database connectivity
python -m mcpgateway.utils.db_isready

Test Data Setup

# Create test entities for backup validation
curl -X POST "$GATEWAY_URL/servers" \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "backup-test-server", "url": "http://example.com"}'

curl -X POST "$GATEWAY_URL/tools" \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "backup-test-tool", "description": "Test tool for backup validation"}'

# Record entity counts for validation
export SERVER_COUNT=$(curl -s "$GATEWAY_URL/servers" -H "Authorization: Bearer $ADMIN_TOKEN" | jq '.total')
export TOOL_COUNT=$(curl -s "$GATEWAY_URL/tools" -H "Authorization: Bearer $ADMIN_TOKEN" | jq '.total')

🧪 Manual Test Cases

Section 1: SQLite Backup Operations

Case Operation Database Expected Validation
BK-01 Create backup SQLite File created Size > 0, valid SQLite
BK-02 Backup during operations SQLite Consistent No corruption
BK-03 Backup verification SQLite Valid Schema matches
BK-01: Create SQLite Backup

Preconditions:

  • MCP Gateway running with SQLite
  • Database has test data (servers, tools, etc.)

Steps:

# Step 1: Check current database
ls -la ./mcp.db
sqlite3 ./mcp.db "SELECT COUNT(*) FROM servers;"

# Step 2: Create backup using SQLite backup command
BACKUP_FILE="$BACKUP_DIR/mcp_$(date +%Y%m%d_%H%M%S).db"
sqlite3 ./mcp.db ".backup '$BACKUP_FILE'"

# Step 3: Verify backup file
ls -la $BACKUP_FILE
sqlite3 $BACKUP_FILE "PRAGMA integrity_check;"
sqlite3 $BACKUP_FILE "SELECT COUNT(*) FROM servers;"

Expected Result:

  • Backup file created with same size as source
  • Integrity check returns "ok"
  • Row counts match source database
BK-02: Backup During Active Operations

Preconditions:

  • MCP Gateway actively handling requests
  • Background load running

Steps:

# Step 1: Start background load
for i in {1..100}; do
  curl -s "$GATEWAY_URL/health" &
done

# Step 2: Create backup while under load
BACKUP_FILE="$BACKUP_DIR/mcp_load_$(date +%Y%m%d_%H%M%S).db"
sqlite3 ./mcp.db ".backup '$BACKUP_FILE'"

# Step 3: Wait for load to complete
wait

# Step 4: Verify backup consistency
sqlite3 $BACKUP_FILE "PRAGMA integrity_check;"
sqlite3 $BACKUP_FILE "PRAGMA foreign_key_check;"

Expected Result:

  • Backup completes without errors
  • No foreign key violations
  • Gateway continues operating normally

Section 2: PostgreSQL Backup Operations

Case Operation Method Expected Validation
PG-01 Full backup pg_dump Complete All tables
PG-02 Compressed backup pg_dump -Fc Smaller Decompresses
PG-03 Schema-only pg_dump -s DDL only No data
PG-04 Parallel backup pg_dump -j4 Faster Complete
PG-01: Full PostgreSQL Backup

Preconditions:

  • PostgreSQL database with MCP Gateway data
  • pg_dump available

Steps:

# Step 1: Check database size
psql $DATABASE_URL -c "SELECT pg_size_pretty(pg_database_size(current_database()));"

# Step 2: Create full backup
BACKUP_FILE="$BACKUP_DIR/mcpgateway_$(date +%Y%m%d_%H%M%S).dump"
pg_dump $DATABASE_URL --format=custom --file=$BACKUP_FILE

# Step 3: Verify backup
pg_restore --list $BACKUP_FILE | head -50

# Step 4: Check backup metadata
pg_restore --list $BACKUP_FILE | grep -c "TABLE DATA"

Expected Result:

  • Backup file created
  • All tables included in backup
  • Table data entries match source table count
PG-02: Compressed Parallel Backup

Preconditions:

  • Large PostgreSQL database
  • Multiple CPU cores available

Steps:

# Step 1: Create parallel backup
BACKUP_DIR_PG="$BACKUP_DIR/pg_parallel_$(date +%Y%m%d_%H%M%S)"
mkdir -p $BACKUP_DIR_PG
pg_dump $DATABASE_URL --format=directory --jobs=4 --file=$BACKUP_DIR_PG

# Step 2: Verify parallel backup
ls -la $BACKUP_DIR_PG/
cat $BACKUP_DIR_PG/toc.dat | head -20

# Step 3: Compare with single-threaded time (optional)
time pg_dump $DATABASE_URL --format=custom --file=/tmp/single.dump

Expected Result:

  • Parallel backup creates directory structure
  • Multiple data files created
  • Faster than single-threaded for large databases

Section 3: Restore Operations

Case Source Target Method Expected
RS-01 SQLite backup Fresh SQLite File copy Full data
RS-02 PostgreSQL backup Fresh PostgreSQL pg_restore Full data
RS-03 Old version backup New version Migrate Compatible
RS-04 Corrupted backup Any pg_restore Error
RS-01: Restore SQLite Database

Preconditions:

  • Valid SQLite backup file
  • Gateway stopped

Steps:

# Step 1: Stop the gateway
pkill -f mcpgateway || true

# Step 2: Backup current database (safety)
mv ./mcp.db ./mcp.db.pre-restore

# Step 3: Restore from backup
cp $BACKUP_FILE ./mcp.db

# Step 4: Verify restoration
sqlite3 ./mcp.db "PRAGMA integrity_check;"
sqlite3 ./mcp.db "SELECT COUNT(*) FROM servers;"

# Step 5: Start gateway
make dev &
sleep 5

# Step 6: Validate functionality
curl -s "$GATEWAY_URL/health"
curl -s "$GATEWAY_URL/servers" -H "Authorization: Bearer $ADMIN_TOKEN" | jq '.total'

Expected Result:

  • Database file replaced
  • Integrity check passes
  • Gateway starts successfully
  • Entity counts match backup source
RS-02: Restore PostgreSQL Database

Preconditions:

  • Valid PostgreSQL backup file
  • Target database exists (empty or drop/recreate)

Steps:

# Step 1: Stop the gateway
pkill -f mcpgateway || true

# Step 2: Drop and recreate database
psql postgres -c "DROP DATABASE IF EXISTS mcpgateway_restore;"
psql postgres -c "CREATE DATABASE mcpgateway_restore;"

# Step 3: Restore from backup
pg_restore --dbname=postgresql://user:pass@localhost:5432/mcpgateway_restore \
  --verbose --clean --if-exists $BACKUP_FILE

# Step 4: Verify restoration
psql postgresql://user:pass@localhost:5432/mcpgateway_restore \
  -c "SELECT COUNT(*) FROM servers;"

# Step 5: Start gateway with restored database
DATABASE_URL=postgresql://user:pass@localhost:5432/mcpgateway_restore make dev &
sleep 5

# Step 6: Validate functionality
curl -s "$GATEWAY_URL/health"

Expected Result:

  • Database restored without errors
  • All tables and data present
  • Gateway operates normally
RS-03: Cross-Version Restore with Migration

Preconditions:

  • Backup from older gateway version
  • New gateway version installed

Steps:

# Step 1: Restore old backup
pg_restore --dbname=$DATABASE_URL --clean --if-exists $OLD_BACKUP_FILE

# Step 2: Check current migration state
cd mcpgateway && alembic current

# Step 3: Run pending migrations
cd mcpgateway && alembic upgrade head

# Step 4: Verify schema
cd mcpgateway && alembic check

# Step 5: Start gateway and verify
make dev &
sleep 5
curl -s "$GATEWAY_URL/health"

Expected Result:

  • Old data restored
  • Migrations apply successfully
  • Gateway starts with new schema

Section 4: Disaster Recovery

Case Scenario RTO Target RPO Target Validation
DR-01 Full recovery 30 min 1 hour All systems
DR-02 Partial recovery 15 min 15 min Core only
DR-03 Cross-region recovery 2 hours 24 hours All data
DR-01: Full Environment Recovery

Preconditions:

  • Complete backup available
  • Fresh infrastructure deployed
  • Runbook available

Steps:

# Record start time
START_TIME=$(date +%s)

# Step 1: Deploy fresh infrastructure
docker-compose up -d postgres redis

# Step 2: Wait for services
sleep 30
python -m mcpgateway.utils.db_isready
python -m mcpgateway.utils.redis_isready

# Step 3: Restore database
pg_restore --dbname=$DATABASE_URL --clean --if-exists $BACKUP_FILE

# Step 4: Run migrations
cd mcpgateway && alembic upgrade head

# Step 5: Start gateway
make serve &
sleep 10

# Step 6: Validate all endpoints
curl -s "$GATEWAY_URL/health" | jq .
curl -s "$GATEWAY_URL/ready" | jq .
curl -s "$GATEWAY_URL/servers" -H "Authorization: Bearer $ADMIN_TOKEN" | jq '.total'

# Calculate recovery time
END_TIME=$(date +%s)
RECOVERY_TIME=$((END_TIME - START_TIME))
echo "Recovery completed in $RECOVERY_TIME seconds"

Expected Result:

  • All services running
  • All data recovered
  • Recovery time within RTO target

📊 Test Matrix

Test Case SQLite PostgreSQL Hot Backup Migration DR
BK-01
BK-02
BK-03
PG-01
PG-02
PG-03
PG-04
RS-01
RS-02
RS-03
RS-04
DR-01
DR-02
DR-03

✅ Success Criteria

  • All 14 test cases documented and executable
  • SQLite backup/restore procedures validated
  • PostgreSQL backup/restore procedures validated
  • Backup during active operations verified safe
  • Cross-version migration tested
  • Disaster recovery completed within RTO
  • Backup verification procedures documented
  • Runbooks created for each scenario

📚 Definition of Done

  • Test plan reviewed by Operations team
  • All test cases executed successfully
  • Runbooks published to documentation
  • Backup schedule configured in production
  • Monitoring alerts configured for backup failures
  • Recovery procedures tested quarterly

🔗 Related Files

  • mcpgateway/db.py - Database models
  • mcpgateway/alembic/ - Database migrations
  • mcpgateway/utils/db_isready.py - Database readiness check

🔗 Related Issues

  • None identified

Metadata

Metadata

Assignees

No one assigned

    Labels

    SHOULDP2: Important but not vital; high-value items that are not crucial for the immediate releasechoreLinting, formatting, dependency hygiene, or project maintenance choresmanual-testingManual testing / test planning issuesreadyValidated, ready-to-work-on itemstestingTesting (unit, e2e, manual, automated, etc)

    Type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions