
[EPIC][TESTING]: Automated MCP server compatibility regression suite - Top 100+ server testing #2347

@crivetimihai

Description

🧪 Epic: Automated MCP Server Compatibility Regression Suite - Top 100+ Server Testing

Goal

Implement a comprehensive automated regression testing framework that continuously validates ContextForge compatibility with 100+ popular MCP servers from the ecosystem. This CI/CD-integrated test suite runs on every build via GitHub Actions, detecting protocol compatibility regressions, transport issues, schema validation failures, and behavioral deviations before they reach production.

Why Now?

With ContextForge's growing adoption and the rapidly expanding MCP ecosystem, maintaining compatibility is critical:

  1. Ecosystem Growth: The MCP ecosystem now includes 100+ public servers spanning databases, APIs, cloud services, AI tools, and developer utilities - each a potential integration point
  2. Protocol Evolution: MCP specification updates, transport changes (stdio/SSE/WebSocket/HTTP Streamable), and schema modifications require continuous validation
  3. Regression Prevention: Recent issues ([BUG]: Few MCP servers are not supported - Error when adding gateway #2322) showed JSON validation failures breaking gateway functionality - automated detection would have caught this earlier
  4. User Trust: Organizations depend on ContextForge to reliably proxy their MCP infrastructure - compatibility regressions erode confidence
  5. Release Velocity: Manual compatibility testing is unsustainable at scale - automation enables confident frequent releases
  6. Community Health: Publishing compatibility reports positions ContextForge as the authoritative MCP compatibility reference

By implementing this as an automated GitHub Actions workflow, we enable continuous compatibility assurance without human intervention while generating valuable compatibility matrices for the community.


📖 User Stories

US-1: Platform Maintainer - Automated Regression Detection

As a Platform Maintainer
I want automated tests to verify MCP server compatibility on every PR/push
So that compatibility regressions are detected before merging

Acceptance Criteria:

Given a PR is opened against the main branch
When the GitHub Actions workflow triggers
Then the system should:
  - Spin up ContextForge in test mode
  - Launch 100+ MCP servers from the registry
  - Attempt registration via POST /gateways
  - Validate tool/resource/prompt discovery
  - Execute sample tool invocations
  - Record pass/fail status per server
  - Fail the CI if >5% of servers regress from baseline
  - Generate a compatibility report artifact
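
The baseline-comparison rule in the last two criteria could be sketched as follows. This is a hypothetical helper, not the actual implementation: the status values and the 5% threshold come from the criteria above; everything else (function names, file layout) is an assumption.

```python
# Hypothetical sketch: fail CI when more than 5% of servers regress
# from the stored baseline. Status names follow the report schema.
def find_regressions(baseline: dict, current: dict) -> list:
    """Servers whose status got worse relative to the baseline."""
    rank = {"compatible": 2, "partial": 1, "incompatible": 0, "skipped": 0}
    return [
        name
        for name, status in current.items()
        if rank.get(status, 0) < rank.get(baseline.get(name, "skipped"), 0)
    ]


def should_fail_build(baseline: dict, current: dict, threshold: float = 0.05) -> bool:
    """True when the regressed fraction exceeds the CI threshold."""
    regressed = find_regressions(baseline, current)
    return len(regressed) / max(len(baseline), 1) > threshold


baseline = {"filesystem": "compatible", "github": "compatible", "postgres": "partial"}
current = {"filesystem": "compatible", "github": "incompatible", "postgres": "partial"}
print(should_fail_build(baseline, current))  # → True (1/3 regressed)
```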

Technical Requirements:

  • GitHub Actions workflow with matrix strategy
  • Parallel server testing (10-20 concurrent)
  • Baseline comparison for regression detection
  • Artifact upload for compatibility reports
  • PR comment with test summary

US-2: Release Manager - Compatibility Matrix Generation

As a Release Manager
I want each release to include a compatibility matrix
So that users know which MCP servers are verified compatible

Acceptance Criteria:

Given a new release is tagged (e.g., v1.1.0)
When the release workflow runs
Then the system should:
  - Run full compatibility suite against all servers
  - Generate compatibility matrix (server × transport × status)
  - Categorize: ✅ Compatible, ⚠️ Partial, ❌ Incompatible, ⏭️ Skipped
  - Publish matrix to release notes
  - Upload matrix as release asset (JSON + Markdown)
  - Update docs/compatibility.md automatically

Technical Requirements:

  • Full suite execution on release tags
  • Matrix generation in JSON and Markdown formats
  • Automatic docs update via PR
  • Release asset attachment
  • Historical tracking per version

US-3: Developer - Fast Feedback on Compatibility Impact

As a Developer
I want quick feedback on whether my changes affect MCP compatibility
So that I can fix issues before review

Acceptance Criteria:

Given I push a commit to my feature branch
When the smoke test workflow completes (<5 minutes)
Then I should see:
  - Status check: "MCP Compatibility - Top 20 Servers"
  - Pass/fail indicator in PR checks
  - Link to full logs on failure
  - Diff from baseline if any servers regress

Technical Requirements:

  • "Smoke test" subset: top 20 most popular servers
  • Target runtime: <5 minutes
  • Clear pass/fail status checks
  • Direct link to failure details

US-4: QA Engineer - Detailed Failure Analysis

As a QA Engineer
I want detailed failure reports with reproduction steps
So that I can diagnose and fix compatibility issues

Acceptance Criteria:

Given a server fails compatibility testing
Then the report should include:
  - Server name, version, transport type
  - Failure phase: registration | discovery | invocation | response
  - Error message and stack trace
  - Request/response payloads (sanitized)
  - Expected vs actual behavior
  - Link to server repository
  - Suggested fix category: gateway | server | protocol

Technical Requirements:

  • Structured failure reports (JSON)
  • Payload capture with secret redaction
  • Error categorization taxonomy
  • Reproduction script generation
  • Integration with issue templates
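
The secret-redaction requirement could look like the sketch below. The key names and token patterns are illustrative assumptions, not the project's actual redaction rules.

```python
# Hypothetical payload-redaction sketch: mask secret-looking keys and
# token-shaped strings before payloads are written to reports.
import re

SECRET_KEYS = {"authorization", "token", "api_key", "password", "secret"}
TOKEN_PATTERN = re.compile(r"(ghp_[A-Za-z0-9]{36}|sk-[A-Za-z0-9]{20,})")


def redact(payload):
    """Recursively mask secret keys and embedded token-like strings."""
    if isinstance(payload, dict):
        return {
            k: "[REDACTED]" if k.lower() in SECRET_KEYS else redact(v)
            for k, v in payload.items()
        }
    if isinstance(payload, list):
        return [redact(item) for item in payload]
    if isinstance(payload, str):
        return TOKEN_PATTERN.sub("[REDACTED]", payload)
    return payload


print(redact({"Authorization": "Bearer abc", "args": {"query": "mcp"}}))
```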

US-5: Community Contributor - Server Registration

As a Community Contributor
I want to add my MCP server to the compatibility suite
So that it's automatically tested with each ContextForge release

Acceptance Criteria:

Given I maintain an MCP server
When I submit a PR adding my server to mcp-servers/registry.yaml
Then my server should:
  - Be validated for required fields (name, repo, transport)
  - Be included in nightly compatibility runs
  - Appear in the public compatibility matrix
  - Receive notifications if compatibility breaks

Technical Requirements:

  • Registry schema with validation
  • PR template for server additions
  • Server health check before inclusion
  • Notification webhook for maintainers
  • Badge generation for server READMEs

US-6: Operations Team - Nightly Full Suite

As an Operations Engineer
I want nightly runs of the full 100+ server suite
So that we catch issues from upstream server changes

Acceptance Criteria:

Given it's 2:00 AM UTC
When the nightly schedule triggers
Then the system should:
  - Pull latest versions of all registered servers
  - Run full compatibility suite
  - Compare against previous night's results
  - Alert on new failures (Slack/email)
  - Generate trend report (last 7 days)
  - Archive results for historical analysis

Technical Requirements:

  • Scheduled GitHub Actions (cron)
  • Server version pinning vs latest
  • Delta detection and alerting
  • Time-series data storage
  • Trend visualization
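
Delta detection between two nightly runs could be as simple as the hedged sketch below; the status names follow the report schema, the function itself is an assumption.

```python
# Hypothetical nightly delta detection: compare tonight's per-server
# statuses with last night's and surface new failures and recoveries.
def diff_runs(previous: dict, current: dict) -> dict:
    new_failures = sorted(
        s for s, status in current.items()
        if status != "compatible" and previous.get(s) == "compatible"
    )
    recoveries = sorted(
        s for s, status in current.items()
        if status == "compatible" and previous.get(s) not in (None, "compatible")
    )
    return {"new_failures": new_failures, "recoveries": recoveries}


prev = {"slack": "compatible", "time": "incompatible"}
curr = {"slack": "partial", "time": "compatible"}
print(diff_runs(prev, curr))  # slack newly failing, time recovered
```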

US-7: Security Team - Isolated Test Execution

As a Security Engineer
I want MCP servers tested in isolated containers
So that malicious servers cannot compromise CI infrastructure

Acceptance Criteria:

Given an MCP server is being tested
Then the test runner should:
  - Execute server in isolated Docker container
  - Apply network policies (no external egress)
  - Limit resource usage (CPU, memory, time)
  - Scan for known vulnerabilities before testing
  - Terminate on suspicious behavior
  - Log all I/O for audit

Technical Requirements:

  • Docker-based isolation
  • Network policy enforcement
  • Resource limits (cgroups)
  • Timeout and kill mechanisms
  • Audit logging
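
Those isolation requirements translate fairly directly into docker run flags; a sketch (the limit values are illustrative, not the project's settings):

```python
# Hypothetical builder for the isolated `docker run` invocation used by
# the test launcher. All limits are illustrative assumptions.
def docker_run_args(image: str, timeout_s: int = 30) -> list:
    return [
        "docker", "run", "--rm",
        "--network", "none",          # no external egress
        "--memory", "512m",           # cgroup memory cap
        "--cpus", "1.0",              # cgroup CPU cap
        "--pids-limit", "128",        # fork-bomb protection
        "--read-only",                # immutable root filesystem
        "--stop-timeout", str(timeout_s),
        image,
    ]


print(docker_run_args("mcp-server-filesystem:latest"))
```

In practice the launcher would hand this list to subprocess.run with a hard wall-clock timeout as the kill mechanism.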

US-8: Product Manager - Public Dashboard

As a Product Manager
I want a public compatibility dashboard
So that users can check server compatibility before adoption

Acceptance Criteria:

Given a user visits contextforge.io/compatibility
Then they should see:
  - Searchable list of 100+ MCP servers
  - Compatibility status per server (badge)
  - Last tested date and ContextForge version
  - Supported transports per server
  - Link to detailed test results
  - Historical compatibility trend

Technical Requirements:

  • Static site generation (GitHub Pages)
  • Automatic updates from CI artifacts
  • Search and filter functionality
  • Mobile-responsive design
  • Badge embed codes

🏗 Architecture

System Overview

graph TB
    subgraph "GitHub Actions"
        GHA[GitHub Actions Runner]
        Matrix[Matrix Strategy]
        Parallel[Parallel Jobs x20]
    end

    subgraph "Test Infrastructure"
        CF[ContextForge Test Instance]
        Docker[Docker Containers]
        Network[Isolated Network]
    end

    subgraph "MCP Servers Registry"
        Registry[(registry.yaml)]
        S1[Server 1: filesystem]
        S2[Server 2: github]
        S3[Server 3: postgres]
        SN[Server N: ...]
    end

    subgraph "Test Phases"
        P1[1. Registration]
        P2[2. Discovery]
        P3[3. Invocation]
        P4[4. Validation]
    end

    subgraph "Outputs"
        Report[Compatibility Report]
        Matrix2[Compatibility Matrix]
        Artifacts[CI Artifacts]
        Dashboard[Public Dashboard]
    end

    GHA --> Matrix --> Parallel
    Parallel --> Docker
    Docker --> CF
    Docker --> S1 & S2 & S3 & SN

    Registry --> Docker

    CF --> P1 --> P2 --> P3 --> P4

    P4 --> Report --> Artifacts
    Artifacts --> Matrix2
    Artifacts --> Dashboard

Test Execution Flow

sequenceDiagram
    participant GHA as GitHub Actions
    participant Runner as Test Runner
    participant CF as ContextForge
    participant Server as MCP Server
    participant Report as Reporter

    GHA->>Runner: Trigger workflow (push/PR/schedule)
    Runner->>Runner: Load registry.yaml
    Runner->>Runner: Create test matrix

    par Parallel Server Tests
        Runner->>Server: docker run mcp-server-X
        Server-->>Runner: Container ready
        Runner->>CF: POST /gateways (register server)

        alt Registration Success
            CF-->>Runner: 201 Created
            Runner->>CF: GET /tools (discovery)
            CF-->>Runner: Tool list

            loop For each sample tool
                Runner->>CF: POST /tools/invoke
                CF->>Server: Forward invocation
                Server-->>CF: Tool result
                CF-->>Runner: Response
                Runner->>Runner: Validate response schema
            end

            Runner->>Report: Record SUCCESS
        else Registration Failure
            CF-->>Runner: Error response
            Runner->>Report: Record FAILURE (phase: registration)
        end
    end

    Report->>Report: Aggregate results
    Report->>GHA: Upload artifacts
    Report->>GHA: Set check status

    alt Any Regressions
        Report->>GHA: FAIL build
        Report->>GHA: Post PR comment
    else All Pass
        Report->>GHA: PASS build
    end

Server Registry Schema

# mcp-servers/registry.yaml
servers:
  - name: "mcp-server-filesystem"
    description: "File system operations"
    repository: "https://github.com/modelcontextprotocol/servers"
    package: "@modelcontextprotocol/server-filesystem"
    install: "npx"
    command: "npx -y @modelcontextprotocol/server-filesystem /tmp/test"
    transports: [stdio]
    category: "filesystem"
    popularity: 95  # 0-100 score
    tier: "core"    # core | popular | community
    maintainer: "anthropic"
    test_config:
      timeout_seconds: 30
      sample_tools:
        - name: "read_file"
          args: { path: "/tmp/test/sample.txt" }
          setup: "echo 'test content' > /tmp/test/sample.txt"
        - name: "list_directory"
          args: { path: "/tmp/test" }
      expected_tools: ["read_file", "write_file", "list_directory"]
      expected_resources: []

  - name: "mcp-server-github"
    description: "GitHub API integration"
    repository: "https://github.com/modelcontextprotocol/servers"
    package: "@modelcontextprotocol/server-github"
    install: "npx"
    command: "npx -y @modelcontextprotocol/server-github"
    transports: [stdio]
    category: "api"
    popularity: 92
    tier: "core"
    env_vars:
      GITHUB_TOKEN: "${{ secrets.TEST_GITHUB_TOKEN }}"
    test_config:
      timeout_seconds: 60
      sample_tools:
        - name: "search_repositories"
          args: { query: "mcp language:python" }
      expected_tools: ["search_repositories", "get_repository", "list_issues"]

  - name: "mcp-server-postgres"
    description: "PostgreSQL database queries"
    repository: "https://github.com/modelcontextprotocol/servers"
    package: "@modelcontextprotocol/server-postgres"
    install: "npx"
    command: "npx -y @modelcontextprotocol/server-postgres"
    transports: [stdio]
    category: "database"
    popularity: 88
    tier: "core"
    requires_service: "postgres"
    env_vars:
      POSTGRES_URL: "postgresql://test:test@localhost:5432/testdb"
    test_config:
      timeout_seconds: 45
      sample_tools:
        - name: "query"
          args: { sql: "SELECT 1 as test" }
      expected_tools: ["query"]

  # ... 97+ more servers
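
A validation pass over registry entries, mirroring the required fields shown above, might look like this (the function name and rules are assumptions for illustration):

```python
# Hypothetical registry-entry validator; required fields and allowed
# values are taken from the schema example above.
REQUIRED = ("name", "repository", "command", "transports", "tier")
KNOWN_TRANSPORTS = {"stdio", "sse", "websocket", "streamable-http"}


def validate_server(entry: dict) -> list:
    """Return a list of human-readable errors; empty means valid."""
    errors = [f"missing field: {f}" for f in REQUIRED if f not in entry]
    for t in entry.get("transports", []):
        if t not in KNOWN_TRANSPORTS:
            errors.append(f"unknown transport: {t}")
    if entry.get("tier") not in (None, "core", "popular", "community"):
        errors.append(f"unknown tier: {entry['tier']}")
    return errors


entry = {
    "name": "mcp-server-filesystem",
    "repository": "https://github.com/modelcontextprotocol/servers",
    "command": "npx -y @modelcontextprotocol/server-filesystem /tmp/test",
    "transports": ["stdio"],
    "tier": "core",
}
print(validate_server(entry))  # → [] (valid)
```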

Compatibility Report Schema

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "metadata": {
      "type": "object",
      "properties": {
        "contextforge_version": { "type": "string" },
        "contextforge_commit": { "type": "string" },
        "test_run_id": { "type": "string" },
        "timestamp": { "type": "string", "format": "date-time" },
        "duration_seconds": { "type": "number" },
        "trigger": { "enum": ["push", "pull_request", "schedule", "manual"] },
        "runner": { "type": "string" }
      }
    },
    "summary": {
      "type": "object",
      "properties": {
        "total_servers": { "type": "integer" },
        "compatible": { "type": "integer" },
        "partial": { "type": "integer" },
        "incompatible": { "type": "integer" },
        "skipped": { "type": "integer" },
        "pass_rate": { "type": "number" }
      }
    },
    "results": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "server_name": { "type": "string" },
          "server_version": { "type": "string" },
          "transport": { "type": "string" },
          "status": { "enum": ["compatible", "partial", "incompatible", "skipped"] },
          "phases": {
            "type": "object",
            "properties": {
              "registration": { "$ref": "#/definitions/phase_result" },
              "discovery": { "$ref": "#/definitions/phase_result" },
              "invocation": { "$ref": "#/definitions/phase_result" },
              "validation": { "$ref": "#/definitions/phase_result" }
            }
          },
          "duration_ms": { "type": "integer" },
          "error": { "type": "string" },
          "details": { "type": "object" }
        }
      }
    },
    "regressions": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "server_name": { "type": "string" },
          "previous_status": { "type": "string" },
          "current_status": { "type": "string" },
          "since_version": { "type": "string" }
        }
      }
    }
  },
  "definitions": {
    "phase_result": {
      "type": "object",
      "properties": {
        "status": { "enum": ["pass", "fail", "skip"] },
        "duration_ms": { "type": "integer" },
        "error": { "type": "string" },
        "details": { "type": "object" }
      }
    }
  }
}
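
For illustration, the summary block can be derived from the results array like this; field names follow the schema above, the helper itself is an assumption:

```python
# Hypothetical aggregation of per-server results into the report's
# summary object. Skipped servers are excluded from the pass rate.
def summarize(results: list) -> dict:
    counts = {"compatible": 0, "partial": 0, "incompatible": 0, "skipped": 0}
    for r in results:
        counts[r["status"]] += 1
    tested = len(results) - counts["skipped"]
    return {
        "total_servers": len(results),
        **counts,
        "pass_rate": counts["compatible"] / tested if tested else 0.0,
    }


results = [
    {"server_name": "filesystem", "status": "compatible"},
    {"server_name": "github", "status": "partial"},
    {"server_name": "postgres", "status": "skipped"},
]
print(summarize(results))
```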

📋 Implementation Tasks

Phase 1: Test Infrastructure Setup ✅

  • Create Test Directory Structure

    • Create tests/compatibility/ directory
    • Create tests/compatibility/conftest.py with pytest fixtures
    • Create tests/compatibility/test_runner.py main test orchestrator
    • Create tests/compatibility/server_launcher.py for container management
    • Create tests/compatibility/report_generator.py for output generation
  • Server Registry Implementation

    • Create mcp-servers/registry.yaml with schema
    • Add top 20 core MCP servers (Anthropic official)
    • Add top 30 popular community servers
    • Add remaining 50+ servers from ecosystem scan
    • Implement registry validation script
    • Create PR template for adding servers
  • Docker Test Environment

    • Create tests/compatibility/Dockerfile.testenv
    • Create docker-compose.test.yaml for services (postgres, redis)
    • Configure isolated network (no external egress)
    • Set resource limits (CPU, memory, timeout)
    • Implement container cleanup

Phase 2: Core Test Framework ✅

  • Test Phases Implementation

    • Registration Phase: POST to /gateways, validate response
    • Discovery Phase: GET /tools, /resources, /prompts
    • Invocation Phase: POST /tools/invoke with sample args
    • Validation Phase: Schema validation, response content checks
  • Server Launcher Module

    • Parse registry.yaml server definitions
    • Support multiple install methods: npx, pip, docker
    • Handle environment variable injection (secrets)
    • Implement health check polling
    • Container lifecycle management (start, stop, cleanup)
    • Timeout handling with graceful shutdown
  • ContextForge Test Client

    • Create tests/compatibility/cf_client.py
    • Gateway registration endpoint wrapper
    • Tool discovery endpoint wrapper
    • Tool invocation endpoint wrapper
    • Error handling and retry logic
    • Response validation utilities
  • Result Collection

    • Define result data structures (Pydantic models)
    • Implement per-server result aggregation
    • Track timing metrics per phase
    • Capture error messages and stack traces
    • Payload logging with secret redaction
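
The ContextForge test client described above could start as a thin HTTP wrapper. A sketch with endpoint paths taken from the test phases; the method names and payload shapes are assumptions:

```python
# Hypothetical skeleton of tests/compatibility/cf_client.py.
import json
import urllib.request


class ContextForgeClient:
    def __init__(self, base_url="http://localhost:8000"):
        self.base_url = base_url.rstrip("/")

    def _url(self, path):
        return f"{self.base_url}/{path.lstrip('/')}"

    def _request(self, method, path, body=None):
        data = json.dumps(body).encode() if body is not None else None
        req = urllib.request.Request(
            self._url(path), data=data, method=method,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req, timeout=30) as resp:
            return json.loads(resp.read())

    def register_gateway(self, name, url):
        # Registration phase: POST /gateways
        return self._request("POST", "/gateways", {"name": name, "url": url})

    def list_tools(self):
        # Discovery phase: GET /tools
        return self._request("GET", "/tools")

    def invoke_tool(self, name, args):
        # Invocation phase: POST /tools/invoke
        return self._request("POST", "/tools/invoke", {"name": name, "args": args})
```

The real client would add the retry logic and response validation listed above.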

Phase 3: GitHub Actions Workflows ✅

  • Smoke Test Workflow (PR/Push)

    • Create .github/workflows/mcp-compatibility-smoke.yaml
    • Trigger on: push, pull_request
    • Test top 20 servers only
    • Target runtime: <5 minutes
    • Matrix strategy: 4 parallel jobs
    • Status check: "MCP Compatibility Smoke Test"
  • Full Suite Workflow (Nightly/Release)

    • Create .github/workflows/mcp-compatibility-full.yaml
    • Trigger on: schedule (nightly), release tags
    • Test all 100+ servers
    • Matrix strategy: 20 parallel jobs
    • Estimated runtime: 15-30 minutes
    • Upload compatibility report artifact
  • Workflow Utilities

    • Baseline comparison script
    • Regression detection logic
    • PR comment generator
    • Slack notification integration
    • Artifact retention policy (90 days)

Phase 4: Reporting & Artifacts ✅

  • Report Generator

    • JSON report with full details
    • Markdown summary for PR comments
    • HTML report for artifacts
    • Compatibility matrix (CSV/JSON)
    • Regression diff report
  • PR Integration

    • Automatic PR comment with summary
    • Status check with pass/fail
    • Link to full report artifact
    • Regression highlight in comment
  • Release Integration

    • Attach compatibility matrix to releases
    • Auto-update docs/compatibility.md
    • Generate badge data
    • Archive historical results

Phase 5: Server Population ✅

  • Core Tier (20 servers) - Anthropic official + critical

    • mcp-server-filesystem
    • mcp-server-github
    • mcp-server-gitlab
    • mcp-server-postgres
    • mcp-server-sqlite
    • mcp-server-memory
    • mcp-server-brave-search
    • mcp-server-google-drive
    • mcp-server-slack
    • mcp-server-puppeteer
    • mcp-server-sequential-thinking
    • mcp-server-fetch
    • mcp-server-everart
    • mcp-server-everything
    • mcp-server-aws-kb-retrieval
    • mcp-server-google-maps
    • mcp-server-time
    • mcp-server-sentry
    • mcp-server-raygun
    • mcp-server-git
  • Popular Tier (30 servers) - Community high-usage

    • Scan npmjs.org for mcp-server-* packages
    • Scan PyPI for MCP server packages
    • Rank by downloads/stars
    • Add top 30 with test configs
    • Verify each server launches successfully
    • Document any special requirements
  • Community Tier (50+ servers) - Broader ecosystem

    • GitHub search for MCP servers
    • Awesome-MCP list scan
    • Add servers with basic test configs
    • Mark experimental/alpha servers
    • Allow community additions via PR

Phase 6: Advanced Features ✅

  • Transport Testing

    • stdio transport tests
    • SSE transport tests
    • WebSocket transport tests
    • HTTP Streamable transport tests
    • Transport fallback validation
  • Version Compatibility

    • Test against multiple server versions (latest, latest-1)
    • Test against multiple CF versions (current, previous release)
    • Version matrix in reports
    • Breaking change detection
  • Performance Baselines

    • Record response times per server
    • Track performance regressions
    • Set performance thresholds (warn/fail)
    • Performance trend graphs
  • Chaos Testing (Optional)

    • Network latency injection
    • Packet loss simulation
    • Server timeout scenarios
    • Malformed response handling

Phase 7: Public Dashboard ✅

  • Dashboard Implementation

    • Static site with compatibility matrix
    • GitHub Pages deployment
    • Auto-update from CI artifacts
    • Search and filter functionality
    • Server detail pages
  • Badges & Embeds

    • shields.io compatible badge endpoint
    • Per-server compatibility badges
    • Overall pass rate badge
    • Embed code generator
  • Historical Tracking

    • Store results in time-series format
    • Compatibility trend visualization
    • Version-over-version comparison
    • Regression timeline

Phase 8: Documentation & Testing ✅

  • Documentation

    • README for tests/compatibility/
    • Registry contribution guide
    • Troubleshooting common failures
    • CI workflow documentation
    • Dashboard usage guide
  • Meta-Testing

    • Unit tests for test framework
    • Mock server for framework validation
    • Report generator tests
    • Workflow syntax validation

Phase 9: Quality & Polish ✅

  • Code Quality

    • Run make autoflake isort black on test code
    • Type hints for all test modules
    • Docstrings for public functions
    • Pass make verify checks
  • Performance Optimization

    • Parallel container startup
    • Connection pooling for CF client
    • Async test execution where possible
    • Caching of server images
  • Reliability

    • Retry logic for flaky servers
    • Graceful handling of unavailable servers
    • Timeout tuning per server category
    • Self-healing container cleanup

⚙️ Configuration Examples

GitHub Actions Smoke Test

# .github/workflows/mcp-compatibility-smoke.yaml
name: MCP Compatibility Smoke Test

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  compatibility:
    name: Test Top 20 MCP Servers
    runs-on: ubuntu-latest
    timeout-minutes: 10

    services:
      postgres:
        image: postgres:15
        env:
          POSTGRES_PASSWORD: test
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5

    strategy:
      fail-fast: false
      matrix:
        server-batch: [1, 2, 3, 4]  # 5 servers per batch

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          pip install uv
          uv pip install -e ".[dev]" --system

      - name: Start ContextForge
        run: |
          make dev &
          sleep 10  # Wait for startup

      - name: Run compatibility tests
        run: |
          python -m pytest tests/compatibility/ \
            --tier=core \
            --batch=${{ matrix.server-batch }} \
            --junitxml=results-${{ matrix.server-batch }}.xml
        env:
          TEST_GITHUB_TOKEN: ${{ secrets.TEST_GITHUB_TOKEN }}

      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: compatibility-results-${{ matrix.server-batch }}
          path: results-*.xml

  report:
    needs: compatibility
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/download-artifact@v4

      - name: Generate report
        run: python scripts/generate_compatibility_report.py

      - name: Comment on PR
        if: github.event_name == 'pull_request'
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const report = fs.readFileSync('compatibility-summary.md', 'utf8');
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: report
            });
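
The report job above calls scripts/generate_compatibility_report.py on the downloaded JUnit XML artifacts. A minimal sketch of what that script might do (the aggregation and output format here are assumptions):

```python
# Hypothetical core of generate_compatibility_report.py: fold JUnit XML
# results into the Markdown summary posted on the PR.
import xml.etree.ElementTree as ET


def summarize_junit(xml_text: str) -> dict:
    """Count tests/failures across one <testsuite> or a <testsuites> root."""
    root = ET.fromstring(xml_text)
    suites = [root] if root.tag == "testsuite" else root.findall("testsuite")
    total = sum(int(s.get("tests", 0)) for s in suites)
    failed = sum(int(s.get("failures", 0)) + int(s.get("errors", 0)) for s in suites)
    return {"total": total, "failed": failed, "passed": total - failed}


def to_markdown(summary: dict) -> str:
    icon = "✅" if summary["failed"] == 0 else "❌"
    return (f"{icon} MCP Compatibility: {summary['passed']}/{summary['total']} "
            f"servers passed")


print(to_markdown(summarize_junit('<testsuite tests="5" failures="1" errors="0"/>')))
```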

Full Suite Workflow

# .github/workflows/mcp-compatibility-full.yaml
name: MCP Compatibility Full Suite

on:
  schedule:
    - cron: '0 2 * * *'  # 2 AM UTC daily
  release:
    types: [published]
  workflow_dispatch:

jobs:
  compatibility:
    name: Test All MCP Servers
    runs-on: ubuntu-latest
    timeout-minutes: 45

    strategy:
      fail-fast: false
      matrix:
        server-batch: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # 10+ servers per batch

    steps:
      # ... similar to smoke test but with all tiers

      - name: Run full compatibility tests
        run: |
          python -m pytest tests/compatibility/ \
            --tier=all \
            --batch=${{ matrix.server-batch }} \
            --verbose \
            --capture-payloads

  publish:
    needs: compatibility
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Generate compatibility matrix
        run: python scripts/generate_matrix.py

      - name: Update docs
        run: |
          cp compatibility-matrix.md docs/compatibility.md
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add docs/compatibility.md
          git commit -m "docs: update compatibility matrix [skip ci]"
          git push

      - name: Upload to release
        if: github.event_name == 'release'
        uses: softprops/action-gh-release@v1
        with:
          files: |
            compatibility-matrix.json
            compatibility-matrix.md
            compatibility-report.html

Test Configuration

# tests/compatibility/conftest.py
import pytest
from pathlib import Path
import yaml

def pytest_addoption(parser):
    parser.addoption("--tier", default="core", choices=["core", "popular", "community", "all"])
    parser.addoption("--batch", type=int, default=1)
    parser.addoption("--capture-payloads", action="store_true")

@pytest.fixture(scope="session")
def registry():
    registry_path = Path(__file__).parent.parent.parent / "mcp-servers" / "registry.yaml"
    with open(registry_path) as f:
        return yaml.safe_load(f)

@pytest.fixture(scope="session")
def contextforge_client():
    from tests.compatibility.cf_client import ContextForgeClient
    return ContextForgeClient(base_url="http://localhost:8000")

@pytest.fixture
def server_launcher():
    from tests.compatibility.server_launcher import ServerLauncher
    launcher = ServerLauncher()
    yield launcher
    launcher.cleanup_all()

✅ Success Criteria

  • Coverage: 100+ MCP servers in registry with test configurations
  • Speed: Smoke tests complete in <5 minutes; full suite in <30 minutes
  • Reliability: <2% flaky test rate across runs
  • Regression Detection: Every status regression against the stored baseline is surfaced in the corresponding workflow run
  • CI Integration: Status checks on all PRs; artifacts on releases
  • Documentation: Complete registry contribution guide
  • Reporting: JSON, Markdown, HTML reports generated automatically
  • Public Visibility: Dashboard live with search and badges
  • Security: Isolated containers with no external network access
  • Maintainability: Easy to add new servers via PR

🏁 Definition of Done

  • tests/compatibility/ directory with test framework
  • mcp-servers/registry.yaml with 100+ servers
  • Smoke test workflow running on all PRs
  • Full suite workflow running nightly
  • Compatibility matrix attached to releases
  • PR comments with test summaries
  • docs/compatibility.md auto-updated
  • Public dashboard deployed
  • Contribution guide for adding servers
  • Meta-tests for test framework passing
  • Code passes make verify checks
  • <5 minute smoke test runtime achieved
  • 95%+ pass rate on full suite

📝 Additional Notes

🔹 Server Categories:

  • Core: Official MCP servers from Anthropic - must maintain 100% compatibility
  • Popular: Top community servers by downloads - target 95%+ compatibility
  • Community: Broader ecosystem - monitor and report

🔹 Test Phases:

  • Registration: Can the server be registered with ContextForge?
  • Discovery: Are tools/resources/prompts correctly discovered?
  • Invocation: Do sample tool calls succeed?
  • Validation: Do responses match expected schemas?
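
The four phases above can be chained into a simple driver that stops at the first failing phase. This is a sketch using a stub client, since the real client interface is not defined here:

```python
# Hypothetical phase driver: run registration → discovery → invocation
# → validation in order, recording status and stopping on first failure.
def run_phases(server: dict, client) -> dict:
    phases = {}

    def run(name, fn):
        try:
            fn()
            phases[name] = {"status": "pass"}
            return True
        except Exception as exc:  # record the failing phase and stop
            phases[name] = {"status": "fail", "error": str(exc)}
            return False

    steps = [
        ("registration", lambda: client.register(server["name"])),
        ("discovery", lambda: client.discover()),
        ("invocation", lambda: client.invoke(server.get("sample_tools", []))),
        ("validation", lambda: client.validate()),
    ]
    for name, fn in steps:
        if not run(name, fn):
            break
    return phases


class StubClient:  # stand-in for the real ContextForge client
    def register(self, name): pass
    def discover(self): pass
    def invoke(self, tools): raise RuntimeError("tool call failed")
    def validate(self): pass


print(run_phases({"name": "mcp-server-time"}, StubClient()))
```

With the stub, invocation fails, so validation is never reached and the result names the failing phase, matching the "Failure phase" field in US-4.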

🔹 Failure Categories:

  • Gateway Issue: ContextForge bug - needs fix
  • Server Issue: Upstream server bug - document and notify
  • Protocol Issue: MCP spec interpretation difference - clarify
  • Environment Issue: CI/test infrastructure problem - retry and mark as flaky

🔹 Performance Budget:

  • Server startup: <10 seconds
  • Registration: <2 seconds
  • Discovery: <5 seconds
  • Invocation: <10 seconds per tool

🔹 Security Considerations:

  • All servers run in isolated Docker containers
  • No network egress allowed from test containers
  • Secrets injected via GitHub encrypted secrets
  • Payloads logged with automatic secret redaction
  • Container images scanned for vulnerabilities

🔹 Ecosystem Benefits:

  • Public compatibility matrix helps MCP adopters
  • Regression alerts help server maintainers
  • Badge system encourages compatibility
  • Test configs serve as integration examples

Metadata

Assignees: no one assigned
Labels: MUST, P1 (non-negotiable, critical requirement), cicd (CI/CD process: GitHub Actions, scaffolding), enhancement (new feature or request), epic (large feature spanning multiple issues), mcp-servers (MCP server samples), testing (unit, e2e, manual, automated)
Projects: none
Milestone: none
Relationships: none yet
Development: no branches or pull requests