
[EPIC][TESTING]: Automated MCP server compatibility regression suite - Top 100+ server testing #2347

@crivetimihai

Description

🧪 Epic: Automated MCP Server Compatibility Regression Suite - Top 100+ Server Testing

Goal

Implement a comprehensive automated regression testing framework that continuously validates ContextForge compatibility with 100+ popular MCP servers from the ecosystem. This CI/CD-integrated test suite runs on every build via GitHub Actions, detecting protocol compatibility regressions, transport issues, schema validation failures, and behavioral deviations before they reach production.

Why Now?

With ContextForge's growing adoption and the rapidly expanding MCP ecosystem, maintaining compatibility is critical:

  1. Ecosystem Growth: The MCP ecosystem now includes 100+ public servers spanning databases, APIs, cloud services, AI tools, and developer utilities - each a potential integration point
  2. Protocol Evolution: MCP specification updates, transport changes (stdio/SSE/WebSocket/HTTP Streamable), and schema modifications require continuous validation
  3. Regression Prevention: Recent issues ([BUG]: Few MCP servers are not supported - Error when adding gateway #2322) showed JSON validation failures breaking gateway functionality - automated detection would have caught this earlier
  4. User Trust: Organizations depend on ContextForge to reliably proxy their MCP infrastructure - compatibility regressions erode confidence
  5. Release Velocity: Manual compatibility testing is unsustainable at scale - automation enables confident frequent releases
  6. Community Health: Publishing compatibility reports positions ContextForge as the authoritative MCP compatibility reference

By implementing this as an automated GitHub Actions workflow, we enable continuous compatibility assurance without human intervention while generating valuable compatibility matrices for the community.


📖 User Stories

US-1: Platform Maintainer - Automated Regression Detection

As a Platform Maintainer
I want automated tests to verify MCP server compatibility on every PR/push
So that compatibility regressions are detected before merging

Acceptance Criteria:

Given a PR is opened against the main branch
When the GitHub Actions workflow triggers
Then the system should:
  - Spin up ContextForge in test mode
  - Launch 100+ MCP servers from the registry
  - Attempt registration via POST /gateways
  - Validate tool/resource/prompt discovery
  - Execute sample tool invocations
  - Record pass/fail status per server
  - Fail the CI if >5% of servers regress from baseline
  - Generate a compatibility report artifact
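
The baseline-comparison rule in the last two criteria could be sketched as follows. This is a hypothetical helper, not the actual implementation: the status values and the 5% threshold come from the criteria above; everything else (function names, file layout) is an assumption.

```python
# Hypothetical sketch: fail CI when more than 5% of servers regress
# from the stored baseline. Status names follow the report schema.
def find_regressions(baseline: dict, current: dict) -> list:
    """Servers whose status got worse relative to the baseline."""
    rank = {"compatible": 2, "partial": 1, "incompatible": 0, "skipped": 0}
    return [
        name
        for name, status in current.items()
        if rank.get(status, 0) < rank.get(baseline.get(name, "skipped"), 0)
    ]


def should_fail_build(baseline: dict, current: dict, threshold: float = 0.05) -> bool:
    """True when the regressed fraction exceeds the CI threshold."""
    regressed = find_regressions(baseline, current)
    return len(regressed) / max(len(baseline), 1) > threshold


baseline = {"filesystem": "compatible", "github": "compatible", "postgres": "partial"}
current = {"filesystem": "compatible", "github": "incompatible", "postgres": "partial"}
print(should_fail_build(baseline, current))  # → True (1/3 regressed)
```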

Technical Requirements:

  • GitHub Actions workflow with matrix strategy
  • Parallel server testing (10-20 concurrent)
  • Baseline comparison for regression detection
  • Artifact upload for compatibility reports
  • PR comment with test summary

US-2: Release Manager - Compatibility Matrix Generation

As a Release Manager
I want each release to include a compatibility matrix
So that users know which MCP servers are verified compatible

Acceptance Criteria:

Given a new release is tagged (e.g., v1.1.0)
When the release workflow runs
Then the system should:
  - Run full compatibility suite against all servers
  - Generate compatibility matrix (server × transport × status)
  - Categorize: ✅ Compatible, ⚠️ Partial, ❌ Incompatible, ⏭️ Skipped
  - Publish matrix to release notes
  - Upload matrix as release asset (JSON + Markdown)
  - Update docs/compatibility.md automatically

Technical Requirements:

  • Full suite execution on release tags
  • Matrix generation in JSON and Markdown formats
  • Automatic docs update via PR
  • Release asset attachment
  • Historical tracking per version

US-3: Developer - Fast Feedback on Compatibility Impact

As a Developer
I want quick feedback on whether my changes affect MCP compatibility
So that I can fix issues before review

Acceptance Criteria:

Given I push a commit to my feature branch
When the smoke test workflow completes (<5 minutes)
Then I should see:
  - Status check: "MCP Compatibility - Top 20 Servers"
  - Pass/fail indicator in PR checks
  - Link to full logs on failure
  - Diff from baseline if any servers regress

Technical Requirements:

  • "Smoke test" subset: top 20 most popular servers
  • Target runtime: <5 minutes
  • Clear pass/fail status checks
  • Direct link to failure details

US-4: QA Engineer - Detailed Failure Analysis

As a QA Engineer
I want detailed failure reports with reproduction steps
So that I can diagnose and fix compatibility issues

Acceptance Criteria:

Given a server fails compatibility testing
Then the report should include:
  - Server name, version, transport type
  - Failure phase: registration | discovery | invocation | response
  - Error message and stack trace
  - Request/response payloads (sanitized)
  - Expected vs actual behavior
  - Link to server repository
  - Suggested fix category: gateway | server | protocol

Technical Requirements:

  • Structured failure reports (JSON)
  • Payload capture with secret redaction
  • Error categorization taxonomy
  • Reproduction script generation
  • Integration with issue templates
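
The secret-redaction requirement could look like the sketch below. The key names and token patterns are illustrative assumptions, not the project's actual redaction rules.

```python
# Hypothetical payload-redaction sketch: mask secret-looking keys and
# token-shaped strings before payloads are written to reports.
import re

SECRET_KEYS = {"authorization", "token", "api_key", "password", "secret"}
TOKEN_PATTERN = re.compile(r"(ghp_[A-Za-z0-9]{36}|sk-[A-Za-z0-9]{20,})")


def redact(payload):
    """Recursively mask secret keys and embedded token-like strings."""
    if isinstance(payload, dict):
        return {
            k: "[REDACTED]" if k.lower() in SECRET_KEYS else redact(v)
            for k, v in payload.items()
        }
    if isinstance(payload, list):
        return [redact(item) for item in payload]
    if isinstance(payload, str):
        return TOKEN_PATTERN.sub("[REDACTED]", payload)
    return payload


print(redact({"Authorization": "Bearer abc", "args": {"query": "mcp"}}))
```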

US-5: Community Contributor - Server Registration

As a Community Contributor
I want to add my MCP server to the compatibility suite
So that it's automatically tested with each ContextForge release

Acceptance Criteria:

Given I maintain an MCP server
When I submit a PR adding my server to mcp-servers/registry.yaml
Then my server should:
  - Be validated for required fields (name, repo, transport)
  - Be included in nightly compatibility runs
  - Appear in the public compatibility matrix
  - Receive notifications if compatibility breaks

Technical Requirements:

  • Registry schema with validation
  • PR template for server additions
  • Server health check before inclusion
  • Notification webhook for maintainers
  • Badge generation for server READMEs

US-6: Operations Team - Nightly Full Suite

As an Operations Engineer
I want nightly runs of the full 100+ server suite
So that we catch issues from upstream server changes

Acceptance Criteria:

Given it's 2:00 AM UTC
When the nightly schedule triggers
Then the system should:
  - Pull latest versions of all registered servers
  - Run full compatibility suite
  - Compare against previous night's results
  - Alert on new failures (Slack/email)
  - Generate trend report (last 7 days)
  - Archive results for historical analysis

Technical Requirements:

  • Scheduled GitHub Actions (cron)
  • Server version pinning vs latest
  • Delta detection and alerting
  • Time-series data storage
  • Trend visualization
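
Delta detection between two nightly runs could be as simple as the hedged sketch below; the status names follow the report schema, the function itself is an assumption.

```python
# Hypothetical nightly delta detection: compare tonight's per-server
# statuses with last night's and surface new failures and recoveries.
def diff_runs(previous: dict, current: dict) -> dict:
    new_failures = sorted(
        s for s, status in current.items()
        if status != "compatible" and previous.get(s) == "compatible"
    )
    recoveries = sorted(
        s for s, status in current.items()
        if status == "compatible" and previous.get(s) not in (None, "compatible")
    )
    return {"new_failures": new_failures, "recoveries": recoveries}


prev = {"slack": "compatible", "time": "incompatible"}
curr = {"slack": "partial", "time": "compatible"}
print(diff_runs(prev, curr))  # slack newly failing, time recovered
```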

US-7: Security Team - Isolated Test Execution

As a Security Engineer
I want MCP servers tested in isolated containers
So that malicious servers cannot compromise CI infrastructure

Acceptance Criteria:

Given an MCP server is being tested
Then the test runner should:
  - Execute server in isolated Docker container
  - Apply network policies (no external egress)
  - Limit resource usage (CPU, memory, time)
  - Scan for known vulnerabilities before testing
  - Terminate on suspicious behavior
  - Log all I/O for audit

Technical Requirements:

  • Docker-based isolation
  • Network policy enforcement
  • Resource limits (cgroups)
  • Timeout and kill mechanisms
  • Audit logging
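
Those isolation requirements translate fairly directly into docker run flags; a sketch (the limit values are illustrative, not the project's settings):

```python
# Hypothetical builder for the isolated `docker run` invocation used by
# the test launcher. All limits are illustrative assumptions.
def docker_run_args(image: str, timeout_s: int = 30) -> list:
    return [
        "docker", "run", "--rm",
        "--network", "none",          # no external egress
        "--memory", "512m",           # cgroup memory cap
        "--cpus", "1.0",              # cgroup CPU cap
        "--pids-limit", "128",        # fork-bomb protection
        "--read-only",                # immutable root filesystem
        "--stop-timeout", str(timeout_s),
        image,
    ]


print(docker_run_args("mcp-server-filesystem:latest"))
```

In practice the launcher would hand this list to subprocess.run with a hard wall-clock timeout as the kill mechanism.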

US-8: Product Manager - Public Dashboard

As a Product Manager
I want a public compatibility dashboard
So that users can check server compatibility before adoption

Acceptance Criteria:

Given a user visits contextforge.io/compatibility
Then they should see:
  - Searchable list of 100+ MCP servers
  - Compatibility status per server (badge)
  - Last tested date and ContextForge version
  - Supported transports per server
  - Link to detailed test results
  - Historical compatibility trend

Technical Requirements:

  • Static site generation (GitHub Pages)
  • Automatic updates from CI artifacts
  • Search and filter functionality
  • Mobile-responsive design
  • Badge embed codes

🏗 Architecture

System Overview

graph TB
    subgraph "GitHub Actions"
        GHA[GitHub Actions Runner]
        Matrix[Matrix Strategy]
        Parallel[Parallel Jobs x20]
    end

    subgraph "Test Infrastructure"
        CF[ContextForge Test Instance]
        Docker[Docker Containers]
        Network[Isolated Network]
    end

    subgraph "MCP Servers Registry"
        Registry[(registry.yaml)]
        S1[Server 1: filesystem]
        S2[Server 2: github]
        S3[Server 3: postgres]
        SN[Server N: ...]
    end

    subgraph "Test Phases"
        P1[1. Registration]
        P2[2. Discovery]
        P3[3. Invocation]
        P4[4. Validation]
    end

    subgraph "Outputs"
        Report[Compatibility Report]
        Matrix2[Compatibility Matrix]
        Artifacts[CI Artifacts]
        Dashboard[Public Dashboard]
    end

    GHA --> Matrix --> Parallel
    Parallel --> Docker
    Docker --> CF
    Docker --> S1 & S2 & S3 & SN

    Registry --> Docker

    CF --> P1 --> P2 --> P3 --> P4

    P4 --> Report --> Artifacts
    Artifacts --> Matrix2
    Artifacts --> Dashboard

Test Execution Flow

sequenceDiagram
    participant GHA as GitHub Actions
    participant Runner as Test Runner
    participant CF as ContextForge
    participant Server as MCP Server
    participant Report as Reporter

    GHA->>Runner: Trigger workflow (push/PR/schedule)
    Runner->>Runner: Load registry.yaml
    Runner->>Runner: Create test matrix

    par Parallel Server Tests
        Runner->>Server: docker run mcp-server-X
        Server-->>Runner: Container ready
        Runner->>CF: POST /gateways (register server)

        alt Registration Success
            CF-->>Runner: 201 Created
            Runner->>CF: GET /tools (discovery)
            CF-->>Runner: Tool list

            loop For each sample tool
                Runner->>CF: POST /tools/invoke
                CF->>Server: Forward invocation
                Server-->>CF: Tool result
                CF-->>Runner: Response
                Runner->>Runner: Validate response schema
            end

            Runner->>Report: Record SUCCESS
        else Registration Failure
            CF-->>Runner: Error response
            Runner->>Report: Record FAILURE (phase: registration)
        end
    end

    Report->>Report: Aggregate results
    Report->>GHA: Upload artifacts
    Report->>GHA: Set check status

    alt Any Regressions
        Report->>GHA: FAIL build
        Report->>GHA: Post PR comment
    else All Pass
        Report->>GHA: PASS build
    end

Server Registry Schema

# mcp-servers/registry.yaml
servers:
  - name: "mcp-server-filesystem"
    description: "File system operations"
    repository: "https://github.com/modelcontextprotocol/servers"
    package: "@modelcontextprotocol/server-filesystem"
    install: "npx"
    command: "npx -y @modelcontextprotocol/server-filesystem /tmp/test"
    transports: [stdio]
    category: "filesystem"
    popularity: 95  # 0-100 score
    tier: "core"    # core | popular | community
    maintainer: "anthropic"
    test_config:
      timeout_seconds: 30
      sample_tools:
        - name: "read_file"
          args: { path: "/tmp/test/sample.txt" }
          setup: "echo 'test content' > /tmp/test/sample.txt"
        - name: "list_directory"
          args: { path: "/tmp/test" }
      expected_tools: ["read_file", "write_file", "list_directory"]
      expected_resources: []

  - name: "mcp-server-github"
    description: "GitHub API integration"
    repository: "https://github.com/modelcontextprotocol/servers"
    package: "@modelcontextprotocol/server-github"
    install: "npx"
    command: "npx -y @modelcontextprotocol/server-github"
    transports: [stdio]
    category: "api"
    popularity: 92
    tier: "core"
    env_vars:
      GITHUB_TOKEN: "${{ secrets.TEST_GITHUB_TOKEN }}"
    test_config:
      timeout_seconds: 60
      sample_tools:
        - name: "search_repositories"
          args: { query: "mcp language:python" }
      expected_tools: ["search_repositories", "get_repository", "list_issues"]

  - name: "mcp-server-postgres"
    description: "PostgreSQL database queries"
    repository: "https://github.com/modelcontextprotocol/servers"
    package: "@modelcontextprotocol/server-postgres"
    install: "npx"
    command: "npx -y @modelcontextprotocol/server-postgres"
    transports: [stdio]
    category: "database"
    popularity: 88
    tier: "core"
    requires_service: "postgres"
    env_vars:
      POSTGRES_URL: "postgresql://test:test@localhost:5432/testdb"
    test_config:
      timeout_seconds: 45
      sample_tools:
        - name: "query"
          args: { sql: "SELECT 1 as test" }
      expected_tools: ["query"]

  # ... 97+ more servers
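
A validation pass over registry entries, mirroring the required fields shown above, might look like this (the function name and rules are assumptions for illustration):

```python
# Hypothetical registry-entry validator; required fields and allowed
# values are taken from the schema example above.
REQUIRED = ("name", "repository", "command", "transports", "tier")
KNOWN_TRANSPORTS = {"stdio", "sse", "websocket", "streamable-http"}


def validate_server(entry: dict) -> list:
    """Return a list of human-readable errors; empty means valid."""
    errors = [f"missing field: {f}" for f in REQUIRED if f not in entry]
    for t in entry.get("transports", []):
        if t not in KNOWN_TRANSPORTS:
            errors.append(f"unknown transport: {t}")
    if entry.get("tier") not in (None, "core", "popular", "community"):
        errors.append(f"unknown tier: {entry['tier']}")
    return errors


entry = {
    "name": "mcp-server-filesystem",
    "repository": "https://github.com/modelcontextprotocol/servers",
    "command": "npx -y @modelcontextprotocol/server-filesystem /tmp/test",
    "transports": ["stdio"],
    "tier": "core",
}
print(validate_server(entry))  # → [] (valid)
```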

Compatibility Report Schema

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "metadata": {
      "type": "object",
      "properties": {
        "contextforge_version": { "type": "string" },
        "contextforge_commit": { "type": "string" },
        "test_run_id": { "type": "string" },
        "timestamp": { "type": "string", "format": "date-time" },
        "duration_seconds": { "type": "number" },
        "trigger": { "enum": ["push", "pull_request", "schedule", "manual"] },
        "runner": { "type": "string" }
      }
    },
    "summary": {
      "type": "object",
      "properties": {
        "total_servers": { "type": "integer" },
        "compatible": { "type": "integer" },
        "partial": { "type": "integer" },
        "incompatible": { "type": "integer" },
        "skipped": { "type": "integer" },
        "pass_rate": { "type": "number" }
      }
    },
    "results": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "server_name": { "type": "string" },
          "server_version": { "type": "string" },
          "transport": { "type": "string" },
          "status": { "enum": ["compatible", "partial", "incompatible", "skipped"] },
          "phases": {
            "type": "object",
            "properties": {
              "registration": { "$ref": "#/definitions/phase_result" },
              "discovery": { "$ref": "#/definitions/phase_result" },
              "invocation": { "$ref": "#/definitions/phase_result" },
              "validation": { "$ref": "#/definitions/phase_result" }
            }
          },
          "duration_ms": { "type": "integer" },
          "error": { "type": "string" },
          "details": { "type": "object" }
        }
      }
    },
    "regressions": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "server_name": { "type": "string" },
          "previous_status": { "type": "string" },
          "current_status": { "type": "string" },
          "since_version": { "type": "string" }
        }
      }
    }
  },
  "definitions": {
    "phase_result": {
      "type": "object",
      "properties": {
        "status": { "enum": ["pass", "fail", "skip"] },
        "duration_ms": { "type": "integer" },
        "error": { "type": "string" },
        "details": { "type": "object" }
      }
    }
  }
}
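
For illustration, the summary block can be derived from the results array like this; field names follow the schema above, the helper itself is an assumption:

```python
# Hypothetical aggregation of per-server results into the report's
# summary object. Skipped servers are excluded from the pass rate.
def summarize(results: list) -> dict:
    counts = {"compatible": 0, "partial": 0, "incompatible": 0, "skipped": 0}
    for r in results:
        counts[r["status"]] += 1
    tested = len(results) - counts["skipped"]
    return {
        "total_servers": len(results),
        **counts,
        "pass_rate": counts["compatible"] / tested if tested else 0.0,
    }


results = [
    {"server_name": "filesystem", "status": "compatible"},
    {"server_name": "github", "status": "partial"},
    {"server_name": "postgres", "status": "skipped"},
]
print(summarize(results))
```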

📋 Implementation Tasks

Phase 1: Test Infrastructure Setup ✅

  • Create Test Directory Structure

    • Create tests/compatibility/ directory
    • Create tests/compatibility/conftest.py with pytest fixtures
    • Create tests/compatibility/test_runner.py main test orchestrator
    • Create tests/compatibility/server_launcher.py for container management
    • Create tests/compatibility/report_generator.py for output generation
  • Server Registry Implementation

    • Create mcp-servers/registry.yaml with schema
    • Add top 20 core MCP servers (Anthropic official)
    • Add top 30 popular community servers
    • Add remaining 50+ servers from ecosystem scan
    • Implement registry validation script
    • Create PR template for adding servers
  • Docker Test Environment

    • Create tests/compatibility/Dockerfile.testenv
    • Create docker-compose.test.yaml for services (postgres, redis)
    • Configure isolated network (no external egress)
    • Set resource limits (CPU, memory, timeout)
    • Implement container cleanup

Phase 2: Core Test Framework ✅

  • Test Phases Implementation

    • Registration Phase: POST to /gateways, validate response
    • Discovery Phase: GET /tools, /resources, /prompts
    • Invocation Phase: POST /tools/invoke with sample args
    • Validation Phase: Schema validation, response content checks
  • Server Launcher Module

    • Parse registry.yaml server definitions
    • Support multiple install methods: npx, pip, docker
    • Handle environment variable injection (secrets)
    • Implement health check polling
    • Container lifecycle management (start, stop, cleanup)
    • Timeout handling with graceful shutdown
  • ContextForge Test Client

    • Create tests/compatibility/cf_client.py
    • Gateway registration endpoint wrapper
    • Tool discovery endpoint wrapper
    • Tool invocation endpoint wrapper
    • Error handling and retry logic
    • Response validation utilities
  • Result Collection

    • Define result data structures (Pydantic models)
    • Implement per-server result aggregation
    • Track timing metrics per phase
    • Capture error messages and stack traces
    • Payload logging with secret redaction
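
The ContextForge test client described above could start as a thin HTTP wrapper. A sketch with endpoint paths taken from the test phases; the method names and payload shapes are assumptions:

```python
# Hypothetical skeleton of tests/compatibility/cf_client.py.
import json
import urllib.request


class ContextForgeClient:
    def __init__(self, base_url="http://localhost:8000"):
        self.base_url = base_url.rstrip("/")

    def _url(self, path):
        return f"{self.base_url}/{path.lstrip('/')}"

    def _request(self, method, path, body=None):
        data = json.dumps(body).encode() if body is not None else None
        req = urllib.request.Request(
            self._url(path), data=data, method=method,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req, timeout=30) as resp:
            return json.loads(resp.read())

    def register_gateway(self, name, url):
        # Registration phase: POST /gateways
        return self._request("POST", "/gateways", {"name": name, "url": url})

    def list_tools(self):
        # Discovery phase: GET /tools
        return self._request("GET", "/tools")

    def invoke_tool(self, name, args):
        # Invocation phase: POST /tools/invoke
        return self._request("POST", "/tools/invoke", {"name": name, "args": args})
```

The real client would add the retry logic and response validation listed above.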

Phase 3: GitHub Actions Workflows ✅

  • Smoke Test Workflow (PR/Push)

    • Create .github/workflows/mcp-compatibility-smoke.yaml
    • Trigger on: push, pull_request
    • Test top 20 servers only
    • Target runtime: <5 minutes
    • Matrix strategy: 4 parallel jobs
    • Status check: "MCP Compatibility Smoke Test"
  • Full Suite Workflow (Nightly/Release)

    • Create .github/workflows/mcp-compatibility-full.yaml
    • Trigger on: schedule (nightly), release tags
    • Test all 100+ servers
    • Matrix strategy: 20 parallel jobs
    • Estimated runtime: 15-30 minutes
    • Upload compatibility report artifact
  • Workflow Utilities

    • Baseline comparison script
    • Regression detection logic
    • PR comment generator
    • Slack notification integration
    • Artifact retention policy (90 days)

Phase 4: Reporting & Artifacts ✅

  • Report Generator

    • JSON report with full details
    • Markdown summary for PR comments
    • HTML report for artifacts
    • Compatibility matrix (CSV/JSON)
    • Regression diff report
  • PR Integration

    • Automatic PR comment with summary
    • Status check with pass/fail
    • Link to full report artifact
    • Regression highlight in comment
  • Release Integration

    • Attach compatibility matrix to releases
    • Auto-update docs/compatibility.md
    • Generate badge data
    • Archive historical results

Phase 5: Server Population ✅

  • Core Tier (20 servers) - Anthropic official + critical

    • mcp-server-filesystem
    • mcp-server-github
    • mcp-server-gitlab
    • mcp-server-postgres
    • mcp-server-sqlite
    • mcp-server-memory
    • mcp-server-brave-search
    • mcp-server-google-drive
    • mcp-server-slack
    • mcp-server-puppeteer
    • mcp-server-sequential-thinking
    • mcp-server-fetch
    • mcp-server-everart
    • mcp-server-everything
    • mcp-server-aws-kb-retrieval
    • mcp-server-google-maps
    • mcp-server-time
    • mcp-server-sentry
    • mcp-server-raygun
    • mcp-server-git
  • Popular Tier (30 servers) - Community high-usage

    • Scan npmjs.org for mcp-server-* packages
    • Scan PyPI for MCP server packages
    • Rank by downloads/stars
    • Add top 30 with test configs
    • Verify each server launches successfully
    • Document any special requirements
  • Community Tier (50+ servers) - Broader ecosystem

    • GitHub search for MCP servers
    • Awesome-MCP list scan
    • Add servers with basic test configs
    • Mark experimental/alpha servers
    • Allow community additions via PR

Phase 6: Advanced Features ✅

  • Transport Testing

    • stdio transport tests
    • SSE transport tests
    • WebSocket transport tests
    • HTTP Streamable transport tests
    • Transport fallback validation
  • Version Compatibility

    • Test against multiple server versions (latest, latest-1)
    • Test against multiple CF versions (current, previous release)
    • Version matrix in reports
    • Breaking change detection
  • Performance Baselines

    • Record response times per server
    • Track performance regressions
    • Set performance thresholds (warn/fail)
    • Performance trend graphs
  • Chaos Testing (Optional)

    • Network latency injection
    • Packet loss simulation
    • Server timeout scenarios
    • Malformed response handling

Phase 7: Public Dashboard ✅

  • Dashboard Implementation

    • Static site with compatibility matrix
    • GitHub Pages deployment
    • Auto-update from CI artifacts
    • Search and filter functionality
    • Server detail pages
  • Badges & Embeds

    • shields.io compatible badge endpoint
    • Per-server compatibility badges
    • Overall pass rate badge
    • Embed code generator
  • Historical Tracking

    • Store results in time-series format
    • Compatibility trend visualization
    • Version-over-version comparison
    • Regression timeline

Phase 8: Documentation & Testing ✅

  • Documentation

    • README for tests/compatibility/
    • Registry contribution guide
    • Troubleshooting common failures
    • CI workflow documentation
    • Dashboard usage guide
  • Meta-Testing

    • Unit tests for test framework
    • Mock server for framework validation
    • Report generator tests
    • Workflow syntax validation

Phase 9: Quality & Polish ✅

  • Code Quality

    • Run make autoflake isort black on test code
    • Type hints for all test modules
    • Docstrings for public functions
    • Pass make verify checks
  • Performance Optimization

    • Parallel container startup
    • Connection pooling for CF client
    • Async test execution where possible
    • Caching of server images
  • Reliability

    • Retry logic for flaky servers
    • Graceful handling of unavailable servers
    • Timeout tuning per server category
    • Self-healing container cleanup

⚙️ Configuration Examples

GitHub Actions Smoke Test

# .github/workflows/mcp-compatibility-smoke.yaml
name: MCP Compatibility Smoke Test

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  compatibility:
    name: Test Top 20 MCP Servers
    runs-on: ubuntu-latest
    timeout-minutes: 10

    services:
      postgres:
        image: postgres:15
        env:
          POSTGRES_PASSWORD: test
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5

    strategy:
      fail-fast: false
      matrix:
        server-batch: [1, 2, 3, 4]  # 5 servers per batch

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          pip install uv
          uv pip install -e ".[dev]" --system

      - name: Start ContextForge
        run: |
          make dev &
          sleep 10  # Wait for startup

      - name: Run compatibility tests
        run: |
          python -m pytest tests/compatibility/ \
            --tier=core \
            --batch=${{ matrix.server-batch }} \
            --junitxml=results-${{ matrix.server-batch }}.xml
        env:
          TEST_GITHUB_TOKEN: ${{ secrets.TEST_GITHUB_TOKEN }}

      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: compatibility-results-${{ matrix.server-batch }}
          path: results-*.xml

  report:
    needs: compatibility
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/download-artifact@v4

      - name: Generate report
        run: python scripts/generate_compatibility_report.py

      - name: Comment on PR
        if: github.event_name == 'pull_request'
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const report = fs.readFileSync('compatibility-summary.md', 'utf8');
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: report
            });
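
The report job above calls scripts/generate_compatibility_report.py on the downloaded JUnit XML artifacts. A minimal sketch of what that script might do (the aggregation and output format here are assumptions):

```python
# Hypothetical core of generate_compatibility_report.py: fold JUnit XML
# results into the Markdown summary posted on the PR.
import xml.etree.ElementTree as ET


def summarize_junit(xml_text: str) -> dict:
    """Count tests/failures across one <testsuite> or a <testsuites> root."""
    root = ET.fromstring(xml_text)
    suites = [root] if root.tag == "testsuite" else root.findall("testsuite")
    total = sum(int(s.get("tests", 0)) for s in suites)
    failed = sum(int(s.get("failures", 0)) + int(s.get("errors", 0)) for s in suites)
    return {"total": total, "failed": failed, "passed": total - failed}


def to_markdown(summary: dict) -> str:
    icon = "✅" if summary["failed"] == 0 else "❌"
    return (f"{icon} MCP Compatibility: {summary['passed']}/{summary['total']} "
            f"servers passed")


print(to_markdown(summarize_junit('<testsuite tests="5" failures="1" errors="0"/>')))
```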

Full Suite Workflow

# .github/workflows/mcp-compatibility-full.yaml
name: MCP Compatibility Full Suite

on:
  schedule:
    - cron: '0 2 * * *'  # 2 AM UTC daily
  release:
    types: [published]
  workflow_dispatch:

jobs:
  compatibility:
    name: Test All MCP Servers
    runs-on: ubuntu-latest
    timeout-minutes: 45

    strategy:
      fail-fast: false
      matrix:
        server-batch: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # 10+ servers per batch

    steps:
      # ... similar to smoke test but with all tiers

      - name: Run full compatibility tests
        run: |
          python -m pytest tests/compatibility/ \
            --tier=all \
            --batch=${{ matrix.server-batch }} \
            --verbose \
            --capture-payloads

  publish:
    needs: compatibility
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Generate compatibility matrix
        run: python scripts/generate_matrix.py

      - name: Update docs
        run: |
          cp compatibility-matrix.md docs/compatibility.md
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add docs/compatibility.md
          git commit -m "docs: update compatibility matrix [skip ci]"
          git push

      - name: Upload to release
        if: github.event_name == 'release'
        uses: softprops/action-gh-release@v1
        with:
          files: |
            compatibility-matrix.json
            compatibility-matrix.md
            compatibility-report.html

Test Configuration

# tests/compatibility/conftest.py
import pytest
from pathlib import Path
import yaml

def pytest_addoption(parser):
    parser.addoption("--tier", default="core", choices=["core", "popular", "community", "all"])
    parser.addoption("--batch", type=int, default=1)
    parser.addoption("--capture-payloads", action="store_true")

@pytest.fixture(scope="session")
def registry():
    registry_path = Path(__file__).parent.parent.parent / "mcp-servers" / "registry.yaml"
    with open(registry_path) as f:
        return yaml.safe_load(f)

@pytest.fixture(scope="session")
def contextforge_client():
    from tests.compatibility.cf_client import ContextForgeClient
    return ContextForgeClient(base_url="http://localhost:8000")

@pytest.fixture
def server_launcher():
    from tests.compatibility.server_launcher import ServerLauncher
    launcher = ServerLauncher()
    yield launcher
    launcher.cleanup_all()

✅ Success Criteria

  • Coverage: 100+ MCP servers in registry with test configurations
  • Speed: Smoke tests complete in <5 minutes; full suite in <30 minutes
  • Reliability: <2% flaky test rate across runs
  • Regression Detection: Every status regression against the stored baseline is surfaced in the corresponding workflow run
  • CI Integration: Status checks on all PRs; artifacts on releases
  • Documentation: Complete registry contribution guide
  • Reporting: JSON, Markdown, HTML reports generated automatically
  • Public Visibility: Dashboard live with search and badges
  • Security: Isolated containers with no external network access
  • Maintainability: Easy to add new servers via PR

🏁 Definition of Done

  • tests/compatibility/ directory with test framework
  • mcp-servers/registry.yaml with 100+ servers
  • Smoke test workflow running on all PRs
  • Full suite workflow running nightly
  • Compatibility matrix attached to releases
  • PR comments with test summaries
  • docs/compatibility.md auto-updated
  • Public dashboard deployed
  • Contribution guide for adding servers
  • Meta-tests for test framework passing
  • Code passes make verify checks
  • <5 minute smoke test runtime achieved
  • 95%+ pass rate on full suite

📝 Additional Notes

🔹 Server Categories:

  • Core: Official MCP servers from Anthropic - must maintain 100% compatibility
  • Popular: Top community servers by downloads - target 95%+ compatibility
  • Community: Broader ecosystem - monitor and report

🔹 Test Phases:

  • Registration: Can the server be registered with ContextForge?
  • Discovery: Are tools/resources/prompts correctly discovered?
  • Invocation: Do sample tool calls succeed?
  • Validation: Do responses match expected schemas?
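
The four phases above can be chained into a simple driver that stops at the first failing phase. This is a sketch using a stub client, since the real client interface is not defined here:

```python
# Hypothetical phase driver: run registration → discovery → invocation
# → validation in order, recording status and stopping on first failure.
def run_phases(server: dict, client) -> dict:
    phases = {}

    def run(name, fn):
        try:
            fn()
            phases[name] = {"status": "pass"}
            return True
        except Exception as exc:  # record the failing phase and stop
            phases[name] = {"status": "fail", "error": str(exc)}
            return False

    steps = [
        ("registration", lambda: client.register(server["name"])),
        ("discovery", lambda: client.discover()),
        ("invocation", lambda: client.invoke(server.get("sample_tools", []))),
        ("validation", lambda: client.validate()),
    ]
    for name, fn in steps:
        if not run(name, fn):
            break
    return phases


class StubClient:  # stand-in for the real ContextForge client
    def register(self, name): pass
    def discover(self): pass
    def invoke(self, tools): raise RuntimeError("tool call failed")
    def validate(self): pass


print(run_phases({"name": "mcp-server-time"}, StubClient()))
```

With the stub, invocation fails, so validation is never reached and the result names the failing phase, matching the "Failure phase" field in US-4.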

🔹 Failure Categories:

  • Gateway Issue: ContextForge bug - needs fix
  • Server Issue: Upstream server bug - document and notify
  • Protocol Issue: MCP spec interpretation difference - clarify
  • Environment Issue: CI/test infrastructure problem - retry and mark as flaky

🔹 Performance Budget:

  • Server startup: <10 seconds
  • Registration: <2 seconds
  • Discovery: <5 seconds
  • Invocation: <10 seconds per tool

🔹 Security Considerations:

  • All servers run in isolated Docker containers
  • No network egress allowed from test containers
  • Secrets injected via GitHub encrypted secrets
  • Payloads logged with automatic secret redaction
  • Container images scanned for vulnerabilities

🔹 Ecosystem Benefits:

  • Public compatibility matrix helps MCP adopters
  • Regression alerts help server maintainers
  • Badge system encourages compatibility
  • Test configs serve as integration examples

Metadata

Assignees: no one assigned
Labels: MUST, P1 (non-negotiable, critical requirement), cicd (CI/CD process: GitHub Actions, scaffolding), enhancement (new feature or request), epic (large feature spanning multiple issues), mcp-servers (MCP server samples), testing (unit, e2e, manual, automated)
Projects: none
Milestone: none
Relationships: none yet
Development: no branches or pull requests