[EPIC][TESTING]: Automated MCP server compatibility regression suite - Top 100+ server testing #2347
🧪 Epic: Automated MCP Server Compatibility Regression Suite - Top 100+ Server Testing
Goal
Implement a comprehensive automated regression testing framework that continuously validates ContextForge compatibility with 100+ popular MCP servers from the ecosystem. This CI/CD-integrated test suite runs on every build via GitHub Actions, detecting protocol compatibility regressions, transport issues, schema validation failures, and behavioral deviations before they reach production.
Why Now?
With ContextForge's growing adoption and the rapidly expanding MCP ecosystem, maintaining compatibility is critical:
- Ecosystem Growth: The MCP ecosystem now includes 100+ public servers spanning databases, APIs, cloud services, AI tools, and developer utilities - each a potential integration point
- Protocol Evolution: MCP specification updates, transport changes (stdio/SSE/WebSocket/HTTP Streamable), and schema modifications require continuous validation
- Regression Prevention: Recent issues ([BUG]: Few MCP servers are not supported - Error when adding gateway #2322) showed JSON validation failures breaking gateway functionality - automated detection would have caught this earlier
- User Trust: Organizations depend on ContextForge to reliably proxy their MCP infrastructure - compatibility regressions erode confidence
- Release Velocity: Manual compatibility testing is unsustainable at scale - automation enables confident frequent releases
- Community Health: Publishing compatibility reports positions ContextForge as the authoritative MCP compatibility reference
By implementing this as an automated GitHub Actions workflow, we enable continuous compatibility assurance without human intervention while generating valuable compatibility matrices for the community.
📖 User Stories
US-1: Platform Maintainer - Automated Regression Detection
As a Platform Maintainer
I want automated tests to verify MCP server compatibility on every PR/push
So that compatibility regressions are detected before merging
Acceptance Criteria:
Given a PR is opened against the main branch
When the GitHub Actions workflow triggers
Then the system should:
- Spin up ContextForge in test mode
- Launch 100+ MCP servers from the registry
- Attempt registration via POST /gateways
- Validate tool/resource/prompt discovery
- Execute sample tool invocations
- Record pass/fail status per server
- Fail the CI if >5% of servers regress from baseline
- Generate a compatibility report artifact
Technical Requirements:
- GitHub Actions workflow with matrix strategy
- Parallel server testing (10-20 concurrent)
- Baseline comparison for regression detection
- Artifact upload for compatibility reports
- PR comment with test summary
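The baseline comparison above can be sketched as a pair of small pure functions. `find_regressions` and `should_fail_ci` are illustrative names, not the actual ContextForge implementation; the 5% threshold comes from the acceptance criteria.

```python
def find_regressions(baseline: dict[str, str], current: dict[str, str]) -> list[str]:
    """Return servers whose status got worse relative to the baseline run."""
    # Higher rank = better status; a drop in rank counts as a regression.
    rank = {"incompatible": 0, "partial": 1, "compatible": 2}
    return [
        name
        for name, status in current.items()
        if name in baseline and rank.get(status, 0) < rank.get(baseline[name], 0)
    ]


def should_fail_ci(baseline: dict[str, str], current: dict[str, str],
                   threshold: float = 0.05) -> bool:
    """Fail the build when more than `threshold` of baselined servers regress."""
    if not baseline:
        return False  # first run: nothing to compare against
    return len(find_regressions(baseline, current)) / len(baseline) > threshold
```

Keeping this logic dependency-free makes it easy to unit test in the meta-testing phase.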
US-2: Release Manager - Compatibility Matrix Generation
As a Release Manager
I want each release to include a compatibility matrix
So that users know which MCP servers are verified compatible
Acceptance Criteria:
Given a new release is tagged (e.g., v1.1.0)
When the release workflow runs
Then the system should:
- Run full compatibility suite against all servers
- Generate compatibility matrix (server × transport × status)
- Categorize: ✅ Compatible, ⚠️ Partial, ❌ Incompatible, ⏭️ Skipped
- Publish matrix to release notes
- Upload matrix as release asset (JSON + Markdown)
- Update docs/compatibility.md automatically
Technical Requirements:
- Full suite execution on release tags
- Matrix generation in JSON and Markdown formats
- Automatic docs update via PR
- Release asset attachment
- Historical tracking per version
US-3: Developer - Fast Feedback on Compatibility Impact
As a Developer
I want quick feedback on whether my changes affect MCP compatibility
So that I can fix issues before review
Acceptance Criteria:
Given I push a commit to my feature branch
When the smoke test workflow completes (<5 minutes)
Then I should see:
- Status check: "MCP Compatibility - Top 20 Servers"
- Pass/fail indicator in PR checks
- Link to full logs on failure
- Diff from baseline if any servers regress
Technical Requirements:
- "Smoke test" subset: top 20 most popular servers
- Target runtime: <5 minutes
- Clear pass/fail status checks
- Direct link to failure details
US-4: QA Engineer - Detailed Failure Analysis
As a QA Engineer
I want detailed failure reports with reproduction steps
So that I can diagnose and fix compatibility issues
Acceptance Criteria:
Given a server fails compatibility testing
Then the report should include:
- Server name, version, transport type
- Failure phase: registration | discovery | invocation | response
- Error message and stack trace
- Request/response payloads (sanitized)
- Expected vs actual behavior
- Link to server repository
- Suggested fix category: gateway | server | protocol
Technical Requirements:
- Structured failure reports (JSON)
- Payload capture with secret redaction
- Error categorization taxonomy
- Reproduction script generation
- Integration with issue templates
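The "payload capture with secret redaction" requirement can be sketched as a recursive sanitizer. The key patterns below are illustrative, not exhaustive, and the function name is hypothetical.

```python
import re

# Key names that look secret-bearing; extend as needed for real payloads.
SECRET_KEY_RE = re.compile(r"(token|secret|password|api[_-]?key|authorization)", re.I)
BEARER_RE = re.compile(r"(?i)bearer\s+\S+")


def redact(payload):
    """Recursively mask values under secret-looking keys before logging."""
    if isinstance(payload, dict):
        return {
            k: "***REDACTED***" if SECRET_KEY_RE.search(k) else redact(v)
            for k, v in payload.items()
        }
    if isinstance(payload, list):
        return [redact(item) for item in payload]
    if isinstance(payload, str):
        # Catch bearer tokens embedded inside header strings.
        return BEARER_RE.sub("Bearer ***REDACTED***", payload)
    return payload
```

Redacting by key name (rather than value shape) errs on the side of over-masking, which is the safer default for public CI artifacts.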
US-5: Community Contributor - Server Registration
As a Community Contributor
I want to add my MCP server to the compatibility suite
So that it's automatically tested with each ContextForge release
Acceptance Criteria:
Given I maintain an MCP server
When I submit a PR adding my server to mcp-servers/registry.yaml
Then my server should:
- Be validated for required fields (name, repo, transport)
- Be included in nightly compatibility runs
- Appear in the public compatibility matrix
- Receive notifications if compatibility breaks
Technical Requirements:
- Registry schema with validation
- PR template for server additions
- Server health check before inclusion
- Notification webhook for maintainers
- Badge generation for server READMEs
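The registry validation step might look like the sketch below. The required fields follow the acceptance criteria; the transport identifiers are assumptions based on the transports named elsewhere in this epic.

```python
REQUIRED_FIELDS = {"name", "repository", "transports"}
KNOWN_TRANSPORTS = {"stdio", "sse", "websocket", "streamablehttp"}  # assumed identifiers


def validate_entry(entry: dict) -> list[str]:
    """Return a list of human-readable problems; empty means the entry is valid."""
    errors = [f"missing required field: {f}"
              for f in sorted(REQUIRED_FIELDS - entry.keys())]
    for t in entry.get("transports", []):
        if t not in KNOWN_TRANSPORTS:
            errors.append(f"unknown transport: {t}")
    return errors
```

Running this in a required status check on registry PRs gives contributors immediate feedback before a maintainer reviews.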
US-6: Operations Team - Nightly Full Suite
As an Operations Engineer
I want nightly runs of the full 100+ server suite
So that we catch issues from upstream server changes
Acceptance Criteria:
Given it's 2:00 AM UTC
When the nightly schedule triggers
Then the system should:
- Pull latest versions of all registered servers
- Run full compatibility suite
- Compare against previous night's results
- Alert on new failures (Slack/email)
- Generate trend report (last 7 days)
- Archive results for historical analysis
Technical Requirements:
- Scheduled GitHub Actions (cron)
- Server version pinning vs latest
- Delta detection and alerting
- Time-series data storage
- Trend visualization
US-7: Security Team - Isolated Test Execution
As a Security Engineer
I want MCP servers tested in isolated containers
So that malicious servers cannot compromise CI infrastructure
Acceptance Criteria:
Given an MCP server is being tested
Then the test runner should:
- Execute server in isolated Docker container
- Apply network policies (no external egress)
- Limit resource usage (CPU, memory, time)
- Scan for known vulnerabilities before testing
- Terminate on suspicious behavior
- Log all I/O for audit
Technical Requirements:
- Docker-based isolation
- Network policy enforcement
- Resource limits (cgroups)
- Timeout and kill mechanisms
- Audit logging
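One way to assemble the isolation flags is in the launcher itself, so they are enforced uniformly. This is a sketch: `mcp-test-net` is an assumed internal-only network (created beforehand with `docker network create --internal mcp-test-net`) that lets ContextForge reach the server while blocking external egress; the specific limits are illustrative.

```python
def docker_run_args(image: str, name: str) -> list[str]:
    """Build a locked-down `docker run` invocation for an untrusted MCP server."""
    return [
        "docker", "run", "--rm", "--name", name,
        "--network", "mcp-test-net",        # internal network: no external egress
        "--cpus", "1", "--memory", "512m",  # resource caps (cgroups)
        "--pids-limit", "128",              # bound process count
        "--read-only",                      # immutable root filesystem
        "--security-opt", "no-new-privileges",
        "--stop-timeout", "30",             # SIGKILL after the graceful window
        image,
    ]
```

Centralizing the flags in one function also makes the security posture itself unit-testable.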
US-8: Product Manager - Public Dashboard
As a Product Manager
I want a public compatibility dashboard
So that users can check server compatibility before adoption
Acceptance Criteria:
Given a user visits contextforge.io/compatibility
Then they should see:
- Searchable list of 100+ MCP servers
- Compatibility status per server (badge)
- Last tested date and ContextForge version
- Supported transports per server
- Link to detailed test results
- Historical compatibility trend
Technical Requirements:
- Static site generation (GitHub Pages)
- Automatic updates from CI artifacts
- Search and filter functionality
- Mobile-responsive design
- Badge embed codes
🏗 Architecture
System Overview
graph TB
subgraph "GitHub Actions"
GHA[GitHub Actions Runner]
Matrix[Matrix Strategy]
Parallel[Parallel Jobs x20]
end
subgraph "Test Infrastructure"
CF[ContextForge Test Instance]
Docker[Docker Containers]
Network[Isolated Network]
end
subgraph "MCP Servers Registry"
Registry[(registry.yaml)]
S1[Server 1: filesystem]
S2[Server 2: github]
S3[Server 3: postgres]
SN[Server N: ...]
end
subgraph "Test Phases"
P1[1. Registration]
P2[2. Discovery]
P3[3. Invocation]
P4[4. Validation]
end
subgraph "Outputs"
Report[Compatibility Report]
Matrix2[Compatibility Matrix]
Artifacts[CI Artifacts]
Dashboard[Public Dashboard]
end
GHA --> Matrix --> Parallel
Parallel --> Docker
Docker --> CF
Docker --> S1 & S2 & S3 & SN
Registry --> Docker
CF --> P1 --> P2 --> P3 --> P4
P4 --> Report --> Artifacts
Artifacts --> Matrix2
Artifacts --> Dashboard
Test Execution Flow
sequenceDiagram
participant GHA as GitHub Actions
participant Runner as Test Runner
participant CF as ContextForge
participant Server as MCP Server
participant Report as Reporter
GHA->>Runner: Trigger workflow (push/PR/schedule)
Runner->>Runner: Load registry.yaml
Runner->>Runner: Create test matrix
par Parallel Server Tests
Runner->>Server: docker run mcp-server-X
Server-->>Runner: Container ready
Runner->>CF: POST /gateways (register server)
alt Registration Success
CF-->>Runner: 201 Created
Runner->>CF: GET /tools (discovery)
CF-->>Runner: Tool list
loop For each sample tool
Runner->>CF: POST /tools/invoke
CF->>Server: Forward invocation
Server-->>CF: Tool result
CF-->>Runner: Response
Runner->>Runner: Validate response schema
end
Runner->>Report: Record SUCCESS
else Registration Failure
CF-->>Runner: Error response
Runner->>Report: Record FAILURE (phase: registration)
end
end
Report->>Report: Aggregate results
Report->>GHA: Upload artifacts
Report->>GHA: Set check status
alt Any Regressions
Report->>GHA: FAIL build
Report->>GHA: Post PR comment
else All Pass
Report->>GHA: PASS build
end
Server Registry Schema
# mcp-servers/registry.yaml
servers:
- name: "mcp-server-filesystem"
description: "File system operations"
repository: "https://github.com/modelcontextprotocol/servers"
package: "@modelcontextprotocol/server-filesystem"
install: "npx"
command: "npx -y @modelcontextprotocol/server-filesystem /tmp/test"
transports: [stdio]
category: "filesystem"
popularity: 95 # 0-100 score
tier: "core" # core | popular | community
maintainer: "anthropic"
test_config:
timeout_seconds: 30
sample_tools:
- name: "read_file"
args: { path: "/tmp/test/sample.txt" }
setup: "echo 'test content' > /tmp/test/sample.txt"
- name: "list_directory"
args: { path: "/tmp/test" }
expected_tools: ["read_file", "write_file", "list_directory"]
expected_resources: []
- name: "mcp-server-github"
description: "GitHub API integration"
repository: "https://github.com/modelcontextprotocol/servers"
package: "@modelcontextprotocol/server-github"
install: "npx"
command: "npx -y @modelcontextprotocol/server-github"
transports: [stdio]
category: "api"
popularity: 92
tier: "core"
env_vars:
GITHUB_TOKEN: "${{ secrets.TEST_GITHUB_TOKEN }}"
test_config:
timeout_seconds: 60
sample_tools:
- name: "search_repositories"
args: { query: "mcp language:python" }
expected_tools: ["search_repositories", "get_repository", "list_issues"]
- name: "mcp-server-postgres"
description: "PostgreSQL database queries"
repository: "https://github.com/modelcontextprotocol/servers"
package: "@modelcontextprotocol/server-postgres"
install: "npx"
command: "npx -y @modelcontextprotocol/server-postgres"
transports: [stdio]
category: "database"
popularity: 88
tier: "core"
requires_service: "postgres"
env_vars:
POSTGRES_URL: "postgresql://test:test@localhost:5432/testdb"
test_config:
timeout_seconds: 45
sample_tools:
- name: "query"
args: { sql: "SELECT 1 as test" }
expected_tools: ["query"]
# ... 97+ more servers
Compatibility Report Schema
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"type": "object",
"properties": {
"metadata": {
"type": "object",
"properties": {
"contextforge_version": { "type": "string" },
"contextforge_commit": { "type": "string" },
"test_run_id": { "type": "string" },
"timestamp": { "type": "string", "format": "date-time" },
"duration_seconds": { "type": "number" },
"trigger": { "enum": ["push", "pull_request", "schedule", "manual"] },
"runner": { "type": "string" }
}
},
"summary": {
"type": "object",
"properties": {
"total_servers": { "type": "integer" },
"compatible": { "type": "integer" },
"partial": { "type": "integer" },
"incompatible": { "type": "integer" },
"skipped": { "type": "integer" },
"pass_rate": { "type": "number" }
}
},
"results": {
"type": "array",
"items": {
"type": "object",
"properties": {
"server_name": { "type": "string" },
"server_version": { "type": "string" },
"transport": { "type": "string" },
"status": { "enum": ["compatible", "partial", "incompatible", "skipped"] },
"phases": {
"type": "object",
"properties": {
"registration": { "$ref": "#/definitions/phase_result" },
"discovery": { "$ref": "#/definitions/phase_result" },
"invocation": { "$ref": "#/definitions/phase_result" },
"validation": { "$ref": "#/definitions/phase_result" }
}
},
"duration_ms": { "type": "integer" },
"error": { "type": "string" },
"details": { "type": "object" }
}
}
},
"regressions": {
"type": "array",
"items": {
"type": "object",
"properties": {
"server_name": { "type": "string" },
"previous_status": { "type": "string" },
"current_status": { "type": "string" },
"since_version": { "type": "string" }
}
}
}
},
"definitions": {
"phase_result": {
"type": "object",
"properties": {
"status": { "enum": ["pass", "fail", "skip"] },
"duration_ms": { "type": "integer" },
"error": { "type": "string" },
"details": { "type": "object" }
}
}
}
}
📋 Implementation Tasks
Phase 1: Test Infrastructure Setup ✅
- Create Test Directory Structure
  - Create tests/compatibility/ directory
  - Create tests/compatibility/conftest.py with pytest fixtures
  - Create tests/compatibility/test_runner.py main test orchestrator
  - Create tests/compatibility/server_launcher.py for container management
  - Create tests/compatibility/report_generator.py for output generation
- Server Registry Implementation
  - Create mcp-servers/registry.yaml with schema
  - Add top 20 core MCP servers (Anthropic official)
  - Add top 30 popular community servers
  - Add remaining 50+ servers from ecosystem scan
  - Implement registry validation script
  - Create PR template for adding servers
- Docker Test Environment
  - Create tests/compatibility/Dockerfile.testenv
  - Create docker-compose.test.yaml for services (postgres, redis)
  - Configure isolated network (no external egress)
  - Set resource limits (CPU, memory, timeout)
  - Implement container cleanup
Phase 2: Core Test Framework ✅
- Test Phases Implementation
  - Registration Phase: POST to /gateways, validate response
  - Discovery Phase: GET /tools, /resources, /prompts
  - Invocation Phase: POST /tools/invoke with sample args
  - Validation Phase: schema validation, response content checks
- Server Launcher Module
  - Parse registry.yaml server definitions
  - Support multiple install methods: npx, pip, docker
  - Handle environment variable injection (secrets)
  - Implement health check polling
  - Container lifecycle management (start, stop, cleanup)
  - Timeout handling with graceful shutdown
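The health-check polling item in the launcher checklist reduces to a small deadline loop; the function name and defaults here are illustrative.

```python
import time


def wait_until_healthy(check, timeout_seconds: float = 30.0,
                       interval: float = 0.5) -> bool:
    """Poll `check()` until it returns True or the deadline passes."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        try:
            if check():
                return True
        except Exception:
            pass  # probe errors mean "not ready yet", not "failed"
        time.sleep(interval)
    return False
```

Using `time.monotonic()` keeps the deadline immune to wall-clock adjustments on CI runners; the per-server `timeout_seconds` would come from the registry's `test_config`.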
- ContextForge Test Client
  - Create tests/compatibility/cf_client.py
  - Gateway registration endpoint wrapper
  - Tool discovery endpoint wrapper
  - Tool invocation endpoint wrapper
  - Error handling and retry logic
  - Response validation utilities
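A minimal shape for `cf_client.py` might look like this. The endpoint paths follow this epic's description (`POST /gateways`, etc.) and may differ from the real ContextForge routes; the payload fields are assumptions.

```python
import json
import urllib.request


class ContextForgeClient:
    """Thin wrapper over the gateway API used by the compatibility suite."""

    def __init__(self, base_url: str = "http://localhost:8000"):
        self.base_url = base_url.rstrip("/")

    @staticmethod
    def registration_payload(name: str, url: str, transport: str) -> dict:
        # Assumed field names; validate against the real /gateways schema.
        return {"name": name, "url": url, "transport": transport}

    def _post(self, path: str, body: dict) -> dict:
        req = urllib.request.Request(
            f"{self.base_url}{path}",
            data=json.dumps(body).encode(),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=30) as resp:
            return json.load(resp)

    def register_gateway(self, name: str, url: str, transport: str) -> dict:
        return self._post("/gateways", self.registration_payload(name, url, transport))
```

Keeping payload construction separate from transport makes the wrappers testable without a running gateway.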
- Result Collection
  - Define result data structures (Pydantic models)
  - Implement per-server result aggregation
  - Track timing metrics per phase
  - Capture error messages and stack traces
  - Payload logging with secret redaction
Phase 3: GitHub Actions Workflows ✅
- Smoke Test Workflow (PR/Push)
  - Create .github/workflows/mcp-compatibility-smoke.yaml
  - Trigger on: push, pull_request
  - Test top 20 servers only
  - Target runtime: <5 minutes
  - Matrix strategy: 4 parallel jobs
  - Status check: "MCP Compatibility Smoke Test"
- Full Suite Workflow (Nightly/Release)
  - Create .github/workflows/mcp-compatibility-full.yaml
  - Trigger on: schedule (nightly), release tags
  - Test all 100+ servers
  - Matrix strategy: 20 parallel jobs
  - Estimated runtime: 15-30 minutes
  - Upload compatibility report artifact
- Workflow Utilities
  - Baseline comparison script
  - Regression detection logic
  - PR comment generator
  - Slack notification integration
  - Artifact retention policy (90 days)
Phase 4: Reporting & Artifacts ✅
- Report Generator
  - JSON report with full details
  - Markdown summary for PR comments
  - HTML report for artifacts
  - Compatibility matrix (CSV/JSON)
  - Regression diff report
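The Markdown summary for PR comments might be rendered like this; the input dicts follow the report schema's `results` items, and everything else (function name, layout) is illustrative.

```python
def markdown_summary(results: list[dict]) -> str:
    """Render the PR-comment table from per-server result dicts."""
    icons = {"compatible": "✅", "partial": "⚠️", "incompatible": "❌", "skipped": "⏭️"}
    lines = [
        "## MCP Compatibility Summary",
        "",
        "| Server | Transport | Status |",
        "|---|---|---|",
    ]
    for r in sorted(results, key=lambda r: r["server_name"]):
        icon = icons.get(r["status"], "?")
        lines.append(f"| {r['server_name']} | {r['transport']} | {icon} {r['status']} |")
    passed = sum(r["status"] == "compatible" for r in results)
    lines += ["", f"**{passed}/{len(results)} servers fully compatible**"]
    return "\n".join(lines)
```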
- PR Integration
  - Automatic PR comment with summary
  - Status check with pass/fail
  - Link to full report artifact
  - Regression highlight in comment
- Release Integration
  - Attach compatibility matrix to releases
  - Auto-update docs/compatibility.md
  - Generate badge data
  - Archive historical results
Phase 5: Server Population ✅
- Core Tier (20 servers) - Anthropic official + critical
  - mcp-server-filesystem
  - mcp-server-github
  - mcp-server-gitlab
  - mcp-server-postgres
  - mcp-server-sqlite
  - mcp-server-memory
  - mcp-server-brave-search
  - mcp-server-google-drive
  - mcp-server-slack
  - mcp-server-puppeteer
  - mcp-server-sequential-thinking
  - mcp-server-fetch
  - mcp-server-everart
  - mcp-server-everything
  - mcp-server-aws-kb-retrieval
  - mcp-server-google-maps
  - mcp-server-time
  - mcp-server-sentry
  - mcp-server-raygun
  - mcp-server-git
- Popular Tier (30 servers) - Community high-usage
  - Scan npmjs.org for mcp-server-* packages
  - Scan PyPI for MCP server packages
  - Rank by downloads/stars
  - Add top 30 with test configs
  - Verify each server launches successfully
  - Document any special requirements
- Community Tier (50+ servers) - Broader ecosystem
  - GitHub search for MCP servers
  - Awesome-MCP list scan
  - Add servers with basic test configs
  - Mark experimental/alpha servers
  - Allow community additions via PR
Phase 6: Advanced Features ✅
- Transport Testing
  - stdio transport tests
  - SSE transport tests
  - WebSocket transport tests
  - HTTP Streamable transport tests
  - Transport fallback validation
- Version Compatibility
  - Test against multiple server versions (latest, latest-1)
  - Test against multiple CF versions (current, previous release)
  - Version matrix in reports
  - Breaking change detection
- Performance Baselines
  - Record response times per server
  - Track performance regressions
  - Set performance thresholds (warn/fail)
  - Performance trend graphs
- Chaos Testing (Optional)
  - Network latency injection
  - Packet loss simulation
  - Server timeout scenarios
  - Malformed response handling
Phase 7: Public Dashboard ✅
- Dashboard Implementation
  - Static site with compatibility matrix
  - GitHub Pages deployment
  - Auto-update from CI artifacts
  - Search and filter functionality
  - Server detail pages
- Badges & Embeds
  - shields.io compatible badge endpoint
  - Per-server compatibility badges
  - Overall pass rate badge
  - Embed code generator
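For the badge endpoint, shields.io's documented "endpoint" format is a small JSON payload with `schemaVersion`, `label`, `message`, and `color`. A sketch of the generator (label text and color mapping are illustrative):

```python
import json


def badge_json(server: str, status: str) -> str:
    """Emit a shields.io endpoint-badge payload for one server's status."""
    colors = {"compatible": "brightgreen", "partial": "yellow", "incompatible": "red"}
    return json.dumps({
        "schemaVersion": 1,
        "label": f"{server} · ContextForge",
        "message": status,
        "color": colors.get(status, "lightgrey"),
    })
```

Serving these JSON files from the GitHub Pages dashboard lets server READMEs embed a live badge via `https://img.shields.io/endpoint?url=...`.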
- Historical Tracking
  - Store results in time-series format
  - Compatibility trend visualization
  - Version-over-version comparison
  - Regression timeline
Phase 8: Documentation & Testing ✅
- Documentation
  - README for tests/compatibility/
  - Registry contribution guide
  - Troubleshooting common failures
  - CI workflow documentation
  - Dashboard usage guide
- Meta-Testing
  - Unit tests for test framework
  - Mock server for framework validation
  - Report generator tests
  - Workflow syntax validation
Phase 9: Quality & Polish ✅
- Code Quality
  - Run make autoflake isort black on test code
  - Type hints for all test modules
  - Docstrings for public functions
  - Pass make verify checks
- Performance Optimization
  - Parallel container startup
  - Connection pooling for CF client
  - Async test execution where possible
  - Caching of server images
- Reliability
  - Retry logic for flaky servers
  - Graceful handling of unavailable servers
  - Timeout tuning per server category
  - Self-healing container cleanup
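The retry logic for flaky servers can be a small exponential-backoff helper; the name and defaults are illustrative.

```python
import time


def with_retries(fn, attempts: int = 3, base_delay: float = 1.0):
    """Call `fn`, retrying with exponential backoff; re-raise the last error."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

Pairing this with the per-category timeout tuning above keeps a single slow server from consuming the batch's time budget.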
⚙️ Configuration Examples
GitHub Actions Smoke Test
# .github/workflows/mcp-compatibility-smoke.yaml
name: MCP Compatibility Smoke Test
on:
push:
branches: [main]
pull_request:
branches: [main]
jobs:
compatibility:
name: Test Top 20 MCP Servers
runs-on: ubuntu-latest
timeout-minutes: 10
services:
postgres:
image: postgres:15
env:
POSTGRES_PASSWORD: test
options: >-
--health-cmd pg_isready
--health-interval 10s
--health-timeout 5s
--health-retries 5
strategy:
fail-fast: false
matrix:
server-batch: [1, 2, 3, 4] # 5 servers per batch
steps:
- uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: '3.11'
- name: Install dependencies
run: |
pip install uv
uv pip install -e ".[dev]" --system
- name: Start ContextForge
run: |
make dev &
sleep 10 # Wait for startup
- name: Run compatibility tests
run: |
python -m pytest tests/compatibility/ \
--tier=core \
--batch=${{ matrix.server-batch }} \
--junitxml=results-${{ matrix.server-batch }}.xml
env:
TEST_GITHUB_TOKEN: ${{ secrets.TEST_GITHUB_TOKEN }}
- name: Upload results
uses: actions/upload-artifact@v4
with:
name: compatibility-results-${{ matrix.server-batch }}
path: results-*.xml
report:
needs: compatibility
runs-on: ubuntu-latest
steps:
- uses: actions/download-artifact@v4
- name: Generate report
run: python scripts/generate_compatibility_report.py
- name: Comment on PR
if: github.event_name == 'pull_request'
uses: actions/github-script@v7
with:
script: |
const fs = require('fs');
const report = fs.readFileSync('compatibility-summary.md', 'utf8');
github.rest.issues.createComment({
issue_number: context.issue.number,
owner: context.repo.owner,
repo: context.repo.repo,
body: report
});
Full Suite Workflow
# .github/workflows/mcp-compatibility-full.yaml
name: MCP Compatibility Full Suite
on:
schedule:
- cron: '0 2 * * *' # 2 AM UTC daily
release:
types: [published]
workflow_dispatch:
jobs:
compatibility:
name: Test All MCP Servers
runs-on: ubuntu-latest
timeout-minutes: 45
strategy:
fail-fast: false
matrix:
server-batch: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] # 10+ servers per batch
steps:
# ... similar to smoke test but with all tiers
- name: Run full compatibility tests
run: |
python -m pytest tests/compatibility/ \
--tier=all \
--batch=${{ matrix.server-batch }} \
--verbose \
--capture-payloads
publish:
needs: compatibility
runs-on: ubuntu-latest
steps:
- name: Generate compatibility matrix
run: python scripts/generate_matrix.py
- name: Update docs
run: |
cp compatibility-matrix.md docs/compatibility.md
git add docs/compatibility.md
git commit -m "docs: update compatibility matrix [skip ci]"
git push
- name: Upload to release
if: github.event_name == 'release'
uses: softprops/action-gh-release@v1
with:
files: |
compatibility-matrix.json
compatibility-matrix.md
compatibility-report.html
Test Configuration
# tests/compatibility/conftest.py
import pytest
from pathlib import Path
import yaml
def pytest_addoption(parser):
parser.addoption("--tier", default="core", choices=["core", "popular", "community", "all"])
parser.addoption("--batch", type=int, default=1)
parser.addoption("--capture-payloads", action="store_true")
@pytest.fixture(scope="session")
def registry():
registry_path = Path(__file__).parent.parent.parent / "mcp-servers" / "registry.yaml"
with open(registry_path) as f:
return yaml.safe_load(f)
@pytest.fixture(scope="session")
def contextforge_client():
from tests.compatibility.cf_client import ContextForgeClient
return ContextForgeClient(base_url="http://localhost:8000")
@pytest.fixture
def server_launcher():
from tests.compatibility.server_launcher import ServerLauncher
launcher = ServerLauncher()
yield launcher
launcher.cleanup_all()
✅ Success Criteria
- Coverage: 100+ MCP servers in registry with test configurations
- Speed: Smoke tests complete in <5 minutes; full suite in <30 minutes
- Reliability: <2% flaky test rate across runs
- Regression Detection: Catches 100% of compatibility regressions
- CI Integration: Status checks on all PRs; artifacts on releases
- Documentation: Complete registry contribution guide
- Reporting: JSON, Markdown, HTML reports generated automatically
- Public Visibility: Dashboard live with search and badges
- Security: Isolated containers with no external network access
- Maintainability: Easy to add new servers via PR
🏁 Definition of Done
- tests/compatibility/ directory with test framework
- mcp-servers/registry.yaml with 100+ servers
- Smoke test workflow running on all PRs
- Full suite workflow running nightly
- Compatibility matrix attached to releases
- PR comments with test summaries
- docs/compatibility.md auto-updated
- Public dashboard deployed
- Contribution guide for adding servers
- Meta-tests for test framework passing
- Code passes make verify checks
- <5 minute smoke test runtime achieved
- 95%+ pass rate on full suite
📝 Additional Notes
🔹 Server Categories:
- Core: Official MCP servers from Anthropic - must maintain 100% compatibility
- Popular: Top community servers by downloads - target 95%+ compatibility
- Community: Broader ecosystem - monitor and report
🔹 Test Phases:
- Registration: Can the server be registered with ContextForge?
- Discovery: Are tools/resources/prompts correctly discovered?
- Invocation: Do sample tool calls succeed?
- Validation: Do responses match expected schemas?
🔹 Failure Categories:
- Gateway Issue: ContextForge bug - needs fix
- Server Issue: Upstream server bug - document and notify
- Protocol Issue: MCP spec interpretation difference - clarify
- Environment Issue: CI/test infrastructure problem - flaky
🔹 Performance Budget:
- Server startup: <10 seconds
- Registration: <2 seconds
- Discovery: <5 seconds
- Invocation: <10 seconds per tool
🔹 Security Considerations:
- All servers run in isolated Docker containers
- No network egress allowed from test containers
- Secrets injected via GitHub encrypted secrets
- Payloads logged with automatic secret redaction
- Container images scanned for vulnerabilities
🔹 Ecosystem Benefits:
- Public compatibility matrix helps MCP adopters
- Regression alerts help server maintainers
- Badge system encourages compatibility
- Test configs serve as integration examples
🔗 Related Issues
- [BUG]: Few MCP servers are not supported - Error when adding gateway #2322 - JSON validation failures for gateways (motivating issue)
- #XXX - MCP protocol version negotiation
- #XXX - Transport auto-detection improvements
- #XXX - Schema validation enhancements