[TESTING][PERFORMANCE]: Load Testing, Stress Testing, and Benchmarks #2473
Goal
Produce a comprehensive manual test plan for validating gateway performance under expected and peak production loads, including sustained traffic, spike handling, and resource consumption analysis.
Why Now?
Performance testing is critical for production readiness:
- Capacity Planning: Need baseline metrics to size deployments
- SLA Validation: Must verify P50/P95/P99 latency targets
- Breaking Points: Identify limits before users discover them
- Resource Efficiency: Optimize CPU, memory, and connection usage
- Regression Prevention: Establish benchmarks for future releases
User Stories
US-1: DevOps - Capacity Planning
As a DevOps engineer
I want performance benchmarks for the gateway
So that I can properly size production deployments
Acceptance Criteria:
Feature: Performance Benchmarking
Scenario: Establish baseline throughput
Given a standard gateway deployment
When I run sustained load at 1000 req/s
Then response times should be stable
And P95 latency should be under 200ms
And no errors should occur
Scenario: Find breaking point
Given a standard gateway deployment
When I incrementally increase load
Then I identify the maximum sustainable throughput
And document resource consumption at each level

US-2: SRE - SLA Validation
As an SRE
I want to validate latency SLAs under load
So that I can guarantee service level objectives
Acceptance Criteria:
Feature: SLA Validation
Scenario: Validate P99 latency under load
Given production-equivalent load (500 req/s)
When I measure response times for 1 hour
Then P50 should be under 50ms
And P95 should be under 150ms
And P99 should be under 300ms

Architecture
LOAD TESTING ARCHITECTURE
+------------------------------------------------------------------------+
| |
| Load Generator Gateway Cluster Backend Services |
| -------------- --------------- ---------------- |
| |
| +-----------+ +---------------+ +-----------+ |
| | k6 / | ------> | Gateway 1 | -------> | MCP Server| |
| | Locust | +---------------+ +-----------+ |
| +-----------+ | |
| | v |
| | +---------------+ +-----------+ |
| +---------------> | Gateway 2 | -------> | MCP Server| |
| | +---------------+ +-----------+ |
| | | |
| v v |
| +-----------+ +---------------+ |
| | Metrics | <------- | Prometheus | |
| | Collector | +---------------+ |
| +-----------+ |
| |
+------------------------------------------------------------------------+
Test Types:
- Load Test: Sustained traffic at expected levels
- Stress Test: Beyond capacity to find limits
- Spike Test: Sudden traffic surge
- Soak Test: Extended duration for memory leaks
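The four profiles above map naturally onto k6 executors. A sketch of the load shapes (a plain object for illustration only; in a real script each entry would become a scenario under `export const options`, and the exact rates/durations here simply mirror the test matrix further down):

```javascript
// Illustrative mapping of test types to k6 executor configs (sketch, not
// a settled project config). Arrival-rate executors drive requests per
// second independently of how long each request takes.
const profiles = {
  load:   { executor: 'constant-arrival-rate', rate: 1000, timeUnit: '1s', duration: '1h' },
  stress: { executor: 'ramping-arrival-rate', startRate: 100,
            stages: [{ target: 5000, duration: '30m' }] },   // ramp to breaking point
  spike:  { executor: 'ramping-arrival-rate', startRate: 100,
            stages: [
              { target: 2000, duration: '1m' },  // sudden surge
              { target: 2000, duration: '5m' },  // hold the spike
              { target: 100,  duration: '1m' },  // drop back, watch recovery
            ] },
  soak:   { executor: 'constant-arrival-rate', rate: 500, timeUnit: '1s', duration: '24h' },
};

console.log(Object.keys(profiles).join(', ')); // load, stress, spike, soak
```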
Test Environment Setup
# Environment variables
export GATEWAY_URL="http://localhost:8000"
export TARGET_RPS=1000
export DURATION="1h"
# Install k6 load testing tool
brew install k6 # macOS
# or: snap install k6 # Linux
# Start gateway with production-like config
export DATABASE_URL="postgresql://user:pass@localhost/gateway"
export REDIS_URL="redis://localhost:6379"
export WORKERS=4
make serve
# Start test MCP server for consistent responses
python -m mcpgateway.translate --stdio "uvx mcp-server-time" --port 9000
curl -s -X POST "$GATEWAY_URL/gateways" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "perf-test-server", "url": "http://localhost:9000"}'

Manual Test Cases
| Case | Scenario | Load Profile | Duration | Expected Result |
|---|---|---|---|---|
| PERF-01 | Baseline throughput | 100 req/s constant | 10 min | Establish baseline metrics |
| PERF-02 | Sustained load | 1000 req/s constant | 1 hour | Stable latency, 0% errors |
| PERF-03 | Stress test | Ramp 100->5000 req/s | 30 min | Find breaking point |
| PERF-04 | Spike test | 100->2000->100 req/s | 15 min | Recover within 30s |
| PERF-05 | Soak test | 500 req/s constant | 24 hours | No memory leak |
| PERF-06 | Connection saturation | 10000 concurrent | 5 min | Graceful rejection |
| PERF-07 | Tool invocation load | 100 concurrent tool calls | 10 min | P95 < 500ms |
PERF-01: Baseline Throughput Test
k6 Script (baseline.js):
import http from 'k6/http';
import { check, sleep } from 'k6';
export const options = {
scenarios: {
baseline: {
executor: 'constant-arrival-rate',
rate: 100,
timeUnit: '1s',
duration: '10m',
preAllocatedVUs: 50,
maxVUs: 100,
},
},
thresholds: {
http_req_duration: ['p(95)<200'],
http_req_failed: ['rate<0.01'],
},
};
export default function () {
const res = http.post(
`${__ENV.GATEWAY_URL}/mcp/http`,
JSON.stringify({
jsonrpc: '2.0',
id: 1,
method: 'tools/list',
params: {},
}),
{
headers: {
'Content-Type': 'application/json',
'Authorization': `Bearer ${__ENV.TOKEN}`,
},
}
);
check(res, {
'status is 200': (r) => r.status === 200,
'has result': (r) => JSON.parse(r.body).result !== undefined,
});
}

Run Command:
k6 run --env GATEWAY_URL=$GATEWAY_URL --env TOKEN=$TOKEN baseline.js

Expected Results:
- All requests complete successfully
- P50 latency recorded
- P95 latency recorded
- P99 latency recorded
- Throughput confirmed at 100 req/s
PERF-02: Sustained Load Test
k6 Script (sustained.js):
export const options = {
scenarios: {
sustained: {
executor: 'constant-arrival-rate',
rate: 1000,
timeUnit: '1s',
duration: '1h',
preAllocatedVUs: 200,
maxVUs: 500,
},
},
thresholds: {
http_req_duration: ['p(95)<200', 'p(99)<500'],
http_req_failed: ['rate<0.001'],
},
};

Validation:
# Monitor during test
watch -n 5 'curl -s "$GATEWAY_URL/health" | jq .'
# Check Prometheus metrics
curl -s "$GATEWAY_URL/metrics" | grep http_request_duration

Expected Results:
- Error rate < 0.1%
- P95 latency stable throughout
- No degradation over time
- Memory usage stable
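To sanity-check the P50/P95/P99 figures k6 reports, percentiles can be recomputed independently from raw request durations (for example, exported with `k6 run --out json=results.json`). A minimal sketch using the nearest-rank method; small sample sets may differ slightly from k6's own interpolation:

```javascript
// Nearest-rank percentile over an array of latencies (ms).
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based nearest rank
  return sorted[Math.max(rank - 1, 0)];
}

// Example: 100 evenly spread samples, 1..100 ms
const samples = Array.from({ length: 100 }, (_, i) => i + 1);
console.log(percentile(samples, 50)); // 50
console.log(percentile(samples, 95)); // 95
console.log(percentile(samples, 99)); // 99
```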
PERF-03: Stress Test (Find Breaking Point)
k6 Script (stress.js):
export const options = {
stages: [
{ duration: '2m', target: 100 }, // Warm up
{ duration: '5m', target: 500 }, // Ramp to 500
{ duration: '5m', target: 1000 }, // Ramp to 1000
{ duration: '5m', target: 2000 }, // Ramp to 2000
{ duration: '5m', target: 3000 }, // Ramp to 3000
{ duration: '5m', target: 5000 }, // Stress point
{ duration: '3m', target: 0 }, // Recovery
],
};

Metrics to Record:
| RPS Level | P95 Latency | Error Rate | CPU % | Memory MB |
|---|---|---|---|---|
| 500 | | | | |
| 1000 | | | | |
| 2000 | | | | |
| 3000 | | | | |
| 5000 | | | | |
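PERF-04 (spike test) from the case table has no detail section; a sketch of its load profile in the same k6 style, assuming the 100->2000->100 req/s shape and 15-minute window given in the table (VU counts are guesses to be tuned):

k6 Script (spike.js, sketch):

```javascript
export const options = {
  scenarios: {
    spike: {
      executor: 'ramping-arrival-rate',
      startRate: 100,
      timeUnit: '1s',
      preAllocatedVUs: 100,
      maxVUs: 500,
      stages: [
        { duration: '5m', target: 100 },  // steady baseline
        { duration: '1m', target: 2000 }, // sudden surge
        { duration: '3m', target: 2000 }, // hold the spike
        { duration: '1m', target: 100 },  // drop back
        { duration: '5m', target: 100 },  // observe recovery (expect < 30s)
      ],
    },
  },
};
```

During the final stage, watch P95 latency return to baseline; the pass criterion is recovery within 30 seconds of the load dropping.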
PERF-05: Soak Test (24-Hour)
k6 Script (soak.js):
export const options = {
scenarios: {
soak: {
executor: 'constant-arrival-rate',
rate: 500,
timeUnit: '1s',
duration: '24h',
preAllocatedVUs: 100,
maxVUs: 200,
},
},
};

Memory Leak Detection:
# Record memory every hour
for i in {1..24}; do
echo "Hour $i: $(ps -o rss= -p $(pgrep -f mcpgateway) | awk '{print $1/1024 " MB"}')"
sleep 3600
done

Expected Results:
- Memory growth < 10% over 24 hours
- No connection leaks
- Latency remains stable
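The "< 10% growth over 24 hours" criterion can be checked mechanically from the hourly RSS samples; a small sketch (the sample values below are hypothetical):

```javascript
// Percentage memory growth from first to last sample (MB).
// A steadily rising trend, not a one-off jump, suggests a leak.
function growthPercent(samplesMb) {
  const first = samplesMb[0];
  const last = samplesMb[samplesMb.length - 1];
  return ((last - first) / first) * 100;
}

// Hypothetical soak readings: 512 MB rising to 540 MB
console.log(growthPercent([512, 518, 525, 540]).toFixed(1) + '%'); // 5.5%
```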
Test Matrix
| Scenario | Protocol | Concurrency | Duration | Pass Criteria |
|---|---|---|---|---|
| Baseline | HTTP | 50 VUs | 10m | Establish metrics |
| Sustained | HTTP | 200 VUs | 1h | P95 < 200ms, 0% errors |
| Stress | HTTP | 500 VUs | 30m | Find limit |
| Spike | HTTP | 100-500 VUs | 15m | Recover < 30s |
| Soak | HTTP | 100 VUs | 24h | No memory leak |
| SSE | SSE | 1000 connections | 1h | Stable streams |
| WebSocket | WS | 500 connections | 1h | No drops |
| Tool calls | HTTP | 100 concurrent | 10m | P95 < 500ms |
Success Criteria
- Baseline performance documented (P50, P95, P99 at 100 req/s)
- Sustained load (1000 req/s, 1 hour) passes with < 0.1% errors
- Breaking point identified and documented
- Spike recovery occurs within 30 seconds
- 24-hour soak test shows no memory leaks
- Resource sizing guidelines created (CPU/memory per RPS)
- All latency SLAs validated
Related Files
- mcpgateway/main.py - Application entry point
- mcpgateway/middleware/ - Request processing pipeline
- mcpgateway/routers/mcp.py - MCP endpoints under test
- deployment/k6/ - Load test scripts (if present)
Related Issues
- [TESTING][OBSERVABILITY]: Metrics Accuracy, Tracing Completeness, and Dashboard Validation #2476 - Observability and metrics accuracy
- [TESTING][FUNCTIONALITY]: Metrics system manual test plan (buffering, rollup, cleanup, queries) #2450 - Metrics system
- [TESTING][FUNCTIONALITY]: Observability manual test plan (metrics, logging, tracing, health) #2435 - Observability