[TESTING][PERFORMANCE]: Load Testing, Stress Testing, and Benchmarks #2473
Goal
Produce a comprehensive manual test plan for validating gateway performance under expected and peak production loads, including sustained traffic, spike handling, and resource consumption analysis.
Why Now?
Performance testing is critical for production readiness:
- Capacity Planning: Need baseline metrics to size deployments
- SLA Validation: Must verify P50/P95/P99 latency targets
- Breaking Points: Identify limits before users discover them
- Resource Efficiency: Optimize CPU, memory, and connection usage
- Regression Prevention: Establish benchmarks for future releases
User Stories
US-1: DevOps - Capacity Planning
As a DevOps engineer
I want performance benchmarks for the gateway
So that I can properly size production deployments
Acceptance Criteria:
Feature: Performance Benchmarking
Scenario: Establish baseline throughput
Given a standard gateway deployment
When I run sustained load at 1000 req/s
Then response times should be stable
And P95 latency should be under 200ms
And no errors should occur
Scenario: Find breaking point
Given a standard gateway deployment
When I incrementally increase load
Then I identify the maximum sustainable throughput
And document resource consumption at each level

US-2: SRE - SLA Validation
As an SRE
I want to validate latency SLAs under load
So that I can guarantee service level objectives
Acceptance Criteria:
Feature: SLA Validation
Scenario: Validate P99 latency under load
Given production-equivalent load (500 req/s)
When I measure response times for 1 hour
Then P50 should be under 50ms
And P95 should be under 150ms
And P99 should be under 300ms

Architecture
LOAD TESTING ARCHITECTURE
+------------------------------------------------------------------------+
| |
| Load Generator Gateway Cluster Backend Services |
| -------------- --------------- ---------------- |
| |
| +-----------+ +---------------+ +-----------+ |
| | k6 / | ------> | Gateway 1 | -------> | MCP Server| |
| | Locust | +---------------+ +-----------+ |
| +-----------+ | |
| | v |
| | +---------------+ +-----------+ |
| +---------------> | Gateway 2 | -------> | MCP Server| |
| | +---------------+ +-----------+ |
| | | |
| v v |
| +-----------+ +---------------+ |
| | Metrics | <------- | Prometheus | |
| | Collector | +---------------+ |
| +-----------+ |
| |
+------------------------------------------------------------------------+
Test Types:
- Load Test: Sustained traffic at expected levels
- Stress Test: Beyond capacity to find limits
- Spike Test: Sudden traffic surge
- Soak Test: Extended duration for memory leaks
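The four profiles above map naturally onto k6 executors. A sketch of the load shapes (a plain object for illustration only; in a real script each entry would become a scenario under `export const options`, and the exact rates/durations here simply mirror the test matrix further down):

```javascript
// Illustrative mapping of test types to k6 executor configs (sketch, not
// a settled project config). Arrival-rate executors drive requests per
// second independently of how long each request takes.
const profiles = {
  load:   { executor: 'constant-arrival-rate', rate: 1000, timeUnit: '1s', duration: '1h' },
  stress: { executor: 'ramping-arrival-rate', startRate: 100,
            stages: [{ target: 5000, duration: '30m' }] },   // ramp to breaking point
  spike:  { executor: 'ramping-arrival-rate', startRate: 100,
            stages: [
              { target: 2000, duration: '1m' },  // sudden surge
              { target: 2000, duration: '5m' },  // hold the spike
              { target: 100,  duration: '1m' },  // drop back, watch recovery
            ] },
  soak:   { executor: 'constant-arrival-rate', rate: 500, timeUnit: '1s', duration: '24h' },
};

console.log(Object.keys(profiles).join(', ')); // load, stress, spike, soak
```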
Test Environment Setup
# Environment variables
export GATEWAY_URL="http://localhost:8000"
export TARGET_RPS=1000
export DURATION="1h"
# Install k6 load testing tool
brew install k6 # macOS
# or: snap install k6 # Linux
# Start gateway with production-like config
export DATABASE_URL="postgresql://user:pass@localhost/gateway"
export REDIS_URL="redis://localhost:6379"
export WORKERS=4
make serve
# Start test MCP server for consistent responses
python -m mcpgateway.translate --stdio "uvx mcp-server-time" --port 9000
curl -s -X POST "$GATEWAY_URL/gateways" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "perf-test-server", "url": "http://localhost:9000"}'

Manual Test Cases
| Case | Scenario | Load Profile | Duration | Expected Result |
|---|---|---|---|---|
| PERF-01 | Baseline throughput | 100 req/s constant | 10 min | Establish baseline metrics |
| PERF-02 | Sustained load | 1000 req/s constant | 1 hour | Stable latency, 0% errors |
| PERF-03 | Stress test | Ramp 100->5000 req/s | 30 min | Find breaking point |
| PERF-04 | Spike test | 100->2000->100 req/s | 15 min | Recover within 30s |
| PERF-05 | Soak test | 500 req/s constant | 24 hours | No memory leak |
| PERF-06 | Connection saturation | 10000 concurrent | 5 min | Graceful rejection |
| PERF-07 | Tool invocation load | 100 concurrent tool calls | 10 min | P95 < 500ms |
PERF-01: Baseline Throughput Test
k6 Script (baseline.js):
import http from 'k6/http';
import { check, sleep } from 'k6';
export const options = {
scenarios: {
baseline: {
executor: 'constant-arrival-rate',
rate: 100,
timeUnit: '1s',
duration: '10m',
preAllocatedVUs: 50,
maxVUs: 100,
},
},
thresholds: {
http_req_duration: ['p(95)<200'],
http_req_failed: ['rate<0.01'],
},
};
export default function () {
const res = http.post(
`${__ENV.GATEWAY_URL}/mcp/http`,
JSON.stringify({
jsonrpc: '2.0',
id: 1,
method: 'tools/list',
params: {},
}),
{
headers: {
'Content-Type': 'application/json',
'Authorization': `Bearer ${__ENV.TOKEN}`,
},
}
);
check(res, {
'status is 200': (r) => r.status === 200,
'has result': (r) => JSON.parse(r.body).result !== undefined,
});
}

Run Command:
k6 run --env GATEWAY_URL=$GATEWAY_URL --env TOKEN=$TOKEN baseline.js

Expected Results:
- All requests complete successfully
- P50 latency recorded
- P95 latency recorded
- P99 latency recorded
- Throughput confirmed at 100 req/s
PERF-02: Sustained Load Test
k6 Script (sustained.js):
export const options = {
scenarios: {
sustained: {
executor: 'constant-arrival-rate',
rate: 1000,
timeUnit: '1s',
duration: '1h',
preAllocatedVUs: 200,
maxVUs: 500,
},
},
thresholds: {
http_req_duration: ['p(95)<200', 'p(99)<500'],
http_req_failed: ['rate<0.001'],
},
};

Validation:
# Monitor during test
watch -n 5 'curl -s "$GATEWAY_URL/health" | jq .'
# Check Prometheus metrics
curl -s "$GATEWAY_URL/metrics" | grep http_request_duration

Expected Results:
- Error rate < 0.1%
- P95 latency stable throughout
- No degradation over time
- Memory usage stable
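To sanity-check the P50/P95/P99 figures k6 reports, percentiles can be recomputed independently from raw request durations (for example, exported with `k6 run --out json=results.json`). A minimal sketch using the nearest-rank method; small sample sets may differ slightly from k6's own interpolation:

```javascript
// Nearest-rank percentile over an array of latencies (ms).
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based nearest rank
  return sorted[Math.max(rank - 1, 0)];
}

// Example: 100 evenly spread samples, 1..100 ms
const samples = Array.from({ length: 100 }, (_, i) => i + 1);
console.log(percentile(samples, 50)); // 50
console.log(percentile(samples, 95)); // 95
console.log(percentile(samples, 99)); // 99
```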
PERF-03: Stress Test (Find Breaking Point)
k6 Script (stress.js):
export const options = {
stages: [
{ duration: '2m', target: 100 }, // Warm up
{ duration: '5m', target: 500 }, // Ramp to 500
{ duration: '5m', target: 1000 }, // Ramp to 1000
{ duration: '5m', target: 2000 }, // Ramp to 2000
{ duration: '5m', target: 3000 }, // Ramp to 3000
{ duration: '5m', target: 5000 }, // Stress point
{ duration: '3m', target: 0 }, // Recovery
],
};

Metrics to Record:
| RPS Level | P95 Latency | Error Rate | CPU % | Memory MB |
|---|---|---|---|---|
| 500 | | | | |
| 1000 | | | | |
| 2000 | | | | |
| 3000 | | | | |
| 5000 | | | | |
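PERF-04 (spike test) from the case table has no detail section; a sketch of its load profile in the same k6 style, assuming the 100->2000->100 req/s shape and 15-minute window given in the table (VU counts are guesses to be tuned):

k6 Script (spike.js, sketch):

```javascript
export const options = {
  scenarios: {
    spike: {
      executor: 'ramping-arrival-rate',
      startRate: 100,
      timeUnit: '1s',
      preAllocatedVUs: 100,
      maxVUs: 500,
      stages: [
        { duration: '5m', target: 100 },  // steady baseline
        { duration: '1m', target: 2000 }, // sudden surge
        { duration: '3m', target: 2000 }, // hold the spike
        { duration: '1m', target: 100 },  // drop back
        { duration: '5m', target: 100 },  // observe recovery (expect < 30s)
      ],
    },
  },
};
```

During the final stage, watch P95 latency return to baseline; the pass criterion is recovery within 30 seconds of the load dropping.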
PERF-05: Soak Test (24-Hour)
k6 Script (soak.js):
export const options = {
scenarios: {
soak: {
executor: 'constant-arrival-rate',
rate: 500,
timeUnit: '1s',
duration: '24h',
preAllocatedVUs: 100,
maxVUs: 200,
},
},
};

Memory Leak Detection:
# Record memory every hour
for i in {1..24}; do
echo "Hour $i: $(ps -o rss= -p $(pgrep -f mcpgateway) | awk '{print $1/1024 " MB"}')"
sleep 3600
done

Expected Results:
- Memory growth < 10% over 24 hours
- No connection leaks
- Latency remains stable
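The "< 10% growth over 24 hours" criterion can be checked mechanically from the hourly RSS samples; a small sketch (the sample values below are hypothetical):

```javascript
// Percentage memory growth from first to last sample (MB).
// A steadily rising trend, not a one-off jump, suggests a leak.
function growthPercent(samplesMb) {
  const first = samplesMb[0];
  const last = samplesMb[samplesMb.length - 1];
  return ((last - first) / first) * 100;
}

// Hypothetical soak readings: 512 MB rising to 540 MB
console.log(growthPercent([512, 518, 525, 540]).toFixed(1) + '%'); // 5.5%
```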
Test Matrix
| Scenario | Protocol | Concurrency | Duration | Pass Criteria |
|---|---|---|---|---|
| Baseline | HTTP | 50 VUs | 10m | Establish metrics |
| Sustained | HTTP | 200 VUs | 1h | P95 < 200ms, 0% errors |
| Stress | HTTP | 500 VUs | 30m | Find limit |
| Spike | HTTP | 100-500 VUs | 15m | Recover < 30s |
| Soak | HTTP | 100 VUs | 24h | No memory leak |
| SSE | SSE | 1000 connections | 1h | Stable streams |
| WebSocket | WS | 500 connections | 1h | No drops |
| Tool calls | HTTP | 100 concurrent | 10m | P95 < 500ms |
Success Criteria
- Baseline performance documented (P50, P95, P99 at 100 req/s)
- Sustained load (1000 req/s, 1 hour) passes with < 0.1% errors
- Breaking point identified and documented
- Spike recovery occurs within 30 seconds
- 24-hour soak test shows no memory leaks
- Resource sizing guidelines created (CPU/memory per RPS)
- All latency SLAs validated
Related Files
- mcpgateway/main.py - Application entry point
- mcpgateway/middleware/ - Request processing pipeline
- mcpgateway/routers/mcp.py - MCP endpoints under test
- deployment/k6/ - Load test scripts (if present)
Related Issues
- [TESTING][OBSERVABILITY]: Metrics Accuracy, Tracing Completeness, and Dashboard Validation #2476 - Observability and metrics accuracy
- [TESTING][FUNCTIONALITY]: Metrics system manual test plan (buffering, rollup, cleanup, queries) #2450 - Metrics system
- [TESTING][FUNCTIONALITY]: Observability manual test plan (metrics, logging, tracing, health) #2435 - Observability