
[TESTING][PERFORMANCE]: JMeter Performance Load Testing Baseline #2541

@crivetimihai

Description


Goal

Establish a JMeter-based performance testing baseline that complements the existing Locust framework. JMeter provides industry-standard testing capabilities, enterprise tooling integration, advanced correlation support, and standardized reporting formats (JTL, CSV, HTML dashboards) that integrate with CI/CD pipelines and APM tools.

Why Now?

While Locust provides excellent Python-based load testing, JMeter adds complementary value:

  1. Enterprise Integration: JMeter integrates with enterprise APM tools (Dynatrace, AppDynamics, New Relic)
  2. Standardized Reporting: JTL/CSV formats are industry standard for performance dashboards
  3. Non-Developer Access: GUI mode enables QA teams without Python experience to run tests
  4. Correlation Engine: Advanced response extraction for complex MCP workflows
  5. Distributed Testing: Native distributed mode with remote worker coordination
  6. Protocol Coverage: Built-in support for HTTP, WebSocket, TCP, and custom samplers

User Stories

US-1: QA Engineer - Baseline Performance Metrics

As a QA Engineer
I want JMeter test plans that establish performance baselines
So that I can detect performance regressions in CI/CD pipelines

Acceptance Criteria:

Feature: JMeter Baseline Testing

  Scenario: Establish REST API baseline
    Given the gateway is deployed with standard configuration
    When I execute the REST API baseline test plan
    Then response times are recorded in JTL format
    And P50, P95, P99 latencies are calculated
    And results can be compared against thresholds

  Scenario: Establish MCP protocol baseline
    Given MCP servers are registered with the gateway
    When I execute the MCP baseline test plan
    Then JSON-RPC response times are captured
    And tool invocation latencies are measured
    And results include error categorization
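
The percentile calculations these criteria call for can be scripted directly from a JTL file. A minimal sketch (Python, nearest-rank method; the column names follow JMeter's default CSV output, the file path is whatever `-l` was given):

```python
# Compute P50/P95/P99 latencies from a JMeter JTL (CSV) results file.
import csv
import math

def percentile(sorted_vals, p):
    """Nearest-rank percentile for 0 < p <= 100 over a sorted list."""
    k = max(0, math.ceil(p / 100 * len(sorted_vals)) - 1)
    return sorted_vals[k]

def latency_summary(jtl_path):
    """Read the 'elapsed' column (ms) and return {50: ..., 95: ..., 99: ...}."""
    with open(jtl_path, newline="") as f:
        elapsed = sorted(int(row["elapsed"]) for row in csv.DictReader(f))
    return {p: percentile(elapsed, p) for p in (50, 95, 99)}
```

The same numbers appear in the JMeter HTML dashboard; a script like this is only needed when feeding them into a threshold comparison.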

US-2: DevOps - CI/CD Integration

As a DevOps Engineer
I want JMeter tests that run in headless mode
So that I can integrate performance testing into GitHub Actions

Acceptance Criteria:

Feature: CI/CD Integration

  Scenario: Run JMeter in non-GUI mode
    Given JMeter test plans are available
    When I execute 'jmeter -n -t testplan.jmx -l results.jtl'
    Then tests complete without GUI dependencies
    And exit code reflects pass/fail status
    And HTML report is generated automatically

  Scenario: Performance gate in CI
    Given baseline metrics are established
    When current test results exceed thresholds
    Then CI pipeline fails with specific metrics
    And comparison report highlights regressions
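
A performance gate of this kind boils down to comparing current metrics against limits and signalling via the exit code. A sketch (Python; the metric names and threshold values are illustrative, not project-defined):

```python
# CI performance gate: non-zero exit status fails the pipeline.
import sys

# Illustrative limits; real values come from the established baselines.
THRESHOLDS = {"p95_ms": 300, "p99_ms": 500, "error_rate_pct": 0.5}

def gate(current):
    """Return violation messages; an empty list means the gate passes."""
    return [
        f"{name}: {current[name]} exceeds limit {limit}"
        for name, limit in THRESHOLDS.items()
        if current.get(name, 0) > limit
    ]

def main(current):
    """Exit status for CI: 0 passes the gate, 1 fails the job."""
    violations = gate(current)
    for v in violations:
        print(v, file=sys.stderr)
    return 1 if violations else 0
```

Calling `sys.exit(main(metrics))` at the end of the script turns a regression into a red CI job, satisfying the exit-code scenario above.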

US-3: SRE - Observability During Load Tests

As an SRE
I want comprehensive monitoring during JMeter tests
So that I can correlate performance metrics with system behavior

Acceptance Criteria:

Feature: Observability Integration

  Scenario: Monitor during load test
    Given monitoring stack is running (Prometheus, Grafana, Loki)
    When I execute a JMeter load test
    Then Grafana shows real-time metrics
    And I can correlate JMeter results with:
      | Metric                  | Source           |
      | HTTP request latency    | Gateway          |
      | Database query time     | PostgreSQL       |
      | Connection pool usage   | PgBouncer        |
      | Container CPU/Memory    | cAdvisor         |
      | MCP server latency      | fast_time_server |
      | Echo tool performance   | fast_test_server |

  Scenario: Aggregate logs during test
    Given Loki log aggregation is running
    When tests execute
    Then logs are collected via Promtail
    And Grafana shows correlated logs with metrics

US-4: Performance Engineer - Protocol-Specific Testing

As a Performance Engineer
I want test plans for each gateway protocol
So that I can identify protocol-specific bottlenecks

Acceptance Criteria:

Feature: Protocol Coverage

  Scenario Outline: Test protocol performance
    Given the gateway supports <protocol>
    When I run the <protocol> test plan
    Then baseline metrics are captured
    And protocol-specific errors are categorized

    Examples:
      | protocol        | test_plan                     |
      | REST HTTP       | rest_api_baseline.jmx         |
      | MCP JSON-RPC    | mcp_jsonrpc_baseline.jmx      |
      | SSE Streaming   | sse_streaming_baseline.jmx    |
      | WebSocket       | websocket_baseline.jmx        |
      | Admin UI (HTMX) | admin_ui_baseline.jmx         |

Architecture

                     JMETER LOAD TESTING ARCHITECTURE
+-------------------------------------------------------------------------+
|                                                                         |
|   JMeter Controller          Gateway Stack            Observability     |
|   -----------------          -------------            -------------     |
|                                                                         |
|   +--------------+          +---------------+         +-----------+     |
|   | JMeter       |  HTTP    |   nginx       |         | Prometheus|     |
|   | Controller   |--------->| (port 8080)   |         | (9090)    |     |
|   +--------------+          +-------+-------+         +-----------+     |
|         |                          |                        |           |
|         v                          v                        v           |
|   +--------------+          +---------------+         +-----------+     |
|   | JMeter       |          | Gateway       |         | Grafana   |     |
|   | Workers      |--------->| Replicas (3)  |         | (3000)    |     |
|   | (Distributed)|          +-------+-------+         +-----------+     |
|   +--------------+                 |                        ^           |
|         |                    +-----+-----+                  |           |
|         v                    |           |            +-----------+     |
|   +--------------+           v           v            | Loki +    |     |
|   | Results      |    +-----------+ +-----------+     | Promtail  |     |
|   | Collector    |    | PgBouncer | |   Redis   |     +-----------+     |
|   +--------------+    | (6432)    | |  (6379)   |           ^           |
|         |             +-----+-----+ +-----------+           |           |
|         v                   |                         +-----------+     |
|   +--------------+          v                         | cAdvisor  |     |
|   | JTL/CSV      |    +-----------+                   | (8081)    |     |
|   | Reports      |    | PostgreSQL|                   +-----------+     |
|   +--------------+    | (5433)    |                                     |
|                       +-----------+                                     |
|                                                                         |
|   MCP Test Servers (for load testing MCP protocol)                      |
|   ------------------------------------------------                      |
|   +------------------+  +------------------+  +------------------+      |
|   | fast_time_server |  | fast_test_server |  | benchmark_server |      |
|   | (Go, port 8888)  |  | (Rust, port 8880)|  | (Go, 9000-9099)  |      |
|   | Tools:           |  | Tools:           |  | Tools:           |      |
|   | - get_system_time|  | - echo           |  | - Lightweight    |      |
|   | - timezones      |  | - get_system_time|  |   MCP servers    |      |
|   |                  |  | - get_stats      |  | - Multi-instance |      |
|   +------------------+  +------------------+  +------------------+      |
|                                                                         |
|   Exporters: postgres_exporter (9187), redis_exporter (9121),           |
|              pgbouncer_exporter (9127), nginx_exporter (9113)           |
|                                                                         |
+-------------------------------------------------------------------------+

Test Profiles:
- Baseline: 1,000 req/s sustained for metric establishment
- Load:     4,000 req/s sustained, SLA validation
- Stress:   Ramp to breaking point (10,000+ req/s)
- Spike:    1,000 → 10,000 → 1,000 req/s recovery test
- Soak:     2,000 req/s for 24 hours (memory leak detection)
- Protocol: Per-protocol (REST, MCP, SSE, WebSocket) isolation
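
The profiles above can be captured as data and used to assemble the non-GUI command line consistently. A sketch (Python; thread/ramp/duration figures mirror the stories below where given, the soak ramp-up is an assumption, and the helper itself is hypothetical):

```python
# Test profiles as data, mapped to planned .jmx deliverables.
PROFILES = {
    "baseline": {"plan": "rest_api_baseline.jmx", "threads": 100, "ramp": 60,  "duration": 600},
    "load":     {"plan": "load_test.jmx",         "threads": 400, "ramp": 120, "duration": 1800},
    # Soak ramp-up is not specified in this ticket; 120s is assumed here.
    "soak":     {"plan": "soak_test.jmx",         "threads": 400, "ramp": 120, "duration": 86400},
}

def jmeter_cmd(profile, results_file):
    """Assemble the non-GUI invocation: jmeter -n -t <plan> -J... -l <jtl>."""
    p = PROFILES[profile]
    return [
        "jmeter", "-n", "-t", f"tests/jmeter/{p['plan']}",
        f"-JTHREADS={p['threads']}", f"-JRAMP_UP={p['ramp']}",
        f"-JDURATION={p['duration']}", "-l", results_file,
    ]
```

A list like this keeps CI wrappers (Makefile targets, GitHub Actions steps) from drifting out of sync with the documented profiles.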

Test Environment Setup

Prerequisites

# Install JMeter (macOS)
brew install jmeter

# Install JMeter (Linux)
wget https://dlcdn.apache.org/jmeter/binaries/apache-jmeter-5.6.3.tgz
tar -xzf apache-jmeter-5.6.3.tgz
export PATH=$PATH:$(pwd)/apache-jmeter-5.6.3/bin

# Install required plugins via the jmeter-plugins.org Plugins Manager CLI
# (requires the Plugins Manager jar in lib/ext and cmdrunner in lib/)
PluginsManagerCMD.sh install jpgc-casutg,jpgc-tst,jpgc-graphs-basic

# Verify installation
jmeter --version

Environment Configuration

# Core configuration
export GATEWAY_URL="http://localhost:8080"
export GATEWAY_ADMIN_URL="http://localhost:8080/admin"
export MCP_RPC_ENDPOINT="/rpc"
export MCPGATEWAY_BEARER_TOKEN=$(python -m mcpgateway.utils.create_jwt_token \
  --username admin@example.com --exp 10080 --secret $JWT_SECRET_KEY)

# Test parameters
export JMETER_THREADS=100          # Concurrent users
export JMETER_RAMP_UP=30           # Seconds to reach full load
export JMETER_DURATION=300         # Test duration in seconds
export JMETER_TARGET_RPS=4000      # Target requests per second

# Monitoring endpoints
export PROMETHEUS_URL="http://localhost:9090"
export GRAFANA_URL="http://localhost:3000"

# MCP Test Server endpoints
export FAST_TIME_SERVER_URL="http://localhost:8888"
export FAST_TEST_SERVER_URL="http://localhost:8880"
export BENCHMARK_SERVER_URL="http://localhost:9000"

Start Test Environment

# Option 1: Standard stack with monitoring (3 gateway replicas)
make monitoring-up

# Option 2: Performance stack (7 replicas + PostgreSQL replica)
make performance-up

# Option 3: Start with benchmark servers
docker compose --profile benchmark up -d

# Verify services are running
curl -s $GATEWAY_URL/health | jq .
curl -s $PROMETHEUS_URL/-/healthy
curl -s $GRAFANA_URL/api/health | jq .

# Verify MCP test servers
curl -s $FAST_TIME_SERVER_URL/health
curl -s $FAST_TEST_SERVER_URL/health

Test Plans (Stories)

Story 1: REST API Baseline (rest_api_baseline.jmx)

| Parameter  | Value | Description                  |
|------------|-------|------------------------------|
| Threads    | 100   | Concurrent virtual users     |
| Ramp-up    | 60s   | Time to spawn all users      |
| Duration   | 10m   | Test duration                |
| Target RPS | 1,000 | Baseline requests per second |

Endpoints Tested:

| Endpoint   | Weight | Method | Expected Response |
|------------|--------|--------|-------------------|
| /health    | 20%    | GET    | 200, JSON         |
| /tools     | 25%    | GET    | 200, JSON array   |
| /servers   | 20%    | GET    | 200, JSON array   |
| /gateways  | 15%    | GET    | 200, JSON array   |
| /resources | 10%    | GET    | 200, JSON array   |
| /prompts   | 10%    | GET    | 200, JSON array   |

Run Command:

jmeter -n -t tests/jmeter/rest_api_baseline.jmx \
  -JGATEWAY_URL=$GATEWAY_URL \
  -JTOKEN=$MCPGATEWAY_BEARER_TOKEN \
  -JTHREADS=100 -JRAMP_UP=60 -JDURATION=600 \
  -l results/rest_baseline_$(date +%Y%m%d_%H%M%S).jtl \
  -e -o results/rest_baseline_report/

Story 2: MCP JSON-RPC Baseline (mcp_jsonrpc_baseline.jmx)

| Parameter  | Value | Description                         |
|------------|-------|-------------------------------------|
| Threads    | 200   | Concurrent MCP clients              |
| Ramp-up    | 60s   | Gradual client addition             |
| Duration   | 15m   | Extended test for stability         |
| Target RPS | 1,000 | Baseline JSON-RPC operations/second |

MCP Methods Tested:

| Method         | Weight | Parameters                      | Validation          |
|----------------|--------|---------------------------------|---------------------|
| tools/list     | 25%    | {}                              | Has tools array     |
| tools/call     | 30%    | {name, arguments}               | Has content         |
| resources/list | 15%    | {}                              | Has resources array |
| resources/read | 10%    | {uri}                           | Has contents        |
| prompts/list   | 10%    | {}                              | Has prompts array   |
| initialize     | 5%     | {protocolVersion, capabilities} | Has serverInfo      |
| ping           | 5%     | {}                              | Empty result        |

JSON-RPC Request Template:

{
  "jsonrpc": "2.0",
  "id": "${__UUID()}",
  "method": "${method}",
  "params": ${params}
}
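
Outside JMeter, the same payloads can be reproduced for smoke-testing the endpoint before a full run, drawing methods according to the weight table above. A sketch (Python; the weights and method names come from this ticket, the helper itself is illustrative):

```python
# Build MCP JSON-RPC payloads with weighted method selection,
# mirroring the ${__UUID()} id in the JMeter template.
import json
import random
import uuid

# Weights from the MCP methods table (sum to 100).
METHOD_WEIGHTS = {
    "tools/list": 25, "tools/call": 30, "resources/list": 15,
    "resources/read": 10, "prompts/list": 10, "initialize": 5, "ping": 5,
}

def build_request(params=None, rng=random):
    """Return one JSON-RPC 2.0 request dict with a weighted-random method."""
    method = rng.choices(list(METHOD_WEIGHTS), weights=METHOD_WEIGHTS.values())[0]
    return {
        "jsonrpc": "2.0",
        "id": str(uuid.uuid4()),
        "method": method,
        "params": params or {},
    }

payload = json.dumps(build_request())  # body to POST to $MCP_RPC_ENDPOINT
```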

Run Command:

jmeter -n -t tests/jmeter/mcp_jsonrpc_baseline.jmx \
  -JGATEWAY_URL=$GATEWAY_URL \
  -JTOKEN=$MCPGATEWAY_BEARER_TOKEN \
  -JTHREADS=200 -JRAMP_UP=60 -JDURATION=900 \
  -l results/mcp_baseline_$(date +%Y%m%d_%H%M%S).jtl \
  -e -o results/mcp_baseline_report/

Story 3: MCP Test Server Baseline (mcp_test_servers_baseline.jmx)

Tests the MCP test servers directly to establish performance ceilings.

| Server           | Port      | Tools                            | Description              |
|------------------|-----------|----------------------------------|--------------------------|
| fast_time_server | 8888      | get_system_time                  | Go-based time service    |
| fast_test_server | 8880      | echo, get_system_time, get_stats | Rust-based test server   |
| benchmark_server | 9000-9099 | Various                          | Multi-instance benchmark |

Run Command:

# Test fast_time_server directly
jmeter -n -t tests/jmeter/mcp_test_servers_baseline.jmx \
  -JFAST_TIME_URL=$FAST_TIME_SERVER_URL \
  -JFAST_TEST_URL=$FAST_TEST_SERVER_URL \
  -JTHREADS=200 -JDURATION=600 \
  -l results/mcp_servers_$(date +%Y%m%d_%H%M%S).jtl \
  -e -o results/mcp_servers_report/

Story 4: Load Test (load_test.jmx)

| Parameter  | Value | Description                |
|------------|-------|----------------------------|
| Threads    | 400   | Concurrent users           |
| Ramp-up    | 120s  | Gradual ramp               |
| Duration   | 30m   | Sustained load             |
| Target RPS | 4,000 | Production load simulation |

Run Command:

jmeter -n -t tests/jmeter/load_test.jmx \
  -JGATEWAY_URL=$GATEWAY_URL \
  -JTOKEN=$MCPGATEWAY_BEARER_TOKEN \
  -JTHREADS=400 -JRAMP_UP=120 -JDURATION=1800 \
  -l results/load_test_$(date +%Y%m%d_%H%M%S).jtl \
  -e -o results/load_test_report/

Story 5: Stress Test (stress_test.jmx)

| Phase    | Duration | Users | Target     |
|----------|----------|-------|------------|
| Warm-up  | 2m       | 200   | 1,000 RPS  |
| Ramp 1   | 5m       | 400   | 2,000 RPS  |
| Ramp 2   | 5m       | 800   | 4,000 RPS  |
| Ramp 3   | 5m       | 1,200 | 6,000 RPS  |
| Ramp 4   | 5m       | 1,600 | 8,000 RPS  |
| Peak     | 5m       | 2,000 | 10,000 RPS |
| Recovery | 3m       | 200   | 1,000 RPS  |

Metrics to Capture at Each Level:

| Level      | P50 (ms) | P95 (ms) | P99 (ms) | Error % | CPU % | Memory MB |
|------------|----------|----------|----------|---------|-------|-----------|
| 2,000 RPS  |          |          |          |         |       |           |
| 4,000 RPS  |          |          |          |         |       |           |
| 6,000 RPS  |          |          |          |         |       |           |
| 8,000 RPS  |          |          |          |         |       |           |
| 10,000 RPS |          |          |          |         |       |           |

Story 6: Spike Test (spike_test.jmx)

| Phase    | Duration | Users | Target     | Description        |
|----------|----------|-------|------------|--------------------|
| Baseline | 2m       | 200   | 1,000 RPS  | Normal operation   |
| Spike    | 30s      | 2,000 | 10,000 RPS | 10x traffic surge  |
| Sustain  | 2m       | 2,000 | 10,000 RPS | Peak hold          |
| Recovery | 2m       | 200   | 1,000 RPS  | Return to baseline |
| Verify   | 2m       | 200   | 1,000 RPS  | Stability check    |

Recovery Criteria:

  • P95 latency returns to baseline within 30 seconds
  • Error rate drops to <0.1% within 60 seconds
  • No connection pool exhaustion
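
The first criterion can be checked mechanically from post-spike JTL samples by computing a windowed P95. A sketch (Python; the window size and inputs are illustrative, samples are (timestamp_s, elapsed_ms) pairs):

```python
# Detect how long after a spike the windowed P95 returns to baseline.
from collections import defaultdict

def recovery_time(samples, spike_end_s, baseline_p95_ms, window_s=10):
    """Seconds after spike_end_s until a window's P95 is back at baseline,
    or None if it never recovers within the sampled data."""
    windows = defaultdict(list)
    for ts, elapsed in samples:
        if ts >= spike_end_s:
            windows[(ts - spike_end_s) // window_s].append(elapsed)
    for w in sorted(windows):
        vals = sorted(windows[w])
        p95 = vals[max(0, int(len(vals) * 0.95) - 1)]  # nearest-rank P95
        if p95 <= baseline_p95_ms:
            return (w + 1) * window_s
    return None
```

A result over 30 would fail the recovery criterion above.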

Story 7: Soak Test (soak_test.jmx)

| Parameter       | Value    | Purpose               |
|-----------------|----------|-----------------------|
| Duration        | 24 hours | Memory leak detection |
| Users           | 400      | Sustained load        |
| Target RPS      | 2,000    | Steady throughput     |
| Sample Interval | 1 hour   | Metric snapshots      |

Hourly Metrics Collection:

| Hour | Memory (MB) | Connections | P95 (ms) | Errors |
|------|-------------|-------------|----------|--------|
| 0    |             |             |          |        |
| 4    |             |             |          |        |
| 8    |             |             |          |        |
| 12   |             |             |          |        |
| 16   |             |             |          |        |
| 20   |             |             |          |        |
| 24   |             |             |          |        |

Memory Leak Detection Script:

#!/bin/bash
# Run during soak test to detect memory growth
for i in $(seq 1 24); do
  echo "Hour $i: $(docker stats --no-stream --format '{{.Name}}: {{.MemUsage}}' | grep gateway)"
  sleep 3600
done
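
The collected snapshots can then be checked against the <10% growth success criterion. A minimal sketch (Python; a real analysis would also examine the trend shape, not just the endpoints):

```python
# Flag memory growth from hourly soak-test snapshots (values in MB).
def memory_growth_pct(samples_mb):
    """Percentage growth from first to last snapshot."""
    first, last = samples_mb[0], samples_mb[-1]
    return (last - first) / first * 100

def leak_suspected(samples_mb, limit_pct=10.0):
    """True when growth exceeds the soak-test criterion (default 10%)."""
    return memory_growth_pct(samples_mb) > limit_pct
```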

Story 8: Protocol Isolation Tests

8a: SSE Streaming (sse_streaming_baseline.jmx)

| Parameter         | Value      |
|-------------------|------------|
| Connections       | 1,000      |
| Duration          | 10m        |
| Events/connection | Continuous |

Metrics:

  • Connection establishment time
  • Event delivery latency
  • Connection drop rate
  • Reconnection success rate

8b: WebSocket (websocket_baseline.jmx)

| Parameter       | Value             |
|-----------------|-------------------|
| Connections     | 500               |
| Duration        | 10m               |
| Messages/second | 20 per connection |

Metrics:

  • Handshake time
  • Message round-trip time
  • Frame loss rate
  • Connection lifetime

8c: Admin UI (admin_ui_baseline.jmx)

| Parameter  | Value |
|------------|-------|
| Users      | 50    |
| Duration   | 5m    |
| Think time | 3-5s  |

User Journey:

  1. Login → Dashboard
  2. Navigate Tools → View details
  3. Navigate Servers → View details
  4. Navigate Gateways → Refresh
  5. View Metrics page
  6. Logout

Monitoring & Observability During Tests

Pre-Test Checklist

# 1. Verify monitoring stack
make monitoring-status

# 2. Check Prometheus targets
curl -s "$PROMETHEUS_URL/api/v1/targets" | jq '.data.activeTargets | length'

# 3. Verify Grafana dashboards
curl -s "$GRAFANA_URL/api/search?type=dash-db" | jq '.[].title'

# 4. Verify Loki is receiving logs (query Loki directly on its default port 3100)
curl -s "http://localhost:3100/loki/api/v1/labels" | jq .

# 5. Start log collection (alternative to Loki)
docker compose logs -f gateway > logs/gateway_$(date +%Y%m%d_%H%M%S).log &

Real-Time Monitoring

| Metric            | Prometheus Query                                                         | Alert Threshold |
|-------------------|--------------------------------------------------------------------------|-----------------|
| HTTP Request Rate | rate(http_requests_total[1m])                                            | N/A (baseline)  |
| HTTP Error Rate   | rate(http_requests_total{status=~"5.."}[1m])                             | > 1%            |
| P95 Latency       | histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) | > 500ms         |
| DB Connections    | pg_stat_activity_count                                                   | > 80% pool      |
| CPU Usage         | rate(container_cpu_usage_seconds_total[1m])                              | > 80%           |
| Memory Usage      | container_memory_usage_bytes                                             | > 85%           |
| Redis Operations  | rate(redis_commands_total[1m])                                           | N/A             |
| PgBouncer Pool    | pgbouncer_pools_cl_active                                                | > 80%           |
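
These queries can also be pulled programmatically through the Prometheus HTTP API, e.g. to feed a CI gate. A sketch (Python stdlib only; it relies solely on the documented /api/v1/query endpoint and instant-vector response shape):

```python
# Run an instant PromQL query and extract the sample values.
import json
import urllib.parse
import urllib.request

def query_prometheus(base_url, promql):
    """POST-free instant query; returns the decoded JSON response."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def vector_values(response):
    """Flatten an instant-vector response into its float sample values."""
    return [float(r["value"][1]) for r in response.get("data", {}).get("result", [])]
```

For example, `vector_values(query_prometheus(prometheus_url, 'rate(http_requests_total[1m])'))` yields per-series request rates that can be compared against the thresholds above.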

Grafana Dashboards

  1. Gateway Performance - HTTP metrics, latency percentiles
  2. Database Health - PostgreSQL connections, query times (via postgres_exporter)
  3. Container Resources - CPU, memory, network I/O (via cAdvisor)
  4. MCP Protocol - JSON-RPC operations, tool invocations
  5. Redis Metrics - Cache hit rates, operations (via redis_exporter)
  6. Connection Pools - PgBouncer stats (via pgbouncer_exporter)

Exporters in Monitoring Stack

| Exporter           | Port | Metrics               |
|--------------------|------|-----------------------|
| postgres_exporter  | 9187 | PostgreSQL statistics |
| redis_exporter     | 9121 | Redis metrics         |
| pgbouncer_exporter | 9127 | Connection pool stats |
| nginx_exporter     | 9113 | NGINX request metrics |
| cAdvisor           | 8081 | Container resources   |

Post-Test Analysis

# Generate HTML report from JTL
jmeter -g results/test.jtl -o results/report/

# Export Prometheus metrics snapshot
curl -s "$PROMETHEUS_URL/api/v1/query?query=up" > metrics_snapshot.json

# Export Loki logs for test window (query Loki directly on its default port 3100)
curl -G "http://localhost:3100/loki/api/v1/query_range" \
  --data-urlencode 'query={job="gateway"}' \
  --data-urlencode "start=$(date -d '1 hour ago' +%s)000000000" \
  --data-urlencode "end=$(date +%s)000000000" > logs_export.json

# Compare with baseline
python scripts/compare_jmeter_results.py \
  baseline/results.jtl current/results.jtl \
  --threshold-p95=10% --threshold-error=0.5%
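
scripts/compare_jmeter_results.py is itself a planned deliverable; its core P95 check could look like this sketch (Python, nearest-rank percentile; illustrative only, not the script's actual implementation):

```python
# Compare baseline vs current latencies against a --threshold-p95 style limit.
def p95(elapsed_ms):
    """Nearest-rank P95 over a list of latencies in milliseconds."""
    vals = sorted(elapsed_ms)
    return vals[max(0, int(len(vals) * 0.95) - 1)]

def p95_regression_pct(baseline_ms, current_ms):
    """Percentage change of current P95 relative to the baseline P95."""
    base = p95(baseline_ms)
    return (p95(current_ms) - base) / base * 100

def within_threshold(baseline_ms, current_ms, threshold_pct=10.0):
    """True when the regression stays inside the allowed percentage."""
    return p95_regression_pct(baseline_ms, current_ms) <= threshold_pct
```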

Test Matrix

| Test Plan     | Protocol | Threads    | Duration | RPS Target   | SLA            |
|---------------|----------|------------|----------|--------------|----------------|
| REST Baseline | HTTP     | 100        | 10m      | 1,000        | P95 < 200ms    |
| MCP Baseline  | JSON-RPC | 200        | 15m      | 1,000        | P95 < 300ms    |
| MCP Servers   | JSON-RPC | 200        | 10m      | 2,000        | P95 < 50ms     |
| Load          | Mixed    | 400        | 30m      | 4,000        | P95 < 300ms    |
| Stress        | Mixed    | 2,000      | 30m      | 10,000       | Find limit     |
| Spike         | Mixed    | 200-2,000  | 10m      | 1K→10K→1K    | Recovery < 30s |
| Soak          | Mixed    | 400        | 24h      | 2,000        | No memory leak |
| SSE           | SSE      | 1,000 conn | 10m      | N/A          | Drop rate < 1% |
| WebSocket     | WS       | 500 conn   | 10m      | 10,000 msg/s | Loss < 0.1%    |
| Admin UI      | HTMX     | 50         | 5m       | N/A          | P95 < 500ms    |

Success Criteria

  • REST API baseline documented (P50, P95, P99 at 1,000 req/s)
  • MCP JSON-RPC baseline documented (P50, P95, P99 at 1,000 req/s)
  • MCP test servers baseline documented (fast_time, fast_test, benchmark)
  • Load test sustains 4,000 RPS with P95 < 300ms
  • Stress test identifies breaking point (target: >10,000 RPS)
  • Spike test demonstrates recovery within 30 seconds after 10,000 RPS spike
  • 24-hour soak test at 2,000 RPS shows memory growth < 10%
  • SSE connection stability > 99% over 10 minutes (1,000 connections)
  • WebSocket message delivery > 99.9% (10,000 msg/s)
  • All test plans executable in non-GUI mode for CI/CD
  • HTML reports generated automatically
  • Prometheus/Grafana/Loki integration validated

Deliverables

Test Plans (tests/jmeter/)

tests/jmeter/
├── rest_api_baseline.jmx
├── mcp_jsonrpc_baseline.jmx
├── mcp_test_servers_baseline.jmx
├── load_test.jmx
├── stress_test.jmx
├── spike_test.jmx
├── soak_test.jmx
├── sse_streaming_baseline.jmx
├── websocket_baseline.jmx
├── admin_ui_baseline.jmx
├── properties/
│   ├── production.properties
│   └── ci.properties
└── data/
    ├── timezones.csv
    ├── tool_names.csv
    └── test_messages.csv

Makefile Targets

# JMeter targets
jmeter-rest-baseline:         ## Run REST API baseline test (1,000 RPS)
jmeter-mcp-baseline:          ## Run MCP JSON-RPC baseline test (1,000 RPS)
jmeter-mcp-servers-baseline:  ## Run MCP test servers baseline
jmeter-load:                  ## Run load test (4,000 RPS)
jmeter-stress:                ## Run stress test (ramp to 10,000 RPS)
jmeter-spike:                 ## Run spike test (1K→10K→1K recovery)
jmeter-soak:                  ## Run 24-hour soak test (2,000 RPS)
jmeter-report:                ## Generate HTML report from last test
jmeter-compare:               ## Compare current vs baseline results

Related Files

  • tests/loadtest/locustfile.py - Existing Locust tests (reference)
  • tests/loadtest/locustfile_baseline.py - Component-level Locust tests
  • mcpgateway/routers/mcp.py - MCP endpoints under test
  • mcpgateway/routers/tools.py - REST API endpoints
  • docker-compose.yml - Standard stack with monitoring profile
  • docker-compose-performance.yml - Performance stack (7 replicas)
  • mcp-servers/fast-time-server/ - Go time server source
  • mcp-servers/fast-test-server/ - Rust test server source

Related Issues


Comparison: Locust vs JMeter

| Aspect           | Locust (Existing)     | JMeter (This Ticket)      |
|------------------|-----------------------|---------------------------|
| Language         | Python                | Java/XML                  |
| UI               | Web-based             | Desktop GUI + CLI         |
| Scripting        | Python code           | XML + GUI                 |
| Protocol Support | HTTP, custom          | HTTP, WS, TCP, JDBC       |
| Distributed      | Built-in              | Controller/worker mode    |
| CI/CD            | Native                | Non-GUI mode              |
| Reporting        | Custom + HTML         | JTL, CSV, HTML            |
| Enterprise       | Limited               | Extensive APM integration |
| Learning Curve   | Low (for Python devs) | Medium                    |

Recommendation: Use Locust for developer-driven testing and rapid iteration. Use JMeter for formal baseline establishment, CI/CD gates, and enterprise reporting requirements.

Metadata

Labels

  • SHOULD
  • P2: Important but not vital; high-value items that are not crucial for the immediate release
  • enhancement: New feature or request
  • manual-testing: Manual testing / test planning issues
  • performance: Performance related items
  • testing: Testing (unit, e2e, manual, automated, etc)
