[TESTING][PERFORMANCE]: JMeter Performance Load Testing Baseline #2541
Description
Goal
Establish a JMeter-based performance testing baseline that complements the existing Locust framework. JMeter provides industry-standard testing capabilities, enterprise tooling integration, advanced correlation support, and standardized reporting formats (JTL, CSV, HTML dashboards) that integrate with CI/CD pipelines and APM tools.
Why Now?
While Locust provides excellent Python-based load testing, JMeter adds complementary value:
- Enterprise Integration: JMeter integrates with enterprise APM tools (Dynatrace, AppDynamics, New Relic)
- Standardized Reporting: JTL/CSV formats are industry standard for performance dashboards
- Non-Developer Access: GUI mode enables QA teams without Python experience to run tests
- Correlation Engine: Advanced response extraction for complex MCP workflows
- Distributed Testing: Native distributed mode with remote worker coordination
- Protocol Coverage: Built-in support for HTTP, WebSocket, TCP, and custom samplers
User Stories
US-1: QA Engineer - Baseline Performance Metrics
As a QA Engineer
I want JMeter test plans that establish performance baselines
So that I can detect performance regressions in CI/CD pipelines
Acceptance Criteria:
Feature: JMeter Baseline Testing
Scenario: Establish REST API baseline
Given the gateway is deployed with standard configuration
When I execute the REST API baseline test plan
Then response times are recorded in JTL format
And P50, P95, P99 latencies are calculated
And results can be compared against thresholds
Scenario: Establish MCP protocol baseline
Given MCP servers are registered with the gateway
When I execute the MCP baseline test plan
Then JSON-RPC response times are captured
And tool invocation latencies are measured
And results include error categorization
US-2: DevOps - CI/CD Integration
As a DevOps Engineer
I want JMeter tests that run in headless mode
So that I can integrate performance testing into GitHub Actions
Acceptance Criteria:
Feature: CI/CD Integration
Scenario: Run JMeter in non-GUI mode
Given JMeter test plans are available
When I execute 'jmeter -n -t testplan.jmx -l results.jtl'
Then tests complete without GUI dependencies
And exit code reflects pass/fail status
And HTML report is generated automatically
Scenario: Performance gate in CI
Given baseline metrics are established
When current test results exceed thresholds
Then CI pipeline fails with specific metrics
And comparison report highlights regressions
US-3: SRE - Observability During Load Tests
As an SRE
I want comprehensive monitoring during JMeter tests
So that I can correlate performance metrics with system behavior
Acceptance Criteria:
Feature: Observability Integration
Scenario: Monitor during load test
Given monitoring stack is running (Prometheus, Grafana, Loki)
When I execute a JMeter load test
Then Grafana shows real-time metrics
And I can correlate JMeter results with:
| Metric | Source |
| HTTP request latency | Gateway |
| Database query time | PostgreSQL |
| Connection pool usage | PgBouncer |
| Container CPU/Memory | cAdvisor |
| MCP server latency | fast_time_server |
| Echo tool performance | fast_test_server |
Scenario: Aggregate logs during test
Given Loki log aggregation is running
When tests execute
Then logs are collected via Promtail
And Grafana shows correlated logs with metrics
US-4: Performance Engineer - Protocol-Specific Testing
As a Performance Engineer
I want test plans for each gateway protocol
So that I can identify protocol-specific bottlenecks
Acceptance Criteria:
Feature: Protocol Coverage
Scenario Outline: Test protocol performance
Given the gateway supports <protocol>
When I run the <protocol> test plan
Then baseline metrics are captured
And protocol-specific errors are categorized
Examples:
| protocol | test_plan |
| REST HTTP | rest_api_baseline.jmx |
| MCP JSON-RPC | mcp_jsonrpc_baseline.jmx |
| SSE Streaming | sse_streaming_baseline.jmx |
| WebSocket | websocket_baseline.jmx |
| Admin UI (HTMX) | admin_ui_baseline.jmx |
Architecture
JMETER LOAD TESTING ARCHITECTURE
+-------------------------------------------------------------------------+
| |
| JMeter Controller Gateway Stack Observability |
| ----------------- ------------- ------------- |
| |
| +--------------+ +---------------+ +-----------+ |
| | JMeter | HTTP | nginx | | Prometheus| |
| | Controller |--------->| (port 8080) | | (9090) | |
| +--------------+ +-------+-------+ +-----------+ |
| | | | |
| v v v |
| +--------------+ +---------------+ +-----------+ |
| | JMeter | | Gateway | | Grafana | |
| | Workers |--------->| Replicas (3) | | (3000) | |
| | (Distributed)| +-------+-------+ +-----------+ |
| +--------------+ | ^ |
| | +-----+-----+ | |
| v | | +-----------+ |
| +--------------+ v v | Loki + | |
| | Results | +-----------+ +-----------+ | Promtail | |
| | Collector | | PgBouncer | | Redis | +-----------+ |
| +--------------+ | (6432) | | (6379) | ^ |
| | +-----+-----+ +-----------+ | |
| v | +-----------+ |
| +--------------+ v | cAdvisor | |
| | JTL/CSV | +-----------+ | (8081) | |
| | Reports | | PostgreSQL| +-----------+ |
| +--------------+ | (5433) | |
| +-----------+ |
| |
| MCP Test Servers (for load testing MCP protocol) |
| ------------------------------------------------ |
| +------------------+ +------------------+ +------------------+ |
| | fast_time_server | | fast_test_server | | benchmark_server | |
| | (Go, port 8888) | | (Rust, port 8880)| | (Go, 9000-9099) | |
| | Tools: | | Tools: | | Tools: | |
| | - get_system_time| | - echo | | - Lightweight | |
| | - timezones | | - get_system_time| | MCP servers | |
| | | | - get_stats | | - Multi-instance | |
| +------------------+ +------------------+ +------------------+ |
| |
| Exporters: postgres_exporter (9187), redis_exporter (9121), |
| pgbouncer_exporter (9127), nginx_exporter (9113) |
| |
+-------------------------------------------------------------------------+
Test Profiles:
- Baseline: 1,000 req/s sustained for metric establishment
- Load: 4,000 req/s sustained, SLA validation
- Stress: Ramp to breaking point (10,000+ req/s)
- Spike: 1,000 → 10,000 → 1,000 req/s recovery test
- Soak: 2,000 req/s for 24 hours (memory leak detection)
- Protocol: Per-protocol (REST, MCP, SSE, WebSocket) isolation
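In JMeter these profiles are typically expressed as (start RPS, end RPS, duration) segments, e.g. as rows of the jp@gc Throughput Shaping Timer. A minimal sketch of the spike profile in that form (hypothetical helper, not part of the repo):

```python
# Sketch: a load profile as (start_rps, end_rps, duration_s) segments,
# the shape consumed by Throughput Shaping Timer rows.
# Values mirror the spike profile above; helper names are illustrative.

SPIKE_PROFILE = [
    (1000, 1000, 120),    # baseline: 2m at 1,000 req/s
    (1000, 10000, 30),    # surge: 10x ramp in 30s
    (10000, 10000, 120),  # sustain peak for 2m
    (10000, 1000, 120),   # recovery ramp-down
    (1000, 1000, 120),    # verify stability at baseline
]

def total_duration(profile):
    """Total wall-clock seconds covered by the schedule."""
    return sum(seg[2] for seg in profile)

def rps_at(profile, t):
    """Target RPS at second t, linearly interpolated within each segment."""
    elapsed = 0
    for start, end, dur in profile:
        if t < elapsed + dur:
            frac = (t - elapsed) / dur
            return start + (end - start) * frac
        elapsed += dur
    return profile[-1][1]
```

Expressing profiles as data like this keeps the JMX thread-group settings and the documentation table in one source of truth.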
Test Environment Setup
Prerequisites
# Install JMeter (macOS)
brew install jmeter
# Install JMeter (Linux)
wget https://dlcdn.apache.org/jmeter/binaries/apache-jmeter-5.6.3.tgz
tar -xzf apache-jmeter-5.6.3.tgz
export PATH=$PATH:$(pwd)/apache-jmeter-5.6.3/bin
# Install required plugins via the JMeter Plugins Manager
# (first place jmeter-plugins-manager.jar in lib/ext; this generates bin/PluginsManagerCMD.sh)
PluginsManagerCMD.sh install jpgc-casutg,jpgc-tst,jpgc-graphs-basic
# Verify installation
jmeter --version
Environment Configuration
# Core configuration
export GATEWAY_URL="http://localhost:8080"
export GATEWAY_ADMIN_URL="http://localhost:8080/admin"
export MCP_RPC_ENDPOINT="/rpc"
export MCPGATEWAY_BEARER_TOKEN=$(python -m mcpgateway.utils.create_jwt_token \
--username admin@example.com --exp 10080 --secret $JWT_SECRET_KEY)
# Test parameters
export JMETER_THREADS=100 # Concurrent users
export JMETER_RAMP_UP=30 # Seconds to reach full load
export JMETER_DURATION=300 # Test duration in seconds
export JMETER_TARGET_RPS=4000 # Target requests per second
# Monitoring endpoints
export PROMETHEUS_URL="http://localhost:9090"
export GRAFANA_URL="http://localhost:3000"
# MCP Test Server endpoints
export FAST_TIME_SERVER_URL="http://localhost:8888"
export FAST_TEST_SERVER_URL="http://localhost:8880"
export BENCHMARK_SERVER_URL="http://localhost:9000"
Start Test Environment
# Option 1: Standard stack with monitoring (3 gateway replicas)
make monitoring-up
# Option 2: Performance stack (7 replicas + PostgreSQL replica)
make performance-up
# Option 3: Start with benchmark servers
docker compose --profile benchmark up -d
# Verify services are running
curl -s $GATEWAY_URL/health | jq .
curl -s $PROMETHEUS_URL/-/healthy
curl -s $GRAFANA_URL/api/health | jq .
# Verify MCP test servers
curl -s $FAST_TIME_SERVER_URL/health
curl -s $FAST_TEST_SERVER_URL/health
Test Plans (Stories)
Story 1: REST API Baseline (rest_api_baseline.jmx)
| Parameter | Value | Description |
|---|---|---|
| Threads | 100 | Concurrent virtual users |
| Ramp-up | 60s | Time to spawn all users |
| Duration | 10m | Test duration |
| Target RPS | 1,000 | Baseline requests per second |
Endpoints Tested:
| Endpoint | Weight | Method | Expected Response |
|---|---|---|---|
| /health | 20% | GET | 200, JSON |
| /tools | 25% | GET | 200, JSON array |
| /servers | 20% | GET | 200, JSON array |
| /gateways | 15% | GET | 200, JSON array |
| /resources | 10% | GET | 200, JSON array |
| /prompts | 10% | GET | 200, JSON array |
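Inside JMeter this mix is typically realized with Throughput Controllers; outside JMeter, the same weighted selection can be sanity-checked with a few lines of Python (illustrative only, not repo code):

```python
import random

# Endpoint weights from the table above (percent of total traffic).
ENDPOINT_WEIGHTS = {
    "/health": 20, "/tools": 25, "/servers": 20,
    "/gateways": 15, "/resources": 10, "/prompts": 10,
}

def pick_endpoint(rng=random):
    """Choose one endpoint according to the baseline traffic mix."""
    endpoints = list(ENDPOINT_WEIGHTS)
    weights = [ENDPOINT_WEIGHTS[e] for e in endpoints]
    return rng.choices(endpoints, weights=weights, k=1)[0]
```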
Run Command:
jmeter -n -t tests/jmeter/rest_api_baseline.jmx \
-JGATEWAY_URL=$GATEWAY_URL \
-JTOKEN=$MCPGATEWAY_BEARER_TOKEN \
-JTHREADS=100 -JRAMP_UP=60 -JDURATION=600 \
-l results/rest_baseline_$(date +%Y%m%d_%H%M%S).jtl \
  -e -o results/rest_baseline_report/
Story 2: MCP JSON-RPC Baseline (mcp_jsonrpc_baseline.jmx)
| Parameter | Value | Description |
|---|---|---|
| Threads | 200 | Concurrent MCP clients |
| Ramp-up | 60s | Gradual client addition |
| Duration | 15m | Extended test for stability |
| Target RPS | 1,000 | Baseline JSON-RPC operations/second |
MCP Methods Tested:
| Method | Weight | Parameters | Validation |
|---|---|---|---|
| tools/list | 25% | {} | Has tools array |
| tools/call | 30% | {name, arguments} | Has content |
| resources/list | 15% | {} | Has resources array |
| resources/read | 10% | {uri} | Has contents |
| prompts/list | 10% | {} | Has prompts array |
| initialize | 5% | {protocolVersion, capabilities} | Has serverInfo |
| ping | 5% | {} | Empty result |
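Each sampler posts a JSON-RPC 2.0 envelope to the gateway's /rpc endpoint. A small helper for building and validating such envelopes before encoding them into samplers (a hypothetical sketch mirroring the request template, not repo code):

```python
import json
import uuid

def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request body; JMeter substitutes ${__UUID} for the id."""
    return {
        "jsonrpc": "2.0",
        "id": str(uuid.uuid4()),
        "method": method,
        "params": params if params is not None else {},
    }

def encode(req):
    """Serialize to the wire format sent in the HTTP sampler body."""
    return json.dumps(req)
```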
JSON-RPC Request Template:
{
"jsonrpc": "2.0",
"id": "${__UUID}",
"method": "${method}",
"params": ${params}
}
Run Command:
jmeter -n -t tests/jmeter/mcp_jsonrpc_baseline.jmx \
-JGATEWAY_URL=$GATEWAY_URL \
-JTOKEN=$MCPGATEWAY_BEARER_TOKEN \
-JTHREADS=200 -JRAMP_UP=60 -JDURATION=900 \
-l results/mcp_baseline_$(date +%Y%m%d_%H%M%S).jtl \
  -e -o results/mcp_baseline_report/
Story 3: MCP Test Server Baseline (mcp_test_servers_baseline.jmx)
Tests the MCP test servers directly for establishing performance ceilings.
| Server | Port | Tools | Description |
|---|---|---|---|
| fast_time_server | 8888 | get_system_time | Go-based time service |
| fast_test_server | 8880 | echo, get_system_time, get_stats | Rust-based test server |
| benchmark_server | 9000-9099 | Various | Multi-instance benchmark |
Run Command:
# Test fast_time_server directly
jmeter -n -t tests/jmeter/mcp_test_servers_baseline.jmx \
-JFAST_TIME_URL=$FAST_TIME_SERVER_URL \
-JFAST_TEST_URL=$FAST_TEST_SERVER_URL \
-JTHREADS=200 -JDURATION=600 \
-l results/mcp_servers_$(date +%Y%m%d_%H%M%S).jtl \
  -e -o results/mcp_servers_report/
Story 4: Load Test (load_test.jmx)
| Parameter | Value | Description |
|---|---|---|
| Threads | 400 | Concurrent users |
| Ramp-up | 120s | Gradual ramp |
| Duration | 30m | Sustained load |
| Target RPS | 4,000 | Production load simulation |
Run Command:
jmeter -n -t tests/jmeter/load_test.jmx \
-JGATEWAY_URL=$GATEWAY_URL \
-JTOKEN=$MCPGATEWAY_BEARER_TOKEN \
-JTHREADS=400 -JRAMP_UP=120 -JDURATION=1800 \
-l results/load_test_$(date +%Y%m%d_%H%M%S).jtl \
  -e -o results/load_test_report/
Story 5: Stress Test (stress_test.jmx)
| Phase | Duration | Users | Target |
|---|---|---|---|
| Warm-up | 2m | 200 | 1,000 RPS |
| Ramp 1 | 5m | 400 | 2,000 RPS |
| Ramp 2 | 5m | 800 | 4,000 RPS |
| Ramp 3 | 5m | 1,200 | 6,000 RPS |
| Ramp 4 | 5m | 1,600 | 8,000 RPS |
| Peak | 5m | 2,000 | 10,000 RPS |
| Recovery | 3m | 200 | 1,000 RPS |
Metrics to Capture at Each Level:
| Level | P50 (ms) | P95 (ms) | P99 (ms) | Error % | CPU % | Memory MB |
|---|---|---|---|---|---|---|
| 2,000 RPS | ||||||
| 4,000 RPS | ||||||
| 6,000 RPS | ||||||
| 8,000 RPS | ||||||
| 10,000 RPS |
Story 6: Spike Test (spike_test.jmx)
| Phase | Duration | Users | Target | Description |
|---|---|---|---|---|
| Baseline | 2m | 200 | 1,000 RPS | Normal operation |
| Spike | 30s | 2,000 | 10,000 RPS | 10x traffic surge |
| Sustain | 2m | 2,000 | 10,000 RPS | Peak hold |
| Recovery | 2m | 200 | 1,000 RPS | Return to baseline |
| Verify | 2m | 200 | 1,000 RPS | Stability check |
Recovery Criteria:
- P95 latency returns to baseline within 30 seconds
- Error rate drops to <0.1% within 60 seconds
- No connection pool exhaustion
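The first recovery criterion can be checked mechanically from JTL samples taken during the recovery window. A sketch (assumes latency samples in milliseconds; names and the 10% tolerance are illustrative):

```python
def percentile(values, pct):
    """Nearest-rank percentile over a list of latency samples (ms)."""
    ordered = sorted(values)
    rank = max(1, round(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def recovered(baseline_p95, recovery_samples, tolerance=1.1):
    """True if recovery-window P95 is within tolerance of the baseline P95."""
    return percentile(recovery_samples, 95) <= baseline_p95 * tolerance
```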
Story 7: Soak Test (soak_test.jmx)
| Parameter | Value | Purpose |
|---|---|---|
| Duration | 24 hours | Memory leak detection |
| Users | 400 | Sustained load |
| Target RPS | 2,000 | Steady throughput |
| Sample Interval | 1 hour | Metric snapshots |
Hourly Metrics Collection:
| Hour | Memory (MB) | Connections | P95 (ms) | Errors |
|---|---|---|---|---|
| 0 | ||||
| 4 | ||||
| 8 | ||||
| 12 | ||||
| 16 | ||||
| 20 | ||||
| 24 |
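Once the table is populated, the "memory growth < 10%" success criterion can be evaluated directly. A sketch (hypothetical helper; the sample values below are illustrative):

```python
def memory_growth_pct(samples_mb):
    """Percent growth from the first to the last hourly memory sample."""
    first, last = samples_mb[0], samples_mb[-1]
    return (last - first) / first * 100

def leak_suspected(samples_mb, max_growth_pct=10.0):
    """Flag the soak run if memory grew beyond the allowed percentage."""
    return memory_growth_pct(samples_mb) > max_growth_pct
```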
Memory Leak Detection Script:
#!/bin/bash
# Run during soak test to detect memory growth
for i in $(seq 1 24); do
echo "Hour $i: $(docker stats --no-stream --format '{{.Name}}: {{.MemUsage}}' | grep gateway)"
sleep 3600
done
Story 8: Protocol Isolation Tests
8a: SSE Streaming (sse_streaming_baseline.jmx)
| Parameter | Value |
|---|---|
| Connections | 1,000 |
| Duration | 10m |
| Events/connection | Continuous |
Metrics:
- Connection establishment time
- Event delivery latency
- Connection drop rate
- Reconnection success rate
8b: WebSocket (websocket_baseline.jmx)
| Parameter | Value |
|---|---|
| Connections | 500 |
| Duration | 10m |
| Messages/second | 20 per connection |
Metrics:
- Handshake time
- Message round-trip time
- Frame loss rate
- Connection lifetime
8c: Admin UI (admin_ui_baseline.jmx)
| Parameter | Value |
|---|---|
| Users | 50 |
| Duration | 5m |
| Think time | 3-5s |
User Journey:
- Login → Dashboard
- Navigate Tools → View details
- Navigate Servers → View details
- Navigate Gateways → Refresh
- View Metrics page
- Logout
Monitoring & Observability During Tests
Pre-Test Checklist
# 1. Verify monitoring stack
make monitoring-status
# 2. Check Prometheus targets
curl -s "$PROMETHEUS_URL/api/v1/targets" | jq '.data.activeTargets | length'
# 3. Verify Grafana dashboards
curl -s "$GRAFANA_URL/api/search?type=dash-db" | jq '.[].title'
# 4. Verify Loki is receiving logs
curl -s "$GRAFANA_URL/api/datasources/proxy/loki/loki/api/v1/labels" | jq .
# 5. Start log collection (alternative to Loki)
docker compose logs -f gateway > logs/gateway_$(date +%Y%m%d_%H%M%S).log &
Real-Time Monitoring
| Metric | Prometheus Query | Alert Threshold |
|---|---|---|
| HTTP Request Rate | rate(http_requests_total[1m]) | N/A (baseline) |
| HTTP Error Rate | rate(http_requests_total{status=~"5.."}[1m]) | > 1% |
| P95 Latency | histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) | > 500ms |
| DB Connections | pg_stat_activity_count | > 80% pool |
| CPU Usage | rate(container_cpu_usage_seconds_total[1m]) | > 80% |
| Memory Usage | container_memory_usage_bytes | > 85% |
| Redis Operations | rate(redis_commands_total[1m]) | N/A |
| PgBouncer Pool | pgbouncer_pools_cl_active | > 80% |
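During a run these queries can be polled through Prometheus's HTTP API (GET /api/v1/query). A sketch of parsing an instant-query response and applying the error-rate alert threshold (the sample payload below is illustrative, not captured output):

```python
import json

# Illustrative instant-query response shape from GET /api/v1/query?query=...
SAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"instance": "gateway:8080"}, "value": [1700000000.0, "0.012"]},
        ],
    },
})

def first_value(response_text):
    """Extract the scalar value of the first series in an instant-query result."""
    body = json.loads(response_text)
    result = body["data"]["result"]
    return float(result[0]["value"][1]) if result else None

def error_rate_breached(response_text, threshold=0.01):
    """Apply the '> 1%' error-rate threshold from the table above."""
    value = first_value(response_text)
    return value is not None and value > threshold
```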
Grafana Dashboards
- Gateway Performance - HTTP metrics, latency percentiles
- Database Health - PostgreSQL connections, query times (via postgres_exporter)
- Container Resources - CPU, memory, network I/O (via cAdvisor)
- MCP Protocol - JSON-RPC operations, tool invocations
- Redis Metrics - Cache hit rates, operations (via redis_exporter)
- Connection Pools - PgBouncer stats (via pgbouncer_exporter)
Exporters in Monitoring Stack
| Exporter | Port | Metrics |
|---|---|---|
| postgres_exporter | 9187 | PostgreSQL statistics |
| redis_exporter | 9121 | Redis metrics |
| pgbouncer_exporter | 9127 | Connection pool stats |
| nginx_exporter | 9113 | NGINX request metrics |
| cAdvisor | 8081 | Container resources |
Post-Test Analysis
# Generate HTML report from JTL
jmeter -g results/test.jtl -o results/report/
# Export Prometheus metrics snapshot
curl -s "$PROMETHEUS_URL/api/v1/query?query=up" > metrics_snapshot.json
# Export Loki logs for test window (GNU date shown; on macOS use `date -v-1H`)
curl -G "$GRAFANA_URL/api/datasources/proxy/loki/loki/api/v1/query_range" \
--data-urlencode 'query={job="gateway"}' \
--data-urlencode "start=$(date -d '1 hour ago' +%s)000000000" \
--data-urlencode "end=$(date +%s)000000000" > logs_export.json
# Compare with baseline
python scripts/compare_jmeter_results.py \
baseline/results.jtl current/results.jtl \
  --threshold-p95=10% --threshold-error=0.5%
Test Matrix
| Test Plan | Protocol | Threads | Duration | RPS Target | SLA |
|---|---|---|---|---|---|
| REST Baseline | HTTP | 100 | 10m | 1,000 | P95 < 200ms |
| MCP Baseline | JSON-RPC | 200 | 15m | 1,000 | P95 < 300ms |
| MCP Servers | JSON-RPC | 200 | 10m | 2,000 | P95 < 50ms |
| Load | Mixed | 400 | 30m | 4,000 | P95 < 300ms |
| Stress | Mixed | 2,000 | 30m | 10,000 | Find limit |
| Spike | Mixed | 200-2,000 | 10m | 1K→10K→1K | Recovery < 30s |
| Soak | Mixed | 400 | 24h | 2,000 | No memory leak |
| SSE | SSE | 1,000 conn | 10m | N/A | Drop rate < 1% |
| WebSocket | WS | 500 conn | 10m | 10,000 msg/s | Loss < 0.1% |
| Admin UI | HTMX | 50 | 5m | N/A | P95 < 500ms |
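The scripts/compare_jmeter_results.py comparator invoked in Post-Test Analysis could be sketched as follows (hypothetical implementation; the real script would read the `elapsed` column from JTL CSV files):

```python
def p95(latencies_ms):
    """Nearest-rank P95 over elapsed times (ms) from a JTL file."""
    ordered = sorted(latencies_ms)
    return ordered[max(1, round(0.95 * len(ordered))) - 1]

def regression(baseline_ms, current_ms, threshold_pct=10.0):
    """Return (breached, delta_pct) comparing current P95 against baseline P95."""
    base, cur = p95(baseline_ms), p95(current_ms)
    delta_pct = (cur - base) / base * 100
    return delta_pct > threshold_pct, delta_pct
```

A CI gate would call this per test plan and exit non-zero when `breached` is true, which is what lets the pipeline fail with specific metrics.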
Success Criteria
- REST API baseline documented (P50, P95, P99 at 1,000 req/s)
- MCP JSON-RPC baseline documented (P50, P95, P99 at 1,000 req/s)
- MCP test servers baseline documented (fast_time, fast_test, benchmark)
- Load test sustains 4,000 RPS with P95 < 300ms
- Stress test identifies breaking point (target: >10,000 RPS)
- Spike test demonstrates recovery within 30 seconds after 10,000 RPS spike
- 24-hour soak test at 2,000 RPS shows memory growth < 10%
- SSE connection stability > 99% over 10 minutes (1,000 connections)
- WebSocket message delivery > 99.9% (10,000 msg/s)
- All test plans executable in non-GUI mode for CI/CD
- HTML reports generated automatically
- Prometheus/Grafana/Loki integration validated
Deliverables
Test Plans (tests/jmeter/)
tests/jmeter/
├── rest_api_baseline.jmx
├── mcp_jsonrpc_baseline.jmx
├── mcp_test_servers_baseline.jmx
├── load_test.jmx
├── stress_test.jmx
├── spike_test.jmx
├── soak_test.jmx
├── sse_streaming_baseline.jmx
├── websocket_baseline.jmx
├── admin_ui_baseline.jmx
├── properties/
│ ├── production.properties
│ └── ci.properties
└── data/
├── timezones.csv
├── tool_names.csv
└── test_messages.csv
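The data/ files feed JMeter CSV Data Set Config elements. They could be generated by a short script such as this sketch (file contents shown are illustrative; the real lists would match registered tools and supported timezones):

```python
import csv
import io

def write_csv(rows, header):
    """Render rows as CSV text suitable for a JMeter CSV Data Set Config."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()

# Illustrative contents only.
tool_csv = write_csv([["echo"], ["get_system_time"], ["get_stats"]], ["tool_name"])
tz_csv = write_csv([["UTC"], ["America/New_York"], ["Europe/London"], ["Asia/Tokyo"]],
                   ["timezone"])
```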
Makefile Targets
# JMeter targets
jmeter-rest-baseline: ## Run REST API baseline test (1,000 RPS)
jmeter-mcp-baseline: ## Run MCP JSON-RPC baseline test (1,000 RPS)
jmeter-mcp-servers-baseline: ## Run MCP test servers baseline
jmeter-load: ## Run load test (4,000 RPS)
jmeter-stress: ## Run stress test (ramp to 10,000 RPS)
jmeter-spike: ## Run spike test (1K→10K→1K recovery)
jmeter-soak: ## Run 24-hour soak test (2,000 RPS)
jmeter-report: ## Generate HTML report from last test
jmeter-compare: ## Compare current vs baseline results
Related Files
- tests/loadtest/locustfile.py - Existing Locust tests (reference)
- tests/loadtest/locustfile_baseline.py - Component-level Locust tests
- mcpgateway/routers/mcp.py - MCP endpoints under test
- mcpgateway/routers/tools.py - REST API endpoints
- docker-compose.yml - Standard stack with monitoring profile
- docker-compose-performance.yml - Performance stack (7 replicas)
- mcp-servers/fast-time-server/ - Go time server source
- mcp-servers/fast-test-server/ - Rust test server source
Related Issues
- [TESTING][PERFORMANCE]: Load Testing, Stress Testing, and Benchmarks #2473 - Load Testing, Stress Testing, and Benchmarks (Locust)
- [TESTING][OBSERVABILITY]: Metrics Accuracy, Tracing Completeness, and Dashboard Validation #2476 - Observability and metrics accuracy
- [TESTING][FUNCTIONALITY]: Metrics system manual test plan (buffering, rollup, cleanup, queries) #2450 - Metrics system
- [BUG]: anyio cancel scope spin loop causes 100% CPU after load test stops #2360 - CPU spin loop detection
Comparison: Locust vs JMeter
| Aspect | Locust (Existing) | JMeter (This Ticket) |
|---|---|---|
| Language | Python | Java/XML |
| UI | Web-based | Desktop GUI + CLI |
| Scripting | Python code | XML + GUI |
| Protocol Support | HTTP, custom | HTTP, WS, TCP, JDBC |
| Distributed | Built-in | Controller/worker |
| CI/CD | Native | Non-GUI mode |
| Reporting | Custom + HTML | JTL, CSV, HTML |
| Enterprise | Limited | Extensive APM integration |
| Learning Curve | Low (for Python devs) | Medium |
Recommendation: Use Locust for developer-driven testing and rapid iteration. Use JMeter for formal baseline establishment, CI/CD gates, and enterprise reporting requirements.