
[TESTING][PERFORMANCE]: JMeter Performance Load Testing Baseline #2541

@crivetimihai

Description


Goal

Establish a JMeter-based performance testing baseline that complements the existing Locust framework. JMeter provides industry-standard testing capabilities, enterprise tooling integration, advanced correlation support, and standardized reporting formats (JTL, CSV, HTML dashboards) that integrate with CI/CD pipelines and APM tools.

Why Now?

While Locust provides excellent Python-based load testing, JMeter adds complementary value:

  1. Enterprise Integration: JMeter integrates with enterprise APM tools (Dynatrace, AppDynamics, New Relic)
  2. Standardized Reporting: JTL/CSV formats are industry standard for performance dashboards
  3. Non-Developer Access: GUI mode enables QA teams without Python experience to run tests
  4. Correlation Engine: Advanced response extraction for complex MCP workflows
  5. Distributed Testing: Native distributed mode with remote worker coordination
  6. Protocol Coverage: Built-in support for HTTP, WebSocket, TCP, and custom samplers

User Stories

US-1: QA Engineer - Baseline Performance Metrics

As a QA Engineer
I want JMeter test plans that establish performance baselines
So that I can detect performance regressions in CI/CD pipelines

Acceptance Criteria:

Feature: JMeter Baseline Testing

  Scenario: Establish REST API baseline
    Given the gateway is deployed with standard configuration
    When I execute the REST API baseline test plan
    Then response times are recorded in JTL format
    And P50, P95, P99 latencies are calculated
    And results can be compared against thresholds

  Scenario: Establish MCP protocol baseline
    Given MCP servers are registered with the gateway
    When I execute the MCP baseline test plan
    Then JSON-RPC response times are captured
    And tool invocation latencies are measured
    And results include error categorization
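
The percentile calculations these criteria call for can be scripted directly from a JTL file. A minimal sketch (Python, nearest-rank method; the column names follow JMeter's default CSV output, the file path is whatever `-l` was given):

```python
# Compute P50/P95/P99 latencies from a JMeter JTL (CSV) results file.
import csv
import math

def percentile(sorted_vals, p):
    """Nearest-rank percentile for 0 < p <= 100 over a sorted list."""
    k = max(0, math.ceil(p / 100 * len(sorted_vals)) - 1)
    return sorted_vals[k]

def latency_summary(jtl_path):
    """Read the 'elapsed' column (ms) and return {50: ..., 95: ..., 99: ...}."""
    with open(jtl_path, newline="") as f:
        elapsed = sorted(int(row["elapsed"]) for row in csv.DictReader(f))
    return {p: percentile(elapsed, p) for p in (50, 95, 99)}
```

The same numbers appear in the JMeter HTML dashboard; a script like this is only needed when feeding them into a threshold comparison.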

US-2: DevOps - CI/CD Integration

As a DevOps Engineer
I want JMeter tests that run in headless mode
So that I can integrate performance testing into GitHub Actions

Acceptance Criteria:

Feature: CI/CD Integration

  Scenario: Run JMeter in non-GUI mode
    Given JMeter test plans are available
    When I execute 'jmeter -n -t testplan.jmx -l results.jtl'
    Then tests complete without GUI dependencies
    And exit code reflects pass/fail status
    And HTML report is generated automatically

  Scenario: Performance gate in CI
    Given baseline metrics are established
    When current test results exceed thresholds
    Then CI pipeline fails with specific metrics
    And comparison report highlights regressions
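
A performance gate of this kind boils down to comparing current metrics against limits and signalling via the exit code. A sketch (Python; the metric names and threshold values are illustrative, not project-defined):

```python
# CI performance gate: non-zero exit status fails the pipeline.
import sys

# Illustrative limits; real values come from the established baselines.
THRESHOLDS = {"p95_ms": 300, "p99_ms": 500, "error_rate_pct": 0.5}

def gate(current):
    """Return violation messages; an empty list means the gate passes."""
    return [
        f"{name}: {current[name]} exceeds limit {limit}"
        for name, limit in THRESHOLDS.items()
        if current.get(name, 0) > limit
    ]

def main(current):
    """Exit status for CI: 0 passes the gate, 1 fails the job."""
    violations = gate(current)
    for v in violations:
        print(v, file=sys.stderr)
    return 1 if violations else 0
```

Calling `sys.exit(main(metrics))` at the end of the script turns a regression into a red CI job, satisfying the exit-code scenario above.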

US-3: SRE - Observability During Load Tests

As an SRE
I want comprehensive monitoring during JMeter tests
So that I can correlate performance metrics with system behavior

Acceptance Criteria:

Feature: Observability Integration

  Scenario: Monitor during load test
    Given monitoring stack is running (Prometheus, Grafana, Loki)
    When I execute a JMeter load test
    Then Grafana shows real-time metrics
    And I can correlate JMeter results with:
      | Metric                  | Source           |
      | HTTP request latency    | Gateway          |
      | Database query time     | PostgreSQL       |
      | Connection pool usage   | PgBouncer        |
      | Container CPU/Memory    | cAdvisor         |
      | MCP server latency      | fast_time_server |
      | Echo tool performance   | fast_test_server |

  Scenario: Aggregate logs during test
    Given Loki log aggregation is running
    When tests execute
    Then logs are collected via Promtail
    And Grafana shows correlated logs with metrics

US-4: Performance Engineer - Protocol-Specific Testing

As a Performance Engineer
I want test plans for each gateway protocol
So that I can identify protocol-specific bottlenecks

Acceptance Criteria:

Feature: Protocol Coverage

  Scenario Outline: Test protocol performance
    Given the gateway supports <protocol>
    When I run the <protocol> test plan
    Then baseline metrics are captured
    And protocol-specific errors are categorized

    Examples:
      | protocol        | test_plan                     |
      | REST HTTP       | rest_api_baseline.jmx         |
      | MCP JSON-RPC    | mcp_jsonrpc_baseline.jmx      |
      | SSE Streaming   | sse_streaming_baseline.jmx    |
      | WebSocket       | websocket_baseline.jmx        |
      | Admin UI (HTMX) | admin_ui_baseline.jmx         |

Architecture

                     JMETER LOAD TESTING ARCHITECTURE
+-------------------------------------------------------------------------+
|                                                                         |
|   JMeter Controller          Gateway Stack            Observability     |
|   -----------------          -------------            -------------     |
|                                                                         |
|   +--------------+          +---------------+         +-----------+     |
|   | JMeter       |  HTTP    |   nginx       |         | Prometheus|     |
|   | Controller   |--------->| (port 8080)   |         | (9090)    |     |
|   +--------------+          +-------+-------+         +-----------+     |
|         |                          |                        |           |
|         v                          v                        v           |
|   +--------------+          +---------------+         +-----------+     |
|   | JMeter       |          | Gateway       |         | Grafana   |     |
|   | Workers      |--------->| Replicas (3)  |         | (3000)    |     |
|   | (Distributed)|          +-------+-------+         +-----------+     |
|   +--------------+                 |                        ^           |
|         |                    +-----+-----+                  |           |
|         v                    |           |            +-----------+     |
|   +--------------+           v           v            | Loki +    |     |
|   | Results      |    +-----------+ +-----------+     | Promtail  |     |
|   | Collector    |    | PgBouncer | |   Redis   |     +-----------+     |
|   +--------------+    | (6432)    | |  (6379)   |           ^           |
|         |             +-----+-----+ +-----------+           |           |
|         v                   |                         +-----------+     |
|   +--------------+          v                         | cAdvisor  |     |
|   | JTL/CSV      |    +-----------+                   | (8081)    |     |
|   | Reports      |    | PostgreSQL|                   +-----------+     |
|   +--------------+    | (5433)    |                                     |
|                       +-----------+                                     |
|                                                                         |
|   MCP Test Servers (for load testing MCP protocol)                      |
|   ------------------------------------------------                      |
|   +------------------+  +------------------+  +------------------+      |
|   | fast_time_server |  | fast_test_server |  | benchmark_server |      |
|   | (Go, port 8888)  |  | (Rust, port 8880)|  | (Go, 9000-9099)  |      |
|   | Tools:           |  | Tools:           |  | Tools:           |      |
|   | - get_system_time|  | - echo           |  | - Lightweight    |      |
|   | - timezones      |  | - get_system_time|  |   MCP servers    |      |
|   |                  |  | - get_stats      |  | - Multi-instance |      |
|   +------------------+  +------------------+  +------------------+      |
|                                                                         |
|   Exporters: postgres_exporter (9187), redis_exporter (9121),           |
|              pgbouncer_exporter (9127), nginx_exporter (9113)           |
|                                                                         |
+-------------------------------------------------------------------------+

Test Profiles:
- Baseline: 1,000 req/s sustained for metric establishment
- Load:     4,000 req/s sustained, SLA validation
- Stress:   Ramp to breaking point (10,000+ req/s)
- Spike:    1,000 → 10,000 → 1,000 req/s recovery test
- Soak:     2,000 req/s for 24 hours (memory leak detection)
- Protocol: Per-protocol (REST, MCP, SSE, WebSocket) isolation
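
The profiles above can be captured as data and used to assemble the non-GUI command line consistently. A sketch (Python; thread/ramp/duration figures mirror the stories below where given, the soak ramp-up is an assumption, and the helper itself is hypothetical):

```python
# Test profiles as data, mapped to planned .jmx deliverables.
PROFILES = {
    "baseline": {"plan": "rest_api_baseline.jmx", "threads": 100, "ramp": 60,  "duration": 600},
    "load":     {"plan": "load_test.jmx",         "threads": 400, "ramp": 120, "duration": 1800},
    # Soak ramp-up is not specified in this ticket; 120s is assumed here.
    "soak":     {"plan": "soak_test.jmx",         "threads": 400, "ramp": 120, "duration": 86400},
}

def jmeter_cmd(profile, results_file):
    """Assemble the non-GUI invocation: jmeter -n -t <plan> -J... -l <jtl>."""
    p = PROFILES[profile]
    return [
        "jmeter", "-n", "-t", f"tests/jmeter/{p['plan']}",
        f"-JTHREADS={p['threads']}", f"-JRAMP_UP={p['ramp']}",
        f"-JDURATION={p['duration']}", "-l", results_file,
    ]
```

A list like this keeps CI wrappers (Makefile targets, GitHub Actions steps) from drifting out of sync with the documented profiles.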

Test Environment Setup

Prerequisites

# Install JMeter (macOS)
brew install jmeter

# Install JMeter (Linux)
wget https://dlcdn.apache.org/jmeter/binaries/apache-jmeter-5.6.3.tgz
tar -xzf apache-jmeter-5.6.3.tgz
export PATH=$PATH:$(pwd)/apache-jmeter-5.6.3/bin

# Install required plugins via the jmeter-plugins.org Plugins Manager CLI
# (requires the Plugins Manager jar in lib/ext and cmdrunner in lib/)
PluginsManagerCMD.sh install jpgc-casutg,jpgc-tst,jpgc-graphs-basic

# Verify installation
jmeter --version

Environment Configuration

# Core configuration
export GATEWAY_URL="http://localhost:8080"
export GATEWAY_ADMIN_URL="http://localhost:8080/admin"
export MCP_RPC_ENDPOINT="/rpc"
export MCPGATEWAY_BEARER_TOKEN=$(python -m mcpgateway.utils.create_jwt_token \
  --username admin@example.com --exp 10080 --secret $JWT_SECRET_KEY)

# Test parameters
export JMETER_THREADS=100          # Concurrent users
export JMETER_RAMP_UP=30           # Seconds to reach full load
export JMETER_DURATION=300         # Test duration in seconds
export JMETER_TARGET_RPS=4000      # Target requests per second

# Monitoring endpoints
export PROMETHEUS_URL="http://localhost:9090"
export GRAFANA_URL="http://localhost:3000"

# MCP Test Server endpoints
export FAST_TIME_SERVER_URL="http://localhost:8888"
export FAST_TEST_SERVER_URL="http://localhost:8880"
export BENCHMARK_SERVER_URL="http://localhost:9000"

Start Test Environment

# Option 1: Standard stack with monitoring (3 gateway replicas)
make monitoring-up

# Option 2: Performance stack (7 replicas + PostgreSQL replica)
make performance-up

# Option 3: Start with benchmark servers
docker compose --profile benchmark up -d

# Verify services are running
curl -s $GATEWAY_URL/health | jq .
curl -s $PROMETHEUS_URL/-/healthy
curl -s $GRAFANA_URL/api/health | jq .

# Verify MCP test servers
curl -s $FAST_TIME_SERVER_URL/health
curl -s $FAST_TEST_SERVER_URL/health

Test Plans (Stories)

Story 1: REST API Baseline (rest_api_baseline.jmx)

| Parameter  | Value | Description                  |
|------------|-------|------------------------------|
| Threads    | 100   | Concurrent virtual users     |
| Ramp-up    | 60s   | Time to spawn all users      |
| Duration   | 10m   | Test duration                |
| Target RPS | 1,000 | Baseline requests per second |

Endpoints Tested:

| Endpoint   | Weight | Method | Expected Response |
|------------|--------|--------|-------------------|
| /health    | 20%    | GET    | 200, JSON         |
| /tools     | 25%    | GET    | 200, JSON array   |
| /servers   | 20%    | GET    | 200, JSON array   |
| /gateways  | 15%    | GET    | 200, JSON array   |
| /resources | 10%    | GET    | 200, JSON array   |
| /prompts   | 10%    | GET    | 200, JSON array   |

Run Command:

jmeter -n -t tests/jmeter/rest_api_baseline.jmx \
  -JGATEWAY_URL=$GATEWAY_URL \
  -JTOKEN=$MCPGATEWAY_BEARER_TOKEN \
  -JTHREADS=100 -JRAMP_UP=60 -JDURATION=600 \
  -l results/rest_baseline_$(date +%Y%m%d_%H%M%S).jtl \
  -e -o results/rest_baseline_report/

Story 2: MCP JSON-RPC Baseline (mcp_jsonrpc_baseline.jmx)

| Parameter  | Value | Description                         |
|------------|-------|-------------------------------------|
| Threads    | 200   | Concurrent MCP clients              |
| Ramp-up    | 60s   | Gradual client addition             |
| Duration   | 15m   | Extended test for stability         |
| Target RPS | 1,000 | Baseline JSON-RPC operations/second |

MCP Methods Tested:

| Method         | Weight | Parameters                      | Validation          |
|----------------|--------|---------------------------------|---------------------|
| tools/list     | 25%    | {}                              | Has tools array     |
| tools/call     | 30%    | {name, arguments}               | Has content         |
| resources/list | 15%    | {}                              | Has resources array |
| resources/read | 10%    | {uri}                           | Has contents        |
| prompts/list   | 10%    | {}                              | Has prompts array   |
| initialize     | 5%     | {protocolVersion, capabilities} | Has serverInfo      |
| ping           | 5%     | {}                              | Empty result        |

JSON-RPC Request Template:

{
  "jsonrpc": "2.0",
  "id": "${__UUID()}",
  "method": "${method}",
  "params": ${params}
}
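
Outside JMeter, the same payloads can be reproduced for smoke-testing the endpoint before a full run, drawing methods according to the weight table above. A sketch (Python; the weights and method names come from this ticket, the helper itself is illustrative):

```python
# Build MCP JSON-RPC payloads with weighted method selection,
# mirroring the ${__UUID()} id in the JMeter template.
import json
import random
import uuid

# Weights from the MCP methods table (sum to 100).
METHOD_WEIGHTS = {
    "tools/list": 25, "tools/call": 30, "resources/list": 15,
    "resources/read": 10, "prompts/list": 10, "initialize": 5, "ping": 5,
}

def build_request(params=None, rng=random):
    """Return one JSON-RPC 2.0 request dict with a weighted-random method."""
    method = rng.choices(list(METHOD_WEIGHTS), weights=METHOD_WEIGHTS.values())[0]
    return {
        "jsonrpc": "2.0",
        "id": str(uuid.uuid4()),
        "method": method,
        "params": params or {},
    }

payload = json.dumps(build_request())  # body to POST to $MCP_RPC_ENDPOINT
```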

Run Command:

jmeter -n -t tests/jmeter/mcp_jsonrpc_baseline.jmx \
  -JGATEWAY_URL=$GATEWAY_URL \
  -JTOKEN=$MCPGATEWAY_BEARER_TOKEN \
  -JTHREADS=200 -JRAMP_UP=60 -JDURATION=900 \
  -l results/mcp_baseline_$(date +%Y%m%d_%H%M%S).jtl \
  -e -o results/mcp_baseline_report/

Story 3: MCP Test Server Baseline (mcp_test_servers_baseline.jmx)

Tests the MCP test servers directly to establish performance ceilings.

| Server           | Port      | Tools                            | Description              |
|------------------|-----------|----------------------------------|--------------------------|
| fast_time_server | 8888      | get_system_time                  | Go-based time service    |
| fast_test_server | 8880      | echo, get_system_time, get_stats | Rust-based test server   |
| benchmark_server | 9000-9099 | Various                          | Multi-instance benchmark |

Run Command:

# Test fast_time_server directly
jmeter -n -t tests/jmeter/mcp_test_servers_baseline.jmx \
  -JFAST_TIME_URL=$FAST_TIME_SERVER_URL \
  -JFAST_TEST_URL=$FAST_TEST_SERVER_URL \
  -JTHREADS=200 -JDURATION=600 \
  -l results/mcp_servers_$(date +%Y%m%d_%H%M%S).jtl \
  -e -o results/mcp_servers_report/

Story 4: Load Test (load_test.jmx)

| Parameter  | Value | Description                |
|------------|-------|----------------------------|
| Threads    | 400   | Concurrent users           |
| Ramp-up    | 120s  | Gradual ramp               |
| Duration   | 30m   | Sustained load             |
| Target RPS | 4,000 | Production load simulation |

Run Command:

jmeter -n -t tests/jmeter/load_test.jmx \
  -JGATEWAY_URL=$GATEWAY_URL \
  -JTOKEN=$MCPGATEWAY_BEARER_TOKEN \
  -JTHREADS=400 -JRAMP_UP=120 -JDURATION=1800 \
  -l results/load_test_$(date +%Y%m%d_%H%M%S).jtl \
  -e -o results/load_test_report/

Story 5: Stress Test (stress_test.jmx)

| Phase    | Duration | Users | Target     |
|----------|----------|-------|------------|
| Warm-up  | 2m       | 200   | 1,000 RPS  |
| Ramp 1   | 5m       | 400   | 2,000 RPS  |
| Ramp 2   | 5m       | 800   | 4,000 RPS  |
| Ramp 3   | 5m       | 1,200 | 6,000 RPS  |
| Ramp 4   | 5m       | 1,600 | 8,000 RPS  |
| Peak     | 5m       | 2,000 | 10,000 RPS |
| Recovery | 3m       | 200   | 1,000 RPS  |

Metrics to Capture at Each Level:

| Level      | P50 (ms) | P95 (ms) | P99 (ms) | Error % | CPU % | Memory MB |
|------------|----------|----------|----------|---------|-------|-----------|
| 2,000 RPS  |          |          |          |         |       |           |
| 4,000 RPS  |          |          |          |         |       |           |
| 6,000 RPS  |          |          |          |         |       |           |
| 8,000 RPS  |          |          |          |         |       |           |
| 10,000 RPS |          |          |          |         |       |           |

Story 6: Spike Test (spike_test.jmx)

| Phase    | Duration | Users | Target     | Description        |
|----------|----------|-------|------------|--------------------|
| Baseline | 2m       | 200   | 1,000 RPS  | Normal operation   |
| Spike    | 30s      | 2,000 | 10,000 RPS | 10x traffic surge  |
| Sustain  | 2m       | 2,000 | 10,000 RPS | Peak hold          |
| Recovery | 2m       | 200   | 1,000 RPS  | Return to baseline |
| Verify   | 2m       | 200   | 1,000 RPS  | Stability check    |

Recovery Criteria:

  • P95 latency returns to baseline within 30 seconds
  • Error rate drops to <0.1% within 60 seconds
  • No connection pool exhaustion
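
The first criterion can be checked mechanically from post-spike JTL samples by computing a windowed P95. A sketch (Python; the window size and inputs are illustrative, samples are (timestamp_s, elapsed_ms) pairs):

```python
# Detect how long after a spike the windowed P95 returns to baseline.
from collections import defaultdict

def recovery_time(samples, spike_end_s, baseline_p95_ms, window_s=10):
    """Seconds after spike_end_s until a window's P95 is back at baseline,
    or None if it never recovers within the sampled data."""
    windows = defaultdict(list)
    for ts, elapsed in samples:
        if ts >= spike_end_s:
            windows[(ts - spike_end_s) // window_s].append(elapsed)
    for w in sorted(windows):
        vals = sorted(windows[w])
        p95 = vals[max(0, int(len(vals) * 0.95) - 1)]  # nearest-rank P95
        if p95 <= baseline_p95_ms:
            return (w + 1) * window_s
    return None
```

A result over 30 would fail the recovery criterion above.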

Story 7: Soak Test (soak_test.jmx)

| Parameter       | Value    | Purpose               |
|-----------------|----------|-----------------------|
| Duration        | 24 hours | Memory leak detection |
| Users           | 400      | Sustained load        |
| Target RPS      | 2,000    | Steady throughput     |
| Sample Interval | 1 hour   | Metric snapshots      |

Hourly Metrics Collection:

| Hour | Memory (MB) | Connections | P95 (ms) | Errors |
|------|-------------|-------------|----------|--------|
| 0    |             |             |          |        |
| 4    |             |             |          |        |
| 8    |             |             |          |        |
| 12   |             |             |          |        |
| 16   |             |             |          |        |
| 20   |             |             |          |        |
| 24   |             |             |          |        |

Memory Leak Detection Script:

#!/bin/bash
# Run during soak test to detect memory growth
for i in $(seq 1 24); do
  echo "Hour $i: $(docker stats --no-stream --format '{{.Name}}: {{.MemUsage}}' | grep gateway)"
  sleep 3600
done
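
The collected snapshots can then be checked against the <10% growth success criterion. A minimal sketch (Python; a real analysis would also examine the trend shape, not just the endpoints):

```python
# Flag memory growth from hourly soak-test snapshots (values in MB).
def memory_growth_pct(samples_mb):
    """Percentage growth from first to last snapshot."""
    first, last = samples_mb[0], samples_mb[-1]
    return (last - first) / first * 100

def leak_suspected(samples_mb, limit_pct=10.0):
    """True when growth exceeds the soak-test criterion (default 10%)."""
    return memory_growth_pct(samples_mb) > limit_pct
```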

Story 8: Protocol Isolation Tests

8a: SSE Streaming (sse_streaming_baseline.jmx)

| Parameter         | Value      |
|-------------------|------------|
| Connections       | 1,000      |
| Duration          | 10m        |
| Events/connection | Continuous |

Metrics:

  • Connection establishment time
  • Event delivery latency
  • Connection drop rate
  • Reconnection success rate

8b: WebSocket (websocket_baseline.jmx)

| Parameter       | Value             |
|-----------------|-------------------|
| Connections     | 500               |
| Duration        | 10m               |
| Messages/second | 20 per connection |

Metrics:

  • Handshake time
  • Message round-trip time
  • Frame loss rate
  • Connection lifetime

8c: Admin UI (admin_ui_baseline.jmx)

| Parameter  | Value |
|------------|-------|
| Users      | 50    |
| Duration   | 5m    |
| Think time | 3-5s  |

User Journey:

  1. Login → Dashboard
  2. Navigate Tools → View details
  3. Navigate Servers → View details
  4. Navigate Gateways → Refresh
  5. View Metrics page
  6. Logout

Monitoring & Observability During Tests

Pre-Test Checklist

# 1. Verify monitoring stack
make monitoring-status

# 2. Check Prometheus targets
curl -s "$PROMETHEUS_URL/api/v1/targets" | jq '.data.activeTargets | length'

# 3. Verify Grafana dashboards
curl -s "$GRAFANA_URL/api/search?type=dash-db" | jq '.[].title'

# 4. Verify Loki is receiving logs (query Loki directly on its default port 3100)
curl -s "http://localhost:3100/loki/api/v1/labels" | jq .

# 5. Start log collection (alternative to Loki)
docker compose logs -f gateway > logs/gateway_$(date +%Y%m%d_%H%M%S).log &

Real-Time Monitoring

| Metric            | Prometheus Query                                                         | Alert Threshold |
|-------------------|--------------------------------------------------------------------------|-----------------|
| HTTP Request Rate | rate(http_requests_total[1m])                                            | N/A (baseline)  |
| HTTP Error Rate   | rate(http_requests_total{status=~"5.."}[1m])                             | > 1%            |
| P95 Latency       | histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) | > 500ms         |
| DB Connections    | pg_stat_activity_count                                                   | > 80% pool      |
| CPU Usage         | rate(container_cpu_usage_seconds_total[1m])                              | > 80%           |
| Memory Usage      | container_memory_usage_bytes                                             | > 85%           |
| Redis Operations  | rate(redis_commands_total[1m])                                           | N/A             |
| PgBouncer Pool    | pgbouncer_pools_cl_active                                                | > 80%           |
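
These queries can also be pulled programmatically through the Prometheus HTTP API, e.g. to feed a CI gate. A sketch (Python stdlib only; it relies solely on the documented /api/v1/query endpoint and instant-vector response shape):

```python
# Run an instant PromQL query and extract the sample values.
import json
import urllib.parse
import urllib.request

def query_prometheus(base_url, promql):
    """POST-free instant query; returns the decoded JSON response."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def vector_values(response):
    """Flatten an instant-vector response into its float sample values."""
    return [float(r["value"][1]) for r in response.get("data", {}).get("result", [])]
```

For example, `vector_values(query_prometheus(prometheus_url, 'rate(http_requests_total[1m])'))` yields per-series request rates that can be compared against the thresholds above.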

Grafana Dashboards

  1. Gateway Performance - HTTP metrics, latency percentiles
  2. Database Health - PostgreSQL connections, query times (via postgres_exporter)
  3. Container Resources - CPU, memory, network I/O (via cAdvisor)
  4. MCP Protocol - JSON-RPC operations, tool invocations
  5. Redis Metrics - Cache hit rates, operations (via redis_exporter)
  6. Connection Pools - PgBouncer stats (via pgbouncer_exporter)

Exporters in Monitoring Stack

| Exporter           | Port | Metrics               |
|--------------------|------|-----------------------|
| postgres_exporter  | 9187 | PostgreSQL statistics |
| redis_exporter     | 9121 | Redis metrics         |
| pgbouncer_exporter | 9127 | Connection pool stats |
| nginx_exporter     | 9113 | NGINX request metrics |
| cAdvisor           | 8081 | Container resources   |

Post-Test Analysis

# Generate HTML report from JTL
jmeter -g results/test.jtl -o results/report/

# Export Prometheus metrics snapshot
curl -s "$PROMETHEUS_URL/api/v1/query?query=up" > metrics_snapshot.json

# Export Loki logs for test window (query Loki directly on its default port 3100)
curl -G "http://localhost:3100/loki/api/v1/query_range" \
  --data-urlencode 'query={job="gateway"}' \
  --data-urlencode "start=$(date -d '1 hour ago' +%s)000000000" \
  --data-urlencode "end=$(date +%s)000000000" > logs_export.json

# Compare with baseline
python scripts/compare_jmeter_results.py \
  baseline/results.jtl current/results.jtl \
  --threshold-p95=10% --threshold-error=0.5%
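
scripts/compare_jmeter_results.py is itself a planned deliverable; its core P95 check could look like this sketch (Python, nearest-rank percentile; illustrative only, not the script's actual implementation):

```python
# Compare baseline vs current latencies against a --threshold-p95 style limit.
def p95(elapsed_ms):
    """Nearest-rank P95 over a list of latencies in milliseconds."""
    vals = sorted(elapsed_ms)
    return vals[max(0, int(len(vals) * 0.95) - 1)]

def p95_regression_pct(baseline_ms, current_ms):
    """Percentage change of current P95 relative to the baseline P95."""
    base = p95(baseline_ms)
    return (p95(current_ms) - base) / base * 100

def within_threshold(baseline_ms, current_ms, threshold_pct=10.0):
    """True when the regression stays inside the allowed percentage."""
    return p95_regression_pct(baseline_ms, current_ms) <= threshold_pct
```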

Test Matrix

| Test Plan     | Protocol | Threads    | Duration | RPS Target   | SLA            |
|---------------|----------|------------|----------|--------------|----------------|
| REST Baseline | HTTP     | 100        | 10m      | 1,000        | P95 < 200ms    |
| MCP Baseline  | JSON-RPC | 200        | 15m      | 1,000        | P95 < 300ms    |
| MCP Servers   | JSON-RPC | 200        | 10m      | 2,000        | P95 < 50ms     |
| Load          | Mixed    | 400        | 30m      | 4,000        | P95 < 300ms    |
| Stress        | Mixed    | 2,000      | 30m      | 10,000       | Find limit     |
| Spike         | Mixed    | 200-2,000  | 10m      | 1K→10K→1K    | Recovery < 30s |
| Soak          | Mixed    | 400        | 24h      | 2,000        | No memory leak |
| SSE           | SSE      | 1,000 conn | 10m      | N/A          | Drop rate < 1% |
| WebSocket     | WS       | 500 conn   | 10m      | 10,000 msg/s | Loss < 0.1%    |
| Admin UI      | HTMX     | 50         | 5m       | N/A          | P95 < 500ms    |

Success Criteria

  • REST API baseline documented (P50, P95, P99 at 1,000 req/s)
  • MCP JSON-RPC baseline documented (P50, P95, P99 at 1,000 req/s)
  • MCP test servers baseline documented (fast_time, fast_test, benchmark)
  • Load test sustains 4,000 RPS with P95 < 300ms
  • Stress test identifies breaking point (target: >10,000 RPS)
  • Spike test demonstrates recovery within 30 seconds after 10,000 RPS spike
  • 24-hour soak test at 2,000 RPS shows memory growth < 10%
  • SSE connection stability > 99% over 10 minutes (1,000 connections)
  • WebSocket message delivery > 99.9% (10,000 msg/s)
  • All test plans executable in non-GUI mode for CI/CD
  • HTML reports generated automatically
  • Prometheus/Grafana/Loki integration validated

Deliverables

Test Plans (tests/jmeter/)

tests/jmeter/
├── rest_api_baseline.jmx
├── mcp_jsonrpc_baseline.jmx
├── mcp_test_servers_baseline.jmx
├── load_test.jmx
├── stress_test.jmx
├── spike_test.jmx
├── soak_test.jmx
├── sse_streaming_baseline.jmx
├── websocket_baseline.jmx
├── admin_ui_baseline.jmx
├── properties/
│   ├── production.properties
│   └── ci.properties
└── data/
    ├── timezones.csv
    ├── tool_names.csv
    └── test_messages.csv

Makefile Targets

# JMeter targets
jmeter-rest-baseline:         ## Run REST API baseline test (1,000 RPS)
jmeter-mcp-baseline:          ## Run MCP JSON-RPC baseline test (1,000 RPS)
jmeter-mcp-servers-baseline:  ## Run MCP test servers baseline
jmeter-load:                  ## Run load test (4,000 RPS)
jmeter-stress:                ## Run stress test (ramp to 10,000 RPS)
jmeter-spike:                 ## Run spike test (1K→10K→1K recovery)
jmeter-soak:                  ## Run 24-hour soak test (2,000 RPS)
jmeter-report:                ## Generate HTML report from last test
jmeter-compare:               ## Compare current vs baseline results

Related Files

  • tests/loadtest/locustfile.py - Existing Locust tests (reference)
  • tests/loadtest/locustfile_baseline.py - Component-level Locust tests
  • mcpgateway/routers/mcp.py - MCP endpoints under test
  • mcpgateway/routers/tools.py - REST API endpoints
  • docker-compose.yml - Standard stack with monitoring profile
  • docker-compose-performance.yml - Performance stack (7 replicas)
  • mcp-servers/fast-time-server/ - Go time server source
  • mcp-servers/fast-test-server/ - Rust test server source

Related Issues


Comparison: Locust vs JMeter

| Aspect           | Locust (Existing)     | JMeter (This Ticket)      |
|------------------|-----------------------|---------------------------|
| Language         | Python                | Java/XML                  |
| UI               | Web-based             | Desktop GUI + CLI         |
| Scripting        | Python code           | XML + GUI                 |
| Protocol Support | HTTP, custom          | HTTP, WS, TCP, JDBC       |
| Distributed      | Built-in              | Controller/worker mode    |
| CI/CD            | Native                | Non-GUI mode              |
| Reporting        | Custom + HTML         | JTL, CSV, HTML            |
| Enterprise       | Limited               | Extensive APM integration |
| Learning Curve   | Low (for Python devs) | Medium                    |

Recommendation: Use Locust for developer-driven testing and rapid iteration. Use JMeter for formal baseline establishment, CI/CD gates, and enterprise reporting requirements.

Metadata

Labels

  • SHOULD
  • P2: Important but not vital; high-value items that are not crucial for the immediate release
  • enhancement: New feature or request
  • manual-testing: Manual testing / test planning issues
  • performance: Performance related items
  • testing: Testing (unit, e2e, manual, automated, etc)
