[TESTING][PERFORMANCE]: Load Testing, Stress Testing, and Benchmarks #2473

@crivetimihai

Description


Goal

Produce a comprehensive manual test plan for validating gateway performance under expected and peak production loads, including sustained traffic, spike handling, and resource consumption analysis.

Why Now?

Performance testing is critical for production readiness:

  1. Capacity Planning: Need baseline metrics to size deployments
  2. SLA Validation: Must verify P50/P95/P99 latency targets
  3. Breaking Points: Identify limits before users discover them
  4. Resource Efficiency: Optimize CPU, memory, and connection usage
  5. Regression Prevention: Establish benchmarks for future releases

User Stories

US-1: DevOps - Capacity Planning

As a DevOps engineer
I want performance benchmarks for the gateway
So that I can properly size production deployments

Acceptance Criteria:

Feature: Performance Benchmarking

  Scenario: Establish baseline throughput
    Given a standard gateway deployment
    When I run sustained load at 1000 req/s
    Then response times should be stable
    And P95 latency should be under 200ms
    And no errors should occur

  Scenario: Find breaking point
    Given a standard gateway deployment
    When I incrementally increase load
    Then I identify the maximum sustainable throughput
    And I document resource consumption at each level

US-2: SRE - SLA Validation

As an SRE
I want to validate latency SLAs under load
So that I can guarantee service level objectives

Acceptance Criteria:

Feature: SLA Validation

  Scenario: Validate P99 latency under load
    Given production-equivalent load (500 req/s)
    When I measure response times for 1 hour
    Then P50 should be under 50ms
    And P95 should be under 150ms
    And P99 should be under 300ms
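The Gherkin targets above map directly onto k6 thresholds. A minimal sketch (the VU allocation is an assumption, not from the issue):

```javascript
// sla.js -- sketch only: encodes the US-2 SLA targets as k6 thresholds.
// preAllocatedVUs/maxVUs are assumed values; tune them for your hardware.
export const options = {
  scenarios: {
    sla: {
      executor: 'constant-arrival-rate',
      rate: 500,          // production-equivalent load from the scenario
      timeUnit: '1s',
      duration: '1h',
      preAllocatedVUs: 150,
      maxVUs: 300,
    },
  },
  thresholds: {
    // P50 < 50ms, P95 < 150ms, P99 < 300ms per the acceptance criteria
    http_req_duration: ['p(50)<50', 'p(95)<150', 'p(99)<300'],
  },
};
```

k6 aborts the run with a non-zero exit code when any threshold fails, which makes the SLA check scriptable in CI.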

Architecture

                     LOAD TESTING ARCHITECTURE
+------------------------------------------------------------------------+
|                                                                        |
|   Load Generator          Gateway Cluster          Backend Services    |
|   --------------          ---------------          ----------------    |
|                                                                        |
|   +-----------+          +---------------+          +-----------+      |
|   |   k6 /    |  ------> |   Gateway 1   | -------> | MCP Server|      |
|   |  Locust   |          +---------------+          +-----------+      |
|   +-----------+                  |                                     |
|        |                         v                                     |
|        |                 +---------------+          +-----------+      |
|        +---------------> |   Gateway 2   | -------> | MCP Server|      |
|        |                 +---------------+          +-----------+      |
|        |                         |                                     |
|        v                         v                                     |
|   +-----------+          +---------------+                             |
|   | Metrics   | <------- |  Prometheus   |                             |
|   | Collector |          +---------------+                             |
|   +-----------+                                                        |
|                                                                        |
+------------------------------------------------------------------------+

Test Types:
- Load Test: Sustained traffic at expected levels
- Stress Test: Beyond capacity to find limits
- Spike Test: Sudden traffic surge
- Soak Test: Extended duration to surface memory leaks

Test Environment Setup

# Environment variables
export GATEWAY_URL="http://localhost:8000"
export TARGET_RPS=1000
export DURATION="1h"

# Install k6 load testing tool
brew install k6  # macOS
# or: snap install k6  # Linux

# Start gateway with production-like config
export DATABASE_URL="postgresql://user:pass@localhost/gateway"
export REDIS_URL="redis://localhost:6379"
export WORKERS=4
make serve

# Start test MCP server for consistent responses
python -m mcpgateway.translate --stdio "uvx mcp-server-time" --port 9000
curl -s -X POST "$GATEWAY_URL/gateways" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "perf-test-server", "url": "http://localhost:9000"}'

Manual Test Cases

| Case    | Scenario              | Load Profile              | Duration | Expected Result            |
|---------|-----------------------|---------------------------|----------|----------------------------|
| PERF-01 | Baseline throughput   | 100 req/s constant        | 10 min   | Establish baseline metrics |
| PERF-02 | Sustained load        | 1000 req/s constant       | 1 hour   | Stable latency, 0% errors  |
| PERF-03 | Stress test           | Ramp 100->5000 req/s      | 30 min   | Find breaking point        |
| PERF-04 | Spike test            | 100->2000->100 req/s      | 15 min   | Recover within 30s         |
| PERF-05 | Soak test             | 500 req/s constant        | 24 hours | No memory leak             |
| PERF-06 | Connection saturation | 10000 concurrent          | 5 min    | Graceful rejection         |
| PERF-07 | Tool invocation load  | 100 concurrent tools/call | 10 min   | P95 < 500ms                |

PERF-01: Baseline Throughput Test

k6 Script (baseline.js):

import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  scenarios: {
    baseline: {
      executor: 'constant-arrival-rate',
      rate: 100,
      timeUnit: '1s',
      duration: '10m',
      preAllocatedVUs: 50,
      maxVUs: 100,
    },
  },
  thresholds: {
    http_req_duration: ['p(95)<200'],
    http_req_failed: ['rate<0.01'],
  },
};

export default function () {
  const res = http.post(
    `${__ENV.GATEWAY_URL}/mcp/http`,
    JSON.stringify({
      jsonrpc: '2.0',
      id: 1,
      method: 'tools/list',
      params: {},
    }),
    {
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${__ENV.TOKEN}`,
      },
    }
  );

  check(res, {
    'status is 200': (r) => r.status === 200,
    'has result': (r) => JSON.parse(r.body).result !== undefined,
  });
}

Run Command:

k6 run --env GATEWAY_URL=$GATEWAY_URL --env TOKEN=$TOKEN baseline.js

Expected Results:

  • All requests complete successfully
  • P50 latency recorded
  • P95 latency recorded
  • P99 latency recorded
  • Throughput confirmed at 100 req/s
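
When raw latency samples are exported (for example via k6's `--out json`), the recorded percentiles can be cross-checked offline. A minimal nearest-rank sketch (the helper name and sample values are ours, not from the issue):

```javascript
// percentile.js -- hypothetical helper to cross-check k6's reported
// percentiles from raw latency samples (ms), using the nearest-rank method.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based nearest rank
  return sorted[Math.max(rank - 1, 0)];
}

// Example: ten latency samples in milliseconds
const latencies = [12, 15, 18, 22, 30, 45, 60, 90, 120, 180];
console.log(percentile(latencies, 50)); // 30
console.log(percentile(latencies, 95)); // 180
```
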

PERF-02: Sustained Load Test

k6 Script (sustained.js):

export const options = {
  scenarios: {
    sustained: {
      executor: 'constant-arrival-rate',
      rate: 1000,
      timeUnit: '1s',
      duration: '1h',
      preAllocatedVUs: 200,
      maxVUs: 500,
    },
  },
  thresholds: {
    http_req_duration: ['p(95)<200', 'p(99)<500'],
    http_req_failed: ['rate<0.001'],
  },
};

Validation:

# Monitor during test
watch -n 5 'curl -s "$GATEWAY_URL/health" | jq .'

# Check Prometheus metrics
curl -s "$GATEWAY_URL/metrics" | grep http_request_duration

Expected Results:

  • Error rate < 0.1%
  • P95 latency stable throughout
  • No degradation over time
  • Memory usage stable

PERF-03: Stress Test (Find Breaking Point)

k6 Script (stress.js):

export const options = {
  stages: [
    { duration: '2m', target: 100 },   // Warm up
    { duration: '5m', target: 500 },   // Ramp to 500
    { duration: '5m', target: 1000 },  // Ramp to 1000
    { duration: '5m', target: 2000 },  // Ramp to 2000
    { duration: '5m', target: 3000 },  // Ramp to 3000
    { duration: '5m', target: 5000 },  // Stress point
    { duration: '3m', target: 0 },     // Recovery
  ],
};

Metrics to Record:

| RPS Level | P95 Latency | Error Rate | CPU % | Memory MB |
|-----------|-------------|------------|-------|-----------|
| 500       |             |            |       |           |
| 1000      |             |            |       |           |
| 2000      |             |            |       |           |
| 3000      |             |            |       |           |
| 5000      |             |            |       |           |
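
PERF-04 has no script in the issue; a spike profile in the same staged style as stress.js could look like the sketch below (stage timings are assumptions chosen to fit the 15-minute window and the 100->2000->100 req/s profile):

```javascript
// spike.js -- PERF-04 sketch only; stage durations are assumed, not specified.
export const options = {
  stages: [
    { duration: '4m', target: 100 },   // steady baseline
    { duration: '30s', target: 2000 }, // sudden surge
    { duration: '4m', target: 2000 },  // hold the spike
    { duration: '30s', target: 100 },  // drop back
    { duration: '6m', target: 100 },   // observe recovery (expected < 30s)
  ],
};
```
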

PERF-05: Soak Test (24-Hour)

k6 Script (soak.js):

export const options = {
  scenarios: {
    soak: {
      executor: 'constant-arrival-rate',
      rate: 500,
      timeUnit: '1s',
      duration: '24h',
      preAllocatedVUs: 100,
      maxVUs: 200,
    },
  },
};

Memory Leak Detection:

# Record memory every hour (head -n1 guards against pgrep matching multiple PIDs)
for i in {1..24}; do
  echo "Hour $i: $(ps -o rss= -p "$(pgrep -f mcpgateway | head -n1)" | awk '{print $1/1024 " MB"}')"
  sleep 3600
done
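
To turn the hourly readings into a pass/fail signal for the "memory growth < 10%" criterion, a small offline check can be applied to the recorded samples (the helper name and threshold handling are ours):

```javascript
// leak-check.js -- hypothetical helper: given hourly RSS samples in MB,
// compute overall growth and flag a suspected leak above a threshold.
function memoryGrowth(samplesMb, maxGrowthPct = 10) {
  if (samplesMb.length < 2) throw new Error('need at least two samples');
  const first = samplesMb[0];
  const last = samplesMb[samplesMb.length - 1];
  const growthPct = ((last - first) / first) * 100;
  return { growthPct, leakSuspected: growthPct > maxGrowthPct };
}

// Example: samples drifting from 512 MB to 540 MB (~5.5% growth, no leak flagged)
console.log(memoryGrowth([512, 515, 520, 528, 534, 540]));
```

A steady upward trend that crosses the threshold before hour 24 is the usual signature of a leak rather than normal cache warm-up.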

Expected Results:

  • Memory growth < 10% over 24 hours
  • No connection leaks
  • Latency remains stable

Test Matrix

| Scenario   | Protocol | Concurrency      | Duration | Pass Criteria          |
|------------|----------|------------------|----------|------------------------|
| Baseline   | HTTP     | 50 VUs           | 10m      | Establish metrics      |
| Sustained  | HTTP     | 200 VUs          | 1h       | P95 < 200ms, 0% errors |
| Stress     | HTTP     | 500 VUs          | 30m      | Find limit             |
| Spike      | HTTP     | 100-500 VUs      | 15m      | Recover < 30s          |
| Soak       | HTTP     | 100 VUs          | 24h      | No memory leak         |
| SSE        | SSE      | 1000 connections | 1h       | Stable streams         |
| WebSocket  | WS       | 500 connections  | 1h       | No drops               |
| Tool calls | HTTP     | 100 concurrent   | 10m      | P95 < 500ms            |

Success Criteria

  • Baseline performance documented (P50, P95, P99 at 100 req/s)
  • Sustained load (1000 req/s, 1 hour) passes with < 0.1% errors
  • Breaking point identified and documented
  • Spike recovery occurs within 30 seconds
  • 24-hour soak test shows no memory leaks
  • Resource sizing guidelines created (CPU/memory per RPS)
  • All latency SLAs validated

Related Files

  • mcpgateway/main.py - Application entry point
  • mcpgateway/middleware/ - Request processing pipeline
  • mcpgateway/routers/mcp.py - MCP endpoints under test
  • deployment/k6/ - Load test scripts (if present)

Related Issues

Labels

  • SHOULD - P2: Important but not vital; high-value items that are not crucial for the immediate release
  • chore - Linting, formatting, dependency hygiene, or project maintenance chores
  • manual-testing - Manual testing / test planning issues
  • performance - Performance related items
  • ready - Validated, ready-to-work-on items
  • testing - Testing (unit, e2e, manual, automated, etc)
