[EPIC][TESTING]: Slow Time Server - configurable-latency MCP server for timeout, resilience, and load testing #2783
⏱ Epic: Slow Time Server - Configurable-Latency MCP Server for Timeout, Resilience, and Load Testing
Goal
Create a Go MCP server (mcp-servers/go/slow-time-server) modelled on the existing fast-time-server that introduces configurable artificial latency on every tool call, resource fetch, and prompt render. This server serves as a first-class testing target for validating gateway timeout enforcement, circuit breaker behaviour, session pool resilience, and load testing under realistic slow-tool conditions.
Why Now?
Issue #2781 exposed that users with long-running tools hit the gateway's 60-second default timeout and receive a confusing empty error (`Tool invocation failed: `). While the timeout behaviour is correct, we lack a reproducible, self-contained test target for:
- Timeout Verification: Validate that `TOOL_TIMEOUT`, per-tool `timeout_ms`, `MCP_SESSION_POOL_TRANSPORT_TIMEOUT`, and `HTTPX_READ_TIMEOUT` all behave correctly and interact as documented
- Circuit Breaker Testing: Exercise the `CircuitBreakerPlugin` with deterministic slow/failing tools — half-open recovery, retry headers, threshold tuning
- Session Pool Resilience: Test `MCP_SESSION_POOL_*` settings under sustained slow-tool load — pool exhaustion, acquire timeouts, stale session eviction
- Load Testing: Extend the existing Locust test suite (`tests/loadtest/`) with slow-tool scenarios that model real-world MCP servers with non-trivial latency
- Error Message Propagation: Provide a test harness for #2782 (timeout error message lost through ExceptionGroup wrapping)
- Regression Prevention: PR #2569 introduced timeout enforcement — a dedicated slow server catches regressions in timeout handling across all transports (SSE, StreamableHTTP, stdio)
The fast-time-server is excellent for throughput benchmarks but always returns instantly. Real MCP servers (compliance screening, data pipelines, LLM inference) routinely take 30-300+ seconds. We need a counterpart that simulates this.
📖 User Stories
US-1: Gateway Developer - Validate Timeout Enforcement
As a Gateway Developer
I want an MCP server with configurable response latency
So that I can validate that timeout enforcement works correctly across all transports
Acceptance Criteria:
Given the slow-time-server is running with default latency 5s
And the gateway has TOOL_TIMEOUT=3
When I invoke the get_slow_time tool via StreamableHTTP
Then the gateway should return ToolTimeoutError after ~3s
And the error message should include the timeout value
And the structured log should contain event=tool_timeout with timeout_seconds=3

Technical Requirements:
- Configurable default latency via `-latency` flag and `DEFAULT_LATENCY` env var
- Per-tool latency override via tool arguments (`delay_seconds` parameter)
- Support all transports: stdio, SSE, StreamableHTTP, dual, REST
- Latency applied via `time.Sleep()` before returning result
US-2: QA Engineer - Test Circuit Breaker with Slow Tools
As a QA Engineer
I want tools that intermittently fail or timeout
So that I can test circuit breaker open/half-open/closed state transitions
Acceptance Criteria:
Given the slow-time-server has failure_rate=0.5 and latency=10s
And the CircuitBreakerPlugin has threshold=3 and reset=30s
When I invoke the tool 5 times rapidly
Then ~2-3 invocations should timeout
And the circuit breaker should open after 3 failures
And subsequent calls should fail fast with "circuit open"
And after 30s the circuit should enter half-open state

Technical Requirements:
- Configurable failure rate (`-failure-rate` flag, 0.0-1.0)
- Failure modes: timeout (sleep beyond gateway timeout), error (return error), panic (simulate crash)
- Deterministic mode (`-seed` flag) for reproducible test runs
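A sketch of how seeded, reproducible failure injection could look. `flakeDecider` is a hypothetical name; the point is that a dedicated seeded `rand.Rand` (rather than the global source) makes the fail/succeed sequence identical across runs:

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

// flakeDecider returns a closure that decides, per invocation, whether the
// flaky tool should fail. A fixed seed yields a reproducible sequence.
func flakeDecider(failureRate float64, seed int64) func() error {
	rng := rand.New(rand.NewSource(seed))
	return func() error {
		if rng.Float64() < failureRate {
			return errors.New("simulated failure")
		}
		return nil
	}
}

func main() {
	decide := flakeDecider(0.5, 42)
	failures := 0
	for i := 0; i < 1000; i++ {
		if decide() != nil {
			failures++
		}
	}
	fmt.Printf("failures: %d/1000\n", failures) // roughly half with rate 0.5
}
```

With the same seed, two runs produce the same failure positions, so CI assertions on circuit-breaker transitions stay stable.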
US-3: Performance Engineer - Load Test with Realistic Latency
As a Performance Engineer
I want to run Locust load tests against slow tools
So that I can measure gateway behaviour under sustained slow-tool load
Acceptance Criteria:
Given the slow-time-server is running in dual mode
And a Locust test is configured with 50 concurrent users
When each user invokes get_slow_time with delay_seconds=2
Then the gateway should handle concurrent requests within pool limits
And session pool metrics should show pool utilisation
And no requests should fail due to pool exhaustion (within configured limits)

Technical Requirements:
- Locust test file: `tests/loadtest/locustfile_slow_time_server.py`
- Test scenarios: gradual ramp-up, sustained load, spike, timeout storms
- Metrics: p50/p95/p99 latency, error rate, pool utilisation
US-4: Developer - Variable Latency Distribution
As a Developer
I want tools with variable latency distributions (uniform, normal, exponential)
So that I can simulate realistic tool response time patterns
Acceptance Criteria:
Given the slow-time-server is configured with:
-latency-distribution=normal -latency-mean=5 -latency-stddev=2
When I invoke the tool 100 times
Then response times should follow a normal distribution (mean ~5s, stddev ~2s)
And some calls should naturally exceed the gateway timeout
And the distribution should be observable in metrics

Technical Requirements:
- Distributions: `fixed` (default), `uniform` (min/max), `normal` (mean/stddev), `exponential` (lambda)
- Per-invocation override via `delay_seconds` tool argument (always fixed)
- `/metrics` endpoint exposing latency histogram
US-5: Integration Tester - Multi-Tool Latency Profiles
As an Integration Tester
I want multiple tools with different latency profiles on the same server
So that I can test per-tool timeout_ms overrides and mixed-latency scenarios
Acceptance Criteria:
Given the slow-time-server exposes:
- get_slow_time (configurable delay)
- convert_slow_time (configurable delay)
- get_instant_time (always instant, 0ms)
- get_timeout_time (always exceeds 5min, for timeout testing)
When I register tools with different timeout_ms values:
- get_slow_time: timeout_ms=10000
- get_timeout_time: timeout_ms=5000
Then each tool should respect its per-tool timeout independently

US-6: DevOps Engineer - Docker Compose Integration
As a DevOps Engineer
I want the slow-time-server in the performance testing Docker Compose stack
So that I can run timeout and resilience tests in CI/CD
Acceptance Criteria:
Given docker-compose-performance.yml includes the slow-time-server
When I run: docker compose -f docker-compose-performance.yml up
Then the slow-time-server should be accessible on port 8081
And the gateway should have it pre-registered as a gateway
And Locust tests can target it alongside the fast-time-server

🏗 Architecture
Tool Inventory
| Tool | Description | Latency Behaviour | Purpose |
|---|---|---|---|
| `get_slow_time` | Get current time with configurable delay | Respects `delay_seconds` arg, falls back to server default | General timeout testing |
| `convert_slow_time` | Convert time between timezones with delay | Same as above | Per-tool `timeout_ms` testing |
| `get_instant_time` | Get current time with zero delay | Always instant (0ms) | Baseline / control tool |
| `get_timeout_time` | Get current time with extreme delay | Always sleeps 10 minutes | Guaranteed timeout testing |
| `get_flaky_time` | Get current time with random failures | Fails based on `-failure-rate` | Circuit breaker testing |
Server Flags
```text
Usage: slow-time-server [flags]

Flags:
  -transport string              Transport: stdio, http, sse, dual, rest (default "stdio")
  -addr string                   Bind address (default "0.0.0.0")
  -port int                      Port (default 8081)
  -auth-token string             Bearer auth token
  -log-level string              Log level: debug, info, warn, error (default "info")

Latency Configuration:
  -latency duration              Default tool latency (default 5s)
  -latency-distribution string   Distribution: fixed, uniform, normal, exponential (default "fixed")
  -latency-min duration          Min latency for uniform distribution (default 1s)
  -latency-max duration          Max latency for uniform distribution (default 10s)
  -latency-mean duration         Mean for normal distribution (default 5s)
  -latency-stddev duration       Stddev for normal distribution (default 2s)

Failure Simulation:
  -failure-rate float            Probability of failure for flaky tool (0.0-1.0, default 0.0)
  -failure-mode string           Failure type: timeout, error, panic (default "timeout")
  -seed int                      Random seed for reproducibility (default: time-based)

Environment Variables:
  DEFAULT_LATENCY                Override -latency (e.g., "5s", "30s", "2m")
  FAILURE_RATE                   Override -failure-rate
  AUTH_TOKEN                     Override -auth-token
```
Tool Input Schema
```json
{
  "name": "get_slow_time",
  "description": "Get current system time with configurable artificial delay. Use delay_seconds to control response latency for timeout testing.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "timezone": {
        "type": "string",
        "description": "IANA timezone name (default: UTC)",
        "default": "UTC"
      },
      "delay_seconds": {
        "type": "number",
        "description": "Override delay in seconds. If omitted, uses server default latency.",
        "minimum": 0,
        "maximum": 600
      }
    }
  },
  "annotations": {
    "readOnlyHint": true,
    "destructiveHint": false,
    "openWorldHint": false
  }
}
```

Data Flow
```mermaid
sequenceDiagram
    participant Client as MCP Client / Locust
    participant Gateway as ContextForge Gateway
    participant Plugin as CircuitBreakerPlugin
    participant Server as slow-time-server
    Client->>Gateway: tools/call: get_slow_time {delay_seconds: 30}
    Gateway->>Plugin: tool_pre_invoke
    Plugin-->>Gateway: continue (circuit closed)
    Gateway->>Server: session.call_tool("get_slow_time", {delay_seconds: 30})
    Note over Server: time.Sleep(30s)
    alt Gateway timeout fires first (TOOL_TIMEOUT=10)
        Gateway-->>Gateway: asyncio.wait_for → TimeoutError after 10s
        Gateway->>Plugin: tool_post_invoke (timeout failure)
        Plugin-->>Plugin: Increment failure count
        Gateway-->>Client: ToolTimeoutError: "timed out after 10s"
    else Server responds within timeout
        Server-->>Gateway: {time: "2026-02-09T...", delayed_by: "30s"}
        Gateway->>Plugin: tool_post_invoke (success)
        Gateway-->>Client: ToolResult {time, delayed_by}
    end
```
📋 Implementation Tasks
Phase 1: Core Server Implementation
- Scaffold project structure
  - Create `mcp-servers/go/slow-time-server/` directory
  - `go.mod` with `github.com/mark3labs/mcp-go` dependency (same version as fast-time-server)
  - `main.go` — server implementation
  - `Makefile` — mirroring fast-time-server targets
  - `Dockerfile` — multi-stage build (scratch base)
  - `.gitignore`, `.golangci.yml`, `staticcheck.conf`
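For the multi-stage scratch build, something along these lines would fit; base image version, build flags, and paths are assumptions to adapt, not the final Dockerfile:

```dockerfile
# Builder stage: static binary so it can run on scratch. (Illustrative
# sketch — Go version and ldflags are assumptions.)
FROM golang:1.22-alpine AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -ldflags="-s -w" -o /slow-time-server .

# Final stage: nothing but the binary.
FROM scratch
COPY --from=build /slow-time-server /slow-time-server
EXPOSE 8081
ENTRYPOINT ["/slow-time-server"]
```

`CGO_ENABLED=0` plus `-s -w` is what keeps the final image in the low-MiB range the success criteria target.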
- Implement latency engine
  - Fixed latency: `time.Sleep(duration)`
  - Uniform distribution: `rand.Float64() * (max - min) + min`
  - Normal distribution: `rand.NormFloat64() * stddev + mean` (clamped at 0)
  - Exponential distribution: `rand.ExpFloat64() / lambda`
  - Per-call `delay_seconds` override (always fixed, bypasses distribution)
  - Context-aware sleep: respect `context.Done()` for clean cancellation
- Implement MCP tools (5 tools)
  - `get_slow_time` — configurable delay, timezone support
  - `convert_slow_time` — configurable delay, timezone conversion
  - `get_instant_time` — zero delay (control/baseline)
  - `get_timeout_time` — 10-minute fixed delay (guaranteed timeout)
  - `get_flaky_time` — probabilistic failure based on failure_rate
- Implement MCP resources (2 resources)
  - `latency://config` — current server latency configuration (JSON)
  - `latency://stats` — invocation count, avg/p50/p95/p99 latency, failure count
- Implement MCP prompts (1 prompt)
  - `test_timeout` — generates a prompt instructing the LLM to invoke slow tools with specific delays to test timeout behaviour
- Multi-transport support
  - stdio, SSE, HTTP (StreamableHTTP), dual, REST — same as fast-time-server
  - Auth middleware reuse
  - CORS and logging middleware
Phase 2: Failure Simulation
- Failure modes for `get_flaky_time`
  - `timeout` — sleep for 10x the configured latency (exceed any reasonable timeout)
  - `error` — return `mcp.NewToolResultError("simulated failure")`
  - `panic` — `panic("simulated crash")` with recovery middleware
  - Deterministic seeding (`-seed` flag) for reproducible test runs
- Response metadata
  - Include `delayed_by`, `server_default_latency`, `distribution`, `failure_simulated` in tool response
  - This allows test assertions to verify the server behaved as configured
Phase 3: REST API & Observability
- REST endpoints (in `dual` and `rest` modes)
  - `GET /api/v1/time?timezone=X&delay=Y` — slow time with delay
  - `GET /api/v1/config` — current latency config
  - `GET /api/v1/stats` — invocation statistics
  - `POST /api/v1/config` — runtime latency reconfiguration (hot-reload)
  - `GET /health` — health check (always instant, no delay)
  - `GET /version` — version info
- Prometheus metrics endpoint
  - `GET /metrics` — tool_invocations_total, tool_latency_seconds (histogram), tool_failures_total
  - Labels: tool_name, failure_mode, transport
Phase 4: Testing & Quality
- Unit tests (`main_test.go`)
  - Test each tool handler with various delay values
  - Test latency distributions produce expected ranges
  - Test failure rate simulation (deterministic seed)
  - Test context cancellation during sleep
  - Test REST endpoints
  - Test auth middleware
- Locust load test (`tests/loadtest/locustfile_slow_time_server.py`)
  - Scenario: gradual ramp-up (1→50 users, 2s delay per tool call)
  - Scenario: timeout storm (all users hit 120s delay with TOOL_TIMEOUT=60)
  - Scenario: mixed latency (instant + slow + timeout tools)
  - Scenario: circuit breaker exercise (50% failure rate)
  - Assertions on p95 latency, error rate, timeout rate
- Integration test with gateway
  - Register slow-time-server as a gateway
  - Test `TOOL_TIMEOUT` enforcement via StreamableHTTP
  - Test per-tool `timeout_ms` override
  - Test `MCP_SESSION_POOL_TRANSPORT_TIMEOUT` interaction
  - Test circuit breaker plugin activation on repeated timeouts
Phase 5: Docker & CI Integration
- Dockerfile — multi-stage, scratch base, ~2 MiB image
- Add to `docker-compose-performance.yml`
  - Service: `slow-time-server` on port 8081
  - Default latency: 5s
  - Environment: `DEFAULT_LATENCY=5s`, `FAILURE_RATE=0.1`
- CI workflow (optional)
  - Build and test in GitHub Actions
  - Publish container image alongside fast-time-server
Phase 6: Documentation
- README.md — usage, flags, examples, Docker, tool reference
- Update `mcp-servers/AGENTS.md` — add slow-time-server entry
- Update `docs/docs/manage/tuning.md` — reference slow-time-server for timeout tuning validation
⚙️ Configuration Examples
Basic: 5-second latency for timeout testing

```bash
./slow-time-server -transport=dual -port=8081 -latency=5s
```

Circuit breaker testing: 30% failure rate

```bash
./slow-time-server -transport=dual -port=8081 \
  -latency=2s -failure-rate=0.3 -failure-mode=error -seed=42
```

Realistic distribution: normal with occasional outliers

```bash
./slow-time-server -transport=dual -port=8081 \
  -latency-distribution=normal -latency-mean=5s -latency-stddev=3s
```

Docker Compose

```yaml
# docker-compose-performance.yml
services:
  slow-time-server:
    build:
      context: ./mcp-servers/go/slow-time-server
    ports:
      - "8081:8081"
    environment:
      DEFAULT_LATENCY: "5s"
      FAILURE_RATE: "0.1"
    command: ["-transport=dual", "-port=8081", "-addr=0.0.0.0"]
```

Gateway registration

```bash
# Register the slow-time-server as a gateway
curl -X POST http://localhost:4444/api/v1/gateways \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "slow-time-server",
    "url": "http://slow-time-server:8081",
    "transport": "streamablehttp"
  }'

# Override timeout for the guaranteed-timeout tool
curl -X PATCH http://localhost:4444/api/v1/tools/<get_timeout_time_id> \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"timeout_ms": 5000}'
```

✅ Success Criteria
- Functionality: All 5 tools work across all transports (stdio, SSE, StreamableHTTP, dual, REST)
- Latency: Configurable via flags, env vars, and per-call `delay_seconds` argument
- Distributions: Fixed, uniform, normal, and exponential latency distributions produce correct ranges
- Failure Simulation: `get_flaky_time` fails at configured rate with deterministic seeding
- Gateway Integration: Successfully registered and invoked through ContextForge gateway
- Timeout Testing: Gateway `TOOL_TIMEOUT`, per-tool `timeout_ms`, and pool transport timeout all validated
- Circuit Breaker: CircuitBreakerPlugin correctly opens/closes with configurable failure rates
- Load Testing: Locust scenarios run successfully with expected latency profiles
- Docker: Builds as ~2 MiB scratch image, runs in docker-compose-performance.yml
- Testing: Unit tests pass with race detection (`go test -race`), >80% coverage
- Documentation: README with full flag reference, examples, and Docker instructions
- Quality: Passes `make lint`, `make vet`, `make staticcheck`
🏁 Definition of Done
- Go module scaffolded in `mcp-servers/go/slow-time-server/`
- 5 MCP tools implemented with configurable latency
- 2 MCP resources (config, stats) implemented
- 4 latency distributions (fixed, uniform, normal, exponential) working
- Failure simulation with configurable rate and mode
- All 5 transports supported (stdio, SSE, HTTP, dual, REST)
- Unit tests with race detection and >80% coverage
- Locust load test file created in `tests/loadtest/`
- Dockerfile producing scratch-based ~2 MiB image
- Added to `docker-compose-performance.yml`
- README.md with complete documentation
- Passes `make lint vet staticcheck test`
📝 Additional Notes
🔹 Relationship to fast-time-server: The slow-time-server shares the same MCP SDK, transport infrastructure, auth middleware, and code patterns as fast-time-server. It adds latency injection and failure simulation on top. Consider extracting shared code into a common package if duplication becomes significant.
🔹 Context-aware sleep: Use `select { case <-ctx.Done(): return ctx.Err(); case <-time.After(delay): }` rather than bare `time.Sleep()` so that tool cancellations propagate cleanly and don't leave goroutines hanging.
🔹 Runtime reconfiguration: The `POST /api/v1/config` endpoint allows changing latency parameters without restarting the server. This is useful for scripted test scenarios that need different latency profiles in sequence.
🔹 Deterministic mode: The `-seed` flag ensures `get_flaky_time` and distribution-based latencies produce the same sequence across runs, enabling reproducible CI test assertions.
🔹 Port convention: fast-time-server uses 8080, slow-time-server uses 8081, avoiding conflicts in compose stacks.
🔗 Related Issues
- #2781 — [QUESTION][CONFIGURATION]: MCP toolkit tool invocation returns an error `Tool invocation failed` (motivating issue)
- #2782 — [ENHANCEMENT][OBSERVABILITY]: Preserve timeout error message through ExceptionGroup unwrapping in tool invocations
- #2569 — Enforce per-tool timeouts and enhanced circuit breaker plugin
- #2347 — [EPIC][TESTING]: Automated MCP server compatibility regression suite (top 100+ server testing)
- #2636 — [EPIC][TESTING]: Achieve 100% Locust Load Test Coverage for REST APIs
- #2473 — [TESTING][PERFORMANCE]: Load Testing, Stress Testing, and Benchmarks