
[EPIC][TESTING]: Slow Time Server - configurable-latency MCP server for timeout, resilience, and load testing #2783

@crivetimihai

Description


⏱ Epic: Slow Time Server - Configurable-Latency MCP Server for Timeout, Resilience, and Load Testing

Goal

Create a Go MCP server (mcp-servers/go/slow-time-server) modelled on the existing fast-time-server that introduces configurable artificial latency on every tool call, resource fetch, and prompt render. This server serves as a first-class testing target for validating gateway timeout enforcement, circuit breaker behaviour, session pool resilience, and load testing under realistic slow-tool conditions.

Why Now?

Issue #2781 exposed that users with long-running tools hit the gateway's 60-second default timeout and receive a confusing empty error (Tool invocation failed: ). While the timeout behaviour is correct, we lack a reproducible, self-contained test target for:

  1. Timeout Verification: Validate that TOOL_TIMEOUT, per-tool timeout_ms, MCP_SESSION_POOL_TRANSPORT_TIMEOUT, and HTTPX_READ_TIMEOUT all behave correctly and interact as documented
  2. Circuit Breaker Testing: Exercise the CircuitBreakerPlugin with deterministic slow/failing tools — half-open recovery, retry headers, threshold tuning
  3. Session Pool Resilience: Test MCP_SESSION_POOL_* settings under sustained slow-tool load — pool exhaustion, acquire timeouts, stale session eviction
  4. Load Testing: Extend the existing Locust test suite (tests/loadtest/) with slow-tool scenarios that model real-world MCP servers with non-trivial latency
  5. Error Message Propagation: Provide a test harness for [ENHANCEMENT][OBSERVABILITY]: Preserve timeout error message through ExceptionGroup unwrapping in tool invocations #2782 (timeout error message lost through ExceptionGroup wrapping)
  6. Regression Prevention: PR Enforce per-tool timeouts and enhanced circuit breaker plugin #2569 introduced timeout enforcement — a dedicated slow server catches regressions in timeout handling across all transports (SSE, StreamableHTTP, stdio)

The fast-time-server is excellent for throughput benchmarks but always returns instantly. Real MCP servers (compliance screening, data pipelines, LLM inference) routinely take 30-300+ seconds. We need a counterpart that simulates this.


📖 User Stories

US-1: Gateway Developer - Validate Timeout Enforcement

As a Gateway Developer
I want an MCP server with configurable response latency
So that I can validate that timeout enforcement works correctly across all transports

Acceptance Criteria:

Given the slow-time-server is running with default latency 5s
And the gateway has TOOL_TIMEOUT=3
When I invoke the get_slow_time tool via StreamableHTTP
Then the gateway should return ToolTimeoutError after ~3s
And the error message should include the timeout value
And the structured log should contain event=tool_timeout with timeout_seconds=3

Technical Requirements:

  • Configurable default latency via -latency flag and DEFAULT_LATENCY env var
  • Per-tool latency override via tool arguments (delay_seconds parameter)
  • Support all transports: stdio, SSE, StreamableHTTP, dual, REST
  • Latency applied via time.Sleep() before returning result
US-2: QA Engineer - Test Circuit Breaker with Slow Tools

As a QA Engineer
I want tools that intermittently fail or timeout
So that I can test circuit breaker open/half-open/closed state transitions

Acceptance Criteria:

Given the slow-time-server has failure_rate=0.5 and latency=10s
And the CircuitBreakerPlugin has threshold=3 and reset=30s
When I invoke the tool 5 times rapidly
Then ~2-3 invocations should timeout
And the circuit breaker should open after 3 failures
And subsequent calls should fail fast with "circuit open"
And after 30s the circuit should enter half-open state

Technical Requirements:

  • Configurable failure rate (-failure-rate flag, 0.0-1.0)
  • Failure modes: timeout (sleep beyond gateway timeout), error (return error), panic (simulate crash)
  • Deterministic mode (-seed flag) for reproducible test runs
US-3: Performance Engineer - Load Test with Realistic Latency

As a Performance Engineer
I want to run Locust load tests against slow tools
So that I can measure gateway behaviour under sustained slow-tool load

Acceptance Criteria:

Given the slow-time-server is running in dual mode
And a Locust test is configured with 50 concurrent users
When each user invokes get_slow_time with delay_seconds=2
Then the gateway should handle concurrent requests within pool limits
And session pool metrics should show pool utilisation
And no requests should fail due to pool exhaustion (within configured limits)

Technical Requirements:

  • Locust test file: tests/loadtest/locustfile_slow_time_server.py
  • Test scenarios: gradual ramp-up, sustained load, spike, timeout storms
  • Metrics: p50/p95/p99 latency, error rate, pool utilisation
US-4: Developer - Variable Latency Distribution

As a Developer
I want tools with variable latency distributions (uniform, normal, exponential)
So that I can simulate realistic tool response time patterns

Acceptance Criteria:

Given the slow-time-server is configured with:
  -latency-distribution=normal -latency-mean=5 -latency-stddev=2
When I invoke the tool 100 times
Then response times should follow a normal distribution (mean ~5s, stddev ~2s)
And some calls should naturally exceed the gateway timeout
And the distribution should be observable in metrics

Technical Requirements:

  • Distributions: fixed (default), uniform (min/max), normal (mean/stddev), exponential (lambda)
  • Per-invocation override via delay_seconds tool argument (always fixed)
  • /metrics endpoint exposing latency histogram
US-5: Integration Tester - Multi-Tool Latency Profiles

As an Integration Tester
I want multiple tools with different latency profiles on the same server
So that I can test per-tool timeout_ms overrides and mixed-latency scenarios

Acceptance Criteria:

Given the slow-time-server exposes:
  - get_slow_time (configurable delay)
  - convert_slow_time (configurable delay)
  - get_instant_time (always instant, 0ms)
  - get_timeout_time (always exceeds 5min, for timeout testing)
When I register tools with different timeout_ms values:
  - get_slow_time: timeout_ms=10000
  - get_timeout_time: timeout_ms=5000
Then each tool should respect its per-tool timeout independently
US-6: DevOps Engineer - Docker Compose Integration

As a DevOps Engineer
I want the slow-time-server in the performance testing Docker Compose stack
So that I can run timeout and resilience tests in CI/CD

Acceptance Criteria:

Given docker-compose-performance.yml includes the slow-time-server
When I run: docker compose -f docker-compose-performance.yml up
Then the slow-time-server should be accessible on port 8081
And the gateway should have it pre-registered as a gateway
And Locust tests can target it alongside the fast-time-server

🏗 Architecture

Tool Inventory

| Tool | Description | Latency Behaviour | Purpose |
|------|-------------|-------------------|---------|
| get_slow_time | Get current time with configurable delay | Respects delay_seconds arg, falls back to server default | General timeout testing |
| convert_slow_time | Convert time between timezones with delay | Same as get_slow_time | Per-tool timeout_ms testing |
| get_instant_time | Get current time with zero delay | Always instant (0 ms) | Baseline / control tool |
| get_timeout_time | Get current time with extreme delay | Always sleeps 10 minutes | Guaranteed timeout testing |
| get_flaky_time | Get current time with random failures | Fails based on -failure-rate | Circuit breaker testing |

Server Flags

Usage: slow-time-server [flags]

Flags:
  -transport string     Transport: stdio, http, sse, dual, rest (default "stdio")
  -addr string          Bind address (default "0.0.0.0")
  -port int             Port (default 8081)
  -auth-token string    Bearer auth token
  -log-level string     Log level: debug, info, warn, error (default "info")

Latency Configuration:
  -latency duration           Default tool latency (default 5s)
  -latency-distribution string  Distribution: fixed, uniform, normal, exponential (default "fixed")
  -latency-min duration       Min latency for uniform distribution (default 1s)
  -latency-max duration       Max latency for uniform distribution (default 10s)
  -latency-mean duration      Mean for normal distribution (default 5s)
  -latency-stddev duration    Stddev for normal distribution (default 2s)

Failure Simulation:
  -failure-rate float    Probability of failure for flaky tool (0.0-1.0, default 0.0)
  -failure-mode string   Failure type: timeout, error, panic (default "timeout")
  -seed int              Random seed for reproducibility (default: time-based)

Environment Variables:
  DEFAULT_LATENCY              Override -latency (e.g., "5s", "30s", "2m")
  FAILURE_RATE                 Override -failure-rate
  AUTH_TOKEN                   Override -auth-token

Tool Input Schema

{
  "name": "get_slow_time",
  "description": "Get current system time with configurable artificial delay. Use delay_seconds to control response latency for timeout testing.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "timezone": {
        "type": "string",
        "description": "IANA timezone name (default: UTC)",
        "default": "UTC"
      },
      "delay_seconds": {
        "type": "number",
        "description": "Override delay in seconds. If omitted, uses server default latency.",
        "minimum": 0,
        "maximum": 600
      }
    }
  },
  "annotations": {
    "readOnlyHint": true,
    "destructiveHint": false,
    "openWorldHint": false
  }
}
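
On the Go side, this input schema maps naturally onto an argument struct decoded with the stdlib. SlowTimeArgs and parseSlowTimeArgs are illustrative names, not the final implementation:

```go
package main

import "encoding/json"

// SlowTimeArgs mirrors the get_slow_time input schema. A nil
// DelaySeconds means "use the server default latency" — the pointer
// distinguishes an omitted field from an explicit 0.
type SlowTimeArgs struct {
	Timezone     string   `json:"timezone"`
	DelaySeconds *float64 `json:"delay_seconds"`
}

// parseSlowTimeArgs decodes a tools/call arguments payload.
func parseSlowTimeArgs(raw []byte) (SlowTimeArgs, error) {
	args := SlowTimeArgs{Timezone: "UTC"} // schema default
	err := json.Unmarshal(raw, &args)
	return args, err
}
```

Range validation (0–600) would still need to be enforced in the handler, since encoding/json does not apply JSON Schema constraints.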

Data Flow

sequenceDiagram
    participant Client as MCP Client / Locust
    participant Gateway as ContextForge Gateway
    participant Plugin as CircuitBreakerPlugin
    participant Server as slow-time-server

    Client->>Gateway: tools/call: get_slow_time {delay_seconds: 30}
    Gateway->>Plugin: tool_pre_invoke
    Plugin-->>Gateway: continue (circuit closed)
    Gateway->>Server: session.call_tool("get_slow_time", {delay_seconds: 30})

    Note over Server: time.Sleep(30s)

    alt Gateway timeout fires first (TOOL_TIMEOUT=10)
        Gateway-->>Gateway: asyncio.wait_for → TimeoutError after 10s
        Gateway->>Plugin: tool_post_invoke (timeout failure)
        Plugin-->>Plugin: Increment failure count
        Gateway-->>Client: ToolTimeoutError: "timed out after 10s"
    else Server responds within timeout
        Server-->>Gateway: {time: "2026-02-09T...", delayed_by: "30s"}
        Gateway->>Plugin: tool_post_invoke (success)
        Gateway-->>Client: ToolResult {time, delayed_by}
    end

📋 Implementation Tasks

Phase 1: Core Server Implementation

  • Scaffold project structure

    • Create mcp-servers/go/slow-time-server/ directory
    • go.mod with github.com/mark3labs/mcp-go dependency (same version as fast-time-server)
    • main.go — server implementation
    • Makefile — mirroring fast-time-server targets
    • Dockerfile — multi-stage build (scratch base)
    • .gitignore, .golangci.yml, staticcheck.conf
  • Implement latency engine

    • Fixed latency: time.Sleep(duration)
    • Uniform distribution: rand.Float64() * (max - min) + min
    • Normal distribution: rand.NormFloat64() * stddev + mean (clamped to 0)
    • Exponential distribution: rand.ExpFloat64() / lambda
    • Per-call delay_seconds override (always fixed, bypasses distribution)
    • Context-aware sleep: respect context.Done() for clean cancellation
  • Implement MCP tools (5 tools)

    • get_slow_time — configurable delay, timezone support
    • convert_slow_time — configurable delay, timezone conversion
    • get_instant_time — zero delay (control/baseline)
    • get_timeout_time — 10-minute fixed delay (guaranteed timeout)
    • get_flaky_time — probabilistic failure based on failure_rate
  • Implement MCP resources (2 resources)

    • latency://config — current server latency configuration (JSON)
    • latency://stats — invocation count, avg/p50/p95/p99 latency, failure count
  • Implement MCP prompts (1 prompt)

    • test_timeout — generates a prompt instructing the LLM to invoke slow tools with specific delay to test timeout behaviour
  • Multi-transport support

    • stdio, SSE, HTTP (StreamableHTTP), dual, REST — same as fast-time-server
    • Auth middleware reuse
    • CORS and logging middleware

Phase 2: Failure Simulation

  • Failure modes for get_flaky_time

    • timeout — sleep for 10x the configured latency (exceed any reasonable timeout)
    • error — return mcp.NewToolResultError("simulated failure")
    • panic — panic("simulated crash") with recovery middleware
    • Deterministic seeding (-seed flag) for reproducible test runs
  • Response metadata

    • Include delayed_by, server_default_latency, distribution, failure_simulated in tool response
    • This allows test assertions to verify the server behaved as configured

Phase 3: REST API & Observability

  • REST endpoints (in dual and rest modes)

    • GET /api/v1/time?timezone=X&delay=Y — slow time with delay
    • GET /api/v1/config — current latency config
    • GET /api/v1/stats — invocation statistics
    • POST /api/v1/config — runtime latency reconfiguration (hot-reload)
    • GET /health — health check (always instant, no delay)
    • GET /version — version info
  • Prometheus metrics endpoint

    • GET /metrics — tool_invocations_total, tool_latency_seconds (histogram), tool_failures_total
    • Labels: tool_name, failure_mode, transport

Phase 4: Testing & Quality

  • Unit tests (main_test.go)

    • Test each tool handler with various delay values
    • Test latency distributions produce expected ranges
    • Test failure rate simulation (deterministic seed)
    • Test context cancellation during sleep
    • Test REST endpoints
    • Test auth middleware
  • Locust load test (tests/loadtest/locustfile_slow_time_server.py)

    • Scenario: gradual ramp-up (1→50 users, 2s delay per tool call)
    • Scenario: timeout storm (all users hit 120s delay with TOOL_TIMEOUT=60)
    • Scenario: mixed latency (instant + slow + timeout tools)
    • Scenario: circuit breaker exercise (50% failure rate)
    • Assertions on p95 latency, error rate, timeout rate
  • Integration test with gateway

    • Register slow-time-server as gateway
    • Test TOOL_TIMEOUT enforcement via StreamableHTTP
    • Test per-tool timeout_ms override
    • Test MCP_SESSION_POOL_TRANSPORT_TIMEOUT interaction
    • Test circuit breaker plugin activation on repeated timeouts

Phase 5: Docker & CI Integration

  • Dockerfile — multi-stage, scratch base, ~2 MiB image
  • Add to docker-compose-performance.yml
    • Service: slow-time-server on port 8081
    • Default latency: 5s
    • Environment: DEFAULT_LATENCY=5s, FAILURE_RATE=0.1
  • CI workflow (optional)
    • Build and test in GitHub Actions
    • Publish container image alongside fast-time-server

Phase 6: Documentation

  • README.md — usage, flags, examples, Docker, tool reference
  • Update mcp-servers/AGENTS.md — add slow-time-server entry
  • Update docs/docs/manage/tuning.md — reference slow-time-server for timeout tuning validation

⚙️ Configuration Examples

Basic: 5-second latency for timeout testing

./slow-time-server -transport=dual -port=8081 -latency=5s

Circuit breaker testing: 30% failure rate

./slow-time-server -transport=dual -port=8081 \
  -latency=2s -failure-rate=0.3 -failure-mode=error -seed=42

Realistic distribution: normal with occasional outliers

./slow-time-server -transport=dual -port=8081 \
  -latency-distribution=normal -latency-mean=5s -latency-stddev=3s

Docker Compose

# docker-compose-performance.yml
services:
  slow-time-server:
    build:
      context: ./mcp-servers/go/slow-time-server
    ports:
      - "8081:8081"
    environment:
      DEFAULT_LATENCY: "5s"
      FAILURE_RATE: "0.1"
    command: ["-transport=dual", "-port=8081", "-addr=0.0.0.0"]

Gateway registration

# Register the slow-time-server as a gateway
curl -X POST http://localhost:4444/api/v1/gateways \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "slow-time-server",
    "url": "http://slow-time-server:8081",
    "transport": "streamablehttp"
  }'

# Override timeout for the guaranteed-timeout tool
curl -X PATCH http://localhost:4444/api/v1/tools/<get_timeout_time_id> \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"timeout_ms": 5000}'

✅ Success Criteria

  • Functionality: All 5 tools work across all transports (stdio, SSE, StreamableHTTP, dual, REST)
  • Latency: Configurable via flags, env vars, and per-call delay_seconds argument
  • Distributions: Fixed, uniform, normal, and exponential latency distributions produce correct ranges
  • Failure Simulation: get_flaky_time fails at configured rate with deterministic seeding
  • Gateway Integration: Successfully registered and invoked through ContextForge gateway
  • Timeout Testing: Gateway TOOL_TIMEOUT, per-tool timeout_ms, and pool transport timeout all validated
  • Circuit Breaker: CircuitBreakerPlugin correctly opens/closes with configurable failure rates
  • Load Testing: Locust scenarios run successfully with expected latency profiles
  • Docker: Builds as ~2 MiB scratch image, runs in docker-compose-performance.yml
  • Testing: Unit tests pass with race detection (go test -race), >80% coverage
  • Documentation: README with full flag reference, examples, and Docker instructions
  • Quality: Passes make lint, make vet, make staticcheck

🏁 Definition of Done

  • Go module scaffolded in mcp-servers/go/slow-time-server/
  • 5 MCP tools implemented with configurable latency
  • 2 MCP resources (config, stats) implemented
  • 4 latency distributions (fixed, uniform, normal, exponential) working
  • Failure simulation with configurable rate and mode
  • All 5 transports supported (stdio, SSE, HTTP, dual, REST)
  • Unit tests with race detection and >80% coverage
  • Locust load test file created in tests/loadtest/
  • Dockerfile producing scratch-based ~2 MiB image
  • Added to docker-compose-performance.yml
  • README.md with complete documentation
  • Passes make lint vet staticcheck test

📝 Additional Notes

🔹 Relationship to fast-time-server: The slow-time-server shares the same MCP SDK, transport infrastructure, auth middleware, and code patterns as fast-time-server. It adds latency injection and failure simulation on top. Consider extracting shared code into a common package if duplication becomes significant.

🔹 Context-aware sleep: Use select { case <-ctx.Done(): return ctx.Err(); case <-time.After(delay): } rather than bare time.Sleep() so that tool cancellations propagate cleanly and don't leave goroutines hanging.

🔹 Runtime reconfiguration: The POST /api/v1/config endpoint allows changing latency parameters without restarting the server. This is useful for scripted test scenarios that need different latency profiles in sequence.

🔹 Deterministic mode: The -seed flag ensures get_flaky_time and distribution-based latencies produce the same sequence across runs, enabling reproducible CI test assertions.

🔹 Port convention: fast-time-server uses 8080, slow-time-server uses 8081, avoiding conflicts in compose stacks.


🔗 Related Issues

Metadata

Labels: SHOULD (P2: important but not vital; high-value items that are not crucial for the immediate release), enhancement (new feature or request), epic (large feature spanning multiple issues), go (Go programming), mcp-servers (MCP Server Samples), testing (unit, e2e, manual, automated, etc.)