[TESTING]: Locust load test reports false failures for 409 Conflict on state change endpoints #2566
Description
✅ Test Summary
The Locust load test (tests/loadtest/locustfile.py) incorrectly reports failures when state change endpoints return 409 Conflict. This is expected behavior under high concurrency due to optimistic locking, not an actual failure.
🧪 Test Type
- Integration / end-to-end tests
- Other: Load testing (Locust)
🧬 Scope & Affected Components
- mcpgateway core (API logic, handlers)
- Other: Load testing infrastructure
🐞 Problem
When running make load-test-ui with 4000 concurrent users, the test reports ~100+ failures like:
CatchResponseError('Expected [200, 403, 404], got 409')
Affected endpoints:
| Endpoint | Occurrences |
|---|---|
| /servers/[id]/state | ~91 |
| /tools/[id]/state | ~25 |
| /resources/[id]/state | ~3 |
🔍 Root Cause
The 409 errors occur due to race conditions when multiple users try to toggle the same entity's state simultaneously:
| Time | User A | User B | Result |
|---|---|---|---|
| T1 | POST /servers/abc/state | POST /servers/abc/state | |
| T2 | Read: enabled=true | Read: enabled=true | |
| T3 | Write: enabled=false ✅ | (waiting for lock) | |
| T4 | | Write conflict | 409 ❌ |
The server correctly returns 409 Conflict to prevent lost updates (optimistic locking). This is correct behavior, not a bug.
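The race above can be reproduced in a few lines. This is a minimal sketch only: the gateway's actual locking code is not shown in this issue, so the `Entity` class, its `version` counter, and the `set_state` helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    """Hypothetical stand-in for a server/tool/resource row."""
    enabled: bool = True
    version: int = 1  # assumed version counter used for the optimistic check


def set_state(entity: Entity, enabled: bool, expected_version: int) -> int:
    """Return an HTTP-style status code for a state change attempt."""
    if entity.version != expected_version:
        # Another writer committed first; reject to avoid a lost update.
        return 409
    entity.enabled = enabled
    entity.version += 1
    return 200


server = Entity()

# T2: both users read the same snapshot.
version_seen_by_a = server.version
version_seen_by_b = server.version

# T3: user A writes first and succeeds.
status_a = set_state(server, enabled=False, expected_version=version_seen_by_a)

# T4: user B writes against a stale version and is rejected.
status_b = set_state(server, enabled=False, expected_version=version_seen_by_b)

print(status_a, status_b)  # 200 409
```

User B's 409 here signals that the write was safely refused, which is why the load test should not count it as a failure.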
🔧 Fix
The state change functions in locustfile.py need to include 409 in their allowed response codes:
Current code (lines 1261, 1276, 1291, 1306, 1321):
self._validate_json_response(response, allowed_codes=[200, 403, 404])
Fixed code:
self._validate_json_response(response, allowed_codes=[200, 403, 404, 409])
Functions to update:
- set_server_state() - line 1261
- set_tool_state() - line 1276
- set_resource_state() - line 1291
- set_prompt_state() - line 1306
- set_gateway_state() - line 1321
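To illustrate the effect of the change, here is a rough sketch of a status check that treats 409 as acceptable. The real `_validate_json_response` implementation is not shown in this issue, so `validate_status`, `FakeResponse`, and `STATE_CHANGE_ALLOWED_CODES` are hypothetical; `FakeResponse` only mimics the `success()` / `failure()` methods that Locust exposes on a response when `catch_response=True` is used.

```python
# 409 is accepted because concurrent state changes on the same entity
# are rejected by optimistic locking, which is expected under load.
STATE_CHANGE_ALLOWED_CODES = [200, 403, 404, 409]


class FakeResponse:
    """Hypothetical stand-in for Locust's catch_response object."""

    def __init__(self, status_code: int):
        self.status_code = status_code
        self.error = None

    def success(self) -> None:
        self.error = None

    def failure(self, message: str) -> None:
        self.error = message


def validate_status(response, allowed_codes) -> bool:
    """Mark the response as success or failure based on its status code."""
    if response.status_code in allowed_codes:
        response.success()
        return True
    response.failure(f"Expected {allowed_codes}, got {response.status_code}")
    return False


old_check = FakeResponse(409)
validate_status(old_check, [200, 403, 404])

new_check = FakeResponse(409)
validate_status(new_check, STATE_CHANGE_ALLOWED_CODES)

print(old_check.error)  # Expected [200, 403, 404], got 409
print(new_check.error)  # None
```

Keeping the allowed codes in one shared constant is a design choice of this sketch, not something the issue requires; it would avoid repeating the list in all five functions.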
📋 Acceptance Criteria
- Add 409 to allowed_codes for all 5 state change functions
- Update comments to explain why 409 is acceptable (concurrent state changes)
- Load test with 4000 users shows no false failures for state endpoints
- Code passes make verify
📓 Additional Context
Load test metrics showing the issue:
Total Requests: 825,874
Total Failures: 1,000 (0.121%)
- 409 Conflict on state changes: ~116 (11.6% of failures)
The 409 errors represent only 0.014% of total requests - this is expected and healthy behavior under concurrent load, not a system failure.
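The percentages above can be checked with a short calculation. The figure of 116 conflicts is the approximate count reported in the metrics; the variable names are illustrative.

```python
total_requests = 825_874
total_failures = 1_000
conflict_failures = 116  # approximate count from the load test report

failure_rate = total_failures / total_requests * 100
conflict_share_of_failures = conflict_failures / total_failures * 100
conflict_share_of_requests = conflict_failures / total_requests * 100

print(f"Failure rate: {failure_rate:.3f}%")                          # 0.121%
print(f"409s as share of failures: {conflict_share_of_failures:.1f}%")  # 11.6%
print(f"409s as share of requests: {conflict_share_of_requests:.3f}%")  # 0.014%
```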
🧠 Environment Info
| Key | Value |
|---|---|
| Gateway version | main branch |
| Python version | 3.12 |
| Load test tool | Locust |
| Concurrent users | 4000 |
| Platform | Docker Compose (3 gateway replicas) |
📎 Related
- File: tests/loadtest/locustfile.py
- Functions: set_server_state, set_tool_state, set_resource_state, set_prompt_state, set_gateway_state