Description
When a session file is locked (e.g., a cron job and a dispatch hit the same agent concurrently), the gateway treats the lock timeout as a model failure and cascades through the entire fallback chain. Every fallback attempt hits the same lock, so the request wastes 10s per model (10s × N models in total) before failing entirely.
Expected behavior
Session file lock errors should be classified as infrastructure errors, not model failures. The gateway should:
- Fail fast with "session busy" — do not enter model fallback
- Queue/retry the request after a brief backoff
- Never substitute a different model for a file I/O contention issue
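A minimal sketch of the behavior requested above, to make the proposal concrete. Every name here (`classifyError`, `runWithFallback`, `runWithRetry`, `SessionBusyError`) is hypothetical and not taken from the OpenClaw codebase, and matching on the "session file locked" detail string is an assumption based on the log output in this report. A real fix would more likely check a typed error or an error code.

```typescript
// Hypothetical sketch: none of these names come from the OpenClaw codebase.

type ErrorClass = "infrastructure" | "model";
type Attempt = (model: string) => Promise<string>;

// Assumption: lock contention can be recognized from the error detail text,
// as in the "session file locked (pid=...)" log lines.
function classifyError(detail: string): ErrorClass {
  return /session file locked/i.test(detail) ? "infrastructure" : "model";
}

function sessionBusy(detail: string): Error {
  const err = new Error("session busy: " + detail);
  err.name = "SessionBusyError";
  return err;
}

function isSessionBusy(err: unknown): boolean {
  return err instanceof Error && err.name === "SessionBusyError";
}

// Walk the fallback chain for model errors only. An infrastructure error
// aborts immediately, because another model would hit the same lock.
async function runWithFallback(models: string[], attempt: Attempt): Promise<string> {
  const failures: string[] = [];
  for (const model of models) {
    try {
      return await attempt(model);
    } catch (err) {
      const detail = err instanceof Error ? err.message : String(err);
      if (classifyError(detail) === "infrastructure") {
        throw sessionBusy(detail); // fail fast, no model substitution
      }
      failures.push(model + ": " + detail);
    }
  }
  throw new Error("All models failed (" + failures.length + "): " + failures.join("; "));
}

// Retry the whole request after a short exponential backoff, but only when
// the failure was "session busy". Other errors propagate unchanged.
async function runWithRetry(
  models: string[],
  attempt: Attempt,
  retries = 3,
  backoffMs = 250,
): Promise<string> {
  for (let i = 0; ; i++) {
    try {
      return await runWithFallback(models, attempt);
    } catch (err) {
      if (!isSessionBusy(err) || i >= retries) throw err;
      const delay = backoffMs * Math.pow(2, i);
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

With this shape, a locked session costs one lock timeout plus the backoff delays, rather than one timeout per model in the chain. The retry count and backoff values are placeholders.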
Actual behavior
[model-fallback/decision] candidate=opus-4-6 reason=timeout detail=session file locked (pid=37121)
[model-fallback/decision] candidate=gpt-5.4-pro reason=timeout detail=session file locked (pid=37121)
[model-fallback/decision] candidate=sonnet-4-5 reason=timeout detail=session file locked (pid=37121)
Embedded agent failed: All models failed (3): session file locked...
All three attempts hit the same lock held by the same PID (37121). At 10s per attempt, that is 30s wasted before the request fails.
Impact
Environment
- OpenClaw 2026.4.14 (stable)
- macOS 26.3.1, M4 Pro, 48 agents
- maxConcurrent: 14
- Fallback chain: opus-4-6 → gpt-5.4-pro → sonnet-4-5
Related