[Bug]: Session file lock errors cascade through model fallback chain #66646

@pescaohq-web

Description

When a session file is locked (e.g., a concurrent cron run and a dispatch hitting the same agent), the gateway treats the lock timeout as a model failure and cascades through the entire fallback chain. Each fallback attempt hits the same lock, so a chain of N models wastes about 10s × N before the request fails entirely.

Expected behavior

Session file lock errors should be classified as infrastructure errors, not model failures. The gateway should:

  1. Fail fast with "session busy" — do not enter model fallback
  2. Queue/retry the request after a brief backoff
  3. Never substitute a different model for a file I/O contention issue
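The three points above could be sketched roughly as follows. This is not OpenClaw's code: `SessionBusyError`, `ModelError`, `run_with_fallback`, and the retry/backoff parameters are all hypothetical names and values chosen for illustration. The idea is only that a lock error bypasses the fallback loop and the same chain is retried after a short backoff.

```python
import time
from typing import Callable, Optional, Sequence


class SessionBusyError(Exception):
    """Hypothetical infrastructure error: the session file is locked."""


class ModelError(Exception):
    """Hypothetical model-side failure: eligible for fallback."""


def _try_models(models: Sequence[str], call: Callable[[str], str]) -> str:
    """Walk the fallback chain, but only for model-side failures."""
    last_error: Optional[Exception] = None
    for model in models:
        try:
            return call(model)
        except ModelError as exc:
            last_error = exc  # model failure: try the next candidate
        # SessionBusyError is deliberately not caught here, so it
        # propagates immediately instead of consuming the chain.
    raise ModelError(f"All models failed ({len(models)}): {last_error}")


def run_with_fallback(
    models: Sequence[str],
    call: Callable[[str], str],
    retries: int = 2,          # assumed value
    backoff_s: float = 0.5,    # assumed value
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Retry the whole request on 'session busy'; never switch models for it."""
    for attempt in range(retries + 1):
        try:
            return _try_models(models, call)
        except SessionBusyError:
            if attempt == retries:
                raise  # surface "session busy" to the caller
            sleep(backoff_s * (2 ** attempt))  # brief exponential backoff
    raise AssertionError("unreachable")
```

With this shape, a lock that clears after one backoff is retried against the first model in the chain, and a lock that never clears surfaces as "session busy" rather than "All models failed".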

Actual behavior

[model-fallback/decision] candidate=opus-4-6 reason=timeout detail=session file locked (pid=37121)
[model-fallback/decision] candidate=gpt-5.4-pro reason=timeout detail=session file locked (pid=37121)  
[model-fallback/decision] candidate=sonnet-4-5 reason=timeout detail=session file locked (pid=37121)
Embedded agent failed: All models failed (3): session file locked...

Same PID, same lock, three failed attempts at 10s each = 30s wasted.
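To make the "same lock, N attempts" pattern concrete, here is a small sketch that parses the three decision lines quoted above and checks whether every candidate failed with an identical lock detail. The regex and the helper names (`lock_cascade_details`, `wasted_seconds`) are assumptions for illustration, based only on the log lines shown in this report; the 10s figure is the per-attempt timeout observed above.

```python
import re
from typing import List

LOG = """\
[model-fallback/decision] candidate=opus-4-6 reason=timeout detail=session file locked (pid=37121)
[model-fallback/decision] candidate=gpt-5.4-pro reason=timeout detail=session file locked (pid=37121)
[model-fallback/decision] candidate=sonnet-4-5 reason=timeout detail=session file locked (pid=37121)
"""

# Assumed line shape, inferred from the log excerpt in this issue.
DECISION = re.compile(
    r"candidate=(?P<candidate>\S+) reason=(?P<reason>\S+) detail=(?P<detail>.+?)\s*$"
)


def lock_cascade_details(lines: List[str]) -> List[str]:
    """Return the detail field of every fallback decision line."""
    details = []
    for line in lines:
        match = DECISION.search(line)
        if match:
            details.append(match.group("detail"))
    return details


def is_lock_cascade(lines: List[str]) -> bool:
    """True when several candidates all failed on the same session lock."""
    details = lock_cascade_details(lines)
    return (
        len(details) > 1
        and len(set(details)) == 1
        and "session file locked" in details[0]
    )


def wasted_seconds(lines: List[str], timeout_s: int = 10) -> int:
    """Time spent on attempts that could never succeed (timeout per candidate)."""
    return timeout_s * len(lock_cascade_details(lines))


lines = LOG.splitlines()
print(is_lock_cascade(lines), wasted_seconds(lines))
```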

Impact

Environment

  • OpenClaw 2026.4.14 (stable)
  • macOS 26.3.1, M4 Pro, 48 agents
  • maxConcurrent: 14
  • Fallback chain: opus-4-6 → gpt-5.4-pro → sonnet-4-5

Related
