Bug Report: EPERM on auth-profiles.json causes full gateway failure cascade
Summary
auth-profiles.json can acquire the Windows ReadOnly attribute during concurrent config writes. Once it does, every LLM request fails with EPERM: operation not permitted. The failed write is treated as fatal to the request rather than as a non-fatal bookkeeping error, so it cascades through the fallback chain and makes the gateway completely unresponsive.
Environment
- OpenClaw: 2026.4.5 (3e72c03)
- OS: Windows 11 (10.0.26200, x64)
- Node: v24.14.1
- Providers: Anthropic (claude-opus-4-6), Ollama (glm-4.7-flash, gemma4:26b)
Steps to Reproduce
- Have a running gateway with Anthropic as primary model and Ollama as fallback (or vice versa)
- Add a new Ollama model to the config while the gateway is running (e.g., adding gemma4:26b to the openclaw.json models list)
- The gateway hot-reloads the config and updates models.json and auth-profiles.json
- Under certain timing conditions, auth-profiles.json acquires the Windows ReadOnly file attribute
- All subsequent LLM requests fail
Observed Behavior
Once ReadOnly is set on auth-profiles.json:
- Every LLM request attempts to write to auth-profiles.json (via markAuthProfileGood)
- The atomic write (copyFileSync from .tmp to target) fails with EPERM
- This error is treated as a request-level failure, not just a profile-save failure
- The fallback system activates: primary model (e.g., ollama/glm-4.7-flash) → fallback model (e.g., anthropic/claude-opus-4-6)
- The fallback model hits the same EPERM on the same file → also fails
- Result: "All models failed" — complete gateway unresponsiveness
- Gateway restarts do NOT fix it (the ReadOnly attribute persists on disk)
- Each retry cycle inflates the session context with error metadata, rapidly consuming the context window
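The cascade above can be reproduced in miniature. The following is a minimal sketch, not OpenClaw code: runWithFallback, markGoodOnReadOnlyFile, and the Model type are hypothetical stand-ins, and the only assumption taken from the stack trace is that the profile write runs inside the same error scope as the request.

```typescript
// Hypothetical stand-ins; none of these names come from the OpenClaw codebase.
type Model = { name: string; call: () => string };

class EpermError extends Error {
  code = "EPERM";
}

// Simulates markAuthProfileGood when auth-profiles.json is read-only.
function markGoodOnReadOnlyFile(): void {
  throw new EpermError("EPERM: operation not permitted, copyfile");
}

function runWithFallback(models: Model[], markGood: () => void): string {
  const errors: string[] = [];
  for (const model of models) {
    try {
      const reply = model.call(); // the provider answers successfully
      markGood(); // bookkeeping throws inside the same try block
      return reply;
    } catch (err) {
      // The successful reply is discarded and the next model is tried.
      errors.push(`${model.name}: ${(err as Error).message}`);
    }
  }
  throw new Error(`All models failed: ${errors.join("; ")}`);
}

// Count provider calls to show that every model is invoked and then discarded.
let calls = 0;
const models: Model[] = [
  { name: "ollama/glm-4.7-flash", call: () => { calls++; return "ok"; } },
  { name: "anthropic/claude-opus-4-6", call: () => { calls++; return "ok"; } },
];
```

Calling runWithFallback(models, markGoodOnReadOnlyFile) invokes both providers and still throws, which matches the report: each model produces a response, and each response is thrown away by the same file error.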
Stack Trace
Error: EPERM: operation not permitted, copyfile 'C:\Users\OpenClaw\.openclaw\agents\main\agent\auth-profiles.json.<uuid>.tmp' -> 'C:\Users\OpenClaw\.openclaw\agents\main\agent\auth-profiles.json'
    at Object.copyFileSync (node:fs:3104:11)
    at renameJsonFileWithFallback (json-file-1PGlTqjr.js:63:7)
    at saveJsonFile (json-file-1PGlTqjr.js:98:3)
    at saveAuthProfileStore (store-HF_Z-jKz.js:427:2)
    at markAuthProfileGood (profiles-DKQdaSwr.js:76:2)
    at pi-embedded-DWASRjxE.js:36473:7
Expected Behavior
- Profile write failure should be non-fatal. Failing to save "this API key worked" should not abort the entire LLM request. The response was already received — the profile write is bookkeeping.
- Atomic file writes should handle ReadOnly gracefully. renameJsonFileWithFallback should detect the ReadOnly attribute and either clear it or log a warning rather than throwing a fatal error.
- Error-loop inflation should be bounded. Failed retries should not dump error metadata into the session context, as this accelerates context exhaustion.
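The first expectation could look roughly like this. It is a sketch under assumptions: bestEffort and completeRequest are hypothetical names, and the real call site around markAuthProfileGood may be structured differently. The point is only that the bookkeeping step gets its own error scope and that a failure is downgraded to a warning.

```typescript
// Hypothetical helper: runs a bookkeeping step and downgrades any failure to a warning.
function bestEffort(
  label: string,
  step: () => void,
  warn: (message: string) => void = console.warn,
): boolean {
  try {
    step();
    return true;
  } catch (err) {
    const code = (err as { code?: string } | undefined)?.code ?? "UNKNOWN";
    warn(`${label} failed (${code}); continuing without persisting`);
    return false;
  }
}

// Usage sketch: the reply is returned even if the profile save throws.
function completeRequest(
  reply: string,
  markGood: () => void,
  warn?: (message: string) => void,
): string {
  bestEffort("auth profile save", markGood, warn);
  return reply;
}
```

With this shape, an EPERM from the profile store produces one warning per request instead of a failed request, so the fallback chain is never triggered by it.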
Workaround
attrib -R "C:\Users\OpenClaw\.openclaw\agents\main\agent\auth-profiles.json"
Then restart the gateway.
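The same check and fix can be scripted from Node. This is a sketch under assumptions: isReadOnly and clearReadOnly are hypothetical helper names. It relies on Node reporting the Windows ReadOnly attribute as a missing owner-write bit in stat.mode, and on fs.chmodSync being able to change the write permission on Windows.

```typescript
import * as fs from "node:fs";

// True when the owner-write bit is unset. On Windows, Node reports the
// ReadOnly attribute this way.
function isReadOnly(file: string): boolean {
  return (fs.statSync(file).mode & 0o200) === 0;
}

// Restores the owner-write bit; returns true only if something was changed.
function clearReadOnly(file: string): boolean {
  const mode = fs.statSync(file).mode & 0o777;
  if ((mode & 0o200) !== 0) return false; // already writable
  fs.chmodSync(file, mode | 0o200);
  return true;
}
```

Running a check like this at gateway startup would at least surface the attribute in the logs, since a restart alone does not clear it.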
Impact
- Gateway becomes completely unresponsive (no LLM requests succeed)
- Gateway restarts do not fix it (file attribute persists)
- Fallback chain burns paid API tokens on requests that will fail anyway
- Session context inflates rapidly from error metadata (~84% of 200k context window in minutes)
- User must manually identify and fix the file attribute — no error message points to the actual cause
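The context inflation in particular could be limited by collapsing repeated errors before they reach the session. The sketch below is an illustration only: BoundedErrorLog is a hypothetical class, and how OpenClaw actually attaches error metadata to the session context is not known from this report.

```typescript
// Hypothetical sketch: collapse consecutive identical errors and cap how many are kept.
class BoundedErrorLog {
  private notes: { signature: string; count: number }[] = [];
  private maxNotes: number;

  constructor(maxNotes = 5) {
    this.maxNotes = maxNotes;
  }

  record(signature: string): void {
    const last = this.notes[this.notes.length - 1];
    if (last !== undefined && last.signature === signature) {
      last.count += 1; // same error again: bump the counter, add no new text
      return;
    }
    this.notes.push({ signature, count: 1 });
    if (this.notes.length > this.maxNotes) this.notes.shift(); // drop the oldest
  }

  render(): string {
    return this.notes
      .map((n) => (n.count > 1 ? `${n.signature} (x${n.count})` : n.signature))
      .join("\n");
  }
}
```

A hundred identical EPERM failures then cost one line of context rather than a hundred copies of the same metadata.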
Probable Root Cause
Race condition in the atomic JSON file write logic (renameJsonFileWithFallback) when multiple config files are being updated concurrently during hot-reload. On Windows, a failed rename falling back to copyFileSync may leave the target file with a ReadOnly attribute under certain timing conditions, or Windows itself may set ReadOnly as a protective measure during concurrent file access.
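Whatever sets the attribute, the write path could tolerate it. The following is a minimal sketch, assuming a rename-then-copy strategy like the one visible in the stack trace; saveJsonTolerant is a hypothetical name and this is not the actual renameJsonFileWithFallback implementation.

```typescript
import * as fs from "node:fs";
import { randomUUID } from "node:crypto";

// Hypothetical sketch; not the actual renameJsonFileWithFallback.
function saveJsonTolerant(
  target: string,
  data: unknown,
  warn: (message: string) => void = console.warn,
): boolean {
  const tmp = `${target}.${randomUUID()}.tmp`;
  try {
    fs.writeFileSync(tmp, JSON.stringify(data, null, 2), "utf8");
    try {
      fs.renameSync(tmp, target); // preferred path: replace in one step
      return true;
    } catch {
      // Rename may fail on Windows; fall through to the copy fallback.
    }
    if (fs.existsSync(target)) {
      const mode = fs.statSync(target).mode & 0o777;
      if ((mode & 0o200) === 0) {
        warn(`${target} is read-only; clearing the attribute before retrying`);
        fs.chmodSync(target, mode | 0o200);
      }
    }
    fs.copyFileSync(tmp, target);
    return true;
  } catch (err) {
    // Report the failure to the caller instead of throwing.
    const code = (err as { code?: string } | undefined)?.code ?? "UNKNOWN";
    warn(`could not save ${target} (${code}); keeping the previous contents`);
    return false;
  } finally {
    fs.rmSync(tmp, { force: true }); // never leave the temp file behind
  }
}
```

Returning a boolean instead of throwing leaves the decision to the caller, which fits the expectation that a failed profile save is bookkeeping and not a request failure.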