EPERM on auth-profiles.json causes full gateway failure cascade (Windows) #62099

@Hag-Fish

Bug Report: EPERM on auth-profiles.json causes full gateway failure cascade

Summary

auth-profiles.json can acquire the Windows ReadOnly attribute during concurrent config writes, after which every LLM request fails with EPERM: operation not permitted. The failed profile write is treated as a fatal request error rather than a non-fatal bookkeeping failure, so it cascades through the fallback chain and leaves the gateway completely unresponsive.

Environment

  • OpenClaw: 2026.4.5 (3e72c03)
  • OS: Windows 11 (10.0.26200, x64)
  • Node: v24.14.1
  • Providers: Anthropic (claude-opus-4-6), Ollama (glm-4.7-flash, gemma4:26b)

Steps to Reproduce

  1. Have a running gateway with Anthropic as primary model and Ollama as fallback (or vice versa)
  2. Add a new Ollama model to the config while the gateway is running (e.g., adding gemma4:26b to openclaw.json models list)
  3. The gateway hot-reloads the config and updates models.json and auth-profiles.json
  4. Under certain timing conditions, auth-profiles.json acquires the Windows ReadOnly file attribute
  5. All subsequent LLM requests fail

Observed Behavior

Once ReadOnly is set on auth-profiles.json:

  1. Every LLM request attempts to write to auth-profiles.json (via markAuthProfileGood)
  2. The write to the target (copyFileSync from the .tmp file, called from renameJsonFileWithFallback) fails with EPERM
  3. This error is treated as a request-level failure, not just a profile-save failure
  4. The fallback system activates: primary model (e.g., ollama/glm-4.7-flash) → fallback model (e.g., anthropic/claude-opus-4-6)
  5. The fallback model hits the same EPERM on the same file → also fails
  6. Result: "All models failed" — complete gateway unresponsiveness
  7. Gateway restarts do NOT fix it (the ReadOnly attribute persists on disk)
  8. Each retry cycle inflates the session context with error metadata, rapidly consuming the context window
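
The cascade in steps 1 to 6 can be illustrated with a small simulation. This is only a sketch: callModel, saveProfile and runWithFallback are hypothetical names, not OpenClaw's API, and the EPERM is simulated instead of coming from the filesystem. It shows why a fallback chain cannot help when every model shares the same failing post-response write.

```typescript
// Hypothetical simulation of the cascade; none of these names are OpenClaw's real API.
type ModelCall = (model: string) => string;

// Simulated bookkeeping write that always fails, as it does once the file is ReadOnly.
function saveProfile(): void {
  const err = new Error("EPERM: operation not permitted, copyfile") as Error & { code?: string };
  err.code = "EPERM";
  throw err;
}

// The model itself answers fine; the failure comes from the profile save that follows.
const callModel: ModelCall = (model) => {
  const reply = `reply from ${model}`;
  saveProfile(); // shared file, so every model hits the same error
  return reply;
};

// Tries each model in order and gives up only when all of them have thrown.
function runWithFallback(models: string[], call: ModelCall): string {
  const errors: string[] = [];
  for (const model of models) {
    try {
      return call(model);
    } catch (err) {
      errors.push(`${model}: ${(err as Error).message}`);
    }
  }
  throw new Error(`All models failed (${errors.length} attempts)`);
}
```

Calling runWithFallback with the two models from this report throws after both attempts, even though each simulated model produced a reply before the save failed.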

Stack Trace

Error: EPERM: operation not permitted, copyfile 
  'C:\Users\OpenClaw\.openclaw\agents\main\agent\auth-profiles.json.<uuid>.tmp' 
  -> 'C:\Users\OpenClaw\.openclaw\agents\main\agent\auth-profiles.json'
    at Object.copyFileSync (node:fs:3104:11)
    at renameJsonFileWithFallback (json-file-1PGlTqjr.js:63:7)
    at saveJsonFile (json-file-1PGlTqjr.js:98:3)
    at saveAuthProfileStore (store-HF_Z-jKz.js:427:2)
    at markAuthProfileGood (profiles-DKQdaSwr.js:76:2)
    at pi-embedded-DWASRjxE.js:36473:7

Expected Behavior

  1. Profile write failure should be non-fatal. Failing to save "this API key worked" should not abort the entire LLM request. The response was already received — the profile write is bookkeeping.
  2. Atomic file writes should handle ReadOnly gracefully. renameJsonFileWithFallback should detect the ReadOnly attribute and either clear it or log a warning rather than throwing a fatal error.
  3. Error-loop inflation should be bounded. Failed retries should not dump error metadata into the session context, as this accelerates context exhaustion.
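
A minimal sketch of what points 1 and 3 could look like, assuming the profile save can be wrapped at its call site. bestEffort and pushBounded are hypothetical helpers invented for illustration; how OpenClaw actually structures the call to markAuthProfileGood or stores retry errors is not shown in this report.

```typescript
// Hypothetical helper: runs a bookkeeping task and downgrades any failure to a warning.
function bestEffort(
  label: string,
  task: () => void,
  warn: (msg: string) => void = console.warn,
): boolean {
  try {
    task();
    return true;
  } catch (err) {
    const code = (err as { code?: string }).code ?? "UNKNOWN";
    warn(`${label} failed (${code}); continuing, the LLM response is unaffected`);
    return false;
  }
}

// Hypothetical helper: keeps only the most recent `max` error notes so that
// repeated retries cannot grow the session context without bound.
function pushBounded(history: string[], entry: string, max = 3): string[] {
  return [...history, entry].slice(-max);
}
```

With a wrapper like this around the profile save, an EPERM would surface as a single warning that names the error code, and the request would still return the response that was already received.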

Workaround

attrib -R "C:\Users\OpenClaw\.openclaw\agents\main\agent\auth-profiles.json"

Then restart the gateway.
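
To check whether the attribute is set, before or after running the command above, the file mode reported by Node can be inspected. This is a sketch that relies on Node reporting a Windows ReadOnly file with no write bits in its mode; isReadOnly is a hypothetical helper name.

```typescript
import * as fs from "node:fs";

// Returns true when no write bit is set in the file mode.
// On Windows, Node reports a file carrying the ReadOnly attribute this way.
function isReadOnly(path: string): boolean {
  return (fs.statSync(path).mode & 0o222) === 0;
}
```

Running isReadOnly against the auth-profiles.json path from the stack trace should return false once attrib -R has taken effect.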

Impact

  • Gateway becomes completely unresponsive (no LLM requests succeed)
  • Gateway restarts do not fix it (file attribute persists)
  • Fallback chain burns paid API tokens on requests that will fail anyway
  • Session context inflates rapidly from error metadata (~84% of 200k context window in minutes)
  • User must manually identify and fix the file attribute — no error message points to the actual cause

Probable Root Cause

The probable cause is a race condition in the atomic JSON file write logic (renameJsonFileWithFallback) when multiple config files are updated concurrently during hot-reload. On Windows, a failed rename that falls back to copyFileSync may leave the target file with a ReadOnly attribute under certain timing conditions, or Windows itself may set ReadOnly as a protective measure during concurrent file access. Neither mechanism has been confirmed.
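
The rename-then-copy pattern suggested by the stack trace, extended to tolerate a ReadOnly target, might look like the sketch below. replaceFileWithFallback is a hypothetical stand-in; the real renameJsonFileWithFallback is not shown in this report, and the set of error codes that trigger the fallback is an assumption. It uses the fact that, on Windows, fs.chmodSync with the owner write bit set clears the ReadOnly attribute.

```typescript
import * as fs from "node:fs";

function errorCode(err: unknown): string | undefined {
  return (err as { code?: string }).code;
}

// Hypothetical stand-in for the write path seen in the stack trace.
function replaceFileWithFallback(tmpPath: string, targetPath: string): void {
  try {
    fs.renameSync(tmpPath, targetPath); // atomic when it succeeds
    return;
  } catch (err) {
    const code = errorCode(err);
    // Assumed set of codes worth falling back on; anything else is rethrown.
    if (code !== "EPERM" && code !== "EACCES" && code !== "EBUSY") throw err;
  }
  try {
    fs.copyFileSync(tmpPath, targetPath);
  } catch (err) {
    const code = errorCode(err);
    if (code !== "EPERM" && code !== "EACCES") throw err;
    // Assumption: the target is ReadOnly. Restore the owner write bit and retry once.
    const mode = fs.statSync(targetPath).mode & 0o777;
    fs.chmodSync(targetPath, mode | 0o200);
    fs.copyFileSync(tmpPath, targetPath);
  }
  // The copy path leaves the temporary file behind, so remove it.
  fs.rmSync(tmpPath, { force: true });
}
```

Clearing the attribute silently is a design choice; logging a warning at that point would also make the underlying condition visible instead of hiding it.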
