Fix: SQLite WAL + BTRFS COW compatibility — busy_timeout + retry logic #30576

@savier89

Problem

On BTRFS with copy-on-write (COW) enabled, SQLite in WAL mode runs into lock timeouts, which cause Dashboard and Kanban failures.

Root Cause

  • BTRFS COW combined with SQLite WAL leads to conflicts between concurrent writes
  • The default busy_timeout (1000 ms) is too short
  • WAL initialization has no retry logic

Solution (tested in hermes_state.py)

  • Increased busy_timeout from 1000 ms to 30000 ms (30 s)
  • Added retry logic for WAL initialization: 3 attempts with a 1 s delay between them
  • Added exception handling, with a fallback if all attempts fail
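
The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual patch: the function name `open_state_db` and the constants are hypothetical, and the fallback shown here (keep whatever journal mode SQLite reports) is an assumption, since the issue does not say what the fallback does.

```python
import sqlite3
import time

# Values taken from the issue text; the names are hypothetical.
BUSY_TIMEOUT_MS = 30_000
WAL_ATTEMPTS = 3
WAL_RETRY_DELAY_S = 1.0


def open_state_db(path, attempts=WAL_ATTEMPTS, delay=WAL_RETRY_DELAY_S,
                  busy_timeout_ms=BUSY_TIMEOUT_MS):
    """Open a SQLite database and try to switch it to WAL mode with retries.

    Returns (connection, journal_mode). If WAL cannot be enabled, the
    connection is still returned with whatever journal mode is active
    (assumed fallback behaviour).
    """
    conn = sqlite3.connect(path, timeout=busy_timeout_ms / 1000)
    # Set the busy handler explicitly as well, so the value is visible via PRAGMA.
    conn.execute(f"PRAGMA busy_timeout = {int(busy_timeout_ms)}")

    for attempt in range(1, attempts + 1):
        try:
            mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
            if mode.lower() == "wal":
                return conn, mode
        except sqlite3.OperationalError:
            # Typically "database is locked"; treated as transient here.
            pass
        if attempt < attempts:
            time.sleep(delay)

    # All attempts failed: report the journal mode that is actually in effect.
    mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
    return conn, mode
```

A caller would use it as `conn, mode = open_state_db("state.db")` and could log a warning when `mode` is not `"wal"`.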

Test Results

  • 400/400 concurrent operations completed successfully
  • 0 errors under load
  • Services stable after restart (hermes-gateway, hermes-dashboard)
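
The issue does not describe how the 400 concurrent operations were generated. One way to reproduce a comparable load is sketched below, assuming 8 threads that each perform 50 single-row inserts through their own connection. The function name `run_load`, the table `ops`, and the worker split are all hypothetical.

```python
import sqlite3
import threading


def run_load(path, workers=8, ops_per_worker=50, busy_timeout_ms=30_000):
    """Run workers * ops_per_worker concurrent inserts; return (completed, errors)."""
    setup = sqlite3.connect(path)
    setup.execute("PRAGMA journal_mode=WAL")
    setup.execute("CREATE TABLE IF NOT EXISTS ops (worker INTEGER, seq INTEGER)")
    setup.commit()
    setup.close()

    errors = []
    errors_lock = threading.Lock()

    def worker(worker_id):
        # One connection per thread, each with the long busy timeout.
        conn = sqlite3.connect(path, timeout=busy_timeout_ms / 1000)
        try:
            for seq in range(ops_per_worker):
                try:
                    with conn:  # commits on success, rolls back on error
                        conn.execute("INSERT INTO ops VALUES (?, ?)",
                                     (worker_id, seq))
                except sqlite3.Error as exc:
                    with errors_lock:
                        errors.append(exc)
        finally:
            conn.close()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    check = sqlite3.connect(path)
    completed = check.execute("SELECT COUNT(*) FROM ops").fetchone()[0]
    check.close()
    return completed, len(errors)
```

Running this against a database file on the affected BTRFS volume, with the old and the new timeout, would show whether the error count changes.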

Patch Location

See: https://github.com/savier89/hermes-btrfs-fix

Files Changed

  • hermes_state.py: WAL initialization with busy_timeout + retry loop

Why This Works

BTRFS COW causes temporary file locking during writes, so WAL mode needs a longer timeout and retries to ride out these transient conflicts. A 30 s timeout with 3 attempts gives BTRFS enough time to complete its COW operations, so the database does not have to fall back to DELETE journal mode.
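
One detail worth checking when applying this fix: WAL journal mode is stored in the database file and persists across connections, but busy_timeout is a per-connection setting and must be set again on every new connection. The helper below is a small sketch for inspecting both on a given connection; the name `connection_settings` is hypothetical.

```python
import sqlite3


def connection_settings(conn):
    """Return (journal_mode, busy_timeout_ms) as seen by this connection."""
    mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
    timeout_ms = conn.execute("PRAGMA busy_timeout").fetchone()[0]
    return mode, timeout_ms
```

A second connection opened with a plain `sqlite3.connect(path)` will still report WAL mode, but its busy timeout comes from Python's own `timeout` argument (5.0 seconds by default), not from the 30000 ms set elsewhere.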

Metadata

    Labels

    P3 (Low: cosmetic, nice to have), comp/agent (Core agent loop, run_agent.py, prompt builder), type/bug (Something isn't working)
