
fix(state): proactively skip WAL journal mode on BTRFS filesystems (#30846) #31586

Open
Tranquil-Flow wants to merge 2 commits into NousResearch:main from Tranquil-Flow:fix/30846-btrfs-wal-proactive-detection

Conversation

@Tranquil-Flow

Contributor

What

SQLite WAL journal mode is incompatible with BTRFS Copy-on-Write. The existing exception-based fallback (catching OperationalError) acts too late: by the time SQLite raises, COW may have already corrupted WAL records. Users on BTRFS then see persistent "disk I/O error" failures with no indication of the root cause.

Fix

Proactive BTRFS detection via /proc/self/mountinfo (Linux only). apply_wal_with_fallback() now accepts an optional db_path parameter. If the path resides on BTRFS, WAL is skipped entirely, before any pragma is executed, and a clear warning directs users to chattr +C.

  • hermes_state.py — +81: _is_on_btrfs(), _decode_mountinfo_path(), proactive skip in apply_wal_with_fallback()
  • Callers just add db_path= argument (2 chars each): gateway/platforms/api_server.py, hermes_cli/kanban_db.py, plugins/memory/holographic/store.py
  • db_path=None preserves full backward compatibility

Mountinfo parsing handles edge cases

  • Locates filesystem type after the - separator (not fixed field index)
  • Decodes octal escapes (\040 → space) for mount points with spaces
  • Uses os.path.commonpath(), not raw startswith(), so /homebrew won't match /home
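The three parsing rules above can be sketched roughly as follows. This is an illustrative reconstruction, not the PR's _is_on_btrfs() or _decode_mountinfo_path(): the function names, the SAMPLE_MOUNTINFO text, and the choice to prefer the longest matching mount point are assumptions of the sketch.

```python
import os

# Hypothetical mountinfo excerpt used only for illustration.
SAMPLE_MOUNTINFO = "\n".join([
    "36 35 0:30 / / rw,relatime shared:1 - ext4 /dev/sda1 rw",
    "40 36 0:35 /@home /home rw,relatime shared:2 - btrfs /dev/sdb1 rw",
    r"41 36 0:36 / /mnt/my\040disk rw - btrfs /dev/sdc1 rw",
])


def decode_mountinfo_path(raw):
    """Decode three-digit octal escapes such as the one the kernel emits for a space."""
    out = []
    i = 0
    while i < len(raw):
        digits = raw[i + 1:i + 4]
        if raw[i] == "\\" and len(digits) == 3 and all(c in "01234567" for c in digits):
            out.append(chr(int(digits, 8)))
            i += 4
        else:
            out.append(raw[i])
            i += 1
    return "".join(out)


def fs_type_for_path(path, mountinfo_text):
    """Return the filesystem type of the longest mount point containing `path`."""
    best_mount, best_type = "", None
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if "-" not in fields:
            continue
        # The filesystem type follows the "-" separator; the number of
        # optional fields before it varies, so a fixed index is unreliable.
        sep = fields.index("-")
        if sep < 6 or sep + 1 >= len(fields):
            continue
        mount_point = decode_mountinfo_path(fields[4])
        try:
            # commonpath compares whole path components, unlike startswith().
            inside = os.path.commonpath([path, mount_point]) == mount_point
        except ValueError:
            continue
        if inside and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fields[sep + 1]
    return best_type


def is_on_btrfs_sketch(db_path, mountinfo_file="/proc/self/mountinfo"):
    """Return True only when mountinfo is readable and reports btrfs for db_path."""
    try:
        with open(mountinfo_file, encoding="utf-8", errors="replace") as fh:
            text = fh.read()
    except OSError:
        # No /proc/self/mountinfo (macOS, Windows): treat as not BTRFS.
        return False
    return fs_type_for_path(os.path.realpath(db_path), text) == "btrfs"
```

Against the sample text, a path under /home resolves to btrfs while a path under /homebrew falls through to the ext4 root mount, which is the boundary case the commonpath() bullet describes.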

Tests

python3 -m pytest tests/test_hermes_state_wal_fallback.py -q -o addopts=
189 passed (9 new BTRFS tests)

9 regression tests: mountinfo parser shape, path boundary guard, octal escape decoding, proactive WAL skip, warning deduplication, backward-compatible db_path=None, post-fallback DB writability, non-Linux no-op, API contract.

Fail-without-fix: reverting to upstream hermes_state.py → 9/9 BTRFS tests fail.

Competitor check

Adjacent WAL PRs (#30700, #30823, #31294, #31014, #30654, #31130) address related SQLite/WAL issues but none implements proactive BTRFS filesystem detection for #30846.

Closes #30846

@alt-glitch added the type/bug, P3, comp/agent, and comp/plugins labels May 24, 2026
@mohamedorigami-jpg

Contributor

Nice approach. Proactive BTRFS detection via /proc/self/mountinfo is much cleaner than waiting for silent corruption and debugging backwards. The longest-to-shortest mount-path matching handles bind mounts correctly, and gating on the existence of /proc/self/mountinfo keeps it a no-op on macOS/Windows without needing explicit platform checks.

I recently worked on the kanban WAL lifecycle (#31130 -- WAL fd leak on connect/close cycles) and this BTRFS edge case is a different class of the same 'WAL is great until your filesystem does something unexpected' problem. The kanban_db.py hunk here is minimal and surgical.

@Tranquil-Flow force-pushed the fix/30846-btrfs-wal-proactive-detection branch from 46cd282 to 33eca0c May 25, 2026 10:59
…ousResearch#30846)

BTRFS Copy-on-Write can modify disk blocks after WAL records them,
producing silent database corruption. This change adds proactive
BTRFS detection via /proc/self/mountinfo (Linux-only) and skips
WAL entirely on BTRFS, falling back to DELETE journal mode.

- Add _is_on_btrfs() helper that parses /proc/self/mountinfo
- Update apply_wal_with_fallback() to accept optional db_path and
  check BTRFS before attempting PRAGMA journal_mode=WAL
- Pass db_path from SessionDB, kanban_db, api_server, and
  holographic memory store callers
- Add regression tests for BTRFS detection and proactive WAL skip
@Tranquil-Flow force-pushed the fix/30846-btrfs-wal-proactive-detection branch from 33eca0c to dc55360 May 25, 2026 11:02


Development

Successfully merging this pull request may close these issues.

fix: BTRFS COW + SQLite WAL incompatibility — disk I/O errors on BTRFS

3 participants