fix(kanban): change synchronous=NORMAL to FULL + add wal_autocheckpoint=100 by someaka · Pull Request #31731 · NousResearch/hermes-agent

someaka · 2026-05-24T23:51:17Z

Problem

PRAGMA synchronous=NORMAL (line 1184 of hermes_cli/kanban_db.py) defers fsync in WAL mode, leaving kanban.db vulnerable to corruption when a process is SIGKILL'd mid-transaction or when WAL frames are partially written during concurrent access. In practice this surfaces as "database disk image is malformed" errors after roughly 9-10 rapid task creations.
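To make the failure mode easier to detect, here is a minimal sketch of a health check built on SQLite's PRAGMA integrity_check. The helper names is_healthy and demo are hypothetical and not part of hermes_cli; the garbage file only stands in for a damaged kanban.db and is not a reproduction of the corruption reported here.

```python
import os
import sqlite3
import tempfile


def is_healthy(path: str) -> bool:
    """Return True if SQLite reports the database at `path` as intact."""
    try:
        conn = sqlite3.connect(path)
        try:
            rows = conn.execute("PRAGMA integrity_check").fetchall()
        finally:
            conn.close()
    except sqlite3.DatabaseError:
        # Raised for errors such as "database disk image is malformed"
        # or "file is not a database".
        return False
    # A healthy database yields a single row containing "ok".
    return rows == [("ok",)]


def demo() -> tuple:
    """Check one valid database and one file that is not a database."""
    with tempfile.TemporaryDirectory() as tmp:
        good = os.path.join(tmp, "good.db")
        conn = sqlite3.connect(good)
        conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, title TEXT)")
        conn.commit()
        conn.close()

        bad = os.path.join(tmp, "bad.db")
        with open(bad, "wb") as fh:
            fh.write(b"this is not a sqlite database" * 100)

        return is_healthy(good), is_healthy(bad)
```

A check like this could run before the dispatcher opens the database, so that a damaged file is reported early rather than on the first task write.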

Root Cause

With synchronous=NORMAL in WAL mode, SQLite syncs the WAL only at checkpoint boundaries, not after each transaction commit. If a writer process is killed mid-transaction (SIGKILL from reclaim, gateway shutdown, or the OOM killer), WAL frames may have been written while the main DB is left in an inconsistent state, and the next connection then reports a malformed database.

Also, the default wal_autocheckpoint threshold of 1000 pages lets the WAL grow large between checkpoints, widening the window in which a SIGKILL can leave a large WAL in a fragile state.
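The relationship between WAL growth and checkpoints can be observed directly. The sketch below is illustrative only: the tasks table and the demo helper are assumptions, and auto-checkpointing is switched off on purpose so that the WAL visibly grows until a manual PRAGMA wal_checkpoint(TRUNCATE) folds it back into the main database file.

```python
import os
import sqlite3
import tempfile


def demo() -> tuple:
    """Return (WAL bytes before checkpoint, busy flag, WAL bytes after)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "kanban.db")
        conn = sqlite3.connect(path)
        try:
            conn.execute("PRAGMA journal_mode=WAL")
            # 0 disables automatic checkpoints, so every commit stays in the WAL.
            conn.execute("PRAGMA wal_autocheckpoint=0")
            conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, title TEXT)")
            for i in range(200):
                conn.execute("INSERT INTO tasks (title) VALUES (?)", (f"task {i}",))
                conn.commit()

            wal_before = os.path.getsize(path + "-wal")
            # The pragma returns (busy, wal_frames, checkpointed_frames).
            busy, _, _ = conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
            wal_after = os.path.getsize(path + "-wal")
            return wal_before, busy, wal_after
        finally:
            conn.close()
```

With a single connection and no concurrent readers the checkpoint should not be blocked, so the WAL file is truncated once its contents have been transferred.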

Fix — 2 lines, 1 file

synchronous=NORMAL → FULL: the WAL is fsync'd on every transaction commit before the write returns, which is intended to prevent WAL/main-DB inconsistency even after SIGKILL
wal_autocheckpoint=100: triggers an automatic checkpoint once the WAL reaches 100 pages (down from the default 1000), which bounds the checkpoint I/O spike and shrinks the window where a large WAL is fragile
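As a rough illustration of the two settings above, the following sketch applies them with Python's sqlite3 module and reads them back. It is not the code in hermes_cli/kanban_db.py: connect_kanban, read_settings, and demo are hypothetical names, and journal_mode=WAL is set here only so the example is self-contained.

```python
import os
import sqlite3
import tempfile


def connect_kanban(path: str) -> sqlite3.Connection:
    """Open a connection with the two PRAGMAs this PR proposes."""
    conn = sqlite3.connect(path)
    # Assumption: the real code already runs in WAL mode.
    conn.execute("PRAGMA journal_mode=WAL")
    # Sync the WAL on every commit instead of only at checkpoints.
    conn.execute("PRAGMA synchronous=FULL")
    # Attempt an automatic checkpoint once the WAL reaches 100 pages.
    conn.execute("PRAGMA wal_autocheckpoint=100")
    return conn


def read_settings(conn: sqlite3.Connection) -> dict:
    """Read the effective values back from the connection."""
    return {
        "journal_mode": conn.execute("PRAGMA journal_mode").fetchone()[0],
        "synchronous": conn.execute("PRAGMA synchronous").fetchone()[0],
        "wal_autocheckpoint": conn.execute("PRAGMA wal_autocheckpoint").fetchone()[0],
    }


def demo() -> dict:
    """Create a throwaway database and report its settings."""
    with tempfile.TemporaryDirectory() as tmp:
        conn = connect_kanban(os.path.join(tmp, "kanban.db"))
        try:
            return read_settings(conn)
        finally:
            conn.close()
```

SQLite reports synchronous as an integer, where 2 corresponds to FULL. Both synchronous and wal_autocheckpoint apply per connection, so every code path that opens kanban.db would need to set them.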

These match the fix proposed in upstream PRs #30969 / #30973, which are not yet merged.

Trade-off

synchronous=FULL adds one fsync per committed write transaction: roughly 1ms of overhead on SSD and 5-10ms on HDD. For kanban (create/show/complete operations, not a high-throughput OLTP path), this is negligible compared with the cost of DB corruption and manual recovery.
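The overhead figures above can be checked on a given machine with a small timing sketch like the one below. The function name mean_commit_seconds and the tasks table are assumptions made for illustration, and the numbers it produces depend heavily on the storage behind the temporary directory (a tmpfs mount, for example, makes fsync nearly free).

```python
import os
import sqlite3
import tempfile
import time


def mean_commit_seconds(synchronous: str, commits: int = 50) -> float:
    """Average wall-clock time of one single-row commit in WAL mode."""
    if synchronous not in ("NORMAL", "FULL"):
        raise ValueError("synchronous must be 'NORMAL' or 'FULL'")
    with tempfile.TemporaryDirectory() as tmp:
        conn = sqlite3.connect(os.path.join(tmp, "bench.db"))
        try:
            conn.execute("PRAGMA journal_mode=WAL")
            conn.execute(f"PRAGMA synchronous={synchronous}")
            conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, title TEXT)")
            start = time.perf_counter()
            for i in range(commits):
                conn.execute("INSERT INTO tasks (title) VALUES (?)", (f"task {i}",))
                conn.commit()
            elapsed = time.perf_counter() - start
        finally:
            conn.close()
    return elapsed / commits


if __name__ == "__main__":
    for mode in ("NORMAL", "FULL"):
        print(mode, f"{mean_commit_seconds(mode) * 1000:.3f} ms per commit")
```

Running it a few times on the disk that actually hosts kanban.db gives a more meaningful comparison than a single run in a temporary directory.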

 hermes_cli/kanban_db.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Fixes: #31502
Related: #31618, #30896

…nt=100

PRAGMA synchronous=NORMAL defers fsync in WAL mode, leaving kanban.db vulnerable to corruption when a process is SIGKILL'd mid-transaction or when WAL frames are partially written during concurrent access.

- synchronous=NORMAL → FULL: fsync every WAL frame before write completes
- wal_autocheckpoint=100: limit WAL to 100 pages between auto-checkpoints

Fixes: NousResearch#31502
Related: NousResearch#31618, NousResearch#30896

alt-glitch · 2026-05-25T00:00:15Z

Competing with open PRs #30973 (synchronous=FULL + wal_autocheckpoint=100) and #31208 (secure_delete + cell_size_check + synchronous=FULL) — all target the same kanban SQLite corruption under concurrent writes. Clean replacement of stacked #31726. Related issue: #31618 (corruption recurs even with these PRAGMAs under SIGKILL).

kshitijk4poor · 2026-05-28T06:39:39Z

Closing as already fixed on main — landed via #33482 commit 6416dd518 (@steveonjava's batch-salvage). That commit makes the exact same synchronous=NORMAL → FULL + wal_autocheckpoint=100 change you proposed here, and also adds secure_delete=ON + cell_size_check=ON for additional torn-write hardening. Thanks for tackling the same problem — same direction, just landed via the batch.
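For reference, the two additional PRAGMAs mentioned in this comment can be applied and verified as in the sketch below. This is not the code from commit 6416dd518; harden_connection and demo are hypothetical names used only to show the standard SQLite pragmas in isolation.

```python
import os
import sqlite3
import tempfile


def harden_connection(conn: sqlite3.Connection) -> None:
    """Apply the two extra hardening PRAGMAs named in the comment above."""
    # Overwrite deleted content with zeros instead of leaving stale bytes.
    conn.execute("PRAGMA secure_delete=ON")
    # Enable extra sanity checks on b-tree cells as pages are read.
    conn.execute("PRAGMA cell_size_check=ON")


def demo() -> dict:
    """Apply the settings to a throwaway database and read them back."""
    with tempfile.TemporaryDirectory() as tmp:
        conn = sqlite3.connect(os.path.join(tmp, "kanban.db"))
        try:
            harden_connection(conn)
            return {
                "secure_delete": conn.execute("PRAGMA secure_delete").fetchone()[0],
                "cell_size_check": conn.execute("PRAGMA cell_size_check").fetchone()[0],
            }
        finally:
            conn.close()
```

Both settings are per connection, so they would need to be applied wherever the kanban database is opened.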

alt-glitch added the type/bug, P3, and comp/cli labels May 24, 2026

alt-glitch mentioned this pull request May 25, 2026

fix(kanban): change synchronous=NORMAL to FULL + add wal_autocheckpoint=100 #31726

Closed

steveonjava mentioned this pull request May 25, 2026

fix(kanban): merge complete_task and recompute_ready into a single write txn #31891

Closed

8 tasks

ddblue0 mentioned this pull request May 25, 2026

Gateway embedded Kanban dispatcher opens SQLite WAL connections every tick, causing FD/WAL pressure #31736

Closed

kshitijk4poor closed this May 28, 2026

someaka wants to merge 1 commit into NousResearch:main from someaka:fix/31502-kanban-synchronous-full-clean
