Skip to content

fix(memory): set PRAGMA busy_timeout on every SQLite connection open#39183

Closed
MumuTW wants to merge 2 commits intoopenclaw:mainfrom
MumuTW:fix/sqlite-busy-timeout-on-connect
Closed

fix(memory): set PRAGMA busy_timeout on every SQLite connection open#39183
MumuTW wants to merge 2 commits intoopenclaw:mainfrom
MumuTW:fix/sqlite-busy-timeout-on-connect

Conversation

@MumuTW
Copy link
Copy Markdown
Contributor

@MumuTW MumuTW commented Mar 7, 2026

Summary

Fixes #39176

PRAGMA busy_timeout is per-connection and resets to 0 when the process restarts. With 10+ concurrent processes (gateway, cron agents, subagents) hitting the same SQLite file, a zero timeout means any concurrent write instantly fails with SQLITE_BUSY instead of retrying.

Changes

  • src/memory/manager-sync-ops.ts: Set PRAGMA busy_timeout = 5000 in openDatabaseAtPath(), which is used by all write connection opens (initial, vacuum, reopen)
  • src/memory/qmd-manager.ts: Raise busy_timeout from 1 ms to 5000 ms on the read-only QMD connection

Test plan

  • pnpm vitest run src/memory/index.test.ts — 10 tests pass
  • pnpm vitest run src/memory/qmd-manager.test.ts — 56 tests pass
  • Manual: restart gateway, verify PRAGMA busy_timeout returns 5000 on active connections

busy_timeout is per-connection and resets to 0 when the process
restarts.  With 10+ concurrent processes (gateway, cron agents,
subagents) hitting the same SQLite file, a zero timeout means any
concurrent write instantly fails with SQLITE_BUSY instead of retrying.

Set busy_timeout=5000 in both connection sites:
- manager-sync-ops.ts openDatabaseAtPath (write connections)
- qmd-manager.ts ensureDb (read-only connection, raised from 1 ms)

Closes openclaw#39176
@aisle-research-bot
Copy link
Copy Markdown

aisle-research-bot Bot commented Mar 7, 2026

🔒 Aisle Security Analysis

We found 2 potential security issue(s) in this PR:

# Severity Title
1 🟡 Medium Main-thread blocking DoS risk from 1s SQLite busy_timeout on synchronous QMD read connection
2 🟡 Medium Node.js event-loop blocking DoS risk from SQLite DatabaseSync busy_timeout=5000 on request-path connection

1. 🟡 Main-thread blocking DoS risk from 1s SQLite busy_timeout on synchronous QMD read connection

Property Value
Severity Medium
CWE CWE-400
Location src/memory/qmd-manager.ts:1557-1565

Description

QmdMemoryManager opens its QMD index database using Node’s synchronous DatabaseSync (runs on the main thread) and now sets PRAGMA busy_timeout = 1000.

Impact:

  • When another process/thread holds a write lock on the QMD index (e.g., the QMD updater process qmd update/qmd embed running concurrently), SQLite calls on this connection can block synchronously for up to 1 second per statement instead of failing fast.
  • These synchronous calls occur on the latency-sensitive recall path:
    • QmdMemoryManager.search() resolves each result’s docid via resolveDocLocation() which calls ensureDb() and then executes synchronous db.prepare(...).all(...) queries.
    • The memory_search tool (used via the HTTP gateway /tools/invoke) calls manager.search(...) and then manager.status() (which calls readCounts() -> ensureDb() -> synchronous SQL).
  • Increasing the busy timeout from 1ms to 1000ms therefore introduces an availability/DoS vector: an authenticated caller can repeatedly invoke memory_search during periods of index write activity to stall the Node event loop and degrade service responsiveness for other requests.

Vulnerable change:

this.db = new DatabaseSync(this.indexPath, { readOnly: true });
this.db.exec("PRAGMA busy_timeout = 1000");

Recommendation

Avoid long blocking waits on the main thread for the QMD read-only recall path.

Options:

  1. Fail fast and fall back (restore a very small busy timeout) so the existing fallback manager can take over:
this.db = new DatabaseSync(this.indexPath, { readOnly: true });
this.db.exec("PRAGMA busy_timeout = 1"); // or 0
  1. Move SQLite reads off the main thread (preferred if you want retry behavior):
  • Use an async SQLite API / worker thread for docid->path lookups, or
  • Isolate DB access into a worker so busy waits don’t block the event loop.
  1. If retries are required, implement application-level retries with backoff and yielding (only feasible if DB access is not synchronous), rather than relying on busy_timeout.

2. 🟡 Node.js event-loop blocking DoS risk from SQLite DatabaseSync busy_timeout=5000 on request-path connection

Property Value
Severity Medium
CWE CWE-400
Location src/memory/manager-sync-ops.ts:257-266

Description

The builtin memory search/index uses node:sqlite's DatabaseSync (synchronous SQLite I/O on the Node.js main thread). This change sets a 5-second PRAGMA busy_timeout on every connection open, which can turn lock contention into multi-second event-loop stalls instead of a fast SQLITE_BUSY failure.

Concrete call chain to the affected connection:

  • External request: POST /tools/invoke (gateway) → tool memory_search
  • createMemorySearchTool.execute()getMemorySearchManager()MemoryIndexManager.get()
  • new MemoryIndexManager(...)this.db = this.openDatabase()openDatabaseAtPath() sets busy_timeout = 5000
  • Request path continues in MemoryIndexManager.search() which performs synchronous SQLite queries (db.prepare(...).all()/get())

Why this can be abused into a DoS:

  • With busy_timeout=5000, any statement encountering a lock (from concurrent sync/reindex in-process or another process using the same DB file) may block up to 5 seconds.
  • Because operations are synchronous (DatabaseSync), that wait occurs on the Node.js event loop, stalling unrelated requests and timers.
  • MemoryIndexManager.search() can also trigger background sync (this.sync({ reason: "search" })) when dirty; sync performs many writes (DELETE/INSERT loops) and can increase lock contention.

Vulnerable code (changed):

const db = new DatabaseSync(dbPath, { allowExtension: this.settings.store.vector.enabled });
db.exec("PRAGMA busy_timeout = 5000");

Impact: a client that can trigger memory_search (or any path that uses this manager) during lock contention can cause repeated 5s event-loop stalls, amplifying into request queueing/thread starvation and service unavailability.

Recommendation

Avoid long busy_timeout waits on the request-path DatabaseSync connection.

Options (choose one):

  1. Use separate connections: a read-only/search connection with a small timeout, and a background write/sync connection with longer timeout.
// Request-path (search): fail fast to avoid event-loop stalls
const readDb = new DatabaseSync(dbPath, { readOnly: true, allowExtension: false });
readDb.exec("PRAGMA busy_timeout = 50");// Background sync/indexing: can tolerate waiting
const writeDb = new DatabaseSync(dbPath, { allowExtension: this.settings.store.vector.enabled });
writeDb.exec("PRAGMA busy_timeout = 5000");
  1. Keep busy_timeout low (e.g., 50–250ms) for this manager and handle SQLITE_BUSY by returning an error or deferring work.

  2. Move SQLite work off the main thread (switch to async DB access or a worker thread) so lock waits do not block the event loop.

Also consider enabling WAL mode (if compatible) to reduce reader/writer blocking and lock convoying.


Analyzed PR: #39183 at commit cc744d0

Last updated on: 2026-03-07T22:14:12Z

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Mar 7, 2026

Greptile Summary

This PR fixes a SQLITE_BUSY crash that occurred under concurrent process load by ensuring PRAGMA busy_timeout = 5000 is set on every SQLite connection open. Previously, write connections in manager-sync-ops.ts had no timeout at all (default 0 ms), and the read-only QMD connection in qmd-manager.ts had only a 1 ms timeout — both effectively causing immediate failures when another process held a lock.

Key changes:

  • openDatabaseAtPath() in manager-sync-ops.ts now calls db.exec("PRAGMA busy_timeout = 5000") immediately after opening, covering all write opens (initial, temp-DB for vacuum/rebuild, and post-swap reopen).
  • ensureDb() in qmd-manager.ts raises busy_timeout from 1 ms to 5000 ms on the read-only index connection.

One design consideration worth noting:
The original busy_timeout = 1 on the QMD read-only connection carried an explicit comment — "Keep QMD recall responsive when the updater holds a write lock" — signalling an intentional fail-fast strategy. Raising this to 5000 ms means QMD memory recall could stall for up to 5 seconds if an exclusive lock is held (e.g. during a VACUUM or when running without WAL journal mode). In WAL mode — strongly implied by the -wal/-shm file-suffix handling in moveIndexFiles/removeIndexFiles — readers and writers coexist without blocking, so the timeout would rarely fire in practice. However, no PRAGMA journal_mode = WAL is set explicitly in the connection-open code, so behaviour depends on whatever mode was last persisted to the database file. If WAL mode is reliably enabled elsewhere (e.g. by the qmd updater process), this change is safe. If it is not guaranteed, a more conservative value (e.g. 1000–2000 ms) for the read path might better preserve the original responsiveness goal.

Confidence Score: 4/5

  • Safe to merge; the fix is correct and well-scoped, with a minor design trade-off on the QMD read-only path that is low-risk in WAL mode.
  • Both changes correctly apply busy_timeout on every connection open, directly addressing the SQLITE_BUSY root cause. The write-path change (manager-sync-ops.ts) is straightforward and unambiguously correct. The QMD read-path change reverses a deliberate fail-fast design decision (1 ms → 5000 ms), which is acceptable but introduces the possibility of up to 5-second read stalls if an exclusive lock is held and WAL mode is not active. Since WAL mode appears to be in use based on the -wal/-shm file handling, the practical risk is low, warranting a 4 rather than a perfect 5.
  • src/memory/qmd-manager.ts — confirm that the QMD SQLite database reliably uses WAL journal mode so the 5000 ms read timeout cannot cause user-visible recall latency spikes.

Last reviewed commit: 8b43255

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8b43255ea5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/memory/qmd-manager.ts Outdated
this.db.exec("PRAGMA busy_timeout = 1");
// busy_timeout is per-connection; set it on every open so concurrent
// processes retry instead of failing immediately with SQLITE_BUSY.
this.db.exec("PRAGMA busy_timeout = 5000");
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Keep QMD read timeout low to prevent blocking requests

Setting the read-only QMD connection to PRAGMA busy_timeout = 5000 can block the Node event loop for up to 5s whenever the updater or another process holds a SQLite lock, because this code path uses synchronous DatabaseSync queries (resolveDocLocation and readCounts) during request handling. This regresses the previous low-latency behavior (the old 1ms timeout and existing busy-error handling) and can cause visible stalls/timeouts under normal concurrent update traffic.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point — lowered to 1000 ms in cc744d0. The QMD read path uses synchronous DatabaseSync, so 5 s was too aggressive. In WAL mode readers rarely contend with writers anyway, so 1 s provides a safe margin without risking event-loop stalls.

The QMD connection uses synchronous DatabaseSync queries on the main
thread, so a 5 s timeout could block the event loop. In WAL mode
readers rarely contend with writers, so 1 s is a safe upper bound that
still prevents the SQLITE_BUSY crash without risking noticeable stalls.
steipete added a commit that referenced this pull request Mar 7, 2026
@steipete
Copy link
Copy Markdown
Contributor

steipete commented Mar 7, 2026

Landed on main in 3a2fdc513.

What we changed while landing:

  • kept your core fix: apply PRAGMA busy_timeout on each connection open for both sync-store and QMD paths
  • aligned QMD timeout to 1000ms (1s) instead of 1ms, with rationale in-code for synchronous read-path safety
  • added regression tests asserting busy-timeout pragma values on both managers:
    • src/memory/manager.readonly-recovery.test.ts
    • src/memory/qmd-manager.test.ts
  • added changelog entry in 2026.3.7 fixes

Source commits reviewed:

Thanks @MumuTW.

@steipete steipete closed this Mar 7, 2026
vincentkoc pushed a commit to BryanTegomoh/openclaw-upstream that referenced this pull request Mar 8, 2026
…hanks @MumuTW)

Co-authored-by: MumuTW <clothl47364@gmail.com>
ziomancer pushed a commit to ziomancer/openclaw that referenced this pull request Mar 8, 2026
…hanks @MumuTW)

Co-authored-by: MumuTW <clothl47364@gmail.com>
openperf pushed a commit to openperf/moltbot that referenced this pull request Mar 8, 2026
…hanks @MumuTW)

Co-authored-by: MumuTW <clothl47364@gmail.com>
mcaxtr pushed a commit to mcaxtr/openclaw that referenced this pull request Mar 8, 2026
…hanks @MumuTW)

Co-authored-by: MumuTW <clothl47364@gmail.com>
GordonSH-oss pushed a commit to GordonSH-oss/openclaw that referenced this pull request Mar 9, 2026
…hanks @MumuTW)

Co-authored-by: MumuTW <clothl47364@gmail.com>
jenawant pushed a commit to jenawant/openclaw that referenced this pull request Mar 10, 2026
…hanks @MumuTW)

Co-authored-by: MumuTW <clothl47364@gmail.com>
V-Gutierrez pushed a commit to V-Gutierrez/openclaw-vendor that referenced this pull request Mar 17, 2026
…hanks @MumuTW)

Co-authored-by: MumuTW <clothl47364@gmail.com>
alexey-pelykh pushed a commit to remoteclaw/remoteclaw that referenced this pull request Mar 21, 2026
alexey-pelykh added a commit to remoteclaw/remoteclaw that referenced this pull request Mar 21, 2026
…hanks @MumuTW) (#1762)

(cherry picked from commit 3a2fdc5)

Co-authored-by: Peter Steinberger <steipete@gmail.com>
lovewanwan pushed a commit to lovewanwan/openclaw that referenced this pull request Apr 28, 2026
…hanks @MumuTW)

Co-authored-by: MumuTW <clothl47364@gmail.com>
ogt-redknie pushed a commit to ogt-redknie/OPENX that referenced this pull request May 2, 2026
…hanks @MumuTW)

Co-authored-by: MumuTW <clothl47364@gmail.com>
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 9, 2026
…hanks @MumuTW)

Co-authored-by: MumuTW <clothl47364@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SQLite busy_timeout should be set on every new connection (not persistent across restarts)

2 participants