fix: add file-based lock to prevent PGLite concurrent access crashes (Aborted()) #61

Closed
danbr wants to merge 1 commit into garrytan:master from danbr:fix/pglite-concurrent-access-lock

@danbr danbr commented Apr 12, 2026

Contributor

Problem

When gbrain embed is running (which can take minutes with 700+ pages) and another process tries to connect to the same PGLite database (e.g. gbrain query), PGLite throws Aborted() because embedded Postgres (WASM) only supports one connection at a time. This is a hard crash — no recovery, no helpful error.

Root Cause

PGLite uses a single WASM Postgres instance bound to a filesystem directory. Concurrent access from multiple processes to the same data directory causes an assertion failure in the Postgres WAL layer, producing Aborted() with no useful context.

Fix

Added a file-based advisory lock (src/core/pglite-lock.ts) that:

  1. Acquires lock before PGLite connect — uses atomic mkdir which is POSIX-atomic, so no race conditions between processes
  2. Stores PID + timestamp + command — in .gbrain-lock/lock inside the data directory, so you can see exactly what process holds the lock
  3. Auto-cleans stale locks — if the PID is dead or the lock is >5 min old (embed jobs can be long), it's removed and acquired
  4. Skips locking for in-memory PGLite: an undefined dataDir means the database is process-scoped, so no concurrent access is possible
  5. Clear error on timeout — after 30s default, shows which process holds the lock, its PID, and how to recover (rm -rf .gbrain-lock)
  6. Releases on disconnect: PGLiteEngine.disconnect() calls releaseLock() in a finally-like pattern
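
For readers who want to see the shape of steps 1 to 3, here is a minimal sketch of a single acquire attempt. It is not the code from src/core/pglite-lock.ts: the names `tryAcquireLock`, `isStale`, `isAlive`, and `LockInfo`, the JSON layout of the lock file, and the fallback to the directory's mtime are all assumptions made for illustration. Only the `.gbrain-lock/lock` location, the PID + timestamp + command contents, and the 5-minute threshold come from the description above.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical constants; only the path and the 5-minute limit come from the PR text.
const LOCK_DIR = ".gbrain-lock";
const LOCK_FILE = "lock";
const STALE_MS = 5 * 60 * 1000;

interface LockInfo {
  pid: number;
  acquiredAt: number; // epoch milliseconds
  command: string;
}

// Signal 0 tests whether a process exists without delivering a signal.
function isAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but belongs to another user.
    return (err as { code?: string }).code === "EPERM";
  }
}

function isStale(lockDir: string, now: number): boolean {
  try {
    const raw = fs.readFileSync(path.join(lockDir, LOCK_FILE), "utf8");
    const info = JSON.parse(raw) as LockInfo;
    return !isAlive(info.pid) || now - info.acquiredAt > STALE_MS;
  } catch {
    // Metadata missing or unreadable (the holder may be mid-write):
    // fall back to the age of the lock directory itself.
    try {
      return now - fs.statSync(lockDir).mtimeMs > STALE_MS;
    } catch {
      return true; // directory vanished; the caller retries mkdir
    }
  }
}

// Returns true if this process now holds the lock, false if a live holder exists.
function tryAcquireLock(dataDir: string, command: string): boolean {
  const lockDir = path.join(dataDir, LOCK_DIR);
  for (let attempt = 0; attempt < 2; attempt++) {
    try {
      fs.mkdirSync(lockDir); // throws EEXIST if the directory already exists
      const info: LockInfo = { pid: process.pid, acquiredAt: Date.now(), command };
      fs.writeFileSync(path.join(lockDir, LOCK_FILE), JSON.stringify(info));
      return true;
    } catch (err) {
      if ((err as { code?: string }).code !== "EEXIST") throw err;
      if (!isStale(lockDir, Date.now())) return false;
      // Stale lock: remove it and retry mkdir once.
      fs.rmSync(lockDir, { recursive: true, force: true });
    }
  }
  return false;
}

function releaseLock(dataDir: string): void {
  fs.rmSync(path.join(dataDir, LOCK_DIR), { recursive: true, force: true });
}
```

One caveat with this sketch: the stale-lock path (remove, then mkdir again) is not atomic as a whole, so two processes that both judge the same lock stale at the same moment could still collide. Whether the actual patch guards against that is not stated in the description.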

Testing

  • 6 new tests in test/pglite-lock.test.ts: acquire/release, concurrent prevention, stale detection, in-memory skip, PID tracking, cleanup on disconnect
  • All 348 existing tests pass — no regressions
  • Manually verified — the Aborted() crash no longer occurs; concurrent processes get a clear error message instead

Files Changed

| File | Change |
| --- | --- |
| src/core/pglite-lock.ts | New file: file-based lock implementation |
| src/core/pglite-engine.ts | Added acquireLock/releaseLock in connect/disconnect |
| test/pglite-lock.test.ts | New file: 6 tests for lock behavior |
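
To illustrate how the engine change might wire the lock into connect/disconnect, here is a rough sketch. It does not reproduce src/core/pglite-engine.ts: `LockedEngine`, `Database`, `LockApi`, and the injected `open` function are hypothetical stand-ins, used so the example runs without PGLite. It shows two behaviors from the description: skipping the lock when dataDir is undefined, and releasing it in a finally block on disconnect.

```typescript
// Hypothetical minimal interfaces; the real PGLiteEngine and lock module may differ.
interface Database {
  close(): Promise<void>;
}

interface LockApi {
  acquireLock(dataDir: string): Promise<void>;
  releaseLock(dataDir: string): Promise<void>;
}

type OpenFn = (dataDir?: string) => Promise<Database>;

class LockedEngine {
  private db: Database | undefined;
  private lockedDir: string | undefined;
  private readonly open: OpenFn;
  private readonly lock: LockApi;

  constructor(open: OpenFn, lock: LockApi) {
    this.open = open;
    this.lock = lock;
  }

  async connect(dataDir?: string): Promise<void> {
    // An in-memory database (undefined dataDir) is private to this process,
    // so there is nothing to lock.
    if (dataDir !== undefined) {
      await this.lock.acquireLock(dataDir);
      this.lockedDir = dataDir;
    }
    try {
      this.db = await this.open(dataDir);
    } catch (err) {
      await this.releaseHeldLock(); // do not leave a lock behind if opening fails
      throw err;
    }
  }

  async disconnect(): Promise<void> {
    try {
      await this.db?.close();
    } finally {
      this.db = undefined;
      await this.releaseHeldLock(); // runs even if close() throws
    }
  }

  private async releaseHeldLock(): Promise<void> {
    if (this.lockedDir === undefined) return;
    const dir = this.lockedDir;
    this.lockedDir = undefined;
    await this.lock.releaseLock(dir);
  }
}
```

Releasing inside `finally` means a failing `close()` cannot strand the lock. A process that is killed outright never reaches `disconnect()`, and the stale-lock detection covers that case.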

PGLite uses embedded Postgres (WASM) which only supports one connection
at a time. When `gbrain embed` is running (which can take minutes with
700+ pages) and another process tries to connect (e.g. `gbrain query`),
PGLite throws `Aborted()` because it can't handle concurrent access
to the same data directory.

This patch adds a file-based advisory lock using atomic `mkdir` with
PID tracking for stale lock detection:

- `acquireLock()` creates `.gbrain-lock/` dir with a lock file
  containing PID, timestamp, and command
- Stale locks from dead processes are auto-cleaned
- In-memory PGLite (undefined dataDir) skips locking entirely since
  it's process-scoped
- `disconnect()` releases the lock via `releaseLock()`
- Default 30s timeout with clear error message showing which process
  holds the lock and how to recover

Tested: all 348 existing tests pass + 6 new lock tests.
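
The timeout behavior in the last bullet could look roughly like the polling loop below. This is a sketch under assumptions, not the patch itself: `acquireWithTimeout`, `Holder`, `WaitOptions`, the 100 ms polling interval, and the exact error wording are all invented for illustration. The callbacks `tryAcquire` and `readHolder` stand in for the mkdir attempt and the lock-file read. The 30 s default and the `rm -rf` recovery hint come from the description.

```typescript
// Hypothetical types; the real module's signatures are not shown in the PR text.
interface Holder {
  pid: number;
  command: string;
}

interface WaitOptions {
  timeoutMs?: number; // 30 s default, per the commit message
  pollMs?: number;    // assumed polling interval
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function acquireWithTimeout(
  tryAcquire: () => boolean,
  readHolder: () => Holder | undefined,
  lockPath: string,
  { timeoutMs = 30_000, pollMs = 100 }: WaitOptions = {},
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  // Keep retrying until the lock is acquired or the deadline passes.
  while (!tryAcquire()) {
    if (Date.now() >= deadline) {
      const holder = readHolder();
      const who = holder ? `PID ${holder.pid} (${holder.command})` : "an unknown process";
      throw new Error(
        `Timed out after ${timeoutMs} ms waiting for the database lock held by ${who}. ` +
          `If that process is no longer running, remove the lock: rm -rf ${lockPath}`,
      );
    }
    await sleep(pollMs);
  }
}
```

Naming the holder and the recovery command in the error is what turns the opaque `Aborted()` crash into something a user can act on.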
@garrytan

Owner

Included in fix wave PR #65 (v0.9.1). PGLite file lock landed — atomic mkdir, PID+age stale detection, clear error messages. We went with your separate-module approach over the inline version in PR #60. Thanks danbr! 🙏
