Description
Note: This issue was filed by Claude Code (Anthropic's AI coding agent) on behalf of a user who encountered and diagnosed this problem across multiple projects.
Feature Description
When using OpenViking in local mode (SyncOpenViking with PersistStore) across multiple concurrent sessions — such as multiple Claude Code instances, multiple MCP stdio servers, or multiple AI agents working in the same project — the RocksDB LOCK file frequently becomes stale on Windows, blocking all subsequent sessions from initializing.
This is related to #473 (multi-session contention) but proposes a concrete, safe fix rather than requiring users to switch to HTTP mode.
Root Cause
RocksDB creates a LOCK file when a database is opened. On Windows:
- Even after client.close() is called, the OS file handle may persist until the Python process fully exits.
- If a process crashes, is killed, or times out, the LOCK file is never released.
- The next process that tries to open the same database gets:
  IO error: .../LOCK: The process cannot access the file because it is being used by another process
This is especially problematic in hook-based architectures (Claude Code hooks, MCP stdio servers) where many short-lived Python processes open and close the same database.
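In hook-based setups it can help to tell this specific failure apart from other IO errors, for example to report "another live session holds the database" instead of a generic crash. The sketch below assumes the error reaches Python as a string shaped like the message quoted above. is_lock_contention_error is a hypothetical helper, not an OpenViking API, and the "being used by another process" wording is the English Windows system message, so it may differ on localized systems or other RocksDB builds.

```python
import re

# Hypothetical helper (not part of OpenViking): heuristic match for the
# RocksDB lock-contention error. Requires both a path ending in LOCK and
# the Windows sharing-violation wording; both checks are assumptions.
_LOCK_PATH = re.compile(r"[\\/]LOCK\b")
_SHARING_VIOLATION = re.compile(r"being used by another process", re.IGNORECASE)


def is_lock_contention_error(message: str) -> bool:
    """Return True if message looks like 'RocksDB LOCK held by another process'."""
    return bool(_LOCK_PATH.search(message)) and bool(_SHARING_VIOLATION.search(message))


msg = ("IO error: .../LOCK: The process cannot access the file "
       "because it is being used by another process")
print(is_lock_contention_error(msg))  # True
```

A heuristic like this only classifies the failure; whether to retry, wait, or clear a stale lock is a separate decision.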
Proposed Fix
Detect and remove stale LOCK files before opening the database. This is safe because:
- RocksDB LOCK files are purely advisory (0 bytes, contain no data)
- All actual data lives in SST files, WAL logs, and MANIFEST files
- On Windows, os.remove() on a held file raises PermissionError, which provides an atomic staleness check:
  - os.remove() succeeds → no process holds it → stale lock, safe to remove
  - os.remove() raises PermissionError → a live process holds it → leave it alone
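The staleness check depends on platform-specific delete semantics, which are easy to observe directly. In this minimal sketch a Python file handle stands in for the handle a live RocksDB process keeps on LOCK (an assumption: RocksDB opens its handle from native code, but per the description above it likewise blocks deletion on Windows). remove_if_unheld and simulate_lock_states are hypothetical helpers for illustration only.

```python
import os
import tempfile


def remove_if_unheld(path: str) -> bool:
    """Try to delete path; return True if it was removed.

    On Windows, Python's open() does not grant delete sharing, so deleting a
    file that another handle still has open raises PermissionError. On POSIX
    the unlink succeeds regardless of open handles.
    """
    try:
        os.remove(path)
        return True
    except PermissionError:
        return False


def simulate_lock_states() -> tuple[bool, bool]:
    """Return (removed_while_held, removed_after_release) for an empty LOCK."""
    with tempfile.TemporaryDirectory() as tmp:
        lock_path = os.path.join(tmp, "LOCK")
        # "Live process": keep an open handle on the empty LOCK file.
        holder = open(lock_path, "w")
        try:
            removed_while_held = remove_if_unheld(lock_path)
        finally:
            holder.close()
        # "Stale lock": the file exists but no handle is open on it.
        if not os.path.exists(lock_path):
            open(lock_path, "w").close()
        removed_after_release = remove_if_unheld(lock_path)
        return removed_while_held, removed_after_release


print(simulate_lock_states())  # Windows: (False, True); Linux/macOS: (True, True)
```

The POSIX result shows why this check is only meaningful on Windows: on Linux and macOS, os.remove() succeeds even while a live process holds the file open, so it cannot tell a stale lock from a live one.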
Reference Implementation
Here's the patch we're running successfully in the Claude Code plugin bridge (ov_memory.py). This could be integrated into SyncOpenViking.initialize() or the storage layer directly:
```python
import glob
import logging
import os
from pathlib import Path

_log = logging.getLogger("openviking")


def _clear_stale_vectordb_locks(data_path: str) -> None:
    """Remove RocksDB LOCK files left behind by crashed or exited processes.

    RocksDB creates a LOCK file when a database is opened and holds it for the
    lifetime of the process. On Windows the file handle is kept by the OS, so
    ``os.remove()`` will raise ``PermissionError`` if a **live** process still
    holds it, making this a safe, atomic staleness check:

    * ``os.remove()`` succeeds → no process held the file → stale, safe to
      remove (the next DB open will recreate it).
    * ``os.remove()`` raises ``PermissionError`` → a live process holds it →
      we leave it alone.

    Data integrity is not affected because all actual data lives in SST files,
    WAL logs, and the MANIFEST. The LOCK file is purely an advisory mutex.
    """
    # On POSIX, unlinking succeeds even while another process holds a lock on
    # the file, so the check above would not protect a live database there.
    # POSIX advisory locks are also released automatically when the holder
    # exits, so stale LOCK files do not block new opens on Linux/macOS.
    if os.name != "nt":
        return

    base = Path(data_path)
    lock_pattern = str(base / "vectordb" / "*" / "store" / "LOCK")
    lock_files = glob.glob(lock_pattern)
    if not lock_files:
        return

    for lock_path in lock_files:
        try:
            os.remove(lock_path)
            _log.info("Removed stale RocksDB LOCK: %s", lock_path)
        except PermissionError:
            # A live process holds this lock; leave it alone.
            _log.debug("LOCK held by live process, skipping: %s", lock_path)
        except OSError as exc:
            _log.debug("Could not remove LOCK %s: %s", lock_path, exc)
```

This function is called before SyncOpenViking(path=data_path).initialize().
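The glob pattern vectordb/*/store/LOCK in the patch encodes an assumption about the on-disk layout: one RocksDB store per subdirectory of vectordb. Before deleting anything in a real data directory, it is cheap to check what that pattern would match. The sketch below builds a throwaway directory with hypothetical collection names ("context", "memories") and reports the matches; find_lock_files and demo_layout are illustrative helpers, not OpenViking APIs.

```python
import glob
import tempfile
from pathlib import Path


def find_lock_files(data_path: str) -> list[str]:
    """List LOCK files using the same glob pattern as the patch."""
    pattern = str(Path(data_path) / "vectordb" / "*" / "store" / "LOCK")
    return sorted(glob.glob(pattern))


def demo_layout() -> list[str]:
    """Build a fake data directory and return matched paths relative to it."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical collection names; a real data directory may differ.
        for name in ("context", "memories"):
            store = Path(tmp) / "vectordb" / name / "store"
            store.mkdir(parents=True)
            (store / "LOCK").touch()        # empty advisory lock file
            (store / "000001.sst").touch()  # stands in for a real data file
        # A LOCK outside the expected layout is not matched by the pattern.
        (Path(tmp) / "vectordb" / "LOCK").touch()
        return [Path(p).relative_to(tmp).as_posix() for p in find_lock_files(tmp)]


print(demo_layout())
# ['vectordb/context/store/LOCK', 'vectordb/memories/store/LOCK']
```

Running find_lock_files against an actual data directory gives a dry run of exactly which files the cleanup would attempt to remove, and confirms that SST files are never touched.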
Why This Should Be in OpenViking Core
- Any local-mode user can hit this: anyone running multiple agents, sessions, or hooks against the same project will encounter stale locks after a crash or timeout
- The fix is small and low-risk: about 20 lines, it only ever deletes the empty LOCK file, and it uses the OS's own file locking as the safety check
- The alternative (HTTP mode) is heavy: requiring users to run a persistent server process just to avoid stale locks is a significant ergonomic cost
- Windows is disproportionately affected: RocksDB lock cleanup on Windows is less reliable than on Linux/macOS
Environment
- OpenViking version: 0.2.6
- Python: 3.12
- OS: Windows 10 Pro
- Use case: Claude Code hooks (SessionStart, Stop, SessionEnd) with multiple concurrent sessions in the same project directory
Relationship to Other Issues
- Directly related to [Bug]: Multiple stdio MCP sessions can contend for the same OpenViking data directory and surface misleading errors #473 — same root cause (multi-session contention on local storage), different trigger (Claude Code hooks vs MCP stdio)
- Could also help with [Bug]: [Windows10] Local vector backend cannot properly create and persist vector indexes #576 (Windows vectordb persistence issues)