fix(windows): clean stale RocksDB LOCK files on startup #798
Merged
zhoujh01 merged 1 commit into volcengine:main on Mar 20, 2026
On Windows, RocksDB LOCK files persist after a process crash because
Windows does not always release file handles immediately after process
termination. This blocks subsequent PersistStore opens with:
IO error: .../LOCK: The process cannot access the file because it
is being used by another process.
Add clean_stale_rocksdb_locks() utility that attempts os.remove() on
each LOCK file during initialization:
- If PermissionError → file is held by a live process, skip it
- If remove succeeds → file was stale, cleaned up
Called from OpenVikingService.initialize() after acquiring the PID lock
and before opening storage. No-op on POSIX (flock handles this natively).
Closes volcengine#650
Co-Authored-By: Claude <noreply@anthropic.com>
zhoujh01 (Collaborator) approved these changes on Mar 20, 2026:
The code has been merged. It's recommended to move the call to the
Summary
Fixes #650. On Windows, RocksDB LOCK files persist after a process crash (Windows doesn't always release file handles immediately after process termination). This blocks subsequent PersistStore opens with:
    IO error: .../LOCK: The process cannot access the file because it is being used by another process.
This PR adds a stale LOCK file cleaner that runs during OpenVikingService.initialize(), after the PID advisory lock is acquired and before storage is opened.

How it works
- Scans for LOCK files under the data directory using generalized glob patterns (**/store/LOCK and **/LOCK, to cover all PersistStore paths)
- Attempts os.remove() on each LOCK file:
  - PermissionError → file is held by a live process → skip it (safe)
  - Remove succeeds → file was stale; PersistStore will recreate it
- No-op on POSIX: flock() handles cleanup natively on Linux/macOS

Placement rationale
The cleanup runs in OpenVikingService.initialize() right after acquire_data_dir_lock() (the PID advisory lock from #473). This is the ideal location because:
- Running it before init_context_collection() prevents the PersistStore open failure

Relationship to #790
PR #790 fixes the PID lock staleness (_is_pid_alive() raising OSError on Windows). This PR fixes the RocksDB LOCK staleness, which involves a separate file created by the native storage engine. Both issues manifest on Windows after crashes but are independent fixes.

Additional context
We discovered this while running OpenViking with the Claude Code plugin bridge on Windows 11 across multiple concurrent sessions. The debug log shows the failure pattern clearly.
We have additional findings around live LOCK contention (retry with exponential backoff) and orphan session recovery, which we documented in issue #650. Those are better suited to follow-up PRs because they involve more architectural decisions.
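
For readers who want a concrete picture of the steps listed under "How it works", here is a minimal sketch of such a cleaner. It is not the code merged in this PR: the signature (a single data_dir argument), the list-of-paths return value, and the extra FileNotFoundError guard are assumptions made for illustration. Only the function name, the os.remove() probe, the PermissionError handling, and the POSIX no-op come from the description above.

```python
import os
import sys
from pathlib import Path
from typing import List


def clean_stale_rocksdb_locks(data_dir: str) -> List[str]:
    """Remove LOCK files that no live process holds; return the removed paths.

    Assumed signature and return type, for illustration only.
    """
    if sys.platform != "win32":
        # POSIX: the flock() held on LOCK is released when the process dies,
        # so a leftover LOCK file does not block the next open.
        return []

    removed: List[str] = []
    # Every path matching **/store/LOCK also matches **/LOCK, so a single
    # recursive search for files named LOCK covers both patterns.
    for lock_path in Path(data_dir).rglob("LOCK"):
        if not lock_path.is_file():
            continue
        try:
            os.remove(lock_path)
        except PermissionError:
            # Windows refuses to delete a file that another process has open:
            # the lock is live, so leave it alone.
            continue
        except FileNotFoundError:
            # Assumed extra guard: another process removed it first.
            continue
        removed.append(str(lock_path))
    return removed
```

Following the placement described above, a caller would invoke this after acquire_data_dir_lock() and before init_context_collection(), so that the storage engine recreates any LOCK file that was removed.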
Changes
- openviking/storage/vectordb/utils/stale_lock.py: new clean_stale_rocksdb_locks() utility
- openviking/service/core.py: call the cleanup in initialize() after the PID lock
- tests/storage/test_stale_lock.py: tests for the cleanup

Test plan
- Tests use pytest.mark.skipif to run platform-appropriate assertions
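
As an illustration of the skipif approach in the test plan, the sketch below shows how platform-specific assertions could be split across two tests. It is not the content of tests/storage/test_stale_lock.py: remove_if_stale() is a hypothetical stand-in for the real cleaner, and the test names are invented for this example.

```python
import os
import sys
from pathlib import Path

import pytest


def remove_if_stale(lock_path: Path) -> bool:
    """Hypothetical stand-in for the cleaner: True if the LOCK file was removed."""
    if sys.platform != "win32":
        return False  # no-op on POSIX
    try:
        os.remove(lock_path)
    except PermissionError:
        return False  # held by a live process
    return True


@pytest.mark.skipif(sys.platform != "win32", reason="stale LOCK cleanup is Windows-only")
def test_stale_lock_removed_on_windows(tmp_path: Path) -> None:
    lock = tmp_path / "store" / "LOCK"
    lock.parent.mkdir()
    lock.touch()  # no process holds this file, so it counts as stale
    assert remove_if_stale(lock) is True
    assert not lock.exists()


@pytest.mark.skipif(sys.platform == "win32", reason="checks the POSIX no-op path")
def test_noop_on_posix(tmp_path: Path) -> None:
    lock = tmp_path / "store" / "LOCK"
    lock.parent.mkdir()
    lock.touch()
    assert remove_if_stale(lock) is False
    assert lock.exists()  # the file must be left untouched
```

With this split, each platform runs only the assertions that describe its expected behavior, and the other test is reported as skipped rather than failed.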