
fix(mcp): add early detection for multi-instance stdio contention #611

Merged
MaojiaSheng merged 1 commit into volcengine:main from mvanhorn:osc/473-mcp-stdio-contention-detection
Mar 15, 2026

@mvanhorn
Contributor

Summary

When multiple OpenViking processes share the same data directory (common with stdio MCP in multi-session hosts), they silently contend for AGFS and VectorDB resources. Users see misleading errors such as "Collection 'context' does not exist" or "Transport closed" instead of a clear contention diagnostic.

This adds a PID-based advisory lock that detects the problem at startup and fails with an actionable error message.

Why this matters

  • #473 - 4 comments, with @ZaynJarvis confirming: "I agree documentation and error messages should be improved to prevent multi-session use on stdio MCP"
  • Documentation was addressed in #518 (merged). This PR completes the code-side fix.
  • Without detection, users spend hours debugging misleading errors before discovering the root cause is process contention

Changes

New file: openviking/utils/process_lock.py

  • acquire_data_dir_lock(data_dir) - PID-based advisory lock
  • DataDirectoryLocked exception with clear error message suggesting HTTP mode
  • Handles stale locks from crashed processes (checks if PID is alive)
  • Cleanup via atexit and SIGTERM handler
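
For readers unfamiliar with the pattern, the acquisition and stale-lock logic described above can be sketched roughly as follows. This is a minimal illustration, not the code in openviking/utils/process_lock.py: the helper _pid_alive, the return value, and the one-argument exception are assumptions; only the names acquire_data_dir_lock, DataDirectoryLocked, and the .openviking.pid file name come from this PR. The liveness probe relies on POSIX semantics of os.kill(pid, 0) and must not be used as-is on Windows, and the read-then-write sequence is advisory rather than atomic.

```python
import os
from pathlib import Path

LOCK_NAME = ".openviking.pid"  # file name as stated in the commit message


class DataDirectoryLocked(RuntimeError):
    """Raised when a live process already holds the data directory."""


def _pid_alive(pid: int) -> bool:
    """POSIX-only probe (hypothetical helper): signal 0 tests for existence."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process: the lock is stale
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def acquire_data_dir_lock(data_dir: str) -> Path:
    """Write our PID to the lock file unless another live process owns it."""
    lock_path = Path(data_dir) / LOCK_NAME
    my_pid = os.getpid()
    try:
        owner = int(lock_path.read_text().strip())
    except (FileNotFoundError, ValueError):
        owner = None  # no lock yet, or unreadable contents: treat as free
    if owner is not None and owner != my_pid and _pid_alive(owner):
        raise DataDirectoryLocked(f"PID {owner} is already using {data_dir!r}")
    # Free, stale, or already ours: (re)write our own PID.
    lock_path.write_text(str(my_pid))
    return lock_path
```

Because the check and the write are separate steps, two processes starting at the same instant could both pass the check. That is acceptable for an early diagnostic, but it is not a substitute for a real mutual-exclusion primitive.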

Modified: openviking/service/core.py

  • Calls acquire_data_dir_lock() at the start of initialize(), before any storage access

New test: tests/misc/test_process_lock.py

  • Lock acquisition on empty directory
  • Same-PID reacquisition (idempotent)
  • Stale lock replacement (dead PID)
  • Live PID blocks with clear error message
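
The stale-lock and live-PID cases need a PID that is known to be dead and one that is known to be alive. One way to obtain both without mocking is sketched below; the helper names reaped_child_pid and live_child_pid are hypothetical, and the actual tests in tests/misc/test_process_lock.py may well use a different approach (for example, patching the liveness check).

```python
import contextlib
import subprocess
import sys
from typing import Iterator


def reaped_child_pid() -> int:
    """Return the PID of a child that has already exited and been waited on.

    The PID is dead at return time; the OS could in principle reuse it later,
    so a test should consume it promptly.
    """
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    child.wait()
    return child.pid


@contextlib.contextmanager
def live_child_pid() -> Iterator[int]:
    """Yield the PID of a sleeping child, then kill and reap it on exit."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    try:
        yield child.pid
    finally:
        child.kill()
        child.wait()
```

A test would write reaped_child_pid() into the lock file before acquiring (expecting success), and write the PID from live_child_pid() before acquiring (expecting DataDirectoryLocked).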

Error message when contention is detected

DataDirectoryLocked: Another OpenViking process (PID 12345) is already
using the data directory '/path/to/data'. Running multiple OpenViking
instances on the same data directory causes silent storage contention
and data corruption.

To fix this, use one of these approaches:
  1. Use HTTP mode: start a single openviking-server and connect
     via --transport http (recommended for multi-session hosts)
  2. Use separate data directories for each instance
  3. Stop the other process (PID 12345) first
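
As an illustration of how such a message might be assembled, here is a sketch of an exception class that formats the text above from the data directory and the owning PID. The constructor signature and the data_dir and pid attributes are assumptions made for this example; the PR does not show how DataDirectoryLocked is actually defined.

```python
class DataDirectoryLocked(RuntimeError):
    """Sketch only: carries the offending PID and path alongside the message."""

    def __init__(self, data_dir: str, pid: int) -> None:
        self.data_dir = data_dir  # hypothetical attribute
        self.pid = pid  # hypothetical attribute
        super().__init__(
            f"Another OpenViking process (PID {pid}) is already\n"
            f"using the data directory '{data_dir}'. Running multiple OpenViking\n"
            "instances on the same data directory causes silent storage contention\n"
            "and data corruption.\n"
            "\n"
            "To fix this, use one of these approaches:\n"
            "  1. Use HTTP mode: start a single openviking-server and connect\n"
            "     via --transport http (recommended for multi-session hosts)\n"
            "  2. Use separate data directories for each instance\n"
            f"  3. Stop the other process (PID {pid}) first"
        )
```

Keeping the PID and path as attributes lets a caller log or act on them without parsing the message text.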

Testing

  • All 4 test cases pass (lock acquire, reacquire, stale replacement, live detection)
  • ruff format --check and ruff check pass on all changed files
  • HTTP mode is unaffected (openviking-server handles concurrency natively via uvicorn workers)

Relates to #473

This contribution was developed with AI assistance (Claude Code).

When multiple OpenViking processes share the same data directory (common
with stdio MCP in multi-session hosts), they silently contend for AGFS
and VectorDB resources. This produces misleading errors like "Collection
'context' does not exist" instead of pointing to the actual cause.

Add a PID-based advisory lock in OpenVikingService.initialize() that:
- Detects an existing live process using the same data directory
- Raises DataDirectoryLocked with a clear error message explaining the
  contention and suggesting HTTP mode or separate data directories
- Cleans up stale lock files from crashed processes
- Releases the lock on normal exit via atexit

The lock uses a .openviking.pid file in the data directory. HTTP mode
(openviking-server) is unaffected since it handles concurrency natively.

Relates to volcengine#473
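
The cleanup path (atexit plus a SIGTERM handler, as listed under Changes) could look roughly like the sketch below. The function names release_lock and register_cleanup are hypothetical and not taken from the PR. Note that signal.signal can only be called from the main thread, and that the handler here replaces any SIGTERM handler installed earlier.

```python
import atexit
import os
import signal
from pathlib import Path


def release_lock(lock_path: Path) -> None:
    """Remove the lock file only if it still records this process's PID."""
    try:
        if lock_path.read_text().strip() == str(os.getpid()):
            lock_path.unlink()
    except FileNotFoundError:
        pass  # already removed, nothing to do


def register_cleanup(lock_path: Path) -> None:
    """Release the lock on normal interpreter exit and on SIGTERM."""
    atexit.register(release_lock, lock_path)

    def _on_sigterm(signum, frame):
        release_lock(lock_path)
        # Restore the default action and re-send the signal so the process
        # still terminates with the usual SIGTERM exit status.
        signal.signal(signal.SIGTERM, signal.SIG_DFL)
        os.kill(os.getpid(), signal.SIGTERM)

    signal.signal(signal.SIGTERM, _on_sigterm)
```

Checking the recorded PID before unlinking avoids deleting a lock that another process has legitimately taken over in the meantime. Neither hook runs on SIGKILL or a hard crash, which is why the stale-lock check at startup is still needed.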
@CLAassistant

CLAassistant commented Mar 14, 2026

CLA assistant check
All committers have signed the CLA.

@MaojiaSheng MaojiaSheng merged commit e56e245 into volcengine:main Mar 15, 2026
1 check passed
@github-project-automation github-project-automation bot moved this from Backlog to Done in OpenViking project Mar 15, 2026
