cli-daemon: spawn race under concurrent CLI calls — calls fork duplicate daemons, each rescans the index, orphans churn CPU afterward #592

@justrach

Version: codedb 0.2.5824, macOS (Darwin 25.5.0), repo: openclaw (~19.7k files)

Summary

When many codedb <root> <cmd> CLI processes start concurrently and the daemon is not fully settled, several of them fork their own cli-daemon instead of connecting to the existing one. Each duplicate daemon rescans and rebuilds the index, and the concurrent calls block behind those rebuilds. The duplicates stay alive afterward (PPID 1), churning CPU and destabilizing later calls, which retriggers the race on the next burst.

Measured

engram runs batches of independent deps/search/glob calls through one /bin/sh with N background lanes against a warm daemon:

  • N = 16 concurrent: stable. A 15-query benchmark runs in ~0.45s and the daemon count stays at 1.
  • N = 32 concurrent: stampede. The same benchmark takes ~22s (reproducible: 22.1/22.7/23.4s), and ps during the run shows the daemon count climbing (observed 3 → 7 within one run). At one point there were 9 orphaned cli-daemon processes for the same root, aged 1–3 min, each at 10–23% CPU long after their spawning calls had exited.

Snapshot mid-stampede (all same root):

86330   11.7%  codedb /Users/.../openclaw cli-daemon
86519    1.5%  codedb /Users/.../openclaw cli-daemon
86622    0.0%  codedb /Users/.../openclaw cli-daemon
86957   15.7%  codedb /Users/.../openclaw cli-daemon
86963   10.8%  codedb /Users/.../openclaw cli-daemon
...        (PPID 1 — orphaned; spawning CLI calls long gone)
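
A snapshot like the one above can be collected repeatably with a small script. The following is a rough sketch, not part of codedb or engram: the helper names (find_daemons, orphaned, snapshot) are hypothetical, and it assumes the command line has the shape `codedb <root> cli-daemon` shown in the snapshot and that the root path contains no spaces.

```python
import subprocess


def find_daemons(ps_text, root):
    """Return (pid, ppid, cpu) for every cli-daemon line matching root.

    Expects lines of the form "<pid> <ppid> <%cpu> <command...>".
    Assumption: the root path contains no whitespace.
    """
    found = []
    for line in ps_text.splitlines():
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue
        pid, ppid, cpu, command = parts
        argv = command.split()
        if (
            len(argv) >= 3
            and argv[0].endswith("codedb")
            and argv[1] == root
            and argv[2] == "cli-daemon"
        ):
            found.append((int(pid), int(ppid), float(cpu)))
    return found


def orphaned(daemons):
    """Keep only daemons that have been reparented to PID 1."""
    return [d for d in daemons if d[1] == 1]


def snapshot(root):
    """Run ps once and return the cli-daemon processes for root."""
    # Each field gets its own -o so the empty header ("=") applies per field.
    out = subprocess.run(
        ["ps", "-ax", "-o", "pid=", "-o", "ppid=", "-o", "pcpu=", "-o", "command="],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return find_daemons(out, root)
```

Calling len(snapshot(root)) in a loop during a batch gives the daemon count over time, and orphaned(snapshot(root)) lists the survivors once the spawning calls have exited.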

Once poisoned, even single serial calls intermittently pay a full ~15s rebuild until the orphans are killed (pkill -f cli-daemon) and one fresh daemon settles.

Expected

Daemon spawn should be mutually exclusive per project (lockfile / atomic socket bind): exactly one daemon ever spawns, racing callers wait on the winner, and a daemon that loses the race exits instead of lingering.
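
One way to get that mutual exclusion is an advisory file lock taken before the daemon starts any indexing work. The sketch below is illustrative only and is not codedb's implementation: lock_path_for and try_become_daemon are hypothetical names, and the lock file location (the system temp directory, keyed by a hash of the root) is an assumption. It relies on flock(2), so it applies to Unix-like systems such as the macOS host in this report.

```python
import fcntl
import hashlib
import os
import tempfile


def lock_path_for(root):
    """Hypothetical per-project lock path derived from the resolved root."""
    digest = hashlib.sha256(os.path.realpath(root).encode()).hexdigest()[:16]
    return os.path.join(tempfile.gettempdir(), f"cli-daemon-{digest}.lock")


def try_become_daemon(root):
    """Try to claim the daemon role for root.

    Returns an open file descriptor holding the exclusive lock if this
    process won the race, or None if another holder already has it.
    The winner must keep the descriptor open for the daemon's lifetime.
    """
    fd = os.open(lock_path_for(root), os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Non-blocking: a loser finds out immediately instead of queueing.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd
```

A process that gets None would exit (if it is a would-be daemon) or wait and connect to the winner (if it is a CLI call) rather than scanning the index itself. Because the kernel drops the lock when the holder exits, a crashed daemon does not leave a stale lock behind.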

Workaround

We cap engram's batch concurrency at 16 and treat it as a hard ceiling, but the safe number is presumably machine/repo dependent — a proper spawn lock would remove the cliff entirely.
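
For anyone needing the same stopgap, a concurrency cap can be sketched as a bounded worker pool. This is not engram's actual code: run_batch and MAX_LANES are hypothetical names, and 16 is only the value that happened to be stable on the machine and repo described here.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Ceiling that was stable in this report; likely machine/repo dependent.
MAX_LANES = 16


def run_batch(commands, max_lanes=MAX_LANES):
    """Run each argv list with at most max_lanes processes in flight.

    Returns the exit codes in the same order as the input commands.
    """
    def run_one(argv):
        return subprocess.run(argv, capture_output=True).returncode

    with ThreadPoolExecutor(max_workers=max_lanes) as pool:
        return list(pool.map(run_one, commands))
```

Each entry in commands would be a full argv list for one codedb <root> <cmd> call. The cap only lowers the odds of hitting the race; it does not prevent two callers from spawning at the same moment.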

🤖 Generated with Claude Code
