Skip to content

consolidation: fix 7 defects in PR #24168 LSP feature (orphan procs, patch_replace double-lint, opt-in defaults, +4 more)#24467

Closed
scubamount wants to merge 3 commits into
NousResearch:mainfrom
scubamount:lsp-consolidation-pr24168-fixes
Closed

consolidation: fix 7 defects in PR #24168 LSP feature (orphan procs, patch_replace double-lint, opt-in defaults, +4 more)#24467
scubamount wants to merge 3 commits into
NousResearch:mainfrom
scubamount:lsp-consolidation-pr24168-fixes

Conversation

@scubamount

Copy link
Copy Markdown
Contributor

Consolidation of PR #24168 (LSP semantic diagnostics) — 7 defect fixes

Builds on top of #24168 HEAD (7fa5657). Supersedes the competing #24155 (simpler/smaller approach to the same feature). Addresses every issue raised in the #24168 review thread plus three additional defects found during a holistic re-review.

Context

PR #24168 adds a substantial LSP integration (~4200 LOC across 2 commits) — full agent/lsp/ module spawning real language servers (pyright, gopls, rust-analyzer, etc.) as subprocesses, feeding semantic diagnostics into write_file/patch_replace post-write lint. Architecture is sound; the issues below are wiring/lifecycle/defaults bugs that would bite production users.

Defects fixed

# Defect Severity
D1 patch_replace double-lint silently drops every LSP diagnostic. The recursive write_file call rolls the baseline forward; the outer _check_lint_delta then sees an empty delta. Net effect: patch_replace would never surface a single LSP semantic error. Independently diagnosed in the #24168 bot review addendum. Critical
D2 Path-keyed _delta_baseline collides under concurrent gateway load — two chats editing the same path in different worktrees stomp each other's baseline. High
D3 Idle LSP subprocesses never reaped. Long-lived gateway accumulates one server per (language, workspace) forever (pyright ~200 MB, gopls ~80 MB, tsserver ~150 MB). DEFAULT_IDLE_TIMEOUT = 600 existed but was never wired to a reaper. High
D4 Workspace cache unbounded dict, not thread-safe. Long-lived process grows without ceiling; concurrent resolve can corrupt the dict. High
D5 Default lsp.enabled: true + install_strategy: auto triggers silent npm install pyright on first .py edit in a git repo — SOC 2 / ISO 27001 finding in audited environments. Also raised as a blocker in the #24168 bot review. High
D6 INSTALL_RECIPES unpinned (e.g. pyright@latest, gopls@latest). Supply-chain risk; same Dependabot discipline that applies to application deps applies to tools we shell out to. Medium
D7 Orphaned LSP subprocesses on hermes exit. Daemon thread death doesn't propagate SIGTERM to child servers — they live on holding 80–200 MB each until the kernel reaps them. Raised as blocker (1) in the #24168 bot review. High

Fix summary

  • D1: Token-keyed _delta_baseline (UUID per snapshot). patch_replace captures its own token before the write, passes it to write_file via _lsp_baseline_token=, and the inner _check_lint_delta runs with skip_lsp=True so the token survives to the outer post-write check.
  • D2: Same token-keyed map. Each snapshot returns a fresh token; no implicit global state for concurrent writers to collide on.
  • D3: Background _reaper_loop coroutine scheduled on _BackgroundLoop. _reap_idle() is the unit-testable single pass.
  • D4: OrderedDict LRU (cap 1024) + threading.Lock. Added public invalidate_workspace_cache(path) + clear_workspace_cache() for callers observing workspace topology changes (git init, worktree add/remove).
  • D5: Defaults flipped to lsp.enabled: false + install_strategy: "manual". Both opt-in. Existing-binary discovery still works in manual mode (servers on PATH).
  • D6: Every auto-install recipe pinned to a specific verified version. Inline CVE notes for yaml-language-server@1.23.0 (CVE-2026-33532, CVSS 4.3 low, DoS-only on deeply-nested YAML, server is sandboxed) and gopls v0.21.1 (CVE-2026-42503, CVSS 8.8 high — exploit gated on -listen/-port flags our default invocation doesn't pass).
  • D7: atexit.register(shutdown_service) on first successful get_service() + explicit shutdown_service() call wired into cli._run_cleanup. Belt + suspenders. shutdown_service is idempotent and swallows inner exceptions so an exit hook can never crash the process.

Test coverage

  • 204/204 unit tests pass (26 new regression tests in tests/agent/lsp/test_consolidation_regressions.py — one per defect, plus thread-safety + idempotency + ordering pins).
  • Live verification of D7: spawned a child Python process that imported the LSP service, opened a real pyright subprocess (PID 77138), and exited normally. After exit, no orphan pyright process survived. Atexit hook fired correctly.

Diffstat

agent/lsp/__init__.py                              |  24 +-
agent/lsp/install.py                               |  61 +-
agent/lsp/manager.py                               | 230 ++++++-
agent/lsp/workspace.py                             | 110 ++-
cli.py                                             |  10 +
hermes_cli/config.py                               |  29 +-
tests/agent/lsp/test_consolidation_regressions.py  | 902 ++++++++++++++++++++  (new)
tests/agent/lsp/test_service.py                    |  11 +-
tools/file_operations.py                           | 107 ++-
9 files changed, 1400 insertions(+), 84 deletions(-)

#24168 bot review cross-reference

Bot finding Resolution here
(1) Orphan LSP server processes (blocker) ✅ D7 — atexit + cli._run_cleanup, live-verified
(2) Enabled by default + auto-install too aggressive ✅ D5
(3) Five test_client_e2e.py failures Already passing 6/6 on this branch (stale finding)
(4) Competing #24155 This consolidation supersedes both
(5) Stale parent commits in #24168 Will be addressed when @teknium1 rebases #24168; not in scope here
Minor: dead DEFAULT_IDLE_TIMEOUT ✅ D3 — reaper is now live, constant is load-bearing
Addendum: patch_replace LSP double-call ✅ D1 — independently diagnosed, same fix shape

Open items, non-blocking

  • CVE notes inline for yaml-language-server and gopls — both bounded by deployment posture (sandboxed YAML parsing, no flags). Track upstream patches separately.
  • Pre-existing race on _service singleton in get_service() not introduced by this PR; low probability in practice (single-threaded tool dispatch in hermes main loop).

Verifying locally

git fetch https://github.com/scubamount/hermes-agent.git lsp-consolidation-pr24168-fixes
git checkout FETCH_HEAD
python -m pytest tests/agent/lsp/ tests/tools/test_file_operations.py \
                 tests/tools/test_file_operations_edge_cases.py \
                 tests/tools/test_file_write_safety.py

Expected: 204 passed.


cc @teknium1 (PR #24168 author) — this is positioned as a friendly consolidation, not a takeover; happy to close in favor of you pulling these patches into #24168 directly, or to keep this PR live as a stacked target. Whichever you prefer.

teknium1 and others added 3 commits May 11, 2026 21:28
…ile/patch

Wire ~26 language servers (pyright, gopls, rust-analyzer, typescript-language-server,
clangd, bash-language-server, ...) into the post-write lint check used by write_file
and patch. The model now sees type errors, undefined names, missing imports, and
project-wide semantic issues introduced by its edits, not just syntax errors.

LSP is gated on git workspace detection: when the agent's cwd or the file being
edited is inside a git worktree, LSP runs against that workspace; otherwise the
existing in-process syntax checks are the only tier. This keeps users on
user-home cwds (Telegram/Discord gateway chats) from spawning daemons.

The post-write check is layered: in-process syntax check first (microseconds),
then LSP semantic diagnostics second when syntax is clean. Diagnostics are
delta-filtered against a baseline captured at write start, so the agent only
sees errors its edit introduced. A flaky/missing language server can never
break a write -- every LSP failure path falls back silently to the syntax-only
result.

New module agent/lsp/ split into:

- protocol.py: Content-Length JSON-RPC framer + envelope helpers
- client.py: async LSPClient (spawn, initialize, didOpen/didChange,
  ContentModified retry, push/pull diagnostic stores)
- workspace.py: git worktree walk-up + per-server NearestRoot resolver
- servers.py: registry of 26 language servers (extension match,
  root resolver, spawn builder per language)
- install.py: auto-install dispatch (npm install --prefix, go install
  with GOBIN, pip install --target) into HERMES_HOME/lsp/bin/
- manager.py: LSPService (per-(server_id, root) client registry, lazy
  spawn, broken-set, in-flight dedupe, sync facade for tools layer)
- reporter.py: <diagnostics> block formatter (severity-1-only, 20-per-file)
- cli.py: hermes lsp {status,list,install,install-all,restart,which}

Wired into tools/file_operations.py:

- write_file/patch_replace now call _snapshot_lsp_baseline before write
- _check_lint_delta gains a third tier: LSP semantic diagnostics when
  syntax is clean
- All LSP code paths swallow exceptions; write_file's contract unchanged

Config: 'lsp' section in DEFAULT_CONFIG with enabled (default true),
wait_mode, wait_timeout, install_strategy (default 'auto'), and per-server
overrides (disabled, command, env, initialization_options).

Tests: tests/agent/lsp/ -- 49 tests covering protocol framing (encode and
read_message round-trip, EOF/truncation/missing Content-Length), workspace
gate (git walk-up, exclude markers, fallback to file location), reporter
(severity filter, max-per-file cap, truncation), service-level delta filter,
and an in-process mock LSP server that exercises the full client lifecycle
including didChange version bumps, dedup, crash recovery, and idempotent
teardown.

Live E2E verified end-to-end through ShellFileOperations: pyright
auto-installed via npm into HERMES_HOME, baseline captured, type error
introduced, single delta diagnostic surfaced with correct line/column/code/
source, then patch fix removes the diagnostic from the output.

Docs: new website/docs/user-guide/features/lsp.md page covering supported
languages, configuration knobs, performance characteristics, and
troubleshooting; cli-commands.md updated with the 'hermes lsp' reference;
sidebar updated.
Cherry-picks the substantive ideas from #24155 (different scope, same
problem space) onto our PR.

agent/lsp/eventlog.py (new): dedicated structured logger
``hermes.lint.lsp`` with steady-state silence. Module-level dedup sets
keep a 1000-write session at exactly ONE INFO line ("active for
<root>") at the default INFO threshold; clean writes log at DEBUG so
they never reach agent.log under normal config. State transitions
(server starts, no project root for a file, server unavailable) fire
at INFO/WARNING once per (server_id, key); novel events (timeouts,
unexpected errors) fire WARNING per call. Grep recipe: ``rg 'lsp\\['``.

agent/lsp/manager.py: wire the eventlog into _get_or_spawn and
get_diagnostics_sync so users can answer "did LSP fire on this edit?"
with a single grep, plus surface "binary not on PATH" warnings once
instead of silently retrying every write.

tools/file_operations.py: backend-type gate. ``_lsp_local_only()``
returns False for non-local backends (Docker / Modal / SSH /
Daytona); ``_snapshot_lsp_baseline`` and ``_maybe_lsp_diagnostics``
now skip entirely on remote envs. The host-side language server
can't see files inside a sandbox, so this prevents pretending to
lint a file the host process can't open.

agent/lsp/protocol.py: 8 KiB cap on the header block in
``read_message``. A pathological server that streams headers
without ever emitting CRLF-CRLF would have looped forever consuming
bytes; now raises ``LSPProtocolError`` instead.

agent/lsp/workspace.py: 64-step cap on ``find_git_worktree`` and
``nearest_root`` upward walks, plus try/except containment around
``Path(...).resolve()`` and child ``.exists()`` calls. Defensive
against pathological inputs (symlink loops, encoding errors,
permission failures mid-walk) — the lint hook is hot-path code and
must never raise.

Tests:
- tests/agent/lsp/test_eventlog.py: 18 tests covering steady-state
  silence (clean writes stay DEBUG), state-transition INFO-once
  semantics (active for, no project root), action-required
  WARNING-once (server unavailable), per-call WARNING (timeouts,
  spawn failures), and the "1000 clean writes => 1 INFO" contract.
- tests/agent/lsp/test_backend_gate.py: 5 tests verifying
  _lsp_local_only / snapshot_baseline / maybe_lsp_diagnostics skip
  the LSP layer for non-local backends and route correctly for
  LocalEnvironment.
- tests/agent/lsp/test_protocol.py: new test_read_message_rejects_runaway_header
  exercising the 8 KiB cap.

Validation:
- 73/73 LSP tests pass (49 original + 18 eventlog + 5 backend-gate + 1 framer cap)
- 198/198 pass when run alongside existing file_operations tests
- Live E2E re-run with pyright still surfaces "ERROR [2:12] Type
  ... reportReturnType (Pyright)" through the full path, then patch
  fix removes it on the next call.
Consolidates fixes for issues found during a holistic review of the
LSP write_file/patch integration. Builds on top of PR #24168 HEAD
(7fa5657). Six defects from initial review + one (D7) raised by the
bot reviewer comment thread are addressed below. Bot reviewer issues
(orphan subprocesses, default-enabled aggressiveness, test failures)
are all resolved or stale; addendum patch_replace bug independently
diagnosed with same fix shape.

Defects fixed
=============

D1 patch_replace double-lint silently drops every LSP diagnostic
    `write_file` rolls the baseline forward; the outer
    `patch_replace._check_lint_delta` then sees an empty delta.
    Fix: token-keyed `_delta_baseline`; `patch_replace` captures the
    token, passes it to `write_file` via `_lsp_baseline_token=`, and
    the inner `_check_lint_delta` runs with `skip_lsp=True` so the
    token survives to the outer post-write check.

D2 Path-keyed _delta_baseline collides under concurrent gateway load
    Two chats editing the same path in different worktrees stomp each
    other's baseline. Fix: same token-keyed map (UUID per snapshot).

D3 Idle LSP subprocesses never reaped
    Long-lived gateway accumulates one LSP per (language, workspace)
    forever (pyright ~200MB, gopls ~80MB, tsserver ~150MB). Fix:
    background `_reaper_loop` coroutine scheduled on the
    `_BackgroundLoop`. `_reap_idle()` is the unit-testable single
    pass.

D4 Workspace cache unbounded + not thread-safe
    Long-lived process grows without ceiling; concurrent resolve can
    corrupt the dict. Fix: `OrderedDict` LRU bounded at 1024 +
    `threading.Lock`. Added public `invalidate_workspace_cache(path)`
    and `clear_workspace_cache()` for callers that observe workspace
    topology changes.

D5 Default install_strategy=auto triggers silent installs
    SOC 2 / ISO 27001 finding: first .py edit in a git repo triggers
    `npm install pyright` with no consent. Fix: defaults are
    `lsp.enabled: false` + `install_strategy: manual`. Both are opt-in.

D6 INSTALL_RECIPES unpinned (pyright@latest, gopls@latest)
    Supply-chain risk. Fix: every auto-install recipe pinned to a
    specific verified version. CVE notes inline for yaml-language-server
    (yaml@2.7.1 CVE-2026-33532, CVSS 4.3 low) and gopls v0.21.1
    (CVE-2026-42503, CVSS 8.8 — exploit gated on `-listen`/`-port`
    flags our default invocation doesn't use).

D7 Orphan LSP subprocesses on hermes exit
    Daemon thread death doesn't propagate SIGTERM to child language
    servers. Fix: `atexit.register(shutdown_service)` on first
    `get_service()` success + explicit `shutdown_service()` call in
    `cli._run_cleanup`. Belt + suspenders. Live-verified: child
    process spawned pyright, exited normally, atexit fired, no orphan.

Test coverage
=============

204/204 unit tests pass (26 new regression tests pin every defect).
1 live verification (D7 orphan check against real pyright).

Files
=====

agent/lsp/__init__.py        | 24  (atexit hook + idempotent shutdown)
agent/lsp/install.py         | 61  (pinned recipes + docstring fix)
agent/lsp/manager.py         | 230 (token-keyed baseline, reaper, concurrency)
agent/lsp/workspace.py       | 110 (LRU cache + lock + invalidation API)
cli.py                       | 10  (LSP shutdown in _run_cleanup)
hermes_cli/config.py         | 29  (opt-in defaults + audit-compliance comments)
tests/agent/lsp/test_service.py | 11 (token-aware delta test)
tools/file_operations.py     | 107 (3-tier lint + patch_replace D1 fix)
tests/agent/lsp/test_consolidation_regressions.py | 902 (new, regression pins)
@kshitijk4poor

Copy link
Copy Markdown
Collaborator

Hi @scubamount — thanks for the holistic review and the detailed defect-by-defect writeup. We went through each of D1–D7 against current main and want to share what we found before deciding what to do with this PR.

Verification against origin/main:

# Claim Verified on main? Notes
D1 patch_replace double-lint silently drops every LSP diagnostic ❌ False on current main patch_replace calls write_file once, which captures the LSP baseline before writing and surfaces diagnostics via write_result.lsp_diagnostics, propagated into PatchResult at tools/file_operations.py:1084. The second _check_lint_delta in patch_replace is the syntax delta, not LSP — they're intentionally separate channels (see file_operations.py:1208 docstring). E2E'd with a fake LSP service: snapshot called once, get_diagnostics called once with delta=True, diagnostics propagate correctly to the final result.
D2 Path-keyed _delta_baseline collides under concurrent gateway load ⚠️ Theoretical _service is process-wide singleton, abs-path keyed. Realistic only when in-process gateway dispatches concurrent tools on the same path — Hermes' tool dispatch is single-threaded inside the main loop, so the collision window is narrow in practice. Not blocking.
D3 Idle LSP subprocesses never reaped ✅ Real Filed as #25016 with the focused reaper fix.
D4 Workspace cache unbounded dict, not thread-safe ⚠️ Partial The cache is plain dict with no lock, but Hermes' tool dispatch is single-threaded. Real concern only on long-lived processes touching many distinct paths; can be addressed alongside #25016. Not blocking.
D5 enabled: True + install_strategy: "auto" → silent npm install on first .py edit ✅ Real Filed as #25015 with the opt-in defaults fix.
D6 INSTALL_RECIPES unpinned (@latest) ✅ Real Filed as #25017 for the pinning work.
D7 Orphaned LSP subprocesses on hermes exit (atexit not wired) ❌ Already fixed on main atexit.register(_atexit_shutdown) lives at agent/lsp/__init__.py:75. The "belt + suspenders" extra cli._run_cleanup hook would be redundant.

On the PR itself:

The framing is "fix 7 defects" and the description lists narrow, surgical changes — but the actual diff is +531/-457 in non-test scope across the LSP module. It also removes the lsp_diagnostics field from WriteResult / PatchResult (a public contract change consumed by the agent and downstream parsers) and rewrites significant chunks of manager.py. With D1 turning out to be wrong on current main and D7 already fixed, the bundled rewrite is doing more than the description claims, and we'd rather not merge that as a single unit.

What we're doing:

The three verified-real findings (D3, D5, D6) are now standalone issues — #25016, #25015, #25017 — each with the original analysis credited to you and a focused fix scope. We'd love a follow-up PR from you against any of those if you're up for it; the fix shape you proposed for D3 in particular (background reaper on _BackgroundLoop with _reap_idle() as the unit-testable single pass) is exactly what we'd want to see, and it'd land cleanly on its own.

Closing this PR in favor of the split-out issues. The non-bundled findings are all credited to you in the issue bodies. Thanks again for the holistic review — D5 and D3 in particular are real wins worth landing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder comp/tools Tool registry, model_tools, toolsets P2 Medium — degraded but workaround exists tool/file File tools (read, write, patch, search) type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants