consolidation: fix 7 defects in PR #24168 LSP feature (orphan procs, patch_replace double-lint, opt-in defaults, +4 more)#24467
Conversation
…ile/patch
Wire ~26 language servers (pyright, gopls, rust-analyzer, typescript-language-server,
clangd, bash-language-server, ...) into the post-write lint check used by write_file
and patch. The model now sees type errors, undefined names, missing imports, and
project-wide semantic issues introduced by its edits, not just syntax errors.
LSP is gated on git workspace detection: when the agent's cwd or the file being
edited is inside a git worktree, LSP runs against that workspace; otherwise the
existing in-process syntax checks are the only tier. This keeps users on
user-home cwds (Telegram/Discord gateway chats) from spawning daemons.
The post-write check is layered: in-process syntax check first (microseconds),
then LSP semantic diagnostics second when syntax is clean. Diagnostics are
delta-filtered against a baseline captured at write start, so the agent only
sees errors its edit introduced. A flaky/missing language server can never
break a write -- every LSP failure path falls back silently to the syntax-only
result.
New module agent/lsp/ split into:
- protocol.py: Content-Length JSON-RPC framer + envelope helpers
- client.py: async LSPClient (spawn, initialize, didOpen/didChange,
ContentModified retry, push/pull diagnostic stores)
- workspace.py: git worktree walk-up + per-server NearestRoot resolver
- servers.py: registry of 26 language servers (extension match,
root resolver, spawn builder per language)
- install.py: auto-install dispatch (npm install --prefix, go install
with GOBIN, pip install --target) into HERMES_HOME/lsp/bin/
- manager.py: LSPService (per-(server_id, root) client registry, lazy
spawn, broken-set, in-flight dedupe, sync facade for tools layer)
- reporter.py: <diagnostics> block formatter (severity-1-only, 20-per-file)
- cli.py: hermes lsp {status,list,install,install-all,restart,which}
Wired into tools/file_operations.py:
- write_file/patch_replace now call _snapshot_lsp_baseline before write
- _check_lint_delta gains a third tier: LSP semantic diagnostics when
syntax is clean
- All LSP code paths swallow exceptions; write_file's contract unchanged
Config: 'lsp' section in DEFAULT_CONFIG with enabled (default true),
wait_mode, wait_timeout, install_strategy (default 'auto'), and per-server
overrides (disabled, command, env, initialization_options).
Tests: tests/agent/lsp/ -- 49 tests covering protocol framing (encode and
read_message round-trip, EOF/truncation/missing Content-Length), workspace
gate (git walk-up, exclude markers, fallback to file location), reporter
(severity filter, max-per-file cap, truncation), service-level delta filter,
and an in-process mock LSP server that exercises the full client lifecycle
including didChange version bumps, dedup, crash recovery, and idempotent
teardown.
Live E2E verified end-to-end through ShellFileOperations: pyright
auto-installed via npm into HERMES_HOME, baseline captured, type error
introduced, single delta diagnostic surfaced with correct line/column/code/
source, then patch fix removes the diagnostic from the output.
Docs: new website/docs/user-guide/features/lsp.md page covering supported
languages, configuration knobs, performance characteristics, and
troubleshooting; cli-commands.md updated with the 'hermes lsp' reference;
sidebar updated.
Cherry-picks the substantive ideas from #24155 (different scope, same problem space) onto our PR. agent/lsp/eventlog.py (new): dedicated structured logger ``hermes.lint.lsp`` with steady-state silence. Module-level dedup sets keep a 1000-write session at exactly ONE INFO line ("active for <root>") at the default INFO threshold; clean writes log at DEBUG so they never reach agent.log under normal config. State transitions (server starts, no project root for a file, server unavailable) fire at INFO/WARNING once per (server_id, key); novel events (timeouts, unexpected errors) fire WARNING per call. Grep recipe: ``rg 'lsp\\['``. agent/lsp/manager.py: wire the eventlog into _get_or_spawn and get_diagnostics_sync so users can answer "did LSP fire on this edit?" with a single grep, plus surface "binary not on PATH" warnings once instead of silently retrying every write. tools/file_operations.py: backend-type gate. ``_lsp_local_only()`` returns False for non-local backends (Docker / Modal / SSH / Daytona); ``_snapshot_lsp_baseline`` and ``_maybe_lsp_diagnostics`` now skip entirely on remote envs. The host-side language server can't see files inside a sandbox, so this prevents pretending to lint a file the host process can't open. agent/lsp/protocol.py: 8 KiB cap on the header block in ``read_message``. A pathological server that streams headers without ever emitting CRLF-CRLF would have looped forever consuming bytes; now raises ``LSPProtocolError`` instead. agent/lsp/workspace.py: 64-step cap on ``find_git_worktree`` and ``nearest_root`` upward walks, plus try/except containment around ``Path(...).resolve()`` and child ``.exists()`` calls. Defensive against pathological inputs (symlink loops, encoding errors, permission failures mid-walk) — the lint hook is hot-path code and must never raise. Tests: - tests/agent/lsp/test_eventlog.py: 18 tests covering steady-state silence (clean writes stay DEBUG), state-transition INFO-once semantics (active for, no project root), action-required WARNING-once (server unavailable), per-call WARNING (timeouts, spawn failures), and the "1000 clean writes => 1 INFO" contract. - tests/agent/lsp/test_backend_gate.py: 5 tests verifying _lsp_local_only / snapshot_baseline / maybe_lsp_diagnostics skip the LSP layer for non-local backends and route correctly for LocalEnvironment. - tests/agent/lsp/test_protocol.py: new test_read_message_rejects_runaway_header exercising the 8 KiB cap. Validation: - 73/73 LSP tests pass (49 original + 18 eventlog + 5 backend-gate + 1 framer cap) - 198/198 pass when run alongside existing file_operations tests - Live E2E re-run with pyright still surfaces "ERROR [2:12] Type ... reportReturnType (Pyright)" through the full path, then patch fix removes it on the next call.
Consolidates fixes for issues found during a holistic review of the LSP write_file/patch integration. Builds on top of PR #24168 HEAD (7fa5657). Six defects from initial review + one (D7) raised by the bot reviewer comment thread are addressed below. Bot reviewer issues (orphan subprocesses, default-enabled aggressiveness, test failures) are all resolved or stale; addendum patch_replace bug independently diagnosed with same fix shape. Defects fixed ============= D1 patch_replace double-lint silently drops every LSP diagnostic `write_file` rolls the baseline forward; the outer `patch_replace._check_lint_delta` then sees an empty delta. Fix: token-keyed `_delta_baseline`; `patch_replace` captures the token, passes it to `write_file` via `_lsp_baseline_token=`, and the inner `_check_lint_delta` runs with `skip_lsp=True` so the token survives to the outer post-write check. D2 Path-keyed _delta_baseline collides under concurrent gateway load Two chats editing the same path in different worktrees stomp each other's baseline. Fix: same token-keyed map (UUID per snapshot). D3 Idle LSP subprocesses never reaped Long-lived gateway accumulates one LSP per (language, workspace) forever (pyright ~200MB, gopls ~80MB, tsserver ~150MB). Fix: background `_reaper_loop` coroutine scheduled on the `_BackgroundLoop`. `_reap_idle()` is the unit-testable single pass. D4 Workspace cache unbounded + not thread-safe Long-lived process grows without ceiling; concurrent resolve can corrupt the dict. Fix: `OrderedDict` LRU bounded at 1024 + `threading.Lock`. Added public `invalidate_workspace_cache(path)` and `clear_workspace_cache()` for callers that observe workspace topology changes. D5 Default install_strategy=auto triggers silent installs SOC 2 / ISO 27001 finding: first .py edit in a git repo triggers `npm install pyright` with no consent. Fix: defaults are `lsp.enabled: false` + `install_strategy: manual`. Both are opt-in. D6 INSTALL_RECIPES unpinned (pyright@latest, gopls@latest) Supply-chain risk. Fix: every auto-install recipe pinned to a specific verified version. CVE notes inline for yaml-language-server (yaml@2.7.1 CVE-2026-33532, CVSS 4.3 low) and gopls v0.21.1 (CVE-2026-42503, CVSS 8.8 — exploit gated on `-listen`/`-port` flags our default invocation doesn't use). D7 Orphan LSP subprocesses on hermes exit Daemon thread death doesn't propagate SIGTERM to child language servers. Fix: `atexit.register(shutdown_service)` on first `get_service()` success + explicit `shutdown_service()` call in `cli._run_cleanup`. Belt + suspenders. Live-verified: child process spawned pyright, exited normally, atexit fired, no orphan. Test coverage ============= 204/204 unit tests pass (26 new regression tests pin every defect). 1 live verification (D7 orphan check against real pyright). Files ===== agent/lsp/__init__.py | 24 (atexit hook + idempotent shutdown) agent/lsp/install.py | 61 (pinned recipes + docstring fix) agent/lsp/manager.py | 230 (token-keyed baseline, reaper, concurrency) agent/lsp/workspace.py | 110 (LRU cache + lock + invalidation API) cli.py | 10 (LSP shutdown in _run_cleanup) hermes_cli/config.py | 29 (opt-in defaults + audit-compliance comments) tests/agent/lsp/test_service.py | 11 (token-aware delta test) tools/file_operations.py | 107 (3-tier lint + patch_replace D1 fix) tests/agent/lsp/test_consolidation_regressions.py | 902 (new, regression pins)
|
Hi @scubamount — thanks for the holistic review and the detailed defect-by-defect writeup. We went through each of D1–D7 against current Verification against
On the PR itself: The framing is "fix 7 defects" and the description lists narrow, surgical changes — but the actual diff is +531/-457 in non-test scope across the LSP module. It also removes the What we're doing: The three verified-real findings (D3, D5, D6) are now standalone issues — #25016, #25015, #25017 — each with the original analysis credited to you and a focused fix scope. We'd love a follow-up PR from you against any of those if you're up for it; the fix shape you proposed for D3 in particular (background reaper on Closing this PR in favor of the split-out issues. The non-bundled findings are all credited to you in the issue bodies. Thanks again for the holistic review — D5 and D3 in particular are real wins worth landing. |
Consolidation of PR #24168 (LSP semantic diagnostics) — 7 defect fixes
Builds on top of #24168 HEAD (7fa5657). Supersedes the competing #24155 (simpler/smaller approach to the same feature). Addresses every issue raised in the #24168 review thread plus three additional defects found during a holistic re-review.
Context
PR #24168 adds a substantial LSP integration (~4200 LOC across 2 commits) — full
agent/lsp/module spawning real language servers (pyright, gopls, rust-analyzer, etc.) as subprocesses, feeding semantic diagnostics intowrite_file/patch_replacepost-write lint. Architecture is sound; the issues below are wiring/lifecycle/defaults bugs that would bite production users.Defects fixed
patch_replacedouble-lint silently drops every LSP diagnostic. The recursivewrite_filecall rolls the baseline forward; the outer_check_lint_deltathen sees an empty delta. Net effect:patch_replacewould never surface a single LSP semantic error. Independently diagnosed in the #24168 bot review addendum._delta_baselinecollides under concurrent gateway load — two chats editing the same path in different worktrees stomp each other's baseline.(language, workspace)forever (pyright ~200 MB, gopls ~80 MB, tsserver ~150 MB).DEFAULT_IDLE_TIMEOUT = 600existed but was never wired to a reaper.dict, not thread-safe. Long-lived process grows without ceiling; concurrent resolve can corrupt the dict.lsp.enabled: true+install_strategy: autotriggers silentnpm install pyrighton first.pyedit in a git repo — SOC 2 / ISO 27001 finding in audited environments. Also raised as a blocker in the #24168 bot review.INSTALL_RECIPESunpinned (e.g.pyright@latest,gopls@latest). Supply-chain risk; same Dependabot discipline that applies to application deps applies to tools we shell out to.Fix summary
_delta_baseline(UUID per snapshot).patch_replacecaptures its own token before the write, passes it towrite_filevia_lsp_baseline_token=, and the inner_check_lint_deltaruns withskip_lsp=Trueso the token survives to the outer post-write check._reaper_loopcoroutine scheduled on_BackgroundLoop._reap_idle()is the unit-testable single pass.OrderedDictLRU (cap 1024) +threading.Lock. Added publicinvalidate_workspace_cache(path)+clear_workspace_cache()for callers observing workspace topology changes (git init, worktree add/remove).lsp.enabled: false+install_strategy: "manual". Both opt-in. Existing-binary discovery still works in manual mode (servers on PATH).yaml-language-server@1.23.0(CVE-2026-33532, CVSS 4.3 low, DoS-only on deeply-nested YAML, server is sandboxed) andgopls v0.21.1(CVE-2026-42503, CVSS 8.8 high — exploit gated on-listen/-portflags our default invocation doesn't pass).atexit.register(shutdown_service)on first successfulget_service()+ explicitshutdown_service()call wired intocli._run_cleanup. Belt + suspenders.shutdown_serviceis idempotent and swallows inner exceptions so an exit hook can never crash the process.Test coverage
tests/agent/lsp/test_consolidation_regressions.py— one per defect, plus thread-safety + idempotency + ordering pins).Diffstat
#24168 bot review cross-reference
DEFAULT_IDLE_TIMEOUTpatch_replaceLSP double-callOpen items, non-blocking
yaml-language-serverandgopls— both bounded by deployment posture (sandboxed YAML parsing, no flags). Track upstream patches separately._servicesingleton inget_service()not introduced by this PR; low probability in practice (single-threaded tool dispatch in hermes main loop).Verifying locally
git fetch https://github.com/scubamount/hermes-agent.git lsp-consolidation-pr24168-fixes git checkout FETCH_HEAD python -m pytest tests/agent/lsp/ tests/tools/test_file_operations.py \ tests/tools/test_file_operations_edge_cases.py \ tests/tools/test_file_write_safety.pyExpected: 204 passed.
cc @teknium1 (PR #24168 author) — this is positioned as a friendly consolidation, not a takeover; happy to close in favor of you pulling these patches into #24168 directly, or to keep this PR live as a stacked target. Whichever you prefer.