Skip to content

feat(lsp): ship LSP diagnostics as plugin with zero core changes#24414

Closed
daimon-nous[bot] wants to merge 6 commits into
mainfrom
lsp-plugin
Closed

feat(lsp): ship LSP diagnostics as plugin with zero core changes#24414
daimon-nous[bot] wants to merge 6 commits into
mainfrom
lsp-plugin

Conversation

@daimon-nous

@daimon-nous daimon-nous Bot commented May 12, 2026

Copy link
Copy Markdown
Contributor

Summary

write_file and patch now surface real semantic diagnostics — type errors, undefined names, missing imports — from 26 language servers (pyright, gopls, rust-analyzer, typescript-language-server, clangd, etc.). Shipped as plugins/lsp/, zero core code touched.

Why a plugin (not core)

PR #24168 hardwired 4200 LOC into agent/lsp/, tools/file_operations.py, hermes_cli/config.py, and hermes_cli/main.py. That baked LSP into every user's hot path whether they wanted it or not, auto-installed npm packages without consent, and had no process cleanup on exit.

The plugin approach:

  • Opt-in: users explicitly enable via plugins.enabled — no surprise background processes
  • Zero core changes: wired entirely through existing hooks (pre_tool_call, transform_tool_result, on_session_end)
  • Lifecycle managed: servers shut down on session end + atexit (no orphaned processes)
  • Detect-only default: finds servers already on PATH, never auto-downloads unless the user runs hermes lsp install

What the agent sees

{
  "bytes_written": 42,
  "dirs_created": false,
  "lsp_diagnostics": "<diagnostics file=\"foo.py\">\nERROR [2:12] Type \"str\" is not assignable to return type \"int\" [reportReturnType] (Pyright)\n</diagnostics>"
}

Only diagnostics introduced by this edit — pre-existing errors are filtered via a baseline snapshot captured before the write.

Changes

  • plugins/lsp/ (11 modules, ~4800 LOC): async LSP client, JSON-RPC protocol framer, server registry, workspace resolver, binary installer, structured eventlog, reporter, CLI
  • tests/plugins/lsp/ (10 files, 77 tests): protocol, workspace, reporter, eventlog, client E2E against mock server, backend gate, full hook-flow integration
  • website/docs/user-guide/features/lsp.md: feature docs (setup, CLI, 26 languages, troubleshooting)
  • Core: 0 lines changed

Hook wiring

Hook What it does
pre_tool_call On write_file/patch — resolve path, check git workspace + local backend, snapshot LSP baseline
transform_tool_result After write — fetch diagnostics (3s timeout), compute delta, inject lsp_diagnostics field into result JSON
on_session_end + atexit Kill all spawned language server subprocesses

Enable + use

# ~/.hermes/config.yaml
plugins:
  enabled:
    - lsp
hermes lsp status           # what's installed
hermes lsp install pyright  # install into $HERMES_HOME/lsp/bin/
hermes lsp install gopls

Validation

Result
pyright E2E reportReturnType error surfaced in 0.3s
typescript-language-server E2E TS2322 surfaced in 1.2s
Plugin disabled Zero behavioral change, no processes spawned
Outside git workspace LSP dormant
Docker/SSH backend Skipped gracefully
Session end cleanup All servers killed (ps aux verified)
tests/plugins/lsp/ 77/77 pass
tests/tools/ No regressions (51/51)
Core diff 0 lines
hermes lsp status (cold shell) Works, 26 servers listed
Claude Code /review + /simplify All findings addressed

Credit

Salvages the LSP implementation from #24168 by @teknium1 (client, protocol, servers, workspace, install, reporter modules are their code relocated to the plugin) and incorporates design decisions from #24155 by @OutThisLife (opt-in default, detect-only binary resolution).

Closes #24168. Closes #24155.

Co-authored-by: Teknium 127238744+teknium1@users.noreply.github.com

alt-glitch and others added 4 commits May 12, 2026 13:01
Ship LSP semantic diagnostics as a bundled plugin (plugins/lsp/) using
existing hook system.  Zero lines of core code modified.

Plugin wiring:
- pre_tool_call: capture LSP baseline before write_file/patch
- transform_tool_result: inject diagnostics into tool result JSON
- on_session_start/on_session_end + atexit: lifecycle management

Key design:
- Baselines keyed by (session_id, abs_path) for concurrent safety
- Diagnostics added as 'lsp_diagnostics' JSON field (preserves shape)
- Per-file workspace detection (no static session-start gate)
- V4A multi-file patch skipped for MVP
- Short timeout (3s) — cold start degrades gracefully
- os.path.exists heuristic for Docker/SSH backend skip
- First relevant write with no server → INFO log with install hint

Tests: 77/77 pass including:
- Protocol framing, reporter formatting, workspace resolution
- Client E2E against mock LSP server (live_system_guard_bypass)
- Eventlog steady-state silence contract
- Backend-gate heuristic (local vs non-local paths)
- Full hook flow integration (pre→write→transform with diagnostics)

Source: PR #24168 by @teknium1, PR #24155 by @OutThisLife
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
…ration

register_cli_command's setup_fn receives an already-created parser,
not the parent's SubParsersAction. Added setup_lsp_parser() that adds
subcommands (status, list, install, restart, which) to the provided
parser.

Verified: 'hermes lsp status' works from cold shell when plugin is
enabled in plugins.enabled config.
Fixes from Claude Code adversarial review:
- Snapshot _service to local var before .is_active() (TOCTOU fix)
- Guard session_id against None with 'or ""'
- Remove text-append fallback — only inject when result is dict JSON
- Add ValueError to json.dumps except clause
- Guard result=None with 'or ""' and isinstance check

Non-dict JSON results and non-JSON results are now left unmodified
(return None = no injection) rather than risking format corruption.
- Remove dead _post_tool_call (body was only comments)
- Remove _on_session_start (redundant — _ensure_service lazy-inits)
- Remove _atexit_cleanup (duplicate of _on_session_end)
- Switch _baselines from dict to set (presence sentinel only)
- Remove redundant enabled_for recheck in transform_tool_result
- Remove V4A guard (path-empty check already covers it)
- Use modern type syntax (X | None, dict[], set[])
- Reduce from 322 → 217 lines, same behavior

77/77 tests pass.
@github-actions

github-actions Bot commented May 12, 2026

Copy link
Copy Markdown
Contributor

🔎 Lint report: lsp-plugin vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 8205 on HEAD, 8191 on base (🆕 +14)

🆕 New issues (52):

Rule Count
invalid-argument-type 36
unresolved-import 7
unresolved-attribute 5
not-iterable 1
unsupported-operator 1
not-subscriptable 1
no-matching-overload 1
First entries
run_agent.py:13539: [invalid-argument-type] invalid-argument-type: Argument to function `_is_oauth_token` is incorrect: Expected `str`, found `str | dict[Unknown | str, Unknown | str | dict[str, str]] | Any | ... omitted 4 union elements`
run_agent.py:4293: [invalid-argument-type] invalid-argument-type: Argument to `AIAgent.__init__` is incorrect: Expected `str`, found `str | Unknown | dict[Unknown, Unknown] | int | dict[Unknown | str, Unknown | str | dict[str, str]]`
plugins/lsp/cli.py:180: [invalid-argument-type] invalid-argument-type: Argument to function `len` is incorrect: Expected `Sized`, found `(Unknown & ~AlwaysFalsy) | Literal[True] | (list[Unknown] & ~AlwaysFalsy)`
run_agent.py:9965: [unresolved-attribute] unresolved-attribute: Attribute `lower` is not defined on `dict[Unknown, Unknown] & ~AlwaysFalsy`, `int & ~AlwaysFalsy`, `dict[Unknown | str, Unknown | str | dict[str, str]] & ~AlwaysFalsy` in union `(str & ~AlwaysFalsy) | (Unknown & ~AlwaysFalsy) | (dict[Unknown, Unknown] & ~AlwaysFalsy) | ... omitted 3 union elements`
run_agent.py:2528: [invalid-argument-type] invalid-argument-type: Argument to function `ensure_lmstudio_model_loaded` is incorrect: Expected `str`, found `str | Unknown | Divergent | ... omitted 3 union elements`
run_agent.py:5858: [unresolved-attribute] unresolved-attribute: Attribute `split` is not defined on `dict[Unknown, Unknown]`, `int`, `dict[Unknown | str, Unknown | str | dict[str, str]]` in union `str | Unknown | Divergent | ... omitted 3 union elements`
tests/plugins/lsp/test_protocol.py:16: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
run_agent.py:13080: [invalid-argument-type] invalid-argument-type: Argument to bound method `SessionDB.update_token_counts` is incorrect: Expected `str`, found `str | Unknown | Divergent | ... omitted 3 union elements`
run_agent.py:11451: [invalid-argument-type] invalid-argument-type: Argument to function `_fixed_temperature_for_model` is incorrect: Expected `str | None`, found `str | Unknown | Divergent | ... omitted 3 union elements`
plugins/lsp/cli.py:181: [not-iterable] not-iterable: Object of type `(Unknown & ~AlwaysFalsy) | Literal[True] | (list[Unknown] & ~AlwaysFalsy)` may not be iterable
run_agent.py:9522: [invalid-argument-type] invalid-argument-type: Argument to function `_get_anthropic_max_output` is incorrect: Expected `str`, found `str | Unknown | Divergent | ... omitted 3 union elements`
run_agent.py:11522: [unresolved-attribute] unresolved-attribute: Attribute `strip` is not defined on `dict[Unknown, Unknown] & ~AlwaysFalsy`, `int & ~AlwaysFalsy`, `dict[Unknown | str, Unknown | str | dict[str, str]] & ~AlwaysFalsy` in union `(str & ~AlwaysFalsy) | (Unknown & ~AlwaysFalsy) | (dict[Unknown, Unknown] & ~AlwaysFalsy) | ... omitted 3 union elements`
run_agent.py:12296: [invalid-argument-type] invalid-argument-type: Argument to function `apply_anthropic_cache_control` is incorrect: Expected `bool`, found `int | Divergent | Unknown | ... omitted 3 union elements`
run_agent.py:8921: [invalid-argument-type] invalid-argument-type: Argument to function `get_provider_request_timeout` is incorrect: Expected `str | None`, found `Divergent | Unknown | str | ... omitted 3 union elements`
run_agent.py:13076: [invalid-argument-type] invalid-argument-type: Argument to bound method `SessionDB.update_token_counts` is incorrect: Expected `str | None`, found `str | Unknown | dict[Unknown, Unknown] | int | dict[Unknown | str, Unknown | str | dict[str, str]]`
run_agent.py:13773: [invalid-argument-type] invalid-argument-type: Argument to function `_pool_may_recover_from_rate_limit` is incorrect: Expected `str | None`, found `str | Unknown | dict[Unknown, Unknown] | int | dict[Unknown | str, Unknown | str | dict[str, str]]`
run_agent.py:3384: [invalid-argument-type] invalid-argument-type: Argument to function `get_provider_stale_timeout` is incorrect: Expected `str`, found `str | Unknown | dict[Unknown, Unknown] | int | dict[Unknown | str, Unknown | str | dict[str, str]]`
tests/plugins/lsp/test_hook_flow.py:16: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
run_agent.py:8840: [invalid-argument-type] invalid-argument-type: Argument to bound method `ContextCompressor.update_model` is incorrect: Expected `str`, found `Divergent | Unknown | str | ... omitted 3 union elements`
tests/plugins/lsp/test_eventlog.py:12: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
run_agent.py:8837: [invalid-argument-type] invalid-argument-type: Argument to bound method `ContextCompressor.update_model` is incorrect: Expected `int`, found `Divergent | Unknown | str | ... omitted 3 union elements`
run_agent.py:9714: [invalid-argument-type] invalid-argument-type: Argument to function `github_model_reasoning_efforts` is incorrect: Expected `str | None`, found `str | Unknown | Divergent | ... omitted 3 union elements`
cli.py:8711: [invalid-argument-type] invalid-argument-type: Argument to function `estimate_usage_cost` is incorrect: Expected `str`, found `str | Unknown | Divergent | ... omitted 3 union elements`
run_agent.py:12290: [invalid-argument-type] invalid-argument-type: Argument to function `apply_anthropic_cache_control_long_lived` is incorrect: Expected `bool`, found `int | Divergent | Unknown | ... omitted 3 union elements`
run_agent.py:13993: [invalid-argument-type] invalid-argument-type: Argument to bound method `ContextCompressor.update_model` is incorrect: Expected `str`, found `str | Unknown | dict[Unknown, Unknown] | int | dict[Unknown | str, Unknown | str | dict[str, str]]`
... and 27 more

✅ Fixed issues (34):

Rule Count
invalid-argument-type 30
unresolved-attribute 3
unsupported-operator 1
First entries
run_agent.py:2528: [invalid-argument-type] invalid-argument-type: Argument to function `ensure_lmstudio_model_loaded` is incorrect: Expected `str`, found `str | Unknown | dict[Unknown | str, Unknown | str | dict[str, str]] | int | dict[Unknown, Unknown]`
run_agent.py:3384: [invalid-argument-type] invalid-argument-type: Argument to function `get_provider_stale_timeout` is incorrect: Expected `str | None`, found `str | Unknown | dict[Unknown | str, Unknown | str | dict[str, str]] | int | dict[Unknown, Unknown]`
run_agent.py:5858: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `Literal["/"]` and `str | Unknown | dict[Unknown | str, Unknown | str | dict[str, str]] | int | dict[Unknown, Unknown]`
run_agent.py:13773: [invalid-argument-type] invalid-argument-type: Argument to function `_pool_may_recover_from_rate_limit` is incorrect: Expected `str | None`, found `str | Unknown | dict[Unknown | str, Unknown | str | dict[str, str]] | int | dict[Unknown, Unknown]`
run_agent.py:4882: [invalid-argument-type] invalid-argument-type: Argument to function `save_trajectory` is incorrect: Expected `str`, found `str | Unknown | dict[Unknown | str, Unknown | str | dict[str, str]] | int | dict[Unknown, Unknown]`
run_agent.py:13539: [invalid-argument-type] invalid-argument-type: Argument to function `_is_oauth_token` is incorrect: Expected `str`, found `str | dict[Unknown, Unknown] | Any | ... omitted 3 union elements`
run_agent.py:13037: [invalid-argument-type] invalid-argument-type: Argument to function `estimate_usage_cost` is incorrect: Expected `str`, found `str | Unknown | dict[Unknown | str, Unknown | str | dict[str, str]] | int | dict[Unknown, Unknown]`
run_agent.py:12992: [invalid-argument-type] invalid-argument-type: Argument to function `normalize_usage` is incorrect: Expected `str | None`, found `str | Unknown | dict[Unknown | str, Unknown | str | dict[str, str]] | int | dict[Unknown, Unknown]`
run_agent.py:7317: [invalid-argument-type] invalid-argument-type: Argument to function `build_anthropic_client` is incorrect: Expected `str`, found `str | dict[Unknown, Unknown] | Any | ... omitted 3 union elements`
run_agent.py:11451: [invalid-argument-type] invalid-argument-type: Argument to function `_fixed_temperature_for_model` is incorrect: Expected `str | None`, found `str | Unknown | dict[Unknown | str, Unknown | str | dict[str, str]] | int | dict[Unknown, Unknown]`
run_agent.py:13080: [invalid-argument-type] invalid-argument-type: Argument to bound method `SessionDB.update_token_counts` is incorrect: Expected `str`, found `str | Unknown | dict[Unknown | str, Unknown | str | dict[str, str]] | int | dict[Unknown, Unknown]`
run_agent.py:13039: [invalid-argument-type] invalid-argument-type: Argument to function `estimate_usage_cost` is incorrect: Expected `str | None`, found `str | Unknown | dict[Unknown | str, Unknown | str | dict[str, str]] | int | dict[Unknown, Unknown]`
run_agent.py:13993: [invalid-argument-type] invalid-argument-type: Argument to bound method `ContextCompressor.update_model` is incorrect: Expected `str`, found `str | Unknown | dict[Unknown | str, Unknown | str | dict[str, str]] | int | dict[Unknown, Unknown]`
cli.py:8711: [invalid-argument-type] invalid-argument-type: Argument to function `estimate_usage_cost` is incorrect: Expected `str`, found `str | Unknown | dict[Unknown | str, Unknown | str | dict[str, str]] | int | dict[Unknown, Unknown]`
run_agent.py:8921: [invalid-argument-type] invalid-argument-type: Argument to function `get_provider_request_timeout` is incorrect: Expected `str`, found `str | Unknown | dict[Unknown | str, Unknown | str | dict[str, str]] | int | dict[Unknown, Unknown]`
run_agent.py:13542: [invalid-argument-type] invalid-argument-type: Argument to function `len` is incorrect: Expected `Sized`, found `(str & ~AlwaysFalsy) | (dict[Unknown, Unknown] & ~AlwaysFalsy) | (Any & ~AlwaysFalsy) | ... omitted 3 union elements`
run_agent.py:8837: [invalid-argument-type] invalid-argument-type: Argument to bound method `ContextCompressor.update_model` is incorrect: Expected `int`, found `str | Unknown | dict[Unknown | str, Unknown | str | dict[str, str]] | int | dict[Unknown, Unknown]`
run_agent.py:9539: [invalid-argument-type] invalid-argument-type: Argument to function `get_provider_profile` is incorrect: Expected `str`, found `str | Unknown | dict[Unknown | str, Unknown | str | dict[str, str]] | int | dict[Unknown, Unknown]`
run_agent.py:12290: [invalid-argument-type] invalid-argument-type: Argument to function `apply_anthropic_cache_control_long_lived` is incorrect: Expected `bool`, found `int | str | Unknown | dict[Unknown | str, Unknown | str | dict[str, str]] | dict[Unknown, Unknown]`
run_agent.py:12296: [invalid-argument-type] invalid-argument-type: Argument to function `apply_anthropic_cache_control` is incorrect: Expected `bool`, found `int | str | Unknown | dict[Unknown | str, Unknown | str | dict[str, str]] | dict[Unknown, Unknown]`
run_agent.py:9687: [invalid-argument-type] invalid-argument-type: Argument to function `lmstudio_model_reasoning_options` is incorrect: Expected `str`, found `str | Unknown | dict[Unknown | str, Unknown | str | dict[str, str]] | int | dict[Unknown, Unknown]`
run_agent.py:8921: [invalid-argument-type] invalid-argument-type: Argument to function `get_provider_request_timeout` is incorrect: Expected `str | None`, found `str | Unknown | dict[Unknown | str, Unknown | str | dict[str, str]] | int | dict[Unknown, Unknown]`
run_agent.py:11522: [unresolved-attribute] unresolved-attribute: Attribute `strip` is not defined on `dict[Unknown | str, Unknown | str | dict[str, str]] & ~AlwaysFalsy`, `int & ~AlwaysFalsy`, `dict[Unknown, Unknown] & ~AlwaysFalsy` in union `(str & ~AlwaysFalsy) | (Unknown & ~AlwaysFalsy) | (dict[Unknown | str, Unknown | str | dict[str, str]] & ~AlwaysFalsy) | ... omitted 3 union elements`
run_agent.py:9966: [unresolved-attribute] unresolved-attribute: Attribute `lower` is not defined on `dict[Unknown | str, Unknown | str | dict[str, str]] & ~AlwaysFalsy`, `int & ~AlwaysFalsy`, `dict[Unknown, Unknown] & ~AlwaysFalsy` in union `(str & ~AlwaysFalsy) | (Unknown & ~AlwaysFalsy) | (dict[Unknown | str, Unknown | str | dict[str, str]] & ~AlwaysFalsy) | ... omitted 3 union elements`
run_agent.py:3384: [invalid-argument-type] invalid-argument-type: Argument to function `get_provider_stale_timeout` is incorrect: Expected `str`, found `str | Unknown | dict[Unknown | str, Unknown | str | dict[str, str]] | int | dict[Unknown, Unknown]`
... and 9 more

Unchanged: 4275 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

…eshooting

Covers: enable flow, server installation (detect-only default vs
hermes lsp install), how diagnostics reach the model, config knobs,
all 26 supported languages, and troubleshooting common issues.
@alt-glitch alt-glitch added type/feature New feature or request P3 Low — cosmetic, nice to have comp/plugins Plugin system and bundled plugins tool/file File tools (read, write, patch, search) labels May 12, 2026
@scubamount

Copy link
Copy Markdown
Contributor

Independent review of #24414

Reviewed the plugin against current main (dd0923bb8); 77/77 plugin tests pass and ruff check plugins/lsp/ is clean. Worktree based off head 2b47b40c.

Architecturally this PR is the right direction. Zero edits to tools/file_operations.py or hermes_cli/main.py is a significant win over #24168's ~150 LOC of core changes. The protocol layer (plugins/lsp/protocol.py) is also notably stronger — 8 KiB header cap, 64 MiB body cap, ASCII-only header decode, proper EOF handling. The walk-cap defense in workspace.py matches what we proposed in #24467.

That said, three of the seven defects we filed against #24168 have regressed in this port. They're not equivalent bugs in a new place — they're the same bugs, same severity, same root cause, in code that was clearly aware of the review. Filing them here so they don't ship hidden inside the plugin boundary.

🔴 Regression 1 — auto-install is enabled by default (same as our D5 against #24168)

LSPService.create_from_config (manager.py:207) resolves install_strategy to "auto" from an empty config. The plugin's only consent gate is the user adding lsp to plugins.enabled — after that, the first .py write in any git workspace triggers npm install pyright silently. Same for gopls, typescript-language-server, intelephense, the @vue/@svelte/@astro family, and 8 more entries in INSTALL_RECIPES.

Reproducer (run with empty ~/.hermes/config.yaml plus plugins: {enabled: [lsp]}):

from plugins.lsp.manager import LSPService
svc = LSPService.create_from_config()
print(svc._install_strategy)  # → 'auto'

Suggested fix: default install_strategy to "manual". Document opt-in via lsp.install_strategy: auto in config or via an explicit hermes lsp install <server> CLI invocation. The CLI subcommand is already wired in plugins/lsp/cli.py — first-run path can route through it.

🔴 Regression 2 — path-keyed delta baseline collides on concurrent edits (same as our D2)

LSPService._delta_baseline: Dict[str, List[...]] (manager.py:184) is keyed only by absolute path. In gateway mode where two sessions edit the same file concurrently, the baselines stomp each other and the agent loses visibility into errors it introduced.

Reproducer (mock service exercising only the dict):

T1: snapshot([])                     # baseline = []
T2:                snapshot([err_A]) # baseline = [err_A]   ← overwrites T1's
T1: filter(current=[err_A])          # → []  (err_A filtered as "pre-existing")

T1's real error is suppressed. Verified with a threading harness — concrete output available on request.

Suggested fix: key the baseline by (session_id, abs_path) (the same key shape used in __init__.py:_baselines). Plumb session_id from the hook kwargs through snapshot_baseline and get_diagnostics_sync.

🔴 Regression 3 — _idle_timeout set but never read; no reaper (same as our D3)

manager.py:66 defines DEFAULT_IDLE_TIMEOUT = 600 and __init__ assigns self._idle_timeout. Grep for any reader returns one hit — the assignment itself:

$ rg '_idle_timeout' plugins/lsp/
plugins/lsp/manager.py:167:        self._idle_timeout = idle_timeout

No _last_used reaper task is scheduled. Long-lived gateway processes accumulate one LSP server child per (server_id, workspace_root) pair touched, forever. processId: os.getpid() in initialize params is a useful safety net on hard crash, but doesn't help a healthy long-running process.

Suggested fix: schedule a periodic task on the background loop that walks self._last_used and shuts down clients past self._idle_timeout. ~30 LOC; same pattern we proposed in #24467 commit f63ca97.

🟠 Other findings (non-blocking but worth a pass)

  • _baselines set in __init__.py is mutated from _pre_tool_call and _transform_tool_result without a lock. Hooks run on per-session gateway threads. Lines 134-136 have a classic TOCTOU between membership test and discard. Also: entries leak unbounded if _pre_tool_call adds a key but _transform_tool_result doesn't fire (tool errored, hook ordering edge case).

  • _delta_baseline Dict writes at manager.py:273, 279, 322, 334 happen outside self._state_lock (which is used elsewhere for _clients/_broken/_last_used). Same race class as above.

  • _workspace_cache in workspace.py is a module-global dict, unsynchronized, never bounded. Slow growth in long-lived gateway processes touching many distinct cwds.

  • _install_results negative cache in install.py:98 stores None on failed install with no invalidation path. If the user later succeeds via hermes lsp install <pkg> manually, the cached None still wins until process restart.

  • _transform_tool_result round-trips the tool result through json.loads + json.dumps (lines 163, 167). Downstream consumers depending on byte-identical tool output (caching keys, content hashes) will see the hash flip whenever the plugin is enabled, with no functional change.

  • Reporter XML injection (minor): report_for_file at reporter.py:60 doesn't escape " or </diagnostics> in file_path. Low impact since paths come from agent calls, but unsanitized.

Suggested path forward

We have the same 7-fix consolidation already validated for #24168 in #24467 (204/204 tests passing, rebases clean against current main). The three critical regressions above (P9/P10/P13 in our scheme) port directly to this plugin codebase — happy to open a follow-up PR against daimon-nous:lsp-plugin mirroring those fixes if that's useful. The remaining findings are smaller and could land in the same PR or be left as TODOs.

If maintainers prefer #24414's plugin approach to #24168 + #24467 (which we'd agree with on blast-radius grounds), we'll close #24467 once these regressions are addressed.

Independent reviewer; not affiliated with the maintainer team. Worktree, reproducers, and full defect catalog (P1–P19) available on request.

teknium1 added a commit that referenced this pull request May 12, 2026
Two improvements salvaged from #24414's plugin-form alternative,
keeping our core-integrated design:

1. atexit cleanup of spawned language servers
   ----------------------------------------------------------------
   ``agent/lsp/__init__.get_service`` now registers an ``atexit``
   handler on first creation that tears down the LSPService on
   Python exit.  Without this, every ``hermes chat`` exit was
   leaking pyright/gopls/etc. processes for a few seconds while
   their stdout buffers drained -- they got reaped by the kernel
   eventually but a watchful ``ps aux`` would catch them.

   The handler runs once per process (gated by
   ``_atexit_registered``); idempotent ``shutdown_service``
   ensures double-fire is a no-op.  Errors during shutdown are
   swallowed at debug level since by the time atexit fires the
   user has already seen the agent's final response.

2. Separate ``lsp_diagnostics`` field on WriteResult / PatchResult
   ----------------------------------------------------------------
   Previously the LSP layer folded its diagnostic block into the
   ``lint.output`` string, conflating the syntax-check tier with
   the semantic tier.  The agent (and any downstream parsers) now
   read syntax errors and semantic errors as independent signals:

       {
         "bytes_written": 42,
         "lint": {"status": "ok", "output": ""},
         "lsp_diagnostics": "<diagnostics file=...>\nERROR [2:12] ..."
       }

   ``_check_lint_delta`` returns to its original two-tier shape
   (syntax check + delta filter); ``write_file`` and
   ``patch_replace`` independently fetch LSP diagnostics via
   ``_maybe_lsp_diagnostics`` and pass them into the new field.
   ``patch_replace`` propagates the inner write_file's
   ``lsp_diagnostics`` so the outer PatchResult carries the patch's
   delta correctly.

Tests: 19 new
- tests/agent/lsp/test_lifecycle.py (8 tests): atexit registration
  fires once and only once across N get_service calls; the
  registered callable is our internal shutdown wrapper;
  shutdown_service is idempotent and safe when never started;
  exceptions during shutdown are swallowed; inactive service is
  cached so we don't rebuild on every check.
- tests/agent/lsp/test_diagnostics_field.py (11 tests): WriteResult
  / PatchResult dataclass shape, to_dict include/omit semantics,
  channel separation (lint and lsp_diagnostics carry independent
  signals), write_file populates the field via
  _maybe_lsp_diagnostics only when the syntax tier is clean,
  patch_replace propagates the field forward from its internal
  write_file.

Validation:
- 92/92 LSP tests pass (73 prior + 8 lifecycle + 11 diagnostics field)
- 217/217 pass with file_operations + LSP combined
- Live E2E reverified: clean writes -> both fields empty/none; type
  error introduced -> lint clean (parses), lsp_diagnostics carries
  the pyright reportReturnType block; patch fix -> both fields
  clean again.
teknium1 added a commit that referenced this pull request May 12, 2026
…ile/patch (#24168)

* feat(lsp): semantic diagnostics from real language servers in write_file/patch

Wire ~26 language servers (pyright, gopls, rust-analyzer, typescript-language-server,
clangd, bash-language-server, ...) into the post-write lint check used by write_file
and patch. The model now sees type errors, undefined names, missing imports, and
project-wide semantic issues introduced by its edits, not just syntax errors.

LSP is gated on git workspace detection: when the agent's cwd or the file being
edited is inside a git worktree, LSP runs against that workspace; otherwise the
existing in-process syntax checks are the only tier. This keeps users on
user-home cwds (Telegram/Discord gateway chats) from spawning daemons.

The post-write check is layered: in-process syntax check first (microseconds),
then LSP semantic diagnostics second when syntax is clean. Diagnostics are
delta-filtered against a baseline captured at write start, so the agent only
sees errors its edit introduced. A flaky/missing language server can never
break a write -- every LSP failure path falls back silently to the syntax-only
result.

New module agent/lsp/ split into:

- protocol.py: Content-Length JSON-RPC framer + envelope helpers
- client.py: async LSPClient (spawn, initialize, didOpen/didChange,
  ContentModified retry, push/pull diagnostic stores)
- workspace.py: git worktree walk-up + per-server NearestRoot resolver
- servers.py: registry of 26 language servers (extension match,
  root resolver, spawn builder per language)
- install.py: auto-install dispatch (npm install --prefix, go install
  with GOBIN, pip install --target) into HERMES_HOME/lsp/bin/
- manager.py: LSPService (per-(server_id, root) client registry, lazy
  spawn, broken-set, in-flight dedupe, sync facade for tools layer)
- reporter.py: <diagnostics> block formatter (severity-1-only, 20-per-file)
- cli.py: hermes lsp {status,list,install,install-all,restart,which}

Wired into tools/file_operations.py:

- write_file/patch_replace now call _snapshot_lsp_baseline before write
- _check_lint_delta gains a third tier: LSP semantic diagnostics when
  syntax is clean
- All LSP code paths swallow exceptions; write_file's contract unchanged

Config: 'lsp' section in DEFAULT_CONFIG with enabled (default true),
wait_mode, wait_timeout, install_strategy (default 'auto'), and per-server
overrides (disabled, command, env, initialization_options).

Tests: tests/agent/lsp/ -- 49 tests covering protocol framing (encode and
read_message round-trip, EOF/truncation/missing Content-Length), workspace
gate (git walk-up, exclude markers, fallback to file location), reporter
(severity filter, max-per-file cap, truncation), service-level delta filter,
and an in-process mock LSP server that exercises the full client lifecycle
including didChange version bumps, dedup, crash recovery, and idempotent
teardown.

Live E2E verified end-to-end through ShellFileOperations: pyright
auto-installed via npm into HERMES_HOME, baseline captured, type error
introduced, single delta diagnostic surfaced with correct line/column/code/
source, then patch fix removes the diagnostic from the output.

Docs: new website/docs/user-guide/features/lsp.md page covering supported
languages, configuration knobs, performance characteristics, and
troubleshooting; cli-commands.md updated with the 'hermes lsp' reference;
sidebar updated.

* feat(lsp): structured logging, backend gate, defensive walk caps

Cherry-picks the substantive ideas from #24155 (different scope, same
problem space) onto our PR.

agent/lsp/eventlog.py (new): dedicated structured logger
``hermes.lint.lsp`` with steady-state silence. Module-level dedup sets
keep a 1000-write session at exactly ONE INFO line ("active for
<root>") at the default INFO threshold; clean writes log at DEBUG so
they never reach agent.log under normal config. State transitions
(server starts, no project root for a file, server unavailable) fire
at INFO/WARNING once per (server_id, key); novel events (timeouts,
unexpected errors) fire WARNING per call. Grep recipe: ``rg 'lsp\\['``.

agent/lsp/manager.py: wire the eventlog into _get_or_spawn and
get_diagnostics_sync so users can answer "did LSP fire on this edit?"
with a single grep, plus surface "binary not on PATH" warnings once
instead of silently retrying every write.

tools/file_operations.py: backend-type gate. ``_lsp_local_only()``
returns False for non-local backends (Docker / Modal / SSH /
Daytona); ``_snapshot_lsp_baseline`` and ``_maybe_lsp_diagnostics``
now skip entirely on remote envs. The host-side language server
can't see files inside a sandbox, so this prevents pretending to
lint a file the host process can't open.

agent/lsp/protocol.py: 8 KiB cap on the header block in
``read_message``. A pathological server that streams headers
without ever emitting CRLF-CRLF would have looped forever consuming
bytes; now raises ``LSPProtocolError`` instead.

agent/lsp/workspace.py: 64-step cap on ``find_git_worktree`` and
``nearest_root`` upward walks, plus try/except containment around
``Path(...).resolve()`` and child ``.exists()`` calls. Defensive
against pathological inputs (symlink loops, encoding errors,
permission failures mid-walk) — the lint hook is hot-path code and
must never raise.

Tests:
- tests/agent/lsp/test_eventlog.py: 18 tests covering steady-state
  silence (clean writes stay DEBUG), state-transition INFO-once
  semantics (active for, no project root), action-required
  WARNING-once (server unavailable), per-call WARNING (timeouts,
  spawn failures), and the "1000 clean writes => 1 INFO" contract.
- tests/agent/lsp/test_backend_gate.py: 5 tests verifying
  _lsp_local_only / snapshot_baseline / maybe_lsp_diagnostics skip
  the LSP layer for non-local backends and route correctly for
  LocalEnvironment.
- tests/agent/lsp/test_protocol.py: new test_read_message_rejects_runaway_header
  exercising the 8 KiB cap.

Validation:
- 73/73 LSP tests pass (49 original + 18 eventlog + 5 backend-gate + 1 framer cap)
- 198/198 pass when run alongside existing file_operations tests
- Live E2E re-run with pyright still surfaces "ERROR [2:12] Type
  ... reportReturnType (Pyright)" through the full path, then patch
  fix removes it on the next call.

* feat(lsp): atexit cleanup + separate lsp_diagnostics JSON field

Two improvements salvaged from #24414's plugin-form alternative,
keeping our core-integrated design:

1. atexit cleanup of spawned language servers
   ----------------------------------------------------------------
   ``agent/lsp/__init__.get_service`` now registers an ``atexit``
   handler on first creation that tears down the LSPService on
   Python exit.  Without this, every ``hermes chat`` exit was
   leaking pyright/gopls/etc. processes for a few seconds while
   their stdout buffers drained -- they got reaped by the kernel
   eventually but a watchful ``ps aux`` would catch them.

   The handler runs once per process (gated by
   ``_atexit_registered``); idempotent ``shutdown_service``
   ensures double-fire is a no-op.  Errors during shutdown are
   swallowed at debug level since by the time atexit fires the
   user has already seen the agent's final response.

2. Separate ``lsp_diagnostics`` field on WriteResult / PatchResult
   ----------------------------------------------------------------
   Previously the LSP layer folded its diagnostic block into the
   ``lint.output`` string, conflating the syntax-check tier with
   the semantic tier.  The agent (and any downstream parsers) now
   read syntax errors and semantic errors as independent signals:

       {
         "bytes_written": 42,
         "lint": {"status": "ok", "output": ""},
         "lsp_diagnostics": "<diagnostics file=...>\nERROR [2:12] ..."
       }

   ``_check_lint_delta`` returns to its original two-tier shape
   (syntax check + delta filter); ``write_file`` and
   ``patch_replace`` independently fetch LSP diagnostics via
   ``_maybe_lsp_diagnostics`` and pass them into the new field.
   ``patch_replace`` propagates the inner write_file's
   ``lsp_diagnostics`` so the outer PatchResult carries the patch's
   delta correctly.

Tests: 19 new
- tests/agent/lsp/test_lifecycle.py (8 tests): atexit registration
  fires once and only once across N get_service calls; the
  registered callable is our internal shutdown wrapper;
  shutdown_service is idempotent and safe when never started;
  exceptions during shutdown are swallowed; inactive service is
  cached so we don't rebuild on every check.
- tests/agent/lsp/test_diagnostics_field.py (11 tests): WriteResult
  / PatchResult dataclass shape, to_dict include/omit semantics,
  channel separation (lint and lsp_diagnostics carry independent
  signals), write_file populates the field via
  _maybe_lsp_diagnostics only when the syntax tier is clean,
  patch_replace propagates the field forward from its internal
  write_file.

Validation:
- 92/92 LSP tests pass (73 prior + 8 lifecycle + 11 diagnostics field)
- 217/217 pass with file_operations + LSP combined
- Live E2E reverified: clean writes -> both fields empty/none; type
  error introduced -> lint clean (parses), lsp_diagnostics carries
  the pyright reportReturnType block; patch fix -> both fields
  clean again.

* fix(lsp): broken-set short-circuit so a wedged server isn't paid every write

Discovered while auditing failure paths: a language server binary that
hangs (sleep forever, no LSP traffic on stdin/stdout) caused EVERY
subsequent write to re-pay the 8s snapshot_baseline timeout. Five
writes = ~64s of dead time.

The bug: ``_get_or_spawn`` adds the (server_id, root) pair to
``_broken`` inside its inner exception handler, but when the OUTER
``_loop.run`` timeout fires, it cancels the inner task before that
handler runs. The pair never makes it to broken-set, so the next
write re-enters the spawn path and re-pays the timeout.

Fix:

- New ``_mark_broken_for_file`` helper at the service layer marks
  the (server_id, workspace_root) pair broken from the OUTSIDE when
  the outer timeout fires. Called from the except branches in
  ``snapshot_baseline``, ``get_diagnostics_sync`` (asyncio.TimeoutError
  + generic Exception). Also kills any orphan client process that
  survived the cancelled future, fire-and-forget with a 1s ceiling.

- ``enabled_for`` now consults the broken-set BEFORE returning True.
  Files in already-broken (server_id, root) pairs short-circuit to
  False, so the file_operations layer skips the LSP path entirely
  with no spawn cost. Until the service is restarted (``hermes lsp
  restart``) or the process exits.

- A single eventlog WARNING is emitted on first mark-broken so the
  user knows which server gave up. Subsequent edits in the same
  project stay silent.

Tests: 7 new in tests/agent/lsp/test_broken_set.py — covers the
key shape (server_id, per_server_root), enabled_for short-circuit,
sibling-file skip in same project, project isolation (broken in
A doesn't affect B), graceful no-op for missing-server / no-workspace,
and an end-to-end test that snapshots after a failure and verifies
the next ``enabled_for`` returns False.

Validation:

- Live retest of the wedged-binary scenario: 5 sequential writes,
  first 8.88s (the one snapshot timeout), subsequent four ~0.84s
  (no LSP cost). Down from 5x12.85s = 64s before this fix.
- 99/99 LSP tests pass (92 prior + 7 broken-set)
- 224/224 pass with file_operations + LSP combined
- Happy path E2E reverified — clean write, type error introduced,
  patch fix all behave correctly with the new broken-set logic.

Note: the FIRST write to a wedged binary still pays 8s (the
snapshot_baseline timeout). We could shorten that, but pyright/
tsserver normally take 2-3s and slow CI rust-analyzer can need
5+ seconds, so 8s is the conservative ceiling. Subsequent writes
are instant.
@alt-glitch

Copy link
Copy Markdown
Collaborator

closed as #24168 has been merged

@alt-glitch alt-glitch closed this May 13, 2026
jsboige pushed a commit to jsboige/hermes-agent that referenced this pull request May 14, 2026
…ile/patch (NousResearch#24168)

* feat(lsp): semantic diagnostics from real language servers in write_file/patch

Wire ~26 language servers (pyright, gopls, rust-analyzer, typescript-language-server,
clangd, bash-language-server, ...) into the post-write lint check used by write_file
and patch. The model now sees type errors, undefined names, missing imports, and
project-wide semantic issues introduced by its edits, not just syntax errors.

LSP is gated on git workspace detection: when the agent's cwd or the file being
edited is inside a git worktree, LSP runs against that workspace; otherwise the
existing in-process syntax checks are the only tier. This keeps users on
user-home cwds (Telegram/Discord gateway chats) from spawning daemons.

The post-write check is layered: in-process syntax check first (microseconds),
then LSP semantic diagnostics second when syntax is clean. Diagnostics are
delta-filtered against a baseline captured at write start, so the agent only
sees errors its edit introduced. A flaky/missing language server can never
break a write -- every LSP failure path falls back silently to the syntax-only
result.

New module agent/lsp/ split into:

- protocol.py: Content-Length JSON-RPC framer + envelope helpers
- client.py: async LSPClient (spawn, initialize, didOpen/didChange,
  ContentModified retry, push/pull diagnostic stores)
- workspace.py: git worktree walk-up + per-server NearestRoot resolver
- servers.py: registry of 26 language servers (extension match,
  root resolver, spawn builder per language)
- install.py: auto-install dispatch (npm install --prefix, go install
  with GOBIN, pip install --target) into HERMES_HOME/lsp/bin/
- manager.py: LSPService (per-(server_id, root) client registry, lazy
  spawn, broken-set, in-flight dedupe, sync facade for tools layer)
- reporter.py: <diagnostics> block formatter (severity-1-only, 20-per-file)
- cli.py: hermes lsp {status,list,install,install-all,restart,which}

Wired into tools/file_operations.py:

- write_file/patch_replace now call _snapshot_lsp_baseline before write
- _check_lint_delta gains a third tier: LSP semantic diagnostics when
  syntax is clean
- All LSP code paths swallow exceptions; write_file's contract unchanged

Config: 'lsp' section in DEFAULT_CONFIG with enabled (default true),
wait_mode, wait_timeout, install_strategy (default 'auto'), and per-server
overrides (disabled, command, env, initialization_options).

Tests: tests/agent/lsp/ -- 49 tests covering protocol framing (encode and
read_message round-trip, EOF/truncation/missing Content-Length), workspace
gate (git walk-up, exclude markers, fallback to file location), reporter
(severity filter, max-per-file cap, truncation), service-level delta filter,
and an in-process mock LSP server that exercises the full client lifecycle
including didChange version bumps, dedup, crash recovery, and idempotent
teardown.

Live E2E verified end-to-end through ShellFileOperations: pyright
auto-installed via npm into HERMES_HOME, baseline captured, type error
introduced, single delta diagnostic surfaced with correct line/column/code/
source, then patch fix removes the diagnostic from the output.

Docs: new website/docs/user-guide/features/lsp.md page covering supported
languages, configuration knobs, performance characteristics, and
troubleshooting; cli-commands.md updated with the 'hermes lsp' reference;
sidebar updated.

* feat(lsp): structured logging, backend gate, defensive walk caps

Cherry-picks the substantive ideas from NousResearch#24155 (different scope, same
problem space) onto our PR.

agent/lsp/eventlog.py (new): dedicated structured logger
``hermes.lint.lsp`` with steady-state silence. Module-level dedup sets
keep a 1000-write session at exactly ONE INFO line ("active for
<root>") at the default INFO threshold; clean writes log at DEBUG so
they never reach agent.log under normal config. State transitions
(server starts, no project root for a file, server unavailable) fire
at INFO/WARNING once per (server_id, key); novel events (timeouts,
unexpected errors) fire WARNING per call. Grep recipe: ``rg 'lsp\\['``.

agent/lsp/manager.py: wire the eventlog into _get_or_spawn and
get_diagnostics_sync so users can answer "did LSP fire on this edit?"
with a single grep, plus surface "binary not on PATH" warnings once
instead of silently retrying every write.

tools/file_operations.py: backend-type gate. ``_lsp_local_only()``
returns False for non-local backends (Docker / Modal / SSH /
Daytona); ``_snapshot_lsp_baseline`` and ``_maybe_lsp_diagnostics``
now skip entirely on remote envs. The host-side language server
can't see files inside a sandbox, so this prevents pretending to
lint a file the host process can't open.

agent/lsp/protocol.py: 8 KiB cap on the header block in
``read_message``. A pathological server that streams headers
without ever emitting CRLF-CRLF would have looped forever consuming
bytes; now raises ``LSPProtocolError`` instead.

agent/lsp/workspace.py: 64-step cap on ``find_git_worktree`` and
``nearest_root`` upward walks, plus try/except containment around
``Path(...).resolve()`` and child ``.exists()`` calls. Defensive
against pathological inputs (symlink loops, encoding errors,
permission failures mid-walk) — the lint hook is hot-path code and
must never raise.

Tests:
- tests/agent/lsp/test_eventlog.py: 18 tests covering steady-state
  silence (clean writes stay DEBUG), state-transition INFO-once
  semantics (active for, no project root), action-required
  WARNING-once (server unavailable), per-call WARNING (timeouts,
  spawn failures), and the "1000 clean writes => 1 INFO" contract.
- tests/agent/lsp/test_backend_gate.py: 5 tests verifying
  _lsp_local_only / snapshot_baseline / maybe_lsp_diagnostics skip
  the LSP layer for non-local backends and route correctly for
  LocalEnvironment.
- tests/agent/lsp/test_protocol.py: new test_read_message_rejects_runaway_header
  exercising the 8 KiB cap.

Validation:
- 73/73 LSP tests pass (49 original + 18 eventlog + 5 backend-gate + 1 framer cap)
- 198/198 pass when run alongside existing file_operations tests
- Live E2E re-run with pyright still surfaces "ERROR [2:12] Type
  ... reportReturnType (Pyright)" through the full path, then patch
  fix removes it on the next call.

* feat(lsp): atexit cleanup + separate lsp_diagnostics JSON field

Two improvements salvaged from NousResearch#24414's plugin-form alternative,
keeping our core-integrated design:

1. atexit cleanup of spawned language servers
   ----------------------------------------------------------------
   ``agent/lsp/__init__.get_service`` now registers an ``atexit``
   handler on first creation that tears down the LSPService on
   Python exit.  Without this, every ``hermes chat`` exit was
   leaking pyright/gopls/etc. processes for a few seconds while
   their stdout buffers drained -- they got reaped by the kernel
   eventually but a watchful ``ps aux`` would catch them.

   The handler runs once per process (gated by
   ``_atexit_registered``); idempotent ``shutdown_service``
   ensures double-fire is a no-op.  Errors during shutdown are
   swallowed at debug level since by the time atexit fires the
   user has already seen the agent's final response.

2. Separate ``lsp_diagnostics`` field on WriteResult / PatchResult
   ----------------------------------------------------------------
   Previously the LSP layer folded its diagnostic block into the
   ``lint.output`` string, conflating the syntax-check tier with
   the semantic tier.  The agent (and any downstream parsers) now
   read syntax errors and semantic errors as independent signals:

       {
         "bytes_written": 42,
         "lint": {"status": "ok", "output": ""},
         "lsp_diagnostics": "<diagnostics file=...>\nERROR [2:12] ..."
       }

   ``_check_lint_delta`` returns to its original two-tier shape
   (syntax check + delta filter); ``write_file`` and
   ``patch_replace`` independently fetch LSP diagnostics via
   ``_maybe_lsp_diagnostics`` and pass them into the new field.
   ``patch_replace`` propagates the inner write_file's
   ``lsp_diagnostics`` so the outer PatchResult carries the patch's
   delta correctly.

Tests: 19 new
- tests/agent/lsp/test_lifecycle.py (8 tests): atexit registration
  fires once and only once across N get_service calls; the
  registered callable is our internal shutdown wrapper;
  shutdown_service is idempotent and safe when never started;
  exceptions during shutdown are swallowed; inactive service is
  cached so we don't rebuild on every check.
- tests/agent/lsp/test_diagnostics_field.py (11 tests): WriteResult
  / PatchResult dataclass shape, to_dict include/omit semantics,
  channel separation (lint and lsp_diagnostics carry independent
  signals), write_file populates the field via
  _maybe_lsp_diagnostics only when the syntax tier is clean,
  patch_replace propagates the field forward from its internal
  write_file.

Validation:
- 92/92 LSP tests pass (73 prior + 8 lifecycle + 11 diagnostics field)
- 217/217 pass with file_operations + LSP combined
- Live E2E reverified: clean writes -> both fields empty/none; type
  error introduced -> lint clean (parses), lsp_diagnostics carries
  the pyright reportReturnType block; patch fix -> both fields
  clean again.

* fix(lsp): broken-set short-circuit so a wedged server isn't paid every write

Discovered while auditing failure paths: a language server binary that
hangs (sleep forever, no LSP traffic on stdin/stdout) caused EVERY
subsequent write to re-pay the 8s snapshot_baseline timeout. Five
writes = ~64s of dead time.

The bug: ``_get_or_spawn`` adds the (server_id, root) pair to
``_broken`` inside its inner exception handler, but when the OUTER
``_loop.run`` timeout fires, it cancels the inner task before that
handler runs. The pair never makes it to broken-set, so the next
write re-enters the spawn path and re-pays the timeout.

Fix:

- New ``_mark_broken_for_file`` helper at the service layer marks
  the (server_id, workspace_root) pair broken from the OUTSIDE when
  the outer timeout fires. Called from the except branches in
  ``snapshot_baseline``, ``get_diagnostics_sync`` (asyncio.TimeoutError
  + generic Exception). Also kills any orphan client process that
  survived the cancelled future, fire-and-forget with a 1s ceiling.

- ``enabled_for`` now consults the broken-set BEFORE returning True.
  Files in already-broken (server_id, root) pairs short-circuit to
  False, so the file_operations layer skips the LSP path entirely
  with no spawn cost. Until the service is restarted (``hermes lsp
  restart``) or the process exits.

- A single eventlog WARNING is emitted on first mark-broken so the
  user knows which server gave up. Subsequent edits in the same
  project stay silent.

Tests: 7 new in tests/agent/lsp/test_broken_set.py — covers the
key shape (server_id, per_server_root), enabled_for short-circuit,
sibling-file skip in same project, project isolation (broken in
A doesn't affect B), graceful no-op for missing-server / no-workspace,
and an end-to-end test that snapshots after a failure and verifies
the next ``enabled_for`` returns False.

Validation:

- Live retest of the wedged-binary scenario: 5 sequential writes,
  first 8.88s (the one snapshot timeout), subsequent four ~0.84s
  (no LSP cost). Down from 5x12.85s = 64s before this fix.
- 99/99 LSP tests pass (92 prior + 7 broken-set)
- 224/224 pass with file_operations + LSP combined
- Happy path E2E reverified — clean write, type error introduced,
  patch fix all behave correctly with the new broken-set logic.

Note: the FIRST write to a wedged binary still pays 8s (the
snapshot_baseline timeout). We could shorten that, but pyright/
tsserver normally take 2-3s and slow CI rust-analyzer can need
5+ seconds, so 8s is the conservative ceiling. Subsequent writes
are instant.
AlexFoxD pushed a commit to AlexFoxD/hermes-agent that referenced this pull request May 21, 2026
…ile/patch (NousResearch#24168)

* feat(lsp): semantic diagnostics from real language servers in write_file/patch

Wire ~26 language servers (pyright, gopls, rust-analyzer, typescript-language-server,
clangd, bash-language-server, ...) into the post-write lint check used by write_file
and patch. The model now sees type errors, undefined names, missing imports, and
project-wide semantic issues introduced by its edits, not just syntax errors.

LSP is gated on git workspace detection: when the agent's cwd or the file being
edited is inside a git worktree, LSP runs against that workspace; otherwise the
existing in-process syntax checks are the only tier. This keeps users on
user-home cwds (Telegram/Discord gateway chats) from spawning daemons.

The post-write check is layered: in-process syntax check first (microseconds),
then LSP semantic diagnostics second when syntax is clean. Diagnostics are
delta-filtered against a baseline captured at write start, so the agent only
sees errors its edit introduced. A flaky/missing language server can never
break a write -- every LSP failure path falls back silently to the syntax-only
result.

New module agent/lsp/ split into:

- protocol.py: Content-Length JSON-RPC framer + envelope helpers
- client.py: async LSPClient (spawn, initialize, didOpen/didChange,
  ContentModified retry, push/pull diagnostic stores)
- workspace.py: git worktree walk-up + per-server NearestRoot resolver
- servers.py: registry of 26 language servers (extension match,
  root resolver, spawn builder per language)
- install.py: auto-install dispatch (npm install --prefix, go install
  with GOBIN, pip install --target) into HERMES_HOME/lsp/bin/
- manager.py: LSPService (per-(server_id, root) client registry, lazy
  spawn, broken-set, in-flight dedupe, sync facade for tools layer)
- reporter.py: <diagnostics> block formatter (severity-1-only, 20-per-file)
- cli.py: hermes lsp {status,list,install,install-all,restart,which}

Wired into tools/file_operations.py:

- write_file/patch_replace now call _snapshot_lsp_baseline before write
- _check_lint_delta gains a third tier: LSP semantic diagnostics when
  syntax is clean
- All LSP code paths swallow exceptions; write_file's contract unchanged

Config: 'lsp' section in DEFAULT_CONFIG with enabled (default true),
wait_mode, wait_timeout, install_strategy (default 'auto'), and per-server
overrides (disabled, command, env, initialization_options).

Tests: tests/agent/lsp/ -- 49 tests covering protocol framing (encode and
read_message round-trip, EOF/truncation/missing Content-Length), workspace
gate (git walk-up, exclude markers, fallback to file location), reporter
(severity filter, max-per-file cap, truncation), service-level delta filter,
and an in-process mock LSP server that exercises the full client lifecycle
including didChange version bumps, dedup, crash recovery, and idempotent
teardown.

Live E2E verified end-to-end through ShellFileOperations: pyright
auto-installed via npm into HERMES_HOME, baseline captured, type error
introduced, single delta diagnostic surfaced with correct line/column/code/
source, then patch fix removes the diagnostic from the output.

Docs: new website/docs/user-guide/features/lsp.md page covering supported
languages, configuration knobs, performance characteristics, and
troubleshooting; cli-commands.md updated with the 'hermes lsp' reference;
sidebar updated.

* feat(lsp): structured logging, backend gate, defensive walk caps

Cherry-picks the substantive ideas from NousResearch#24155 (different scope, same
problem space) onto our PR.

agent/lsp/eventlog.py (new): dedicated structured logger
``hermes.lint.lsp`` with steady-state silence. Module-level dedup sets
keep a 1000-write session at exactly ONE INFO line ("active for
<root>") at the default INFO threshold; clean writes log at DEBUG so
they never reach agent.log under normal config. State transitions
(server starts, no project root for a file, server unavailable) fire
at INFO/WARNING once per (server_id, key); novel events (timeouts,
unexpected errors) fire WARNING per call. Grep recipe: ``rg 'lsp\\['``.

agent/lsp/manager.py: wire the eventlog into _get_or_spawn and
get_diagnostics_sync so users can answer "did LSP fire on this edit?"
with a single grep, plus surface "binary not on PATH" warnings once
instead of silently retrying every write.

tools/file_operations.py: backend-type gate. ``_lsp_local_only()``
returns False for non-local backends (Docker / Modal / SSH /
Daytona); ``_snapshot_lsp_baseline`` and ``_maybe_lsp_diagnostics``
now skip entirely on remote envs. The host-side language server
can't see files inside a sandbox, so this prevents pretending to
lint a file the host process can't open.

agent/lsp/protocol.py: 8 KiB cap on the header block in
``read_message``. A pathological server that streams headers
without ever emitting CRLF-CRLF would have looped forever consuming
bytes; now raises ``LSPProtocolError`` instead.

agent/lsp/workspace.py: 64-step cap on ``find_git_worktree`` and
``nearest_root`` upward walks, plus try/except containment around
``Path(...).resolve()`` and child ``.exists()`` calls. Defensive
against pathological inputs (symlink loops, encoding errors,
permission failures mid-walk) — the lint hook is hot-path code and
must never raise.

Tests:
- tests/agent/lsp/test_eventlog.py: 18 tests covering steady-state
  silence (clean writes stay DEBUG), state-transition INFO-once
  semantics (active for, no project root), action-required
  WARNING-once (server unavailable), per-call WARNING (timeouts,
  spawn failures), and the "1000 clean writes => 1 INFO" contract.
- tests/agent/lsp/test_backend_gate.py: 5 tests verifying
  _lsp_local_only / snapshot_baseline / maybe_lsp_diagnostics skip
  the LSP layer for non-local backends and route correctly for
  LocalEnvironment.
- tests/agent/lsp/test_protocol.py: new test_read_message_rejects_runaway_header
  exercising the 8 KiB cap.

Validation:
- 73/73 LSP tests pass (49 original + 18 eventlog + 5 backend-gate + 1 framer cap)
- 198/198 pass when run alongside existing file_operations tests
- Live E2E re-run with pyright still surfaces "ERROR [2:12] Type
  ... reportReturnType (Pyright)" through the full path, then patch
  fix removes it on the next call.

* feat(lsp): atexit cleanup + separate lsp_diagnostics JSON field

Two improvements salvaged from NousResearch#24414's plugin-form alternative,
keeping our core-integrated design:

1. atexit cleanup of spawned language servers
   ----------------------------------------------------------------
   ``agent/lsp/__init__.get_service`` now registers an ``atexit``
   handler on first creation that tears down the LSPService on
   Python exit.  Without this, every ``hermes chat`` exit was
   leaking pyright/gopls/etc. processes for a few seconds while
   their stdout buffers drained -- they got reaped by the kernel
   eventually but a watchful ``ps aux`` would catch them.

   The handler runs once per process (gated by
   ``_atexit_registered``); idempotent ``shutdown_service``
   ensures double-fire is a no-op.  Errors during shutdown are
   swallowed at debug level since by the time atexit fires the
   user has already seen the agent's final response.

2. Separate ``lsp_diagnostics`` field on WriteResult / PatchResult
   ----------------------------------------------------------------
   Previously the LSP layer folded its diagnostic block into the
   ``lint.output`` string, conflating the syntax-check tier with
   the semantic tier.  The agent (and any downstream parsers) now
   read syntax errors and semantic errors as independent signals:

       {
         "bytes_written": 42,
         "lint": {"status": "ok", "output": ""},
         "lsp_diagnostics": "<diagnostics file=...>\nERROR [2:12] ..."
       }

   ``_check_lint_delta`` returns to its original two-tier shape
   (syntax check + delta filter); ``write_file`` and
   ``patch_replace`` independently fetch LSP diagnostics via
   ``_maybe_lsp_diagnostics`` and pass them into the new field.
   ``patch_replace`` propagates the inner write_file's
   ``lsp_diagnostics`` so the outer PatchResult carries the patch's
   delta correctly.

Tests: 19 new
- tests/agent/lsp/test_lifecycle.py (8 tests): atexit registration
  fires once and only once across N get_service calls; the
  registered callable is our internal shutdown wrapper;
  shutdown_service is idempotent and safe when never started;
  exceptions during shutdown are swallowed; inactive service is
  cached so we don't rebuild on every check.
- tests/agent/lsp/test_diagnostics_field.py (11 tests): WriteResult
  / PatchResult dataclass shape, to_dict include/omit semantics,
  channel separation (lint and lsp_diagnostics carry independent
  signals), write_file populates the field via
  _maybe_lsp_diagnostics only when the syntax tier is clean,
  patch_replace propagates the field forward from its internal
  write_file.

Validation:
- 92/92 LSP tests pass (73 prior + 8 lifecycle + 11 diagnostics field)
- 217/217 pass with file_operations + LSP combined
- Live E2E reverified: clean writes -> both fields empty/none; type
  error introduced -> lint clean (parses), lsp_diagnostics carries
  the pyright reportReturnType block; patch fix -> both fields
  clean again.

* fix(lsp): broken-set short-circuit so a wedged server isn't paid every write

Discovered while auditing failure paths: a language server binary that
hangs (sleep forever, no LSP traffic on stdin/stdout) caused EVERY
subsequent write to re-pay the 8s snapshot_baseline timeout. Five
writes = ~64s of dead time.

The bug: ``_get_or_spawn`` adds the (server_id, root) pair to
``_broken`` inside its inner exception handler, but when the OUTER
``_loop.run`` timeout fires, it cancels the inner task before that
handler runs. The pair never makes it to broken-set, so the next
write re-enters the spawn path and re-pays the timeout.

Fix:

- New ``_mark_broken_for_file`` helper at the service layer marks
  the (server_id, workspace_root) pair broken from the OUTSIDE when
  the outer timeout fires. Called from the except branches in
  ``snapshot_baseline``, ``get_diagnostics_sync`` (asyncio.TimeoutError
  + generic Exception). Also kills any orphan client process that
  survived the cancelled future, fire-and-forget with a 1s ceiling.

- ``enabled_for`` now consults the broken-set BEFORE returning True.
  Files in already-broken (server_id, root) pairs short-circuit to
  False, so the file_operations layer skips the LSP path entirely
  with no spawn cost. Until the service is restarted (``hermes lsp
  restart``) or the process exits.

- A single eventlog WARNING is emitted on first mark-broken so the
  user knows which server gave up. Subsequent edits in the same
  project stay silent.

Tests: 7 new in tests/agent/lsp/test_broken_set.py — covers the
key shape (server_id, per_server_root), enabled_for short-circuit,
sibling-file skip in same project, project isolation (broken in
A doesn't affect B), graceful no-op for missing-server / no-workspace,
and an end-to-end test that snapshots after a failure and verifies
the next ``enabled_for`` returns False.

Validation:

- Live retest of the wedged-binary scenario: 5 sequential writes,
  first 8.88s (the one snapshot timeout), subsequent four ~0.84s
  (no LSP cost). Down from 5x12.85s = 64s before this fix.
- 99/99 LSP tests pass (92 prior + 7 broken-set)
- 224/224 pass with file_operations + LSP combined
- Happy path E2E reverified — clean write, type error introduced,
  patch fix all behave correctly with the new broken-set logic.

Note: the FIRST write to a wedged binary still pays 8s (the
snapshot_baseline timeout). We could shorten that, but pyright/
tsserver normally take 2-3s and slow CI rust-analyzer can need
5+ seconds, so 8s is the conservative ceiling. Subsequent writes
are instant.
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
…ile/patch (NousResearch#24168)

* feat(lsp): semantic diagnostics from real language servers in write_file/patch

Wire ~26 language servers (pyright, gopls, rust-analyzer, typescript-language-server,
clangd, bash-language-server, ...) into the post-write lint check used by write_file
and patch. The model now sees type errors, undefined names, missing imports, and
project-wide semantic issues introduced by its edits, not just syntax errors.

LSP is gated on git workspace detection: when the agent's cwd or the file being
edited is inside a git worktree, LSP runs against that workspace; otherwise the
existing in-process syntax checks are the only tier. This keeps users on
user-home cwds (Telegram/Discord gateway chats) from spawning daemons.

The post-write check is layered: in-process syntax check first (microseconds),
then LSP semantic diagnostics second when syntax is clean. Diagnostics are
delta-filtered against a baseline captured at write start, so the agent only
sees errors its edit introduced. A flaky/missing language server can never
break a write -- every LSP failure path falls back silently to the syntax-only
result.

New module agent/lsp/ split into:

- protocol.py: Content-Length JSON-RPC framer + envelope helpers
- client.py: async LSPClient (spawn, initialize, didOpen/didChange,
  ContentModified retry, push/pull diagnostic stores)
- workspace.py: git worktree walk-up + per-server NearestRoot resolver
- servers.py: registry of 26 language servers (extension match,
  root resolver, spawn builder per language)
- install.py: auto-install dispatch (npm install --prefix, go install
  with GOBIN, pip install --target) into HERMES_HOME/lsp/bin/
- manager.py: LSPService (per-(server_id, root) client registry, lazy
  spawn, broken-set, in-flight dedupe, sync facade for tools layer)
- reporter.py: <diagnostics> block formatter (severity-1-only, 20-per-file)
- cli.py: hermes lsp {status,list,install,install-all,restart,which}

Wired into tools/file_operations.py:

- write_file/patch_replace now call _snapshot_lsp_baseline before write
- _check_lint_delta gains a third tier: LSP semantic diagnostics when
  syntax is clean
- All LSP code paths swallow exceptions; write_file's contract unchanged

Config: 'lsp' section in DEFAULT_CONFIG with enabled (default true),
wait_mode, wait_timeout, install_strategy (default 'auto'), and per-server
overrides (disabled, command, env, initialization_options).

Tests: tests/agent/lsp/ -- 49 tests covering protocol framing (encode and
read_message round-trip, EOF/truncation/missing Content-Length), workspace
gate (git walk-up, exclude markers, fallback to file location), reporter
(severity filter, max-per-file cap, truncation), service-level delta filter,
and an in-process mock LSP server that exercises the full client lifecycle
including didChange version bumps, dedup, crash recovery, and idempotent
teardown.

Live E2E verified end-to-end through ShellFileOperations: pyright
auto-installed via npm into HERMES_HOME, baseline captured, type error
introduced, single delta diagnostic surfaced with correct line/column/code/
source, then patch fix removes the diagnostic from the output.

Docs: new website/docs/user-guide/features/lsp.md page covering supported
languages, configuration knobs, performance characteristics, and
troubleshooting; cli-commands.md updated with the 'hermes lsp' reference;
sidebar updated.

* feat(lsp): structured logging, backend gate, defensive walk caps

Cherry-picks the substantive ideas from NousResearch#24155 (different scope, same
problem space) onto our PR.

agent/lsp/eventlog.py (new): dedicated structured logger
``hermes.lint.lsp`` with steady-state silence. Module-level dedup sets
keep a 1000-write session at exactly ONE INFO line ("active for
<root>") at the default INFO threshold; clean writes log at DEBUG so
they never reach agent.log under normal config. State transitions
(server starts, no project root for a file, server unavailable) fire
at INFO/WARNING once per (server_id, key); novel events (timeouts,
unexpected errors) fire WARNING per call. Grep recipe: ``rg 'lsp\\['``.

agent/lsp/manager.py: wire the eventlog into _get_or_spawn and
get_diagnostics_sync so users can answer "did LSP fire on this edit?"
with a single grep, plus surface "binary not on PATH" warnings once
instead of silently retrying every write.

tools/file_operations.py: backend-type gate. ``_lsp_local_only()``
returns False for non-local backends (Docker / Modal / SSH /
Daytona); ``_snapshot_lsp_baseline`` and ``_maybe_lsp_diagnostics``
now skip entirely on remote envs. The host-side language server
can't see files inside a sandbox, so this prevents pretending to
lint a file the host process can't open.

agent/lsp/protocol.py: 8 KiB cap on the header block in
``read_message``. A pathological server that streams headers
without ever emitting CRLF-CRLF would have looped forever consuming
bytes; now raises ``LSPProtocolError`` instead.

agent/lsp/workspace.py: 64-step cap on ``find_git_worktree`` and
``nearest_root`` upward walks, plus try/except containment around
``Path(...).resolve()`` and child ``.exists()`` calls. Defensive
against pathological inputs (symlink loops, encoding errors,
permission failures mid-walk) — the lint hook is hot-path code and
must never raise.

Tests:
- tests/agent/lsp/test_eventlog.py: 18 tests covering steady-state
  silence (clean writes stay DEBUG), state-transition INFO-once
  semantics (active for, no project root), action-required
  WARNING-once (server unavailable), per-call WARNING (timeouts,
  spawn failures), and the "1000 clean writes => 1 INFO" contract.
- tests/agent/lsp/test_backend_gate.py: 5 tests verifying
  _lsp_local_only / snapshot_baseline / maybe_lsp_diagnostics skip
  the LSP layer for non-local backends and route correctly for
  LocalEnvironment.
- tests/agent/lsp/test_protocol.py: new test_read_message_rejects_runaway_header
  exercising the 8 KiB cap.

Validation:
- 73/73 LSP tests pass (49 original + 18 eventlog + 5 backend-gate + 1 framer cap)
- 198/198 pass when run alongside existing file_operations tests
- Live E2E re-run with pyright still surfaces "ERROR [2:12] Type
  ... reportReturnType (Pyright)" through the full path, then patch
  fix removes it on the next call.

* feat(lsp): atexit cleanup + separate lsp_diagnostics JSON field

Two improvements salvaged from NousResearch#24414's plugin-form alternative,
keeping our core-integrated design:

1. atexit cleanup of spawned language servers
   ----------------------------------------------------------------
   ``agent/lsp/__init__.get_service`` now registers an ``atexit``
   handler on first creation that tears down the LSPService on
   Python exit.  Without this, every ``hermes chat`` exit was
   leaking pyright/gopls/etc. processes for a few seconds while
   their stdout buffers drained -- they got reaped by the kernel
   eventually but a watchful ``ps aux`` would catch them.

   The handler runs once per process (gated by
   ``_atexit_registered``); idempotent ``shutdown_service``
   ensures double-fire is a no-op.  Errors during shutdown are
   swallowed at debug level since by the time atexit fires the
   user has already seen the agent's final response.

2. Separate ``lsp_diagnostics`` field on WriteResult / PatchResult
   ----------------------------------------------------------------
   Previously the LSP layer folded its diagnostic block into the
   ``lint.output`` string, conflating the syntax-check tier with
   the semantic tier.  The agent (and any downstream parsers) now
   read syntax errors and semantic errors as independent signals:

       {
         "bytes_written": 42,
         "lint": {"status": "ok", "output": ""},
         "lsp_diagnostics": "<diagnostics file=...>\nERROR [2:12] ..."
       }

   ``_check_lint_delta`` returns to its original two-tier shape
   (syntax check + delta filter); ``write_file`` and
   ``patch_replace`` independently fetch LSP diagnostics via
   ``_maybe_lsp_diagnostics`` and pass them into the new field.
   ``patch_replace`` propagates the inner write_file's
   ``lsp_diagnostics`` so the outer PatchResult carries the patch's
   delta correctly.

Tests: 19 new
- tests/agent/lsp/test_lifecycle.py (8 tests): atexit registration
  fires once and only once across N get_service calls; the
  registered callable is our internal shutdown wrapper;
  shutdown_service is idempotent and safe when never started;
  exceptions during shutdown are swallowed; inactive service is
  cached so we don't rebuild on every check.
- tests/agent/lsp/test_diagnostics_field.py (11 tests): WriteResult
  / PatchResult dataclass shape, to_dict include/omit semantics,
  channel separation (lint and lsp_diagnostics carry independent
  signals), write_file populates the field via
  _maybe_lsp_diagnostics only when the syntax tier is clean,
  patch_replace propagates the field forward from its internal
  write_file.

Validation:
- 92/92 LSP tests pass (73 prior + 8 lifecycle + 11 diagnostics field)
- 217/217 pass with file_operations + LSP combined
- Live E2E reverified: clean writes -> both fields empty/none; type
  error introduced -> lint clean (parses), lsp_diagnostics carries
  the pyright reportReturnType block; patch fix -> both fields
  clean again.

* fix(lsp): broken-set short-circuit so a wedged server isn't paid every write

Discovered while auditing failure paths: a language server binary that
hangs (sleep forever, no LSP traffic on stdin/stdout) caused EVERY
subsequent write to re-pay the 8s snapshot_baseline timeout. Five
writes = ~64s of dead time.

The bug: ``_get_or_spawn`` adds the (server_id, root) pair to
``_broken`` inside its inner exception handler, but when the OUTER
``_loop.run`` timeout fires, it cancels the inner task before that
handler runs. The pair never makes it to broken-set, so the next
write re-enters the spawn path and re-pays the timeout.

Fix:

- New ``_mark_broken_for_file`` helper at the service layer marks
  the (server_id, workspace_root) pair broken from the OUTSIDE when
  the outer timeout fires. Called from the except branches in
  ``snapshot_baseline``, ``get_diagnostics_sync`` (asyncio.TimeoutError
  + generic Exception). Also kills any orphan client process that
  survived the cancelled future, fire-and-forget with a 1s ceiling.

- ``enabled_for`` now consults the broken-set BEFORE returning True.
  Files in already-broken (server_id, root) pairs short-circuit to
  False, so the file_operations layer skips the LSP path entirely
  with no spawn cost. Until the service is restarted (``hermes lsp
  restart``) or the process exits.

- A single eventlog WARNING is emitted on first mark-broken so the
  user knows which server gave up. Subsequent edits in the same
  project stay silent.

Tests: 7 new in tests/agent/lsp/test_broken_set.py — covers the
key shape (server_id, per_server_root), enabled_for short-circuit,
sibling-file skip in same project, project isolation (broken in
A doesn't affect B), graceful no-op for missing-server / no-workspace,
and an end-to-end test that snapshots after a failure and verifies
the next ``enabled_for`` returns False.

Validation:

- Live retest of the wedged-binary scenario: 5 sequential writes,
  first 8.88s (the one snapshot timeout), subsequent four ~0.84s
  (no LSP cost). Down from 5x12.85s = 64s before this fix.
- 99/99 LSP tests pass (92 prior + 7 broken-set)
- 224/224 pass with file_operations + LSP combined
- Happy path E2E reverified — clean write, type error introduced,
  patch fix all behave correctly with the new broken-set logic.

Note: the FIRST write to a wedged binary still pays 8s (the
snapshot_baseline timeout). We could shorten that, but pyright/
tsserver normally take 2-3s and slow CI rust-analyzer can need
5+ seconds, so 8s is the conservative ceiling. Subsequent writes
are instant.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/plugins Plugin system and bundled plugins P3 Low — cosmetic, nice to have tool/file File tools (read, write, patch, search) type/feature New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants