feat(lint): opt-in LSP-backed lint path in _check_lint by OutThisLife · Pull Request #24155 · NousResearch/hermes-agent

OutThisLife · 2026-05-12T04:14:00Z

Summary

Replaces npx tsc --noEmit <single-file> (and the go vet / rustfmt --check siblings) with project-aware diagnostics from a real language server, behind an opt-in flag, so the agent stops chasing phantom "Cannot find module" errors that vanish the moment tsconfig.json is in scope.

The lint hook in tools/file_operations.py::_check_lint now resolves in this order:

1. LSP (new): only fires when lint.lsp.enabled is true, the backend is LocalEnvironment, the file's extension maps to a known language, a project-root marker (tsconfig.json / Cargo.toml / go.mod, …) exists in the ancestor chain, and the server binary is on PATH. Returns None in every other case so the caller falls through cleanly.
2. In-process linters (unchanged): ast.parse, json.loads, yaml.safe_load, tomllib.
3. Shell linter table (unchanged): py_compile, node --check, tsc, go vet, rustfmt --check. This is what runs today and what the disabled LSP path falls through to.

With the flag off (the default) the only behaviour change is one extra import tools.lsp_lint on the lint path.
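
The resolution order above amounts to a chain of tiers in which each tier either returns a verdict or returns None to defer to the next one. A minimal sketch of that shape, assuming hypothetical tier functions (`lsp_tier`, `in_process_tier`, `shell_tier`) and a hypothetical `check_lint` driver; none of these names are taken from the real `_check_lint` internals:

```python
import json
from typing import Callable, Optional, Sequence

# A tier returns a verdict string, or None meaning "not mine, try the next tier".
LintTier = Callable[[str, str], Optional[str]]


def lsp_tier(path: str, content: str) -> Optional[str]:
    # Hypothetical stand-in: with the flag off, the bridge always defers.
    return None


def in_process_tier(path: str, content: str) -> Optional[str]:
    # Only JSON is handled here, to keep the sketch short.
    if not path.endswith(".json"):
        return None
    try:
        json.loads(content)
    except json.JSONDecodeError as exc:
        return f"error: {exc}"
    return "ok"


def shell_tier(path: str, content: str) -> Optional[str]:
    # Placeholder for the shell linter table; no subprocess is run in this sketch.
    return "shell: not run in this sketch"


def check_lint(path: str, content: str, tiers: Sequence[LintTier]) -> str:
    for tier in tiers:
        try:
            verdict = tier(path, content)
        except Exception:
            # Assumption: a failing tier is treated as "defer", never as a write failure.
            verdict = None
        if verdict is not None:
            return verdict
    return "skipped"


TIERS = [lsp_tier, in_process_tier, shell_tier]
```

With the hypothetical LSP tier deferring, `check_lint("a.ts", "let x = 1", TIERS)` reaches the shell placeholder, which mirrors the default-off behaviour described above.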

Why a hand-rolled client (vs. `multilspy` / `sansio-lsp-client` / `pygls`)

Documented in detail in the tools/lsp_client.py module docstring. Short version:

multilspy (MS Research, v0.0.15 "pre-alpha") is async-first, hard-pins jedi-language-server as a dep, auto-downloads server binaries to ~/.multilspy/ (collides with HERMES_HOME profile isolation), and its public API is built around navigation requests rather than diagnostics.
sansio-lsp-client would save ~50 lines of framing in exchange for a small new dep, but every other moving part below — subprocess pump, reader thread, request demux, per-(env, root) cache, idle reaper, error containment — would still be ours.
pygls is server-side only; client is "Coming Soon" in v2.

Net cost/benefit: own ~500 lines, document the contract, pull in lsprotocol for typed messages later if/when we need pull diagnostics or progress tokens.

What is in this PR

tools/lsp_client.py — stdio JSON-RPC client with Content-Length framing, per-(language, project_root, server_cmd) cache, daemon idle reaper, sync diagnostics(path, content) that uses textDocument/didOpen + publishDiagnostics with a configurable settle window for servers that emit "indexing" diagnostics first (notably typescript-language-server).
tools/lsp_lint.py — the bridge between _check_lint and the client. Returns None whenever LSP cannot or should not handle the file. Never raises (the lint hook is on the hot edit path; an exception here would surface as a write failure rather than a degraded warning).
tools/file_operations.py::_check_lint — calls the bridge first, falls through to existing linters unchanged.
hermes_cli/config.py — new lint.lsp.{enabled,servers,diagnostic_timeout,settle_ms,idle_shutdown} block, off by default. New top-level key, so deep-merge handles upgrades and no _config_version bump is required.
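
For readers unfamiliar with the Content-Length framing mentioned above: the LSP base protocol prefixes each JSON body with an ASCII header block that ends in a blank line, and the length is counted in bytes. The following is a rough sketch of that framing only; `encode_message` and `decode_message` are hypothetical names and are not the functions in `tools/lsp_client.py`:

```python
import json
from typing import Optional, Tuple


def encode_message(payload: dict) -> bytes:
    # Content-Length counts bytes of the UTF-8 body, not characters.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def decode_message(buffer: bytes) -> Tuple[Optional[dict], bytes]:
    """Return (payload, remaining bytes), or (None, buffer) if a full frame is not buffered yet."""
    head, sep, rest = buffer.partition(b"\r\n\r\n")
    if not sep:
        return None, buffer  # header block not finished yet
    length = None
    for line in head.split(b"\r\n"):
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length header")
    if len(rest) < length:
        return None, buffer  # body still arriving
    return json.loads(rest[:length].decode("utf-8")), rest[length:]
```

A reader thread would typically append incoming bytes to a buffer and call `decode_message` in a loop until it returns None for the payload.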

What is explicitly NOT in this PR

Container / SSH / Modal / Daytona / Vercel sandbox backends. The server has to live where the files live, which means baking it into the sandbox image or installing on first use; that is the gnarly half and gets its own PR. Until then the bridge falls through for non-local environments.
didChange / live editing. Every Hermes call here re-opens the file with the freshly-written bytes, so we never have an in-flight buffer to keep in sync.
Pull diagnostics (LSP 3.17 textDocument/diagnostic). Push works on every server we care about today.
Hint / info severities. Filtered out so the agent's verdict ("lint clean / lint dirty") doesn't change relative to the existing shell linters.
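
The severity filter in the last item can be illustrated with the numeric severities defined by the LSP specification (1 = Error, 2 = Warning, 3 = Information, 4 = Hint). This is a sketch under the assumption that diagnostics arrive as plain dicts; `blocking_diagnostics` is a hypothetical helper name:

```python
from typing import Dict, List

# DiagnosticSeverity values from the LSP specification.
ERROR, WARNING, INFORMATION, HINT = 1, 2, 3, 4


def blocking_diagnostics(diagnostics: List[Dict]) -> List[Dict]:
    """Keep errors and warnings; drop information and hint entries.

    A missing severity is treated as an error, which is what the
    specification recommends clients do.
    """
    return [d for d in diagnostics if d.get("severity", ERROR) <= WARNING]


sample = [
    {"severity": ERROR, "message": "Cannot find name 'foo'."},
    {"severity": HINT, "message": "'bar' is declared but never used."},
    {"severity": INFORMATION, "message": "File is a CommonJS module."},
    {"message": "no severity given"},
]
kept = blocking_diagnostics(sample)
```

Here `kept` holds the first and last entries, so a file that only produces hints or informational notes still counts as lint clean.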

Test plan

scripts/run_tests.sh tests/tools/test_lsp_client.py tests/tools/test_lsp_lint_bridge.py tests/tools/test_check_lint_lsp_routing.py — 37 new tests pass (pure helpers + end-to-end against an in-tree fake LSP server written in Python so no system typescript-language-server is required).
scripts/run_tests.sh tests/tools/test_file_operations_edge_cases.py tests/tools/test_file_tools_live.py tests/tools/test_file_ops_cwd_tracking.py tests/tools/test_file_write_safety.py tests/tools/test_write_deny.py — 126 existing file-operations tests still pass (regression guard for the default-off path).
Manual smoke: enable in ~/.hermes/config.yaml (lint: { lsp: { enabled: true } }), npm i -g typescript-language-server, edit a .ts file via the agent inside a project with tsconfig.json, observe project-aware diagnostics in place of phantom tsc --noEmit noise.
Manual smoke: same as above with the flag off (the default) — agent should see the unchanged tsc --noEmit output.
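
The one-line config in the smoke test works because user settings are deep-merged over defaults, so a user only has to spell out the keys they change. A sketch of that merge, with the caveat that `deep_merge` and the default values below are illustrative assumptions; only the key names come from this PR:

```python
import copy

# Key names are from the PR; the numeric defaults are placeholders for illustration.
DEFAULTS = {
    "lint": {
        "lsp": {
            "enabled": False,
            "servers": {},
            "diagnostic_timeout": 5.0,
            "settle_ms": 200,
            "idle_shutdown": 300,
        }
    }
}


def deep_merge(defaults: dict, overrides: dict) -> dict:
    """Return a new dict: overrides win, nested dicts are merged key by key."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Equivalent of `lint: { lsp: { enabled: true } }` in the user's YAML.
user_config = {"lint": {"lsp": {"enabled": True}}}
effective = deep_merge(DEFAULTS, user_config)
```

An older config file that has no `lint` key at all merges to the defaults unchanged, which is why a new top-level key needs no version bump.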

Follow-ups (separate PRs)

Container/SSH backends — server installation/lifecycle inside the sandbox, JSON-RPC over _run_bash's stdin/stdout.
Pull diagnostics + capability negotiation, gated on what the server advertises in initialize.
_check_lint_delta could ditch the pre-content re-ast.parse once we have diagnostic snapshots from both states, since LSP gives us structured diagnostics rather than line-equality strings.

The post-edit lint hook in `tools/file_operations.py::_check_lint` has historically resolved `.ts`/`.tsx` to `npx tsc --noEmit <single-file>`, which has no view of the surrounding project and floods the agent with phantom "Cannot find module" errors that disappear the moment `tsconfig.json` is in scope. Same shape for `.go` (`go vet` orphan) and `.rs` (`rustfmt --check` is style, not types).

This change adds an opt-in LSP path that runs *before* the in-process / shell linter table:

- `tools/lsp_client.py`: stdio JSON-RPC client with Content-Length framing, per-(language, project_root, server_cmd) cache, idle reaper. Sync API (`diagnostics(path, content)`) returns one snapshot via `textDocument/didOpen` + `publishDiagnostics` with a configurable settle window for servers that emit "indexing" diagnostics first. Hand-rolled rather than pulled from `multilspy` / `sansio-lsp-client` / `pygls`; see the module docstring for the comparison.
- `tools/lsp_lint.py`: bridge between `_check_lint` and the client. Returns `None` (caller falls through to the existing shell linter) whenever LSP cannot or should not handle the file: feature off, language unmapped, env not local, no project root marker, server binary missing, request timed out. Never raises.
- `tools/file_operations.py::_check_lint`: calls the bridge first, falls through to the in-process and shell linters unchanged. With the flag off (the default) this is one extra `import` on the lint path and zero behaviour change.
- `hermes_cli/config.py`: new `lint.lsp.{enabled,servers,...}` block, off by default. New top-level key, so the deep-merge handles upgrades and no `_config_version` bump is required.

Out of scope for this PR (explicitly noted in the module docstrings):

- Container / SSH / Modal / Daytona backends. The server has to run inside the sandbox, which means baking it into the image or installing on first use; that's the gnarly half and gets its own PR.
- didChange / live editing. We re-open with the freshly written bytes on every call.
- Pull diagnostics (LSP 3.17). Push works on every server we care about today.

Tests:

- `test_lsp_client.py`: pure helpers (project root walk, URI, framing, Diagnostic conversion) + end-to-end coverage against an in-tree fake LSP server (Python script over stdio) for clean / dirty / settle / timeout / registry caching / shutdown.
- `test_lsp_lint_bridge.py`: feature flag, backend gating, project root gating, language-map gating, error containment.
- `test_check_lint_lsp_routing.py`: regression guard that the default (LSP disabled) path still hits in-process / shell linters, plus proof that an enabled bridge short-circuits the shell linter.
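
The settle window mentioned above can be pictured as "wait for the first `publishDiagnostics`, then keep the latest one until the server goes quiet". A sketch under the assumption that a reader thread pushes each published diagnostics list onto a `queue.Queue`; `wait_for_settled` and its parameters are hypothetical and not the client's real API:

```python
import queue
import time
from typing import List, Optional


def wait_for_settled(updates: "queue.Queue", timeout: float, settle: float) -> Optional[List]:
    """Return the last diagnostics list seen, or None if nothing arrives before `timeout`.

    After the first update, later updates replace it as long as each one
    arrives within `settle` seconds of the previous one.
    """
    deadline = time.monotonic() + timeout
    try:
        latest = updates.get(timeout=timeout)
    except queue.Empty:
        return None  # the caller would fall through to the shell linter
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return latest
        try:
            latest = updates.get(timeout=min(settle, remaining))
        except queue.Empty:
            return latest  # quiet for a full settle window


# Simulate a server that publishes an "indexing" snapshot, then the real (clean) one.
pending = queue.Queue()
pending.put([{"message": "indexing"}])
pending.put([])
result = wait_for_settled(pending, timeout=1.0, settle=0.05)
```

In this simulation `result` is the second, empty list, so the early "indexing" snapshot never reaches the agent.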

github-actions · 2026-05-12T04:15:13Z

🔎 Lint report: `bb/lsp-lint` vs `origin/main`

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 8178 on HEAD, 8175 on base (🆕 +3)

🆕 New issues (27):

| Rule | Count |
| --- | --- |
| `unsupported-operator` | 13 |
| `unresolved-attribute` | 7 |
| `invalid-argument-type` | 3 |
| `unresolved-import` | 3 |
| `invalid-return-type` | 1 |

First entries

tests/tools/test_web_providers.py:155: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `Literal["search_backend"]` and `str | dict[Unknown, Unknown] | list[Unknown] | ... omitted 26 union elements`
tests/tools/test_web_providers.py:156: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `Literal["extract_backend"]` and `str | dict[Unknown, Unknown] | list[Unknown] | ... omitted 26 union elements`
tests/hermes_cli/test_kanban_core_functionality.py:3176: [unresolved-attribute] unresolved-attribute: Attribute `get` is not defined on `str`, `list[Unknown]`, `list[str]`, `int` in union `str | dict[Unknown, Unknown] | list[Unknown] | ... omitted 26 union elements`
tests/cli/test_reasoning_command.py:552: [invalid-argument-type] invalid-argument-type: Argument to bound method `TestCase.assertIn` is incorrect: Expected `Iterable[Any] | Container[Any]`, found `str | dict[Unknown, Unknown] | list[Unknown] | ... omitted 26 union elements`
tests/cli/test_fast_command.py:480: [invalid-argument-type] invalid-argument-type: Argument to bound method `TestCase.assertIn` is incorrect: Expected `Iterable[Any] | Container[Any]`, found `str | dict[Unknown, Unknown] | list[Unknown] | ... omitted 26 union elements`
tests/hermes_cli/test_aux_config.py:41: [unsupported-operator] unsupported-operator: Operator `>` is not supported between objects of type `Unknown | int | list[Unknown] | ... omitted 4 union elements` and `Literal[0]`
tests/tools/test_browser_lightpanda.py:242: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `Literal["engine"]` and `str | dict[Unknown, Unknown] | list[Unknown] | ... omitted 26 union elements`
tests/hermes_cli/test_destructive_slash_confirm_gate.py:32: [unresolved-attribute] unresolved-attribute: Attribute `get` is not defined on `str`, `list[Unknown]`, `list[str]`, `int` in union `str | dict[Unknown, Unknown] | list[Unknown] | ... omitted 26 union elements`
tests/hermes_cli/test_mcp_reload_confirm_gate.py:34: [unresolved-attribute] unresolved-attribute: Attribute `get` is not defined on `str`, `list[Unknown]`, `list[str]`, `int` in union `str | dict[Unknown, Unknown] | list[Unknown] | ... omitted 26 union elements`
tests/hermes_cli/test_aux_config.py:56: [unresolved-attribute] unresolved-attribute: Attribute `keys` is not defined on `str`, `list[Unknown]`, `list[str]`, `int` in union `str | dict[Unknown, Unknown] | list[Unknown] | ... omitted 26 union elements`
hermes_cli/config.py:3656: [unresolved-attribute] unresolved-attribute: Attribute `items` is not defined on `str`, `list[Unknown]`, `list[str]`, `int` in union `str | dict[Unknown, Unknown] | list[Unknown] | ... omitted 26 union elements`
tests/tools/test_web_providers.py:154: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `Literal["backend"]` and `str | dict[Unknown, Unknown] | list[Unknown] | ... omitted 26 union elements`
tests/hermes_cli/test_aux_config.py:37: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `Literal["title_generation"]` and `str | dict[Unknown, Unknown] | list[Unknown] | ... omitted 26 union elements`
hermes_cli/config.py:3029: [invalid-return-type] invalid-return-type: Return type does not match returned value: expected `tuple[int, int]`, found `tuple[Any, str | dict[Unknown, Unknown] | list[Unknown] | ... omitted 26 union elements]`
tests/agent/test_curator.py:860: [unsupported-operator] unsupported-operator: Operator `>` is not supported between objects of type `Unknown | int | list[Unknown] | ... omitted 4 union elements` and `Literal[0]`
tests/gateway/test_whatsapp_reply_prefix.py:121: [unsupported-operator] unsupported-operator: Operator `>=` is not supported between objects of type `str | dict[Unknown, Unknown] | list[Unknown] | ... omitted 26 union elements` and `int`
tests/cli/test_resume_display.py:648: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `Literal["resume_display"]` and `str | dict[Unknown, Unknown] | list[Unknown] | ... omitted 26 union elements`
tests/agent/test_auxiliary_config_bridge.py:277: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `Literal["provider"]` and `Unknown | int | str | ... omitted 13 union elements`
hermes_cli/config.py:3675: [unresolved-attribute] unresolved-attribute: Attribute `items` is not defined on `int`, `str`, `list[Unknown]`, `float`, `None` in union `Unknown | int | str | ... omitted 13 union elements`
tests/tools/test_check_lint_lsp_routing.py:22: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/tools/test_lsp_lint_bridge.py:18: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/agent/test_curator.py:855: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `Literal["curator"]` and `str | dict[Unknown, Unknown] | list[Unknown] | ... omitted 26 union elements`
gateway/restart.py:10: [invalid-argument-type] invalid-argument-type: Argument to constructor `float.__new__` is incorrect: Expected `str | Buffer | SupportsFloat | SupportsIndex`, found `Unknown | int | str | ... omitted 13 union elements`
tests/tools/test_lsp_client.py:26: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/tools/test_browser_console.py:265: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `Literal["record_sessions"]` and `str | dict[Unknown, Unknown] | list[Unknown] | ... omitted 26 union elements`
... and 2 more

✅ Fixed issues (24):

| Rule | Count |
| --- | --- |
| `unsupported-operator` | 13 |
| `unresolved-attribute` | 7 |
| `invalid-argument-type` | 3 |
| `invalid-return-type` | 1 |

First entries

tests/hermes_cli/test_aux_config.py:37: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `Literal["title_generation"]` and `str | dict[Unknown, Unknown] | list[Unknown] | ... omitted 25 union elements`
tests/tools/test_web_providers.py:155: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `Literal["search_backend"]` and `str | dict[Unknown, Unknown] | list[Unknown] | ... omitted 25 union elements`
tests/hermes_cli/test_aux_config.py:56: [unresolved-attribute] unresolved-attribute: Attribute `keys` is not defined on `str`, `list[Unknown]`, `list[str]`, `int` in union `str | dict[Unknown, Unknown] | list[Unknown] | ... omitted 25 union elements`
hermes_cli/config.py:3636: [unresolved-attribute] unresolved-attribute: Attribute `items` is not defined on `int`, `str`, `list[Unknown]`, `float`, `None` in union `Unknown | int | str | ... omitted 12 union elements`
tests/hermes_cli/test_aux_config.py:41: [unsupported-operator] unsupported-operator: Operator `>` is not supported between objects of type `Unknown | int | list[Unknown] | ... omitted 3 union elements` and `Literal[0]`
hermes_cli/config.py:3627: [unresolved-attribute] unresolved-attribute: Attribute `get` is not defined on `str`, `list[Unknown]`, `list[str]`, `int` in union `str | dict[Unknown, Unknown] | list[Unknown] | ... omitted 25 union elements`
tests/agent/test_auxiliary_config_bridge.py:278: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `Literal["model"]` and `Unknown | int | str | ... omitted 12 union elements`
tests/hermes_cli/test_kanban_core_functionality.py:3176: [unresolved-attribute] unresolved-attribute: Attribute `get` is not defined on `str`, `list[Unknown]`, `list[str]`, `int` in union `str | dict[Unknown, Unknown] | list[Unknown] | ... omitted 25 union elements`
tests/tools/test_browser_console.py:265: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `Literal["record_sessions"]` and `str | dict[Unknown, Unknown] | list[Unknown] | ... omitted 25 union elements`
tests/agent/test_auxiliary_config_bridge.py:277: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `Literal["provider"]` and `Unknown | int | str | ... omitted 12 union elements`
tests/agent/test_curator.py:860: [unsupported-operator] unsupported-operator: Operator `>` is not supported between objects of type `Unknown | int | list[Unknown] | ... omitted 3 union elements` and `Literal[0]`
tests/hermes_cli/test_destructive_slash_confirm_gate.py:32: [unresolved-attribute] unresolved-attribute: Attribute `get` is not defined on `str`, `list[Unknown]`, `list[str]`, `int` in union `str | dict[Unknown, Unknown] | list[Unknown] | ... omitted 25 union elements`
tests/hermes_cli/test_mcp_reload_confirm_gate.py:34: [unresolved-attribute] unresolved-attribute: Attribute `get` is not defined on `str`, `list[Unknown]`, `list[str]`, `int` in union `str | dict[Unknown, Unknown] | list[Unknown] | ... omitted 25 union elements`
gateway/restart.py:10: [invalid-argument-type] invalid-argument-type: Argument to constructor `float.__new__` is incorrect: Expected `str | Buffer | SupportsFloat | SupportsIndex`, found `Unknown | int | str | ... omitted 12 union elements`
tests/cli/test_resume_display.py:648: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `Literal["resume_display"]` and `str | dict[Unknown, Unknown] | list[Unknown] | ... omitted 25 union elements`
tests/gateway/test_whatsapp_reply_prefix.py:121: [unsupported-operator] unsupported-operator: Operator `>=` is not supported between objects of type `str | dict[Unknown, Unknown] | list[Unknown] | ... omitted 25 union elements` and `int`
hermes_cli/config.py:3617: [unresolved-attribute] unresolved-attribute: Attribute `items` is not defined on `str`, `list[Unknown]`, `list[str]`, `int` in union `str | dict[Unknown, Unknown] | list[Unknown] | ... omitted 25 union elements`
tests/agent/test_curator.py:855: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `Literal["curator"]` and `str | dict[Unknown, Unknown] | list[Unknown] | ... omitted 25 union elements`
tests/cli/test_reasoning_command.py:552: [invalid-argument-type] invalid-argument-type: Argument to bound method `TestCase.assertIn` is incorrect: Expected `Iterable[Any] | Container[Any]`, found `str | dict[Unknown, Unknown] | list[Unknown] | ... omitted 25 union elements`
tests/cli/test_fast_command.py:480: [invalid-argument-type] invalid-argument-type: Argument to bound method `TestCase.assertIn` is incorrect: Expected `Iterable[Any] | Container[Any]`, found `str | dict[Unknown, Unknown] | list[Unknown] | ... omitted 25 union elements`
tests/tools/test_web_providers.py:154: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `Literal["backend"]` and `str | dict[Unknown, Unknown] | list[Unknown] | ... omitted 25 union elements`
tests/tools/test_browser_lightpanda.py:242: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `Literal["engine"]` and `str | dict[Unknown, Unknown] | list[Unknown] | ... omitted 25 union elements`
tests/tools/test_web_providers.py:156: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `Literal["extract_backend"]` and `str | dict[Unknown, Unknown] | list[Unknown] | ... omitted 25 union elements`
hermes_cli/config.py:2990: [invalid-return-type] invalid-return-type: Return type does not match returned value: expected `tuple[int, int]`, found `tuple[Any, str | dict[Unknown, Unknown] | list[Unknown] | ... omitted 25 union elements]`

Unchanged: 4283 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

Adds per-call structured logging on the dedicated ``hermes.lint.lsp`` logger so an opt-in user can answer "did LSP fire on that edit?" with ``rg 'lsp\['`` against ``~/.hermes/logs/agent.log``. Levels are tuned so a 1000-write session emits exactly ONE INFO line at the default threshold, not 1000.

Level model
-----------

* ``DEBUG`` (invisible at the default INFO threshold) for every per-call steady-state event: ``clean``, ``feature off``, ``extension not mapped``, ``backend not local``, repeated ``no project root`` for an already-announced file, repeated ``server unavailable`` for an already-announced binary.
* ``INFO`` for state transitions worth surfacing once: ``active for <root>`` the first time a (language, project_root) client starts, ``no project root for <path>`` the first time we see that orphan file. Plus every ``N diags`` event: diagnostics are inherently rare per-edit and are exactly the failure signal users want to grep for.
* ``WARNING`` for action-required failures the first time per (language, binary): ``server unavailable`` (binary not on PATH), ``no server configured``. Per-call ``WARNING`` for timeouts, server errors, and unexpected bridge exceptions. These are inherently novel events, not steady state, and each one is its own signal.

Dedup uses in-process module-level sets guarded by a lock. The sets grow at most by the number of distinct (language, project_root) and (language, binary) pairs touched in one Python process: a few hundred entries in the most aggressive monorepo session, which is a negligible amount of memory. A bounded LRU was rejected because evicting an entry would risk re-firing the WARNING/INFO line we explicitly want to suppress.

Why this matters
----------------

The previous draft logged every per-call event at INFO. ``agent.log`` caps at 5 MB per file with 3 backups (= 20 MB including the live file) via ``RotatingFileHandler``, so nothing would crash, but a normal coding session would bury the actual signal under hundreds of ``lsp[typescript] clean (...)`` lines. The new model preserves the verification answer ("LSP active for <root>") and the action-required signals while keeping clean steady state out of the user's face.

Tests
-----

* ``TestLogLevelsSteadyState``: feature off, unmapped extension, non-local backend, and repeated clean writes all stay at DEBUG. Exactly one INFO ("active for ...") survives across N calls.
* ``TestLogLevelsNovelEvents``: diagnostics are INFO per call; ``active for`` fires once per (language, root).
* ``TestLogLevelsActionRequired``: server unavailable warns once per binary; orphan files INFO once per path; timeouts WARN every time.
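
The "announce once, then go quiet" behaviour described in the level model can be sketched with a module-level set and a lock. This is an illustration only; `log_once` and `_announced` are hypothetical names, and the message text merely follows the ``lsp[<language>] ...`` shape quoted above:

```python
import logging
import threading
from typing import Tuple

logger = logging.getLogger("hermes.lint.lsp")

# Keys that have already been announced in this process.
_announced: set = set()
_lock = threading.Lock()


def log_once(key: Tuple[str, str], level: int, message: str) -> int:
    """Log at `level` the first time `key` is seen and at DEBUG afterwards.

    Returns the level that was actually used, which makes the helper easy to test.
    """
    with _lock:
        first_time = key not in _announced
        _announced.add(key)
    used = level if first_time else logging.DEBUG
    logger.log(used, message)
    return used


first = log_once(("typescript", "/repo"), logging.INFO, "lsp[typescript] active for /repo")
repeat = log_once(("typescript", "/repo"), logging.INFO, "lsp[typescript] active for /repo")
```

The set is never pruned in this sketch, matching the stated reason for rejecting an LRU: evicting a key would let a suppressed line fire again.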

Cherry-picks the substantive ideas from #24155 (different scope, same problem space) onto our PR.

agent/lsp/eventlog.py (new): dedicated structured logger ``hermes.lint.lsp`` with steady-state silence. Module-level dedup sets keep a 1000-write session at exactly ONE INFO line ("active for <root>") at the default INFO threshold; clean writes log at DEBUG so they never reach agent.log under normal config. State transitions (server starts, no project root for a file, server unavailable) fire at INFO/WARNING once per (server_id, key); novel events (timeouts, unexpected errors) fire WARNING per call. Grep recipe: ``rg 'lsp\['``.

agent/lsp/manager.py: wire the eventlog into _get_or_spawn and get_diagnostics_sync so users can answer "did LSP fire on this edit?" with a single grep, plus surface "binary not on PATH" warnings once instead of silently retrying every write.

tools/file_operations.py: backend-type gate. ``_lsp_local_only()`` returns False for non-local backends (Docker / Modal / SSH / Daytona); ``_snapshot_lsp_baseline`` and ``_maybe_lsp_diagnostics`` now skip entirely on remote envs. The host-side language server can't see files inside a sandbox, so this prevents pretending to lint a file the host process can't open.

agent/lsp/protocol.py: 8 KiB cap on the header block in ``read_message``. A pathological server that streams headers without ever emitting CRLF-CRLF would have looped forever consuming bytes; now raises ``LSPProtocolError`` instead.

agent/lsp/workspace.py: 64-step cap on ``find_git_worktree`` and ``nearest_root`` upward walks, plus try/except containment around ``Path(...).resolve()`` and child ``.exists()`` calls. Defensive against pathological inputs (symlink loops, encoding errors, permission failures mid-walk): the lint hook is hot-path code and must never raise.

Tests:

- tests/agent/lsp/test_eventlog.py: 18 tests covering steady-state silence (clean writes stay DEBUG), state-transition INFO-once semantics (active for, no project root), action-required WARNING-once (server unavailable), per-call WARNING (timeouts, spawn failures), and the "1000 clean writes => 1 INFO" contract.
- tests/agent/lsp/test_backend_gate.py: 5 tests verifying _lsp_local_only / snapshot_baseline / maybe_lsp_diagnostics skip the LSP layer for non-local backends and route correctly for LocalEnvironment.
- tests/agent/lsp/test_protocol.py: new test_read_message_rejects_runaway_header exercising the 8 KiB cap.

Validation:

- 73/73 LSP tests pass (49 original + 18 eventlog + 5 backend-gate + 1 framer cap)
- 198/198 pass when run alongside existing file_operations tests
- Live E2E re-run with pyright still surfaces "ERROR [2:12] Type ... reportReturnType (Pyright)" through the full path, then patch fix removes it on the next call.
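
A capped upward walk of the kind described for agent/lsp/workspace.py might look roughly like the following. This is a sketch under stated assumptions: the function body, its signature, and the exact exceptions caught are guesses for illustration, and only the 64-step cap and the "never raise" goal come from the commit message:

```python
from pathlib import Path
from typing import Optional, Sequence

MAX_STEPS = 64  # cap taken from the commit message


def nearest_root(start: str, markers: Sequence[str], max_steps: int = MAX_STEPS) -> Optional[Path]:
    """Walk upward from `start` and return the first directory containing a marker.

    Returns None instead of raising on bad input, a missing marker, or an
    exhausted step budget.
    """
    try:
        current = Path(start).resolve()
        if not current.is_dir():
            current = current.parent
    except (OSError, RuntimeError, ValueError):
        return None
    for _ in range(max_steps):
        for marker in markers:
            try:
                if (current / marker).exists():
                    return current
            except OSError:
                continue  # e.g. a permission failure mid-walk
        if current.parent == current:
            return None  # reached the filesystem root
        current = current.parent
    return None
```

For example, `nearest_root("/repo/src/app.ts", ["tsconfig.json", "package.json"])` would return `/repo` if either marker file exists there, and None when no ancestor within the step budget has one.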

daimon-nous · 2026-05-12T15:16:52Z

Shipped as plugin in #24414. Your design influence (opt-in default, simpler scope, lint.lsp.enabled gating) shaped the final architecture — the plugin is disabled by default and uses detect-only binary resolution rather than auto-install. Thank you for the PR.

…ile/patch (#24168)

* feat(lsp): semantic diagnostics from real language servers in write_file/patch

Wire ~26 language servers (pyright, gopls, rust-analyzer, typescript-language-server, clangd, bash-language-server, ...) into the post-write lint check used by write_file and patch. The model now sees type errors, undefined names, missing imports, and project-wide semantic issues introduced by its edits, not just syntax errors.

LSP is gated on git workspace detection: when the agent's cwd or the file being edited is inside a git worktree, LSP runs against that workspace; otherwise the existing in-process syntax checks are the only tier. This keeps users on user-home cwds (Telegram/Discord gateway chats) from spawning daemons.

The post-write check is layered: in-process syntax check first (microseconds), then LSP semantic diagnostics second when syntax is clean. Diagnostics are delta-filtered against a baseline captured at write start, so the agent only sees errors its edit introduced. A flaky/missing language server can never break a write -- every LSP failure path falls back silently to the syntax-only result.

New module agent/lsp/ split into:

- protocol.py: Content-Length JSON-RPC framer + envelope helpers
- client.py: async LSPClient (spawn, initialize, didOpen/didChange, ContentModified retry, push/pull diagnostic stores)
- workspace.py: git worktree walk-up + per-server NearestRoot resolver
- servers.py: registry of 26 language servers (extension match, root resolver, spawn builder per language)
- install.py: auto-install dispatch (npm install --prefix, go install with GOBIN, pip install --target) into HERMES_HOME/lsp/bin/
- manager.py: LSPService (per-(server_id, root) client registry, lazy spawn, broken-set, in-flight dedupe, sync facade for tools layer)
- reporter.py: <diagnostics> block formatter (severity-1-only, 20-per-file)
- cli.py: hermes lsp {status,list,install,install-all,restart,which}

Wired into tools/file_operations.py:

- write_file/patch_replace now call _snapshot_lsp_baseline before write
- _check_lint_delta gains a third tier: LSP semantic diagnostics when syntax is clean
- All LSP code paths swallow exceptions; write_file's contract unchanged

Config: 'lsp' section in DEFAULT_CONFIG with enabled (default true), wait_mode, wait_timeout, install_strategy (default 'auto'), and per-server overrides (disabled, command, env, initialization_options).

Tests: tests/agent/lsp/ -- 49 tests covering protocol framing (encode and read_message round-trip, EOF/truncation/missing Content-Length), workspace gate (git walk-up, exclude markers, fallback to file location), reporter (severity filter, max-per-file cap, truncation), service-level delta filter, and an in-process mock LSP server that exercises the full client lifecycle including didChange version bumps, dedup, crash recovery, and idempotent teardown.

Live E2E verified end-to-end through ShellFileOperations: pyright auto-installed via npm into HERMES_HOME, baseline captured, type error introduced, single delta diagnostic surfaced with correct line/column/code/source, then patch fix removes the diagnostic from the output.

Docs: new website/docs/user-guide/features/lsp.md page covering supported languages, configuration knobs, performance characteristics, and troubleshooting; cli-commands.md updated with the 'hermes lsp' reference; sidebar updated.

* feat(lsp): structured logging, backend gate, defensive walk caps

Cherry-picks the substantive ideas from #24155 (different scope, same problem space) onto our PR.

agent/lsp/eventlog.py (new): dedicated structured logger ``hermes.lint.lsp`` with steady-state silence. Module-level dedup sets keep a 1000-write session at exactly ONE INFO line ("active for <root>") at the default INFO threshold; clean writes log at DEBUG so they never reach agent.log under normal config. State transitions (server starts, no project root for a file, server unavailable) fire at INFO/WARNING once per (server_id, key); novel events (timeouts, unexpected errors) fire WARNING per call. Grep recipe: ``rg 'lsp\['``.

agent/lsp/manager.py: wire the eventlog into _get_or_spawn and get_diagnostics_sync so users can answer "did LSP fire on this edit?" with a single grep, plus surface "binary not on PATH" warnings once instead of silently retrying every write.

tools/file_operations.py: backend-type gate. ``_lsp_local_only()`` returns False for non-local backends (Docker / Modal / SSH / Daytona); ``_snapshot_lsp_baseline`` and ``_maybe_lsp_diagnostics`` now skip entirely on remote envs. The host-side language server can't see files inside a sandbox, so this prevents pretending to lint a file the host process can't open.

agent/lsp/protocol.py: 8 KiB cap on the header block in ``read_message``. A pathological server that streams headers without ever emitting CRLF-CRLF would have looped forever consuming bytes; now raises ``LSPProtocolError`` instead.

agent/lsp/workspace.py: 64-step cap on ``find_git_worktree`` and ``nearest_root`` upward walks, plus try/except containment around ``Path(...).resolve()`` and child ``.exists()`` calls. Defensive against pathological inputs (symlink loops, encoding errors, permission failures mid-walk): the lint hook is hot-path code and must never raise.

Tests:

- tests/agent/lsp/test_eventlog.py: 18 tests covering steady-state silence (clean writes stay DEBUG), state-transition INFO-once semantics (active for, no project root), action-required WARNING-once (server unavailable), per-call WARNING (timeouts, spawn failures), and the "1000 clean writes => 1 INFO" contract.
- tests/agent/lsp/test_backend_gate.py: 5 tests verifying _lsp_local_only / snapshot_baseline / maybe_lsp_diagnostics skip the LSP layer for non-local backends and route correctly for LocalEnvironment.
- tests/agent/lsp/test_protocol.py: new test_read_message_rejects_runaway_header exercising the 8 KiB cap.

Validation:

- 73/73 LSP tests pass (49 original + 18 eventlog + 5 backend-gate + 1 framer cap)
- 198/198 pass when run alongside existing file_operations tests
- Live E2E re-run with pyright still surfaces "ERROR [2:12] Type ... reportReturnType (Pyright)" through the full path, then patch fix removes it on the next call.

* feat(lsp): atexit cleanup + separate lsp_diagnostics JSON field

Two improvements salvaged from #24414's plugin-form alternative, keeping our core-integrated design:

1. atexit cleanup of spawned language servers
----------------------------------------------------------------

``agent/lsp/__init__.get_service`` now registers an ``atexit`` handler on first creation that tears down the LSPService on Python exit. Without this, every ``hermes chat`` exit was leaking pyright/gopls/etc. processes for a few seconds while their stdout buffers drained -- they got reaped by the kernel eventually but a watchful ``ps aux`` would catch them.

The handler runs once per process (gated by ``_atexit_registered``); idempotent ``shutdown_service`` ensures double-fire is a no-op. Errors during shutdown are swallowed at debug level since by the time atexit fires the user has already seen the agent's final response.

2. Separate ``lsp_diagnostics`` field on WriteResult / PatchResult
----------------------------------------------------------------

Previously the LSP layer folded its diagnostic block into the ``lint.output`` string, conflating the syntax-check tier with the semantic tier. The agent (and any downstream parsers) now read syntax errors and semantic errors as independent signals:

    {
      "bytes_written": 42,
      "lint": {"status": "ok", "output": ""},
      "lsp_diagnostics": "<diagnostics file=...>\nERROR [2:12] ..."
    }

``_check_lint_delta`` returns to its original two-tier shape (syntax check + delta filter); ``write_file`` and ``patch_replace`` independently fetch LSP diagnostics via ``_maybe_lsp_diagnostics`` and pass them into the new field. ``patch_replace`` propagates the inner write_file's ``lsp_diagnostics`` so the outer PatchResult carries the patch's delta correctly.

Tests: 19 new

- tests/agent/lsp/test_lifecycle.py (8 tests): atexit registration fires once and only once across N get_service calls; the registered callable is our internal shutdown wrapper; shutdown_service is idempotent and safe when never started; exceptions during shutdown are swallowed; inactive service is cached so we don't rebuild on every check.
- tests/agent/lsp/test_diagnostics_field.py (11 tests): WriteResult / PatchResult dataclass shape, to_dict include/omit semantics, channel separation (lint and lsp_diagnostics carry independent signals), write_file populates the field via _maybe_lsp_diagnostics only when the syntax tier is clean, patch_replace propagates the field forward from its internal write_file.

Validation:

- 92/92 LSP tests pass (73 prior + 8 lifecycle + 11 diagnostics field)
- 217/217 pass with file_operations + LSP combined
- Live E2E reverified: clean writes -> both fields empty/none; type error introduced -> lint clean (parses), lsp_diagnostics carries the pyright reportReturnType block; patch fix -> both fields clean again.

* fix(lsp): broken-set short-circuit so a wedged server isn't paid every write

Discovered while auditing failure paths: a language server binary that hangs (sleep forever, no LSP traffic on stdin/stdout) caused EVERY subsequent write to re-pay the 8s snapshot_baseline timeout. Five writes = ~64s of dead time.

The bug: ``_get_or_spawn`` adds the (server_id, root) pair to ``_broken`` inside its inner exception handler, but when the OUTER ``_loop.run`` timeout fires, it cancels the inner task before that handler runs. The pair never makes it to broken-set, so the next write re-enters the spawn path and re-pays the timeout.

Fix:

- New ``_mark_broken_for_file`` helper at the service layer marks the (server_id, workspace_root) pair broken from the OUTSIDE when the outer timeout fires. Called from the except branches in ``snapshot_baseline``, ``get_diagnostics_sync`` (asyncio.TimeoutError + generic Exception). Also kills any orphan client process that survived the cancelled future, fire-and-forget with a 1s ceiling.
- ``enabled_for`` now consults the broken-set BEFORE returning True. Files in already-broken (server_id, root) pairs short-circuit to False, so the file_operations layer skips the LSP path entirely with no spawn cost.
Until the service is restarted (``hermes lsp restart``) or the process exits. - A single eventlog WARNING is emitted on first mark-broken so the user knows which server gave up. Subsequent edits in the same project stay silent. Tests: 7 new in tests/agent/lsp/test_broken_set.py — covers the key shape (server_id, per_server_root), enabled_for short-circuit, sibling-file skip in same project, project isolation (broken in A doesn't affect B), graceful no-op for missing-server / no-workspace, and an end-to-end test that snapshots after a failure and verifies the next ``enabled_for`` returns False. Validation: - Live retest of the wedged-binary scenario: 5 sequential writes, first 8.88s (the one snapshot timeout), subsequent four ~0.84s (no LSP cost). Down from 5x12.85s = 64s before this fix. - 99/99 LSP tests pass (92 prior + 7 broken-set) - 224/224 pass with file_operations + LSP combined - Happy path E2E reverified — clean write, type error introduced, patch fix all behave correctly with the new broken-set logic. Note: the FIRST write to a wedged binary still pays 8s (the snapshot_baseline timeout). We could shorten that, but pyright/ tsserver normally take 2-3s and slow CI rust-analyzer can need 5+ seconds, so 8s is the conservative ceiling. Subsequent writes are instant.
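
For readers who want the header cap in concrete terms, here is a minimal sketch of a Content-Length framer that gives up once the header block exceeds 8 KiB. It is not the code in agent/lsp/protocol.py: the ``MAX_HEADER_BYTES`` name, the byte-at-a-time loop, and the error messages are assumptions; only ``read_message``, ``LSPProtocolError``, and the 8 KiB figure come from the commit message.

```python
import io

MAX_HEADER_BYTES = 8 * 1024  # 8 KiB cap from the commit message; constant name is hypothetical


class LSPProtocolError(Exception):
    """Raised when the byte stream is not a valid Content-Length frame."""


def read_message(stream):
    """Read one framed message body (bytes) from a binary stream."""
    header = bytearray()
    # Consume bytes until the blank line (CRLF-CRLF) that ends the header block.
    while not header.endswith(b"\r\n\r\n"):
        if len(header) >= MAX_HEADER_BYTES:
            # Without this check a server that never sends CRLF-CRLF keeps us reading forever.
            raise LSPProtocolError("header block exceeded 8 KiB without CRLF-CRLF")
        chunk = stream.read(1)
        if not chunk:
            raise LSPProtocolError("EOF while reading headers")
        header += chunk

    length = None
    for line in header.decode("ascii", "replace").split("\r\n"):
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-length":
            try:
                length = int(value.strip())
            except ValueError:
                raise LSPProtocolError("non-numeric Content-Length") from None
    if length is None:
        raise LSPProtocolError("missing Content-Length header")

    body = stream.read(length)
    if len(body) != length:
        raise LSPProtocolError("truncated body")
    return body
```

A well-formed frame such as ``b"Content-Length: 2\r\n\r\n{}"`` returns its body; a stream of header bytes with no terminator raises after 8192 bytes instead of blocking the write path.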

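The capped upward walk described for agent/lsp/workspace.py can be pictured roughly as follows. This is an illustrative sketch under stated assumptions, not the project's implementation: the signature of ``nearest_root``, the ``MAX_WALK_STEPS`` name, and the exact exception types caught are all hypothetical; the 64-step cap and the "never raise" goal are from the commit message.

```python
from pathlib import Path

MAX_WALK_STEPS = 64  # cap from the commit message; constant name is hypothetical


def nearest_root(file_path, markers):
    """Return the closest ancestor directory holding any marker file, else None.

    Never raises: filesystem errors at any step count as "no root here".
    """
    try:
        current = Path(file_path).resolve().parent
    except (OSError, RuntimeError, ValueError):
        return None

    for _ in range(MAX_WALK_STEPS):
        for marker in markers:
            try:
                if (current / marker).exists():
                    return current
            except (OSError, ValueError):
                continue  # unreadable entry: keep looking
        parent = current.parent
        if parent == current:  # reached the filesystem root
            return None
        current = parent
    return None  # step cap hit: give up rather than walk indefinitely
```

The cap is a backstop rather than the normal exit; on an ordinary path the walk ends at the first marker or at the filesystem root well before 64 steps.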
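
The broken-set fix is easier to follow with a toy model. The sketch below is an assumption-laden illustration, not the code in agent/lsp/manager.py: ``LSPServiceSketch``, its constructor, and the use of ``asyncio.run`` with ``asyncio.wait_for`` as the "outer" timeout are hypothetical stand-ins for the real service and its ``_loop.run``. It shows only the idea that the caller, not the cancelled inner task, records the failure, and that ``enabled_for`` checks that record first.

```python
import asyncio


class LSPServiceSketch:
    """Toy model of the broken-set short-circuit (names are hypothetical)."""

    def __init__(self, timeout=8.0):
        self._broken = set()      # (server_id, root) pairs that gave up
        self._timeout = timeout
        self.spawn_attempts = 0   # instrumentation for the example only

    def enabled_for(self, server_id, root):
        # Consult the broken-set BEFORE doing any spawn work.
        return (server_id, root) not in self._broken

    async def _get_or_spawn(self, server_id, root):
        self.spawn_attempts += 1
        await asyncio.sleep(3600)  # stand-in for a wedged server that never answers

    def snapshot_baseline(self, server_id, root):
        if not self.enabled_for(server_id, root):
            return None  # short-circuit: no timeout paid
        try:
            return asyncio.run(
                asyncio.wait_for(self._get_or_spawn(server_id, root), self._timeout)
            )
        except Exception:  # includes asyncio.TimeoutError
            # The inner task was cancelled before it could record the failure,
            # so the caller marks the pair broken from the outside.
            self._broken.add((server_id, root))
            return None
```

With a short timeout, the first ``snapshot_baseline`` call for a pair pays the wait and every later call for the same pair returns immediately, while a different root is unaffected.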

alt-glitch added the type/feature (New feature or request), comp/tools (Tool registry, model_tools, toolsets), tool/file (File tools: read, write, patch, search), and P3 (Low: cosmetic, nice to have) labels May 12, 2026

alt-glitch mentioned this pull request May 12, 2026

feat(lsp): semantic diagnostics from real language servers in write_file/patch #24168

Merged

daimon-nous Bot mentioned this pull request May 12, 2026

feat(lsp): ship LSP diagnostics as plugin with zero core changes #24414

Closed

scubamount mentioned this pull request May 12, 2026

consolidation: fix 7 defects in PR #24168 LSP feature (orphan procs, patch_replace double-lint, opt-in defaults, +4 more) #24467

Closed

OutThisLife closed this May 12, 2026

OutThisLife wants to merge 2 commits into main from bb/lsp-lint

2 participants

OutThisLife commented May 12, 2026 (edited)

Summary

Why a hand-rolled client (vs. multilspy / sansio-lsp-client / pygls)

What is in this PR

What is explicitly NOT in this PR

Test plan

Follow-ups (separate PRs)

github-actions Bot commented May 12, 2026 (edited)

🔎 Lint report: bb/lsp-lint vs origin/main

ruff

ty (type checker)

daimon-nous Bot commented May 12, 2026
