Default toolsets cleanup by pefontana · Pull Request #2 · pefontana/hermes-agent

pefontana · 2026-04-15T19:33:45Z

What does this PR do?

Related Issue

Fixes #

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)
✨ New feature (non-breaking change that adds functionality)
🔒 Security fix
📝 Documentation update
✅ Tests (adding or improving test coverage)
♻️ Refactor (no behavior change)
🎯 New skill (bundled or hub)

Changes Made

How to Test

Checklist

Code

I've read the Contributing Guide
My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
I searched for existing PRs to make sure this isn't a duplicate
My PR contains only changes related to this fix/feature (no unrelated commits)
I've run pytest tests/ -q and all tests pass
I've added tests for my changes (required for bug fixes, strongly encouraged for features)
I've tested on my platform:

Documentation & Housekeeping

I've updated relevant documentation (README, docs/, docstrings) — or N/A
I've updated cli-config.yaml.example if I added/changed config keys — or N/A
I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
I've updated tool descriptions/schemas if I changed tool behavior — or N/A

For New Skills

This skill is broadly useful to most users (if bundled) — see Contributing Guide
SKILL.md follows the standard format (frontmatter, trigger conditions, steps, pitfalls)
No external dependencies that aren't already available (prefer stdlib, curl, existing Hermes tools)
I've tested the skill end-to-end: hermes --toolsets skills -q "Use the X skill to do Y"

Screenshots / Logs

delegation.default_toolsets was declared in cli.py's CLI_CONFIG default dict and documented in cli-config.yaml.example, but never read: none of tools/delegate_tool.py, _load_config(), or any call site ever looked it up. The live fallback is the DEFAULT_TOOLSETS module constant at tools/delegate_tool.py:101, which stays as-is. hermes_cli/config.py's DEFAULT_CONFIG["delegation"] already omits the key — this commit aligns cli.py with that. Adds a regression test in tests/hermes_cli/test_config_drift.py so a future refactor that re-adds the key without wiring it up to _load_config() fails loudly. Part of Initiative 2 / M0.5.

Matches the default-config removal in the preceding commit. default_toolsets was documented for users to set but was never actually read at runtime, so showing it in the example config and the delegation user guide was misleading. No deprecation note is added: the key was always a no-op, so users who copied it from the example continue to see no behavior change. Their config.yaml still parses; the key is just silently unused, same as before. Part of Initiative 2 / M0.5.

…config The prior form of this test asserted on CLI_CONFIG["delegation"] after importing cli, which only passed by accident of pytest-xdist worker scheduling. cli._hermes_home is frozen at module import time (cli.py:76), before the tests/conftest.py autouse HERMES_HOME-isolation fixture can fire, so CLI_CONFIG ends up populated by deep-merging the contributor's actual ~/.hermes/config.yaml over the defaults (cli.py:359-366). Any contributor (like me) who still has the legacy key set in their own config causes a false failure the moment another test file in the same xdist worker imports cli at module level. Asserting on the source of load_cli_config() instead sidesteps all of that: the test now checks the defaults literal directly and is independent of user config, HERMES_HOME, import order, and worker scheduling. Demonstrated failure mode before this fix: pytest tests/hermes_cli/test_config_drift.py \ tests/hermes_cli/test_skills_hub.py -o addopts="" -> FAILED (CLI_CONFIG["delegation"] contained "default_toolsets" from the user's ~/.hermes/config.yaml) Part of Initiative 2 / M0.5.

…#13354) Classic-CLI /steer typed during an active agent run was queued through self._pending_input alongside ordinary user input. process_loop, which drains that queue, is blocked inside self.chat() for the entire run, so the queued command was not pulled until AFTER _agent_running had flipped back to False — at which point process_command() took the idle fallback ("No agent running; queued as next turn") and delivered the steer as an ordinary next-turn user message. From Utku's bug report on PR NousResearch#13205: mid-run /steer arrived minutes later at the end of the turn as a /queue-style message, completely defeating its purpose. Fix: add _should_handle_steer_command_inline() gating — when _agent_running is True and the user typed /steer, dispatch process_command(text) directly from the prompt_toolkit Enter handler on the UI thread instead of queueing. This mirrors the existing _should_handle_model_command_inline() pattern for /model and is safe because agent.steer() is thread-safe (uses _pending_steer_lock, no prompt_toolkit state mutation, instant return). No changes to the idle-path behavior: /steer typed with no active agent still takes the normal queue-and-drain route so the fallback "No agent running; queued as next turn" message is preserved. Validation: - 7 new unit tests in tests/cli/test_cli_steer_busy_path.py covering the detector, dispatch path, and idle-path control behavior. - All 21 existing tests in tests/run_agent/test_steer.py still pass. - Live PTY end-to-end test with real agent + real openrouter model: 22:36:22 API call #1 (model requested execute_code) 22:36:26 ENTER FIRED: agent_running=True, text='/steer ...' 22:36:26 INLINE STEER DISPATCH fired 22:36:43 agent.log: 'Delivered /steer to agent after tool batch' 22:36:44 API call #2 included the steer; response contained marker Same test on the tip of main without this fix shows the steer landing as a new user turn ~20s after the run ended.

Three independent reviews surfaced a handful of real bugs. Fixing all of them here: * **SIGTERM orphans hook subprocesses (codex #1).** The CLI only installed a SIGINT handler — SIGTERM (from ``kill``, ``timeout``, systemd stop, CI harnesses) skips atexit entirely and leaves every in-flight hook subprocess running as an orphan owned by init. Adds ``_async_pool_sigterm_handler`` which terminates tracked subprocess groups inline, then routes to ``sys.exit(128 + SIGTERM)``. Inline termination is required because ``ThreadPoolExecutor`` uses non-daemon threads: Python waits for every worker to return before running atexit, and workers block inside ``proc.communicate(timeout=spec.timeout)`` until the subprocess dies. Renamed ``_maybe_install_sigint_handler`` → ``_maybe_install_signal_handlers`` (with back-compat alias). Verified: ``kill -TERM`` on a hermes CLI running a 4 s ``sleep`` hook now exits in ~0.7 s with no orphan, was 4 s + orphan. * **Subprocess groups for reliable termination.** Hooks are now spawned with ``start_new_session=True`` so the subprocess is its own PGID leader. Shutdown / SIGINT / SIGTERM paths call ``os.killpg`` on the group instead of ``proc.terminate()`` — without this, a bash script's orphaned ``sleep`` child kept the parent stdout FD open and blocked ``proc.communicate`` for the full sleep duration. ``_terminate_group`` / ``_kill_group`` helpers fall back to plain ``terminate`` / ``kill`` on edge cases where ``getpgid`` fails (already-exited proc, non-POSIX). * **``hermes hooks test --no-wait`` blocks for full hook runtime (codex #2).** The flag advertised fire-and-forget but the CLI's ``ThreadPoolExecutor`` atexit ``pool.shutdown(wait=True)`` joined the worker anyway, which in turn waited for the subprocess. ``_cmd_test`` now polls briefly for ``_live_procs`` to fill (so the subprocess definitely spawned), then ``os._exit(0)`` — skipping atexit entirely. The subprocess keeps running under init because of ``start_new_session=True``. Verified: CLI exit dropped from 2.3 s to 76 ms for a 2-second hook, and the hook still writes its audit log 3 s later after the CLI is gone. * **Stale ``_child_role_for_batch`` test (claude #1 / hermes #2).** The test from commit 76d3ffd4 asserted the *old* helper field name — no code path sets it post-refactor (455c136f), so the test passed trivially without verifying anything. Fixed to assert ``_child_role`` (the real field) is stripped, and added an explanatory message so a future failure is easier to diagnose. Module-header docstring updated too. * **``submit()`` RuntimeError branch: stale-semaphore parity fix (claude #3).** Same pattern I already fixed in ``_on_async_future_done``, missed here: a concurrent ``_reset_async_pool`` between ``acquire`` and ``release`` would cause ``_async_sem_get()`` to lazy-create a fresh sem and over- release on it. Snapshot ``_async_sem_inst`` + swallow ``ValueError`` like the symmetric path. * **Shutdown race: proc registered after the snapshot (claude #4 / hermes #1).** Worker that got between ``subprocess.Popen()`` and ``_register_live_proc(proc)`` would miss the shutdown-sweep snapshot and block for the full ``spec.timeout``. After registering, the worker now checks ``_async_shutting_down`` and self-terminates its subprocess group. * **WARN log noise on SIGTERM'd children (claude #5).** Shutdown-induced exits (rc = -15 / -9) no longer spam a per-proc ``WARNING`` — demoted to ``DEBUG`` when ``_async_shutting_down`` is set. Both the atexit path and the signal handlers now set the flag before terminating, so a Ctrl-C or a ``kill -TERM`` with 10 running hooks emits zero warn lines instead of 10. Still outstanding (documented trade-offs, not fixed here): * Gateway shutdown blocks the event loop for up to ``grace_seconds`` (claude #2). Acknowledged as a follow-up candidate via ``loop.run_in_executor``. * ``_maybe_install_signal_handlers`` is still leading-underscore (claude NousResearch#6). Cosmetic; kept consistent with the rest of the module's private-by-convention API. All 101 hook tests still pass.

Two more real bugs surfaced by a follow-up review round: * **Windows regression (codex #1 / hermes).** The subprocess termination helpers called ``os.killpg(os.getpgid(proc.pid), signal.SIGKILL)`` guarded only by ``except (ProcessLookupError, OSError)``. On Windows those module attributes don't exist (``AttributeError``) and ``signal.SIGKILL`` is undefined, so any timeout / shutdown path would crash instead of cleaning up. Adds ``_IS_WINDOWS`` and the platform-guarded ``proc.terminate()`` / ``proc.kill()`` fallback, matching the convention in ``tools/process_registry.py``. Adds ``agent/shell_hooks.py`` to ``tests/tools/test_windows_compat.py`` ``GUARDED_FILES`` so the AST check enforces this going forward. * **SIGKILL escalation skipped under signal-initiated shutdown (codex #2).** ``_async_pool_sig{int,term}_handler`` flip ``_async_shutting_down = True`` before atexit / gateway shutdown runs ``shutdown_async_hooks``, which previously used that same flag for idempotency. Result: the documented SIGTERM-wait-SIGKILL escalation was silently skipped on every signal-initiated shutdown — a hook script with ``trap '' TERM`` only died via the per-hook timeout (60 s default) instead of the shutdown grace (5 s). Decouples the two concerns: ``_async_shutting_down`` still gates new submissions and demotes worker-exit log levels; idempotency now lives on a separate ``_async_cleanup_ran`` under ``_async_cleanup_lock`` (which also makes concurrent shutdown callers thread-safe, which the bare-bool gate wasn't). Regression test (``test_shutdown_escalates_even_when_shutting_down_flag_preset``) covers the TERM-trap path: pre-sets the flag, spawns a TERM-trapping subprocess, calls shutdown, and asserts the child was killed within the grace window. 112 hook tests pass. The 24 broader failures seen in tests/tools and tests/gateway reproduce identically on the pre-change branch — they are unrelated pre-existing breaks from being cut from an older merge base.

pefontana added 3 commits April 15, 2026 16:17

pefontana closed this in 8cc3ceb Apr 21, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Default toolsets cleanup#2

Default toolsets cleanup#2
pefontana wants to merge 3 commits into
delegate-dispatch-cleanupfrom
default-toolsets-cleanup

pefontana commented Apr 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

pefontana commented Apr 15, 2026

What does this PR do?

Related Issue

Type of Change

Changes Made

How to Test

Checklist

Code

Documentation & Housekeeping

For New Skills

Screenshots / Logs

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant