fix(tests): unblock CI on eight unrelated test failures by 0xyg3n · Pull Request #18352 · NousResearch/hermes-agent

0xyg3n · 2026-05-01T11:15:58Z

Summary

Eight independent test-only fixes to unblock the Tests workflow on main. None are functionality changes — each fix targets a test that has been failing on main @ 75e1339d independently of any open PR. The two run-to-run variants (update_yes_flag and the kanban WS test, alternating between back-to-back failing runs) share root causes with the others — recent production changes drifted from frozen test contracts, or test mocks no longer cover the full surface they patch.

Reference run: https://github.com/NousResearch/hermes-agent/actions/runs/25210234363

Failures addressed

Test	Root cause	Fix
`tests/acp/test_server.py::TestSessionOps::test_send_available_commands_update`	`_ADVERTISED_COMMANDS` grew to add `steer` and `queue`; the test still asserted the 7-command list.	Update the expected list to match `_ADVERTISED_COMMANDS`.
`tests/gateway/test_teams.py::TestTeamsSend::test_send_typing`	Teams adapter started importing `microsoft_teams.common.http.client.ClientOptions` (microsoft-teams-apps 2.0.0); the test's fake SDK didn't register that submodule, so the whole `try` import block fell to `except ImportError` and `TypingActivityInput` became `None`. `send_typing` then raised inside its bare `except`, swallowing the call.	Register `microsoft_teams.common.http.client.ClientOptions` in the test mock so the import block stays on the success path.
`tests/gateway/test_agent_cache.py::TestAgentCacheSpilloverLive::test_concurrent_inserts_settle_at_cap`	8 threads × 20 real-`AIAgent` constructions (160 inits) exceeded the 30 s thread-join after recent guardrail/MCP work made `AIAgent.__init__` measurably heavier.	Halve to 80 inits (still ~5× over the 16-slot cap, so eviction races are still exercised) and bump the join timeout to 60 s.
`tests/hermes_cli/test_update_yes_flag.py::TestUpdateYesConfigMigration::test_no_yes_flag_still_prompts_in_tty` `tests/hermes_cli/test_update_yes_flag.py::TestUpdateYesStashRestore::test_yes_restores_stash_without_prompting`	`cmd_update` wraps `_cmd_update_impl` in `_install_hangup_protection`, which replaces `sys.stdout` / `sys.stderr` with `_UpdateOutputStream` mirrors. Under xdist the wrapper's own `isatty()` answer became race-dependent on prior writes flagging `_original_broken`, subverting the test's `patch("hermes_cli.main.sys")`.	Patch `_install_hangup_protection` to a no-op state and stub `_finalize_update_output` so `isatty()` answers the test mock directly.
`tests/plugins/test_kanban_dashboard_plugin.py::test_ws_events_rejects_when_token_required`	`_check_ws_token` resolves `hermes_cli.web_server` via `from hermes_cli import web_server as _ws`. When an earlier test on the same xdist worker imported the real module, `web_server` was cached as an attribute on the `hermes_cli` package; subsequent `from hermes_cli import web_server` resolved via that attribute and silently bypassed the test's `sys.modules` stub, comparing the test token against the real module's per-process random `_SESSION_TOKEN`.	Patch the `web_server` attribute on the package in addition to the `sys.modules` entry so both import shapes return the stub.
`tests/run_agent/test_concurrent_interrupt.py::test_concurrent_interrupt_cancels_pending` `tests/run_agent/test_concurrent_interrupt.py::test_running_concurrent_worker_sees_is_interrupted`	Concurrent-tool path now consults `self._tool_guardrails.before_call(...)` and calls `self._append_guardrail_observation(...)` after each result; the hand-rolled `_Stub` had neither, raising `AttributeError` before exercising interrupt fan-out.	Add a permissive `_tool_guardrails` MagicMock that returns always-allow `SimpleNamespace` decisions, bind the real `_append_guardrail_observation` (no-op under the permissive mock), stub `_set_tool_guardrail_halt`, and accept `**kwargs` in the fake tool callables so the new `pre_tool_block_checked=` argument doesn't trip them.
`tests/tools/test_dockerfile_pid1_reaping.py::test_dockerfile_installs_tui_dependencies` `tests/tools/test_dockerfile_pid1_reaping.py::test_dockerfile_materializes_local_tui_ink_package`	PR `a49f4c6` ("fix: prevent tui rebuilding assets") intentionally replaced the explicit cp-then-install materialize block with the simpler `COPY ui-tui/packages/hermes-ink/` + `npm_config_install_links=false` approach (npm 10+ default symlink behaviour), but the contract tests still asserted the old command sequence verbatim.	Update both tests to accept either materialization shape — the behavioural contract (`@hermes/ink` reachable at `node_modules/@hermes/ink` at runtime) is preserved by both paths.

Verification

Targeted run, twice (no flakes):

python -m pytest -q -o addopts="-m 'not integration'" \
  tests/acp/test_server.py::TestSessionOps::test_send_available_commands_update \
  tests/gateway/test_teams.py::TestTeamsSend::test_send_typing \
  tests/gateway/test_agent_cache.py::TestAgentCacheSpilloverLive::test_concurrent_inserts_settle_at_cap \
  tests/hermes_cli/test_update_yes_flag.py \
  tests/plugins/test_kanban_dashboard_plugin.py::test_ws_events_rejects_when_token_required \
  tests/run_agent/test_concurrent_interrupt.py \
  tests/tools/test_dockerfile_pid1_reaping.py
# 17 passed (run 1)
# 17 passed (run 2)

Broader regression — all touched test files:

python -m pytest -q tests/acp/ tests/gateway/test_teams.py \
  tests/gateway/test_agent_cache.py tests/hermes_cli/test_update_yes_flag.py \
  tests/plugins/test_kanban_dashboard_plugin.py \
  tests/run_agent/test_concurrent_interrupt.py \
  tests/tools/test_dockerfile_pid1_reaping.py
# 321 passed

No production code is touched; the diff is test-files-only.

Eight independent test failures observed on main @ 75e1339, all caused by recent production changes drifting from frozen test contracts (or by test mocks that didn't track the production interface they exercise). Each fix is local to one test file; no production code changes. * tests/acp/test_server.py: advertised slash-command list grew to add ``steer`` and ``queue``; align the expected list with ``_ADVERTISED_COMMANDS``. * tests/gateway/test_teams.py: the Teams adapter started importing ``microsoft_teams.common.http.client.ClientOptions`` to set the User-Agent (microsoft-teams-apps 2.0.0). The fake SDK installed by ``_ensure_teams_mock()`` didn't register that submodule, so the adapter's whole try/except ImportError ran the except branch and every SDK symbol — including ``TypingActivityInput`` — became ``None``. ``send_typing`` then raised inside its own bare except, swallowing the call and leaving ``mock_app.send`` un-awaited. Add ``microsoft_teams.common.http.client`` with a ``ClientOptions`` stub so the import block stays on the success path. * tests/gateway/test_agent_cache.py: 8 threads × 20 real-AIAgent inserts (160 constructions) exceeded the 30s thread-join timeout on loaded CI runners after recent guardrail/MCP work made AIAgent init measurably heavier. Halve to 80 constructions (still ~5× over the 16-slot cap so eviction races are exercised) and bump the join timeout to 60s. * tests/hermes_cli/test_update_yes_flag.py: ``cmd_update`` wraps ``_cmd_update_impl`` in ``_install_hangup_protection``, which replaces ``sys.stdout`` / ``sys.stderr`` with ``_UpdateOutputStream`` mirrors. In the two prompt-flow tests that patch ``hermes_cli.main.sys`` to make ``isatty()`` return True, the wrapping subverts the patch under xdist concurrency: the wrapper caches the (mocked) original but replaces ``sys.stdout`` with itself, so subsequent ``sys.stdout.isatty()`` calls go through the wrapper's own ``isatty()`` whose answer depends on whether earlier writes flagged ``_original_broken``. Patch ``_install_hangup_protection`` to a no-op state and stub ``_finalize_update_output`` so isatty() answers the test mock directly. * tests/plugins/test_kanban_dashboard_plugin.py: ``_check_ws_token`` resolves ``hermes_cli.web_server`` via ``from hermes_cli import web_server as _ws``. When an earlier test on the same xdist worker has imported the real module, ``hermes_cli.web_server`` gets cached as an attribute on the ``hermes_cli`` package; ``from hermes_cli import web_server`` then resolves via that attribute and silently bypasses the test's ``sys.modules`` stub, comparing the test token against the real module's per-process random ``_SESSION_TOKEN`` and rejecting the WS connection. Patch the ``web_server`` attribute on the package in addition to the ``sys.modules`` entry so both import shapes return the stub. * tests/run_agent/test_concurrent_interrupt.py: the agent's concurrent tool-execution path now consults a per-turn ``_tool_guardrails`` controller and calls ``_append_guardrail_observation`` after each tool result. The test's hand-rolled ``_Stub`` had neither, so every test raised ``AttributeError`` before exercising interrupt fan-out. Add a permissive ``_tool_guardrails`` MagicMock that returns always-allow ``SimpleNamespace`` decisions, bind the real ``_append_guardrail_observation`` (no-op under the permissive mock), stub ``_set_tool_guardrail_halt``, and make the test's fake tool callables accept ``**kwargs`` so the new ``pre_tool_block_checked=`` argument doesn't trip them. * tests/tools/test_dockerfile_pid1_reaping.py: PR a49f4c6 ("fix: prevent tui rebuilding assets") intentionally replaced the explicit cp-then-install materialize block with the simpler ``COPY ui-tui/packages/hermes-ink/`` + ``npm_config_install_links=false`` approach (npm 10+ default symlink behaviour), but the contract tests still asserted the old command sequence verbatim. Update both tests to accept either materialization shape — the behavioural contract (``@hermes/ink`` reachable at ``node_modules/@hermes/ink`` at runtime) is preserved by both.

Two tests in this file failed flakily on CI under xdist: test_no_yes_flag_still_prompts_in_tty and test_yes_restores_stash_without_prompting. Captured stdout in the failures showed the *real* production functions (``_stash_local_changes_if_needed``, ``_restore_stashed_changes``) running, and the migration prompt's ``Non-interactive session`` branch firing despite the test patching ``hermes_cli.main.sys`` to make ``isatty()`` return True — every patch a no-op. Root cause: ``test_env_loader.py`` and ``test_skills_subparser.py`` evict ``hermes_cli.main`` from ``sys.modules``. When either runs first on an xdist worker, our module-top ``from hermes_cli.main import cmd_update`` keeps a reference to the *old* module object whose ``__dict__`` is no longer ``sys.modules["hermes_cli.main"].__dict__``. ``@patch("hermes_cli.main.X")`` then patches the *new* module that the suite has since reloaded, but ``cmd_update.__globals__`` still points at the old dict — so every internal name (``_install_hangup_protection``, ``_cmd_update_impl``, ``_stash_local_changes_if_needed``, ``_restore_stashed_changes``, ``sys``, …) resolves to the unpatched original. This is the same pollution ``test_update_stale_dashboard.py`` already documents and guards against in its autouse fixture. Fix: resolve ``cmd_update`` from the *live* ``hermes_cli.main`` inside each test (via ``sys.modules`` / ``importlib.import_module``) so the function we call shares ``__globals__`` with the module that ``@patch("hermes_cli.main.X")`` is patching. Also add an autouse fixture that guarantees the live module is imported before each test.

0xyg3n · 2026-05-01T11:43:48Z

The two test_update_yes_flag.py failures on the previous CI run were caused by xdist module-pollution, not by the patches I added.

test_env_loader.py and test_skills_subparser.py evict hermes_cli.main from sys.modules. When either runs first on the same xdist worker, the module-top from hermes_cli.main import cmd_update here keeps a reference to the old module object — cmd_update.__globals__ is the old __dict__, no longer sys.modules["hermes_cli.main"].__dict__. @patch("hermes_cli.main.X") then patches the new module the suite reloaded into sys.modules, but cmd_update's name lookups still resolve through the old globals, so every patch becomes a no-op and the real production code runs through. Same pollution that test_update_stale_dashboard.py already documents in its autouse fixture.

Fix in 8dd92b8: resolve cmd_update from the live hermes_cli.main inside each test (_live_cmd_update()), and an autouse fixture that re-imports if the prior test evicted it. The function we now call shares __globals__ with the module that @patch("hermes_cli.main.X") patches.

Reproduced the failure locally by running pytest tests/hermes_cli/ -n auto until a worker happened to schedule one of the polluters before the update_yes_flag tests; with the fix, 5 consecutive runs of the full tests/hermes_cli/ suite show the two tests passing every time.

0xyg3n mentioned this pull request May 1, 2026

fix(gateway/discord): require allowlist auth on slash commands #18125

Closed

alt-glitch added type/test Test coverage or test infrastructure P2 Medium — degraded but workaround exists labels May 1, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(tests): unblock CI on eight unrelated test failures#18352

fix(tests): unblock CI on eight unrelated test failures#18352
0xyg3n wants to merge 2 commits into
NousResearch:mainfrom
0xyg3n:fix/upstream-ci-rot

0xyg3n commented May 1, 2026

Uh oh!

0xyg3n commented May 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

0xyg3n commented May 1, 2026

Summary

Failures addressed

Verification

Uh oh!

0xyg3n commented May 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants