Skip to content

fix(tests): unblock CI on eight unrelated test failures#18352

Open
0xyg3n wants to merge 2 commits into
NousResearch:mainfrom
0xyg3n:fix/upstream-ci-rot
Open

fix(tests): unblock CI on eight unrelated test failures#18352
0xyg3n wants to merge 2 commits into
NousResearch:mainfrom
0xyg3n:fix/upstream-ci-rot

Conversation

@0xyg3n

@0xyg3n 0xyg3n commented May 1, 2026

Copy link
Copy Markdown
Contributor

Summary

Eight independent test-only fixes to unblock the Tests workflow on main. None are functionality changes — each fix targets a test that has been failing on main @ 75e1339d independently of any open PR. The two run-to-run variants (update_yes_flag and the kanban WS test, alternating between back-to-back failing runs) share root causes with the others — recent production changes drifted from frozen test contracts, or test mocks no longer cover the full surface they patch.

Reference run: https://github.com/NousResearch/hermes-agent/actions/runs/25210234363

Failures addressed

Test Root cause Fix
tests/acp/test_server.py::TestSessionOps::test_send_available_commands_update _ADVERTISED_COMMANDS grew to add steer and queue; the test still asserted the 7-command list. Update the expected list to match _ADVERTISED_COMMANDS.
tests/gateway/test_teams.py::TestTeamsSend::test_send_typing Teams adapter started importing microsoft_teams.common.http.client.ClientOptions (microsoft-teams-apps 2.0.0); the test's fake SDK didn't register that submodule, so the whole try import block fell to except ImportError and TypingActivityInput became None. send_typing then raised inside its bare except, swallowing the call. Register microsoft_teams.common.http.client.ClientOptions in the test mock so the import block stays on the success path.
tests/gateway/test_agent_cache.py::TestAgentCacheSpilloverLive::test_concurrent_inserts_settle_at_cap 8 threads × 20 real-AIAgent constructions (160 inits) exceeded the 30 s thread-join after recent guardrail/MCP work made AIAgent.__init__ measurably heavier. Halve to 80 inits (still ~5× over the 16-slot cap, so eviction races are still exercised) and bump the join timeout to 60 s.
tests/hermes_cli/test_update_yes_flag.py::TestUpdateYesConfigMigration::test_no_yes_flag_still_prompts_in_tty
tests/hermes_cli/test_update_yes_flag.py::TestUpdateYesStashRestore::test_yes_restores_stash_without_prompting
cmd_update wraps _cmd_update_impl in _install_hangup_protection, which replaces sys.stdout / sys.stderr with _UpdateOutputStream mirrors. Under xdist the wrapper's own isatty() answer became race-dependent on prior writes flagging _original_broken, subverting the test's patch("hermes_cli.main.sys"). Patch _install_hangup_protection to a no-op state and stub _finalize_update_output so isatty() answers the test mock directly.
tests/plugins/test_kanban_dashboard_plugin.py::test_ws_events_rejects_when_token_required _check_ws_token resolves hermes_cli.web_server via from hermes_cli import web_server as _ws. When an earlier test on the same xdist worker imported the real module, web_server was cached as an attribute on the hermes_cli package; subsequent from hermes_cli import web_server resolved via that attribute and silently bypassed the test's sys.modules stub, comparing the test token against the real module's per-process random _SESSION_TOKEN. Patch the web_server attribute on the package in addition to the sys.modules entry so both import shapes return the stub.
tests/run_agent/test_concurrent_interrupt.py::test_concurrent_interrupt_cancels_pending
tests/run_agent/test_concurrent_interrupt.py::test_running_concurrent_worker_sees_is_interrupted
Concurrent-tool path now consults self._tool_guardrails.before_call(...) and calls self._append_guardrail_observation(...) after each result; the hand-rolled _Stub had neither, raising AttributeError before exercising interrupt fan-out. Add a permissive _tool_guardrails MagicMock that returns always-allow SimpleNamespace decisions, bind the real _append_guardrail_observation (no-op under the permissive mock), stub _set_tool_guardrail_halt, and accept **kwargs in the fake tool callables so the new pre_tool_block_checked= argument doesn't trip them.
tests/tools/test_dockerfile_pid1_reaping.py::test_dockerfile_installs_tui_dependencies
tests/tools/test_dockerfile_pid1_reaping.py::test_dockerfile_materializes_local_tui_ink_package
PR a49f4c6 ("fix: prevent tui rebuilding assets") intentionally replaced the explicit cp-then-install materialize block with the simpler COPY ui-tui/packages/hermes-ink/ + npm_config_install_links=false approach (npm 10+ default symlink behaviour), but the contract tests still asserted the old command sequence verbatim. Update both tests to accept either materialization shape — the behavioural contract (@hermes/ink reachable at node_modules/@hermes/ink at runtime) is preserved by both paths.

Verification

Targeted run, twice (no flakes):

python -m pytest -q -o addopts="-m 'not integration'" \
  tests/acp/test_server.py::TestSessionOps::test_send_available_commands_update \
  tests/gateway/test_teams.py::TestTeamsSend::test_send_typing \
  tests/gateway/test_agent_cache.py::TestAgentCacheSpilloverLive::test_concurrent_inserts_settle_at_cap \
  tests/hermes_cli/test_update_yes_flag.py \
  tests/plugins/test_kanban_dashboard_plugin.py::test_ws_events_rejects_when_token_required \
  tests/run_agent/test_concurrent_interrupt.py \
  tests/tools/test_dockerfile_pid1_reaping.py
# 17 passed (run 1)
# 17 passed (run 2)

Broader regression — all touched test files:

python -m pytest -q tests/acp/ tests/gateway/test_teams.py \
  tests/gateway/test_agent_cache.py tests/hermes_cli/test_update_yes_flag.py \
  tests/plugins/test_kanban_dashboard_plugin.py \
  tests/run_agent/test_concurrent_interrupt.py \
  tests/tools/test_dockerfile_pid1_reaping.py
# 321 passed

No production code is touched; the diff is test-files-only.

Eight independent test failures observed on main @ 75e1339, all caused
by recent production changes drifting from frozen test contracts (or by
test mocks that didn't track the production interface they exercise).
Each fix is local to one test file; no production code changes.

* tests/acp/test_server.py: advertised slash-command list grew to add
  ``steer`` and ``queue``; align the expected list with
  ``_ADVERTISED_COMMANDS``.

* tests/gateway/test_teams.py: the Teams adapter started importing
  ``microsoft_teams.common.http.client.ClientOptions`` to set the
  User-Agent (microsoft-teams-apps 2.0.0).  The fake SDK installed by
  ``_ensure_teams_mock()`` didn't register that submodule, so the
  adapter's whole try/except ImportError ran the except branch and
  every SDK symbol — including ``TypingActivityInput`` — became
  ``None``.  ``send_typing`` then raised inside its own bare except,
  swallowing the call and leaving ``mock_app.send`` un-awaited.  Add
  ``microsoft_teams.common.http.client`` with a ``ClientOptions`` stub
  so the import block stays on the success path.

* tests/gateway/test_agent_cache.py: 8 threads × 20 real-AIAgent
  inserts (160 constructions) exceeded the 30s thread-join timeout on
  loaded CI runners after recent guardrail/MCP work made AIAgent init
  measurably heavier.  Halve to 80 constructions (still ~5× over the
  16-slot cap so eviction races are exercised) and bump the join
  timeout to 60s.

* tests/hermes_cli/test_update_yes_flag.py: ``cmd_update`` wraps
  ``_cmd_update_impl`` in ``_install_hangup_protection``, which
  replaces ``sys.stdout`` / ``sys.stderr`` with ``_UpdateOutputStream``
  mirrors.  In the two prompt-flow tests that patch
  ``hermes_cli.main.sys`` to make ``isatty()`` return True, the
  wrapping subverts the patch under xdist concurrency: the wrapper
  caches the (mocked) original but replaces ``sys.stdout`` with
  itself, so subsequent ``sys.stdout.isatty()`` calls go through the
  wrapper's own ``isatty()`` whose answer depends on whether earlier
  writes flagged ``_original_broken``.  Patch
  ``_install_hangup_protection`` to a no-op state and stub
  ``_finalize_update_output`` so isatty() answers the test mock
  directly.

* tests/plugins/test_kanban_dashboard_plugin.py:
  ``_check_ws_token`` resolves ``hermes_cli.web_server`` via
  ``from hermes_cli import web_server as _ws``.  When an earlier test
  on the same xdist worker has imported the real module,
  ``hermes_cli.web_server`` gets cached as an attribute on the
  ``hermes_cli`` package; ``from hermes_cli import web_server`` then
  resolves via that attribute and silently bypasses the test's
  ``sys.modules`` stub, comparing the test token against the real
  module's per-process random ``_SESSION_TOKEN`` and rejecting the
  WS connection.  Patch the ``web_server`` attribute on the package
  in addition to the ``sys.modules`` entry so both import shapes
  return the stub.

* tests/run_agent/test_concurrent_interrupt.py: the agent's
  concurrent tool-execution path now consults a per-turn
  ``_tool_guardrails`` controller and calls
  ``_append_guardrail_observation`` after each tool result.  The
  test's hand-rolled ``_Stub`` had neither, so every test raised
  ``AttributeError`` before exercising interrupt fan-out.  Add a
  permissive ``_tool_guardrails`` MagicMock that returns
  always-allow ``SimpleNamespace`` decisions, bind the real
  ``_append_guardrail_observation`` (no-op under the permissive
  mock), stub ``_set_tool_guardrail_halt``, and make the test's
  fake tool callables accept ``**kwargs`` so the new
  ``pre_tool_block_checked=`` argument doesn't trip them.

* tests/tools/test_dockerfile_pid1_reaping.py: PR a49f4c6 ("fix:
  prevent tui rebuilding assets") intentionally replaced the
  explicit cp-then-install materialize block with the simpler
  ``COPY ui-tui/packages/hermes-ink/`` + ``npm_config_install_links=false``
  approach (npm 10+ default symlink behaviour), but the contract
  tests still asserted the old command sequence verbatim.  Update
  both tests to accept either materialization shape — the
  behavioural contract (``@hermes/ink`` reachable at
  ``node_modules/@hermes/ink`` at runtime) is preserved by both.
@alt-glitch alt-glitch added type/test Test coverage or test infrastructure P2 Medium — degraded but workaround exists labels May 1, 2026
Two tests in this file failed flakily on CI under xdist:
test_no_yes_flag_still_prompts_in_tty and
test_yes_restores_stash_without_prompting.  Captured stdout in the
failures showed the *real* production functions
(``_stash_local_changes_if_needed``, ``_restore_stashed_changes``)
running, and the migration prompt's ``Non-interactive session`` branch
firing despite the test patching ``hermes_cli.main.sys`` to make
``isatty()`` return True — every patch a no-op.

Root cause: ``test_env_loader.py`` and ``test_skills_subparser.py``
evict ``hermes_cli.main`` from ``sys.modules``.  When either runs first
on an xdist worker, our module-top ``from hermes_cli.main import
cmd_update`` keeps a reference to the *old* module object whose
``__dict__`` is no longer ``sys.modules["hermes_cli.main"].__dict__``.
``@patch("hermes_cli.main.X")`` then patches the *new* module that the
suite has since reloaded, but ``cmd_update.__globals__`` still points
at the old dict — so every internal name (``_install_hangup_protection``,
``_cmd_update_impl``, ``_stash_local_changes_if_needed``,
``_restore_stashed_changes``, ``sys``, …) resolves to the unpatched
original.  This is the same pollution
``test_update_stale_dashboard.py`` already documents and guards against
in its autouse fixture.

Fix: resolve ``cmd_update`` from the *live* ``hermes_cli.main`` inside
each test (via ``sys.modules`` / ``importlib.import_module``) so the
function we call shares ``__globals__`` with the module that
``@patch("hermes_cli.main.X")`` is patching.  Also add an autouse
fixture that guarantees the live module is imported before each test.
@0xyg3n

0xyg3n commented May 1, 2026

Copy link
Copy Markdown
Contributor Author

The two test_update_yes_flag.py failures on the previous CI run were caused by xdist module-pollution, not by the patches I added.

test_env_loader.py and test_skills_subparser.py evict hermes_cli.main from sys.modules. When either runs first on the same xdist worker, the module-top from hermes_cli.main import cmd_update here keeps a reference to the old module object — cmd_update.__globals__ is the old __dict__, no longer sys.modules["hermes_cli.main"].__dict__. @patch("hermes_cli.main.X") then patches the new module the suite reloaded into sys.modules, but cmd_update's name lookups still resolve through the old globals, so every patch becomes a no-op and the real production code runs through. Same pollution that test_update_stale_dashboard.py already documents in its autouse fixture.

Fix in 8dd92b8: resolve cmd_update from the live hermes_cli.main inside each test (_live_cmd_update()), and an autouse fixture that re-imports if the prior test evicted it. The function we now call shares __globals__ with the module that @patch("hermes_cli.main.X") patches.

Reproduced the failure locally by running pytest tests/hermes_cli/ -n auto until a worker happened to schedule one of the polluters before the update_yes_flag tests; with the fix, 5 consecutive runs of the full tests/hermes_cli/ suite show the two tests passing every time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

P2 Medium — degraded but workaround exists type/test Test coverage or test infrastructure

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants