fix(tests): unblock CI on eight unrelated test failures#18352
Conversation
Eight independent test failures observed on main @ 75e1339, all caused by recent production changes drifting from frozen test contracts (or by test mocks that didn't track the production interface they exercise). Each fix is local to one test file; no production code changes. * tests/acp/test_server.py: advertised slash-command list grew to add ``steer`` and ``queue``; align the expected list with ``_ADVERTISED_COMMANDS``. * tests/gateway/test_teams.py: the Teams adapter started importing ``microsoft_teams.common.http.client.ClientOptions`` to set the User-Agent (microsoft-teams-apps 2.0.0). The fake SDK installed by ``_ensure_teams_mock()`` didn't register that submodule, so the adapter's whole try/except ImportError ran the except branch and every SDK symbol — including ``TypingActivityInput`` — became ``None``. ``send_typing`` then raised inside its own bare except, swallowing the call and leaving ``mock_app.send`` un-awaited. Add ``microsoft_teams.common.http.client`` with a ``ClientOptions`` stub so the import block stays on the success path. * tests/gateway/test_agent_cache.py: 8 threads × 20 real-AIAgent inserts (160 constructions) exceeded the 30s thread-join timeout on loaded CI runners after recent guardrail/MCP work made AIAgent init measurably heavier. Halve to 80 constructions (still ~5× over the 16-slot cap so eviction races are exercised) and bump the join timeout to 60s. * tests/hermes_cli/test_update_yes_flag.py: ``cmd_update`` wraps ``_cmd_update_impl`` in ``_install_hangup_protection``, which replaces ``sys.stdout`` / ``sys.stderr`` with ``_UpdateOutputStream`` mirrors. In the two prompt-flow tests that patch ``hermes_cli.main.sys`` to make ``isatty()`` return True, the wrapping subverts the patch under xdist concurrency: the wrapper caches the (mocked) original but replaces ``sys.stdout`` with itself, so subsequent ``sys.stdout.isatty()`` calls go through the wrapper's own ``isatty()`` whose answer depends on whether earlier writes flagged ``_original_broken``. Patch ``_install_hangup_protection`` to a no-op state and stub ``_finalize_update_output`` so isatty() answers the test mock directly. * tests/plugins/test_kanban_dashboard_plugin.py: ``_check_ws_token`` resolves ``hermes_cli.web_server`` via ``from hermes_cli import web_server as _ws``. When an earlier test on the same xdist worker has imported the real module, ``hermes_cli.web_server`` gets cached as an attribute on the ``hermes_cli`` package; ``from hermes_cli import web_server`` then resolves via that attribute and silently bypasses the test's ``sys.modules`` stub, comparing the test token against the real module's per-process random ``_SESSION_TOKEN`` and rejecting the WS connection. Patch the ``web_server`` attribute on the package in addition to the ``sys.modules`` entry so both import shapes return the stub. * tests/run_agent/test_concurrent_interrupt.py: the agent's concurrent tool-execution path now consults a per-turn ``_tool_guardrails`` controller and calls ``_append_guardrail_observation`` after each tool result. The test's hand-rolled ``_Stub`` had neither, so every test raised ``AttributeError`` before exercising interrupt fan-out. Add a permissive ``_tool_guardrails`` MagicMock that returns always-allow ``SimpleNamespace`` decisions, bind the real ``_append_guardrail_observation`` (no-op under the permissive mock), stub ``_set_tool_guardrail_halt``, and make the test's fake tool callables accept ``**kwargs`` so the new ``pre_tool_block_checked=`` argument doesn't trip them. * tests/tools/test_dockerfile_pid1_reaping.py: PR a49f4c6 ("fix: prevent tui rebuilding assets") intentionally replaced the explicit cp-then-install materialize block with the simpler ``COPY ui-tui/packages/hermes-ink/`` + ``npm_config_install_links=false`` approach (npm 10+ default symlink behaviour), but the contract tests still asserted the old command sequence verbatim. Update both tests to accept either materialization shape — the behavioural contract (``@hermes/ink`` reachable at ``node_modules/@hermes/ink`` at runtime) is preserved by both.
Two tests in this file failed flakily on CI under xdist:
test_no_yes_flag_still_prompts_in_tty and
test_yes_restores_stash_without_prompting. Captured stdout in the
failures showed the *real* production functions
(``_stash_local_changes_if_needed``, ``_restore_stashed_changes``)
running, and the migration prompt's ``Non-interactive session`` branch
firing despite the test patching ``hermes_cli.main.sys`` to make
``isatty()`` return True — every patch a no-op.
Root cause: ``test_env_loader.py`` and ``test_skills_subparser.py``
evict ``hermes_cli.main`` from ``sys.modules``. When either runs first
on an xdist worker, our module-top ``from hermes_cli.main import
cmd_update`` keeps a reference to the *old* module object whose
``__dict__`` is no longer ``sys.modules["hermes_cli.main"].__dict__``.
``@patch("hermes_cli.main.X")`` then patches the *new* module that the
suite has since reloaded, but ``cmd_update.__globals__`` still points
at the old dict — so every internal name (``_install_hangup_protection``,
``_cmd_update_impl``, ``_stash_local_changes_if_needed``,
``_restore_stashed_changes``, ``sys``, …) resolves to the unpatched
original. This is the same pollution
``test_update_stale_dashboard.py`` already documents and guards against
in its autouse fixture.
Fix: resolve ``cmd_update`` from the *live* ``hermes_cli.main`` inside
each test (via ``sys.modules`` / ``importlib.import_module``) so the
function we call shares ``__globals__`` with the module that
``@patch("hermes_cli.main.X")`` is patching. Also add an autouse
fixture that guarantees the live module is imported before each test.
|
The two
Fix in 8dd92b8: resolve Reproduced the failure locally by running |
Summary
Eight independent test-only fixes to unblock the
Testsworkflow onmain. None are functionality changes — each fix targets a test that has been failing onmain @ 75e1339dindependently of any open PR. The two run-to-run variants (update_yes_flagand the kanban WS test, alternating between back-to-back failing runs) share root causes with the others — recent production changes drifted from frozen test contracts, or test mocks no longer cover the full surface they patch.Reference run: https://github.com/NousResearch/hermes-agent/actions/runs/25210234363
Failures addressed
tests/acp/test_server.py::TestSessionOps::test_send_available_commands_update_ADVERTISED_COMMANDSgrew to addsteerandqueue; the test still asserted the 7-command list._ADVERTISED_COMMANDS.tests/gateway/test_teams.py::TestTeamsSend::test_send_typingmicrosoft_teams.common.http.client.ClientOptions(microsoft-teams-apps 2.0.0); the test's fake SDK didn't register that submodule, so the wholetryimport block fell toexcept ImportErrorandTypingActivityInputbecameNone.send_typingthen raised inside its bareexcept, swallowing the call.microsoft_teams.common.http.client.ClientOptionsin the test mock so the import block stays on the success path.tests/gateway/test_agent_cache.py::TestAgentCacheSpilloverLive::test_concurrent_inserts_settle_at_capAIAgentconstructions (160 inits) exceeded the 30 s thread-join after recent guardrail/MCP work madeAIAgent.__init__measurably heavier.tests/hermes_cli/test_update_yes_flag.py::TestUpdateYesConfigMigration::test_no_yes_flag_still_prompts_in_ttytests/hermes_cli/test_update_yes_flag.py::TestUpdateYesStashRestore::test_yes_restores_stash_without_promptingcmd_updatewraps_cmd_update_implin_install_hangup_protection, which replacessys.stdout/sys.stderrwith_UpdateOutputStreammirrors. Under xdist the wrapper's ownisatty()answer became race-dependent on prior writes flagging_original_broken, subverting the test'spatch("hermes_cli.main.sys")._install_hangup_protectionto a no-op state and stub_finalize_update_outputsoisatty()answers the test mock directly.tests/plugins/test_kanban_dashboard_plugin.py::test_ws_events_rejects_when_token_required_check_ws_tokenresolveshermes_cli.web_serverviafrom hermes_cli import web_server as _ws. When an earlier test on the same xdist worker imported the real module,web_serverwas cached as an attribute on thehermes_clipackage; subsequentfrom hermes_cli import web_serverresolved via that attribute and silently bypassed the test'ssys.modulesstub, comparing the test token against the real module's per-process random_SESSION_TOKEN.web_serverattribute on the package in addition to thesys.modulesentry so both import shapes return the stub.tests/run_agent/test_concurrent_interrupt.py::test_concurrent_interrupt_cancels_pendingtests/run_agent/test_concurrent_interrupt.py::test_running_concurrent_worker_sees_is_interruptedself._tool_guardrails.before_call(...)and callsself._append_guardrail_observation(...)after each result; the hand-rolled_Stubhad neither, raisingAttributeErrorbefore exercising interrupt fan-out._tool_guardrailsMagicMock that returns always-allowSimpleNamespacedecisions, bind the real_append_guardrail_observation(no-op under the permissive mock), stub_set_tool_guardrail_halt, and accept**kwargsin the fake tool callables so the newpre_tool_block_checked=argument doesn't trip them.tests/tools/test_dockerfile_pid1_reaping.py::test_dockerfile_installs_tui_dependenciestests/tools/test_dockerfile_pid1_reaping.py::test_dockerfile_materializes_local_tui_ink_packageCOPY ui-tui/packages/hermes-ink/+npm_config_install_links=falseapproach (npm 10+ default symlink behaviour), but the contract tests still asserted the old command sequence verbatim.@hermes/inkreachable atnode_modules/@hermes/inkat runtime) is preserved by both paths.Verification
Targeted run, twice (no flakes):
Broader regression — all touched test files:
No production code is touched; the diff is test-files-only.