test: cover #25957 production regressions (salvage of #26039) by teknium1 · Pull Request #33661 · NousResearch/hermes-agent

teknium1 · 2026-05-28T04:42:16Z

Summary

Salvages PR #26039 (@stephenschoettler, Ethernet-approved) onto current main after the agent/background_review.py extraction made the original cherry-pick conflict.

Snapshots review_agent._session_messages BEFORE shutdown_memory_provider() + close() run, so close-time cleanup can wipe per-session state without dropping the user-visible self-improvement summary. Adds two regression tests.

Changes

- agent/background_review.py: move the review_messages = list(getattr(review_agent, "_session_messages", [])) snapshot above the teardown block (it was previously after close())
- tests/run_agent/test_background_review.py: new test test_background_review_summarizer_receives_captured_messages_after_close patches the module-level agent.background_review.summarize_background_review_actions (the now-canonical call site after the extraction) and verifies the summarizer receives the captured tool messages even when close() wipes _session_messages
- tests/agent/test_context_compressor_summary_continuity.py: new test test_handoff_in_protected_head_populates_previous_summary_before_update for protected-head handoff rehydration
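The ordering fix can be illustrated with a minimal sketch. The `ReviewAgent` class and both helper functions below are hypothetical stand-ins, not the real code in agent/background_review.py; only the `list(getattr(..., "_session_messages", []))` snapshot expression is taken from this PR.

```python
class ReviewAgent:
    """Hypothetical stand-in for the review agent (not the real class)."""

    def __init__(self):
        self._session_messages = []

    def run(self):
        # Pretend the agent made one tool call during the review.
        self._session_messages.append({"role": "tool", "content": "memory updated"})

    def close(self):
        # Close-time cleanup wipes per-session state in place.
        self._session_messages.clear()


def snapshot_after_close(agent):
    """Old order: teardown runs first, so the snapshot is empty."""
    agent.run()
    agent.close()
    return list(getattr(agent, "_session_messages", []))


def snapshot_before_close(agent):
    """Fixed order: copy the messages first, then tear down."""
    agent.run()
    # list(...) makes a copy, so an in-place clear() cannot empty it later.
    review_messages = list(getattr(agent, "_session_messages", []))
    agent.close()
    return review_messages


buggy = snapshot_after_close(ReviewAgent())
fixed = snapshot_before_close(ReviewAgent())
print(len(buggy), len(fixed))
```

Copying with `list(...)` matters in this sketch because `close()` clears the list in place; holding only a reference to the same list would still lose the messages.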

Validation

python3 -m pytest tests/run_agent/test_background_review.py tests/agent/test_context_compressor_summary_continuity.py -q → 8 passed, 0 failed
Bug-catching verified: reverted the production fix → new bg-review test fails with empty review_messages; reapplied → all pass
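The revert-and-reapply check can be sketched in miniature as follows. Everything here (`FakeReviewAgent`, `run_review`, the `snapshot_first` flag) is hypothetical and exists only to show how one assertion on the summarizer's arguments separates the two orderings; the real test lives in tests/run_agent/test_background_review.py.

```python
from unittest import mock


class FakeReviewAgent:
    """Hypothetical agent whose close() wipes its session messages."""

    def __init__(self):
        self._session_messages = [{"role": "tool", "content": "skill saved"}]

    def close(self):
        self._session_messages.clear()


def run_review(agent, summarize, snapshot_first=True):
    """Toy review flow; snapshot_first=False mimics the reverted fix."""
    if snapshot_first:
        messages = list(agent._session_messages)
        agent.close()
    else:
        agent.close()
        messages = list(agent._session_messages)
    return summarize(messages)


def messages_seen_by_summarizer(snapshot_first):
    # A Mock records the arguments it was called with.
    summarize = mock.Mock(return_value="summary")
    run_review(FakeReviewAgent(), summarize, snapshot_first=snapshot_first)
    return summarize.call_args.args[0]


with_fix = messages_seen_by_summarizer(snapshot_first=True)
reverted = messages_seen_by_summarizer(snapshot_first=False)
```

With the fix in place the summarizer sees the captured tool message; with the order reverted it sees an empty list, which is the failure mode the validation note describes.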

Attribution

Original PR #26039 commit 63eaf6055 was authored by @stephenschoettler. The cherry-pick conflicted on run_agent.py because the bg-review block had been extracted to agent/background_review.py after the PR was opened. The test also had to be adapted to patch the new module-level function instead of the legacy AIAgent._summarize_background_review_actions forwarder. The salvage is therefore a single re-attributed commit rather than a clean cherry-pick. Closes #26039.

Snapshot review_agent._session_messages before teardown so close() can clean per-session state without dropping the user-visible self-improvement summary.

Adds two regressions:
- bg-review summarizer receives captured review-agent tool messages after review_agent.close() runs
- context-compressor protected-head handoff rehydration populates _previous_summary and keeps the old handoff out of newly summarized turns

Salvaged from PR #26039 onto current main after agent/background_review.py extraction. Original commit 63eaf60; bg-review test updated to patch the module-level summarize_background_review_actions in agent.background_review instead of the now-forwarder AIAgent._summarize_background_review_actions.

github-actions · 2026-05-28T04:43:03Z

🔎 Lint report: `hermes/hermes-d85c18f1` vs `origin/main`

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 9520 on HEAD, 9520 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 5016 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

Two unrelated transient failures on PR #33661's initial CI run, both pre-existing on main and recovered on rerun.

Hardening:
1. tests/cron/test_scheduler.py::TestRunJobConfigLogging: added mocks for resolve_runtime_provider() and discover_mcp_tools(). The yaml-warning tests intend to exercise only the warning-log path, but _run_job_impl continues into provider resolution and MCP discovery after the warning. Both can spawn subprocesses or hit the network, and they pushed the test over its 30s budget under GHA load.
2. tests/tools/test_browser_supervisor.py: wrapped Chrome teardown against the stdlib subprocess._wait() race (bpo-38630). When SIGCHLD arrives during proc.wait(), _try_wait(WNOHANG) can return a foreign pid and the 'assert pid == self.pid or pid == 0' fires. The fixture now catches AssertionError/TimeoutExpired, force-kills, and always reaps so no zombie escapes. The same hardening was applied to the early-skip branch.
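The teardown hardening in item 2 might look roughly like the sketch below. The `stop_process` helper, its timeout, and the sleeping child process are assumptions made for illustration; this is not the fixture code from tests/tools/test_browser_supervisor.py, only the same catch, force-kill, then reap shape.

```python
import subprocess
import sys


def stop_process(proc, timeout=5.0):
    """Terminate proc and make a best effort to always reap it."""
    try:
        proc.terminate()
        return proc.wait(timeout=timeout)
    except (AssertionError, subprocess.TimeoutExpired):
        # Either the stdlib wait race fired or the child ignored the
        # termination request; escalate to a force-kill.
        proc.kill()
    try:
        # Reap after the kill so no zombie is left behind.
        return proc.wait(timeout=timeout)
    except (AssertionError, subprocess.TimeoutExpired):
        # Last resort: report whatever state the process is in.
        return proc.poll()


# Stand-in for the browser process: a child that would sleep for a minute.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
exit_code = stop_process(child)
```

Catching AssertionError is unusual and is justified here only because the assertion comes from inside the standard library's wait path rather than from the test itself.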


alt-glitch added the type/test (Test coverage or test infrastructure), comp/agent (Core agent loop, run_agent.py, prompt builder), and P3 (Low: cosmetic, nice to have) labels on May 28, 2026

teknium1 merged commit 4a6f186 into main May 28, 2026
31 of 33 checks passed

teknium1 deleted the hermes/hermes-d85c18f1 branch May 28, 2026 05:14

This was referenced May 28, 2026

test: cover #25957 production regressions #26039

Closed

test(ci): harden two flaky tests (cron yaml-warning, browser supervisor teardown) #33675

Merged
