Skip to content

fix(ci): stabilize shared test state after 21012#25957

Merged
ethernet8023 merged 1 commit into
NousResearch:mainfrom
stephenschoettler:fix/main-ci-unblocker-after-21012
May 15, 2026
Merged

fix(ci): stabilize shared test state after 21012#25957
ethernet8023 merged 1 commit into
NousResearch:mainfrom
stephenschoettler:fix/main-ci-unblocker-after-21012

Conversation

@stephenschoettler

Copy link
Copy Markdown
Contributor

What does this PR do?

Stabilizes the shared test failures that remained after #21012 landed on main.

The fixes are intentionally small and CI-focused:

  • preserve background-review agent messages before closing the review agent, so the self-improvement summary can still report actions
  • reset auxiliary-client unhealthy-provider cache between tests, matching the existing runtime-main reset
  • isolate hermes update tests from developer-local lazy backend activation state
  • update the provider discovery count for the newly registered provider profile
  • make compression-feasibility tests independent from local provider config and custom-provider attrs
  • rehydrate persisted context summaries even when the handoff sits in the protected head after resume

Related Issue

Related to the CI gate for the hermes-lcm context-engine merge train, including stephenschoettler/hermes-lcm#133. This does not close that issue.

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

  • run_agent.py: capture review_agent._session_messages before closing and clearing review_agent.
  • agent/context_compressor.py: find persisted handoff summaries from the first non-system message through the compression window, not only from the computed compress-start boundary.
  • tests/conftest.py: reset agent.auxiliary_client unhealthy-provider state per test.
  • tests/hermes_cli/test_update_autostash.py: no-op active lazy-backend refresh in update-autostash tests.
  • tests/providers/test_plugin_discovery.py: update the provider registry assertion from 33 profiles to 34.
  • tests/run_agent/test_compression_feasibility.py: mock auxiliary provider resolution and include custom_providers in expected calls.
  • tests/agent/test_context_compressor_summary_continuity.py: keep a regression shape that actually compresses new turns after a persisted handoff.

How to Test

  1. Run the focused current-main failing set:
    ./scripts/run_tests.sh -n 0 tests/agent/test_auxiliary_client.py tests/agent/test_context_compressor.py tests/agent/test_context_compressor_summary_continuity.py tests/hermes_cli/test_update_autostash.py tests/run_agent/test_provider_parity.py tests/providers/test_plugin_discovery.py tests/run_agent/test_background_review.py tests/run_agent/test_compression_feasibility.py -q --tb=short
  2. Run file sanity checks:
    git diff --check && python -m compileall -q agent/context_compressor.py run_agent.py tests/conftest.py tests/agent/test_context_compressor_summary_continuity.py tests/hermes_cli/test_update_autostash.py tests/providers/test_plugin_discovery.py tests/run_agent/test_compression_feasibility.py
  3. Run lint on touched files:
    python -m ruff check agent/context_compressor.py run_agent.py tests/conftest.py tests/agent/test_context_compressor_summary_continuity.py tests/hermes_cli/test_update_autostash.py tests/providers/test_plugin_discovery.py tests/run_agent/test_compression_feasibility.py

Validation Status

  • Focused failing set passed: 378 passed in 62.78s.
  • git diff --check passed.
  • python -m compileall -q ... passed for touched files.
  • python -m ruff check ... passed for touched files.
  • Full pytest tests/ -q is not claimed here. A broader local non-integration run on this branch still showed unrelated current-main failures outside this focused unblocker scope, so this PR keeps the validation claim narrow.

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: Arch Linux, Python 3.11 venv via scripts/run_tests.sh

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — N/A
  • I've updated cli-config.yaml.example if I added/changed config keys — N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — no platform-specific code paths added
  • I've updated tool descriptions/schemas if I changed tool behavior — N/A

Screenshots / Logs

Focused validation output:

378 passed in 62.78s (0:01:02)
All checks passed!

@alt-glitch alt-glitch added type/test Test coverage or test infrastructure P2 Medium — degraded but workaround exists comp/agent Core agent loop, run_agent.py, prompt builder labels May 14, 2026
@ethernet8023

Copy link
Copy Markdown
Collaborator

thank you!

there's two prod bugfixes in here aren't just fixes to existing CI tests, and you haven't added any new regression testing for them. i'm going to merge this given that it will unblock CI, but could you make a follow-up PR adding two tests?

one asserting handoff-in-protected-head rehydration populates _previous_summary, and
one asserting bg-review action capture works after review_agent is closed

the continuity test adds messages to an existing fixture so compression triggers at all, but it doesn't explicitly assert "summary in protected head causes _previous_summary to be populated."
a future regression on the head-search would probably slip thru.

the fix for bg-review has no assertion that _summarize_background_review_actions actually receives the messages.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P2 Medium — degraded but workaround exists type/test Test coverage or test infrastructure

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants