Skip to content

fix: stabilization fixes and architectural enhancements#14957

Open
aniruddhaadak80 wants to merge 7 commits into
NousResearch:mainfrom
aniruddhaadak80:stabilization-fixes
Open

fix: stabilization fixes and architectural enhancements#14957
aniruddhaadak80 wants to merge 7 commits into
NousResearch:mainfrom
aniruddhaadak80:stabilization-fixes

Conversation

@aniruddhaadak80

@aniruddhaadak80 aniruddhaadak80 commented Apr 24, 2026

Copy link
Copy Markdown
Contributor

Summary of Changes

Details

This PR addresses several critical stability issues and architectural enhancements to improve the Hermes Agent's reliability across platforms and its overall user experience.

Related Issues

Fixes #14712, #12176

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Architectural improvement

Copilot AI review requested due to automatic review settings April 24, 2026 07:07
@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/cli CLI entry point, hermes_cli/, setup wizard comp/cron Cron scheduler and job management comp/gateway Gateway runner, session dispatch, delivery comp/agent Core agent loop, run_agent.py, prompt builder labels Apr 24, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@aniruddhaadak80 aniruddhaadak80 requested a review from a team June 5, 2026 07:01

@austinpickett austinpickett left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review -- PR #14957

Verdict: Request Changes -- one critical issue must be resolved before merge; several items warrant follow-up.


Critical

scratch/test_proxy.py committed to production

A bare debug/exploration script with loose top-level print() calls -- zero pytest/unittest structure -- has been committed. It tests built-in Python stdlib behavior (urllib.request.proxy_bypass_environment) that has no connection to any actual change in this PR. This file has no home in the production tree. Must be removed before merge. If the intent is to codify this behaviour, move it to tests/tools/test_proxy_bypass.py with proper pytest test functions.


Warnings

1. Dead code in test_curses_color_compat.py

The file already has a module-level pytest.skip(allow_module_level=True) guard that fires before any of the new code runs. The newly added pytestmark = pytest.mark.skipif(...) and the entire try/except ImportError: class DummyCurses block are unreachable dead code. Clean this up.

2. run_agent.py verifier methods not wired up in this diff

_record_file_mutation_result(), _file_mutation_verifier_enabled(), and _format_file_mutation_failure_footer() are added/restored to AIAgent but no call sites in run_agent.py are modified. If these existed before and were accidentally dropped, the PR should note when. If they are new, they need at least one call site or integration test.

3. approval.py: ~/.hermes/ replacement hardcoded for Windows paths

In _rewrite_resolved_hermes_home, the substitution string is always "~/.hermes/" (POSIX home-relative) even for the new Windows path branch. On native win32, this is not meaningful -- consider making the replacement Windows-aware.

4. api_server.py mutates os.environ at server startup

os.environ["IMAGE_SERVE_BASE_URL"] is set as a process-global side effect after bind. If a test suite starts multiple APIServerAdapter instances, the first one silently wins. Consider an instance attribute and explicit injection into the image tool.


Suggestions

  • hermes_cli/config.py: TINKER_API_KEY / WANDB_API_KEY are RL-training keys unrelated to stabilisation scope. Consider a dedicated RL-tools PR.
  • delegate_tool.py per-task api_key: confirm it is masked in any debug logging emitted by _build_child_agent.
  • image_gen.serve_base_url and _maybe_rewrite_image_url are well-structured and defensively guarded.

Looks Good

  • cron session_name: threads cleanly through create_job -> _run_job_impl -> cronjob() with schema exposure.
  • Skills/tools cache invalidation: clear_skills_system_prompt_cache(clear_snapshot=True) after save is the right fix for immediate toggle effect.
  • Windows test suite skips in test_search_hidden_dirs.py, test_file_operations.py, test_local_shell_init.py, test_windows_native_support.py are targeted and correct.
  • test_background_review_summary.py import path update follows the refactor correctly.

Reviewed by Hermes Agent (automated)

@austinpickett

Copy link
Copy Markdown
Collaborator

Code Review Summary

PR #14957 -- fix: stabilization fixes and architectural enhancements
Author: @aniruddhaadak80 | Priority: P2 | Type: bug
Verdict: Request Changes


Critical

scratch/test_proxy.py must not be committed to production. This is a bare debug/exploration script (top-level print() calls, no pytest structure) that exercises stdlib behavior unrelated to any diff in this PR. Remove or move to tests/tools/test_proxy_bypass.py with proper test functions.


Warnings

  1. Dead code in test_curses_color_compat.py -- the existing pytest.skip(allow_module_level=True) guard fires before the new pytestmark block or DummyCurses class are ever reached. One of the two skip mechanisms should be removed.
  2. Unconnected verifier methods in run_agent.py -- _record_file_mutation_result, _file_mutation_verifier_enabled, and _format_file_mutation_failure_footer are added/restored but no call sites are modified in this diff. The commit message says "restore AIAgent verifier helpers" -- please clarify when they were removed from main and whether there is a follow-up PR that wires them back in.
  3. approval.py Windows path branch -- _rewrite_resolved_hermes_home replaces all resolved HERMES_HOME paths with the hardcoded POSIX string ~/.hermes/, even when handling Windows backslash paths. On native win32 this is not meaningful.
  4. api_server.py os.environ mutation -- IMAGE_SERVE_BASE_URL is written to os.environ as a process-global side effect at server startup. Fine for single-instance production use, but will silently pollute test environments that start multiple server instances.

Suggestions

  • TINKER_API_KEY / WANDB_API_KEY in config.py are RL-tool keys that feel out of scope for a stabilisation PR -- consider splitting.
  • Per-task api_key in delegate_tool.py: confirm the key is masked in debug logging from _build_child_agent.

Looks Good

  • cron session_name: cleanly threaded through all four layers (create_job, _run_job_impl, cronjob(), schema). Resumable cron sessions are a solid feature.
  • Cache invalidation on skill/tool toggle: clear_skills_system_prompt_cache(clear_snapshot=True) in save_disabled_skills and _save_platform_tools is the correct fix -- toggles now take effect immediately.
  • Windows test skips: targeted @pytest.mark.skipif(sys.platform == "win32", ...) additions across five test files are all appropriate and minimal.
  • test_background_review_summary.py: import path correctly updated to agent.background_review.summarize_background_review_actions.
  • image_gen serving (_maybe_rewrite_image_url, api_server.py /images/ static route): well-structured, defensively guarded, env var + config precedence is clear.
  • delegate_tool.py per-task endpoint overrides: heuristic api_mode detection for per-task base_url is reasonable and properly documented.

Reviewed by Hermes Agent (automated)

aniruddhaadak80 added a commit to aniruddhaadak80/hermes-agent that referenced this pull request Jun 11, 2026
austinpickett
austinpickett previously approved these changes Jun 11, 2026

@austinpickett austinpickett left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All three blockers from my last review are resolved:

  • scratch/test_proxy.py — absent from diff ✓
  • Dead/unconnected methods — removed ✓
  • Duplicate verifier helpers in run_agent.py — cleaned per commit message ✓

The remaining changes are genuine fixes:

  • Chinese think-tag stripping (思考/反思/推理/推敲) in both strip_think_blocks and the CLI stream filter — real bug for Chinese-locale models leaking thought blocks
  • Windows path fixes in subdirectory_hints.pyRuntimeError catch, posix=os.name!='nt' on shlex.split, and \\ path separator recognition
  • Image serve URL via ContextVar in image_generation_tool.py + api_server.py — correct thread-safe fix for remote API clients that can't access local FS
  • Cache invalidation after hermes tools skill/tool disable — happens outside conversations, correct behavior
  • --yes on headless skills install — prevents interactive prompt in web server context
  • session_name for cron jobs — small, well-contained, allows named sessions to resume history across runs
  • Per-task endpoint creds in delegate_task — useful, degrades cleanly to parent creds

One minor nit: the api_mode heuristic in delegate_tool.py checks hardcoded URL substrings (api.anthropic.com, api.kimi.com/coding, chatgpt.com/backend-api/codex) — will silently miss future endpoints, but the fallback to parent api_mode is safe. Consider a config-driven approach as a follow-up.

@austinpickett

Copy link
Copy Markdown
Collaborator

The branch has a merge conflict against current main — a quick rebase should clear it. Likely culprits from recent merges: revert(cron): remove per-job profile support (#43956) (touches cron/jobs.py and cron/scheduler.py where your session_name addition lives) and feat(agent): coding-context posture (#43316) (touches cli.py).

git fetch origin
git rebase origin/main
# resolve any conflicts, then:
git push --force-with-lease

Everything else is approved and looking good — just needs the rebase.

@austinpickett

Copy link
Copy Markdown
Collaborator

Re-review (2026-06-11, new commit 70a9177aa)

Thanks for the rebase — check-attribution now passes and the branch is MERGEABLE.

One new CI failure introduced by this PR:

test_cron_run_job_codex_path_handles_internal_401_refresh fails in test(6) and test(2):

TypeError: test_cron_run_job_codex_path_handles_internal_401_refresh.<locals>.<lambda>() got an unexpected keyword argument 'target_model'

Root cause: cron/scheduler.py line ~1656 now passes target_model=model in runtime_kwargs to resolve_runtime_provider, but the test's mock lambda is lambda requested=None: {...} — it doesn't accept target_model. The mock needs to be updated to accept (and ignore) the extra kwarg:

# before
lambda requested=None: { ... }

# after
lambda requested=None, target_model=None, **kwargs: { ... }

The test_web_server_skills_profiles.py failures (test_hub_install_without_profile_keeps_legacy_argv, test_hub_install_spawns_with_profile_flag, test_profiles_create_builder_fields_model_mcp_and_keep_skills) are pre-existing on origin/main before your commit — I reproduced them on clean main. They are not caused by this PR.

Please fix the mock lambda in tests/cron/test_codex_execution_paths.py and push. Everything else looks good.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 54 out of 54 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (12)

tools/image_generation_tool.py:1

  • _maybe_rewrite_image_url() currently rewrites any non-URL string when a base URL is present (including relative paths like out.png or images/foo.png). This can produce incorrect URLs and hide real relative-path semantics. Fix by first confirming the value is an absolute local filesystem path (e.g., os.path.isabs(...) for POSIX, drive-letter/UNC detection for Windows, and optionally ~-prefixed paths after expanduser()), and only then rewrite; otherwise return the original string.
    tools/image_generation_tool.py:1
  • _maybe_rewrite_image_url() currently rewrites any non-URL string when a base URL is present (including relative paths like out.png or images/foo.png). This can produce incorrect URLs and hide real relative-path semantics. Fix by first confirming the value is an absolute local filesystem path (e.g., os.path.isabs(...) for POSIX, drive-letter/UNC detection for Windows, and optionally ~-prefixed paths after expanduser()), and only then rewrite; otherwise return the original string.
    tools/image_generation_tool.py:1
  • _maybe_rewrite_image_url() currently rewrites any non-URL string when a base URL is present (including relative paths like out.png or images/foo.png). This can produce incorrect URLs and hide real relative-path semantics. Fix by first confirming the value is an absolute local filesystem path (e.g., os.path.isabs(...) for POSIX, drive-letter/UNC detection for Windows, and optionally ~-prefixed paths after expanduser()), and only then rewrite; otherwise return the original string.
    tools/cronjob_tools.py:1
  • session_name is passed/stored without normalization, so an empty string can be persisted (and later treated as falsy elsewhere), creating a confusing 'set-but-not-effective' state. Consider normalizing session_name similarly to other optional string fields (strip whitespace; treat empty as None) on both create and update paths, and document how to clear the field (e.g., \"\" clears vs null leaves unchanged).
    tools/cronjob_tools.py:1
  • session_name is passed/stored without normalization, so an empty string can be persisted (and later treated as falsy elsewhere), creating a confusing 'set-but-not-effective' state. Consider normalizing session_name similarly to other optional string fields (strip whitespace; treat empty as None) on both create and update paths, and document how to clear the field (e.g., \"\" clears vs null leaves unchanged).
    tools/cronjob_tools.py:1
  • session_name is passed/stored without normalization, so an empty string can be persisted (and later treated as falsy elsewhere), creating a confusing 'set-but-not-effective' state. Consider normalizing session_name similarly to other optional string fields (strip whitespace; treat empty as None) on both create and update paths, and document how to clear the field (e.g., \"\" clears vs null leaves unchanged).
    tools/cronjob_tools.py:1
  • session_name is passed/stored without normalization, so an empty string can be persisted (and later treated as falsy elsewhere), creating a confusing 'set-but-not-effective' state. Consider normalizing session_name similarly to other optional string fields (strip whitespace; treat empty as None) on both create and update paths, and document how to clear the field (e.g., \"\" clears vs null leaves unchanged).
    tools/cronjob_tools.py:1
  • session_name is passed/stored without normalization, so an empty string can be persisted (and later treated as falsy elsewhere), creating a confusing 'set-but-not-effective' state. Consider normalizing session_name similarly to other optional string fields (strip whitespace; treat empty as None) on both create and update paths, and document how to clear the field (e.g., \"\" clears vs null leaves unchanged).
    tools/skills_tool.py:1
  • Path normalization is repeated in multiple comprehensions via .replace(\"\\\\\", \"/\"). For clarity and consistency, consider using Path(...).as_posix() (or PurePosixPath(relative_path) if you need a pure conversion) in one place, or a small helper to normalize relative paths to '/' separators.
    tools/skills_tool.py:1
  • Path normalization is repeated in multiple comprehensions via .replace(\"\\\\\", \"/\"). For clarity and consistency, consider using Path(...).as_posix() (or PurePosixPath(relative_path) if you need a pure conversion) in one place, or a small helper to normalize relative paths to '/' separators.
    tools/skills_tool.py:1
  • Path normalization is repeated in multiple comprehensions via .replace(\"\\\\\", \"/\"). For clarity and consistency, consider using Path(...).as_posix() (or PurePosixPath(relative_path) if you need a pure conversion) in one place, or a small helper to normalize relative paths to '/' separators.
    tools/skills_tool.py:1
  • Path normalization is repeated in multiple comprehensions via .replace(\"\\\\\", \"/\"). For clarity and consistency, consider using Path(...).as_posix() (or PurePosixPath(relative_path) if you need a pure conversion) in one place, or a small helper to normalize relative paths to '/' separators.

Comment on lines +168 to +175
_DUMMY_HASH_VAL: str | None = None


def _get_dummy_hash() -> str:
global _DUMMY_HASH_VAL
if _DUMMY_HASH_VAL is None:
_DUMMY_HASH_VAL = hash_password("dummy-password-for-constant-time-verify")
return _DUMMY_HASH_VAL
username.encode("utf-8"), self._username.encode("utf-8")
)
target_hash = self._password_hash if username_ok else _DUMMY_HASH
target_hash = self._password_hash if username_ok else _get_dummy_hash()
Comment on lines 101 to 108
_OPEN_THINK_TAGS = (
"<REASONING_SCRATCHPAD>", "<think>", "<reasoning>",
"<THINKING>", "<thinking>", "<thought>",
"<reasoning_scratchpad>", "<think>", "<reasoning>",
"<thinking>", "<thought>", " 思考", " 反思", " 推理", " 推敲",
)
_CLOSE_THINK_TAGS = (
"</REASONING_SCRATCHPAD>", "</think>", "</reasoning>",
"</THINKING>", "</thinking>", "</thought>",
"</reasoning_scratchpad>", "</think>", "</reasoning>",
"</thinking>", "</thought>", " 思考", " 反思", " 推理", " 推敲",
)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder comp/cli CLI entry point, hermes_cli/, setup wizard comp/cron Cron scheduler and job management comp/gateway Gateway runner, session dispatch, delivery P2 Medium — degraded but workaround exists type/bug Something isn't working

Projects

None yet

4 participants