refactor(run_agent): extract AIAgent internals into agent/ modules (16k→3.8k lines, 76% reduction) by teknium1 · Pull Request #27248 · NousResearch/hermes-agent

teknium1 · 2026-05-17T03:36:18Z

Summary

run_agent.py from 16,083 → 3,821 lines (-12,262, 76% reduction), redistributed across 14 cohesive agent/*.py modules. Behavior unchanged: every extraction keeps a thin forwarder method on AIAgent so call sites and test patches (patch("run_agent.OpenAI", ...), patch("run_agent.handle_function_call", ...), etc.) keep working.

Why

run_agent.py had grown to 16k lines. run_conversation alone was 3,877 lines. __init__ 1,381. The file was painful to read, painful to grep, and painful to test-locate. AGENTS.md called it out as one of five "do not delegate edits on this file" load-bearing files for exactly that reason.

This refactor leaves AIAgent as the orchestrator — thin forwarder methods that delegate to focused modules in agent/.

Changes

15 atomic commits, each individually green against tests/run_agent/ + tests/agent/:

Module	LOC moved	Contents
`agent/message_sanitization.py`	395	surrogate scrub, tool-arg JSON repair, ascii fallback, image-strip
`agent/tool_dispatch_helpers.py`	255	parallel-batch gating, multimodal envelopes, destructive-command detection
`agent/process_bootstrap.py` + `agent/iteration_budget.py`	166	lazy OpenAI proxy, broken-pipe-safe stdio, thread-safe iteration counter
`agent/background_review.py`	486	self-improvement loop (memory/skill review fork) + the three prompt strings
`agent/conversation_compression.py`	478	feasibility probe, compress+rotate-session, image-too-large recovery
`agent/system_prompt.py`	263	three-tier stable/context/volatile builder
`agent/tool_executor.py`	848	sequential + concurrent tool dispatch
`agent/stream_diag.py`	243	per-attempt diagnostics, exception-chain flatten, retry/drop logging
`agent/chat_completion_helpers.py`	1,921	7 methods incl. streaming + non-streaming API callers + build_api_kwargs
`agent/codex_runtime.py`	351	Codex App Server + Responses-API stream paths
`agent/agent_runtime_helpers.py`	2,317	22 mid-sized AIAgent helpers (switch_model, _invoke_tool, repair, sanitize, etc.)
`agent/conversation_loop.py`	3,856	`run_conversation` — the agent loop
`agent/agent_init.py`	1,380	`__init__`

Architectural pattern

Every extraction follows the same shape, by design:

Pull the method body into a module-level function name(agent, ...) that takes the parent AIAgent as the first arg.
self.X becomes agent.X via word-boundary regex (~163 occurrences in run_conversation alone).
AIAgent keeps a thin forwarder method preserving the original signature.
Symbols tests patch on run_agent.X (_set_interrupt, handle_function_call, OpenAI, cleanup_vm, logger, AIAgent.X class attrs) resolve through a _ra() lazy reference inside each extracted module so the patch contract is preserved.
Source-introspection guards (inspect.getsource(AIAgent.run_conversation), etc.) updated to point at the new module location.

The _ra() indirection is doing real work — without it, dozens of test patches would silently miss the extracted code.

Validation

	Before	After
`run_agent.py` lines	16,083	3,821
`tests/run_agent/` + `tests/agent/` passing	4,313	4,313
`tests/run_agent/` + `tests/agent/` failing	1 (pre-existing)	1 (same pre-existing)
Skipped	3	3

The one persistent failure (tests/agent/test_auxiliary_client.py::TestGetTextAuxiliaryClient::test_custom_endpoint_uses_codex_wrapper_when_runtime_requests_responses_api) was confirmed failing on main before any commit in this branch — it's unrelated to this refactor.

Live E2E

End-to-end-verified from the worktree against real providers:

openai/gpt-5.4 via OpenRouter (chat-completions, streaming) — text response, terminal tool, read_file, write_file, web_search, multi-tool chain, session resume
anthropic/claude-sonnet-4.6 via OpenRouter (native Anthropic format path)
moonshotai/kimi-k2-thinking via OpenRouter (reasoning-content path)

Plus AIAgent direct-instantiation smoke (__init__, forwarder methods all present, IterationBudget consume/refund, sanitization roundtrips).

Test-file updates

Six tests were updated alongside the refactor — all are structural guards that scan the agent loop source for known patterns, and the patterns moved:

tests/run_agent/test_run_agent.py — 5 inspect.getsource(AIAgent.run_conversation) rerouted to agent.conversation_loop.run_conversation; TestAnthropicInterruptHandler rerouted to the extracted streaming/non-streaming callers; TestMemoryNudgeCounterPersistence / TestMemoryProviderTurnStart accept either self.X or agent.X
tests/run_agent/test_memory_nudge_counter_hydration.py — scan both run_agent.py and agent/conversation_loop.py
tests/run_agent/test_jsondecodeerror_retryable.py — scan both
tests/run_agent/test_tool_executor_contextvar_propagation.py — AST guard scans both run_agent.py and agent/tool_executor.py

Risk

Behaviour-preserving, but it's the largest single file edit in the project's history. Targeted suites pass; live E2E confirmed on three model paths and the streaming + tool-execution paths. The full tests/ suite was not run end-to-end (only tests/run_agent/ + tests/agent/); CI is the source of truth for cross-file regressions.
The _ra() lazy-reference pattern is the load-bearing piece that keeps test patches working. If a future test patches a new run_agent.X name, the extracted module's reference to that name may need routing.
15 commits are atomic and individually mergeable in principle; rebase-merge preserves the history.

How to review

Easiest path: walk the commits in order. Each one tells you what was extracted and which tests it ran. The pattern is identical across all 15 — once you've reviewed the first two (sanitization, tool-dispatch helpers), the rest are mechanical applications of the same template.

…nitization.py Pull the 10 pure sanitization/repair helpers (\_sanitize_surrogates, \_sanitize_structure_surrogates, \_sanitize_messages_surrogates, \_escape_invalid_chars_in_json_strings, \_repair_tool_call_arguments, \_strip_non_ascii, \_sanitize_messages_non_ascii, \_sanitize_tools_non_ascii, \_strip_images_from_messages, \_sanitize_structure_non_ascii) and the \_SURROGATE_RE constant out of run_agent.py into a new module. These are stateless byte-walking helpers with no AIAgent dependency. Backward compatibility: run_agent re-exports every name via a single import block, so existing 'from run_agent import _sanitize_surrogates' imports in tests and cli.py keep working unchanged. Same pattern the file already uses for _summarize_user_message_for_log (codex_responses_adapter). run_agent.py: 16077 -> 15682 lines (-395).

…atch_helpers.py Pull module-level helpers used by the tool-execution path out of run_agent.py: * parallelism gating — _NEVER_PARALLEL_TOOLS, _PARALLEL_SAFE_TOOLS, _PATH_SCOPED_TOOLS, _DESTRUCTIVE_PATTERNS, _REDIRECT_OVERWRITE, _is_destructive_command, _should_parallelize_tool_batch, _extract_parallel_scope_path, _paths_overlap * multimodal envelopes — _is_multimodal_tool_result, _multimodal_text_summary, _append_subdir_hint_to_multimodal * file-mutation verifier inputs — _extract_file_mutation_targets, _extract_error_preview * trajectory normalization — _trajectory_normalize_msg All pure functions. run_agent re-exports every name so existing 'from run_agent import _is_multimodal_tool_result' callers in tests/tools/, tests/run_agent/, and tools/file_state.py keep working. tests/run_agent/: 1341 passed, 3 skipped. run_agent.py: 15682 -> 15427 lines (-255).

Three small extractions into focused modules: * agent/process_bootstrap.py — \_OpenAIProxy (lazy openai.OpenAI import), \_SafeWriter (broken-pipe-resistant stdio wrapper), \_install_safe_stdio, \_get_proxy_from_env, \_get_proxy_for_base_url. All process / IO bootstrap. * agent/iteration_budget.py — IterationBudget class (thread-safe consume/ refund counter shared by parent agent and subagents). run_agent re-exports every name so existing test patches like patch('run_agent.OpenAI', ...) and 'from run_agent import IterationBudget' keep working unchanged. Verified the patch-rebinding contract for OpenAI explicitly. tests/run_agent/ + tests/agent/test_gemini_fast_fallback.py: 1347 passed, 3 skipped. run_agent.py: 15427 -> 15261 lines (-166).

@staticmethod

…background_review.py Move the background-review subsystem (the self-improvement loop — see the README) out of run_agent.py into a dedicated module. * summarize_background_review_actions — was the @staticmethod that builds the user-facing action summary * spawn_background_review_thread — builds the thread target + prompt; the actual review loop body (forked AIAgent, runtime inheritance, tool whitelist, suppression, teardown) lives in _run_review_in_thread * build_memory_write_metadata — provenance for external memory mirrors AIAgent keeps thin wrappers for backward compatibility AND because tests patch run_agent.threading.Thread to assert lifecycle behavior — the threading.Thread construction stays in AIAgent._spawn_background_review, the inner work moves out. tests/run_agent/ + tests/agent/: 4313 passed, 1 pre-existing failure (test_auxiliary_client.py::test_custom_endpoint... — confirmed failing on main before this change). 3 skipped. run_agent.py: 15272 -> 14972 lines (-300).

…n_compression.py Move four compression-related methods to a dedicated module: * check_compression_model_feasibility — startup probe + auto-lowered threshold + hard floor * replay_compression_warning — re-emit stored warning through gateway status_callback * compress_context — run compressor, split SQLite session, notify plugins+memory * try_shrink_image_parts_in_messages — image-too-large recovery via re-encode AIAgent keeps thin forwarder methods so existing call sites and tests that patch run_agent.AIAgent methods keep working. tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing test_auxiliary_client failure as before). run_agent.py: 15013 -> 14535 lines (-478).

…ompt.py Four AIAgent methods move into a dedicated module: * build_system_prompt_parts — three-tier stable/context/volatile dict * build_system_prompt — joiner used at session start * invalidate_system_prompt — drop cache + reload memory * format_tools_for_system_message — trajectory-format tool dump The extracted helpers look up patch-target names (load_soul_md, build_skills_system_prompt, get_toolset_for_tool, build_environment_hints, build_context_files_prompt, build_nous_subscription_prompt) through the run_agent module via _ra() instead of importing them directly. That preserves the patch surface tests rely on (patch('run_agent.load_soul_md', ...) and friends). AIAgent keeps thin forwarder methods. tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing test_auxiliary_client failure as before). run_agent.py: 14555 -> 14292 lines (-263).

Move the two big tool-dispatch methods out of run_agent.py: * execute_tool_calls_concurrent — 408-line concurrent path (interrupt pre-flight, guardrail+plugin block, callback fan-out, ContextVar- preserving ThreadPoolExecutor, periodic heartbeats for the gateway inactivity monitor, per-tool result handling with subdir hints + guardrail observations + checkpoint, /steer drain) * execute_tool_calls_sequential — 441-line sequential path (the original behavior used for single-tool batches and interactive tools) Both take the parent AIAgent as their first argument; AIAgent keeps thin forwarders so call sites unchanged. handle_function_call is routed through _ra() so tests that patch run_agent.handle_function_call keep working. _set_interrupt likewise. The AST guard in test_tool_executor_contextvar_propagation.py is updated to scan both run_agent.py AND agent/tool_executor.py so it still catches the executor.submit(_run_tool, ...) regression regardless of which file the body lives in. tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing test_auxiliary_client failure as before). run_agent.py: 14309 -> 13461 lines (-848).

Move the five stream-drop diagnostic helpers + the headers tuple: * STREAM_DIAG_HEADERS — cf-ray, x-openrouter-provider, x-request-id, etc. * stream_diag_init — fresh per-attempt diagnostic dict * stream_diag_capture_response — snapshot upstream headers + HTTP status * flatten_exception_chain — compact Outer(msg) <- Inner(msg) rendering * log_stream_retry — structured WARNING with provider/bytes/elapsed/ttfb * emit_stream_drop — user-facing status line + activity touch AIAgent keeps thin forwarder methods (and exposes the headers tuple as _STREAM_DIAG_HEADERS for back-compat). All test patches and call sites unchanged. tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing test_auxiliary_client failure). run_agent.py: 13470 -> 13227 lines (-243).

…mpletion_helpers.py Six methods move into a new module — bodies live there, AIAgent keeps thin forwarder methods so call sites and tests are unchanged. * interruptible_api_call — non-streaming API call with interrupt handling * build_api_kwargs — assemble OpenAI / Anthropic / Codex / Bedrock request kwargs * build_assistant_message — normalize assistant message dict (reasoning, tool_calls, codex passthrough fields, alibaba glm-4.7 quirk) * try_activate_fallback — provider fallback chain activation * handle_max_iterations — controlled stop when iteration budget exhausts * cleanup_task_resources — per-turn VM + browser teardown (skipped for persistent environments) Names tests patch on run_agent (cleanup_vm, cleanup_browser) are routed through _ra() so the patch surface is preserved. Two TestAnthropicInterruptHandler source-introspection tests were updated to scan agent.chat_completion_helpers.interruptible_api_call instead of AIAgent._interruptible_api_call — the body lives in the extracted module now. tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing test_auxiliary_client failure). run_agent.py: 13282 -> 12253 lines (-1029).

…chat_completion_helpers.py Move _interruptible_streaming_api_call out of run_agent.py — the biggest single method in the file. Body lives next to interruptible_api_call in agent/chat_completion_helpers.py so streaming + non-streaming code share one home. Nested closures (_call_chat_completions, _call_anthropic, the codex stream branch) all come along with the body and still capture the parent function's locals as expected. AIAgent keeps a thin forwarder method. is_local_endpoint added to the import block (used by the stream stale-timeout disable logic). One source-introspection test in TestAnthropicInterruptHandler is updated to scan agent.chat_completion_helpers.interruptible_streaming_api_call instead of AIAgent._interruptible_streaming_api_call. tests/run_agent/ + tests/agent/: 4312 passed (same pre-existing test_auxiliary_client failure). run_agent.py: 12277 -> 11385 lines (-892).

@staticmethod

…cated modules Two new modules: * agent/codex_runtime.py — three Codex API-mode methods - run_codex_app_server_turn (148 LOC) — Codex CLI subprocess driver - run_codex_stream (125 LOC) — Codex Responses API stream - run_codex_create_stream_fallback (78 LOC) — fallback after Responses stream=true initial create failure * agent/agent_runtime_helpers.py — twelve assorted AIAgent helpers totalling ~1,166 LOC: convert_to_trajectory_format, sanitize_tool_call_arguments (static), repair_message_sequence, strip_think_blocks, recover_with_credential_pool, try_recover_primary_transport, drop_thinking_only_and_merge_users (static), restore_primary_runtime, extract_reasoning, dump_api_request_debug, anthropic_prompt_cache_policy, create_openai_client AIAgent keeps thin forwarder methods for all 15 (preserving @staticmethod where needed). Symbols tests patch on run_agent (OpenAI, AIAgent class attrs) are routed through _ra() to honor the patch contract. The _TRANSIENT_TRANSPORT_ERRORS frozenset moves with try_recover_primary_transport and is referenced as a module-level constant in the extracted code. tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing test_auxiliary_client failure). run_agent.py: 11391 -> 9887 lines (-1504).

The three big review-prompt strings (_MEMORY_REVIEW_PROMPT, _SKILL_REVIEW_PROMPT, _COMBINED_REVIEW_PROMPT — 183 lines combined) move out of the AIAgent class body and into agent/background_review.py where they're consumed. AIAgent re-exposes them as class attributes via 'from ... import' inside the class body — Python binds those names into the class namespace so existing AIAgent._MEMORY_REVIEW_PROMPT references keep working. spawn_background_review_thread also falls back to the module-level constants if an agent doesn't have the attribute (preserves the test pattern of mocking these on the agent). tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing test_auxiliary_client failure). run_agent.py: 9986 -> 9800 lines (-186).

…oop.py The 3,877-line run_conversation body — the agent loop itself — moves out of run_agent.py into a dedicated module. AIAgent.run_conversation is now a thin forwarder that delegates to agent.conversation_loop.run_conversation with the AIAgent instance as the first argument. This is the largest single extraction in the run_agent.py refactor. The body keeps all 163 self.X references intact (rewritten as agent.X), all nested closures, all retry/backoff/compression machinery. Symbols that tests or callers patch on run_agent (_set_interrupt, handle_function_call, AIAgent class attrs) are resolved through _ra() inside the extracted module so the patch surface is preserved. Five tests doing inspect.getsource(AIAgent.run_conversation) updated to scan agent.conversation_loop.run_conversation. Two source-introspection tests (TestMemoryNudgeCounterPersistence, TestMemoryProviderTurnStart) updated to accept either self.X (legacy) or agent.X (extracted form) in the matched assertions. Live E2E verified on three model paths: * openai/gpt-5.4 (OpenAI chat completions via OpenRouter) * anthropic/claude-sonnet-4.6 (Anthropic Messages via OpenRouter) * moonshotai/kimi-k2-thinking (reasoning model, reasoning_content path) Plus read_file tool execution, terminal tool, web_search. tests/run_agent/ + tests/agent/: 4313 passed, 1 pre-existing failure (test_auxiliary_client::test_custom_endpoint... — same as on main). run_agent.py: 9800 -> 5944 lines (-3856). Total reduction since baseline: 16083 -> 5944 (-10139, 63%).

The largest method left on AIAgent (60+ parameters, the entire startup sequence — credential resolution, provider auto-detection, context engine bootstrap, memory store hydration, plugin lifecycle hooks) moves into agent/agent_init.py. AIAgent.__init__ is now a thin wrapper that calls agent.agent_init.init_agent(self, ...) with the original full parameter list preserved. Module-level run_agent names referenced in the body (_openrouter_prewarm_done, _qwen_portal_headers, _routermint_headers, _hermes_home, OpenAI, get_tool_definitions, check_toolset_requirements) are resolved through _ra() so test patches on those names keep working. agent_init's logger warnings are routed via _ra().logger so tests patching run_agent.logger capture them (TestStringKSuffixContextLengthWarns, TestCustomProvidersInvalidContextLengthWarns). Live E2E reconfirmed on three model paths (openai/gpt-5.4, anthropic/claude-sonnet-4.6, moonshotai/kimi-k2-thinking). tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing test_auxiliary_client failure). run_agent.py: 5944 -> 4564 lines (-1380). Total reduction since baseline: 16083 -> 4564 (-11519, 72%).

…elpers.py Final extraction pass — the methods left over after run_conversation and __init__ moved out. Together these 10 cover ~813 LOC of medium- sized helpers: * switch_model (194 LOC) — model switching mid-session * _invoke_tool (87) — central tool dispatch with overrides * _repair_tool_call (72) — argument JSON repair entrypoint * _sanitize_api_messages (71) — role-filter for API send * _looks_like_codex_intermediate_ack (72) — codex transcript heuristic * _copy_reasoning_content_for_api (70) — reasoning preservation * _cleanup_dead_connections (70) — periodic dead-socket sweep * _extract_api_error_context (65) — error-dump context builder * _apply_pending_steer_to_tool_results (63) — /steer injection * _force_close_tcp_sockets (59) — aggressive socket cleanup AIAgent keeps thin forwarder methods for all 10 (staticmethods preserved where present). Names tests patch on run_agent (handle_function_call, AIAgent class attrs, logger) routed through _ra() so the patch surface is preserved. tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing test_auxiliary_client failure as on main). run_agent.py: 4634 -> 3821 lines (-813). Final total: 16083 -> 3821 (-12262, 76% reduction).

github-advanced-security

CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

Copilot

Pull request overview

This PR is a large-scale, mechanical refactor that splits run_agent.py (16,083 lines) into run_agent.py (3,821 lines) plus 14 new agent/*.py modules. The extraction pattern is uniform: method bodies become module-level functions name(agent, ...) that take the parent AIAgent as the first arg, while AIAgent retains thin forwarder methods so call sites and test patches (patch("run_agent.X", ...)) keep working through a _ra() lazy back-reference. The included test changes update source-introspection guards to point at the new module locations and accept both self.X and agent.X spellings.

Changes:

Extract 14 cohesive modules under agent/ covering sanitization, tool dispatch/execution, system prompt assembly, conversation compression, background review, Codex runtime, conversation loop, init, and helpers.
Preserve test-patch contract by routing test-patched names (OpenAI, _set_interrupt, handle_function_call, etc.) through a _ra() lazy reference to run_agent.
Update source-introspection-based tests to scan the new module locations alongside run_agent.py.

Reviewed changes

Copilot reviewed 17 out of 19 changed files in this pull request and generated 9 comments.

Show a summary per file

File	Description
agent/agent_init.py	Houses extracted `AIAgent.__init__` body (~1.4k lines of provider auto-detect, credential resolution, context-engine bootstrap).
agent/background_review.py	Self-improvement-loop fork + three review-prompt strings.
agent/codex_runtime.py	Codex App Server and Responses-API streaming paths.
agent/conversation_compression.py	Compression feasibility probe, compress-and-rotate-session, image-too-large recovery.
agent/iteration_budget.py	Thread-safe iteration-budget counter.
agent/message_sanitization.py	Surrogate scrub, tool-arg JSON repair, ASCII fallback, image strip.
agent/process_bootstrap.py	Lazy OpenAI proxy, crash-resistant stdio, proxy env resolution.
agent/stream_diag.py	Per-attempt stream diagnostics + retry/drop logging.
agent/system_prompt.py	Three-tier (stable/context/volatile) system-prompt builder.
agent/tool_dispatch_helpers.py	Parallel-batch gating, multimodal envelopes, destructive-command detection, mutation tracking.
agent/tool_executor.py	Sequential + concurrent tool dispatch.
tests/run_agent/test_run_agent.py	Reroutes `inspect.getsource(...)` assertions to extracted modules; accepts `agent.X` spellings.
tests/run_agent/test_memory_nudge_counter_hydration.py	Scans both `run_agent.py` and `agent/conversation_loop.py` for hydration markers.
tests/run_agent/test_jsondecodeerror_retryable.py	Scans concatenated source from `run_agent` + `agent.conversation_loop`.
tests/run_agent/test_tool_executor_contextvar_propagation.py	AST-walks both modules for `executor.submit(...)` matches.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+                        _still_running = [
+                            parsed_calls[futures.index(f)][1]
+                            for f in not_done
+                            if f in futures
+                        ]
+                        agent._touch_activity(
+                            f"concurrent tools running ({_conc_elapsed}s, "
+                            f"{len(not_done)} remaining: {', '.join(_still_running[:3])})"
+                        )


+    if not skip_memory:
+        try:
+            _mem_provider_name = mem_config.get("provider", "") if mem_config else ""


+                    _conc_elapsed = int(time.time() - _conc_start)
+                    # Heartbeat every ~30s (6 × 5s poll intervals)
+                    if _conc_elapsed > 0 and _conc_elapsed % 30 < 6:
+                        _still_running = [
+                            parsed_calls[futures.index(f)][1]
+                            for f in not_done
+                            if f in futures
+                        ]
+                        agent._touch_activity(
+                            f"concurrent tools running ({_conc_elapsed}s, "
+                            f"{len(not_done)} remaining: {', '.join(_still_running[:3])})"
+                        )


+import contextvars
+import json
+import logging
+import os
+import threading
+import time
+import uuid


+
+
+
+def run_codex_stream(agent, api_kwargs: dict, client: Any = None, on_first_delta: callable = None):


+        if isinstance(function_result, str):
+            result_preview = function_result if agent.verbose_logging else (
+                function_result[:200] if len(function_result) > 200 else function_result
+            )
+            _result_len = len(function_result)
+        else:
+            # Multimodal dict result (_multimodal=True) — not sliceable as string
+            result_preview = function_result
+            _result_len = len(str(function_result))
+
+        # Log tool errors to the persistent error log so [error] tags
+        # in the UI always have a corresponding detailed entry on disk.
+        _is_error_result, _ = _detect_tool_failure(function_name, function_result)
+        if not _execution_blocked:
+            function_result = agent._append_guardrail_observation(
+                function_name,
+                function_args,
+                function_result,
+                failed=_is_error_result,
+            )
+            result_preview = function_result if agent.verbose_logging else (
+                function_result[:200] if len(function_result) > 200 else function_result
+            )
+        if _is_error_result:
+            logger.warning("Tool %s returned error (%.2fs): %s", function_name, tool_duration, result_preview)
+        else:
+            logger.info("tool %s completed (%.2fs, %d chars)", function_name, tool_duration, _result_len)


+    """Return True when two paths may refer to the same subtree."""
+    left_parts = left.parts
+    right_parts = right.parts
+    if not left_parts or not right_parts:
+        # Empty paths shouldn't reach here (guarded upstream), but be safe.
+        return bool(left_parts) == bool(right_parts) and bool(left_parts)
+    common_len = min(len(left_parts), len(right_parts))


+    repo = Path(__file__).resolve().parents[2]
+    src_ra = (repo / "run_agent.py").read_text(encoding="utf-8")
+    src_cl = (repo / "agent" / "conversation_loop.py").read_text(encoding="utf-8")
+    content = src_ra + src_cl
    # Anchor on the unique comment + the modulo line.
    assert "Hydrate per-session nudge counters from persisted history" in content
-    assert "self._turns_since_memory = prior_user_turns % self._memory_nudge_interval" in content
+    # The line uses ``self.`` in run_agent.py form and ``agent.`` in the
+    # extracted module, accept either.
+    assert (
+        "self._turns_since_memory = prior_user_turns % self._memory_nudge_interval" in content
+        or "agent._turns_since_memory = prior_user_turns % agent._memory_nudge_interval" in content
+    )


+        # After the run_agent.py refactor the body uses ``agent.X`` instead
+        # of ``self.X``, so accept either form.
+        preamble_end = src.index("iteration_budget = IterationBudget")
        preamble = src[:preamble_end]
        assert "self._turns_since_memory = 0" not in preamble
        assert "self._iters_since_skill = 0" not in preamble
+        assert "agent._turns_since_memory = 0" not in preamble
+        assert "agent._iters_since_skill = 0" not in preamble


alt-glitch · 2026-05-17T03:44:16Z

Supersedes #24335 (and the earlier #23978 → #24334 chain). Same goal — decompose run_agent.py — but with module extraction instead of mixins and broader scope (76% vs 63.9% reduction).

alt-glitch · 2026-05-17T03:51:25Z

@daimon-nous this is a big PR. very important to review. use /hermes-agent-dev /hermes-pr-reproduction /adversarial-code-review skills. also track how this refactor differs AND how it may potentially degrade from this tracking issue suggesting refactor/s: #14182.

Copilot

Pull request overview

Copilot reviewed 17 out of 19 changed files in this pull request and generated 6 comments.

+                    _conc_elapsed = int(time.time() - _conc_start)
+                    # Heartbeat every ~30s (6 × 5s poll intervals)
+                    if _conc_elapsed > 0 and _conc_elapsed % 30 < 6:
+                        _still_running = [
+                            parsed_calls[futures.index(f)][1]
+                            for f in not_done
+                            if f in futures
+                        ]
+                        agent._touch_activity(
+                            f"concurrent tools running ({_conc_elapsed}s, "
+                            f"{len(not_done)} remaining: {', '.join(_still_running[:3])})"
+                        )


+        if spinner:
+            # Build a summary message for the spinner stop
+            completed = sum(1 for r in results if r is not None)
+            total_dur = sum(r[3] for r in results if r is not None)
+            spinner.stop(f"⚡ {completed}/{num_tools} tools completed in {total_dur:.1f}s total")


+class _OpenAIProxy:
+    """Module-level proxy that looks like ``openai.OpenAI`` but imports lazily."""
+
+    __slots__ = ()
+
+    def __call__(self, *args, **kwargs):
+        return _load_openai_cls()(*args, **kwargs)
+
+    def __instancecheck__(self, obj):
+        return isinstance(obj, _load_openai_cls())
+
+    def __repr__(self):
+        return "<lazy openai.OpenAI proxy>"
+
+


+import contextvars
+import json
+import logging
+import os
+import threading
+import time
+import uuid
+from types import SimpleNamespace
+from typing import Any, Dict, List, Optional


+                response_preview = _preview_str[:agent.log_prefix_chars] + "..." if len(_preview_str) > agent.log_prefix_chars else _preview_str
+                print(f"  ✅ Tool {i+1} completed in {tool_duration:.2f}s - {response_preview}")
+
+        agent._current_tool = None


+            polluting trajectories with user-specific persona or project instructions.
+        load_soul_identity (bool): If True, still use ~/.hermes/SOUL.md as the primary
+            identity even when skip_context_files=True. Project context files from the cwd
+            remain skipped.


daimon-nous · 2026-05-17T04:37:02Z

Review — @daimon-nous

Audit scope: Full symbol re-export verification (50 externally-imported names), _ra() call-site tracing (62 sites across 7 modules), __init__ parameter parity check (64/64 match), self→agent rename completeness (0 stray self. in extracted code), targeted test run (342/342 pass).

Mechanically sound. The extraction is clean — re-exports are complete, the _ra() indirection correctly preserves all 285+ test patches against run_agent.*, and the 15 atomic commits are individually reviewable. The self → agent conversion was thorough (747 occurrences in conversation_loop.py alone, zero misses).

Two concrete improvements for before merge:

__init__ positional forwarding is fragile. The forwarder passes all 64 params positionally — if anyone adds param fix(security): prevent shell injection in sudo password piping #65 on main, they must update THREE places in lockstep (signature, init_agent signature, forwarding call) or get silent wrong-value bugs. Should be keyword-forwarded:
```
init_agent(self, base_url=base_url, api_key=api_key, provider=provider, ...)
```
Dead _ra() in agent/codex_runtime.py (line 32) — defined, never called. Minor cleanup.

Architectural note re #14182: The _ra() pattern and 51 forwarder methods are tech debt the eventual package restructure would need to unwind. Whether this is acceptable depends on #14182's timeline — if near-term, this adds friction; if distant, the readability win (16k→3.8k) justifies it.

rujbodyslam-art

Ruj

…ad code, tighten guards Four fixes from PR NousResearch#27248 review: 1. **__init__ forwarder is now keyword-forwarded** (daimon-nous review). Previously the run_agent.AIAgent.__init__ wrapper forwarded all 64 params positionally to agent.agent_init.init_agent, so adding a 65th param on main would require three lockstep edits (signature, init_agent signature, forwarder call) or silently shift every value. Keyword forwarding makes this trivially safe — adding a param now only needs the two signatures and one extra keyword line. 2. **Drop dead _ra() in agent/codex_runtime.py** (daimon-nous + Copilot). The lazy run_agent reference was defined but never called inside this module — the codex paths use agent.* accessors only. 3. **Drop unused imports in agent/codex_runtime.py** (Copilot): contextvars, threading, time, uuid, Optional. Carried over from run_agent.py during the original extraction. 4. **Tighten three source-introspection test guards** (Copilot): - test_memory_nudge_counter_hydration.py — was scanning the concatenated source of run_agent.py + agent/conversation_loop.py and matching self.X or agent.X form. Now asserts the hydration block lives in agent/conversation_loop.py specifically with the agent.X form — the body never moves back, so if it ever drifts a future re-introduction fails the guard. - test_run_agent.py::TestMemoryNudgeCounterPersistence — anchor on agent.iteration_budget = IterationBudget exactly (was just iteration_budget = IterationBudget) so an unrelated identifier ending in iteration_budget can't match. - test_run_agent.py::TestMemoryProviderTurnStart — assert the agent._user_turn_count form directly (the extracted body uses agent.X, not self.X — accepting either was a transitional fudge). - test_jsondecodeerror_retryable.py — scan agent/conversation_loop.py only, not the concatenation. Not addressed in this commit: * Pre-existing bugs in agent/tool_executor.py (heartbeat index mismatch when calls are blocked, _current_tool clobber in result loop, blocked-counted-as-completed in spinner summary, dead result_preview computation). These were preserved byte-for-byte from the original _execute_tool_calls_concurrent — worth a separate follow-up PR with proper tests. * _OpenAIProxy.__instancecheck__ concern — pre-existing, not flagged by any of the original test patches (nothing actually does isinstance(x, OpenAI) against the proxy instance). * agent_init.py:949 mem_config potential NameError — pre-existing; only triggers if _agent_cfg.get('memory', {}) itself raises, which it can't with a stock dict. tests/run_agent/ + tests/agent/: 4313 passed, 1 pre-existing test_auxiliary_client failure (unchanged). run_agent.py: 3821 -> 3937 lines (+116 from the keyword-forwarded init call's verbosity). Final: 16083 -> 3937 (-12146, 75% reduction).

…-27dc9cc2 refactor(run_agent): extract AIAgent internals into agent/ modules (16k→3.8k lines, 76% reduction)

teknium1 added 15 commits May 16, 2026 17:41

OutThisLife requested a review from Copilot May 17, 2026 03:40

Copilot started reviewing on behalf of OutThisLife May 17, 2026 03:40 View session

github-advanced-security AI found potential problems May 17, 2026

View reviewed changes

Copilot AI reviewed May 17, 2026

View reviewed changes

alt-glitch added type/refactor Code restructuring, no behavior change P3 Low — cosmetic, nice to have comp/agent Core agent loop, run_agent.py, prompt builder labels May 17, 2026

NousResearch deleted a comment from cardtest15-coder May 17, 2026

OutThisLife requested a review from Copilot May 17, 2026 04:27

Copilot started reviewing on behalf of OutThisLife May 17, 2026 04:27 View session

Copilot AI reviewed May 17, 2026

View reviewed changes

kshitijk4poor mentioned this pull request May 18, 2026

tracker: ruff complexity reduction (C, PLR) — PR-to-lint mapping #23972

Open

BrewTestBot mentioned this pull request May 28, 2026

hermes-agent 2026.5.28 Homebrew/homebrew-core#285115

Merged

1 task

rujbodyslam-art reviewed May 29, 2026

View reviewed changes

gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026

Merge pull request NousResearch#27248 from NousResearch/hermes/hermes…

cbbe459

…-27dc9cc2 refactor(run_agent): extract AIAgent internals into agent/ modules (16k→3.8k lines, 76% reduction)

reefmind mentioned this pull request Jun 4, 2026

tracking: hermes_agent package restructure #14182

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(run_agent): extract AIAgent internals into agent/ modules (16k→3.8k lines, 76% reduction)#27248

refactor(run_agent): extract AIAgent internals into agent/ modules (16k→3.8k lines, 76% reduction)#27248
teknium1 merged 37 commits into
mainfrom
hermes/hermes-27dc9cc2

teknium1 commented May 17, 2026

Uh oh!

github-advanced-security AI left a comment

Uh oh!

Copilot AI left a comment

Uh oh!

alt-glitch commented May 17, 2026

Uh oh!

alt-glitch commented May 17, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

daimon-nous Bot commented May 17, 2026

Uh oh!

rujbodyslam-art left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants




		def run_codex_stream(agent, api_kwargs: dict, client: Any = None, on_first_delta: callable = None):

Conversation

teknium1 commented May 17, 2026

Summary

Why

Changes

Architectural pattern

Validation

Live E2E

Test-file updates

Risk

How to review

Uh oh!

github-advanced-security AI left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

alt-glitch commented May 17, 2026

Uh oh!

alt-glitch commented May 17, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

daimon-nous Bot commented May 17, 2026

Review — @daimon-nous

Uh oh!

rujbodyslam-art left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants