Skip to content

refactor(run_agent): extract AIAgent internals into agent/ modules (16k→3.8k lines, 76% reduction)#27248

Merged
teknium1 merged 37 commits into
mainfrom
hermes/hermes-27dc9cc2
May 17, 2026
Merged

refactor(run_agent): extract AIAgent internals into agent/ modules (16k→3.8k lines, 76% reduction)#27248
teknium1 merged 37 commits into
mainfrom
hermes/hermes-27dc9cc2

Conversation

@teknium1

Copy link
Copy Markdown
Contributor

Summary

run_agent.py from 16,083 → 3,821 lines (-12,262, 76% reduction), redistributed across 14 cohesive agent/*.py modules. Behavior unchanged: every extraction keeps a thin forwarder method on AIAgent so call sites and test patches (patch("run_agent.OpenAI", ...), patch("run_agent.handle_function_call", ...), etc.) keep working.

Why

run_agent.py had grown to 16k lines. run_conversation alone was 3,877 lines. __init__ 1,381. The file was painful to read, painful to grep, and painful to test-locate. AGENTS.md called it out as one of five "do not delegate edits on this file" load-bearing files for exactly that reason.

This refactor leaves AIAgent as the orchestrator — thin forwarder methods that delegate to focused modules in agent/.

Changes

15 atomic commits, each individually green against tests/run_agent/ + tests/agent/:

Module LOC moved Contents
agent/message_sanitization.py 395 surrogate scrub, tool-arg JSON repair, ascii fallback, image-strip
agent/tool_dispatch_helpers.py 255 parallel-batch gating, multimodal envelopes, destructive-command detection
agent/process_bootstrap.py + agent/iteration_budget.py 166 lazy OpenAI proxy, broken-pipe-safe stdio, thread-safe iteration counter
agent/background_review.py 486 self-improvement loop (memory/skill review fork) + the three prompt strings
agent/conversation_compression.py 478 feasibility probe, compress+rotate-session, image-too-large recovery
agent/system_prompt.py 263 three-tier stable/context/volatile builder
agent/tool_executor.py 848 sequential + concurrent tool dispatch
agent/stream_diag.py 243 per-attempt diagnostics, exception-chain flatten, retry/drop logging
agent/chat_completion_helpers.py 1,921 7 methods incl. streaming + non-streaming API callers + build_api_kwargs
agent/codex_runtime.py 351 Codex App Server + Responses-API stream paths
agent/agent_runtime_helpers.py 2,317 22 mid-sized AIAgent helpers (switch_model, _invoke_tool, repair, sanitize, etc.)
agent/conversation_loop.py 3,856 run_conversation — the agent loop
agent/agent_init.py 1,380 __init__

Architectural pattern

Every extraction follows the same shape, by design:

  1. Pull the method body into a module-level function name(agent, ...) that takes the parent AIAgent as the first arg.
  2. self.X becomes agent.X via word-boundary regex (~163 occurrences in run_conversation alone).
  3. AIAgent keeps a thin forwarder method preserving the original signature.
  4. Symbols tests patch on run_agent.X (_set_interrupt, handle_function_call, OpenAI, cleanup_vm, logger, AIAgent.X class attrs) resolve through a _ra() lazy reference inside each extracted module so the patch contract is preserved.
  5. Source-introspection guards (inspect.getsource(AIAgent.run_conversation), etc.) updated to point at the new module location.

The _ra() indirection is doing real work — without it, dozens of test patches would silently miss the extracted code.

Validation

Before After
run_agent.py lines 16,083 3,821
tests/run_agent/ + tests/agent/ passing 4,313 4,313
tests/run_agent/ + tests/agent/ failing 1 (pre-existing) 1 (same pre-existing)
Skipped 3 3

The one persistent failure (tests/agent/test_auxiliary_client.py::TestGetTextAuxiliaryClient::test_custom_endpoint_uses_codex_wrapper_when_runtime_requests_responses_api) was confirmed failing on main before any commit in this branch — it's unrelated to this refactor.

Live E2E

End-to-end-verified from the worktree against real providers:

  • openai/gpt-5.4 via OpenRouter (chat-completions, streaming) — text response, terminal tool, read_file, write_file, web_search, multi-tool chain, session resume
  • anthropic/claude-sonnet-4.6 via OpenRouter (native Anthropic format path)
  • moonshotai/kimi-k2-thinking via OpenRouter (reasoning-content path)

Plus AIAgent direct-instantiation smoke (__init__, forwarder methods all present, IterationBudget consume/refund, sanitization roundtrips).

Test-file updates

Six tests were updated alongside the refactor — all are structural guards that scan the agent loop source for known patterns, and the patterns moved:

  • tests/run_agent/test_run_agent.py — 5 inspect.getsource(AIAgent.run_conversation) rerouted to agent.conversation_loop.run_conversation; TestAnthropicInterruptHandler rerouted to the extracted streaming/non-streaming callers; TestMemoryNudgeCounterPersistence / TestMemoryProviderTurnStart accept either self.X or agent.X
  • tests/run_agent/test_memory_nudge_counter_hydration.py — scan both run_agent.py and agent/conversation_loop.py
  • tests/run_agent/test_jsondecodeerror_retryable.py — scan both
  • tests/run_agent/test_tool_executor_contextvar_propagation.py — AST guard scans both run_agent.py and agent/tool_executor.py

Risk

  • Behaviour-preserving, but it's the largest single file edit in the project's history. Targeted suites pass; live E2E confirmed on three model paths and the streaming + tool-execution paths. The full tests/ suite was not run end-to-end (only tests/run_agent/ + tests/agent/); CI is the source of truth for cross-file regressions.
  • The _ra() lazy-reference pattern is the load-bearing piece that keeps test patches working. If a future test patches a new run_agent.X name, the extracted module's reference to that name may need routing.
  • 15 commits are atomic and individually mergeable in principle; rebase-merge preserves the history.

How to review

Easiest path: walk the commits in order. Each one tells you what was extracted and which tests it ran. The pattern is identical across all 15 — once you've reviewed the first two (sanitization, tool-dispatch helpers), the rest are mechanical applications of the same template.

teknium1 added 15 commits May 16, 2026 17:41
…nitization.py

Pull the 10 pure sanitization/repair helpers (\_sanitize_surrogates,
\_sanitize_structure_surrogates, \_sanitize_messages_surrogates,
\_escape_invalid_chars_in_json_strings, \_repair_tool_call_arguments,
\_strip_non_ascii, \_sanitize_messages_non_ascii, \_sanitize_tools_non_ascii,
\_strip_images_from_messages, \_sanitize_structure_non_ascii) and the
\_SURROGATE_RE constant out of run_agent.py into a new module.

These are stateless byte-walking helpers with no AIAgent dependency.

Backward compatibility: run_agent re-exports every name via a single
import block, so existing 'from run_agent import _sanitize_surrogates'
imports in tests and cli.py keep working unchanged. Same pattern the
file already uses for _summarize_user_message_for_log (codex_responses_adapter).

run_agent.py: 16077 -> 15682 lines (-395).
…atch_helpers.py

Pull module-level helpers used by the tool-execution path out of
run_agent.py:

* parallelism gating — _NEVER_PARALLEL_TOOLS, _PARALLEL_SAFE_TOOLS,
  _PATH_SCOPED_TOOLS, _DESTRUCTIVE_PATTERNS, _REDIRECT_OVERWRITE,
  _is_destructive_command, _should_parallelize_tool_batch,
  _extract_parallel_scope_path, _paths_overlap
* multimodal envelopes — _is_multimodal_tool_result,
  _multimodal_text_summary, _append_subdir_hint_to_multimodal
* file-mutation verifier inputs — _extract_file_mutation_targets,
  _extract_error_preview
* trajectory normalization — _trajectory_normalize_msg

All pure functions. run_agent re-exports every name so existing
'from run_agent import _is_multimodal_tool_result' callers in
tests/tools/, tests/run_agent/, and tools/file_state.py keep working.

tests/run_agent/: 1341 passed, 3 skipped.
run_agent.py: 15682 -> 15427 lines (-255).
Three small extractions into focused modules:

* agent/process_bootstrap.py — \_OpenAIProxy (lazy openai.OpenAI import),
  \_SafeWriter (broken-pipe-resistant stdio wrapper), \_install_safe_stdio,
  \_get_proxy_from_env, \_get_proxy_for_base_url. All process / IO bootstrap.
* agent/iteration_budget.py — IterationBudget class (thread-safe consume/
  refund counter shared by parent agent and subagents).

run_agent re-exports every name so existing test patches like
patch('run_agent.OpenAI', ...) and 'from run_agent import IterationBudget'
keep working unchanged.  Verified the patch-rebinding contract for OpenAI
explicitly.

tests/run_agent/ + tests/agent/test_gemini_fast_fallback.py:
1347 passed, 3 skipped.
run_agent.py: 15427 -> 15261 lines (-166).
…background_review.py

Move the background-review subsystem (the self-improvement loop — see the
README) out of run_agent.py into a dedicated module.

* summarize_background_review_actions — was the @staticmethod that builds
  the user-facing action summary
* spawn_background_review_thread — builds the thread target + prompt;
  the actual review loop body (forked AIAgent, runtime inheritance,
  tool whitelist, suppression, teardown) lives in _run_review_in_thread
* build_memory_write_metadata — provenance for external memory mirrors

AIAgent keeps thin wrappers for backward compatibility AND because tests
patch run_agent.threading.Thread to assert lifecycle behavior — the
threading.Thread construction stays in AIAgent._spawn_background_review,
the inner work moves out.

tests/run_agent/ + tests/agent/: 4313 passed, 1 pre-existing failure
(test_auxiliary_client.py::test_custom_endpoint... — confirmed failing
on main before this change). 3 skipped.

run_agent.py: 15272 -> 14972 lines (-300).
…n_compression.py

Move four compression-related methods to a dedicated module:

* check_compression_model_feasibility — startup probe + auto-lowered threshold + hard floor
* replay_compression_warning — re-emit stored warning through gateway status_callback
* compress_context — run compressor, split SQLite session, notify plugins+memory
* try_shrink_image_parts_in_messages — image-too-large recovery via re-encode

AIAgent keeps thin forwarder methods so existing call sites and tests
that patch run_agent.AIAgent methods keep working.

tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing
test_auxiliary_client failure as before).

run_agent.py: 15013 -> 14535 lines (-478).
…ompt.py

Four AIAgent methods move into a dedicated module:

* build_system_prompt_parts — three-tier stable/context/volatile dict
* build_system_prompt        — joiner used at session start
* invalidate_system_prompt   — drop cache + reload memory
* format_tools_for_system_message — trajectory-format tool dump

The extracted helpers look up patch-target names (load_soul_md,
build_skills_system_prompt, get_toolset_for_tool, build_environment_hints,
build_context_files_prompt, build_nous_subscription_prompt) through the
run_agent module via _ra() instead of importing them directly.  That
preserves the patch surface tests rely on
(patch('run_agent.load_soul_md', ...) and friends).

AIAgent keeps thin forwarder methods.

tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing
test_auxiliary_client failure as before).

run_agent.py: 14555 -> 14292 lines (-263).
Move the two big tool-dispatch methods out of run_agent.py:

* execute_tool_calls_concurrent — 408-line concurrent path (interrupt
  pre-flight, guardrail+plugin block, callback fan-out, ContextVar-
  preserving ThreadPoolExecutor, periodic heartbeats for the gateway
  inactivity monitor, per-tool result handling with subdir hints +
  guardrail observations + checkpoint, /steer drain)
* execute_tool_calls_sequential — 441-line sequential path (the
  original behavior used for single-tool batches and interactive
  tools)

Both take the parent AIAgent as their first argument; AIAgent keeps
thin forwarders so call sites unchanged. handle_function_call is
routed through _ra() so tests that patch run_agent.handle_function_call
keep working. _set_interrupt likewise.

The AST guard in test_tool_executor_contextvar_propagation.py is
updated to scan both run_agent.py AND agent/tool_executor.py so it
still catches the executor.submit(_run_tool, ...) regression
regardless of which file the body lives in.

tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing
test_auxiliary_client failure as before).

run_agent.py: 14309 -> 13461 lines (-848).
Move the five stream-drop diagnostic helpers + the headers tuple:

* STREAM_DIAG_HEADERS — cf-ray, x-openrouter-provider, x-request-id, etc.
* stream_diag_init — fresh per-attempt diagnostic dict
* stream_diag_capture_response — snapshot upstream headers + HTTP status
* flatten_exception_chain — compact Outer(msg) <- Inner(msg) rendering
* log_stream_retry — structured WARNING with provider/bytes/elapsed/ttfb
* emit_stream_drop — user-facing status line + activity touch

AIAgent keeps thin forwarder methods (and exposes the headers tuple as
_STREAM_DIAG_HEADERS for back-compat).  All test patches and call sites
unchanged.

tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing
test_auxiliary_client failure).

run_agent.py: 13470 -> 13227 lines (-243).
…mpletion_helpers.py

Six methods move into a new module — bodies live there, AIAgent keeps
thin forwarder methods so call sites and tests are unchanged.

* interruptible_api_call — non-streaming API call with interrupt handling
* build_api_kwargs — assemble OpenAI / Anthropic / Codex / Bedrock request kwargs
* build_assistant_message — normalize assistant message dict (reasoning,
  tool_calls, codex passthrough fields, alibaba glm-4.7 quirk)
* try_activate_fallback — provider fallback chain activation
* handle_max_iterations — controlled stop when iteration budget exhausts
* cleanup_task_resources — per-turn VM + browser teardown (skipped for
  persistent environments)

Names tests patch on run_agent (cleanup_vm, cleanup_browser) are routed
through _ra() so the patch surface is preserved.

Two TestAnthropicInterruptHandler source-introspection tests were
updated to scan agent.chat_completion_helpers.interruptible_api_call
instead of AIAgent._interruptible_api_call — the body lives in the
extracted module now.

tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing
test_auxiliary_client failure).

run_agent.py: 13282 -> 12253 lines (-1029).
…chat_completion_helpers.py

Move _interruptible_streaming_api_call out of run_agent.py — the biggest
single method in the file.  Body lives next to interruptible_api_call
in agent/chat_completion_helpers.py so streaming + non-streaming code
share one home.

Nested closures (_call_chat_completions, _call_anthropic, the codex
stream branch) all come along with the body and still capture the
parent function's locals as expected.

AIAgent keeps a thin forwarder method.  is_local_endpoint added to
the import block (used by the stream stale-timeout disable logic).

One source-introspection test in TestAnthropicInterruptHandler is
updated to scan agent.chat_completion_helpers.interruptible_streaming_api_call
instead of AIAgent._interruptible_streaming_api_call.

tests/run_agent/ + tests/agent/: 4312 passed (same pre-existing
test_auxiliary_client failure).

run_agent.py: 12277 -> 11385 lines (-892).
…cated modules

Two new modules:

* agent/codex_runtime.py — three Codex API-mode methods
  - run_codex_app_server_turn (148 LOC) — Codex CLI subprocess driver
  - run_codex_stream (125 LOC) — Codex Responses API stream
  - run_codex_create_stream_fallback (78 LOC) — fallback after Responses
    stream=true initial create failure

* agent/agent_runtime_helpers.py — twelve assorted AIAgent helpers
  totalling ~1,166 LOC: convert_to_trajectory_format, sanitize_tool_call_arguments
  (static), repair_message_sequence, strip_think_blocks,
  recover_with_credential_pool, try_recover_primary_transport,
  drop_thinking_only_and_merge_users (static), restore_primary_runtime,
  extract_reasoning, dump_api_request_debug,
  anthropic_prompt_cache_policy, create_openai_client

AIAgent keeps thin forwarder methods for all 15 (preserving @staticmethod
where needed). Symbols tests patch on run_agent (OpenAI, AIAgent class
attrs) are routed through _ra() to honor the patch contract. The
_TRANSIENT_TRANSPORT_ERRORS frozenset moves with try_recover_primary_transport
and is referenced as a module-level constant in the extracted code.

tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing
test_auxiliary_client failure).

run_agent.py: 11391 -> 9887 lines (-1504).
The three big review-prompt strings (_MEMORY_REVIEW_PROMPT,
_SKILL_REVIEW_PROMPT, _COMBINED_REVIEW_PROMPT — 183 lines combined) move
out of the AIAgent class body and into agent/background_review.py where
they're consumed.

AIAgent re-exposes them as class attributes via 'from ... import' inside
the class body — Python binds those names into the class namespace so
existing AIAgent._MEMORY_REVIEW_PROMPT references keep working.
spawn_background_review_thread also falls back to the module-level
constants if an agent doesn't have the attribute (preserves the test
pattern of mocking these on the agent).

tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing
test_auxiliary_client failure).

run_agent.py: 9986 -> 9800 lines (-186).
…oop.py

The 3,877-line run_conversation body — the agent loop itself — moves out
of run_agent.py into a dedicated module.  AIAgent.run_conversation is
now a thin forwarder that delegates to agent.conversation_loop.run_conversation
with the AIAgent instance as the first argument.

This is the largest single extraction in the run_agent.py refactor.
The body keeps all 163 self.X references intact (rewritten as agent.X),
all nested closures, all retry/backoff/compression machinery.  Symbols
that tests or callers patch on run_agent (_set_interrupt,
handle_function_call, AIAgent class attrs) are resolved through _ra()
inside the extracted module so the patch surface is preserved.

Five tests doing inspect.getsource(AIAgent.run_conversation) updated to
scan agent.conversation_loop.run_conversation. Two source-introspection
tests (TestMemoryNudgeCounterPersistence, TestMemoryProviderTurnStart)
updated to accept either self.X (legacy) or agent.X (extracted
form) in the matched assertions.

Live E2E verified on three model paths:
  * openai/gpt-5.4 (OpenAI chat completions via OpenRouter)
  * anthropic/claude-sonnet-4.6 (Anthropic Messages via OpenRouter)
  * moonshotai/kimi-k2-thinking (reasoning model, reasoning_content path)
Plus read_file tool execution, terminal tool, web_search.

tests/run_agent/ + tests/agent/: 4313 passed, 1 pre-existing failure
(test_auxiliary_client::test_custom_endpoint... — same as on main).

run_agent.py: 9800 -> 5944 lines (-3856).
Total reduction since baseline: 16083 -> 5944 (-10139, 63%).
The largest method left on AIAgent (60+ parameters, the entire startup
sequence — credential resolution, provider auto-detection, context
engine bootstrap, memory store hydration, plugin lifecycle hooks)
moves into agent/agent_init.py.

AIAgent.__init__ is now a thin wrapper that calls
agent.agent_init.init_agent(self, ...) with the original full
parameter list preserved.

Module-level run_agent names referenced in the body (_openrouter_prewarm_done,
_qwen_portal_headers, _routermint_headers, _hermes_home, OpenAI,
get_tool_definitions, check_toolset_requirements) are resolved through
_ra() so test patches on those names keep working.  agent_init's logger
warnings are routed via _ra().logger so tests patching run_agent.logger
capture them (TestStringKSuffixContextLengthWarns,
TestCustomProvidersInvalidContextLengthWarns).

Live E2E reconfirmed on three model paths (openai/gpt-5.4,
anthropic/claude-sonnet-4.6, moonshotai/kimi-k2-thinking).

tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing
test_auxiliary_client failure).

run_agent.py: 5944 -> 4564 lines (-1380).
Total reduction since baseline: 16083 -> 4564 (-11519, 72%).
…elpers.py

Final extraction pass — the methods left over after run_conversation
and __init__ moved out. Together these 10 cover ~813 LOC of medium-
sized helpers:

* switch_model (194 LOC) — model switching mid-session
* _invoke_tool (87) — central tool dispatch with overrides
* _repair_tool_call (72) — argument JSON repair entrypoint
* _sanitize_api_messages (71) — role-filter for API send
* _looks_like_codex_intermediate_ack (72) — codex transcript heuristic
* _copy_reasoning_content_for_api (70) — reasoning preservation
* _cleanup_dead_connections (70) — periodic dead-socket sweep
* _extract_api_error_context (65) — error-dump context builder
* _apply_pending_steer_to_tool_results (63) — /steer injection
* _force_close_tcp_sockets (59) — aggressive socket cleanup

AIAgent keeps thin forwarder methods for all 10 (staticmethods preserved
where present). Names tests patch on run_agent (handle_function_call,
AIAgent class attrs, logger) routed through _ra() so the patch surface
is preserved.

tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing
test_auxiliary_client failure as on main).

run_agent.py: 4634 -> 3821 lines (-813).
Final total: 16083 -> 3821 (-12262, 76% reduction).

@github-advanced-security github-advanced-security AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR is a large-scale, mechanical refactor that splits run_agent.py (16,083 lines) into run_agent.py (3,821 lines) plus 14 new agent/*.py modules. The extraction pattern is uniform: method bodies become module-level functions name(agent, ...) that take the parent AIAgent as the first arg, while AIAgent retains thin forwarder methods so call sites and test patches (patch("run_agent.X", ...)) keep working through a _ra() lazy back-reference. The included test changes update source-introspection guards to point at the new module locations and accept both self.X and agent.X spellings.

Changes:

  • Extract 14 cohesive modules under agent/ covering sanitization, tool dispatch/execution, system prompt assembly, conversation compression, background review, Codex runtime, conversation loop, init, and helpers.
  • Preserve test-patch contract by routing test-patched names (OpenAI, _set_interrupt, handle_function_call, etc.) through a _ra() lazy reference to run_agent.
  • Update source-introspection-based tests to scan the new module locations alongside run_agent.py.

Reviewed changes

Copilot reviewed 17 out of 19 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
agent/agent_init.py Houses extracted AIAgent.__init__ body (~1.4k lines of provider auto-detect, credential resolution, context-engine bootstrap).
agent/background_review.py Self-improvement-loop fork + three review-prompt strings.
agent/codex_runtime.py Codex App Server and Responses-API streaming paths.
agent/conversation_compression.py Compression feasibility probe, compress-and-rotate-session, image-too-large recovery.
agent/iteration_budget.py Thread-safe iteration-budget counter.
agent/message_sanitization.py Surrogate scrub, tool-arg JSON repair, ASCII fallback, image strip.
agent/process_bootstrap.py Lazy OpenAI proxy, crash-resistant stdio, proxy env resolution.
agent/stream_diag.py Per-attempt stream diagnostics + retry/drop logging.
agent/system_prompt.py Three-tier (stable/context/volatile) system-prompt builder.
agent/tool_dispatch_helpers.py Parallel-batch gating, multimodal envelopes, destructive-command detection, mutation tracking.
agent/tool_executor.py Sequential + concurrent tool dispatch.
tests/run_agent/test_run_agent.py Reroutes inspect.getsource(...) assertions to extracted modules; accepts agent.X spellings.
tests/run_agent/test_memory_nudge_counter_hydration.py Scans both run_agent.py and agent/conversation_loop.py for hydration markers.
tests/run_agent/test_jsondecodeerror_retryable.py Scans concatenated source from run_agent + agent.conversation_loop.
tests/run_agent/test_tool_executor_contextvar_propagation.py AST-walks both modules for executor.submit(...) matches.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread agent/tool_executor.py
Comment on lines +332 to +340
_still_running = [
parsed_calls[futures.index(f)][1]
for f in not_done
if f in futures
]
agent._touch_activity(
f"concurrent tools running ({_conc_elapsed}s, "
f"{len(not_done)} remaining: {', '.join(_still_running[:3])})"
)
Comment thread agent/agent_init.py
Comment on lines +947 to +949
if not skip_memory:
try:
_mem_provider_name = mem_config.get("provider", "") if mem_config else ""
Comment thread agent/tool_executor.py
Comment on lines +329 to +340
_conc_elapsed = int(time.time() - _conc_start)
# Heartbeat every ~30s (6 × 5s poll intervals)
if _conc_elapsed > 0 and _conc_elapsed % 30 < 6:
_still_running = [
parsed_calls[futures.index(f)][1]
for f in not_done
if f in futures
]
agent._touch_activity(
f"concurrent tools running ({_conc_elapsed}s, "
f"{len(not_done)} remaining: {', '.join(_still_running[:3])})"
)
Comment thread agent/codex_runtime.py Outdated
Comment on lines +19 to +25
import contextvars
import json
import logging
import os
import threading
import time
import uuid
Comment thread agent/codex_runtime.py



def run_codex_stream(agent, api_kwargs: dict, client: Any = None, on_first_delta: callable = None):
Comment thread agent/tool_executor.py
Comment on lines +782 to +808
if isinstance(function_result, str):
result_preview = function_result if agent.verbose_logging else (
function_result[:200] if len(function_result) > 200 else function_result
)
_result_len = len(function_result)
else:
# Multimodal dict result (_multimodal=True) — not sliceable as string
result_preview = function_result
_result_len = len(str(function_result))

# Log tool errors to the persistent error log so [error] tags
# in the UI always have a corresponding detailed entry on disk.
_is_error_result, _ = _detect_tool_failure(function_name, function_result)
if not _execution_blocked:
function_result = agent._append_guardrail_observation(
function_name,
function_args,
function_result,
failed=_is_error_result,
)
result_preview = function_result if agent.verbose_logging else (
function_result[:200] if len(function_result) > 200 else function_result
)
if _is_error_result:
logger.warning("Tool %s returned error (%.2fs): %s", function_name, tool_duration, result_preview)
else:
logger.info("tool %s completed (%.2fs, %d chars)", function_name, tool_duration, _result_len)
Comment on lines +152 to +158
"""Return True when two paths may refer to the same subtree."""
left_parts = left.parts
right_parts = right.parts
if not left_parts or not right_parts:
# Empty paths shouldn't reach here (guarded upstream), but be safe.
return bool(left_parts) == bool(right_parts) and bool(left_parts)
common_len = min(len(left_parts), len(right_parts))
Comment on lines +128 to +139
repo = Path(__file__).resolve().parents[2]
src_ra = (repo / "run_agent.py").read_text(encoding="utf-8")
src_cl = (repo / "agent" / "conversation_loop.py").read_text(encoding="utf-8")
content = src_ra + src_cl
# Anchor on the unique comment + the modulo line.
assert "Hydrate per-session nudge counters from persisted history" in content
assert "self._turns_since_memory = prior_user_turns % self._memory_nudge_interval" in content
# The line uses ``self.`` in run_agent.py form and ``agent.`` in the
# extracted module, accept either.
assert (
"self._turns_since_memory = prior_user_turns % self._memory_nudge_interval" in content
or "agent._turns_since_memory = prior_user_turns % agent._memory_nudge_interval" in content
)
Comment thread tests/run_agent/test_run_agent.py Outdated
Comment on lines +5213 to +5220
# After the run_agent.py refactor the body uses ``agent.X`` instead
# of ``self.X``, so accept either form.
preamble_end = src.index("iteration_budget = IterationBudget")
preamble = src[:preamble_end]
assert "self._turns_since_memory = 0" not in preamble
assert "self._iters_since_skill = 0" not in preamble
assert "agent._turns_since_memory = 0" not in preamble
assert "agent._iters_since_skill = 0" not in preamble
@alt-glitch alt-glitch added type/refactor Code restructuring, no behavior change P3 Low — cosmetic, nice to have comp/agent Core agent loop, run_agent.py, prompt builder labels May 17, 2026
@alt-glitch

Copy link
Copy Markdown
Collaborator

Supersedes #24335 (and the earlier #23978#24334 chain). Same goal — decompose run_agent.py — but with module extraction instead of mixins and broader scope (76% vs 63.9% reduction).

@alt-glitch

Copy link
Copy Markdown
Collaborator

@daimon-nous this is a big PR. very important to review. use /hermes-agent-dev /hermes-pr-reproduction /adversarial-code-review skills. also track how this refactor differs AND how it may potentially degrade from this tracking issue suggesting refactor/s: #14182.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 17 out of 19 changed files in this pull request and generated 6 comments.

Comment thread agent/tool_executor.py
Comment on lines +329 to +340
_conc_elapsed = int(time.time() - _conc_start)
# Heartbeat every ~30s (6 × 5s poll intervals)
if _conc_elapsed > 0 and _conc_elapsed % 30 < 6:
_still_running = [
parsed_calls[futures.index(f)][1]
for f in not_done
if f in futures
]
agent._touch_activity(
f"concurrent tools running ({_conc_elapsed}s, "
f"{len(not_done)} remaining: {', '.join(_still_running[:3])})"
)
Comment thread agent/tool_executor.py
Comment on lines +342 to +346
if spinner:
# Build a summary message for the spinner stop
completed = sum(1 for r in results if r is not None)
total_dur = sum(r[3] for r in results if r is not None)
spinner.stop(f"⚡ {completed}/{num_tools} tools completed in {total_dur:.1f}s total")
Comment on lines +48 to +62
class _OpenAIProxy:
"""Module-level proxy that looks like ``openai.OpenAI`` but imports lazily."""

__slots__ = ()

def __call__(self, *args, **kwargs):
return _load_openai_cls()(*args, **kwargs)

def __instancecheck__(self, obj):
return isinstance(obj, _load_openai_cls())

def __repr__(self):
return "<lazy openai.OpenAI proxy>"


Comment thread agent/codex_runtime.py Outdated
Comment on lines +19 to +27
import contextvars
import json
import logging
import os
import threading
import time
import uuid
from types import SimpleNamespace
from typing import Any, Dict, List, Optional
Comment thread agent/tool_executor.py
response_preview = _preview_str[:agent.log_prefix_chars] + "..." if len(_preview_str) > agent.log_prefix_chars else _preview_str
print(f" ✅ Tool {i+1} completed in {tool_duration:.2f}s - {response_preview}")

agent._current_tool = None
Comment thread agent/agent_init.py
polluting trajectories with user-specific persona or project instructions.
load_soul_identity (bool): If True, still use ~/.hermes/SOUL.md as the primary
identity even when skip_context_files=True. Project context files from the cwd
remain skipped.
@daimon-nous

daimon-nous Bot commented May 17, 2026

Copy link
Copy Markdown
Contributor

Review — @daimon-nous

Audit scope: Full symbol re-export verification (50 externally-imported names), _ra() call-site tracing (62 sites across 7 modules), __init__ parameter parity check (64/64 match), self→agent rename completeness (0 stray self. in extracted code), targeted test run (342/342 pass).

Mechanically sound. The extraction is clean — re-exports are complete, the _ra() indirection correctly preserves all 285+ test patches against run_agent.*, and the 15 atomic commits are individually reviewable. The selfagent conversion was thorough (747 occurrences in conversation_loop.py alone, zero misses).


Two concrete improvements for before merge:

  1. __init__ positional forwarding is fragile. The forwarder passes all 64 params positionally — if anyone adds param fix(security): prevent shell injection in sudo password piping #65 on main, they must update THREE places in lockstep (signature, init_agent signature, forwarding call) or get silent wrong-value bugs. Should be keyword-forwarded:

    init_agent(self, base_url=base_url, api_key=api_key, provider=provider, ...)
  2. Dead _ra() in agent/codex_runtime.py (line 32) — defined, never called. Minor cleanup.


Architectural note re #14182: The _ra() pattern and 51 forwarder methods are tech debt the eventual package restructure would need to unwind. Whether this is acceptable depends on #14182's timeline — if near-term, this adds friction; if distant, the readability win (16k→3.8k) justifies it.

@rujbodyslam-art rujbodyslam-art left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ruj

gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
…ad code, tighten guards

Four fixes from PR NousResearch#27248 review:

1. **__init__ forwarder is now keyword-forwarded** (daimon-nous review).
   Previously the run_agent.AIAgent.__init__ wrapper forwarded all 64
   params positionally to agent.agent_init.init_agent, so adding a
   65th param on main would require three lockstep edits (signature,
   init_agent signature, forwarder call) or silently shift every value.
   Keyword forwarding makes this trivially safe — adding a param now
   only needs the two signatures and one extra keyword line.

2. **Drop dead _ra() in agent/codex_runtime.py** (daimon-nous + Copilot).
   The lazy run_agent reference was defined but never called inside
   this module — the codex paths use agent.* accessors only.

3. **Drop unused imports in agent/codex_runtime.py** (Copilot):
   contextvars, threading, time, uuid, Optional. Carried over from
   run_agent.py during the original extraction.

4. **Tighten three source-introspection test guards** (Copilot):
   - test_memory_nudge_counter_hydration.py — was scanning the
     concatenated source of run_agent.py + agent/conversation_loop.py
     and matching self.X or agent.X form.  Now asserts the
     hydration block lives in agent/conversation_loop.py specifically
     with the agent.X form — the body never moves back, so if it
     ever drifts a future re-introduction fails the guard.
   - test_run_agent.py::TestMemoryNudgeCounterPersistence — anchor on
     agent.iteration_budget = IterationBudget exactly (was just
     iteration_budget = IterationBudget) so an unrelated identifier
     ending in iteration_budget can't match.
   - test_run_agent.py::TestMemoryProviderTurnStart — assert the
     agent._user_turn_count form directly (the extracted body uses
     agent.X, not self.X — accepting either was a transitional fudge).
   - test_jsondecodeerror_retryable.py — scan agent/conversation_loop.py
     only, not the concatenation.

Not addressed in this commit:

* Pre-existing bugs in agent/tool_executor.py (heartbeat index
  mismatch when calls are blocked, _current_tool clobber in result
  loop, blocked-counted-as-completed in spinner summary, dead
  result_preview computation). These were preserved byte-for-byte from
  the original _execute_tool_calls_concurrent — worth a separate
  follow-up PR with proper tests.
* _OpenAIProxy.__instancecheck__ concern — pre-existing, not flagged
  by any of the original test patches (nothing actually does
  isinstance(x, OpenAI) against the proxy instance).
* agent_init.py:949 mem_config potential NameError — pre-existing;
  only triggers if _agent_cfg.get('memory', {}) itself raises, which
  it can't with a stock dict.

tests/run_agent/ + tests/agent/: 4313 passed, 1 pre-existing
test_auxiliary_client failure (unchanged).

run_agent.py: 3821 -> 3937 lines (+116 from the keyword-forwarded
init call's verbosity).  Final: 16083 -> 3937 (-12146, 75% reduction).
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
…-27dc9cc2

refactor(run_agent): extract AIAgent internals into agent/ modules (16k→3.8k lines, 76% reduction)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P3 Low — cosmetic, nice to have type/refactor Code restructuring, no behavior change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants