refactor(run_agent): extract AIAgent internals into agent/ modules (16k→3.8k lines, 76% reduction)#27248
Conversation
…nitization.py Pull the 10 pure sanitization/repair helpers (\_sanitize_surrogates, \_sanitize_structure_surrogates, \_sanitize_messages_surrogates, \_escape_invalid_chars_in_json_strings, \_repair_tool_call_arguments, \_strip_non_ascii, \_sanitize_messages_non_ascii, \_sanitize_tools_non_ascii, \_strip_images_from_messages, \_sanitize_structure_non_ascii) and the \_SURROGATE_RE constant out of run_agent.py into a new module. These are stateless byte-walking helpers with no AIAgent dependency. Backward compatibility: run_agent re-exports every name via a single import block, so existing 'from run_agent import _sanitize_surrogates' imports in tests and cli.py keep working unchanged. Same pattern the file already uses for _summarize_user_message_for_log (codex_responses_adapter). run_agent.py: 16077 -> 15682 lines (-395).
…atch_helpers.py Pull module-level helpers used by the tool-execution path out of run_agent.py: * parallelism gating — _NEVER_PARALLEL_TOOLS, _PARALLEL_SAFE_TOOLS, _PATH_SCOPED_TOOLS, _DESTRUCTIVE_PATTERNS, _REDIRECT_OVERWRITE, _is_destructive_command, _should_parallelize_tool_batch, _extract_parallel_scope_path, _paths_overlap * multimodal envelopes — _is_multimodal_tool_result, _multimodal_text_summary, _append_subdir_hint_to_multimodal * file-mutation verifier inputs — _extract_file_mutation_targets, _extract_error_preview * trajectory normalization — _trajectory_normalize_msg All pure functions. run_agent re-exports every name so existing 'from run_agent import _is_multimodal_tool_result' callers in tests/tools/, tests/run_agent/, and tools/file_state.py keep working. tests/run_agent/: 1341 passed, 3 skipped. run_agent.py: 15682 -> 15427 lines (-255).
Three small extractions into focused modules:
* agent/process_bootstrap.py — \_OpenAIProxy (lazy openai.OpenAI import),
\_SafeWriter (broken-pipe-resistant stdio wrapper), \_install_safe_stdio,
\_get_proxy_from_env, \_get_proxy_for_base_url. All process / IO bootstrap.
* agent/iteration_budget.py — IterationBudget class (thread-safe consume/
refund counter shared by parent agent and subagents).
run_agent re-exports every name so existing test patches like
patch('run_agent.OpenAI', ...) and 'from run_agent import IterationBudget'
keep working unchanged. Verified the patch-rebinding contract for OpenAI
explicitly.
tests/run_agent/ + tests/agent/test_gemini_fast_fallback.py:
1347 passed, 3 skipped.
run_agent.py: 15427 -> 15261 lines (-166).
…background_review.py Move the background-review subsystem (the self-improvement loop — see the README) out of run_agent.py into a dedicated module. * summarize_background_review_actions — was the @staticmethod that builds the user-facing action summary * spawn_background_review_thread — builds the thread target + prompt; the actual review loop body (forked AIAgent, runtime inheritance, tool whitelist, suppression, teardown) lives in _run_review_in_thread * build_memory_write_metadata — provenance for external memory mirrors AIAgent keeps thin wrappers for backward compatibility AND because tests patch run_agent.threading.Thread to assert lifecycle behavior — the threading.Thread construction stays in AIAgent._spawn_background_review, the inner work moves out. tests/run_agent/ + tests/agent/: 4313 passed, 1 pre-existing failure (test_auxiliary_client.py::test_custom_endpoint... — confirmed failing on main before this change). 3 skipped. run_agent.py: 15272 -> 14972 lines (-300).
…n_compression.py Move four compression-related methods to a dedicated module: * check_compression_model_feasibility — startup probe + auto-lowered threshold + hard floor * replay_compression_warning — re-emit stored warning through gateway status_callback * compress_context — run compressor, split SQLite session, notify plugins+memory * try_shrink_image_parts_in_messages — image-too-large recovery via re-encode AIAgent keeps thin forwarder methods so existing call sites and tests that patch run_agent.AIAgent methods keep working. tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing test_auxiliary_client failure as before). run_agent.py: 15013 -> 14535 lines (-478).
…ompt.py
Four AIAgent methods move into a dedicated module:
* build_system_prompt_parts — three-tier stable/context/volatile dict
* build_system_prompt — joiner used at session start
* invalidate_system_prompt — drop cache + reload memory
* format_tools_for_system_message — trajectory-format tool dump
The extracted helpers look up patch-target names (load_soul_md,
build_skills_system_prompt, get_toolset_for_tool, build_environment_hints,
build_context_files_prompt, build_nous_subscription_prompt) through the
run_agent module via _ra() instead of importing them directly. That
preserves the patch surface tests rely on
(patch('run_agent.load_soul_md', ...) and friends).
AIAgent keeps thin forwarder methods.
tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing
test_auxiliary_client failure as before).
run_agent.py: 14555 -> 14292 lines (-263).
Move the two big tool-dispatch methods out of run_agent.py: * execute_tool_calls_concurrent — 408-line concurrent path (interrupt pre-flight, guardrail+plugin block, callback fan-out, ContextVar- preserving ThreadPoolExecutor, periodic heartbeats for the gateway inactivity monitor, per-tool result handling with subdir hints + guardrail observations + checkpoint, /steer drain) * execute_tool_calls_sequential — 441-line sequential path (the original behavior used for single-tool batches and interactive tools) Both take the parent AIAgent as their first argument; AIAgent keeps thin forwarders so call sites unchanged. handle_function_call is routed through _ra() so tests that patch run_agent.handle_function_call keep working. _set_interrupt likewise. The AST guard in test_tool_executor_contextvar_propagation.py is updated to scan both run_agent.py AND agent/tool_executor.py so it still catches the executor.submit(_run_tool, ...) regression regardless of which file the body lives in. tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing test_auxiliary_client failure as before). run_agent.py: 14309 -> 13461 lines (-848).
Move the five stream-drop diagnostic helpers + the headers tuple: * STREAM_DIAG_HEADERS — cf-ray, x-openrouter-provider, x-request-id, etc. * stream_diag_init — fresh per-attempt diagnostic dict * stream_diag_capture_response — snapshot upstream headers + HTTP status * flatten_exception_chain — compact Outer(msg) <- Inner(msg) rendering * log_stream_retry — structured WARNING with provider/bytes/elapsed/ttfb * emit_stream_drop — user-facing status line + activity touch AIAgent keeps thin forwarder methods (and exposes the headers tuple as _STREAM_DIAG_HEADERS for back-compat). All test patches and call sites unchanged. tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing test_auxiliary_client failure). run_agent.py: 13470 -> 13227 lines (-243).
…mpletion_helpers.py Six methods move into a new module — bodies live there, AIAgent keeps thin forwarder methods so call sites and tests are unchanged. * interruptible_api_call — non-streaming API call with interrupt handling * build_api_kwargs — assemble OpenAI / Anthropic / Codex / Bedrock request kwargs * build_assistant_message — normalize assistant message dict (reasoning, tool_calls, codex passthrough fields, alibaba glm-4.7 quirk) * try_activate_fallback — provider fallback chain activation * handle_max_iterations — controlled stop when iteration budget exhausts * cleanup_task_resources — per-turn VM + browser teardown (skipped for persistent environments) Names tests patch on run_agent (cleanup_vm, cleanup_browser) are routed through _ra() so the patch surface is preserved. Two TestAnthropicInterruptHandler source-introspection tests were updated to scan agent.chat_completion_helpers.interruptible_api_call instead of AIAgent._interruptible_api_call — the body lives in the extracted module now. tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing test_auxiliary_client failure). run_agent.py: 13282 -> 12253 lines (-1029).
…chat_completion_helpers.py Move _interruptible_streaming_api_call out of run_agent.py — the biggest single method in the file. Body lives next to interruptible_api_call in agent/chat_completion_helpers.py so streaming + non-streaming code share one home. Nested closures (_call_chat_completions, _call_anthropic, the codex stream branch) all come along with the body and still capture the parent function's locals as expected. AIAgent keeps a thin forwarder method. is_local_endpoint added to the import block (used by the stream stale-timeout disable logic). One source-introspection test in TestAnthropicInterruptHandler is updated to scan agent.chat_completion_helpers.interruptible_streaming_api_call instead of AIAgent._interruptible_streaming_api_call. tests/run_agent/ + tests/agent/: 4312 passed (same pre-existing test_auxiliary_client failure). run_agent.py: 12277 -> 11385 lines (-892).
…cated modules
Two new modules:
* agent/codex_runtime.py — three Codex API-mode methods
- run_codex_app_server_turn (148 LOC) — Codex CLI subprocess driver
- run_codex_stream (125 LOC) — Codex Responses API stream
- run_codex_create_stream_fallback (78 LOC) — fallback after Responses
stream=true initial create failure
* agent/agent_runtime_helpers.py — twelve assorted AIAgent helpers
totalling ~1,166 LOC: convert_to_trajectory_format, sanitize_tool_call_arguments
(static), repair_message_sequence, strip_think_blocks,
recover_with_credential_pool, try_recover_primary_transport,
drop_thinking_only_and_merge_users (static), restore_primary_runtime,
extract_reasoning, dump_api_request_debug,
anthropic_prompt_cache_policy, create_openai_client
AIAgent keeps thin forwarder methods for all 15 (preserving @staticmethod
where needed). Symbols tests patch on run_agent (OpenAI, AIAgent class
attrs) are routed through _ra() to honor the patch contract. The
_TRANSIENT_TRANSPORT_ERRORS frozenset moves with try_recover_primary_transport
and is referenced as a module-level constant in the extracted code.
tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing
test_auxiliary_client failure).
run_agent.py: 11391 -> 9887 lines (-1504).
The three big review-prompt strings (_MEMORY_REVIEW_PROMPT, _SKILL_REVIEW_PROMPT, _COMBINED_REVIEW_PROMPT — 183 lines combined) move out of the AIAgent class body and into agent/background_review.py where they're consumed. AIAgent re-exposes them as class attributes via 'from ... import' inside the class body — Python binds those names into the class namespace so existing AIAgent._MEMORY_REVIEW_PROMPT references keep working. spawn_background_review_thread also falls back to the module-level constants if an agent doesn't have the attribute (preserves the test pattern of mocking these on the agent). tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing test_auxiliary_client failure). run_agent.py: 9986 -> 9800 lines (-186).
…oop.py The 3,877-line run_conversation body — the agent loop itself — moves out of run_agent.py into a dedicated module. AIAgent.run_conversation is now a thin forwarder that delegates to agent.conversation_loop.run_conversation with the AIAgent instance as the first argument. This is the largest single extraction in the run_agent.py refactor. The body keeps all 163 self.X references intact (rewritten as agent.X), all nested closures, all retry/backoff/compression machinery. Symbols that tests or callers patch on run_agent (_set_interrupt, handle_function_call, AIAgent class attrs) are resolved through _ra() inside the extracted module so the patch surface is preserved. Five tests doing inspect.getsource(AIAgent.run_conversation) updated to scan agent.conversation_loop.run_conversation. Two source-introspection tests (TestMemoryNudgeCounterPersistence, TestMemoryProviderTurnStart) updated to accept either self.X (legacy) or agent.X (extracted form) in the matched assertions. Live E2E verified on three model paths: * openai/gpt-5.4 (OpenAI chat completions via OpenRouter) * anthropic/claude-sonnet-4.6 (Anthropic Messages via OpenRouter) * moonshotai/kimi-k2-thinking (reasoning model, reasoning_content path) Plus read_file tool execution, terminal tool, web_search. tests/run_agent/ + tests/agent/: 4313 passed, 1 pre-existing failure (test_auxiliary_client::test_custom_endpoint... — same as on main). run_agent.py: 9800 -> 5944 lines (-3856). Total reduction since baseline: 16083 -> 5944 (-10139, 63%).
The largest method left on AIAgent (60+ parameters, the entire startup sequence — credential resolution, provider auto-detection, context engine bootstrap, memory store hydration, plugin lifecycle hooks) moves into agent/agent_init.py. AIAgent.__init__ is now a thin wrapper that calls agent.agent_init.init_agent(self, ...) with the original full parameter list preserved. Module-level run_agent names referenced in the body (_openrouter_prewarm_done, _qwen_portal_headers, _routermint_headers, _hermes_home, OpenAI, get_tool_definitions, check_toolset_requirements) are resolved through _ra() so test patches on those names keep working. agent_init's logger warnings are routed via _ra().logger so tests patching run_agent.logger capture them (TestStringKSuffixContextLengthWarns, TestCustomProvidersInvalidContextLengthWarns). Live E2E reconfirmed on three model paths (openai/gpt-5.4, anthropic/claude-sonnet-4.6, moonshotai/kimi-k2-thinking). tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing test_auxiliary_client failure). run_agent.py: 5944 -> 4564 lines (-1380). Total reduction since baseline: 16083 -> 4564 (-11519, 72%).
…elpers.py Final extraction pass — the methods left over after run_conversation and __init__ moved out. Together these 10 cover ~813 LOC of medium- sized helpers: * switch_model (194 LOC) — model switching mid-session * _invoke_tool (87) — central tool dispatch with overrides * _repair_tool_call (72) — argument JSON repair entrypoint * _sanitize_api_messages (71) — role-filter for API send * _looks_like_codex_intermediate_ack (72) — codex transcript heuristic * _copy_reasoning_content_for_api (70) — reasoning preservation * _cleanup_dead_connections (70) — periodic dead-socket sweep * _extract_api_error_context (65) — error-dump context builder * _apply_pending_steer_to_tool_results (63) — /steer injection * _force_close_tcp_sockets (59) — aggressive socket cleanup AIAgent keeps thin forwarder methods for all 10 (staticmethods preserved where present). Names tests patch on run_agent (handle_function_call, AIAgent class attrs, logger) routed through _ra() so the patch surface is preserved. tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing test_auxiliary_client failure as on main). run_agent.py: 4634 -> 3821 lines (-813). Final total: 16083 -> 3821 (-12262, 76% reduction).
There was a problem hiding this comment.
CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
There was a problem hiding this comment.
Pull request overview
This PR is a large-scale, mechanical refactor that splits run_agent.py (16,083 lines) into run_agent.py (3,821 lines) plus 14 new agent/*.py modules. The extraction pattern is uniform: method bodies become module-level functions name(agent, ...) that take the parent AIAgent as the first arg, while AIAgent retains thin forwarder methods so call sites and test patches (patch("run_agent.X", ...)) keep working through a _ra() lazy back-reference. The included test changes update source-introspection guards to point at the new module locations and accept both self.X and agent.X spellings.
Changes:
- Extract 14 cohesive modules under
agent/covering sanitization, tool dispatch/execution, system prompt assembly, conversation compression, background review, Codex runtime, conversation loop, init, and helpers. - Preserve test-patch contract by routing test-patched names (
OpenAI,_set_interrupt,handle_function_call, etc.) through a_ra()lazy reference torun_agent. - Update source-introspection-based tests to scan the new module locations alongside
run_agent.py.
Reviewed changes
Copilot reviewed 17 out of 19 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| agent/agent_init.py | Houses extracted AIAgent.__init__ body (~1.4k lines of provider auto-detect, credential resolution, context-engine bootstrap). |
| agent/background_review.py | Self-improvement-loop fork + three review-prompt strings. |
| agent/codex_runtime.py | Codex App Server and Responses-API streaming paths. |
| agent/conversation_compression.py | Compression feasibility probe, compress-and-rotate-session, image-too-large recovery. |
| agent/iteration_budget.py | Thread-safe iteration-budget counter. |
| agent/message_sanitization.py | Surrogate scrub, tool-arg JSON repair, ASCII fallback, image strip. |
| agent/process_bootstrap.py | Lazy OpenAI proxy, crash-resistant stdio, proxy env resolution. |
| agent/stream_diag.py | Per-attempt stream diagnostics + retry/drop logging. |
| agent/system_prompt.py | Three-tier (stable/context/volatile) system-prompt builder. |
| agent/tool_dispatch_helpers.py | Parallel-batch gating, multimodal envelopes, destructive-command detection, mutation tracking. |
| agent/tool_executor.py | Sequential + concurrent tool dispatch. |
| tests/run_agent/test_run_agent.py | Reroutes inspect.getsource(...) assertions to extracted modules; accepts agent.X spellings. |
| tests/run_agent/test_memory_nudge_counter_hydration.py | Scans both run_agent.py and agent/conversation_loop.py for hydration markers. |
| tests/run_agent/test_jsondecodeerror_retryable.py | Scans concatenated source from run_agent + agent.conversation_loop. |
| tests/run_agent/test_tool_executor_contextvar_propagation.py | AST-walks both modules for executor.submit(...) matches. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| _still_running = [ | ||
| parsed_calls[futures.index(f)][1] | ||
| for f in not_done | ||
| if f in futures | ||
| ] | ||
| agent._touch_activity( | ||
| f"concurrent tools running ({_conc_elapsed}s, " | ||
| f"{len(not_done)} remaining: {', '.join(_still_running[:3])})" | ||
| ) |
| if not skip_memory: | ||
| try: | ||
| _mem_provider_name = mem_config.get("provider", "") if mem_config else "" |
| _conc_elapsed = int(time.time() - _conc_start) | ||
| # Heartbeat every ~30s (6 × 5s poll intervals) | ||
| if _conc_elapsed > 0 and _conc_elapsed % 30 < 6: | ||
| _still_running = [ | ||
| parsed_calls[futures.index(f)][1] | ||
| for f in not_done | ||
| if f in futures | ||
| ] | ||
| agent._touch_activity( | ||
| f"concurrent tools running ({_conc_elapsed}s, " | ||
| f"{len(not_done)} remaining: {', '.join(_still_running[:3])})" | ||
| ) |
| import contextvars | ||
| import json | ||
| import logging | ||
| import os | ||
| import threading | ||
| import time | ||
| import uuid |
|
|
||
|
|
||
|
|
||
| def run_codex_stream(agent, api_kwargs: dict, client: Any = None, on_first_delta: callable = None): |
| if isinstance(function_result, str): | ||
| result_preview = function_result if agent.verbose_logging else ( | ||
| function_result[:200] if len(function_result) > 200 else function_result | ||
| ) | ||
| _result_len = len(function_result) | ||
| else: | ||
| # Multimodal dict result (_multimodal=True) — not sliceable as string | ||
| result_preview = function_result | ||
| _result_len = len(str(function_result)) | ||
|
|
||
| # Log tool errors to the persistent error log so [error] tags | ||
| # in the UI always have a corresponding detailed entry on disk. | ||
| _is_error_result, _ = _detect_tool_failure(function_name, function_result) | ||
| if not _execution_blocked: | ||
| function_result = agent._append_guardrail_observation( | ||
| function_name, | ||
| function_args, | ||
| function_result, | ||
| failed=_is_error_result, | ||
| ) | ||
| result_preview = function_result if agent.verbose_logging else ( | ||
| function_result[:200] if len(function_result) > 200 else function_result | ||
| ) | ||
| if _is_error_result: | ||
| logger.warning("Tool %s returned error (%.2fs): %s", function_name, tool_duration, result_preview) | ||
| else: | ||
| logger.info("tool %s completed (%.2fs, %d chars)", function_name, tool_duration, _result_len) |
| """Return True when two paths may refer to the same subtree.""" | ||
| left_parts = left.parts | ||
| right_parts = right.parts | ||
| if not left_parts or not right_parts: | ||
| # Empty paths shouldn't reach here (guarded upstream), but be safe. | ||
| return bool(left_parts) == bool(right_parts) and bool(left_parts) | ||
| common_len = min(len(left_parts), len(right_parts)) |
| repo = Path(__file__).resolve().parents[2] | ||
| src_ra = (repo / "run_agent.py").read_text(encoding="utf-8") | ||
| src_cl = (repo / "agent" / "conversation_loop.py").read_text(encoding="utf-8") | ||
| content = src_ra + src_cl | ||
| # Anchor on the unique comment + the modulo line. | ||
| assert "Hydrate per-session nudge counters from persisted history" in content | ||
| assert "self._turns_since_memory = prior_user_turns % self._memory_nudge_interval" in content | ||
| # The line uses ``self.`` in run_agent.py form and ``agent.`` in the | ||
| # extracted module, accept either. | ||
| assert ( | ||
| "self._turns_since_memory = prior_user_turns % self._memory_nudge_interval" in content | ||
| or "agent._turns_since_memory = prior_user_turns % agent._memory_nudge_interval" in content | ||
| ) |
| # After the run_agent.py refactor the body uses ``agent.X`` instead | ||
| # of ``self.X``, so accept either form. | ||
| preamble_end = src.index("iteration_budget = IterationBudget") | ||
| preamble = src[:preamble_end] | ||
| assert "self._turns_since_memory = 0" not in preamble | ||
| assert "self._iters_since_skill = 0" not in preamble | ||
| assert "agent._turns_since_memory = 0" not in preamble | ||
| assert "agent._iters_since_skill = 0" not in preamble |
|
@daimon-nous this is a big PR. very important to review. use /hermes-agent-dev /hermes-pr-reproduction /adversarial-code-review skills. also track how this refactor differs AND how it may potentially degrade from this tracking issue suggesting refactor/s: #14182. |
| _conc_elapsed = int(time.time() - _conc_start) | ||
| # Heartbeat every ~30s (6 × 5s poll intervals) | ||
| if _conc_elapsed > 0 and _conc_elapsed % 30 < 6: | ||
| _still_running = [ | ||
| parsed_calls[futures.index(f)][1] | ||
| for f in not_done | ||
| if f in futures | ||
| ] | ||
| agent._touch_activity( | ||
| f"concurrent tools running ({_conc_elapsed}s, " | ||
| f"{len(not_done)} remaining: {', '.join(_still_running[:3])})" | ||
| ) |
| if spinner: | ||
| # Build a summary message for the spinner stop | ||
| completed = sum(1 for r in results if r is not None) | ||
| total_dur = sum(r[3] for r in results if r is not None) | ||
| spinner.stop(f"⚡ {completed}/{num_tools} tools completed in {total_dur:.1f}s total") |
| class _OpenAIProxy: | ||
| """Module-level proxy that looks like ``openai.OpenAI`` but imports lazily.""" | ||
|
|
||
| __slots__ = () | ||
|
|
||
| def __call__(self, *args, **kwargs): | ||
| return _load_openai_cls()(*args, **kwargs) | ||
|
|
||
| def __instancecheck__(self, obj): | ||
| return isinstance(obj, _load_openai_cls()) | ||
|
|
||
| def __repr__(self): | ||
| return "<lazy openai.OpenAI proxy>" | ||
|
|
||
|
|
| import contextvars | ||
| import json | ||
| import logging | ||
| import os | ||
| import threading | ||
| import time | ||
| import uuid | ||
| from types import SimpleNamespace | ||
| from typing import Any, Dict, List, Optional |
| response_preview = _preview_str[:agent.log_prefix_chars] + "..." if len(_preview_str) > agent.log_prefix_chars else _preview_str | ||
| print(f" ✅ Tool {i+1} completed in {tool_duration:.2f}s - {response_preview}") | ||
|
|
||
| agent._current_tool = None |
| polluting trajectories with user-specific persona or project instructions. | ||
| load_soul_identity (bool): If True, still use ~/.hermes/SOUL.md as the primary | ||
| identity even when skip_context_files=True. Project context files from the cwd | ||
| remain skipped. |
Review — @daimon-nousAudit scope: Full symbol re-export verification (50 externally-imported names), Mechanically sound. The extraction is clean — re-exports are complete, the Two concrete improvements for before merge:
Architectural note re #14182: The |
…ad code, tighten guards Four fixes from PR NousResearch#27248 review: 1. **__init__ forwarder is now keyword-forwarded** (daimon-nous review). Previously the run_agent.AIAgent.__init__ wrapper forwarded all 64 params positionally to agent.agent_init.init_agent, so adding a 65th param on main would require three lockstep edits (signature, init_agent signature, forwarder call) or silently shift every value. Keyword forwarding makes this trivially safe — adding a param now only needs the two signatures and one extra keyword line. 2. **Drop dead _ra() in agent/codex_runtime.py** (daimon-nous + Copilot). The lazy run_agent reference was defined but never called inside this module — the codex paths use agent.* accessors only. 3. **Drop unused imports in agent/codex_runtime.py** (Copilot): contextvars, threading, time, uuid, Optional. Carried over from run_agent.py during the original extraction. 4. **Tighten three source-introspection test guards** (Copilot): - test_memory_nudge_counter_hydration.py — was scanning the concatenated source of run_agent.py + agent/conversation_loop.py and matching self.X or agent.X form. Now asserts the hydration block lives in agent/conversation_loop.py specifically with the agent.X form — the body never moves back, so if it ever drifts a future re-introduction fails the guard. - test_run_agent.py::TestMemoryNudgeCounterPersistence — anchor on agent.iteration_budget = IterationBudget exactly (was just iteration_budget = IterationBudget) so an unrelated identifier ending in iteration_budget can't match. - test_run_agent.py::TestMemoryProviderTurnStart — assert the agent._user_turn_count form directly (the extracted body uses agent.X, not self.X — accepting either was a transitional fudge). - test_jsondecodeerror_retryable.py — scan agent/conversation_loop.py only, not the concatenation. Not addressed in this commit: * Pre-existing bugs in agent/tool_executor.py (heartbeat index mismatch when calls are blocked, _current_tool clobber in result loop, blocked-counted-as-completed in spinner summary, dead result_preview computation). These were preserved byte-for-byte from the original _execute_tool_calls_concurrent — worth a separate follow-up PR with proper tests. * _OpenAIProxy.__instancecheck__ concern — pre-existing, not flagged by any of the original test patches (nothing actually does isinstance(x, OpenAI) against the proxy instance). * agent_init.py:949 mem_config potential NameError — pre-existing; only triggers if _agent_cfg.get('memory', {}) itself raises, which it can't with a stock dict. tests/run_agent/ + tests/agent/: 4313 passed, 1 pre-existing test_auxiliary_client failure (unchanged). run_agent.py: 3821 -> 3937 lines (+116 from the keyword-forwarded init call's verbosity). Final: 16083 -> 3937 (-12146, 75% reduction).
…-27dc9cc2 refactor(run_agent): extract AIAgent internals into agent/ modules (16k→3.8k lines, 76% reduction)
Summary
run_agent.pyfrom 16,083 → 3,821 lines (-12,262, 76% reduction), redistributed across 14 cohesiveagent/*.pymodules. Behavior unchanged: every extraction keeps a thin forwarder method onAIAgentso call sites and test patches (patch("run_agent.OpenAI", ...),patch("run_agent.handle_function_call", ...), etc.) keep working.Why
run_agent.pyhad grown to 16k lines.run_conversationalone was 3,877 lines.__init__1,381. The file was painful to read, painful to grep, and painful to test-locate. AGENTS.md called it out as one of five "do not delegate edits on this file" load-bearing files for exactly that reason.This refactor leaves
AIAgentas the orchestrator — thin forwarder methods that delegate to focused modules inagent/.Changes
15 atomic commits, each individually green against
tests/run_agent/ + tests/agent/:agent/message_sanitization.pyagent/tool_dispatch_helpers.pyagent/process_bootstrap.py+agent/iteration_budget.pyagent/background_review.pyagent/conversation_compression.pyagent/system_prompt.pyagent/tool_executor.pyagent/stream_diag.pyagent/chat_completion_helpers.pyagent/codex_runtime.pyagent/agent_runtime_helpers.pyagent/conversation_loop.pyrun_conversation— the agent loopagent/agent_init.py__init__Architectural pattern
Every extraction follows the same shape, by design:
name(agent, ...)that takes the parentAIAgentas the first arg.self.Xbecomesagent.Xvia word-boundary regex (~163 occurrences inrun_conversationalone).AIAgentkeeps a thin forwarder method preserving the original signature.run_agent.X(_set_interrupt,handle_function_call,OpenAI,cleanup_vm,logger,AIAgent.Xclass attrs) resolve through a_ra()lazy reference inside each extracted module so the patch contract is preserved.inspect.getsource(AIAgent.run_conversation), etc.) updated to point at the new module location.The
_ra()indirection is doing real work — without it, dozens of test patches would silently miss the extracted code.Validation
run_agent.pylinestests/run_agent/+tests/agent/passingtests/run_agent/+tests/agent/failingThe one persistent failure (
tests/agent/test_auxiliary_client.py::TestGetTextAuxiliaryClient::test_custom_endpoint_uses_codex_wrapper_when_runtime_requests_responses_api) was confirmed failing onmainbefore any commit in this branch — it's unrelated to this refactor.Live E2E
End-to-end-verified from the worktree against real providers:
openai/gpt-5.4via OpenRouter (chat-completions, streaming) — text response, terminal tool, read_file, write_file, web_search, multi-tool chain, session resumeanthropic/claude-sonnet-4.6via OpenRouter (native Anthropic format path)moonshotai/kimi-k2-thinkingvia OpenRouter (reasoning-content path)Plus AIAgent direct-instantiation smoke (
__init__, forwarder methods all present,IterationBudgetconsume/refund, sanitization roundtrips).Test-file updates
Six tests were updated alongside the refactor — all are structural guards that scan the agent loop source for known patterns, and the patterns moved:
tests/run_agent/test_run_agent.py— 5inspect.getsource(AIAgent.run_conversation)rerouted toagent.conversation_loop.run_conversation;TestAnthropicInterruptHandlerrerouted to the extracted streaming/non-streaming callers;TestMemoryNudgeCounterPersistence/TestMemoryProviderTurnStartaccept eitherself.Xoragent.Xtests/run_agent/test_memory_nudge_counter_hydration.py— scan bothrun_agent.pyandagent/conversation_loop.pytests/run_agent/test_jsondecodeerror_retryable.py— scan bothtests/run_agent/test_tool_executor_contextvar_propagation.py— AST guard scans bothrun_agent.pyandagent/tool_executor.pyRisk
tests/suite was not run end-to-end (onlytests/run_agent/ + tests/agent/); CI is the source of truth for cross-file regressions._ra()lazy-reference pattern is the load-bearing piece that keeps test patches working. If a future test patches a newrun_agent.Xname, the extracted module's reference to that name may need routing.How to review
Easiest path: walk the commits in order. Each one tells you what was extracted and which tests it ran. The pattern is identical across all 15 — once you've reviewed the first two (sanitization, tool-dispatch helpers), the rest are mechanical applications of the same template.