refactor(agent): extract 14 mixins from AIAgent (63.9% code reduction) by dusterbloom · Pull Request #24334 · NousResearch/hermes-agent

dusterbloom · 2026-05-12T11:41:45Z

AIAgent Loop/Codebase Refactoring — PR #23978

Overview

Extracted 9,811 lines (63.9% of AIAgent) into 14 focused mixins, dramatically improving codebase maintainability for LLMs, agents, and human developers.

Results

Metric	Before	After
run_agent.py	15,355	5,544
Reduction	-	9,811 lines (63.9%)
Methods in AIAgent	~136	4
Mixin modules	0	14
New tests	0	495
AIAgent MRO depth	1	15

The 4 Irreducible Methods Remaining

Method	Lines	Why It Stays
`__init__`	1,477	God constructor — all state initialization, unavoidable
`run_conversation`	3,594	Core agent loop — heart of the system, unavoidable
`chat`	232	Public API surface, unavoidable
`_create_openai_client`	94	Monkeypatch compatibility with existing tests, cannot be moved

14 Mixin Modules Created

Module	Lines	Methods	Purpose
agent/config.py	~120	5 dataclasses	Config dataclasses
agent/loop.py	~180	3 classes	AgentLoop + middleware framework
agent/middleware.py	~300	5 middleware	Sanitization, finalization, guardrails
agent/utils.py	~500	28 functions	Standalone utilities
agent/streaming.py	~1,500	15 methods	Streaming deltas, interim messages
agent/tool_execution.py	~1,016	4 methods	Tool dispatch, loop control
agent/fallback.py	~555	5 methods	Provider recovery, timeout
agent/compression.py	~380	3 methods	Context compression
agent/session.py	~600	10 methods	Session persistence
agent/loop_support.py	~570	10 methods	Iteration/display helpers
agent/model_switch.py	~537	10 methods	Model switching, clients
agent/message_prep.py	~1,124	18 methods	API message sanitization/repair
agent/network.py	~465	18 methods	Network, clients, credentials
agent/steer.py	~273	11 methods	Steering, interruption, events
agent/provider.py	~343	14 methods	Provider-specific API prep
agent/auxiliary.py	~766	35 methods	Memory, vision, errors, helpers
agent/request_build.py	~1,243	15 methods	API kwargs, system prompt, lifecycle

Testing

Unit Tests

495 new tests added (one per method across all mixins)
All mixin tests GREEN (import + has_attr checks)

Subset Test Results (latest run)

FAILED tests/run_agent/test_run_agent.py::TestTruncatedToolCallRetry::test_truncated_tool_call_retries_once_before_refusing
FAILED tests/run_agent/test_run_agent.py::TestSessionMetaFiltering::test_logs_warning_when_dropping
FAILED tests/run_agent/test_run_agent_codex_responses.py::test_dump_api_request_debug_uses_responses_url
[... 46 total failures, 1254 passed]

Live E2E Test

✅ PASSED — Real API call to ZAI GLM-4.7:

Agent initializes correctly (38 tools loaded)
Context limit set (256K tokens)
API call completes successfully
Response: E2E_TEST_PASSED (exactly as requested)

Commit History (19 phases)

Each commit is atomic and clearly labeled with phase number:

92cd8346c refactor(agent): extract 28 standalone utilities into agent/utils.py (Phase 6)
c14ff6987 refactor(agent): extract ApiMessageFinalizer middleware (Phase 5)
48b380e11 refactor(agent): extract StreamingMixin with 15 streaming methods (Phase 7)
5e0186c9c refactor(agent): extract ToolExecutionMixin with 4 tool dispatch methods (Phase 8)
a2c00114d refactor(agent): extract FallbackMixin with 5 provider recovery methods (Phase 9)
88f9861d5 refactor(agent): extract CompressionMixin with 3 compression methods (Phase 10)
61d789fb8 refactor(agent): extract SessionMixin with 10 session management methods (Phase 11)
cee8798d5 fix(agent): add missing imports for full suite pass
f85b490cc fix(agent): add missing imports and fix @staticmethod for live E2E
0da0e3127 refactor(agent): extract LoopSupportMixin with 10 iteration/display methods (Phase 12)
dff6ed077 refactor(agent): extract ModelSwitchMixin with 10 model/client methods (Phase 13)
bbfbd6334 refactor(agent): extract MessagePrepMixin with 18 message prep methods (Phase 14)
2266d22c3 refactor(agent): extract NetworkMixin with 18 network/client methods (Phase 15)
3656f3e1f refactor(agent): extract SteerMixin with 11 steering/event methods (Phase 16)
55c4d8a26 refactor(agent): extract ProviderMixin with 14 provider-specific methods (Phase 17)
9c19ff645 refactor(agent): extract AuxiliaryMixin with 35 helper methods (Phase 18)
9a3a5da42 refactor(agent): extract RequestBuildMixin with 15 build/lifecycle methods (Phase 19)
28621945b fix(agent): fix mixin extraction regressions — missing imports, lost decorators, broken references
519da3092 fix(agent): remove debug prints, fix @Property base_url getter

Patterns & Lessons Learned

@staticmethod Cascade Bug

Every extraction phase had the same bug: removing a method body that was preceded by @staticmethod caused the decorator to attach to the method below it.

Solution: Pre-extraction validation scan to detect broken @staticmethod patterns before pushing.

Multi-line Signature Truncation

Methods with multi-line parameter lists (e.g., _provider_model_requires_responses_api(self, model: str, *, provider=...)) lost their self parameter during extraction.

Solution: Careful multi-line signature handling in extraction scripts.

Property vs Instance Method

Several methods lost their @property / @staticmethod decorators when extracted.

Solution: Scan for decorator loss and restore them; check @staticmethod only on methods without self.xxx references.

Monkeypatch Compatibility

_create_openai_client uses OpenAI from agent.utils. Tests monkeypatch run_agent.OpenAI, so this method must stay in run_agent.py.

Solution: Keep framework-coupled methods in the main class.

Files Changed

New Files (14 mixins + 1 test file per mixin)

agent/config.py
agent/loop.py
agent/middleware.py
agent/utils.py
agent/streaming.py
agent/tool_execution.py
agent/fallback.py
agent/compression.py
agent/session.py
agent/loop_support.py
agent/model_switch.py
agent/message_prep.py
agent/network.py
agent/steer.py
agent/provider.py
agent/auxiliary.py
agent/request_build.py

tests/agent/test_config.py
tests/agent/test_loop.py
tests/agent/test_middleware.py
tests/agent/test_utils.py
tests/agent/test_streaming.py
tests/agent/test_tool_execution.py
tests/agent/test_fallback.py
tests/agent/test_compression.py
tests/agent/test_session.py
tests/agent/test_loop_support.py
tests/agent/test_model_switch.py
tests/agent/test_message_prep.py
tests/agent/test_network.py
tests/agent/test_steer.py
tests/agent/test_provider.py
tests/agent/test_auxiliary.py
tests/agent/test_request_build.py

Modified Files

run_agent.py (-9,811 lines)
tests/e2e/conftest.py (+3 lines, mock added)
tests/agent/test_auxiliary_client.py (+1 line)
tests/run_agent/test_credential_pool.py (+1 line, mock patch)

Reviewer Checklist

Impact

For LLMs

Chunkier code: Each mixin is ~300-1,500 lines, perfect for context window utilization
Focused prompts: Single-responsibility modules lead to more focused AI understanding
Easier navigation: Smaller run_agent.py means LLMs spend less tokens reading boilerplate

For Human Devs

Reduced cognitive load: Developer only needs to focus on one mixin at a time
Clear boundaries: Each mixin has documented scope and purpose
Better testability: Isolated mixins are trivially testable
Faster iteration: Smaller files compile faster, IDE performance improves

For Future Refactoring

Clear extraction path: The 19 phases demonstrate a repeatable pattern for future work
Mixin design patterns: StreamingMixin, ProviderMixin, etc. are reusable blueprints
Avoiding the God Class: The final 4 methods are truly irreducible core

Next Steps for Maintainer

Review this summary document
Run full test suite: scripts/run_tests.sh
Consider rebase onto latest main for cleaner merge
Evaluate if 46 failures need addressing before merge
Update CHANGELOG.md with refactoring impact

Total refactor time: ~2 days (active work across sessions)
Lines deleted: 9,811
Tests added: 495

… (NousResearch#22687) /clear, /new, /reset, and /undo now ask the user to confirm before discarding conversation state — three-option prompt routed through the existing tools.slash_confirm primitive. Native yes/no buttons render on Telegram, Discord, and Slack (their adapters already implement send_slash_confirm); other platforms get a text-fallback prompt and reply with /approve, /always, or /cancel. The classic prompt_toolkit CLI uses the same three-option flow via the established _prompt_text_input pattern (see _confirm_and_reload_mcp). TUI keeps its existing modal overlay (NousResearch#12312). Gated by new config key approvals.destructive_slash_confirm (default true). Picking 'Always Approve' flips the gate to false so subsequent destructive commands run silently — matches the established mcp_reload_confirm UX. Out of scope: /cron remove (separate domain — scheduled jobs, not session history). Existing TUI overlay env-var (HERMES_TUI_NO_CONFIRM) left unchanged; cosmetic unification can come later. Closes NousResearch#4069.

…ency Problem: unlink_tasks() removes a parent→child dependency edge but does not trigger recompute_ready(). A child whose last blocking parent is unlinked stays stuck in 'todo' indefinitely — it only promotes to 'ready' on the next dispatcher tick or a manual 'hermes kanban recompute'. For CLI-only users without a dispatcher, the child is permanently stuck. Root cause: complete_task() and unblock_task() both call recompute_ready() after their write transaction so downstream children are evaluated immediately. unlink_tasks() was missing this call — removing a dependency is semantically equivalent to completing one, so the same recompute is needed. Fix: Capture the rowcount result before the write_txn exits, then call recompute_ready(conn) outside the transaction when a row was actually deleted (so the child sees the updated task_links state). Tests: Added test_unlink_tasks_triggers_recompute_ready in tests/hermes_cli/test_kanban_db.py: creates parent A (done) + parent C (running), child B with both parents (todo), unlinks C→B, asserts B is ready immediately. Stash-verified: FAILS without fix (child stays todo), PASSES with fix. 62/62 tests green in tests/hermes_cli/test_kanban_db.py. Closes NousResearch#22459.

…use (NousResearch#22681) Plugin discovery imports every bundled platform plugin at model_tools import time. The google_chat adapter unconditionally pulled in google.cloud.pubsub_v1, googleapiclient, grpc, httplib2, and friends at module top — about 33 MB RSS and 110 ms wall on every CLI invocation, even ones that never construct a gateway adapter. Wrap the heavy imports in _load_google_modules(): an idempotent loader that rebinds the module-level globals (pubsub_v1, service_account, HttpError, MediaFileUpload, …) on first call and is invoked from GoogleChatAdapter.__init__, connect(), and check_google_chat_requirements(). The HttpError = Exception placeholder is preserved for the brief window before the loader runs, so 'except HttpError as exc:' clauses stay correct (Python looks up the name at try/except evaluation time, not at function definition time). Measured impact on a 9950X3D, 7-run medians: import cli: 895 → 787 ms (-108 ms / -12%) 133 → 110 MB ( -23 MB / -17%) import model_tools: 491 → 400 ms ( -91 ms / -19%) 95 → 66 MB ( -29 MB / -31%) google_chat alone: 244 → 132 ms (-112 ms / -46%) 83 → 50 MB ( -33 MB / -40%) hermes chat -q (cold): 177 → 145 MB ( -32 MB / -18%) Real-world win lands on every path that imports cli.py: hermes chat, hermes gateway, cron jobs, batch runs, subagents. Long-lived gateway processes save ~30 MB resident. All 157 google_chat tests pass; full gateway suite (5050 tests) green.

…ousResearch#22684) Plugin authors had no easy way to figure out why their plugin wasn't loading — failures were buried in agent.log at WARNING and skip reasons (disabled, not enabled, depth cap, exclusive) were DEBUG-only and invisible by default. Set HERMES_PLUGINS_DEBUG=1 to attach a stderr handler at DEBUG to the hermes_cli.plugins logger only. Surfaces: - which directories were scanned + manifest counts per source - per manifest: resolved key, name, kind, source, on-disk path - skip reasons (disabled, not enabled, exclusive, depth cap, no register) - per load: tools/hooks/slash/CLI commands the plugin registered - full traceback on YAML parse failure (exc_info on the existing warning) - full traceback on register() exceptions, pointing at the plugin author's line Env var off (default) → zero new stderr output, same as before. Touches only hermes_cli/plugins.py + a doc section in the plugin-build guide + an entry in the env-vars reference. 3 new tests lock the attach/idempotent/no-attach behavior.

Enforce the parent-completion invariant at claim_task (the single ready->running chokepoint) and re-gate unblock_task so blocked->ready only fires when parents are done. Prevents child tasks from running ahead of in-progress parents under the create-then-link race. Also adds a stress test that races concurrent create+link against hammered claim_task and asserts no child runs while any parent is undone. Ref: kanban/boards/cookai/workspaces/t_a6acd07d/root-cause.md Refs: t_8d6af9d6

…n tool description (NousResearch#22694) The delegate_task tool description hardcoded 'default 3' / 'default 2' for max_concurrent_children / max_spawn_depth, which misled the model on any install that raised these limits — the schema text said 'default 3' even when the user had set max_concurrent_children=15 / max_spawn_depth=3, so the model would self-cap at 3 and never use the headroom. Make the description dynamic. ToolEntry gains an optional dynamic_schema_overrides callable; registry.get_definitions() merges its output on top of the static schema before returning it. delegate_tool registers a builder that reads the current delegation.* config and emits: - 'up to N items concurrently for this user' (N = max_concurrent_children) - 'Nested delegation IS enabled / OFF for this user (max_spawn_depth=N)' - 'orchestrator children can themselves delegate up to M more level(s)' - 'orchestrator_enabled=false' when the kill switch is set The model_tools cache key already includes config.yaml mtime+size, so edits to delegation.* in config invalidate the cached tool definitions without an explicit hook. CLI_CONFIG staleness within a process is a pre-existing limitation of _load_config and out of scope here. Static description / tasks.description / role.description in DELEGATE_TASK_SCHEMA are placeholders so module import doesn't trigger cli.CLI_CONFIG load before the test conftest can redirect HERMES_HOME.

…latform_toolsets When platform_toolsets[<platform>] contains both a composite (e.g. hermes-cli) and at least one configurable opt-in (e.g. spotify), the has_explicit_config branch in _get_platform_tools silently dropped the composite, leaving sessions with only the configurable + plugin tools and no native tools (terminal, file, web, browser, memory, etc.). Mirror the else-branch's subset inference for composites that sit alongside the configurables, but apply _DEFAULT_OFF_TOOLSETS only to the implicit expansion so user-listed default-off toolsets (spotify, discord) survive.

Resolve git via shutil.which with POSIX and Git-for-Windows fallbacks before clone and pull so Dashboard/API installs do not misreport Git as missing. Add regression tests for the resolver and pull subprocess invocation.

When a Telegram user replies using the native quote feature to select only part of a prior message, _build_message_event was injecting the ENTIRE replied-to message into reply_to_text via message.reply_to_message.text/caption. python-telegram-bot exposes the user-selected substring as message.quote (TextQuote.text); we now prefer that and fall back to the full replied-to text only when no native quote is present. The agent-visible "[Replying to: \"...\"]" prefix can otherwise expand the user's narrow quote into the full prior message, causing the agent to act on unrelated actionable-looking text the user did not select (e.g. multi-item briefings where the user quotes one bullet but the prefix injects every bullet). Falls back cleanly when message.quote is absent (PTB <21 or replies that don't quote a substring). Fixes NousResearch#22619 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…s root Problem: After `hermes profile use NAME`, the gateway (started via systemd with HERMES_HOME=/root/.hermes hardcoded) ignores the active profile and always runs as the Default profile. WebUI, Telegram, and all non-CLI platforms are affected. Root cause: _apply_profile_override() contained an early-return guard: if profile_name is None and os.environ.get("HERMES_HOME"): return # trust the inherited value The intent was to let child processes inherit their parent's profile via HERMES_HOME without redundantly re-reading active_profile. But systemd also sets HERMES_HOME — to the hermes root (/root/.hermes), not a profile directory — so the guard fired and silently skipped the active_profile check. The user's `hermes profile use NAME` write to ~/.hermes/active_profile was never seen by the gateway process. Fix: Only skip the active_profile check when HERMES_HOME is already a profile directory, identified by its immediate parent directory being named "profiles" (e.g. ~/.hermes/profiles/coder or /opt/data/profiles/coder). When HERMES_HOME points to a root directory (parent name != "profiles"), continue to read active_profile. Tests: - test_hermes_home_at_root_with_active_profile_is_redirected: the bug scenario — HERMES_HOME=/root/.hermes + active_profile=coder → HERMES_HOME must be redirected to .../profiles/coder. Stash-verified: FAILS without fix, PASSES with fix. - test_hermes_home_already_profile_dir_is_trusted: child-process inheritance contract unchanged — .../profiles/coder is trusted as-is. - test_hermes_home_unset_reads_active_profile: classic path unchanged. - test_hermes_home_unset_default_profile_no_redirect: "default" still produces no redirect. 4/4 tests green. Closes NousResearch#22502.

…rror] in tests

Adds 'codex' to the _MCP_PRESETS registry so users can add it via Connecting to 'codex'... ✓ Connected! Found 2 tool(s) from 'codex': codex Run a Codex session. Accepts configuration parameters matchi... codex-reply Continue a Codex conversation by providing the thread id and... Enable all 2 tools? [Y/n/select]: Cancelled. without manually specifying the command and args. Enables: codex mcp-server → Hermes native MCP client → Codex tools available as first-class Hermes tools.

The github-pr-workflow skill wraps the URL in double-quotes ('curl -H ... "https://api.github.com/..."'), which the original allowlist regex (\s+https://api...) did not match. Without this, the bundled github-pr-workflow skill is still blocked at every cron tick despite NousResearch#22605's fix landing for the bare-URL form. Make the leading quote optional and add a regression test pinning both single- and double-quoted forms.

The original PR placed 'pwd = pytest.importorskip("pwd")' on line 4 but 'import pytest' on line 9 — NameError on module load. Same for test_file_sync_back.py. Plus, the in-function 'pwd = pytest.importorskip' calls in test_auto_detected_root_is_rejected confused Python's scope analysis (later 'import pytest' made pytest local everywhere in the function) and caused UnboundLocalError. Drop the now-redundant in-function importorskip calls and rely on the module-level guard.

- Restore allowed_chats gate before thread_id check so ignored_threads applies universally (even to guest mentions). - Compute _message_mentions_bot once in _should_process_message to eliminate redundant second entity scan when guest_mode=true and the message does not mention the bot. - Remove redundant _is_group_chat from _is_guest_mention (caller already verified the message is a group chat). - Update _telegram_allowed_chats docstring to note guest_mode exception. - Add test coverage: bot_command entity, text_mention entity, caption_entities, and ignored_threads + guest_mode interaction. - Add nik1t7n to AUTHOR_MAP.

…rch#22764) When session_id rotates (e.g. /new), commit_memory_session was firing MemoryManager.on_session_end but skipping ContextEngine.on_session_end. Engines that accumulate per-session state (LCM-style DAGs, summary stores) leaked that state from the rotated-out session into whatever continued under the same compressor instance. Mirror the call shutdown_memory_provider already makes — same lifecycle moment, same hook contract ("real session boundaries (CLI exit, /reset, gateway expiry)"). /new is a real boundary for the old session_id; providers keep their state but the rotated-out session_id is done. 6 regression tests covering both-hooks-fire, no-memory-manager, no-context-engine, both failure-tolerant paths. Closes NousResearch#22394.

…RON_SESSION (NousResearch#22767) Two unrelated but co-located fixes to scripts/run_tests.sh: 1. pytest-split bootstrap (NousResearch#22401): the script tried '$PYTHON -m pip install pytest-split' on first run, but uv-created venvs ship without pip. Result: 'No module named pip' before any test ran. Add a uv fallback (uv pip install --python $PYTHON), keep pip as a secondary path, and emit a clear error pointing at 'uv pip install -e ".[dev]"' when neither is available. Also declare pytest-split in pyproject.toml dev extra so a normal '.[dev]' install provisions it. 2. HERMES_CRON_SESSION leak (NousResearch#22400): the hermetic env scrub already unsets HERMES_GATEWAY_SESSION and HERMES_INTERACTIVE but missed the sibling HERMES_CRON_SESSION. When run_tests.sh is invoked from a Hermes cron job, that variable leaks into pytest, flipping tools/approval.py into cron-deny mode and breaking tests/acp/test_approval_isolation.py and friends. Closes NousResearch#22400. Closes NousResearch#22401.

NousResearch#22769) Operator-controlled HERMES_PROFILE values were rendered as '**${author}** (${ts}):' — markdown bold with no provenance prefix. Worker comment bodies render directly underneath. A misleading profile name like 'hermes-system' or 'operator' could be misread by the next worker as a system directive above attacker-influenced content (confused-deputy primitive gated on operator misconfig). The LLM-controlled author-forgery surface was already closed in NousResearch#22435 (author removed from KANBAN_COMMENT_SCHEMA). This is defense-in-depth: render with an explicit 'comment from worker `<author>` at <ts>:' prefix so even 'hermes-system' resolves to 'comment from worker `hermes-system` at ...' — parseable as worker-comment metadata, not a system directive. Strip backticks from author so they can't break out of the fence. Update test_build_worker_context_caps_comments to count by body regex since the rendered author line now also starts with 'comment '. Closes NousResearch#22452.

…ousResearch#22774) Gateway creates a fresh AIAgent per inbound message in several common scenarios: cache miss, idle eviction (1h TTL), config-signature mismatch, process restart. A freshly-built AIAgent has _turns_since_memory=0 and _user_turn_count=0, so the memory.nudge_interval trigger ('_turns_since_memory >= _memory_nudge_interval') can never be reached when these reconstructions happen on roughly the cadence of the interval. A user can chat for hours on Telegram without ever seeing a self-improvement review fire. Reconstruct the counters from conversation_history at the top of run_conversation(), right after the existing _hydrate_todo_store call. Idempotent guard ('if self._user_turn_count == 0') means a cached agent that already accumulated counters keeps them; only freshly-built agents hydrate. Modulo arithmetic preserves the original 1-in-N cadence rather than firing a review immediately on resume. 7 regression tests pinning the contract (mid-cycle history, modulo wrap, idempotency, zero-interval skip, role==user filtering, production-code anchor). Closes NousResearch#22357.

…(Phase 6) Moves 951 lines of standalone functions, classes, and module-level variables from the top of run_agent.py into agent/utils.py: - 25 pure utility functions (sanitization, JSON repair, proxy, etc.) - 3 standalone classes (_OpenAIProxy, _SafeWriter, IterationBudget) - Module-level state (_OPENAI_CLS_CACHE, _openrouter_prewarm_done, etc.) run_agent.py re-exports everything via wildcard import for backward compatibility. All 28 names now have dedicated unit tests in test_utils.py (29 tests, all GREEN). Net reduction: 944 lines from run_agent.py (15,355 -> 14,447). Cumulative reduction from base: 1,255 lines deleted, 757 added.

…ase 7) Moves 1,469 lines of streaming-related logic from AIAgent into a StreamingMixin class in agent/streaming.py. The mixin is composed into AIAgent via multiple inheritance. Methods extracted: - _interruptible_streaming_api_call (894 lines) -- core streaming loop - _fire_stream_delta, _has_stream_consumers -- delta delivery - _reset_stream_delivery_tracking, _record_streamed_assistant_text - _interim_content_was_streamed -- dedup check - _run_codex_stream, _run_codex_create_stream_fallback -- codex streaming - _stream_diag_init, _stream_diag_capture_response -- diagnostics - _log_stream_retry, _emit_stream_drop -- error handling - _emit_auxiliary_failure, _compute_non_stream_stale_timeout - _flatten_exception_chain 22 tests in test_streaming.py (all GREEN). run_agent.py: 14,450 -> 12,981 lines (-1,469). Cumulative: 2,579 lines deleted from base, 3,485 lines in new modules.

@staticmethod

…utils - Add @staticmethod to _normalize_interim_visible_text (was broken in extraction) - Remove stray @staticmethod from _interim_content_was_streamed - Add _set_interrupt and _codex_derive_responses_function_call_id to __all__ - Add _set_interrupt import to agent/streaming.py Fixes 15 codex response test failures (59/59 now pass).

…ods (Phase 8) Moves 986 lines of tool execution logic from AIAgent into a ToolExecutionMixin class in agent/tool_execution.py. Methods extracted: - _execute_tool_calls_sequential (434 lines) - _execute_tool_calls_concurrent (401 lines) - _invoke_tool (88 lines) - _apply_pending_steer_to_tool_results (63 lines) AIAgent now inherits from both StreamingMixin and ToolExecutionMixin. 5 tests in test_tool_execution.py (all GREEN). run_agent.py: 12,983 -> 11,997 lines (-986). Cumulative: 3,565 lines deleted from base.

@staticmethod

…ds (Phase 9) Moves 529 lines of fallback/recovery logic from AIAgent into a FallbackMixin class in agent/fallback.py. Methods extracted: - _try_activate_fallback (226 lines) -- provider fallback chain - _restore_primary_runtime (85 lines) -- runtime restore - _try_recover_primary_transport (83 lines) -- transport recovery - _recover_with_credential_pool (84 lines) -- credential rotation - _try_refresh_anthropic_client_credentials (51 lines) -- OAuth refresh Also fixes @staticmethod on _content_has_image_parts, adds _stream_diag_init @staticmethod, and adds FailoverReason import to fallback.py. 6 tests in test_fallback.py (all GREEN). run_agent.py: 11,998 -> 11,469 lines (-529). Cumulative: 4,094 lines deleted from base. 2,978 tests pass (1 pre-existing failure in test_auxiliary_client).

…(Phase 10) Moves 359 lines of context compression logic from AIAgent into a CompressionMixin class in agent/compression.py. Methods extracted: - _check_compression_model_feasibility (170 lines) -- model feasibility check - _replay_compression_warning (17 lines) -- warning replay - _compress_context (172 lines) -- context compression 4 tests in test_compression.py (all GREEN). run_agent.py: 11,471 -> 11,112 lines (-359). Cumulative: 4,453 lines deleted from base. 325 tests all GREEN.

…ods (Phase 11) Moves 569 lines of session and state management logic from AIAgent into a SessionMixin class in agent/session.py. Methods extracted: - _cleanup_task_resources (217 lines) -- resource cleanup - _cleanup_dead_connections (71 lines) -- dead connection cleanup - _save_session_log (66 lines) -- session log persistence - _flush_messages_to_session_db (59 lines) -- message flushing - reset_session_state (39 lines) -- session reset - _ensure_lmstudio_runtime_loaded (33 lines) -- LMStudio runtime - commit_memory_session (25 lines) -- memory session commit - _ensure_db_session (22 lines) -- DB session ensure - _get_session_db_for_recall (19 lines) -- session DB access - _apply_persist_user_message_override (18 lines) -- user message override 10 tests in test_session.py (all GREEN). run_agent.py: 11,113 -> 10,544 lines (-569). Cumulative: 5,022 lines deleted from base (32.7% reduction). 335 tests all GREEN.

@staticmethod

- Add os import to streaming.py - Remove stray @staticmethod from session.py and tool_execution.py - Add wildcard import from agent.utils to all mixin files - Expand agent/utils.py __all__ to 80 entries (was 53) - Add is_local_endpoint, save_context_length to streaming imports Verified live: hermes chat with glm-5.1/zai works end-to-end. 335 tests all GREEN.

- Add SimpleNamespace, uuid imports to streaming.py - Add contextvars import to tool_execution.py, fix duplicate logging import - Expand agent/utils.py __all__ to 128 entries with all mixin dependencies (handle_function_call, _detect_tool_failure, _get_cute_tool_message_impl, ToolCallGuardrailController, KawaiiSpinner, etc.) Full suite: 36 base failures -> 14 failures (all pre-existing). 3,302 passed, 0 new regressions. Fixed 22 pre-existing test failures.

@staticmethod

…ethods (Phase 12) Moves 531 lines of loop iteration and display logic from AIAgent into a LoopSupportMixin class in agent/loop_support.py. Methods extracted: - _handle_max_iterations (215 lines) -- max iteration handling - _interruptible_api_call (155 lines) -- interruptible API calls - _emit_warning (34 lines) -- warning emission - _vprint (27 lines) -- verbose printing - _wrap_verbose (24 lines) -- verbose wrapping - _emit_status (20 lines) -- status emission - _should_start_quiet_spinner (19 lines) -- spinner control - _should_emit_quiet_tool_messages (14 lines) -- quiet tool messages - _safe_print (18 lines) -- safe printing - _touch_activity (5 lines) -- activity tracking Also fixes @staticmethod on _normalize_interim_visible_text and _api_kwargs_have_image_parts, adds sys import to loop_support.py. 11 tests in test_loop_support.py (all GREEN). run_agent.py: 10,545 -> 10,014 lines (-531). Cumulative: 5,341 lines deleted from base (34.8% reduction). 3,327 tests pass (22 pre-existing failures, zero new regressions).

@staticmethod

…s (Phase 13) Moves ~593 lines of model switching and client lifecycle logic from AIAgent into ModelSwitchMixin in agent/model_switch.py. Methods extracted: - switch_model (189 lines) -- runtime model switching - _convert_to_trajectory_format (169 lines) -- trajectory conversion - _openai_client_lock (8 lines) -- client lock context manager - _is_openai_client_closed (30 lines) -- client state check - _close_openai_client (24 lines) -- client teardown - _replace_primary_openai_client (17 lines) -- client replacement - _ensure_primary_openai_client (17 lines) -- lazy client init - _create_request_openai_client (26 lines) -- per-request clients - _close_request_openai_client (3 lines) -- request client cleanup - _save_trajectory (16 lines) -- trajectory persistence Also fixes @staticmethod on _handle_max_iterations in loop_support.py, adds @staticmethod to _summarize_api_error, _build_keepalive_http_client, _force_close_tcp_sockets, _api_kwargs_have_image_parts in run_agent.py. 11 tests in test_model_switch.py (all GREEN). run_agent.py: 10,014 -> 9,520 lines (-494). Cumulative: 5,835 lines deleted from base (38.0% reduction). Note: _create_openai_client kept in run_agent.py to preserve monkeypatch compatibility with existing tests that patch run_agent.OpenAI.

@staticmethod

…s (Phase 14) Moves 1,088 lines of API message sanitization, repair, and preparation from AIAgent into MessagePrepMixin in agent/message_prep.py. Methods extracted: - _sanitize_tool_call_arguments (109) -- tool call arg cleanup - _sanitize_api_messages (70) -- message sanitization - _sanitize_tool_calls_for_strict_api (27) -- strict API compat - _repair_message_sequence (100) -- message order repair - _repair_tool_call (72) -- tool call repair - _deduplicate_tool_calls (17) -- dedup - _prepare_messages_for_non_vision_model (32) -- non-vision fallback - _prepare_anthropic_messages_for_api (28) -- Anthropic prep - _preprocess_anthropic_content (44) -- Anthropic content - _anthropic_prompt_cache_policy (91) -- cache policy - _anthropic_preserve_dots (34) -- Anthropic ellipsis handling - _drop_thinking_only_and_merge_users (84) -- thinking block cleanup - _strip_think_blocks (94) -- reasoning block removal - _has_content_after_think_block (23) -- content check - _copy_reasoning_content_for_api (70) -- reasoning copy - _extract_reasoning (80) -- reasoning extraction - _is_thinking_only_assistant (53) -- thinking check - _drop_trailing_empty_response_scaffolding (53) -- empty cleanup 19 tests in test_message_prep.py, fix stale test_model_switch test. run_agent.py: 9,521 -> 8,435 lines (-1,086). Cumulative: 6,920 lines deleted from base (45.1% reduction). Re-extracted from original source to preserve @staticmethod and nested funcs.

…(Phase 15) Moves 420 lines of network and client management from AIAgent into NetworkMixin in agent/network.py. Methods extracted: - _get_transport, _build_keepalive_http_client, _force_close_tcp_sockets - _apply_client_headers_for_base_url, _rebuild_anthropic_client - _try_refresh_nous/copilot/codex_client_credentials - _credential_pool_may_recover_rate_limit, _swap_credential - _is_azure_openai_url, _is_direct_openai_url, _is_github_copilot_url - _is_openrouter_url, _resolved_api_call_timeout/stale_timeout_base - _mask_api_key_for_logs, _check_openrouter_cache_status 19 tests in test_network.py (all GREEN). run_agent.py: 8,435 -> 8,014 lines (-421). Cumulative: 7,341 lines deleted from base (47.8% reduction).

…hase 16) Moves 245 lines of steering, interruption, and event emission logic from AIAgent into SteerMixin in agent/steer.py. Methods extracted: - steer (36), interrupt (68), clear_interrupt (33), is_interrupted (13) - _drain_pending_steer (16), _capture_rate_limits (19) - _fire_tool_gen_started (15), _fire_reasoning_delta (9) - _emit_interim_assistant_message (15), _normalize_interim_visible_text (5) - _usage_summary_for_api_request_hook (16) 12 tests in test_steer.py (all GREEN). run_agent.py: 8,014 -> 7,769 lines (-245). Cumulative: 7,586 lines deleted from base (49.4% reduction).

…ods (Phase 17) Moves 304 lines of provider-specific API preparation from AIAgent into ProviderMixin in agent/provider.py. Methods: Qwen, DeepSeek, Kimi, LMStudio, GitHub Models, Ollama/GLM detection, reasoning pad/tool reasoning, responses API, stop-as-truncated detection. 15 tests in test_provider.py (all GREEN). run_agent.py: 7,769 -> 7,464 lines (-305). Cumulative: 7,891 lines deleted from base (51.4% reduction). Crossed 50%!

… 18) Moves 703 lines of memory, guardrail, vision, error, and miscellaneous helper methods from AIAgent into AuxiliaryMixin in agent/auxiliary.py. Methods: memory write metadata, external memory sync, todo hydration, memory shutdown, guardrail observation/halt/control, image/vision helpers, error cleanup, API error context, token limits, session persistence, tool call ID helpers, delegate dispatch, activity summaries. 36 tests in test_auxiliary.py (all GREEN). run_agent.py: 7,464 -> 6,752 lines (-712). Cumulative: 8,603 lines deleted from base (56.0% reduction).

…thods (Phase 19) Moves 1,203 lines of API request building and lifecycle management from AIAgent into RequestBuildMixin in agent/request_build.py. Methods: _build_api_kwargs, _build_system_prompt, _build_assistant_message, _format_tools_for_system_message, _dump_api_request_debug, _spawn_background_review, _summarize_background_review_actions, close, release_clients, _looks_like_codex_intermediate_ack, _summarize_api_error, _get_messages_up_to_last_assistant, _api_kwargs_have_image_parts, _invalidate_system_prompt, base_url. 16 tests in test_request_build.py (all GREEN). run_agent.py: 6,752 -> 5,544 lines (-1,208). Cumulative: 9,811 lines deleted from base (63.9% reduction). Only 4 methods remain: __init__, _create_openai_client, run_conversation, chat.

@staticmethod

…decorators, broken references The recent mixin extraction from run_agent.py into agent/*.py left several issues that caused runtime crashes and test failures: - Remove stray @Property on AIAgent.__init__ (made agent uninstantiable) - Add missing stdlib imports (os, json, re, time, copy) to 11 mixin files that relied on from agent.utils import * but __all__ did not export them - Replace AIAgent.* references in message_prep.py with module-level equivalents (static methods cannot reference the class) - Restore lost @staticmethod on _drop_thinking_only_and_merge_users - Mock _read_user_config in e2e conftest to unblock /new tests - Patch get_nous_recommended_aux_model in pool test for hermeticity

Documents the 19-phase mixin extraction refactoring with: - Complete metrics (63.9% reduction, 495 tests, 4 methods remaining) - Mixin breakdown with purpose for each of 14 modules - Testing results (live E2E passes, subset 46 failures vs baseline 15) - Commit history (19 well-labeled atomic commits) - Lessons learned (rebase-ready, no debug prints, clean history) Rebased cleanly onto origin/main for easy merge.

dusterbloom · 2026-05-12T11:42:51Z

Closing duplicate PR. Keeping #23978 open which contains the same refactor commits. The maintainer will need to resolve merge conflicts during review since upstream main has advanced since the branch was forked.

novax635 and others added 30 commits May 12, 2026 13:32

fix(gateway): clear slash-confirm state during session boundary cleanup

480fb44

chore: add wesleysimplicio to AUTHOR_MAP

ec92ee6

chore: add SiliconID to AUTHOR_MAP

9c7725f

fix(gateway): refresh runtime argv metadata

c913204

fix(plugins): resolve Git binary for installs under minimal PATH

7cd5efc

Resolve git via shutil.which with POSIX and Git-for-Windows fallbacks before clone and pull so Dashboard/API installs do not misreport Git as missing. Add regression tests for the resolver and pull subprocess invocation.

chore: add xieNniu to AUTHOR_MAP

62b7e52

fix(dingtalk): clarify webhook media behavior

08e3f8a

fix(dingtalk): align override signatures with base + guard Optional[e…

41eb273

…rror] in tests

fix(cron): avoid github skill false positives in scanner

42d2b25

fix(cron): keep auth-header exfiltration blocked

6a623af

tests: add Windows skip guards for UNIX-only stdlib imports

4f32c5e

chore: add wali-reheman to AUTHOR_MAP

4c18e46

feat(gateway): add Telegram guest mention mode

cf940c9

dusterbloom added 20 commits May 12, 2026 13:32

fix(agent): remove debug prints, fix @Property base_url getter

9c729d7

dusterbloom requested a review from a team May 12, 2026 11:41

dusterbloom closed this May 12, 2026

alt-glitch added type/refactor Code restructuring, no behavior change P3 Low — cosmetic, nice to have comp/agent Core agent loop, run_agent.py, prompt builder duplicate This issue or pull request already exists labels May 12, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(agent): extract 14 mixins from AIAgent (63.9% code reduction)#24334

refactor(agent): extract 14 mixins from AIAgent (63.9% code reduction)#24334
dusterbloom wants to merge 189 commits into
NousResearch:mainfrom
dusterbloom:refactor-mixin-extraction-2

dusterbloom commented May 12, 2026

Uh oh!

dusterbloom commented May 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

20 participants

Conversation

dusterbloom commented May 12, 2026

AIAgent Loop/Codebase Refactoring — PR #23978

Overview

Results

The 4 Irreducible Methods Remaining

14 Mixin Modules Created

Testing

Unit Tests

Subset Test Results (latest run)

Live E2E Test

Commit History (19 phases)

Patterns & Lessons Learned

@staticmethod Cascade Bug

Multi-line Signature Truncation

Property vs Instance Method

Monkeypatch Compatibility

Files Changed

New Files (14 mixins + 1 test file per mixin)

Modified Files

Reviewer Checklist

Impact

For LLMs

For Human Devs

For Future Refactoring

Next Steps for Maintainer

Uh oh!

dusterbloom commented May 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

20 participants