perf(startup): lazy-import OpenAI, Anthropic, Firecrawl, account_usage #17046
Merged
Four heavy SDK/module imports are now deferred off the hot startup path.
Net savings on cold module imports:
cli                      1200 → 958 ms  (-242)
run_agent                1220 → 901 ms  (-319)
tools.web_tools           711 → 423 ms  (-288)
agent.anthropic_adapter   230 →  15 ms  (-215)
agent.auxiliary_client    253 →  68 ms  (-185)
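Cold-import figures of this kind can be approximated by timing each import in a fresh interpreter, so that nothing is already cached in sys.modules. The following is a minimal sketch under that assumption; the helper name cold_import_ms is hypothetical, and this is not necessarily how the numbers above were measured.

```python
import subprocess
import sys


def cold_import_ms(module_name: str) -> float:
    """Time `import module_name` in a fresh interpreter, in milliseconds."""
    # The child process starts with an empty import cache for the target,
    # so the measurement includes every transitive import it triggers.
    snippet = (
        "import sys, time\n"
        "start = time.perf_counter()\n"
        "__import__(sys.argv[1])\n"
        "print((time.perf_counter() - start) * 1000.0)\n"
    )
    out = subprocess.run(
        [sys.executable, "-c", snippet, module_name],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    # Use the last line in case the imported module prints on import.
    return float(out.strip().splitlines()[-1])


if __name__ == "__main__":
    for name in ("json", "decimal"):
        print(f"{name}: {cold_import_ms(name):.1f} ms")
```

For a per-module breakdown rather than a single total, CPython's `-X importtime` option prints the self and cumulative time of every import.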
Four independent changes in one PR since they all use the same pattern
and share the same risk profile (heavy SDK import → lazy proxy or
function-local import):
1. tools/web_tools.py:
'from firecrawl import Firecrawl' moved into _get_firecrawl_client(),
which is only called when backend='firecrawl'. Users on Exa/Tavily/
Parallel pay zero firecrawl cost.
2. cli.py + gateway/run.py:
'from agent.account_usage import ...' moved into the /limits handlers.
account_usage transitively pulls the OpenAI SDK chain; only needed
when the user runs /limits.
3. agent/anthropic_adapter.py:
'try: import anthropic as _anthropic_sdk' replaced with a cached
'_get_anthropic_sdk()' accessor. The three usage sites
(build_anthropic_client, build_anthropic_bedrock_client,
read_claude_code_credentials_from_keychain) now resolve via the
accessor. All pre-existing test patches of
'agent.anthropic_adapter._anthropic_sdk' keep working because the
accessor respects any value already in module globals.
4. agent/auxiliary_client.py AND run_agent.py:
'from openai import OpenAI' replaced with an '_OpenAIProxy()' module-
level object that looks like the OpenAI class but imports the SDK on
first call/isinstance check. This preserves:
- 15+ in-module OpenAI(...) construction sites in auxiliary_client
and the single site in run_agent's _create_openai_client (Python's
function-scope name lookup finds the proxy, forwards the call);
- 'patch("agent.auxiliary_client.OpenAI", ...)' and
'patch("run_agent.OpenAI", ...)' test patterns used by 28+ test
files (patch replaces the module attribute as usual).
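The proxy idea in item 4 can be sketched as follows. This is an illustration under stated assumptions, not the PR's actual _OpenAIProxy: the class name LazyClassProxy is hypothetical, and the standard-library fractions.Fraction stands in for the OpenAI class so the example runs without any SDK installed.

```python
import importlib


class LazyClassProxy:
    """Stand-in for a class; imports the real one on first use."""

    def __init__(self, module_name: str, class_name: str):
        self._module_name = module_name
        self._class_name = class_name
        self._cls = None  # real class, filled in on first use

    def _resolve(self):
        if self._cls is None:
            module = importlib.import_module(self._module_name)
            self._cls = getattr(module, self._class_name)
        return self._cls

    def __call__(self, *args, **kwargs):
        # Proxy(...) constructs an instance of the real class.
        return self._resolve()(*args, **kwargs)

    def __instancecheck__(self, obj):
        # isinstance(obj, proxy) consults type(proxy).__instancecheck__.
        return isinstance(obj, self._resolve())

    def __getattr__(self, name):
        # Reached only when normal lookup fails, e.g. for classmethods.
        if name.startswith("_"):
            raise AttributeError(name)  # avoid recursion on private names
        return getattr(self._resolve(), name)


# Module-level name: in-module calls and mock.patch both see this attribute.
Fraction = LazyClassProxy("fractions", "Fraction")


def half():
    return Fraction(1, 2)  # the real import is triggered here
```

Because the proxy is an ordinary module attribute, a test can still replace it with patch("some_module.Fraction", ...), and code inside the module picks up the replacement on its next global-name lookup.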
Tried two alternatives first:
- 'from openai._client import OpenAI': doesn't skip openai/__init__.py,
because importing a submodule always runs the parent package's
__init__.py first (the audit's hypothesis here was wrong).
- Module-level __getattr__: works for external attribute access, but
global-name lookups from code inside the module bypass __getattr__, so
in-module OpenAI(...) calls raise NameError.
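The second alternative's failure mode can be reproduced in isolation. The sketch below builds a throwaway in-memory module (the names demo, Client and build are hypothetical) with a module-level __getattr__ and shows that attribute access from outside goes through the hook, while a bare global name used inside the module does not.

```python
import types

SOURCE = '''
def __getattr__(name):
    # PEP 562 hook: called for failed attribute access on the module object.
    if name == "Client":
        return "lazily-resolved Client"
    raise AttributeError(name)


def build():
    # A bare global name is looked up in module globals, then builtins;
    # the module-level __getattr__ hook is not consulted.
    return Client
'''

demo = types.ModuleType("demo")  # hypothetical module built in memory
exec(SOURCE, demo.__dict__)

external = demo.Client  # attribute access on the module: uses __getattr__

try:
    demo.build()
    internal = "resolved"
except NameError as exc:
    internal = f"NameError: {exc}"

print(external)
print(internal)
```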
Note: 'openai' still loads on 'import cli' because
cli.py -> neuter_async_httpx_del() -> openai._base_client, and
run_agent.py -> code_execution_tool.py (module-level
build_execute_code_schema) -> _load_config() -> 'from cli import
CLI_CONFIG'. Deferring those is a separate, larger change — out of scope
for this PR. The savings above all come from avoiding the openai/*,
anthropic/*, and firecrawl/* top-level type-tree imports on paths that
don't need them.
Verified:
- 302/302 tests in tests/agent/{test_anthropic_adapter,
test_bedrock_1m_context, test_minimax_provider, test_anthropic_keychain}
pass. Two pre-existing failures on main unchanged.
- 106/106 tests/agent/test_auxiliary_client.py pass (1 pre-existing fail).
- 97/97 tests/run_agent/test_create_openai_client_kwargs_isolation.py,
test_plugin_context_engine_init.py, test_invalid_context_length_warning.py,
test_api_max_retries_config.py,
tests/hermes_cli/test_gemini_provider.py, test_ollama_cloud_provider.py
pass (1 pre-existing fail).
- Live hermes chat smoke: 2 turns + /model switch + tool calls, zero
errors in the 57-line agent.log window.
- Module-level import of run_agent + auxiliary_client + anthropic_adapter
no longer pulls 'anthropic' or 'firecrawl' at all.
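A claim such as "importing X no longer pulls Y" can be checked mechanically by importing X in a fresh interpreter and inspecting sys.modules. A minimal sketch, assuming the target is importable in the current environment; the helper name modules_loaded_by is hypothetical, and json/sqlite3 are stand-ins for the project modules and SDKs.

```python
import subprocess
import sys
from typing import Iterable, Set


def modules_loaded_by(target: str, candidates: Iterable[str]) -> Set[str]:
    """Return the candidates present in sys.modules after importing target."""
    # Run in a child interpreter so this process's own imports don't leak in.
    snippet = (
        "import sys\n"
        "__import__(sys.argv[1])\n"
        "print(' '.join(m for m in sys.argv[2:] if m in sys.modules))\n"
    )
    out = subprocess.run(
        [sys.executable, "-c", snippet, target, *candidates],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    lines = out.splitlines()
    # The last line is ours even if the target prints during import.
    return set(lines[-1].split()) if lines else set()


if __name__ == "__main__":
    print(modules_loaded_by("json", ["json", "sqlite3"]))
```

Run against the real project, the same helper could be called with run_agent as the target and anthropic and firecrawl as candidates.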
…urface
CI caught two failures in tests/gateway/test_usage_command.py that I
missed locally:
AttributeError: 'module' object at gateway.run has no attribute 'fetch_account_usage'
The test uses monkeypatch.setattr('gateway.run.fetch_account_usage', ...)
to inject a fake account-fetch call. Moving the import inside the
handler deleted that module-level attribute, breaking the patch surface.
Restoring the top-level import in gateway/run.py gives up the ~230 ms
gateway-boot savings from that one lazy import, but:
1. the gateway is a long-running daemon — boot cost is paid once per
install, not per turn;
2. the other four lazy-imports (firecrawl, openai, anthropic, cli's
account_usage) remain in place and still account for the bulk of
the savings reported in the PR body;
3. preserving the patch surface keeps the established
'gateway.run.fetch_account_usage' monkeypatch pattern working
without touching tests.
Verified: tests/gateway/test_usage_command.py — 8 passed, 0 failed.
Full targeted sweep (2336 tests across agent/gateway/hermes_cli/run_agent):
2332 passed, 4 failed — all 4 pre-existing on main.
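The patch-surface breakage described above can be reproduced with the standard library alone. This sketch uses unittest.mock.patch.object rather than pytest's monkeypatch, and two hypothetical in-memory modules in which json.dumps stands in for fetch_account_usage; by default both patching tools refuse to replace an attribute that does not exist on the target.

```python
import types
from unittest import mock

LAZY = '''
def handler():
    from json import dumps  # function-local: never becomes a module attribute
    return dumps({"ok": True})
'''

EAGER = '''
from json import dumps  # top-level: visible as module.dumps


def handler():
    return dumps({"ok": True})
'''


def make_module(name, source):
    module = types.ModuleType(name)  # hypothetical module built in memory
    exec(source, module.__dict__)
    return module


def run_patched(module, attr):
    """Patch module.attr and call handler(); None if the attr is missing."""
    try:
        with mock.patch.object(module, attr, lambda obj: "patched"):
            return module.handler()
    except AttributeError:
        return None


lazy_mod = make_module("lazy_mod", LAZY)
eager_mod = make_module("eager_mod", EAGER)

print(run_patched(eager_mod, "dumps"))
print(run_patched(lazy_mod, "dumps"))
```

The eager module exposes the name, so the patch takes effect inside handler(); the lazy module has nothing to patch, which matches the AttributeError CI reported.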
cluricaun28
referenced
this pull request
in cluricaun28/Logos
Apr 28, 2026
…e (#17046)
Co-authored-by: teknium1 <teknium@users.noreply.github.com>
teknium1
added a commit
that referenced
this pull request
Apr 29, 2026
#17098)
Two amplifying optimizations to per-turn overhead in the gateway:
1. get_tool_definitions() memoization (model_tools.py)
Keyed on (frozenset(enabled), frozenset(disabled), registry._generation,
config.yaml mtime+size). Only active when quiet_mode=True (which is every
hot-path caller: gateway, AIAgent.__init__); quiet_mode=False keeps the
existing print side effects. Cached path returns a shallow-copy list
sharing read-only schema dicts.
Measured: 7.5 ms → 0.01 ms per call (~750× speedup). Gateway constructs
fresh AIAgent per message, so this saves ~7 ms/turn before any LLM work.
2. check_fn() TTL cache (tools/registry.py)
check_fn callables like check_terminal_requirements probe external state
(Docker daemon, Modal SDK, playwright binary). For a long-lived process,
hitting them on every get_definitions() pass was pure waste, since
external state changes on human timescales. 30 s TTL so env-var flips
(hermes tools enable X) propagate within a turn or two without explicit
invalidation.
Measured: first call 7.5 ms → 1.6 ms (check_fn probes now dominate);
subsequent calls ~0.01 ms via the upstream memoization.
Invalidation surface:
- registry._generation bumps on register/deregister/register_toolset_alias,
invalidating the memoized definitions automatically.
- config.yaml mtime in the cache key captures user-visible config edits
affecting dynamic schemas (execute_code mode, discord allowlist).
- invalidate_check_fn_cache() exposed for explicit flushes (e.g. after
hermes tools enable/disable).
- tests/conftest.py autouse fixture clears both caches before every test
so env-var monkeypatches don't see stale results.
Also fixes a regression from PR #17046 that I missed:
- tools/web_tools.py: Firecrawl was removed from module scope by the lazy
import, breaking 8 tests that patch 'tools.web_tools.Firecrawl'. Applied
the same _FirecrawlProxy pattern used in auxiliary_client/run_agent for
OpenAI (module-level proxy that looks like the class but imports the SDK
on first call/isinstance; patch() replaces the attribute as usual).
Verified:
- 49/49 tests/tools/test_web_tools_config.py pass (was 8 failing on main)
- 68/68 tests/tools/test_homeassistant_tool.py pass (was 1 failing in the
full suite due to check_fn TTL cross-test pollution; fixed by the autouse
fixture)
- 3887/3895 tests/tools/ (8 pre-existing fails: 2 delegate, 1 mcp dynamic
discovery, 5 mcp structured content; all confirmed on main)
- 2973/2976 tests/agent/ + tests/run_agent/ (3 pre-existing fails)
- 868/868 tests/run_agent/ (excluding test_run_agent.py which has
pre-existing suite-level issues)
- Live smoke: 2 turns + /model switch + tool calls, zero errors in
agent.log session window.
Co-authored-by: teknium1 <teknium@users.noreply.github.com>
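The check_fn() TTL cache described in the #17098 commit message can be sketched roughly as below. This is an illustration only: the class name TTLCheckCache, its methods and the injectable clock are assumptions made for the example, not the code in tools/registry.py.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCheckCache:
    """Cache boolean probe results for a fixed number of seconds."""

    def __init__(self, ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries: Dict[Callable[[], bool], Tuple[float, bool]] = {}

    def check(self, fn: Callable[[], bool]) -> bool:
        now = self._clock()
        hit = self._entries.get(fn)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: skip the expensive probe
        result = bool(fn())
        self._entries[fn] = (now, result)
        return result

    def invalidate(self) -> None:
        # Explicit flush, e.g. after the user toggles a tool.
        self._entries.clear()


if __name__ == "__main__":
    cache = TTLCheckCache(ttl_seconds=30.0)
    print(cache.check(lambda: True))
```

Passing the clock in as a parameter is a design choice for the sketch: it lets a test advance time deterministically instead of sleeping past the TTL.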
cluricaun28
referenced
this pull request
in cluricaun28/Logos
Apr 30, 2026
…s (#17098)
ulasbilgen
pushed a commit
to ulasbilgen/hermes-adhd-agent
that referenced
this pull request
May 1, 2026
NousResearch#17046)
donald131
pushed a commit
to donald131/hermes-agent
that referenced
this pull request
May 2, 2026
NousResearch#17046)
donald131
pushed a commit
to donald131/hermes-agent
that referenced
this pull request
May 2, 2026
NousResearch#17098)
02356abc
pushed a commit
to 02356abc/hermes-agent
that referenced
this pull request
May 14, 2026
NousResearch#17046)
02356abc
pushed a commit
to 02356abc/hermes-agent
that referenced
this pull request
May 14, 2026
NousResearch#17098)

Two amplifying optimizations to per-turn overhead in the gateway:

1. get_tool_definitions() memoization (model_tools.py)
   Keyed on (frozenset(enabled), frozenset(disabled),
   registry._generation, config.yaml mtime+size). Only active when
   quiet_mode=True (which is every hot-path caller: gateway,
   AIAgent.__init__); quiet_mode=False keeps the existing print side
   effects. Cached path returns a shallow-copy list sharing read-only
   schema dicts.
   Measured: 7.5 ms → 0.01 ms per call (~750× speedup). Gateway
   constructs a fresh AIAgent per message, so this saves ~7 ms/turn
   before any LLM work.

2. check_fn() TTL cache (tools/registry.py)
   check_fn callables like check_terminal_requirements probe external
   state (Docker daemon, Modal SDK, playwright binary). For a long-lived
   process, hitting them on every get_definitions() pass was pure waste,
   since external state changes on human timescales. 30 s TTL so env-var
   flips (hermes tools enable X) propagate within a turn or two without
   explicit invalidation.
   Measured: first call 7.5 ms → 1.6 ms (check_fn probes now dominate);
   subsequent calls ~0.01 ms via the upstream memoization.

Invalidation surface:
- registry._generation bumps on
  register/deregister/register_toolset_alias, invalidating the memoized
  definitions automatically.
- config.yaml mtime in the cache key captures user-visible config edits
  affecting dynamic schemas (execute_code mode, discord allowlist).
- invalidate_check_fn_cache() exposed for explicit flushes (e.g. after
  hermes tools enable/disable).
- tests/conftest.py autouse fixture clears both caches before every test
  so env-var monkeypatches don't see stale results.

Also fixes a regression from PR NousResearch#17046 that I missed:
- tools/web_tools.py: Firecrawl was removed from module scope by the
  lazy import, breaking 8 tests that patch 'tools.web_tools.Firecrawl'.
  Applied the same _FirecrawlProxy pattern used in auxiliary_client/
  run_agent for OpenAI (module-level proxy that looks like the class but
  imports the SDK on first call/isinstance; patch() replaces the
  attribute as usual).

Verified:
- 49/49 tests/tools/test_web_tools_config.py pass (was 8 failing on main)
- 68/68 tests/tools/test_homeassistant_tool.py pass (was 1 failing in
  the full suite due to check_fn TTL cross-test pollution; fixed by the
  autouse fixture)
- 3887/3895 tests/tools/ (8 pre-existing fails: 2 delegate, 1 mcp
  dynamic discovery, 5 mcp structured content, all confirmed on main)
- 2973/2976 tests/agent/ + tests/run_agent/ (3 pre-existing fails)
- 868/868 tests/run_agent/ (excluding test_run_agent.py, which has
  pre-existing suite-level issues)
- Live smoke: 2 turns + /model switch + tool calls, zero errors in
  agent.log session window.

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
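The two caches described in this commit message can be sketched in isolation as below. This is an illustrative sketch under stated assumptions, not the code in model_tools.py or tools/registry.py: the names cached_check, invalidate_check_cache, get_definitions, and the config_stamp parameter are hypothetical, and the injectable clock exists only to make the TTL behaviour easy to exercise.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical TTL cache for availability probes (30 s, as in the commit text).
CHECK_TTL_SECONDS = 30.0
_check_cache: Dict[str, Tuple[float, bool]] = {}


def cached_check(name: str, check_fn: Callable[[], bool],
                 now: Callable[[], float] = time.monotonic) -> bool:
    """Return check_fn()'s result, re-probing at most once per TTL window."""
    entry = _check_cache.get(name)
    current = now()
    if entry is not None and current - entry[0] < CHECK_TTL_SECONDS:
        return entry[1]  # still fresh: skip the external probe
    result = bool(check_fn())
    _check_cache[name] = (current, result)
    return result


def invalidate_check_cache() -> None:
    """Explicit flush, e.g. after a tool is enabled or disabled."""
    _check_cache.clear()


# Hypothetical memoization of built definitions, keyed like the commit text:
# enabled set, disabled set, registry generation, config file stamp.
_definitions_cache: Dict[tuple, List[dict]] = {}


def get_definitions(enabled: Iterable[str], disabled: Iterable[str],
                    generation: int, config_stamp: Tuple[float, int],
                    build: Callable[[], List[dict]]) -> List[dict]:
    key = (frozenset(enabled), frozenset(disabled), generation, config_stamp)
    cached = _definitions_cache.get(key)
    if cached is None:
        cached = _definitions_cache[key] = build()
    # Shallow copy: callers get their own list but share the schema dicts.
    return list(cached)
```

Bumping the generation counter or changing the config stamp produces a new key, so stale entries are simply never looked up again; only the probe cache needs an explicit flush.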
jsboige
pushed a commit
to jsboige/hermes-agent
that referenced
this pull request
May 14, 2026
NousResearch#17098)
dannyJ848
pushed a commit
to dannyJ848/hermes-agent
that referenced
this pull request
May 17, 2026
NousResearch#17046)
dannyJ848
pushed a commit
to dannyJ848/hermes-agent
that referenced
this pull request
May 17, 2026
NousResearch#17098)
gweeteve
pushed a commit
to gweeteve/hermes-agent
that referenced
this pull request
Jun 2, 2026
NousResearch#17046)
gweeteve
pushed a commit
to gweeteve/hermes-agent
that referenced
this pull request
Jun 2, 2026
NousResearch#17098)
Egavasyug
pushed a commit
to Egavasyug/hermes-agent
that referenced
this pull request
Jun 10, 2026
NousResearch#17046)
Egavasyug
pushed a commit
to Egavasyug/hermes-agent
that referenced
this pull request
Jun 10, 2026
NousResearch#17098)
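Summary
Four heavy SDK/module imports are now deferred off the hot startup path. Net savings on cold module imports:
cli 1200 → 958 ms (-242)
run_agent 1220 → 901 ms (-319)
tools.web_tools 711 → 423 ms (-288)
agent.anthropic_adapter 230 → 15 ms (-215)
agent.auxiliary_client 253 → 68 ms (-185)
Changes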
1. tools/web_tools.py: 'from firecrawl import Firecrawl' moved into _get_firecrawl_client(). Users on Exa/Tavily/Parallel pay zero firecrawl cost.
2. cli.py + gateway/run.py: 'from agent.account_usage import ...' moved into the /limits handlers. account_usage transitively pulls the OpenAI SDK chain.
3. agent/anthropic_adapter.py: 'try: import anthropic as _anthropic_sdk' replaced with a cached _get_anthropic_sdk() accessor. All pre-existing test patches of agent.anthropic_adapter._anthropic_sdk keep working because the accessor respects any value already in module globals.
4. agent/auxiliary_client.py + run_agent.py: 'from openai import OpenAI' replaced with a module-level _OpenAIProxy() object that looks like the OpenAI class but imports the SDK on first call/isinstance check. Preserves:
- 15+ in-module OpenAI(...) construction sites in auxiliary_client and the single site in run_agent's _create_openai_client (function-scope name lookup finds the proxy, forwards the call)
- patch("agent.auxiliary_client.OpenAI", ...) and patch("run_agent.OpenAI", ...) test patterns used by 28+ test files (patch replaces the module attribute as usual)
Alternatives tried and rejected:
- 'from openai._client import OpenAI': still triggers the full openai/__init__.py (the original audit's hypothesis here was wrong).
- Module-level __getattr__: works for external access, but Python function-scope name resolution skips __getattr__, so in-module OpenAI(...) calls raise NameError.
Known remaining eager import
openai still loads on 'import cli' via cli.py → neuter_async_httpx_del() → openai._base_client, and via run_agent.py → code_execution_tool.py's module-level build_execute_code_schema() → _load_config() → 'from cli import CLI_CONFIG'. Deferring those is a larger change and out of scope for this PR. The savings above all come from avoiding the heavy type-tree imports on paths that don't need them.
Validation
- Live hermes chat smoke: 2 turns + /model switch + tool calls
- 'import run_agent' standalone no longer pulls anthropic/firecrawl
Phase 1 item 3 of the optimization sweep.
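The reason the module-level __getattr__ alternative was rejected can be reproduced in isolation. The sketch below is an illustration only: the module name demo_lazy, the name Heavy, and the use of dict as a stand-in for a lazily imported class are assumptions, and the module is built in memory so the example needs no files on disk.

```python
import types

# Source of a throwaway module that tries to provide 'Heavy' lazily through
# a module-level __getattr__ (PEP 562) instead of a real module attribute.
MODULE_SOURCE = '''
def __getattr__(name):
    # Consulted only for attribute access on the module object itself.
    if name == "Heavy":
        return dict  # stand-in for a lazily imported class
    raise AttributeError(name)

def build():
    # Bare name lookup: checks module globals, then builtins.
    # It never calls the module-level __getattr__.
    return Heavy()
'''


def run_demo():
    mod = types.ModuleType("demo_lazy")
    exec(MODULE_SOURCE, mod.__dict__)

    # External access goes through attribute lookup, so the hook fires.
    external = mod.Heavy

    # In-module use is a plain global lookup, so the hook is skipped.
    try:
        mod.build()
    except NameError as exc:
        return external, exc
    return external, None
```

Calling run_demo() returns the class obtained through external attribute access together with the NameError raised by the in-module call, which is why a real module-level object (the proxy) was needed for the in-module OpenAI(...) construction sites.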