fix(tools): isolate get_tool_definitions quiet_mode cache + dedup LCM injection (#17335) #17889

Merged
teknium1 merged 1 commit into main from hermes/hermes-a26af027 on Apr 30, 2026
@teknium1


Salvages #17337 by @Sanjays2402 onto current main. Closes #17335.

Long-lived Gateway processes were sending duplicate tool names to providers that enforce uniqueness (DeepSeek, Xiaomi MiMo, Moonshot/Kimi → HTTP 400). TUI was unaffected because it runs with quiet_mode=False and skips the cache.

Root cause (two layered bugs)

  1. model_tools.get_tool_definitions(quiet_mode=True) aliased its cached list on the first uncached call. The cache-hit path already returned list(cached), but the first call stored and returned the same object. run_agent.py then mutates self.tools in place, so the first agent init poisoned the cache and every subsequent init re-appended LCM schemas.
  2. run_agent.py LCM context-engine injection had no dedup, unlike the memory-tools injection right above it.

Fix (defense in depth)

  • model_tools.py — cache the result then return list(result) on the uncached branch, mirroring the cache-hit path
  • run_agent.py — build _existing_tool_names from self.tools and skip already-present schemas, mirroring memory-tools dedup

Validation

scripts/run_tests.sh tests/test_get_tool_definitions_cache_isolation.py tests/test_model_tools.py
29 passed in 3.98s

5 new regression tests pin the behavior; 23 existing test_model_tools.py tests still pass. Authorship preserved for @Sanjays2402.

… injection (#17335)

Long-lived Gateway processes were sending duplicate tool names to
providers that enforce uniqueness:

  - DeepSeek:        'Tool names must be unique.'
  - Xiaomi MiMo:     'tools contains duplicate names: lcm_expand'
  - Moonshot/Kimi:   'function name lcm_grep is duplicated'

TUI was unaffected because TUI runs with quiet_mode=False and skips the
cache entirely.

Root cause (two layered bugs)
- model_tools.get_tool_definitions(quiet_mode=True) memoizes its result
  in _tool_defs_cache. The cache-hit path returned list(cached) (safe),
  but the FIRST uncached call stored and returned the SAME object.
  run_agent.py mutates self.tools (memory + LCM context-engine schemas)
  in-place, so the very first agent init in a Gateway process
  poisoned the cache, and every subsequent init appended LCM schemas
  again on top of the already-polluted list.
- run_agent.py's context-engine injection (lcm_grep / lcm_describe /
  lcm_expand) had no dedup, unlike the memory-tools injection right
  above it which already skips already-present names.

Fix (defense in depth, per the issue's suggested fix)
- model_tools.get_tool_definitions: on the uncached branch, cache the
  computed list but return list(result) to the caller. Same pattern as
  the cache-hit path.
- run_agent.py: build _existing_tool_names from self.tools and skip
  schemas whose names are already present, mirroring the memory-tools
  block. This also defends against plugin paths that may register the
  same schemas via ctx.register_tool().
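
The name-based dedup in the run_agent.py bullet can be sketched as follows. The helper inject_schemas and the schema shape are assumptions for illustration; the real code works on self.tools and builds _existing_tool_names inline.

```python
def inject_schemas(tools, new_schemas):
    """Append schemas to tools, skipping any whose name is already present."""
    # Collect the names already in the list (assumed OpenAI-style schema shape).
    existing_tool_names = {
        t.get("function", {}).get("name") for t in tools
    }
    for schema in new_schemas:
        name = schema.get("function", {}).get("name")
        if name in existing_tool_names:
            continue  # already injected, e.g. by an earlier init or a plugin
        tools.append(schema)
        existing_tool_names.add(name)
    return tools


def _schema(name):
    # Illustrative helper for building a minimal schema.
    return {"type": "function", "function": {"name": name}}


lcm = [_schema("lcm_grep"), _schema("lcm_describe"), _schema("lcm_expand")]
tools = [_schema("terminal"), _schema("lcm_grep")]  # lcm_grep already present
inject_schemas(tools, lcm)
inject_schemas(tools, lcm)  # second injection is a no-op
names = [t["function"]["name"] for t in tools]
```

Because the check is on names rather than object identity, it holds even when the same schema arrives as a distinct dict from another registration path.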

Tests (tests/test_get_tool_definitions_cache_isolation.py)
- test_first_uncached_call_returns_fresh_list: pins the fix; without
  it, first-call alias caused all the symptoms.
- test_cache_hit_returns_fresh_list: pre-existing behavior stays.
- test_caller_mutation_does_not_poison_cache: simulates run_agent
  appending lcm_grep / lcm_expand to the returned list and asserts the
  next call doesn't see them.
- test_repeated_caller_mutation_does_not_accumulate: reproduces the
  long-lived Gateway accumulation pattern across 5 agent inits.
- test_non_quiet_mode_does_not_use_cache: sanity, explains why TUI
  was fine.

5/5 pass on the new file; 23/23 still pass on tests/test_model_tools.py.
@alt-glitch alt-glitch added type/bug Something isn't working P1 High — major feature broken, no workaround comp/tools Tool registry, model_tools, toolsets comp/agent Core agent loop, run_agent.py, prompt builder labels Apr 30, 2026
@teknium1 teknium1 merged commit e0fa2cf into main Apr 30, 2026
12 checks passed
@teknium1 teknium1 deleted the hermes/hermes-a26af027 branch April 30, 2026 11:32

Development

Successfully merging this pull request may close these issues.

[Bug]: get_tool_definitions() quiet-mode cache pollution causes duplicate LCM tool schemas in Gateway
