
[Bug]: get_tool_definitions() quiet-mode cache pollution causes duplicate LCM tool schemas in Gateway #17335

@alansong63

Description


Symptoms

Gateway sessions (Feishu, etc.) fail with HTTP 400 on provider API calls, with errors such as:

  • DeepSeek: Error from provider (DeepSeek): Tool names must be unique.
  • Xiaomi MiMo: tools contains duplicate names: lcm_expand
  • Moonshot/Kimi: function name lcm_grep is duplicated

TUI sessions are unaffected.
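All three provider errors come from the same check: the tools array sent with the request contains two entries that share a function name. A minimal client-side check that reproduces this condition might look like the sketch below. The helper name `find_duplicate_tool_names` and the `read_file` tool are hypothetical, and the sketch assumes the OpenAI-style `{"type": "function", "function": {...}}` wrapping that run_agent.py uses for the LCM schemas.

```python
from collections import Counter


def find_duplicate_tool_names(tools):
    """Hypothetical helper: return tool names that appear more than once."""
    counts = Counter(t["function"]["name"] for t in tools)
    return sorted(name for name, n in counts.items() if n > 1)


# A tools payload in the shape the Gateway ends up sending (schemas trimmed to names).
tools = [
    {"type": "function", "function": {"name": "read_file"}},   # placeholder tool
    {"type": "function", "function": {"name": "lcm_grep"}},
    {"type": "function", "function": {"name": "lcm_expand"}},
    {"type": "function", "function": {"name": "lcm_grep"}},    # duplicate
    {"type": "function", "function": {"name": "lcm_expand"}},  # duplicate
]

print(find_duplicate_tool_names(tools))  # ['lcm_expand', 'lcm_grep']
```

Running a check like this just before the API call would turn the provider's HTTP 400 into a local, easier-to-debug failure.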

Environment:

Root Cause

Commit #17098 (perf(tools): memoize get_tool_definitions + TTL-cache check_fn results) introduced a quiet_mode cache in model_tools.py. The cache-hit path (line 278, return list(cached)) is safe because it returns a copy. The first, uncached call (lines 280-283), however, returns the very list object it stores in the cache, so any caller that mutates the returned list also mutates the cache.

Later, run_agent.py (lines 1986-1993) appends the LCM context engine tool schemas to self.tools without checking for duplicates:

for _schema in self.context_compressor.get_tool_schemas():
    _wrapped = {"type": "function", "function": _schema}
    self.tools.append(_wrapped)  # mutates shared cached list

Note: memory tool injection (lines 1728-1748) already has dedup logic; context engine tool injection does NOT.

Why TUI is unaffected: TUI either calls with quiet_mode=False (no caching) or runs as a short-lived process, so repeated agent inits never share a polluted cache.

Reproduction

Verified with a local script: clear the cache, call get_tool_definitions(quiet_mode=True), mutate the returned list, then call it again; the mutation shows up in the second result because it was applied to the cached list itself. In a long-lived Gateway process, each agent init appends the LCM tools again, causing duplicates to accumulate.

Suggested Fix

Two places to harden:

  1. model_tools.py: On the first (uncached) call, cache the original list but return a shallow copy, matching the defensive copy already made on the cache-hit path (line 278).

  2. run_agent.py: Add dedup logic to context engine tool injection (lines 1986-1993), mirroring the memory tools dedup (lines 1728-1748).

Either fix alone stops the symptom; both together provide defense-in-depth.
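For fix 2, a name-based guard on the injection loop would be enough to keep duplicates out even if the list it appends to is already polluted. The memory tool dedup at lines 1728-1748 is not quoted in this issue, so the sketch below assumes dedup by function name; `inject_tool_schemas` is a hypothetical helper, and the schemas are assumed to be unwrapped dicts with a "name" key, as the run_agent.py snippet suggests.

```python
def inject_tool_schemas(tools, schemas):
    """Hypothetical helper: append wrapped schemas, skipping names already present."""
    existing = {t["function"]["name"] for t in tools}
    for schema in schemas:
        name = schema["name"]
        if name in existing:
            continue  # already registered (e.g. from a polluted cache)
        tools.append({"type": "function", "function": schema})
        existing.add(name)  # also guards against duplicates within `schemas`
    return tools


# A list that already carries the LCM tools from an earlier agent init.
polluted = [
    {"type": "function", "function": {"name": "read_file"}},
    {"type": "function", "function": {"name": "lcm_grep"}},
    {"type": "function", "function": {"name": "lcm_expand"}},
]
lcm = [{"name": "lcm_grep"}, {"name": "lcm_expand"}]

inject_tool_schemas(polluted, lcm)
print([t["function"]["name"] for t in polluted])  # ['read_file', 'lcm_grep', 'lcm_expand']
```

On its own this stops the duplicate names but still lets the first agent's appends live in the cache, which is why pairing it with the copy-on-miss change gives the defense-in-depth described above.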

Workaround

Restarting the Gateway clears the in-process cache, but only temporarily: the bug recurs as soon as new agent inits append the LCM tools again.

Metadata


Assignees

No one assigned

    Labels

    • P1 (High: major feature broken, no workaround)
    • comp/agent (Core agent loop, run_agent.py, prompt builder)
    • comp/gateway (Gateway runner, session dispatch, delivery)
    • comp/tools (Tool registry, model_tools, toolsets)
    • type/bug (Something isn't working)


    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
