[Reliability] Repeated tool-call errors (path/retry patterns) inflate latency; need stronger fail-fast + dedupe #35114

@Urfread

Summary

Repeated tool-call failures in a single turn can cascade into extra retries/attempts and significantly increase latency.

Observed symptoms (local logs)

Top tool errors:

  • terminal: 41 errors
  • read_file: 12 errors
  • search_files: 5 errors
  • patch: 4 errors

Common signature classes:

  • path not found / bad path normalization
  • repeated "tool returned error" results within the same turn
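To make the path-normalization class concrete, here is a minimal sketch of what a pre-flight normalizer could look like. The helper name `normalize_windows_path` is hypothetical and not taken from the Hermes codebase, and the specific malformations handled (wrapping quotes, mixed or doubled separators, `.`/`..` segments) are assumptions, since the logs above only give the error class, not the exact inputs.

```python
import ntpath


def normalize_windows_path(raw: str) -> str:
    """Return a canonical Windows-style form of a tool-supplied path.

    Hypothetical helper for illustration; not part of any existing API.
    """
    # Drop surrounding whitespace and one pair of wrapping quotes.
    cleaned = raw.strip()
    if len(cleaned) >= 2 and cleaned[0] == cleaned[-1] and cleaned[0] in "\"'":
        cleaned = cleaned[1:-1]
    # ntpath applies Windows rules on any host OS: "/" becomes "\",
    # duplicate separators collapse, and "." / ".." are resolved lexically.
    return ntpath.normpath(cleaned)
```

Normalization here is purely lexical; it does not check that the path exists, so a "path not found" error can still occur afterwards and would need to be handled by the retry policy.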

Expected behavior

  • Error-class-aware retry policy (non-retryable errors, such as path errors, should fail fast).
  • Duplicate-error suppressor within a turn.
  • Better remediation hints after first failure (especially path normalization on Windows).
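The three points above could be combined in a small per-turn guard. The following is a rough sketch under stated assumptions, not a description of how the agent loop works today: `TurnErrorGuard`, `classify`, `remediation_hint`, the marker substrings, and the retry limit are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class ErrorClass(Enum):
    NON_RETRYABLE = "non_retryable"  # retrying the identical call cannot help
    RETRYABLE = "retryable"          # e.g. timeouts or other transient failures


# Assumed substrings; real tool error messages may be worded differently.
PATH_ERROR_MARKERS = ("path not found", "no such file", "cannot find the path")


def classify(message: str) -> ErrorClass:
    """Map an error message to an error class by substring match."""
    text = message.lower()
    if any(marker in text for marker in PATH_ERROR_MARKERS):
        return ErrorClass.NON_RETRYABLE
    return ErrorClass.RETRYABLE


def remediation_hint(message: str) -> Optional[str]:
    """Return a short hint to surface after the first failure, if one applies."""
    if classify(message) is ErrorClass.NON_RETRYABLE:
        return "Check the path: normalize separators and verify it exists before retrying."
    return None


@dataclass
class TurnErrorGuard:
    """Tracks failed tool calls within one turn; create a new guard per turn."""

    max_retries: int = 2
    _failures: dict = field(default_factory=dict, init=False)  # (tool, args) -> count

    def already_failed(self, tool: str, args: str) -> bool:
        """Duplicate-error check: has this exact call already failed this turn?"""
        return (tool, args) in self._failures

    def should_retry(self, tool: str, args: str, message: str) -> bool:
        """Record a failure and decide whether the same call may be retried."""
        key = (tool, args)
        self._failures[key] = self._failures.get(key, 0) + 1
        if classify(message) is ErrorClass.NON_RETRYABLE:
            return False  # fail fast on path-class errors
        return self._failures[key] <= self.max_retries
```

In this sketch the loop would call `already_failed` before executing a tool call to skip exact repeats, and `should_retry` after a failure. Keying on the exact `(tool, args)` pair is a deliberate simplification: it suppresses identical repeats but still allows a corrected call, for example one with a normalized path, to go through.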

Actual behavior

  • Failing calls can repeat within the same turn, adding avoidable token and wall-clock cost.

Related existing discussions (possible overlap)

Why file this anyway

Even if some sub-problems are already tracked, the behavior is still reproducible in real usage and carries a high cost for users.

Environment

  • OS: Windows 11
  • Hermes profile: default

Metadata

Labels

  • P2: Medium, degraded but workaround exists
  • comp/agent: Core agent loop, run_agent.py, prompt builder
  • comp/tools: Tool registry, model_tools, toolsets
  • type/feature: New feature or request
