Agent mode: add fallback strategy when tool calls fail #1641

@yitang777888

Description

Problem

When the Agent runs a task that depends on external services (web search, API calls)
and one of those services fails because of anti-bot protection or a timeout, the Agent
keeps retrying the same tool call until the whole task fails.

It does not switch to an alternative, degrade gracefully, or plan fallback paths in
advance. Users running automated or scheduled tasks are hit hardest: the task fails
silently and produces no output.

Proposed solution

  1. Threshold-triggered fallback: after N consecutive failures of the same tool
    (e.g., the same search provider), stop retrying and switch to a declared fallback path.

  2. Pre-declared degradation list: during the planning phase, the Agent declares its
    fallback options (e.g., search → direct fetch → ask user) and executes them
    automatically, in order, when the primary path fails.

  3. URL pattern inference: when search is blocked, the Agent tries common site URL
    patterns (e.g., /news, /announcements) based on domain knowledge,
    instead of giving up.
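
To make items 1 and 2 concrete, here is a minimal sketch of how a threshold plus a pre-declared degradation list could fit together. It is only an illustration, not the Agent's actual code: `run_with_fallbacks`, `FallbackResult`, `max_failures`, and the two stub tools are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class FallbackResult:
    value: Optional[str]   # output of the step that succeeded, if any
    used: Optional[str]    # name of the step that succeeded, if any
    errors: List[str]      # every failure seen along the way


def run_with_fallbacks(
    steps: List[Tuple[str, Callable[[], str]]],
    max_failures: int = 3,
) -> FallbackResult:
    """Try each declared step in order.

    A step is abandoned after `max_failures` consecutive failures,
    and the next step in the degradation list is tried instead.
    """
    errors: List[str] = []
    for name, call in steps:
        for attempt in range(1, max_failures + 1):
            try:
                return FallbackResult(call(), name, errors)
            except Exception as exc:  # a real agent would catch narrower errors
                errors.append(f"{name} attempt {attempt}: {exc}")
    # Every path failed: hand back the error log instead of failing silently.
    return FallbackResult(None, None, errors)


# Hypothetical stand-ins for real tools.
def blocked_search() -> str:
    raise TimeoutError("search provider blocked the request")


def direct_fetch() -> str:
    return "<html>announcements</html>"


result = run_with_fallbacks(
    [("search", blocked_search), ("direct_fetch", direct_fetch)],
    max_failures=2,
)
print(result.used, len(result.errors))
```

The point of returning the error log even when every step fails is that a scheduled run can still write something to its report, which addresses the "silently fails with no output" complaint.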

Use case

I run a daily automated task that searches university websites for announcements
and generates a report. When search engines block the request, the entire task stops
and no document is generated, even though the information could have been retrieved
by fetching the target URLs directly.
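
As a rough illustration of fetching target URLs directly, the sketch below builds candidate pages from common path patterns and returns the first one that loads. Everything here is an assumption made for the example: `COMMON_PATHS`, `candidate_urls`, `first_reachable`, the `example.edu` domain, and the injected `fetch` callable (a stand-in for whatever fetch tool the Agent really has).

```python
from typing import Callable, List, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical path guesses; a real agent would derive these from domain knowledge.
COMMON_PATHS = ["/news", "/announcements"]


def candidate_urls(base_url: str, paths: List[str] = COMMON_PATHS) -> List[str]:
    """Build candidate page URLs for a site from common path patterns."""
    return [urljoin(base_url, path) for path in paths]


def first_reachable(
    base_url: str,
    fetch: Callable[[str], str],
    paths: List[str] = COMMON_PATHS,
) -> Tuple[Optional[str], Optional[str]]:
    """Return (url, body) for the first candidate that `fetch` can load."""
    for url in candidate_urls(base_url, paths):
        try:
            return url, fetch(url)
        except Exception:  # treat any fetch error as "try the next pattern"
            continue
    return None, None


# Stub fetcher for the demo: only the /announcements page "exists".
def fake_fetch(url: str) -> str:
    if url.endswith("/announcements"):
        return "Campus closed on Friday"
    raise ConnectionError(f"cannot load {url}")


url, body = first_reachable("https://www.example.edu", fake_fetch)
print(url, body)
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP library and lets the logic be tested offline.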

Alternatives considered

  • System prompt workaround: adding "if a tool fails, try an alternative" to the
    system prompt helps, but is not reliable in unattended scheduled runs.
  • Manual URL list: hardcoding target URLs into the task works, but defeats the
    purpose of an autonomous agent.
  • Claude Code: handles this automatically — reads fallback paths from the task
    document, switches to direct URL fetch, and completes the task. The gap is not
    model capability but the absence of a built-in "if this fails, try that" decision
    branch.

Impact

This affects every automated/scheduled Agent task that touches external services.
Without fallback support, unattended tasks are unreliable by default. Adding this
would make scheduled Agent tasks production-grade rather than best-effort.

Additional context

Claude Code behavior for reference: when search was blocked, it automatically
switched to direct URL fetching using known site structures, completed the task,
and generated the full output document. The key difference is the presence of
a fallback decision branch, not model intelligence.

(I didn't know how to describe the problems I ran into, so I asked DeepSeek and Claude for help. They edited the text above.)

Metadata

Assignees: No one assigned
Labels: bug, enhancement, question
Status: Backlog