Skip to content

Make flash->pro escalation threshold configurable (default 3 fires too early on long subagent runs) #628

@esengine

Description

@esengine

Problem

The auto-escalation threshold from flash -> pro is hard-coded to 3 in src/loop/turn-failure-tracker.ts:3. The TurnFailureTracker counts repair signals per turn (scavenged, truncated, repeat-loop), and the third one crosses into pro for the rest of the turn.

3 is reasonable for the average case but routinely fires "too early" on long subagent runs where flash naturally hits 2-3 minor tool-call argument truncations on big edit blocks without actually being stuck. Once it crosses to pro, the rest of the turn costs ~3x more, so the false-positive rate matters for cost.

There's no way to tune this short of editing the file. A user who's comfortable paying for one extra repair before escalating should be able to set it.

Proposed change

Make the threshold configurable, with the current value as the default. Two surfaces (mirrors the --budget / config.budgetUsd pattern):

  1. Config key: escalation.failureThreshold?: number in ReasonixConfig (default 3).
  2. Optional CLI flag on chat / code: --escalate-after <n> for one-shot overrides.

Plumb the resolved value into TurnFailureTracker's constructor (currently parameterless) and let the loop pass it through.

If the value is out of a sane range (e.g. < 1 or > 20), warn and fall back to the default — same lenient parsing as parseBudgetFlag.

Scope

  • src/loop/turn-failure-tracker.ts — accept threshold via constructor; export the default constant.
  • src/loop.ts — read config.escalation?.failureThreshold once at construction and pass into the tracker.
  • src/config.ts — add the optional config shape.
  • src/cli/index.ts--escalate-after flag, lenient parse, resolution chain (flag > config > default).
  • tests/ — one case: threshold=5 takes 5 signals before crossing; threshold=3 (default) preserves current behavior; invalid value warns + falls back.

Out of scope:

  • Per-failure-kind weighting (e.g. "truncated counts 0.5, repeat-loop counts 2").
  • Slash-command runtime toggle.
  • Telemetry on actual escalation rate.

Good first issue

Mechanical plumbing across three files (config -> loop -> tracker) with a clear test surface. The existing --budget / parseBudgetFlag pattern is the template. Comment-policy rules in CLAUDE.md apply.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions