TUI: retry-with-feedback paradigm for transient channel-resolution failures (follow-on to #1368) #1418

@Aaronontheweb

Summary

Follow-on to #1368. That PR made the channel allow-list store the platform's immutable channel ID and resolve human display names dynamically — and, per the rule "we don't persist a display name we can't map to a real channel ID," it drops an unmappable display name (with a loud warning) rather than persisting an inert allow-list entry.

That's correct for deterministic failures (channel not found, bot lacks channels:read, invalid token). But it's a blunt response to a transient failure (network blip, timeout, 429, 5xx): we drop a just-typed name that would have resolved a second later, because in that instant we can't obtain its ID. We can't keep the name (no ID = no stable ACL key), but immediately discarding it on a recoverable error is poor UX.

This issue proposes a standard "resolve-with-retry" UI paradigm for any input whose persisted form must be obtained from a remote that may be transiently unavailable — channels today, but also provider endpoints and search backends.

The paradigm: a "Resolving" pending state with bounded auto-retry

A reference that requires remote resolution to obtain its canonical/stable form moves through four visible states instead of resolve-or-drop:

| State | Meaning | Rendered as | Persisted? |
| --- | --- | --- | --- |
| Resolved | canonical ID known | #display-name (display looked up live from the ID) | ✅ the ID |
| Resolving | submitted, awaiting the probe | ⠋ resolving #netclaw-test… (spinner) | ❌ pending |
| Retrying | transient failure, auto-retry pending | ⠋ #netclaw-test — network error, retry 2/4 in 3s… (spinner + live countdown) | ❌ pending |
| Unmappable | deterministic failure, or retries exhausted | warning row + [r] Retry affordance + reason | ❌ dropped |

Transitions

  • submit → Resolving
  • Resolving + success → Resolved (persist the ID)
  • Resolving + deterministic failure (not-found / auth / scope) → Unmappable (drop + reason) — fail fast, today's behavior
  • Resolving + transient failure (network / timeout / 429 / 5xx) → Retrying
  • Retrying countdown elapses → Resolving (next attempt)
  • Retrying, attempts exhausted → Unmappable ("couldn't reach after N tries")
  • Unmappable + [r] → Resolving (reset attempts)

Backoff: small bounded exponential (e.g. 1s → 2s → 4s → 8s cap, ~4 attempts), with the countdown shown so the wait is legible. A pending reference is treated as not in the ACL until it resolves — never an inert name on disk.
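
The transitions and backoff above can be sketched as a pure function pair, which keeps them testable without a TUI or a network. This is a minimal sketch, not existing Netclaw code: `RefState`, `ProbeOutcome`, and `ResolveRetryPolicy` are hypothetical names, the constants come from the "1s → 2s → 4s → 8s cap, ~4 attempts" example, and it assumes the first submit counts as attempt 1.

```csharp
using System;

public enum RefState { Resolving, Retrying, Resolved, Unmappable }

public enum ProbeOutcome { Success, DeterministicFailure, TransientFailure }

// Hypothetical policy type; values are assumptions taken from the example above.
public static class ResolveRetryPolicy
{
    public const int MaxAttempts = 4;
    private static readonly TimeSpan BaseDelay = TimeSpan.FromSeconds(1);
    private static readonly TimeSpan MaxDelay = TimeSpan.FromSeconds(8);

    // Wait before the next attempt, given how many attempts have failed so far (1-based).
    public static TimeSpan DelayAfter(int failedAttempts)
    {
        if (failedAttempts < 1)
            throw new ArgumentOutOfRangeException(nameof(failedAttempts));

        // Clamp the exponent so the shift cannot overflow for large counts.
        var exponent = Math.Min(failedAttempts - 1, 30);
        var seconds = BaseDelay.TotalSeconds * (1L << exponent);
        return TimeSpan.FromSeconds(Math.Min(seconds, MaxDelay.TotalSeconds));
    }

    // State to enter once probe attempt number `attempt` (1-based) has completed.
    public static RefState Next(ProbeOutcome outcome, int attempt)
    {
        switch (outcome)
        {
            case ProbeOutcome.Success:
                return RefState.Resolved;            // persist the ID
            case ProbeOutcome.DeterministicFailure:
                return RefState.Unmappable;          // fail fast, today's behavior
            case ProbeOutcome.TransientFailure:
                return attempt < MaxAttempts
                    ? RefState.Retrying              // countdown, then Resolving again
                    : RefState.Unmappable;           // retries exhausted
            default:
                throw new ArgumentOutOfRangeException(nameof(outcome));
        }
    }
}
```

With four total attempts only the 1s, 2s and 4s waits are ever used; the 8s cap matters only if the attempt budget is raised. A 429 that carries Retry-After would presumably override `DelayAfter` for that attempt.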

Requirements this exposes

  1. Structured failure classification on the probe. Today the resolution result carries a free-text ErrorMessage; the editor cannot tell transient from deterministic without string-matching. Add a category to SlackChannelResolutionResult / DiscordChannelResolutionResult / MattermostChannelResolutionResult (e.g. ResolutionFailureKind { Transient, Deterministic } or a bool IsRetryable), set from the transport layer (HTTP 5xx/timeout/connection = transient; missing_scope/invalid_auth/not-found = deterministic; 429 = transient with Retry-After honored).
  2. A tri-state channel reference in the editor model. Replace the flat List<string> of ids with something that can hold Resolved(id) / Pending(typedName, attempt, nextRetryAt) / Unmappable(typedName, reason), so a pending entry is renderable and cancellable but not persistable.
  3. Termina-native, non-blocking implementation (see the termina-tui-patterns skill added in feat(init,config): simplified init, rebuilt config TUI, canonical channel-ID resolution #1368):
    • the Resolving/Retrying row uses a self-animating SpinnerNode (bubbles invalidation → RequestRedraw; no manual tick);
    • the countdown is an IAnimatedTextSegment like ElapsedTimeSegment (1 Hz tick → invalidation → redraw);
    • the retry loop is a tracked task + owned CancellationTokenSource, off the loop thread, publishing via RequestRedraw, with no .GetAwaiter().GetResult();
    • navigating away / saving cancels-and-awaits the in-flight retry; pending refs are excluded from the saved ACL.
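
Requirements 1 and 2 could look roughly like the sketch below. Only `ResolutionFailureKind` and the Resolved / Pending / Unmappable shapes come from this issue. `FailureClassifier`, `ChannelReference`, and `ChannelReferences.PersistableIds` are hypothetical names, and treating HTTP 408 as transient and unknown exceptions as deterministic are assumptions to be checked against each platform's transport layer.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Http;

public enum ResolutionFailureKind { Transient, Deterministic }

// Hypothetical helper: the transport layer sets the kind, so the editor never string-matches.
public static class FailureClassifier
{
    public static ResolutionFailureKind FromStatus(HttpStatusCode status)
    {
        var code = (int)status;
        // 429 and 5xx are transient per the issue; 408 is an added assumption.
        return code == 408 || code == 429 || code >= 500
            ? ResolutionFailureKind.Transient
            : ResolutionFailureKind.Deterministic;
    }

    // Assumption: connection and timeout exceptions are transient, anything else is not.
    public static ResolutionFailureKind FromException(Exception ex) =>
        ex is HttpRequestException || ex is TimeoutException
            ? ResolutionFailureKind.Transient
            : ResolutionFailureKind.Deterministic;
}

// Closed hierarchy: the private constructor limits subtypes to the nested records.
public abstract record ChannelReference
{
    private ChannelReference() { }

    public sealed record Resolved(string Id) : ChannelReference;

    public sealed record Pending(string TypedName, int Attempt, DateTimeOffset? NextRetryAt)
        : ChannelReference;

    public sealed record Unmappable(string TypedName, string Reason) : ChannelReference;
}

public static class ChannelReferences
{
    // Only resolved IDs reach the saved ACL; pending and unmappable entries are skipped.
    public static IReadOnlyList<string> PersistableIds(IEnumerable<ChannelReference> refs) =>
        refs.OfType<ChannelReference.Resolved>().Select(r => r.Id).ToList();
}
```

Because `Pending` and `Unmappable` carry no ID, the save path cannot persist them even by accident, which is the "never an inert name on disk" rule expressed in the type.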

Where it plugs in

  • src/Netclaw.Cli/Tui/Config/ChannelsConfigViewModel.cs: ReconcileResolvedChannels is where the drop currently happens on any non-mapping outcome (including transient errors). This is the seam to introduce the pending/retry state.
  • ISlackProbe.ResolveChannelNamesAsync / IDiscordProbe.ResolveChannelIdsAsync / IMattermostProbe.ResolveChannelIdsAsync need the structured failure kind.

Generalize it

The same "remote-resolved input that may be transiently unavailable" shape appears for inference provider endpoint probes (ProviderStepViewModel) and search backend probes (SearchConfigEditorViewModel). Propose extracting a small reusable Termina convention/helper (a RemoteResolution<T> state + a standard Resolving/Retrying/Failed row view) rather than hand-rolling it per editor.
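
One possible shape for the reusable helper, as a sketch under stated assumptions: `RemoteResolution<T>` is the name proposed above, while `AttemptStatus`, `AttemptResult<T>`, and `RemoteResolver.RunAsync` are hypothetical. The `publish` callback stands in for whatever updates view-model state and requests a redraw; no Termina API is assumed here.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

public enum AttemptStatus { Success, Transient, Deterministic }

public sealed record AttemptResult<T>(AttemptStatus Status, T? Value, string? Error);

public abstract record RemoteResolution<T>
{
    private RemoteResolution() { }

    public sealed record Resolving(int Attempt) : RemoteResolution<T>;
    public sealed record Retrying(int Attempt, TimeSpan Delay, string Error) : RemoteResolution<T>;
    public sealed record Resolved(T Value) : RemoteResolution<T>;
    public sealed record Failed(string Reason) : RemoteResolution<T>;
}

public static class RemoteResolver
{
    // Runs the probe until it succeeds, fails deterministically, or the attempt budget is spent.
    // Every state change is published; cancellation surfaces as OperationCanceledException.
    public static async Task<RemoteResolution<T>> RunAsync<T>(
        Func<CancellationToken, Task<AttemptResult<T>>> probe,
        Action<RemoteResolution<T>> publish,
        int maxAttempts,
        Func<int, TimeSpan> delayAfter,
        CancellationToken ct)
    {
        for (var attempt = 1; ; attempt++)
        {
            publish(new RemoteResolution<T>.Resolving(attempt));
            var result = await probe(ct).ConfigureAwait(false);

            RemoteResolution<T>? terminal = null;
            if (result.Status == AttemptStatus.Success)
                terminal = new RemoteResolution<T>.Resolved(result.Value!);
            else if (result.Status == AttemptStatus.Deterministic)
                terminal = new RemoteResolution<T>.Failed(result.Error ?? "resolution failed");
            else if (attempt >= maxAttempts)
                terminal = new RemoteResolution<T>.Failed($"couldn't reach after {attempt} tries");

            if (terminal is not null)
            {
                publish(terminal);
                return terminal;
            }

            // Transient failure with budget left: show the wait, then loop for the next attempt.
            var delay = delayAfter(attempt);
            publish(new RemoteResolution<T>.Retrying(attempt, delay, result.Error ?? "transient error"));
            await Task.Delay(delay, ct).ConfigureAwait(false);
        }
    }
}
```

An editor would own a CancellationTokenSource per pending entry, keep the returned task, and on navigate-away or save call Cancel() and then await the task, catching OperationCanceledException. The [r] Retry affordance is then just a fresh call to `RunAsync`.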

Acceptance criteria

Non-goals
