Skip to content

fix(providers): stop persisting guessed modalities; fix probe timeout & visibility (#1290, #1292)#1311

Merged
Aaronontheweb merged 3 commits into
netclaw-dev:devfrom
Aaronontheweb:fix/1290-1292-provider-discovery
Jun 3, 2026
Merged

fix(providers): stop persisting guessed modalities; fix probe timeout & visibility (#1290, #1292)#1311
Aaronontheweb merged 3 commits into
netclaw-dev:devfrom
Aaronontheweb:fix/1290-1292-provider-discovery

Conversation

@Aaronontheweb

Copy link
Copy Markdown
Collaborator

Fixes #1290 and #1292. Both are symptoms of one pattern in the openai-compatible discovery → probe → persist pipeline: a silent fallback ordered ahead of the source that should win. Fixed on one shared write path.

#1290 — modalities silently default to Text, then become a permanent override

  • DiscoveredModel.{Input,Output}Modalities are now ModelModality? (default null = "provider didn't report it") instead of non-nullable defaulting to Text, so "unknown" is distinguishable from "known text-only".
  • New shared ModelEntryWriter.BuildModelEntry is the single persist path; model set, the init wizard, and the TUI model manager all route through it and write modalities only when known. (Previously three hand-rolled copies — which is how the same bug landed in multiple places.)
  • ParseCodexModels no longer substitutes Text for an absent output_modalities.
  • Net effect: an openai-compatible/Ollama model persists no modalities → the daemon resolves real capabilities at runtime instead of being short-circuited by a baked-in Text.

#1292 — probe blocks silently / hard-fails at 10s

  • Ordering bug: the wizard stacked a 20s .WaitAsync over ExecuteProbeAsync's inner 10s CancelAfter; the inner timeout returned a normal failed result first, so the outer deadline + its TimeoutException handler were dead code (advertised 20s, failed at 10s). The same 20s wrapper also lived in ModelManagerViewModel and ProviderManagerViewModel, where — after the self-hosted timeout was raised — it would have truncated the longer budget.
  • Centralized all probe budgets in one ProbeTimeouts (Default 10s, SelfHosted 30s, InteractiveWallClock 45s, documented to stay > SelfHosted). Self-hosted descriptors use the 30s per-request budget; the three interactive callers apply the 45s wall-clock, which bounds pre-request work (e.g. OAuth token exchange) without truncating a slow /models probe.
  • Honest timeout message that names the actual endpoint and reads "may be slow/loading" rather than a hard failure; model discover/set surface the endpoint being probed and an elapsed-time line.

Review-driven follow-ups (2nd commit)

A max-effort review caught that the timeout de-stacking was initially incomplete (only the wizard) and that ProbeProgressReporter had disposal/erase issues; both addressed.

Validation

  • Netclaw.Configuration.Tests (366), Netclaw.Cli.Tests (787), Netclaw.Daemon.Tests provider/capability (100) — all pass. dotnet slopwatch analyze: 0 issues. Copyright headers verified.
  • Interactive smoke (init-wizard tape, isolated Linux container): passes end-to-end. netclaw doctor reports [PASS] Config Schema and [PASS] Context Window: Auto-detected 32,768 tokens for qwen2:0.5b (from running daemon) — confirming the openai-compatible discovery defaults modalities to Text, then the CLI persists it as a permanent override #1290 fix: with no modalities baked into config, the daemon resolves capabilities at runtime as intended.

Not included

Cosmetic probe-spinner animation improvements were intentionally dropped from this PR and tracked separately.

…meout (netclaw-dev#1290, netclaw-dev#1292)

Both issues are symptoms of the same pattern: a silent fallback ordered
ahead of the source that should win. Fix them on one shared write path.

netclaw-dev#1290 — modalities silently default to Text, then become a permanent override
- DiscoveredModel.{Input,Output}Modalities are now ModelModality? (default null
  = "provider didn't report it") instead of non-nullable defaulting to Text, so
  "unknown" is distinguishable from "known text-only".
- New shared ModelEntryWriter.BuildModelEntry writes modalities only when known;
  model set, the init wizard, and the TUI model manager all route through it
  instead of hand-rolling three drifting copies. openai-compatible/Ollama now
  persist no modalities, so the daemon resolves real capabilities at runtime
  instead of being short-circuited by a baked-in Text.

netclaw-dev#1292 — probe blocks silently and hard-fails at 10s
- The wizard stacked a 20s WaitAsync over ExecuteProbeAsync's inner 10s
  CancelAfter; the inner timeout returned a normal failed result first, so the
  outer deadline and its TimeoutException handler were dead code (advertised
  20s, failed at 10s). Removed the stacking — one authoritative timeout.
- Self-hosted descriptors now pass a 30s probe timeout (cold vLLM/llama.cpp can
  legitimately need it); cloud stays at 10s. The timeout message names the
  actual endpoint and elapsed seconds and reads "may be slow/loading" rather
  than a hard failure.
- model discover/set print the endpoint being probed (surfacing the silent
  localhost default) and a stderr elapsed-time ticker so slow != stuck.

Regression guards: parser leaves modalities null; shared writer omits unknown
modalities (and a flipped ModelManagerViewModel test that previously asserted
the bug); wizard persists none for openai-compatible; Codex still persists real
modalities; probe honors the supplied timeout and names the endpoint.
…w-dev#1290, netclaw-dev#1292)

A max-effort review caught that the timeout de-stacking was incomplete and
the same drift this PR set out to kill.

Timeout (the high-severity miss):
- The 20s outer .WaitAsync lived in THREE callers; the first pass de-stacked
  only the wizard, so ModelManagerViewModel and ProviderManagerViewModel still
  wrapped the probe in a 20s deadline that, after the inner self-hosted timeout
  was raised to 30s, actively truncated it — re-introducing the exact false
  "timed out" failure netclaw-dev#1292 fixed, in two TUIs.
- De-stacking the wizard also dropped the only wall-clock bound covering
  pre-request work (OAuth token exchange) that the inner per-request timeout
  does not cover, leaving it bounded only by HttpClient (~100s).
- Fix: centralize all probe budgets in one public ProbeTimeouts (Default 10s,
  SelfHosted 30s, InteractiveWallClock 45s, documented to stay > SelfHosted).
  All three interactive callers apply InteractiveWallClock as a whole-probe
  wall-clock that bounds pre-request work without truncating a slow self-hosted
  /models probe. No more per-view-model copies to drift.

Codex modalities:
- ParseCodexModels still substituted Text for an absent output_modalities — the
  same "persist a guessed Text" pathology the nullable contract forbids. Now
  passes modalities through as reported (input stays guarded by the fail-loud
  missing check); output omitted -> null -> daemon resolves at runtime.

ProbeProgressReporter robustness/cleanup:
- DisposeAsync now disposes the CTS in a finally and swallows terminal I/O
  faults, so a window/SSH closed mid-probe can't mask the real probe result.
- Erase blanks exactly the width drawn (tracked) instead of a fixed 44, so long
  endpoints leave no residue.
- Dropped the dead err/time injection seams (gated on Console.IsErrorRedirected,
  unreachable under any test runner).

ModelCommand.ResolveProbeEndpoint now TrimEnd('/')s so the surfaced endpoint
matches the one ExecuteProbeAsync actually probes.
@Aaronontheweb

Copy link
Copy Markdown
Collaborator Author

Follow-up tracked in #1312 — standardizing a self-animating Termina spinner component for layout-node views. The cosmetic probe-spinner animation work (and the re-entrant-redraw footgun surfaced here) lives there, intentionally kept out of this PR.

@Aaronontheweb Aaronontheweb added tui Terminal UI (Termina) issues bug Something isn't working config Configuration issues, netclaw doctor, schema validation. labels Jun 3, 2026
Match OllamaDescriptor, which already passes timeout: ProbeTimeouts.SelfHosted
by name. The OpenAI-compatible call site passed it positionally as a bare
trailing arg after ct, which read as two ambiguous timeout-ish values. No
behavior change.

@Aaronontheweb Aaronontheweb left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but will need to put it through its paces in our beta channel first

public class ModelEntryWriterTests
{
[Fact]
public void BuildModelEntry_UnknownModalities_OmitsModalityKeys()

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Aaronontheweb Aaronontheweb enabled auto-merge (squash) June 3, 2026 12:35
@Aaronontheweb Aaronontheweb merged commit 87e51cc into netclaw-dev:dev Jun 3, 2026
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working config Configuration issues, netclaw doctor, schema validation. tui Terminal UI (Termina) issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

openai-compatible discovery defaults modalities to Text, then the CLI persists it as a permanent override

1 participant