fix(providers): stop persisting guessed modalities; fix probe timeout & visibility (#1290, #1292)#1311
Merged
Aaronontheweb merged 3 commits intoJun 3, 2026
Conversation
…meout (netclaw-dev#1290, netclaw-dev#1292) Both issues are symptoms of the same pattern: a silent fallback ordered ahead of the source that should win. Fix them on one shared write path. netclaw-dev#1290 — modalities silently default to Text, then become a permanent override - DiscoveredModel.{Input,Output}Modalities are now ModelModality? (default null = "provider didn't report it") instead of non-nullable defaulting to Text, so "unknown" is distinguishable from "known text-only". - New shared ModelEntryWriter.BuildModelEntry writes modalities only when known; model set, the init wizard, and the TUI model manager all route through it instead of hand-rolling three drifting copies. openai-compatible/Ollama now persist no modalities, so the daemon resolves real capabilities at runtime instead of being short-circuited by a baked-in Text. netclaw-dev#1292 — probe blocks silently and hard-fails at 10s - The wizard stacked a 20s WaitAsync over ExecuteProbeAsync's inner 10s CancelAfter; the inner timeout returned a normal failed result first, so the outer deadline and its TimeoutException handler were dead code (advertised 20s, failed at 10s). Removed the stacking — one authoritative timeout. - Self-hosted descriptors now pass a 30s probe timeout (cold vLLM/llama.cpp can legitimately need it); cloud stays at 10s. The timeout message names the actual endpoint and elapsed seconds and reads "may be slow/loading" rather than a hard failure. - model discover/set print the endpoint being probed (surfacing the silent localhost default) and a stderr elapsed-time ticker so slow != stuck. Regression guards: parser leaves modalities null; shared writer omits unknown modalities (and a flipped ModelManagerViewModel test that previously asserted the bug); wizard persists none for openai-compatible; Codex still persists real modalities; probe honors the supplied timeout and names the endpoint.
…w-dev#1290, netclaw-dev#1292) A max-effort review caught that the timeout de-stacking was incomplete and the same drift this PR set out to kill. Timeout (the high-severity miss): - The 20s outer .WaitAsync lived in THREE callers; the first pass de-stacked only the wizard, so ModelManagerViewModel and ProviderManagerViewModel still wrapped the probe in a 20s deadline that, after the inner self-hosted timeout was raised to 30s, actively truncated it — re-introducing the exact false "timed out" failure netclaw-dev#1292 fixed, in two TUIs. - De-stacking the wizard also dropped the only wall-clock bound covering pre-request work (OAuth token exchange) that the inner per-request timeout does not cover, leaving it bounded only by HttpClient (~100s). - Fix: centralize all probe budgets in one public ProbeTimeouts (Default 10s, SelfHosted 30s, InteractiveWallClock 45s, documented to stay > SelfHosted). All three interactive callers apply InteractiveWallClock as a whole-probe wall-clock that bounds pre-request work without truncating a slow self-hosted /models probe. No more per-view-model copies to drift. Codex modalities: - ParseCodexModels still substituted Text for an absent output_modalities — the same "persist a guessed Text" pathology the nullable contract forbids. Now passes modalities through as reported (input stays guarded by the fail-loud missing check); output omitted -> null -> daemon resolves at runtime. ProbeProgressReporter robustness/cleanup: - DisposeAsync now disposes the CTS in a finally and swallows terminal I/O faults, so a window/SSH closed mid-probe can't mask the real probe result. - Erase blanks exactly the width drawn (tracked) instead of a fixed 44, so long endpoints leave no residue. - Dropped the dead err/time injection seams (gated on Console.IsErrorRedirected, unreachable under any test runner). ModelCommand.ResolveProbeEndpoint now TrimEnd('/')s so the surfaced endpoint matches the one ExecuteProbeAsync actually probes.
Collaborator
Author
|
Follow-up tracked in #1312 — standardizing a self-animating Termina spinner component for layout-node views. The cosmetic probe-spinner animation work (and the re-entrant-redraw footgun surfaced here) lives there, intentionally kept out of this PR. |
Match OllamaDescriptor, which already passes timeout: ProbeTimeouts.SelfHosted by name. The OpenAI-compatible call site passed it positionally as a bare trailing arg after ct, which read as two ambiguous timeout-ish values. No behavior change.
Aaronontheweb
commented
Jun 3, 2026
Aaronontheweb
left a comment
Collaborator
Author
There was a problem hiding this comment.
LGTM but will need to put it through its paces in our beta channel first
| public class ModelEntryWriterTests | ||
| { | ||
| [Fact] | ||
| public void BuildModelEntry_UnknownModalities_OmitsModalityKeys() |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #1290 and #1292. Both are symptoms of one pattern in the openai-compatible discovery → probe → persist pipeline: a silent fallback ordered ahead of the source that should win. Fixed on one shared write path.
#1290 — modalities silently default to
Text, then become a permanent overrideDiscoveredModel.{Input,Output}Modalitiesare nowModelModality?(default null = "provider didn't report it") instead of non-nullable defaulting toText, so "unknown" is distinguishable from "known text-only".ModelEntryWriter.BuildModelEntryis the single persist path;model set, the init wizard, and the TUI model manager all route through it and write modalities only when known. (Previously three hand-rolled copies — which is how the same bug landed in multiple places.)ParseCodexModelsno longer substitutesTextfor an absentoutput_modalities.Text.#1292 — probe blocks silently / hard-fails at 10s
.WaitAsyncoverExecuteProbeAsync's inner 10sCancelAfter; the inner timeout returned a normal failed result first, so the outer deadline + itsTimeoutExceptionhandler were dead code (advertised 20s, failed at 10s). The same 20s wrapper also lived inModelManagerViewModelandProviderManagerViewModel, where — after the self-hosted timeout was raised — it would have truncated the longer budget.ProbeTimeouts(Default 10s, SelfHosted 30s, InteractiveWallClock 45s, documented to stay > SelfHosted). Self-hosted descriptors use the 30s per-request budget; the three interactive callers apply the 45s wall-clock, which bounds pre-request work (e.g. OAuth token exchange) without truncating a slow/modelsprobe.model discover/setsurface the endpoint being probed and an elapsed-time line.Review-driven follow-ups (2nd commit)
A max-effort review caught that the timeout de-stacking was initially incomplete (only the wizard) and that
ProbeProgressReporterhad disposal/erase issues; both addressed.Validation
Netclaw.Configuration.Tests(366),Netclaw.Cli.Tests(787),Netclaw.Daemon.Testsprovider/capability (100) — all pass.dotnet slopwatch analyze: 0 issues. Copyright headers verified.init-wizardtape, isolated Linux container): passes end-to-end.netclaw doctorreports[PASS] Config Schemaand[PASS] Context Window: Auto-detected 32,768 tokens for qwen2:0.5b (from running daemon)— confirming the openai-compatible discovery defaults modalities to Text, then the CLI persists it as a permanent override #1290 fix: with no modalities baked into config, the daemon resolves capabilities at runtime as intended.Not included
Cosmetic probe-spinner animation improvements were intentionally dropped from this PR and tracked separately.