fix(stability): session skills snapshot, tool-loop guard, TUI watchdog, LM Studio preload backoff#67401
Conversation
Greptile SummaryFour focused stability fixes for self-hosted LM Studio sessions: skills-snapshot staleness on config change, unknown-tool loop guard gated behind an opt-in flag, TUI streaming indicator getting stuck after lost Confidence Score: 5/5Safe to merge — all remaining findings are P2 style suggestions that do not affect correctness. All four fixes address real, reproducible failure modes with no false-positive surface. Tests are comprehensive (189/189 passing per the PR), type-check and lint pass, and no public SDK or session-format contracts were changed. The only finding is a minor code-duplication/unreachable-fallback in config-reload.ts. No files require special attention. Prompt To Fix All With AIThis is a comment left during a code review.
Path: src/gateway/config-reload.ts
Line: 43-55
Comment:
**Duplicate prefix-matching logic + unreachable fallback**
`firstSkillsChangedPath` replicates the exact predicate from `shouldInvalidateSkillsSnapshotForPaths`, so when the outer `if` is true the inner call is guaranteed to return a defined value — the `?? "skills.*"` on the log line is dead code. A single helper that returns the first matching path (or `undefined`) would remove the duplication and make the guarantee explicit:
```ts
function firstSkillsChangedPath(paths: string[]): string | undefined {
return paths.find((p) =>
SKILLS_INVALIDATION_PREFIXES.some(
(prefix) => p === prefix || p.startsWith(`${prefix}.`),
),
);
}
// replace shouldInvalidateSkillsSnapshotForPaths:
export function shouldInvalidateSkillsSnapshotForPaths(paths: string[]): boolean {
return firstSkillsChangedPath(paths) !== undefined;
}
```
With this shape the `?? "skills.*"` fallback inside `applySnapshot` is no longer needed (and can be removed or kept as a defensive note, but at least it's clearly intentional).
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "Extensions/lmstudio: back off inference ..." | Re-trigger Greptile |
|
The I reproduced this on a pristine Same Root cause is the obfuscated dynamic specifier in export function loadPrivateQaCliModule(): Promise<Record<string, unknown>> {
const specifier = ["../../plugin-sdk/", "qa", "-lab.js"].join("");
return import(specifier) as Promise<Record<string, unknown>>;
}
The obfuscation ( Happy to open a follow-up PR that resolves the specifier against the bundled output (either drop the obfuscation and let tsdown rewrite it, or pin the target via All other checks on the PR ( |
|
Addressed Greptile's P2 on
|
|
Linked this PR to the relevant open issues in the description. Mapping for reviewers:
Not listed: I checked for open issues on LM Studio preload / memory guardrail and didn't find any — Bug 4 in this PR appears to be a novel report. |
a037a22 to
bbbd16d
Compare
|
Follow-up pushed for the TUI watchdog:
Commit: 8e87ea539c ( Verification:
|
8e87ea5 to
d4103cc
Compare
obviyus
left a comment
There was a problem hiding this comment.
Verified the four stability fixes on the latest head: config-driven skills snapshot invalidation, the always-on unknown-tool guard, the TUI streaming watchdog, and LM Studio preload backoff.
Maintainer follow-up: bound the TUI watchdog to the active run only, added the stale-run regression test, and fixed the changelog entries in the active Unreleased block with (#67401) / Thanks @xantorres attribution.
Local gate: pnpm test src/tui/tui-event-handlers.test.ts and pnpm check.
Summary
Four stability fixes for issues hit during a single self-hosted OpenClaw + LM Studio debugging session. All are reproducible, low-surface, and include unit tests.
Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
Root Cause (if applicable)
Skills snapshot staleness. `getSkillsSnapshotVersion` is only bumped on filesystem changes to the skills tree (watched via chokidar in `src/agents/skills/refresh.ts`). Config changes to `skills.allowBundled` / `skills.entries.*` / `skills.profile` go through `config-reload` and `config.apply` without ever bumping the version. Sessions keep reading the cached `skillsSnapshot` from `sessions.json`, advertising skills that no longer exist in the active config.
Unknown-tool guard gated behind opt-in flag. Historical commit introduced `tools.loopDetection.enabled` as a coarse toggle for repetition detectors (genericRepeat / pingPong / pollNoProgress). The unknown-tool guard is a different concern — it only triggers when the model is calling a tool that is objectively not registered in this run — but it was wired through the same gate. Default install effectively had no protection against `Tool X not found` loops.
TUI streaming state. `tui-event-handlers.ts` transitions `activityStatus` to `streaming` on `evt.state === "delta"` and back to `idle` / `error` only on the corresponding `final` / `aborted` / `error` event. The gateway does emit a `final` event, but delivery over the WebSocket is not atomic with the run lifecycle — a connection flap or gateway restart between completion and delivery leaves the indicator pinned. There was no client-side fallback.
LM Studio preload. The wrapper dedupes concurrent preloads via `preloadInFlight`, but on failure it immediately drops the entry in `.finally()`, so the next chat request creates a brand-new preload. Combined with no circuit breaker on the failure, a guardrail rejection (memory pressure, model not loadable) produces one failed preload per chat request for as long as the user stays in the session.
Test Plan
Scoped tests cover: skills-version bump wiring + path-prefix helper, threshold override / default / invalid-input / fractional-floor for the unknown-tool guard, watchdog arm / disarm / rearm / dispose / zero-disabled cases, preload cooldown skip + expiry-based retry.
Reviewer notes