fix(auto-reply): clear runtime model cache on reset#77339

Open
mjamiv wants to merge 1 commit into
openclaw:mainfrom
mjamiv:fix/77322-auto-reply-reset-model-cache

Conversation

@mjamiv

@mjamiv mjamiv commented May 4, 2026

Contributor

Summary

Why

  • The transient reset entry was clean, but merging it into the persisted store kept the old model/modelProvider fields, so the session could keep using the prior model after defaults changed.
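
One plausible way for this to happen is sketched below. It assumes the store update behaves like a shallow object merge; the `SessionEntry` shape and `mergeIntoStore` helper are hypothetical simplifications, not the project's actual types or functions. A reset entry that merely omits the cache fields leaves the persisted values in place, while one that sets them to `undefined` overwrites them, and `JSON.stringify` then drops those keys when the store is written.

```typescript
// Hypothetical, simplified session entry; the real type has many more fields.
interface SessionEntry {
  sessionId: string;
  modelProvider?: string;
  model?: string;
  contextTokens?: number;
  verboseLevel?: string;
}

// Shallow merge, used here as a stand-in for a persisted-store update (assumption).
function mergeIntoStore(
  persisted: SessionEntry,
  next: Partial<SessionEntry>,
): SessionEntry {
  return { ...persisted, ...next };
}

const persisted: SessionEntry = {
  sessionId: "session-before-reset",
  modelProvider: "openai",
  model: "gpt-5.4-mini",
  contextTokens: 400_000,
  verboseLevel: "on",
};

// The reset entry omits the cache fields: the merge keeps the stale values.
const omitted = mergeIntoStore(persisted, { sessionId: "new-id" });

// The reset entry sets the cache fields to undefined: the merge overwrites them.
const cleared = mergeIntoStore(persisted, {
  sessionId: "new-id",
  modelProvider: undefined,
  model: undefined,
  contextTokens: undefined,
});

console.log(omitted.model); // → gpt-5.4-mini
console.log(JSON.stringify(cleared)); // → {"sessionId":"new-id","verboseLevel":"on"}
```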

Real behavior proof

Behavior or issue addressed: /new and /reset should start a fresh channel session without carrying stale runtime model cache fields from the previous run, so a defaults change or a model retirement actually takes effect on the next turn.

Real environment tested: rebased patched OpenClaw source checkout at /tmp/openclaw-77339, Node v24.14.0, running a standalone tsx driver script that imports the production initSessionState from src/auto-reply/reply/session.ts and exercises a real on-disk persisted session store. No vitest, no mocks — just the real auto-reply session-state code path against a real sessions.json file in /tmp.

Exact steps or command run after this patch:

  1. Build a small driver script that imports initSessionState from the patched src/auto-reply/reply/session.ts, seeds a real sessions.json with a session entry that has modelProvider: "openai", model: "gpt-5.4-mini", contextTokens: 400_000, and verboseLevel: "on", calls initSessionState with Body: "/new" (and again with Body: "/reset"), then prints the live persisted store contents and the returned sessionEntry fields. The driver is not part of the diff.
  2. pnpm exec tsx scratch-77322-demo.mts

Driver script:

import * as fs from "node:fs/promises";
import * as os from "node:os";
import * as path from "node:path";
import { initSessionState } from "./src/auto-reply/reply/session.js";

async function main() {
  const tmpRoot = await fs.mkdtemp(path.join(os.tmpdir(), "openclaw-77322-demo-"));
  const storePath = path.join(tmpRoot, "sessions.json");
  const sessionKey = "agent:main:telegram:direct:demo";
  const seedEntry = {
    sessionId: "session-before-reset",
    updatedAt: Date.now(),
    modelProvider: "openai",
    model: "gpt-5.4-mini",
    contextTokens: 400_000,
    verboseLevel: "on",
  };
  await fs.writeFile(storePath, JSON.stringify({ [sessionKey]: seedEntry }, null, 2));
  console.log("[before /new] sessions.json:");
  console.log((await fs.readFile(storePath, "utf-8")).trim());

  const cfg = { session: { store: storePath, idleMinutes: 999 } } as any;
  const r1 = await initSessionState({
    ctx: { Body: "/new", RawBody: "/new", CommandBody: "/new",
           From: "user", To: "bot", ChatType: "direct",
           SessionKey: sessionKey, Provider: "telegram", Surface: "telegram" } as any,
    cfg, commandAuthorized: true,
  });
  console.log("\n[after /new] sessionEntry.modelProvider:", r1.sessionEntry.modelProvider);
  console.log("[after /new] sessionEntry.model:", r1.sessionEntry.model);
  console.log("[after /new] sessionEntry.verboseLevel:", r1.sessionEntry.verboseLevel);
  console.log("[after /new] sessions.json:");
  console.log((await fs.readFile(storePath, "utf-8")).trim());

  await fs.writeFile(storePath, JSON.stringify({ [sessionKey]: seedEntry }, null, 2));
  const r2 = await initSessionState({
    ctx: { Body: "/reset", RawBody: "/reset", CommandBody: "/reset",
           From: "user", To: "bot", ChatType: "direct",
           SessionKey: sessionKey, Provider: "telegram", Surface: "telegram" } as any,
    cfg, commandAuthorized: true,
  });
  console.log("\n[after /reset] sessionEntry.modelProvider:", r2.sessionEntry.modelProvider);
  console.log("[after /reset] sessionEntry.model:", r2.sessionEntry.model);
  console.log("[after /reset] sessionEntry.verboseLevel:", r2.sessionEntry.verboseLevel);
}
main().catch((e) => { console.error(e); process.exit(1); });

Evidence after fix: live stdout copied from the tsx driver, with UUIDs and timestamps as emitted by the runtime (only identifiers are anonymized):

[before /new] sessions.json:
{
  "agent:main:telegram:direct:demo": {
    "sessionId": "session-before-reset",
    "updatedAt": 1778436597524,
    "modelProvider": "openai",
    "model": "gpt-5.4-mini",
    "contextTokens": 400000,
    "verboseLevel": "on"
  }
}

[after /new] sessionEntry.modelProvider: undefined
[after /new] sessionEntry.model: undefined
[after /new] sessionEntry.verboseLevel: on
[after /new] sessions.json:
{
  "agent:main:telegram:direct:demo": {
    "sessionId": "fa90f97a-7512-41fb-a75f-74d4a0f4aa2f",
    "updatedAt": 1778436649018,
    "verboseLevel": "on",
    "sessionStartedAt": 1778436649016,
    "lastInteractionAt": 1778436649016,
    "systemSent": false,
    "abortedLastRun": false,
    "usageFamilyKey": "agent:main:telegram:direct:demo",
    "usageFamilySessionIds": [
      "session-before-reset",
      "fa90f97a-7512-41fb-a75f-74d4a0f4aa2f"
    ],
    "chatType": "direct",
    "deliveryContext": { "channel": "telegram", "to": "bot" },
    "lastChannel": "telegram",
    "lastTo": "bot",
    "origin": { "label": "user", "provider": "telegram", "surface": "telegram",
                "chatType": "direct", "from": "user", "to": "bot" },
    "sessionFile": "<state-dir>/agents/main/sessions/fa90f97a-7512-41fb-a75f-74d4a0f4aa2f.jsonl",
    "compactionCount": 0
  }
}

[after /reset] sessionEntry.modelProvider: undefined
[after /reset] sessionEntry.model: undefined
[after /reset] sessionEntry.verboseLevel: on

Observed result after fix:

  • sessionId rotates to a fresh uuid (true reset; usageFamilySessionIds keeps both entries for cost tracking).
  • modelProvider, model, and contextTokens are absent from the persisted entry — the stale runtime cache is gone, so the next turn resolves from current defaults or explicit preserved overrides.
  • verboseLevel: "on" (an unrelated behavior override) is preserved.
  • Same shape on both /new and /reset.
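
The observations above can be expressed as a small check over the parsed store entries. This is only a sketch: the field names come from the driver output, but `checkReset` and `RUNTIME_CACHE_FIELDS` are hypothetical names and are not part of the diff or the test suite.

```typescript
// A persisted entry is treated as an opaque JSON object here.
type PersistedEntry = Record<string, unknown>;

// Fields the proof expects to be absent after a reset.
const RUNTIME_CACHE_FIELDS = ["modelProvider", "model", "contextTokens"] as const;

// Returns a list of problems; an empty list means the reset looks as described.
function checkReset(before: PersistedEntry, after: PersistedEntry): string[] {
  const problems: string[] = [];
  if (after.sessionId === before.sessionId) {
    problems.push("sessionId did not rotate");
  }
  for (const field of RUNTIME_CACHE_FIELDS) {
    if (field in after) problems.push(`${field} is still persisted`);
  }
  if (after.verboseLevel !== before.verboseLevel) {
    problems.push("verboseLevel was not preserved");
  }
  return problems;
}

const beforeEntry: PersistedEntry = {
  sessionId: "session-before-reset",
  modelProvider: "openai",
  model: "gpt-5.4-mini",
  contextTokens: 400000,
  verboseLevel: "on",
};

// Trimmed copy of the "after /new" entry shown in the evidence.
const afterEntry: PersistedEntry = {
  sessionId: "fa90f97a-7512-41fb-a75f-74d4a0f4aa2f",
  verboseLevel: "on",
  compactionCount: 0,
};

console.log(checkReset(beforeEntry, afterEntry)); // → []
```

Passing the seeded entry as both arguments reports every problem, which is a quick way to confirm the check can fail.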

What was not tested: full Telegram network round-trip; the proof exercises the production auto-reply session-state code path that channel commands route through (the same initSessionState call that runs in production for /new and /reset).

Validation

  • pnpm install
  • pnpm test -- src/auto-reply/reply/session.test.ts
  • pnpm test src/gateway/sessions-patch.test.ts src/gateway/server.sessions.reset-models.test.ts
  • pnpm exec oxfmt --check --threads=1 src/auto-reply/reply/session.ts src/auto-reply/reply/session.test.ts CHANGELOG.md
  • git diff --check
  • pnpm check:changed -- --base upstream/main --head HEAD
  • pnpm exec tsx scratch-77322-demo.mts (the live driver above)

Fixes #77322

@clawsweeper

clawsweeper Bot commented May 4, 2026

Contributor

Codex review: needs maintainer review before merge. Reviewed May 28, 2026, 11:46 PM ET / 03:46 UTC.

Summary
The PR clears cached auto-reply modelProvider/model fields during /new and /reset, adds persisted-store regression coverage, and adds one changelog entry.

PR surface: Source +5, Tests +59, Docs +1. Total +65 across 3 files.

Reproducibility: yes. Source-level reproduction is clear: current main resets the session but does not clear the stale modelProvider/model fields, and the PR body supplies a real on-disk session-store driver showing the fields gone after /new and /reset. I did not run tests because this review is read-only.

Review metrics: 1 noteworthy metric.

  • Release-owned changelog touch: 1 added entry. CHANGELOG.md is release-owned in this repo, so maintainers may choose to drop or keep that release note during landing without changing the code verdict.

Merge readiness
Overall: 🦞 diamond lobster
Proof: 🦞 diamond lobster
Patch quality: 🦞 diamond lobster
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
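
As a rough illustration of the "weaker of the two" rule, the sketch below takes the rank order from the legend later in this comment, strongest first. The `overallRank` helper and the exact ordering are assumptions for illustration; ClawSweeper's actual scoring code is not shown in this thread.

```typescript
// Rank names from the legend, ordered strongest to weakest (assumed ordering).
// "off-meta tidepool" is left out because the rating does not apply there.
const RANKS = [
  "challenger crab",
  "diamond lobster",
  "platinum hermit",
  "gold shrimp",
  "silver shellfish",
  "unranked krab",
] as const;

type Rank = (typeof RANKS)[number];

// Overall readiness is the weaker (higher-index) of the two component ranks.
function overallRank(proof: Rank, patchQuality: Rank): Rank {
  return RANKS.indexOf(proof) >= RANKS.indexOf(patchQuality) ? proof : patchQuality;
}

console.log(overallRank("diamond lobster", "diamond lobster")); // → diamond lobster
console.log(overallRank("unranked krab", "challenger crab")); // → unranked krab
```

The second call shows how missing proof caps an otherwise strong patch.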

Next step before merge

  • [P2] No repair lane is needed; the branch already contains the focused code change, regression coverage, and sufficient proof, leaving ordinary maintainer review and landing.

Security
Cleared: The diff touches only auto-reply session code, colocated tests, and a changelog entry; it does not change dependencies, workflows, secrets, permissions, or code download/execution paths.

Review details

Best possible solution:

Land the focused auto-reply reset cleanup after maintainer review, with release-note handling left to the repository release flow.

Do we have a high-confidence way to reproduce the issue?

Yes, source-level reproduction is clear: current main resets the session but does not clear stale modelProvider/model, and the PR body supplies a real on-disk session-store driver showing the fields gone after /new and /reset. I did not run tests because this review is read-only.

Is this the best way to solve the issue?

Yes, this is the narrow maintainable fix: clear only runtime cache fields in the auto-reply reset path while existing preserved-selection logic continues to protect explicit user overrides.
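
A minimal sketch of that scoping, under stated assumptions: `Entry` is a simplified stand-in for the session entry, `modelOverride` is a hypothetical name for an explicit user selection (the real field names are not shown in this thread), and `cleanupOnReset` is not the function in the diff. Only the two runtime cache fields named in this review are removed, and only when the turn starts a new session.

```typescript
// Hypothetical, simplified entry shape for illustration only.
interface Entry {
  sessionId: string;
  modelProvider?: string; // runtime cache: provider used by the previous run
  model?: string; // runtime cache: model used by the previous run
  modelOverride?: string; // hypothetical explicit user choice
  verboseLevel?: string; // unrelated behavior override
}

// Removes the runtime cache fields on reset; ordinary turns are returned unchanged.
function cleanupOnReset(entry: Entry, isNewSession: boolean): Entry {
  if (!isNewSession) return entry;
  const next: Entry = { ...entry };
  delete next.modelProvider;
  delete next.model;
  return next;
}

const prior: Entry = {
  sessionId: "session-before-reset",
  modelProvider: "openai",
  model: "gpt-5.4-mini",
  modelOverride: "user-picked-model",
  verboseLevel: "on",
};

const afterReset = cleanupOnReset(prior, true);
const afterTurn = cleanupOnReset(prior, false);

console.log(Object.keys(afterReset)); // → [ 'sessionId', 'modelOverride', 'verboseLevel' ]
console.log(afterTurn.model); // → gpt-5.4-mini
```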

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against e7fb8cabb681.

Label changes

Label changes:

  • add proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes after-fix live terminal output from production initSessionState against a real disk-backed session store, which directly exercises the changed behavior.
  • add rating: 🦞 diamond lobster: Overall readiness is 🦞 diamond lobster; proof is 🦞 diamond lobster and patch quality is 🦞 diamond lobster.
  • remove rating: 🐚 platinum hermit: Current PR rating is rating: 🦞 diamond lobster, so this older rating label is no longer current.

Label justifications:

  • P2: This is a normal-priority channel/session model-selection bug fix with limited but real impact when users change defaults and reset a session.
  • rating: 🦞 diamond lobster: Overall readiness is 🦞 diamond lobster; proof is 🦞 diamond lobster and patch quality is 🦞 diamond lobster.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (live_output): The PR body includes after-fix live terminal output from production initSessionState against a real disk-backed session store, which directly exercises the changed behavior.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes after-fix live terminal output from production initSessionState against a real disk-backed session store, which directly exercises the changed behavior.

Evidence reviewed

PR surface:

Source +5, Tests +59, Docs +1. Total +65 across 3 files.

View PR surface stats
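| Area      | Files | Added | Removed | Net |
|-----------|-------|-------|---------|-----|
| Source    | 1     | 5     | 0       | +5  |
| Tests     | 1     | 59    | 0       | +59 |
| Docs      | 1     | 1     | 0       | +1  |
| Config    | 0     | 0     | 0       | 0   |
| Generated | 0     | 0     | 0       | 0   |
| Other     | 0     | 0     | 0       | 0   |
| Total     | 3     | 65    | 0       | +65 |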

What I checked:

  • Current main still carries runtime model fields: Current main builds the reset session entry from prior state and clears reset/token/cache fields, but it does not clear modelProvider or model before merging the entry back into the session store. (src/auto-reply/reply/session.ts:781, e7fb8cabb681)
  • PR clears the implicated cache fields: The PR head clears sessionEntry.modelProvider and sessionEntry.model inside the isNewSession reset cleanup before updateSessionStore persists the entry. (src/auto-reply/reply/session.ts:785, e87d2a84430f)
  • Regression coverage matches the reported bug: The new test seeds stale modelProvider, model, and contextTokens, then verifies both /new and /reset rotate the session, clear the runtime model cache in memory and on disk, and preserve unrelated verboseLevel. (src/auto-reply/reply/session.test.ts:2540, e87d2a84430f)
  • Explicit override contract remains separate: Current reset-selection code preserves user-driven model/provider/auth overrides and clears auto-created fallback selections, so this PR does not need a new config or product policy to distinguish explicit user choices from runtime cache fields. (src/config/sessions/reset-preserved-selection.ts:24, e7fb8cabb681)
  • Sibling reset surface already recomputes defaults: Gateway sessions.reset coverage expects reset to use the configured default model instead of stale runtime identity and to clear stale context-token state, which supports aligning auto-reply reset behavior with the same model-selection boundary. (src/gateway/server.sessions.reset-models.test.ts:64, e7fb8cabb681)
  • PR merge shape is clean against current main: A read-only merge-tree check reported merged hunks for the three touched files without conflicts against current main. (e87d2a84430f)

Likely related people:

  • steipete: Current main blame and path history for src/auto-reply/reply/session.ts point to the recent session-entry refactor commit that owns the current reset/session construction shape. (role: recent area contributor; confidence: medium; commits: d5bbf3033c9f; files: src/auto-reply/reply/session.ts, src/auto-reply/reply/session.test.ts)
  • hclsys: The related closed PR and review comment identified the same stale implicit model-cache failure mode and supported clearing stale runtime fields on reset. (role: related root-cause investigator; confidence: medium; commits: aa149597afd6; files: src/gateway/sessions-patch.test.ts, src/gateway/sessions-patch.ts)

What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@hclsys

This comment was marked as low quality.

@mjamiv

mjamiv commented May 8, 2026

Contributor Author

Rebased this branch onto current upstream/main and resolved the CHANGELOG.md conflict by keeping the current upstream Unreleased notes plus this PR's Auto-reply fix entry.

Validation:

  • pnpm install
  • pnpm test -- src/auto-reply/reply/session.test.ts
  • git diff --check
  • pnpm check:changed -- --base upstream/main --head HEAD

@openclaw-barnacle openclaw-barnacle Bot added triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. proof: supplied External PR includes structured after-fix real behavior proof. labels May 8, 2026
@mjamiv mjamiv force-pushed the fix/77322-auto-reply-reset-model-cache branch from e2c2bba to 8e63520 Compare May 10, 2026 18:28
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 10, 2026
@hclsys

This comment was marked as low quality.

@hclsys

This comment was marked as low quality.

@mjamiv mjamiv force-pushed the fix/77322-auto-reply-reset-model-cache branch from 8e63520 to 6a39c91 Compare May 16, 2026 03:50
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 16, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 16, 2026
@mjamiv

mjamiv commented May 16, 2026

Contributor Author

Refreshed this branch onto current upstream/main and pushed as 6a39c911e3.

Conflict handling:

  • CHANGELOG.md: kept current upstream Unreleased notes and re-added this PR's Auto-reply reset-cache entry.
  • src/auto-reply/reply/session.test.ts: kept upstream's recovered auto-fallback override coverage and this PR's runtime model-cache regression as separate tests.

Local validation on the rebased head:

  • pnpm install --offline --frozen-lockfile
  • pnpm test -- src/auto-reply/reply/session.test.ts (84 tests passed)
  • ./node_modules/.bin/oxfmt --check --threads=1 src/auto-reply/reply/session.ts src/auto-reply/reply/session.test.ts
  • git diff --check
  • pnpm check:changed -- --base refs/remotes/upstream/main --head HEAD

GitHub now reports head 6a39c911e3, mergeable=true, mergeable_state=clean; all reported checks are pass or skip.

@mjamiv mjamiv force-pushed the fix/77322-auto-reply-reset-model-cache branch from 6a39c91 to 71a0ed6 Compare May 26, 2026 23:17
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 26, 2026
@mjamiv mjamiv force-pushed the fix/77322-auto-reply-reset-model-cache branch from 71a0ed6 to f66e2d7 Compare May 26, 2026 23:18
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. labels May 26, 2026
@clawsweeper clawsweeper Bot added status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. P2 Normal backlog priority with limited blast radius. labels May 26, 2026
@clawsweeper

clawsweeper Bot commented May 26, 2026

Contributor

ClawSweeper PR egg

✨ Hatched: 🥚 common Gilded Proofling

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.
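
Read literally, the rules above reduce to a small predicate. The sketch below is an interpretation, not ClawSweeper's code: `isHatchable` and `PrState` are hypothetical names, and `labels` stands for whatever labels are present on the PR (or, for a closed PR, in its durable record).

```typescript
type PrState = "open" | "merged" | "closed";

// Label names copied from the hatchability rules.
const HATCHABLE_LABELS: readonly string[] = [
  "status: 👀 ready for maintainer look",
  "status: 🚀 automerge armed",
  "clawsweeper:automerge",
];

// Merged PRs always hatch; open and closed-unmerged PRs need a hatchable label.
function isHatchable(state: PrState, labels: readonly string[]): boolean {
  if (state === "merged") return true;
  return labels.some((label) => HATCHABLE_LABELS.includes(label));
}

console.log(isHatchable("merged", [])); // → true
console.log(isHatchable("open", ["status: 👀 ready for maintainer look"])); // → true
console.log(isHatchable("closed", ["size: S"])); // → false
```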

Rarity: 🥚 common.
Trait: sniffs out flaky tests.
Image traits: location green-check meadow; accessory proof snapshot camera; palette charcoal, cyan, and signal green; mood focused; pose holding its accessory up for inspection; shell frosted glass shell; lighting calm overcast light; background gentle dashboard dots.
Share on X: post this hatch
Copy: My PR egg hatched a 🥚 common Gilded Proofling in ClawSweeper.

What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

@mjamiv mjamiv force-pushed the fix/77322-auto-reply-reset-model-cache branch from f66e2d7 to 4972490 Compare May 29, 2026 03:37
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 29, 2026
@mjamiv mjamiv force-pushed the fix/77322-auto-reply-reset-model-cache branch from 4972490 to e87d2a8 Compare May 29, 2026 03:39
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. labels May 29, 2026

Labels

  • P2 (Normal backlog priority with limited blast radius.)
  • proof: sufficient (ClawSweeper judged the real behavior proof convincing.)
  • proof: supplied (External PR includes structured after-fix real behavior proof.)
  • rating: 🦞 diamond lobster (Very strong PR readiness with only minor maintainer review expected.)
  • size: S
  • status: 👀 ready for maintainer look (ClawSweeper has no concrete contributor-facing blocker left for this PR.)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: session.model cache survives /new and ignores agents.defaults.model.primary changes, scope distinct from PR #69419

3 participants