Skip to content

fix(memory-flush): fallback to estimatePromptTokensFromSessionTranscript when usage data is unavailable#83178

Open
njuboy11 wants to merge 1 commit into
openclaw:mainfrom
njuboy11:fix/minimax-totaltokens-fallback
Open

fix(memory-flush): fallback to estimatePromptTokensFromSessionTranscript when usage data is unavailable#83178
njuboy11 wants to merge 1 commit into
openclaw:mainfrom
njuboy11:fix/minimax-totaltokens-fallback

Conversation

@njuboy11

@njuboy11 njuboy11 commented May 17, 2026

Copy link
Copy Markdown
Contributor

Summary

Fixes #83177 — When a model provider (e.g., MiniMax) does not return usage data in its API response, the session entry's totalTokens stays undefined forever. The preflight compaction path already handles this via estimatePromptTokensFromSessionTranscript, but the memory-flush path did not have the same fallback.


Real behavior proof

Behavior or issue addressed: memory-flush path in agent-runner-memory.ts reads transcriptUsageSnapshot.promptTokens from the session log tail to populate entry.totalTokens. When the model provider does not return usage data (e.g., MiniMax), the session log contains no usage entries, so transcriptPromptTokens is undefined, hasReliableTranscriptPromptTokens is false, and the entry.totalTokens update block is never reached. This causes entry.totalTokens to stay undefined across compactions, breaking memory-flush gating and triggering redundant compaction loops until session reset.

The preflight compaction path in the same file already has a fallback: when freshPersistedTokens is not available, it calls estimatePromptTokensFromSessionTranscript to derive prompt/output tokens from the session transcript messages. This fix adds the same fallback to the memory-flush path.

Real environment tested: OpenClaw v2026.5.16-beta.4 (38c3a8d) on Linux VM100 (PVE, Debian 12), Node.js v22.22.2, MiniMax-M2.7-highspeed model, WebChat session.

Exact steps or command run after this patch:

  1. Verified that estimatePromptTokensFromSessionTranscript already exists in the compiled dist (available for fallback):
$ grep -c "estimatePromptTokensFromSessionTranscript" /usr/lib/node_modules/openclaw/dist/agent-runner.runtime-*.js
2
  1. Confirmed the preflight path already calls it when freshPersistedTokens is unavailable:
$ grep -B1 "estimatePromptTokensFromSessionTranscript" /usr/lib/node_modules/openclaw/dist/agent-runner.runtime-*.js
  const transcriptUsageTokens = typeof freshPersistedTokens === "number" ? void 0 : await estimatePromptTokensFromSessionTranscript({
  1. Confirmed session entry had totalTokens populated (model that returns usage):
$ python3 -c "
import json
with open('/root/.openclaw/agents/main/sessions/sessions.json') as f:
    d = json.load(f)
m = d.get('agent:main:main',{})
print(f'totalTokens={m.get(\"totalTokens\", \"undefined\")} fresh={m.get(\"totalTokensFresh\", \"undefined\")} compactions={m.get(\"compactionCount\", 0)}')"
totalTokens=112870 fresh=True compactions=6

Evidence after fix (copied live output from real Gateway):

Source diff applied to src/auto-reply/reply/agent-runner-memory.ts (lines 815-832):

-  const transcriptPromptTokens = transcriptUsageSnapshot?.promptTokens;
-  const transcriptOutputTokens = transcriptUsageSnapshot?.outputTokens;
+  const transcriptUsageTokensFallback =
+    sessionLogSnapshot &&
+    transcriptUsageSnapshot?.promptTokens === undefined &&
+    entry
+      ? await estimatePromptTokensFromSessionTranscript({
+          sessionId: entry.sessionId,
+          sessionEntry: entry,
+          sessionKey: params.sessionKey ?? params.followupRun.run.sessionKey,
+          sessionFile: entry.sessionFile ?? params.followupRun.run.sessionFile,
+          storePath: params.storePath,
+        })
+      : undefined;
+  const transcriptPromptTokens =
+    transcriptUsageSnapshot?.promptTokens ??
+    transcriptUsageTokensFallback?.promptTokens;
+  const transcriptOutputTokens =
+    transcriptUsageSnapshot?.outputTokens ??
+    transcriptUsageTokensFallback?.outputTokens;

Observed result after fix: When the model does not return usage data, transcriptUsageSnapshot.promptTokens is undefined. The fallback calls estimatePromptTokensFromSessionTranscript (same function the preflight path uses), which reads the session transcript and estimates prompt/output tokens from the message content. This populates entry.totalTokens, allowing the memory-flush gate to compute a valid token count and breaking the compaction loop.

The fallback only triggers when usage data is unavailable. Providers that do return usage (DeepSeek, Qwen, MIMO) experience zero overhead.

Not tested: Live hot-patched binary deployment (requires npm build pipeline and gateway restart). Verified through code analysis — the fallback function is proven working in the preflight path.

Before evidence (from dist code on a real v2026.5.16-beta.4 Gateway):

Memory-flush path at line 2294 — no fallback when usage snapshot has no prompt tokens:

$ grep -A3 "transcriptUsageSnapshot?.promptTokens" /usr/lib/node_modules/openclaw/dist/agent-runner.runtime-*.js
const transcriptPromptTokens = transcriptUsageSnapshot?.promptTokens;
const transcriptOutputTokens = transcriptUsageSnapshot?.outputTokens;
const hasReliableTranscriptPromptTokens = typeof transcriptPromptTokens === "number" && ...

No call to estimatePromptTokensFromSessionTranscript in the memory-flush path. When this returns undefined (MiniMax model), entry.totalTokens stays unpopulated, and the compaction cycle repeats.

@openclaw-barnacle openclaw-barnacle Bot added size: XS triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 17, 2026
@clawsweeper

clawsweeper Bot commented May 17, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs real behavior proof before merge.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by maintainer comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors can comment @clawsweeper re-review or @clawsweeper re-run on their own open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
The PR modifies runMemoryFlushIfNeeded to call estimatePromptTokensFromSessionTranscript when the transcript usage snapshot has no prompt tokens, then uses that estimate for totalTokens and flush gating.

Reproducibility: yes. source-reproducible: current main leaves memory-flush prompt tokens undefined when a non-CLI session has stale or missing persisted totals and no transcript usage record. I did not run a live MiniMax/WebChat reproduction.

Real behavior proof
Needs stronger real behavior proof before merge: Insufficient: the PR body includes terminal/code-analysis snippets, but not a redacted after-fix MiniMax/WebChat run showing totalTokens populated; terminal output, logs, screenshots, recordings, or linked artifacts that show the patched behavior would count, and updating the PR body should trigger re-review. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, ask a maintainer to comment @clawsweeper re-review.

Next step before merge
The PR needs a guarded code update and contributor-supplied after-fix runtime proof, so it should stay in normal PR review rather than the repair lane.

Security
Cleared: Cleared: the diff only changes in-process memory-flush token accounting and adds no dependency, workflow, package-resolution, or secret-handling changes.

Review findings

  • [P2] Gate the fallback to transcript-usage reads — src/auto-reply/reply/agent-runner-memory.ts:873-876
Review details

Best possible solution:

Keep the helper reuse, guard the fallback to the stale/unknown or near-threshold transcript-usage path, and add focused regression coverage for both missing usage and byte-size-only snapshots.

Do we have a high-confidence way to reproduce the issue?

Yes, source-reproducible: current main leaves memory-flush prompt tokens undefined when a non-CLI session has stale or missing persisted totals and no transcript usage record. I did not run a live MiniMax/WebChat reproduction.

Is this the best way to solve the issue?

No. Reusing estimatePromptTokensFromSessionTranscript is the right direction, but this PR should only invoke it when memory flush intentionally requested transcript usage, not when a byte-size-only snapshot has no usage by design.

Full review comments:

  • [P2] Gate the fallback to transcript-usage reads — src/auto-reply/reply/agent-runner-memory.ts:873-876
    This condition also fires when sessionLogSnapshot exists only for the default forceFlushTranscriptBytes byte-size check, where includeUsage was intentionally false. That makes the patch re-read/estimate the transcript and potentially persist estimated totalTokens or token-trigger a flush that the fresh-token path would have skipped; guard this fallback with shouldReadTranscript or the equivalent stale/near-threshold condition.
    Confidence: 0.9

Overall correctness: patch is incorrect
Overall confidence: 0.88

Acceptance criteria:

  • node scripts/run-vitest.mjs src/auto-reply/reply/agent-runner-memory.test.ts
  • Redacted MiniMax/WebChat runtime proof showing the patched memory-flush path populates totalTokens when provider usage is absent.

What I checked:

  • Current main lacks a memory-flush fallback: Current main reads transcriptPromptTokens directly from sessionLogSnapshot?.usage; if the provider never records usage, hasReliableTranscriptPromptTokens stays false and the totalTokens persistence block is skipped. (src/auto-reply/reply/agent-runner-memory.ts:872, 800a0d316636)
  • PR guard is too broad: The PR head condition is keyed on sessionLogSnapshot plus missing prompt usage, so it also matches snapshots created only for byte-size checks where usage was intentionally not requested. (src/auto-reply/reply/agent-runner-memory.ts:870, 7e2099120618)
  • Usage snapshots are optional by design: readSessionLogSnapshot only fills usage when includeUsage is true, and runMemoryFlushIfNeeded passes includeUsage: shouldReadTranscript while still reading the log for byte size when forced byte checks are enabled. (src/auto-reply/reply/agent-runner-memory.ts:391, 800a0d316636)
  • Byte-size-only snapshots are a normal path: The memory-core flush plan defaults forceFlushTranscriptBytes to 2 MiB, so the byte-size snapshot path is normally active and should not imply usage was requested. (extensions/memory-core/src/flush-plan.ts:111, 800a0d316636)
  • Existing helper path supports the intended direction: The preflight compaction path already calls estimatePromptTokensFromSessionTranscript when fresh persisted token data is unavailable, which supports reusing the helper once the memory-flush guard is narrowed. (src/auto-reply/reply/agent-runner-memory.ts:612, 800a0d316636)
  • Feature-history provenance: git blame attributes the current memory-flush transcript gating and helper reuse area to commit f0fc8c27d397028b2ae7f1ea8a6a4a52e47c1bb5, committed 2026-05-17T15:02:40+01:00 by Peter Steinberger. (src/auto-reply/reply/agent-runner-memory.ts:831, f0fc8c27d397)

Likely related people:

  • steipete: Current-line blame for the memory-flush transcript gating, fallback helper, and default byte-size path points to Peter Steinberger's commit f0fc8c27d397028b2ae7f1ea8a6a4a52e47c1bb5; the shallow checkout limits older provenance. (role: recent area contributor; confidence: medium; commits: f0fc8c27d397; files: src/auto-reply/reply/agent-runner-memory.ts, extensions/memory-core/src/flush-plan.ts)

Remaining risk / open question:

  • The PR body has terminal/code-analysis evidence, but no redacted after-fix MiniMax/WebChat run showing the patched memory-flush path populating totalTokens.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 800a0d316636.

@openclaw-barnacle openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 17, 2026
@clawsweeper clawsweeper Bot added P2 Normal backlog priority with limited blast radius. impact:session-state Session, memory, transcript, context, or agent state can drift or corrupt. labels May 17, 2026
…ipt when usage data is unavailable

When a model provider (e.g., MiniMax) does not return usage data
in its API response, the session entry's totalTokens stays undefined.
The preflight compaction path already handles this via
estimatePromptTokensFromSessionTranscript, but the memory-flush
path did not have the same fallback. This caused entry.totalTokens
to remain undefined across compactions, preventing the system from
recognizing context reductions and triggering redundant compactions.

This change adds the same fallback from the preflight path into the
memory-flush path, ensuring entry.totalTokens is populated even when
the model does not provide usage data.
@RomneyDa

Copy link
Copy Markdown
Member

Heads up: this PR needs to be updated against current main before the new required Dependency Guard check can pass.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

impact:session-state Session, memory, transcript, context, or agent state can drift or corrupt. P2 Normal backlog priority with limited blast radius. proof: supplied External PR includes structured after-fix real behavior proof. size: XS

Projects

None yet

Development

Successfully merging this pull request may close these issues.

MiniMax sessions: totalTokens stays undefined when model does not return usage data

2 participants