fix(agents): reevaluate preflight compaction on fresh totals by neeravmakwana · Pull Request #65622 · openclaw/openclaw

neeravmakwana · 2026-04-13T01:04:09Z

Summary

Problem: preflight compaction returned early whenever a session already had a fresh cached totalTokens value, so later turns never re-checked the threshold after the first successful reply.
Why it matters: safeguard preflight compaction could silently stop triggering on long-running sessions even when the cached context snapshot already showed the session was near the limit.
What changed: removed the fresh-token early return in runPreflightCompactionIfNeeded() and added a regression test that proves fresh cached totals can still trigger preflight compaction.
What did NOT change (scope boundary): memory-flush gating, transcript fallback rules, and session usage persistence semantics.

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

Closes [Bug]: totalTokensFresh permanently set to true after first API response — pre-flight compaction skipped on all subsequent turns #65600
Related [Bug]: totalTokensFresh permanently set to true after first API response — pre-flight compaction skipped on all subsequent turns #65600
This PR fixes a bug or regression

Root Cause (if applicable)

Root cause: runPreflightCompactionIfNeeded() treated fresh persisted totals as a reason to skip the preflight check entirely, even though SessionEntry.totalTokens is the cached context snapshot that preflight should use for its threshold gate.
Missing detection / guardrail: there was helper-level coverage for shouldRunPreflightCompaction(), but no regression test covering the outer runtime path where fresh cached totals short-circuited before the helper could run.
Contributing context (if known): the stale-token transcript-fallback guard accidentally became a full early return instead of only deciding whether transcript-derived tokens were needed.

Regression Test Plan (if applicable)

Coverage level that should have caught this:
- Unit test
- Seam / integration test
- End-to-end test
- Existing coverage already sufficient
Target test or file: src/auto-reply/reply/agent-runner-memory.test.ts
Scenario the test should lock in: a session with totalTokensFresh: true and a cached total already above threshold still runs preflight compaction on the next turn.
Why this is the smallest reliable guardrail: the bug lives in the runtime wrapper around the helper, so the regression must exercise runPreflightCompactionIfNeeded() itself rather than only the pure threshold helper.
Existing test that already covers this (if any): src/auto-reply/reply/reply-state.test.ts already covers the helper-level threshold behavior.
If no new test is added, why not: N/A

User-visible / Behavior Changes

Preflight safeguard compaction can trigger again on later turns when a session's fresh cached context total is already over the configured threshold, instead of only working before the first successful reply.

Diagram (if applicable)

Before:
[fresh cached total above threshold] -> [preflight returns early] -> [no compaction]

After:
[fresh cached total above threshold] -> [preflight evaluates threshold] -> [compaction can run]

Security Impact (required)

New permissions/capabilities? (Yes/No) No
Secrets/tokens handling changed? (Yes/No) No
New/changed network calls? (Yes/No) No
Command/tool execution surface changed? (Yes/No) No
Data access scope changed? (Yes/No) No
If any Yes, explain risk + mitigation:

Repro + Verification

Environment

OS: macOS 15 / darwin 25.3.0
Runtime/container: Node 24 + pnpm workspace
Model/provider: anthropic/claude-opus-4-6 and openai/gpt-5.4 in focused tests
Integration/channel (if any): N/A
Relevant config (redacted): agents.defaults.compaction.mode: safeguard, agentCfgContextTokens: 100000

Steps

Start a session that persists a fresh totalTokens context snapshot after a successful reply.
Let the next turn begin with that cached total already above the preflight threshold.
Observe whether runPreflightCompactionIfNeeded() evaluates the threshold or returns early.

Expected

Fresh cached totals still participate in preflight compaction gating on later turns.

Actual

Before this change, the runtime returned early and skipped preflight compaction whenever totalTokensFresh was already true.

Evidence

Failing test/log before + passing after
Trace/log snippets
Screenshot/recording
Perf numbers (if relevant)

Human Verification (required)

Verified scenarios: reproduced the early return directly against the current runtime path with a fresh cached total above threshold; confirmed the focused regression now compacts in that case; ran pnpm test src/auto-reply/reply/agent-runner-memory.test.ts; ran pnpm test src/auto-reply/reply/reply-state.test.ts.
Edge cases checked: preserved the existing transcript-fallback behavior for stale/unknown totals; preserved the helper-level threshold behavior covered by reply-state.test.ts; kept post-compaction token persistence behavior intact in the new runtime test.
What you did not verify: pnpm build is currently failing in an unrelated existing extensions/acpx/src/runtime.ts export-resolution path (Could not resolve 'acpx/runtime'); pnpm check advanced past the initial missing-madge environment issue after pnpm install but then stalled during check:madge-import-cycles; pnpm test progressed through multiple shards and then stalled later in vitest.extension-channels outside this touched area.

Review Conversations

I replied to or resolved every bot review conversation I addressed in this PR.
I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

Backward compatible? (Yes/No) Yes
Config/env changes? (Yes/No) No
Migration needed? (Yes/No) No
If yes, exact upgrade steps:

Risks and Mitigations

Risk: fresh cached totals could now trigger preflight compaction more often if some callers relied on the old early return.
- Mitigation: the regression test exercises the intended runtime path, and the change only removes a short-circuit so existing helper thresholds and transcript-fallback rules still govern whether compaction actually runs.

Additional Notes

AI-assisted: yes
Testing level: focused local validation
Session log or prompt transcript can be shared if a reviewer wants it.

Made with Cursor

aisle-research-bot · 2026-04-13T01:04:16Z

🔒 Aisle Security Analysis

We found 2 potential security issue(s) in this PR:

#	Severity	Title
1	🟠 High	Denial of Service via full transcript read/parse during preflight compaction token estimation
2	🟠 High	Arbitrary file read via unvalidated `sessionFile` when estimating tokens from session transcript

1. 🟠 Denial of Service via full transcript read/parse during preflight compaction token estimation

Property	Value
Severity	High
CWE	CWE-400
Location	`src/auto-reply/reply/agent-runner-memory.ts:387-401`

Description

runPreflightCompactionIfNeeded may fall back to estimating prompt tokens by reading and parsing the full session transcript when cached token totals are non-positive/non-finite. The fallback path ultimately calls readSessionMessages(...), which performs a synchronous fs.readFileSync(...).split(...) and JSON.parse over every line.

Impact:

An attacker who can cause very large session transcripts (e.g., by sending large messages) can trigger expensive CPU + memory work on the server.
If totalTokens/fresh totals become 0, NaN, or otherwise unusable, the system repeatedly re-reads and parses the entire transcript to estimate tokens, amplifying cost per request and risking process OOM / event-loop blocking.

Vulnerable code (fallback trigger):

const transcriptPromptTokens = hasUsableFreshPersistedTokens
  ? undefined
  : estimatePromptTokensFromSessionTranscript({
      sessionId: entry.sessionId,
      storePath: params.storePath,
      sessionFile: entry.sessionFile ?? params.followupRun.run.sessionFile,
    });

Data-flow context:

Input: untrusted chat content accumulates in the session transcript file (JSONL)
Condition: cached token totals unusable (<= 0 or non-finite)
Sink: full transcript read/parse + token estimation (readSessionMessages uses fs.readFileSync and parses all JSON lines)

Even though the heavy read/parse occurs in readSessionMessages, the changed logic in runPreflightCompactionIfNeeded makes this fallback path easier to reach (e.g., fresh cached totals of 0 now force transcript estimation).

Recommendation

Avoid full transcript reads/parses on the request path.

Recommended mitigations (combine as appropriate):

Enforce transcript size limits before attempting estimation; if exceeded, skip estimation and force compaction or return an error.
Stream the JSONL file and maintain a running token estimate instead of readFileSync().split().
Prefer tail-based usage snapshots (already present via readLastNonzeroUsageFromSessionLog) rather than re-parsing all messages.
Add rate limiting / debouncing so preflight estimation/compaction cannot be repeatedly triggered for the same session.

Example (size cap + fast-path):

import fs from "node:fs";

function safeEstimateFromTranscript(logPath: string, maxBytes = 5_000_000) {
  const stat = fs.statSync(logPath);
  if (stat.size > maxBytes) return undefined; // or force compaction
  // ...stream parse instead of readFileSync
}

Also ensure cached totalTokens is always normalized to a positive finite integer when persisted, so the fallback condition is rare.

2. 🟠 Arbitrary file read via unvalidated `sessionFile` when estimating tokens from session transcript

Property	Value
Severity	High
CWE	CWE-22
Location	`src/auto-reply/reply/agent-runner-memory.ts:395-401`

Description

runPreflightCompactionIfNeeded may fall back to estimating prompt tokens by reading the session transcript from disk. When freshPersistedTokens is non-positive/non-finite, it calls estimatePromptTokensFromSessionTranscript() with sessionFile taken from session metadata.

Input: entry.sessionFile ?? params.followupRun.run.sessionFile (potentially attacker-influenced session metadata)
Sink: readSessionMessages() reads the resolved transcript path via fs.readFileSync(filePath, "utf-8")
Path validation gap: if storePath is not provided (it is optional), resolveSessionTranscriptCandidates() will accept and resolve an arbitrary path from sessionFile via path.resolve(trimmed) without constraining it to an allowed sessions directory.

This can enable arbitrary file read / information disclosure (e.g., reading /etc/passwd, application secrets, API keys, etc.) when an attacker can control sessionFile and trigger the fallback path.

Vulnerable call site:

const transcriptPromptTokens = hasUsableFreshPersistedTokens
  ? undefined
  : estimatePromptTokensFromSessionTranscript({
      sessionId: entry.sessionId,
      storePath: params.storePath,
      sessionFile: entry.sessionFile ?? params.followupRun.run.sessionFile,
    });

Path resolution issue when storePath is undefined:

// resolveSessionTranscriptCandidates(...)
const trimmed = sessionFile.trim();
if (trimmed) {
  candidates.push(path.resolve(trimmed));
}

Recommendation

Constrain transcript reads to an allowed directory and reject/ignore arbitrary sessionFile paths.

Recommended defense-in-depth:

Always resolve sessionFile through resolveSessionFilePath() (which enforces containment in the sessions directory) and require a base directory (via storePath or agentId).
If storePath is not available, do not fall back to path.resolve(sessionFile); instead, ignore sessionFile and only use safe derived locations.
Consider additionally rejecting absolute paths, .. segments, and symlink escapes when opening transcript files.

Example fix (approach: require storePath to honor sessionFile):

const safeSessionFile = params.storePath
  ? resolveSessionFilePath(entry.sessionId, { sessionFile: entry.sessionFile ?? params.followupRun.run.sessionFile }, { sessionsDir: path.dirname(params.storePath) })
  : undefined;

const transcriptPromptTokens = hasUsableFreshPersistedTokens
  ? undefined
  : estimatePromptTokensFromSessionTranscript({
      sessionId: entry.sessionId,
      storePath: params.storePath,
      sessionFile: safeSessionFile,
    });

And in resolveSessionTranscriptCandidates, remove the unsafe branch:

// Do not accept arbitrary path.resolve(sessionFile) when storePath is missing.

Analyzed PR: #65622 at commit be1d0ce

_{Last updated on: 2026-04-13T01:17:47Z}

greptile-apps · 2026-04-13T01:07:53Z

Greptile Summary

This PR fixes a long-standing silent gap in preflight compaction: the previous code returned early whenever totalTokensFresh was true, so later turns never re-evaluated the threshold against a cached-but-already-over-limit context total. The implementation fix is correct — removing the early return lets shouldRunPreflightCompaction naturally fall back to resolveFreshSessionTotalTokens when tokenCount is undefined.

The new regression test (incrementCompactionCountMock) does not apply tokensAfter when updating the session store, so expect(entry?.totalTokens).toBe(12_000) will fail at runtime; the fix is to add a tokensAfter branch to the mock mirroring session-updates.ts:258–266.

Confidence Score: 4/5

Safe to merge after fixing the mock — the runtime fix is correct but the regression test will fail as written.

The core implementation change is correct and well-reasoned. One P1 issue: the new test's mock doesn't apply tokensAfter to update totalTokens, causing an assertion failure that blocks CI. Fix the mock and the PR is ready.

src/auto-reply/reply/agent-runner-memory.test.ts — the incrementCompactionCountMock needs a tokensAfter branch

Comments Outside Diff (1)

src/auto-reply/reply/agent-runner-memory.test.ts, line 96-116 (link)

Mock missing tokensAfter — totalTokens assertion will fail

The incrementCompactionCountMock spreads previous into nextEntry but never applies tokensAfter, so sessionStore.main.totalTokens stays at 80_000. The assertions on lines 162–163 (expect(entry?.totalTokens).toBe(12_000)) depend on the session store being updated with the post-compaction total, exactly as the real incrementCompactionCount does in session-updates.ts:258–266. The test will fail as-is.

Add the tokensAfter branch to the mock, mirroring the real implementation:

// After the newSessionId block, before params.sessionStore[sessionKey] = nextEntry:
if (params.tokensAfter != null && params.tokensAfter > 0) {
  nextEntry.totalTokens = params.tokensAfter;
  nextEntry.totalTokensFresh = true;
}

Prompt To Fix With AI

This is a comment left during a code review.
Path: src/auto-reply/reply/agent-runner-memory.test.ts
Line: 96-116

Comment:
**Mock missing `tokensAfter` — totalTokens assertion will fail**

The `incrementCompactionCountMock` spreads `previous` into `nextEntry` but never applies `tokensAfter`, so `sessionStore.main.totalTokens` stays at `80_000`. The assertions on lines 162–163 (`expect(entry?.totalTokens).toBe(12_000)`) depend on the session store being updated with the post-compaction total, exactly as the real `incrementCompactionCount` does in `session-updates.ts:258–266`. The test will fail as-is.

Add the `tokensAfter` branch to the mock, mirroring the real implementation:

```typescript
// After the newSessionId block, before params.sessionStore[sessionKey] = nextEntry:
if (params.tokensAfter != null && params.tokensAfter > 0) {
  nextEntry.totalTokens = params.tokensAfter;
  nextEntry.totalTokensFresh = true;
}
```

How can I resolve this? If you propose a fix, please make it concise.

Prompt To Fix All With AI

This is a comment left during a code review.
Path: src/auto-reply/reply/agent-runner-memory.test.ts
Line: 96-116

Comment:
**Mock missing `tokensAfter` — totalTokens assertion will fail**

The `incrementCompactionCountMock` spreads `previous` into `nextEntry` but never applies `tokensAfter`, so `sessionStore.main.totalTokens` stays at `80_000`. The assertions on lines 162–163 (`expect(entry?.totalTokens).toBe(12_000)`) depend on the session store being updated with the post-compaction total, exactly as the real `incrementCompactionCount` does in `session-updates.ts:258–266`. The test will fail as-is.

Add the `tokensAfter` branch to the mock, mirroring the real implementation:

```typescript
// After the newSessionId block, before params.sessionStore[sessionKey] = nextEntry:
if (params.tokensAfter != null && params.tokensAfter > 0) {
  nextEntry.totalTokens = params.tokensAfter;
  nextEntry.totalTokensFresh = true;
}
```

How can I resolve this? If you propose a fix, please make it concise.

_{Reviews (1): Last reviewed commit: "Agents/compaction: reevaluate preflight ..." | Re-trigger Greptile}

neeravmakwana · 2026-04-13T01:15:54Z

Follow-up on the AI review comments:

I addressed the two real edge cases from the Aisle pass.
- Preflight now treats non-positive "fresh" cached totals as unusable for gating, so it falls back to transcript estimation instead of suppressing compaction on a corrupted or degenerate totalTokens: 0 snapshot.
- incrementCompactionCount() now normalizes tokensAfter and ignores non-finite values instead of persisting them into session metadata.
I added focused regression coverage for both cases.
- src/auto-reply/reply/agent-runner-memory.test.ts now covers the totalTokens: 0, totalTokensFresh: true preflight path and proves transcript fallback still compacts.
- src/auto-reply/reply/reply-state.test.ts now covers non-finite tokensAfter and proves cached totals are left unchanged.
I did not change the test mock per the Greptile suggestion, because that part was a false positive. The preflight code path under test calls the real incrementCompactionCount() from session-updates.ts, not the injectable incrementCompactionCountMock used by the memory-flush tests. I re-ran the focused suites after the follow-up commit to confirm that behavior.

Verification rerun after the follow-up changes:

pnpm test src/auto-reply/reply/agent-runner-memory.test.ts
pnpm test src/auto-reply/reply/reply-state.test.ts

Pushed in be1d0ce71d.

prtags · 2026-04-23T12:20:48Z

Related work from PRtags group infinite-dodo-7d3e

Title: Preflight compaction skips fresh token totals

Number	Title
#63892	Title unavailable
#64384	fix(reply): gate preflight compaction fast-path on token threshold (#63892)
#65600	Title unavailable
#65622*	fix(agents): reevaluate preflight compaction on fresh totals
#66520	Title unavailable
#66716	fix: auto-compaction fires on fresh cached token counts (#66520)

* This PR

Agents/compaction: reevaluate preflight on fresh totals

f580071

neeravmakwana mentioned this pull request Apr 13, 2026

[Bug]: totalTokensFresh permanently set to true after first API response — pre-flight compaction skipped on all subsequent turns #65600

Closed

openclaw-barnacle Bot added the size: S label Apr 13, 2026

Agents/compaction: harden preflight token guards

be1d0ce

github-actions Bot mentioned this pull request Apr 13, 2026

🦞 OpenClaw 生态日报 2026-04-13 gsscsd/big_model_radar#178

Open

neeravmakwana closed this Apr 16, 2026

This was referenced Apr 23, 2026

Proactive compaction scheduler never re-fires after first checkpoint (compactionCount latched at 1) #63892

Closed

fix(reply): gate preflight compaction fast-path on token threshold (#63892) #64384

Open

This was referenced Apr 23, 2026

[Bug] Auto-compaction never fires when Anthropic prompt cache hit rate is ~100% #66520

Closed

fix: auto-compaction fires on fresh cached token counts (#66520) #66716

Open

openclaw-clownfish Bot mentioned this pull request Apr 29, 2026

fix(compaction): respect effective reserve tokens in compaction gates #74010

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(agents): reevaluate preflight compaction on fresh totals#65622

fix(agents): reevaluate preflight compaction on fresh totals#65622
neeravmakwana wants to merge 2 commits into
openclaw:mainfrom
neeravmakwana:fix/preflight-compaction-fresh-totals

neeravmakwana commented Apr 13, 2026

Uh oh!

aisle-research-bot Bot commented Apr 13, 2026 •

edited

Loading

Description

Recommendation

Description

Recommendation

Uh oh!

greptile-apps Bot commented Apr 13, 2026 •

edited

Loading

Comments Outside Diff (1)

Uh oh!

neeravmakwana commented Apr 13, 2026

Uh oh!

prtags Bot commented Apr 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

neeravmakwana commented Apr 13, 2026

Summary

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

Root Cause (if applicable)

Regression Test Plan (if applicable)

User-visible / Behavior Changes

Diagram (if applicable)

Security Impact (required)

Repro + Verification

Environment

Steps

Expected

Actual

Evidence

Human Verification (required)

Review Conversations

Compatibility / Migration

Risks and Mitigations

Additional Notes

Uh oh!

aisle-research-bot Bot commented Apr 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔒 Aisle Security Analysis

Description

Recommendation

Description

Recommendation

Uh oh!

greptile-apps Bot commented Apr 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Comments Outside Diff (1)

Uh oh!

neeravmakwana commented Apr 13, 2026

Uh oh!

prtags Bot commented Apr 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

aisle-research-bot Bot commented Apr 13, 2026 •

edited

Loading

greptile-apps Bot commented Apr 13, 2026 •

edited

Loading