
fix(agents): keep state.messages intact across z.ai-style provider turns in embedded runs #76056

Merged
openperf merged 4 commits into openclaw:main from openperf:fix/75799-zai-silent-overflow-pi-compaction-guard
May 2, 2026

Conversation

@openperf
Member

@openperf openperf commented May 2, 2026

Summary

  • Problem: For Telegram (and other channel) sessions running against z.ai-style OpenAI-compatible providers (z.ai direct, openrouter z-ai/glm-*), conversational context is lost on consecutive turns. The HTTP request body that reaches the provider contains only the system prompt and the just-submitted user message, even though the runner's prompt.submitted trajectory event captured the full prior transcript with messagesLen matching the number of stored messages. Pronouns lose their referent, and continuation requests can ship a bare tool result without its preceding assistant tool call.
  • Root Cause: pi-coding-agent's Session.prompt() runs _checkCompaction(lastAssistant, false) before it appends the new user message. For z.ai-style providers, pi-ai's isContextOverflow (Case 2 — "Silent overflow z.ai style") returns true whenever the last successful assistant turn has usage.input + usage.cacheRead > contextWindow, even with stopReason: "stop". That triggers Pi's _runAutoCompaction("overflow", true) which reassigns agent.state.messages to the post-compaction view — typically a single compaction summary, sometimes empty. When agent.prompt(messages) then runs the agent loop, the snapshot of state.messages is the now-tiny array, so streamFn ships a near-empty transcript to the provider. The runner's prompt.submitted event already deep-cloned state.messages via safeJsonStringify before _checkCompaction mutated it, so the trajectory still shows the pre-compaction count — making the trajectory look correct while the actual provider call is starved of context. OpenClaw's runner already drives compaction itself (shouldPreemptivelyCompactBeforePrompt and the tool-result-context-guard mid-turn precheck), so Pi's parallel auto-compaction inside Session.prompt() is redundant for the failure cases this fix targets.
  • Fix: Extend applyPiAutoCompactionGuard so setCompactionEnabled(false) also fires when the active model matches z.ai-style silent-overflow accounting. Detection routes through structured channels already present in the codebase — the ProviderEndpointClass typed enum (matching endpointClass === "zai-native" for direct api.z.ai), the existing normalizeProviderId normalizer (matching the zai family for a config-set z.ai provider), and a z-ai/ or openrouter/z-ai/ model-id prefix as a documented string fallback for openrouter routing — any one signal is sufficient. The compaction-LLM call site reads effectiveModel.baseUrl (the post-applyAuthHeaderOverride view that matches every other reference in that scope), so an api.z.ai baseUrl injected via auth profile is still visible to the detector. The existing contextEngineInfo.ownsCompaction === true short-circuit is preserved unchanged. Default-mode runs against non-z.ai providers are not touched — Pi's auto-compaction stays on for them, preserving the existing baseline. To survive DefaultResourceLoader.reload() rehydrating settings from disk and silently re-enabling auto-compaction (the same rehydration the existing applyPiCompactionSettingsFromConfig re-call sites already document), the guard is invoked a second time after the reload in both the embedded runner and the runner-driven compaction LLM session, mirroring the pattern already used for compaction config rehydration.
  • What changed:
    • src/agents/pi-settings.ts: new exported isSilentOverflowProneModel({provider, modelId, baseUrl}) helper using resolveProviderEndpoint's endpointClass, normalizeProviderId, and the z-ai/ / openrouter/z-ai/ model-id prefix. shouldDisablePiAutoCompaction and applyPiAutoCompactionGuard accept a new silentOverflowProneProvider?: boolean in addition to the existing contextEngineInfo.
    • src/agents/pi-embedded-runner/run/attempt.ts: the existing applyPiAutoCompactionGuard call now computes silentOverflowProneProvider from params.model. A second applyPiAutoCompactionGuard call is added immediately after resourceLoader.reload(), alongside the existing post-reload applyPiCompactionSettingsFromConfig re-call.
    • src/agents/pi-embedded-runner/compact.ts: imports applyPiAutoCompactionGuard and isSilentOverflowProneModel; adds a guard call after the local resourceLoader.reload(). This site previously had no guard call at all, leaving the nested compaction LLM session vulnerable to the same _checkCompaction re-entry on z.ai-style runs.
    • src/agents/pi-settings.test.ts: nine new cases appended to the existing colocated test file. Five cases for isSilentOverflowProneModel (z-ai-prefixed model ids in both bare and qualified forms, config-set z.ai/z-ai provider, direct api.z.ai baseUrl, anthropic/openai/google routes negative, missing fields negative) and four cases for applyPiAutoCompactionGuard (silent-overflow-prone disables, ownsCompaction === true still disables, default-mode + non-z.ai is left enabled, missing setCompactionEnabled reports unsupported).
  • What did NOT change (scope boundary):
    • No vendored dependency edits — pi-coding-agent and pi-ai are untouched. Their compaction and overflow-detection logic is correct in isolation; this fix only stops them from firing in parallel with OpenClaw's runner-owned compaction in the cases listed above.
    • No change to default-mode runs against non-z.ai providers. They retain Pi's auto-compaction inside Session.prompt() as before.
    • No change to other documented-unreliable providers (e.g. ollama silent truncation). Reporter's repro is z.ai-specific; extending the guard without an actual repro would be speculation.
    • No change to shouldPreemptivelyCompactBeforePrompt, the tool-result-context-guard mid-turn precheck, context-engine assemble, the trajectory event's snapshot timing, or any provider transport.
    • No change to the CLI /compact entrypoint surface — its nested runner-driven compaction LLM session is protected through the compact.ts site this PR modifies.
    • No any types introduced.
    • No edits to baseline / inventory / snapshot fixtures.
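The three-signal detector described above can be sketched as follows. This is a hypothetical reconstruction from the PR description, not the shipped code: `resolveEndpointClass` and the `normalizeProviderId` stub below are simplified stand-ins for the real `resolveProviderEndpoint` / `normalizeProviderId`, and the bare `glm-` prefix match reflects the deliberate broad match discussed later in review.

```typescript
// Hypothetical sketch of isSilentOverflowProneModel (src/agents/pi-settings.ts).
type ProviderEndpointClass = "zai-native" | "other";

interface SilentOverflowProbe {
  provider?: string;
  modelId?: string;
  baseUrl?: string; // post-override view, so auth-profile-injected URLs are visible
}

// Stub: classify the endpoint by hostname (the real helper is richer).
function resolveEndpointClass(baseUrl?: string): ProviderEndpointClass {
  if (!baseUrl) return "other";
  try {
    return new URL(baseUrl).hostname === "api.z.ai" ? "zai-native" : "other";
  } catch {
    return "other"; // unparseable baseUrl: fall through to the other signals
  }
}

// Stub: canonicalize provider aliases ("z.ai", "z-ai", "zai" -> "zai").
function normalizeProviderId(provider?: string): string | undefined {
  return provider?.toLowerCase().replace(/[^a-z0-9]/g, "");
}

// Any one of the three signals is sufficient.
function isSilentOverflowProneModel(probe: SilentOverflowProbe): boolean {
  if (resolveEndpointClass(probe.baseUrl) === "zai-native") return true; // signal 1: typed endpoint class
  if (normalizeProviderId(probe.provider) === "zai") return true;        // signal 2: normalized provider id
  const id = probe.modelId ?? "";
  return (
    id.startsWith("z-ai/") ||
    id.startsWith("openrouter/z-ai/") ||
    id.startsWith("glm-") // deliberate broad match on bare glm- ids
  );
}
```

Detection is intentionally conservative: a false negative just leaves the pre-fix behavior in place, while a false positive would only disable Pi's redundant auto-compaction for a provider the runner already compacts itself.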

Reproduction

  1. Run OpenClaw with the bundled LegacyContextEngine (the default).
  2. Configure an OpenAI-compatible provider that points at a z.ai-style model — for example, provider: openrouter with model: z-ai/glm-5.1 via a custom baseUrl for an in-house model gateway. Anything where the previous successful assistant turn has usage.input + usage.cacheRead > model.contextWindow works (z.ai-family accounting is documented as silent-overflow style in pi-ai's overflow.ts).
  3. Open a Telegram session and send: "first off - there's a folder full of my projects at /root/code - these all live in github. They're my main focus". The agent calls a tool, observes /root/code, and replies.
  4. Send a follow-up: "just note where they are - remember that. They'll be needed in future".
  5. Capture the outbound openai-completions request body at the gateway (correlate with the run id from the OpenClaw trajectory). Without the fix, the body contains only the system message and the just-sent user message, while the trajectory's prompt.submitted event records the full pre-existing transcript. The agent's reply ignores the earlier /root/code context and asks what "they" refers to.
  6. With this fix, applyPiAutoCompactionGuard sets setCompactionEnabled(false) for z.ai-style runs both before and after resourceLoader.reload(), Pi's _checkCompaction short-circuits inside Session.prompt(), state.messages keeps its prior contents, and the model receives the full transcript. Coreference resolves correctly.
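The z.ai-style accounting referenced in step 2 (pi-ai's isContextOverflow Case 2) reduces to a token-count comparison. A minimal sketch, with simplified types standing in for pi-ai's real shapes:

```typescript
// Simplified stand-ins for pi-ai's turn/usage shapes (hypothetical names).
interface TurnUsage {
  input: number;     // prompt tokens accounted for the turn
  cacheRead: number; // cached prompt tokens read back
}

interface AssistantTurn {
  stopReason: string; // "stop" even when the window was exceeded -- hence "silent"
  usage: TurnUsage;
}

// Sketch of the "silent overflow, z.ai style" case: the provider reports a
// normal stop, but the accounted input already exceeds the context window.
function isSilentContextOverflow(turn: AssistantTurn, contextWindow: number): boolean {
  return turn.usage.input + turn.usage.cacheRead > contextWindow;
}
```

Because the check fires on the previous successful turn, a perfectly healthy `stopReason: "stop"` response can still trip Pi's `_runAutoCompaction("overflow", true)` on the next prompt.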

Targeted test: pnpm test src/agents/pi-settings.test.ts — 25/25 pass (16 existing + 9 new), including the explicit silent-overflow-prone disable regression case for #75799 and the explicit baseline-preservation case for default-mode + non-z.ai providers.

Risk / Mitigation

  • Risk 1 — Real overflow recovery on z.ai: With Pi's auto-compaction off for z.ai-style runs, a genuine context overflow on the previous turn previously had two recovery paths (Pi's _runAutoCompaction and OpenClaw's shouldPreemptivelyCompactBeforePrompt). After this PR only OpenClaw's path runs for z.ai.
    • Mitigation: OpenClaw's preemptive check already runs unconditionally on every turn and skips submission with a preemptive-overflow signal when compaction is needed; the tool-result-context-guard mid-turn precheck handles overflows mid-tool-loop. The runner's compaction recovery path then drives compaction. Default-mode runs against non-z.ai providers are not changed — they keep Pi's auto-compaction as the secondary recovery.
  • Risk 2 — z.ai detection scope: isSilentOverflowProneModel uses three signals (typed endpointClass enum, normalized provider id, model-id prefix). A future provider rename or an unusual proxy that re-exposes z.ai under a different model-id namespace could miss detection.
    • Mitigation: All three signals are documented in JSDoc with their intended channel; the new tests pin each branch; and the helper is exported so follow-up changes can extend it once a reproducible repro for additional providers exists. Detection is intentionally conservative — false negatives leave the existing buggy behavior, but never invent a regression for unrelated providers. The auth-profile-injected baseUrl case is exercised at the compaction-LLM site by reading the post-override effectiveModel.baseUrl.
  • Risk 3 — Rehydration regression: The post-reload re-call adds two new call sites. If applyPiAutoCompactionGuard ever gains expensive side effects, those sites will execute twice per run.
    • Mitigation: applyPiAutoCompactionGuard is a small decision plus at most one setCompactionEnabled(false) call; that write is idempotent (sets the same field to the same value when already disabled). The two applyPiCompactionSettingsFromConfig calls already follow the same "before + after reload" pattern at the same files.
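The guard plus its "before + after reload" call pattern can be sketched as follows — a hypothetical reconstruction from the PR description, with the session shape reduced to the one optional method the guard touches:

```typescript
// Hypothetical sketch of applyPiAutoCompactionGuard; the real helper lives in
// src/agents/pi-settings.ts.
interface PiSession {
  setCompactionEnabled?: (enabled: boolean) => void;
}

interface GuardOptions {
  contextEngineOwnsCompaction?: boolean; // existing ownsCompaction short-circuit
  silentOverflowProneProvider?: boolean; // new signal added by this PR
}

function applyPiAutoCompactionGuard(
  session: PiSession,
  opts: GuardOptions,
): "disabled" | "enabled" | "unsupported" {
  const shouldDisable =
    opts.contextEngineOwnsCompaction === true ||
    opts.silentOverflowProneProvider === true;
  // Default-mode, non-z.ai runs: leave Pi's auto-compaction on.
  if (!shouldDisable) return "enabled";
  if (typeof session.setCompactionEnabled !== "function") return "unsupported";
  // Idempotent write: sets the same field to the same value when already
  // disabled, so calling the guard both before and after reload() is safe.
  session.setCompactionEnabled(false);
  return "disabled";
}
```

Because `DefaultResourceLoader.reload()` rehydrates settings from disk and can silently re-enable auto-compaction, the guard is cheap enough to invoke twice: once when the session options are first assembled and once immediately after the reload.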

Change Type (select all)

  • Bug fix

Scope (select all touched areas)

  • Agents (pi-embedded-runner, pi-settings)
  • Tests

Linked Issue/PR

Fixes #75799

@openclaw-barnacle (Bot) added labels agents (Agent runtime and tooling), size: M, and maintainer (Maintainer-authored PR) on May 2, 2026
@clawsweeper
Contributor

clawsweeper Bot commented May 2, 2026

Codex review: needs changes before merge.

Summary
The PR adds z.ai/GLM silent-overflow detection to the Pi auto-compaction guard, reapplies that guard after embedded-runner and compaction-session reloads, and extends focused pi-settings tests and mocks.

Reproducibility: yes. The linked bug gives a concrete two-turn Telegram/OpenAI-compatible gateway reproduction, and Pi 0.71.1 source confirms the pre-prompt compaction path can replace state.messages before the provider call.

Next step before merge
Only a narrow changelog omission blocks the otherwise focused runtime/test fix, so an automated repair can add the release note without product or security judgment.

Security
Cleared: the diff changes agents runtime guard logic and tests only, with no dependency, workflow, permission, package metadata, secret-handling, or artifact-execution changes.

Review details

Best possible solution:

Land the z.ai/GLM auto-compaction guard and post-reload reapply pattern with the focused tests after adding one active changelog Fixes bullet.

Do we have a high-confidence way to reproduce the issue?

Yes. The linked bug gives a concrete two-turn Telegram/OpenAI-compatible gateway reproduction, and Pi 0.71.1 source confirms the pre-prompt compaction path can replace state.messages before the provider call.

Is this the best way to solve the issue?

No in its current form. The runtime approach is the narrow maintainable fix for the confirmed z.ai/GLM silent-overflow path, but the required changelog entry is still missing.

Full review comments:

  • [P2] Add the required changelog entry — src/agents/pi-settings.ts:143
    This is a user-facing fix for channel sessions losing prior context with z.ai/GLM-style providers, but the PR still does not update CHANGELOG.md. Add a single-line bullet under the active Unreleased Fixes section so the regression fix is included in release notes.
    Confidence: 0.96

Overall correctness: patch is incorrect
Overall confidence: 0.9

Acceptance criteria:

  • pnpm exec oxfmt --check --threads=1 CHANGELOG.md
  • pnpm test src/agents/pi-settings.test.ts
  • pnpm check:changed

What I checked:

  • Protected PR state: GitHub PR metadata shows this PR is open, authored by a MEMBER, and labeled maintainer, so this cleanup pass should not close it. (0a52294843c4)
  • Current main lacks the z.ai guard: Current main only disables Pi auto-compaction when the context engine owns compaction; it has no silent-overflow-prone provider detector. (src/agents/pi-settings.ts:125, 1d5c77c4439d)
  • PR adds the runtime guard: The PR head adds isSilentOverflowProneModel and threads silentOverflowProneProvider into applyPiAutoCompactionGuard, covering z.ai provider IDs, zai-native endpoints, z-ai model prefixes, and bare glm-* IDs. (src/agents/pi-settings.ts:143, 0a52294843c4)
  • PR reapplies after reload: The PR reapplies the auto-compaction guard after DefaultResourceLoader.reload() in the embedded runner and compaction LLM path, matching the existing settings reapply pattern. (src/agents/pi-embedded-runner/run/attempt.ts:1498, 0a52294843c4)
  • Dependency behavior supports the fix: pi-coding-agent 0.71.1 calls _checkCompaction before appending the new user message and can later reassign agent.state.messages during auto-compaction; pi-ai 0.71.1 treats stop responses with usage.input + usage.cacheRead greater than contextWindow as silent overflow.
  • OpenClaw precheck path exists: Current main already has a runner-owned preemptive compaction decision before prompt submission, plus a mid-turn tool-result context guard path. (src/agents/pi-embedded-runner/run/preemptive-compaction.ts:41, 1d5c77c4439d)

Likely related people:

  • steipete: Recent current-main history touches the central Pi settings, embedded runner, and compaction-session paths involved in this PR. (role: recent maintainer; confidence: high; commits: 06fe78e4c4f5, 4407c317f38e, f7fe6ad55eb0; files: src/agents/pi-settings.ts, src/agents/pi-embedded-runner/run/attempt.ts, src/agents/pi-embedded-runner/compact.ts)
  • jalehman: The context-engine compaction ownership seam that this guard extends appears tied to the custom context management work and follow-up compaction-model fixes. (role: introduced adjacent behavior; confidence: high; commits: fee91fefceb4, c09884614864; files: src/context-engine, src/agents/pi-settings.ts, src/agents/pi-embedded-runner/run/attempt.ts)
  • openperf: Prior merged history from this author touches pi-settings compaction reserve behavior, so they are relevant beyond being the current PR proposer. (role: adjacent prior contributor; confidence: medium; commits: 4bc46ccfedc4; files: src/agents/pi-settings.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against 1d5c77c4439d.

@openperf openperf force-pushed the fix/75799-zai-silent-overflow-pi-compaction-guard branch 3 times, most recently from 8ce3cad to 0a52294 on May 2, 2026 at 14:18
Contributor

@martingarramon martingarramon left a comment


Solid root-cause trace + the applyPiAutoCompactionGuard extension reads cleanly. Tests at pi-settings.test.ts lock the deliberate broad-match on bare glm- (provider: "openai" / "openrouter" / "custom" all flag TRUE) so intent is unambiguous.

Two non-blocking questions:

  1. Asymmetric call count. compact.ts:977 calls applyPiAutoCompactionGuard once (post-reload), but attempt.ts calls it twice (:1473 pre-reload, :1509 post-reload). Intentional because the compaction LLM session can't trigger Pi's _checkCompaction in the pre-reload window? If so, a one-line comment at compact.ts:976 would lock the asymmetry.

  2. Auth-profile drift on compact.ts:984. baseUrl: effectiveModel.baseUrl reflects auth-profile-injected URLs (per the doc comment), but provider / modelId are bare scope vars. If auth profiles can rewrite those labels too, the silent-overflow class slips past — sanity-check?

Both nits, no blockers.

@openperf
Member Author

openperf commented May 2, 2026

@martingarramon

Thanks for the careful read — both worth being explicit about.

Asymmetry. In both files the Pi Session is built after reload() (createAgentSession in compact.ts; createEmbeddedAgentSessionWithResourceLoader wrapping it in attempt.ts), so _checkCompaction can't fire in the pre-reload window — the post-reload guard is the load-bearing one.

attempt.ts already had a pre-reload guard call; we extended its args with silentOverflowProneProvider to keep the diff small, and added the new post-reload call. compact.ts had no guard before, so we added only the post-reload one. The asymmetry is a "minimum-touch over symmetry" choice — explicit here in the thread so it's not opaque to future readers.

Auth-profile drift. Fair concern — sanity-checked end-to-end, and the signals hold.

The auth-profile path can inject baseUrl (via prepareProviderRuntimeAuth), but it does not rewrite provider or modelId — those come from resolveEmbeddedCompactionTarget, which reads config + params only. applyAuthHeaderOverride / applyLocalNoAuthHeaderOverride likewise only touch headers.

So effectiveModel.baseUrl is the post-override URL, and the endpointClass === "zai-native" arm catches the case where someone declares a generic provider/modelId but their auth profile points at z.ai. The three signals (zai endpointClass, zai provider id, glm- / z-ai/* modelId prefix) are complementary by design — a z.ai-style flow has to surface through at least one. If the auth-profile contract ever expanded to rewrite labels too, that'd be worth revisiting; today the URL-based arm covers it.

@openperf openperf force-pushed the fix/75799-zai-silent-overflow-pi-compaction-guard branch from b791baf to ef305bb on May 2, 2026 at 16:29
@openperf openperf merged commit cc8a8f1 into openclaw:main May 2, 2026
100 checks passed
@openperf
Member Author

openperf commented May 2, 2026

Merged via squash.

Thanks @openperf!

@openperf openperf deleted the fix/75799-zai-silent-overflow-pi-compaction-guard branch May 2, 2026 16:32
@openperf
Member Author

openperf commented May 2, 2026

Merged as cc8a8f1. Changelog entry added under ## Unreleased / ### Fixes before landing.


Labels

agents (Agent runtime and tooling), maintainer (Maintainer-authored PR), size: M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: OpenAI-compatible provider request is sent with only the current message

2 participants