Agents: add compaction modes (warn, error, none) with proactive conte…#54585

Open
fierai wants to merge 7 commits into openclaw:main from diegonzn:feat/compaction-modes

@fierai fierai commented Mar 25, 2026

Summary

  • Problem: Users often lose important conversation history due to aggressive
    auto-compaction, or they encounter cryptic provider-specific "context
    window exceeded" errors when the window is full.
  • Why it matters: Advanced users need explicit control over context
    management to prevent silent history loss and to receive clear, actionable
    warnings before a session fails.
  • What changed: Added three new compaction modes (warn, error, none) backed
    by a proactive context guard that estimates tokens before each LLM call.
    Integrated Greptile feedback to ensure the guard is robust even if token
    estimation fails.
  • What did NOT change (scope boundary): The existing logic for default and
    safeguard modes remains untouched.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • This PR fixes a bug or regression

Root Cause / Regression History (if applicable)

  • Root cause: N/A (New feature)

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test (Verified via pnpm check and oxlint)
  • Target test or file: src/agents/pi-embedded-runner/run/attempt.ts
  • Scenario the test should lock in: Ensuring warn mode returns the cleanup
    message and error mode throws before hitting the LLM.
  • If no new test is added, why not: Manual verification was performed in a
    live environment to ensure the user-visible messages appear correctly in
    the messaging channel.

User-visible / Behavior Changes

  • New configuration options in agents.defaults.compaction.mode: "warn",
    "error", and "none".
  • Users in warn mode see the message "🧹 Context near limit, use /compact"
    rather than having the agent fail or compact automatically.

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No

Repro + Verification

Environment

  • OS: macOS (darwin)
  • Runtime/container: Node.js 22.22.1
  • Model/provider: Verified with OpenAI/Anthropic providers.

Steps

  1. Set agents.defaults.compaction.mode to "warn".
  2. Set a high reserveTokens (e.g., 100,000) to trigger the guard immediately.
  3. Send a message to the agent.
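
The repro configuration from the steps above might look like the following sketch. Only `agents.defaults.compaction.mode` and its values are confirmed by this PR; placing `reserveTokens` next to `mode` is an assumption, and the `OpenClawConfigSketch` type is hypothetical.

```typescript
// Hypothetical, trimmed-down config shape for illustration only.
type CompactionMode = "default" | "safeguard" | "warn" | "error" | "none";

interface OpenClawConfigSketch {
  agents?: {
    defaults?: {
      compaction?: {
        mode?: CompactionMode;
        // Assumption: reserveTokens sits next to mode in the config tree.
        reserveTokens?: number;
      };
    };
  };
}

// Mirrors the repro steps: warn mode plus a large reserve so the guard
// trips on the first message.
const reproConfig: OpenClawConfigSketch = {
  agents: {
    defaults: {
      compaction: { mode: "warn", reserveTokens: 100_000 },
    },
  },
};
```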

Expected

  • The agent stops and returns: 🧹 Context near limit, use /compact.

Actual

  • The agent correctly identifies the "near limit" state and returns the
    warning without calling the LLM.

Evidence

  • Trace/log snippets: Verified [compaction-guard] context near limit
    (mode=warn) logs in gateway.log.

Human Verification (required)

  • Verified scenarios: warn mode (message returned), error mode (error
    thrown), and none mode (auto-compaction disabled).
  • Edge cases checked: Integrated Greptile feedback to handle estimateTokens
    failures; error mode now re-throws with cause, and warn mode provides a
    fallback warning instead of failing open.
  • What you did not verify: Latency impact on extremely large (1M+ tokens)
    histories, though estimateTokens is generally fast.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in
    this PR.
  • I left unresolved only the conversations that still need reviewer or
    maintainer judgment.

Compatibility / Migration

  • Backward compatible? Yes (Defaults to default mode).
  • Config/env changes? Yes (New optional config fields).
  • Migration needed? No.

Failure Recovery (if this breaks)

  • How to disable/revert this change quickly: Set compaction.mode back to
    "default" or "safeguard".
  • Files/config to restore: ~/.openclaw/openclaw.json.

Risks and Mitigations

  • Risk: Token estimation failure could block the agent.
    • Mitigation: The guard is wrapped in a try/catch block. In warn mode, it
      falls back to a safe warning message; in error mode, it provides a
      descriptive error to the user.

greptile-apps Bot commented Mar 25, 2026

Contributor

Greptile Summary

This PR adds three new compaction modes (warn, error, none) to the existing default/safeguard pair, giving users explicit control over what happens when the context window fills up. The changes are consistent across the type definition, Zod schema, generated JSON schema, help text, settings manager, and extension factories.

Key changes:

  • AgentCompactionMode type and all schemas extended to include "warn" | "error" | "none".
  • resolveCompactionMode exported and handles all five modes cleanly.
  • shouldDisablePiAutoCompaction disables Pi's built-in auto-compaction for warn, error, and none modes, preventing the underlying engine from compacting before the guard can fire.
  • A new proactive context guard in attempt.ts checks estimated token usage before each LLM prompt: warn returns a user-visible "near limit" message; error throws; none is left to fail at the model limit.
  • One subtle issue worth addressing: in warn mode, if estimateTokens throws, totalTokens stays 0 and the catch block falls through — the threshold check is never true, so the user never receives the warning. The error path handles this correctly (re-throws), but the warn path silently becomes a no-op. A user expecting a guardrail in warn mode may not realise the guard failed.
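
As a rough illustration of the key changes listed above, the mode resolution and the auto-compaction switch could be sketched as follows. The function names come from this PR, but the signatures are simplified and the `ConfigSketch` type is hypothetical; treating an unset mode as `"default"` is an assumption based on the PR's backward-compatibility note.

```typescript
type AgentCompactionMode = "default" | "safeguard" | "warn" | "error" | "none";

// Hypothetical minimal config shape; the real config type is larger.
interface ConfigSketch {
  agents?: { defaults?: { compaction?: { mode?: AgentCompactionMode } } };
}

// Assumption: an unset mode resolves to "default".
function resolveCompactionMode(cfg?: ConfigSketch): AgentCompactionMode {
  return cfg?.agents?.defaults?.compaction?.mode ?? "default";
}

// The three new modes turn off Pi's built-in auto-compaction so that the
// engine does not compact before the guard can fire.
function shouldDisablePiAutoCompaction(cfg?: ConfigSketch): boolean {
  const mode = cfg?.agents?.defaults?.compaction?.mode;
  return mode === "warn" || mode === "error" || mode === "none";
}
```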

Confidence Score: 4/5

  • Safe to merge after addressing the warn-mode silent fail-open when token estimation throws.
  • The feature is coherent and all schema/type/validation layers are updated consistently. The error and none modes look correct. The only concrete issue is that warn mode silently degrades to a no-op when estimateTokens throws (because totalTokens remains 0 after the catch block and the threshold check never fires). This doesn't cause a crash or data loss but does mean users who configured warn for protection won't get the user-visible guardrail in that edge case.
  • src/agents/pi-embedded-runner/run/attempt.ts — specifically the catch block at line 2917 and whether warn mode should also surface a fallback warning when estimation fails.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/agents/pi-embedded-runner/run/attempt.ts
Line: 2917-2925

Comment:
**Warn mode silently becomes a no-op when estimation fails**

When `estimateTokens` throws in `warn` mode, the catch block logs and falls through — leaving `totalTokens` at `0`. Execution then continues to the threshold check outside the try/catch:

```typescript
if (totalTokens > threshold) {  // 0 > threshold → always false
```

Since `threshold` is always `≥ 0`, this check can never fire and the user never sees the `"🧹 Context near limit, use /compact"` message. A user who configured `mode=warn` expecting a guardrail gets silent fail-open behavior instead.

In `error` mode the failure is correctly surfaced (re-thrown), but `warn` mode has no equivalent escalation. Consider returning early with the warning message (or a fallback warning) when estimation fails in `warn` mode, rather than silently proceeding.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/agents/pi-embedded-runner/run/attempt.ts
Line: 2904-2916

Comment:
**Double `as unknown as AgentMessage` casts suggest type mismatch**

The double cast (`msg as unknown as AgentMessage`) applied to every element of `activeSession.messages` and to the synthesised system/user objects indicates the types don't align with what `estimateTokens` expects. If the actual runtime shape differs from `AgentMessage` in ways that matter to the estimator (e.g. array vs. string `content`, missing `role` discriminant), the returned count could be systematically wrong, leading to false-positive or false-negative guard triggers without any thrown exception.

It's worth confirming that `estimateTokens` handles the exact shapes produced by `activeSession.messages`, and if they differ, either adapt the objects before passing them in or add a lightweight adapter to avoid relying on the cast.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "Agents: add compaction modes (warn, erro..."

Comment on lines +2917 to +2925
} catch (err) {
  log.warn(`[compaction-guard] token estimation failed: ${String(err)}`);
  if (currentCompactionMode === "error") {
    throw new Error(
      `[compaction-guard] token estimation failed for mode=error. Error: ${String(err)}`,
      { cause: err },
    );
  }
}

P2 Warn mode silently becomes a no-op when estimation fails

When estimateTokens throws in warn mode, the catch block logs and falls through — leaving totalTokens at 0. Execution then continues to the threshold check outside the try/catch:

if (totalTokens > threshold) {  // 0 > threshold → always false

Since threshold is always ≥ 0, this check can never fire and the user never sees the "🧹 Context near limit, use /compact" message. A user who configured mode=warn expecting a guardrail gets silent fail-open behavior instead.

In error mode the failure is correctly surfaced (re-thrown), but warn mode has no equivalent escalation. Consider returning early with the warning message (or a fallback warning) when estimation fails in warn mode, rather than silently proceeding.
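
One way to close the gap described above is to return a fallback warning from the catch path in warn mode. The following is a minimal sketch under stated assumptions: `runGuard`, `GuardOutcome`, and the fallback wording are hypothetical, and the real guard lives inline in `attempt.ts` rather than in a helper like this.

```typescript
type GuardMode = "warn" | "error";
type GuardOutcome = { kind: "proceed" } | { kind: "warn"; message: string };

const NEAR_LIMIT_MSG = "🧹 Context near limit, use /compact";
// Hypothetical fallback wording; the PR does not quote the actual text.
const FALLBACK_MSG = "🧹 Could not estimate context size, consider /compact";

function runGuard(mode: GuardMode, estimate: () => number, threshold: number): GuardOutcome {
  let totalTokens: number;
  try {
    totalTokens = estimate();
  } catch (err) {
    if (mode === "error") {
      // The PR also attaches the original error as `cause`; omitted here
      // to keep the sketch independent of the compile target.
      throw new Error(`[compaction-guard] token estimation failed: ${String(err)}`);
    }
    // warn mode: fail closed with a fallback warning instead of proceeding.
    return { kind: "warn", message: FALLBACK_MSG };
  }
  if (totalTokens <= threshold) return { kind: "proceed" };
  if (mode === "error") {
    throw new Error(
      `[compaction-guard] context near limit: tokens=${totalTokens} threshold=${threshold}`,
    );
  }
  return { kind: "warn", message: NEAR_LIMIT_MSG };
}
```

With this shape, a failing estimator can no longer leave `totalTokens` at `0` and slip past the threshold check.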

Comment on lines +2904 to +2916
totalTokens = activeSession.messages.reduce(
  (sum, msg) => sum + estimateTokens(msg as unknown as AgentMessage),
  0,
);
totalTokens += estimateTokens({
  role: "system",
  content: systemPromptText,
} as unknown as AgentMessage);
totalTokens += estimateTokens({
  role: "user",
  content: effectivePrompt,
  images: imageResult.images,
} as unknown as AgentMessage);

P2 Double as unknown as AgentMessage casts suggest type mismatch

The double cast (msg as unknown as AgentMessage) applied to every element of activeSession.messages and to the synthesised system/user objects indicates the types don't align with what estimateTokens expects. If the actual runtime shape differs from AgentMessage in ways that matter to the estimator (e.g. array vs. string content, missing role discriminant), the returned count could be systematically wrong, leading to false-positive or false-negative guard triggers without any thrown exception.

It's worth confirming that estimateTokens handles the exact shapes produced by activeSession.messages, and if they differ, either adapt the objects before passing them in or add a lightweight adapter to avoid relying on the cast.
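
A lightweight adapter of the kind suggested above might look like this sketch. Everything here is hypothetical: `EstimatorMessage` stands in for whatever `estimateTokens` really accepts, the role names are assumed, and the real `AgentMessage` type may carry more fields.

```typescript
// Hypothetical estimator input shape; the real AgentMessage type may differ.
interface EstimatorMessage {
  role: "system" | "user" | "assistant" | "toolResult";
  content: string;
}

const KNOWN_ROLES = ["system", "user", "assistant", "toolResult"] as const;

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null;
}

// Flattens string or array-of-parts content into plain text.
function contentToText(content: unknown): string {
  if (typeof content === "string") return content;
  if (Array.isArray(content)) {
    return content
      .map((part) => (isRecord(part) && typeof part.text === "string" ? part.text : ""))
      .join("");
  }
  return "";
}

// Returns null for shapes the estimator should not see, instead of casting.
function toEstimatorMessage(raw: unknown): EstimatorMessage | null {
  if (!isRecord(raw)) return null;
  const role = KNOWN_ROLES.find((r) => r === raw.role);
  if (!role) return null;
  return { role, content: contentToText(raw.content) };
}
```

Checking the shape at runtime makes a mismatch visible as a `null` the caller must handle, rather than a count that is silently wrong.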

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 74c4a3ab3e

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +2927 to +2931
const reserveTokens = settingsManager.getCompactionReserveTokens();
const contextWindow = params.model.contextWindow ?? DEFAULT_CONTEXT_TOKENS;
const threshold = Math.max(0, contextWindow - reserveTokens);

if (totalTokens > threshold) {

P1 Bound reserve headroom before warn/error threshold checks

The proactive compaction guard computes threshold as contextWindow - reserveTokens and only clamps the result to zero, but reserveTokens is globally floored to 20,000 by default in the settings layer. For any allowed context window/cap below that floor (for example 16k–19,999 via model metadata or agents.defaults.contextTokens), this makes threshold become 0, so warn/error mode will fire on nearly every non-empty prompt rather than only when context is actually near full. Please clamp the effective reserve against the active context window (or reuse the effective prompt budget) before applying this check.
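
The clamping suggested above can be illustrated with a small worked example. This is a sketch only: `computeGuardThreshold` is a hypothetical helper, and capping the reserve at half of the window is an illustrative choice, not a value taken from the PR or the codebase.

```typescript
// Assumption: cap the reserve at half of the active window.
const MAX_RESERVE_RATIO = 0.5;

function computeGuardThreshold(contextWindow: number, reserveTokens: number): number {
  const maxReserve = Math.floor(contextWindow * MAX_RESERVE_RATIO);
  const effectiveReserve = Math.min(Math.max(0, reserveTokens), maxReserve);
  return Math.max(0, contextWindow - effectiveReserve);
}

// Unclamped: max(0, 16_000 - 20_000) = 0, so any non-empty prompt trips the guard.
// Clamped:   16_000 - min(20_000, 8_000) = 8_000.
const smallWindowThreshold = computeGuardThreshold(16_000, 20_000);
```

For large windows the clamp has no effect: a 200,000-token window with a 20,000-token reserve still yields a threshold of 180,000.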

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 40ec5af640

Comment thread src/agents/pi-settings.ts
Comment on lines +109 to +110
const mode = params.cfg?.agents?.defaults?.compaction?.mode;
return mode === "warn" || mode === "error" || mode === "none";

P1 Honor none mode in overflow recovery

Treating mode === "none" as “disable Pi auto-compaction” here does not disable the separate overflow-recovery compaction loop in src/agents/pi-embedded-runner/run.ts (the branch that calls contextEngine.compact after context-overflow errors). In practice, a session configured with agents.defaults.compaction.mode=none can still auto-compact and retry once it hits model limits, which contradicts the new mode’s documented behavior (“letting it fail at the model limit”) and makes this config mode ineffective.
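
A gate of the following shape could make the overflow-recovery branch respect the mode. This is a sketch under assumptions: both helper names are hypothetical, and whether `warn` and `error` should also skip recovery compaction is a product decision this comment leaves open, so only `none` opts out here.

```typescript
type CompactionModeSketch = "default" | "safeguard" | "warn" | "error" | "none";

// Hypothetical helper; the name is not from the codebase.
function allowsOverflowRecoveryCompaction(mode: CompactionModeSketch): boolean {
  // Assumption: only "none" opts out of recovery compaction.
  return mode !== "none";
}

// What the recovery branch would do after a context-overflow error.
function decideOverflowRecovery(mode: CompactionModeSketch): "compact-and-retry" | "surface-error" {
  return allowsOverflowRecoveryCompaction(mode) ? "compact-and-retry" : "surface-error";
}
```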

@openclaw-barnacle openclaw-barnacle Bot added the agents (Agent runtime and tooling) and size: S labels Mar 25, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c2e4f10ebc

Comment on lines +2904 to +2906
totalTokens = activeSession.messages.reduce(
  (sum, msg) => sum + estimateTokens(msg as unknown as AgentMessage),
  0,

P1 Exclude tool-result details from proactive token estimation

The new proactive guard sums estimateTokens over raw activeSession.messages, which can include toolResult.details payloads that are not part of the model-visible transcript and may be very large. That inflates totalTokens (or can trigger estimator failures), causing warn/error modes to block normal turns even when actual prompt context is within budget. The compaction path already strips these details before token accounting (src/agents/compaction.ts), so this guard should apply the same sanitization before checking thresholds.
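
The sanitization step described above might be sketched as follows. The message shape and the `"toolResult"` role name are assumptions for illustration; the actual stripping logic in `src/agents/compaction.ts` is not shown in this PR and may differ.

```typescript
// Hypothetical message shape; only the `details` field on tool results matters here.
interface SessionMessageSketch {
  role: string;
  content: unknown;
  details?: unknown;
}

// Returns copies without tool-result `details`, leaving the session untouched.
function stripToolResultDetails(
  messages: readonly SessionMessageSketch[],
): SessionMessageSketch[] {
  return messages.map((msg) => {
    if (msg.role !== "toolResult" || msg.details === undefined) return msg;
    const copy: SessionMessageSketch = { ...msg };
    delete copy.details;
    return copy;
  });
}
```

The guard would then sum token estimates over the stripped copies, so large payloads that never reach the model do not inflate `totalTokens`.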

Comment on lines +2957 to +2964
return {
  aborted: false,
  timedOut: false,
  timedOutDuringCompaction: false,
  promptError: null,
  sessionIdUsed: activeSession.sessionId,
  messagesSnapshot: activeSession.messages.slice(),
  assistantTexts: ["🧹 Context near limit, use /compact"],

P2 Keep llm_output hooks on warn-mode early exits

This early return in the warn guard exits before the shared post-processing path that emits llm_output hooks (later in this function). As a result, guard-generated assistant warnings are sent to users but never surfaced to hook subscribers/telemetry that rely on llm_output, creating inconsistent plugin behavior for any session using compaction.mode=warn. Returning through the normal result path (instead of returning inline here) would preserve hook semantics.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 29c1b73ee6

Comment on lines +2950 to +2952
const threshold = Math.max(0, contextWindow - reserveTokens);

if (totalTokens > threshold) {

P2 Apply token safety margin before warn/error threshold check

The proactive guard compares raw estimateTokens totals against the reserve threshold, but this estimator is already treated elsewhere in the codebase as an under-approximation that needs headroom. Without a safety margin here, warn/error can be skipped on large prompts (especially code- or multibyte-heavy transcripts), so the turn still reaches provider-side context overflow instead of stopping early as these modes promise.
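
Applying headroom to the estimate before the comparison could look like this sketch. `isNearLimit` is a hypothetical helper, and the `1.2` multiplier is purely illustrative; the margin the codebase uses elsewhere is not stated in this review.

```typescript
// Assumption: 1.2 is an illustrative multiplier, not the codebase's value.
const ESTIMATE_SAFETY_MARGIN = 1.2;

function isNearLimit(
  estimatedTokens: number,
  threshold: number,
  margin: number = ESTIMATE_SAFETY_MARGIN,
): boolean {
  // Inflate the estimate because the estimator tends to under-count.
  return Math.ceil(estimatedTokens * margin) > threshold;
}
```

With a threshold of 100,000, an estimate of 90,000 is treated as roughly 108,000 and trips the guard, whereas the raw comparison would let the turn through.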

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6aa4730249

log.warn(
  `[compaction-guard] context near limit (mode=warn): tokens=${totalTokens} limit=${contextWindow} threshold=${threshold} reserve=${reserveTokens}`,
);
return {

P1 Route warn-mode guard through shared attempt epilogue

Returning directly from this guard path exits runEmbeddedAttempt before the shared post-turn epilogue runs, so finalizeAttemptContextEngineTurn(...) and agent_end hooks are skipped for every compaction.mode=warn near-limit turn even though a user-visible assistant warning is produced. In deployments that rely on context-engine after-turn maintenance or agent_end telemetry/plugins, this creates inconsistent state and missing side effects; this branch should flow through the normal result path instead of returning inline.
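
The control flow this comment asks for, where the guard path records its warning and then falls through to the same epilogue as a normal turn, can be sketched as below. All names are hypothetical stand-ins: the two callbacks represent the `llm_output` and `agent_end` hooks, and the real `runEmbeddedAttempt` does far more than this.

```typescript
interface AttemptHooksSketch {
  onLlmOutput(texts: string[]): void; // stands in for the llm_output hook
  onAgentEnd(): void; // stands in for finalization and agent_end hooks
}

interface AttemptResultSketch {
  assistantTexts: string[];
  calledModel: boolean;
}

function runAttemptSketch(
  nearLimit: boolean,
  callModel: () => string,
  hooks: AttemptHooksSketch,
): AttemptResultSketch {
  const assistantTexts: string[] = [];
  let calledModel = false;

  if (nearLimit) {
    // Guard path: record the warning but do not return inline.
    assistantTexts.push("🧹 Context near limit, use /compact");
  } else {
    assistantTexts.push(callModel());
    calledModel = true;
  }

  // Shared epilogue: runs for both the guard path and the normal path.
  hooks.onLlmOutput(assistantTexts);
  hooks.onAgentEnd();
  return { assistantTexts, calledModel };
}
```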

Comment thread CHANGELOG.md Outdated
@@ -29,6 +31,8 @@ Docs: https://docs.openclaw.ai

### Changes

- Agents: add compaction modes (`warn`, `error`, `none`) with a proactive context guard that stops execution before reaching the model's limit, preventing cryptic provider errors and preserving control during long-running sessions.

P1 Keep changelog entry only in the active release block

This line was inserted into historical release sections (for example under ## 2026.3.24) and repeated many times across past versions, which rewrites old release notes instead of documenting the change once in the active block. That violates the repository rule in AGENTS.md to append entries only in the active version section, and it can misstate version history for users while creating unnecessary churn/conflicts in future changelog updates.

@fierai fierai force-pushed the feat/compaction-modes branch from 6aa4730 to 6176332 on March 25, 2026 18:32

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 61763323cd

Comment on lines +2899 to +2902
// Proactive context guard for warn/error modes
const currentCompactionMode = resolveCompactionMode(params.config);
let skipPromptForCompactionGuard = false;
if (currentCompactionMode === "warn" || currentCompactionMode === "error") {

P2 Skip llm_input hook when guard blocks the model call

The proactive compaction guard is evaluated after runLlmInput(...), so in compaction.mode=warn|error near-limit turns (or warn-mode estimator failures) the code can short-circuit before activeSession.prompt(...) while still emitting an llm_input hook event. That makes hook subscribers observe phantom model requests, which can skew telemetry/billing pipelines or trigger side effects even though no provider call occurred. Gate llm_input emission behind the guard result (or run the guard before the hook) so hook semantics stay aligned with actual LLM invocations.
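
The ordering suggested above, with the guard evaluated first and the hook emitted only when a provider call will follow, might be sketched like this. `runTurnSketch` and `TurnTraceSketch` are hypothetical; the event strings merely stand in for `runLlmInput(...)` and `activeSession.prompt(...)`.

```typescript
interface TurnTraceSketch {
  events: string[]; // hook and provider events, in order
  warning: string | null;
}

function runTurnSketch(guardTripped: boolean): TurnTraceSketch {
  const events: string[] = [];
  // Evaluate the guard first so the flag is known before any hook fires.
  const skipPromptForCompactionGuard = guardTripped;

  if (!skipPromptForCompactionGuard) {
    events.push("llm_input"); // hook fires only when a provider call follows
    events.push("prompt"); // stands in for activeSession.prompt(...)
  }
  return {
    events,
    warning: skipPromptForCompactionGuard ? "🧹 Context near limit, use /compact" : null,
  };
}
```

Subscribers then see one `llm_input` event per actual model request and none for turns the guard blocks.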

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c1a56f720d

Comment on lines +2929 to +2933
  if (params.onBlockReply) {
    void params.onBlockReply({ text: warnMsg });
  }
  assistantTexts.push(warnMsg);
} else if (currentCompactionMode === "error") {

P1 Prevent warn mode from sending prompts after guard warning

In runEmbeddedAttempt, the warn branch only emits a warning (onBlockReply + assistantTexts.push) but never flips skipPromptForCompactionGuard, so the later prompt path still executes and calls the model. With agents.defaults.compaction.mode="warn", near-limit turns therefore still hit the LLM instead of stopping as documented, and can still overflow since auto-compaction is disabled for this mode.

@openclaw-barnacle

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

@openclaw-barnacle openclaw-barnacle Bot added the stale (Marked as stale due to inactivity) label Apr 29, 2026
clawsweeper Bot commented Apr 29, 2026

Codex review: needs real behavior proof before merge. Reviewed June 10, 2026, 1:01 AM ET / 05:01 UTC.

Summary
Review failed before ClawSweeper could summarize the requested change.

PR surface: Source +105, Docs +6, Generated +12. Total +123 across 9 files.

Reproducibility: unclear. The review failed before ClawSweeper could establish a reproduction path.

Review metrics: none identified.

Merge readiness
Overall: 🌊 off-meta tidepool
Proof: 🌊 off-meta tidepool
Patch quality: 🌊 off-meta tidepool
Result: rating does not apply to this item.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Risk before merge

  • [P1] No close action taken because the review did not complete.

Maintainer options:

  1. Decide the mitigation before merge
    Retry the Codex review after fixing the execution failure.
  2. Pause or close
    Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge

  • [P1] Review did not complete, so no work-lane recommendation was made.
Review details

Best possible solution:

Retry the Codex review after fixing the execution failure.

Do we have a high-confidence way to reproduce the issue?

Unclear. The review failed before ClawSweeper could establish a reproduction path.

Is this the best way to solve the issue?

Unclear. Retry the review first so ClawSweeper can evaluate the actual issue and fix direction.

AGENTS.md: unclear because the file could not be read completely.

Codex review notes: model gpt-5.5, reasoning high; reviewed against b4cdd9211957.

Label changes

Label changes:

  • remove P2: Current review triage priority is none.
  • remove merge-risk: 🚨 compatibility: Current PR review selected no merge-risk labels.
  • remove merge-risk: 🚨 session-state: Current PR review selected no merge-risk labels.
  • remove merge-risk: 🚨 message-delivery: Current PR review selected no merge-risk labels.

Label justifications:

  • rating: 🌊 off-meta tidepool: Overall readiness is 🌊 off-meta tidepool; proof is 🌊 off-meta tidepool and patch quality is 🌊 off-meta tidepool.
Evidence reviewed

PR surface:

Source +105, Docs +6, Generated +12. Total +123 across 9 files.

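PR surface stats:

| Area | Files | Added | Removed | Net |
| --- | --- | --- | --- | --- |
| Source | 6 | 152 | 47 | +105 |
| Tests | 0 | 0 | 0 | 0 |
| Docs | 2 | 6 | 0 | +6 |
| Config | 0 | 0 | 0 | 0 |
| Generated | 1 | 13 | 1 | +12 |
| Other | 0 | 0 | 0 | 0 |
| Total | 9 | 171 | 48 | +123 |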

What I checked:

  • failure reason: codex execution failed.
  • codex failure detail: Codex review failed for this PR with exit 1.
  • codex stdout: Per-item Codex failure; continuing with the rest of the shard.

Likely related people:

  • unknown: Codex failed before it could trace repository history. (role: review did not complete; confidence: low)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper

clawsweeper Bot commented Apr 29, 2026

Contributor

Codex review: found issues before merge.

What this changes:

This PR extends agent compaction config/schema/help with warn, error, and none modes, adds a Pi pre-prompt token guard, changes Pi auto-compaction disabling and reserve floor behavior, adds a changelog entry, and adds a bundled ceospace skill placeholder.

Maintainer follow-up before merge:

This is an open implementation PR with useful source work. What remains is maintainer/product review and author rework across runtime semantics, config contract, overflow recovery, hooks, tests, changelog, and the unrelated skill scope; it is not a candidate for an autonomous replacement fix.

Review findings:

  • [P1] Stop warn mode before submitting the prompt — src/agents/pi-embedded-runner/run/attempt.ts:2929-2933
  • [P1] Honor none mode in overflow recovery — src/agents/pi-settings.ts:109
  • [P3] Add the required changelog attribution — CHANGELOG.md:10
Review details

Best possible solution:

A mergeable version should first settle the product semantics for each mode, then implement them consistently across exported config types, strict/generated schemas, docs/help, Pi mode resolution, auto-compaction disabling, pre-prompt guard behavior, overflow recovery, hooks/finalization, and focused tests. The compaction work should stay isolated from unrelated bundled skill additions.
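
One way to keep the exported type, the runtime validation, and mode resolution from drifting apart is to derive all three from a single list of mode names. The following is a minimal sketch under assumptions, not the PR's code: `COMPACTION_MODES`, `isCompactionMode`, and `resolveCompactionMode` are hypothetical names, and falling back to `default` when the setting is unset is an assumed behavior.

```typescript
// Hypothetical sketch: none of these identifiers come from the PR.
// The five mode names are the ones mentioned in the PR summary and this review.
const COMPACTION_MODES = ["default", "safeguard", "warn", "error", "none"] as const;

// The exported type is derived from the list, so adding a mode in one
// place updates both the type and the runtime check.
type CompactionMode = (typeof COMPACTION_MODES)[number];

// Runtime guard for values read from untyped config.
function isCompactionMode(value: unknown): value is CompactionMode {
  return (
    typeof value === "string" &&
    (COMPACTION_MODES as readonly string[]).includes(value)
  );
}

// Assumption: an unset mode resolves to "default"; anything else that is
// not a known mode is rejected instead of being silently accepted.
function resolveCompactionMode(raw: unknown): CompactionMode {
  if (raw === undefined) return "default";
  if (isCompactionMode(raw)) return raw;
  throw new Error(`Unknown compaction mode: ${String(raw)}`);
}
```

A strict or generated schema could be built from the same list, so a new mode cannot be added to the type without also reaching validation and docs generation.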

Full review comments:

  • [P1] Stop warn mode before submitting the prompt — src/agents/pi-embedded-runner/run/attempt.ts:2929-2933
    The latest warn branch only emits the warning via onBlockReply and assistantTexts.push; it never sets skipPromptForCompactionGuard. Execution therefore reaches the later activeSession.prompt(...) block, so a near-limit warn turn still calls the model even though this mode is documented and described as stopping before provider overflow, with Pi auto-compaction disabled for this mode.
    Confidence: 0.93
  • [P1] Honor none mode in overflow recovery — src/agents/pi-settings.ts:109
    Adding none here disables only Pi's built-in auto-compaction. The separate OpenClaw overflow-recovery loop still calls contextEngine.compact() after model limit errors, so mode: "none" can still auto-compact and retry instead of failing at the model limit as the new help text promises. Thread the resolved mode into overflow recovery and skip that compaction path for none.
    Confidence: 0.89
  • [P3] Add the required changelog attribution — CHANGELOG.md:10
    This user-facing changelog entry omits the required Thanks @... attribution for added bullets. Add the credited GitHub username or remove the entry until the feature is ready, otherwise the repo's changelog hygiene rules are not satisfied.
    Confidence: 0.86
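
The two P1 findings can be illustrated with a small decision sketch. It assumes the blocking semantics from the PR body (warn returns a cleanup message and error throws, both before the model is called) and that a failed token estimate lets the prompt proceed. `decidePrePromptGuard`, `shouldCompactOnOverflow`, `GuardDecision`, and `guardThreshold` are hypothetical names, not identifiers from attempt.ts or pi-settings.ts.

```typescript
// Hypothetical sketch: these names are illustrative, not the PR's API.
type CompactionMode = "default" | "safeguard" | "warn" | "error" | "none";

type GuardDecision =
  | { action: "proceed" }
  | { action: "skip-prompt"; warning: string }
  | { action: "throw"; message: string };

// Decide what to do before the prompt is submitted.
// guardThreshold is an assumed token budget (for example, context window
// minus a reserve); how it is computed is outside this sketch.
function decidePrePromptGuard(
  mode: CompactionMode,
  estimatedTokens: number | undefined,
  guardThreshold: number,
): GuardDecision {
  // Only warn and error use the proactive guard in this sketch.
  if (mode !== "warn" && mode !== "error") return { action: "proceed" };

  // Assumption: if estimation failed, do not block the turn.
  if (estimatedTokens === undefined || !Number.isFinite(estimatedTokens)) {
    return { action: "proceed" };
  }

  if (estimatedTokens < guardThreshold) return { action: "proceed" };

  const detail = `Estimated ${estimatedTokens} tokens against a budget of ${guardThreshold}.`;
  // First finding: warn must also signal that the prompt is skipped,
  // otherwise execution continues to the model call.
  return mode === "warn"
    ? { action: "skip-prompt", warning: `Context is nearly full. ${detail}` }
    : { action: "throw", message: `Context guard blocked the prompt. ${detail}` };
}

// Second finding: overflow recovery needs the resolved mode too.
// Whether warn and error should also skip recovery is not settled here.
function shouldCompactOnOverflow(mode: CompactionMode): boolean {
  return mode !== "none";
}
```

A caller would check the decision before the model call, setting its skip flag on `skip-prompt`, and would consult `shouldCompactOnOverflow` before compacting after a model limit error. If maintainers decide warn should be non-blocking, the warn branch would return `proceed` alongside the warning instead.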

Overall correctness: patch is incorrect
Overall confidence: 0.9

Acceptance criteria:

  • pnpm test src/config/config.compaction-settings.test.ts src/agents/pi-settings.test.ts src/agents/pi-embedded-runner/extensions.test.ts src/agents/pi-embedded-runner/run/preemptive-compaction.test.ts src/agents/pi-embedded-runner/run.overflow-compaction.test.ts src/agents/pi-embedded-runner/run.overflow-compaction.loop.test.ts
  • pnpm config:docs:gen/check
  • pnpm check:changed in Testbox before handoff if the PR is revised

What I checked:

Likely related people:

  • @steipete: Recent main history and local blame show repeated maintenance of the embedded runner attempt path, compaction-adjacent lifecycle behavior, and current touched lines. (role: recent embedded runner and compaction maintainer; confidence: high; commits: 66cdbccc8a2a, 20e21173715e, 34d11d57579d; files: src/agents/pi-embedded-runner/run/attempt.ts, src/agents/pi-embedded-runner/run.ts, src/agents/compaction.ts)
  • @vincentkoc: Recent API history shows work on config leaf type surfaces, and this PR changes the exported compaction config type/schema/help contract. (role: config/type surface maintainer; confidence: medium; commits: 74e7b8d47b18; files: src/config/types.agent-defaults.ts, src/config/zod-schema.agent-defaults.ts, src/config/schema.help.ts)
  • @openperf: Recent merged work capped compaction reserve behavior for small model context windows, directly adjacent to this PR's reserve-threshold and floor changes. (role: compaction reserve and small-context behavior contributor; confidence: medium; commits: 4bc46ccfedc4, 08992e1dbc3e; files: src/agents/pi-settings.ts, src/agents/compaction.ts)
  • @jalehman: The context-engine integration history introduced the owned-compaction and overflow-compaction seams that this PR needs to coordinate with for none mode. (role: context-engine compaction reviewer/adjacent owner; confidence: medium; commits: fee91fefceb4; files: src/agents/pi-embedded-runner/run.ts, src/agents/pi-settings.ts)

Remaining risk / open question:

  • The intended warn semantics are unsettled: the PR body says warn should stop before the LLM, while the latest commit makes warn non-blocking and review comments note prompts still go out after the warning.
  • The PR is currently mergeable=false/dirty against main and has no fresh CI proof in the provided context after the later force-pushed behavior change.
  • The unrelated skills/ceospace/SKILL.md addition creates a new bundled skill surface that needs a separate owner/product review if intentional.

Codex review notes: model gpt-5.5, reasoning high; reviewed against e46dccb35374.

@openclaw-barnacle Bot removed the stale label (Marked as stale due to inactivity) on Apr 30, 2026
@clawsweeper Bot added labels on May 19, 2026: rating: 🧂 unranked krab (Not merge-ready due to missing proof or serious correctness/safety concerns); status: 📣 needs proof (The PR needs real behavior proof before ClawSweeper can clear the contributor ask)
@openclaw-barnacle Bot added the triage: needs-real-behavior-proof label (Candidate: external PR needs after-fix proof from a real setup) on May 19, 2026
@clawsweeper Bot added labels on May 19, 2026: P2 (Normal backlog priority with limited blast radius); merge-risk: 🚨 compatibility 🚨 (May break existing users, config, migrations, defaults, or upgrade paths); merge-risk: 🚨 message-delivery 🚨 (May drop, duplicate, misroute, suppress, or wrongly target messages); merge-risk: 🚨 session-state 🚨 (May lose, corrupt, stale, or mis-associate session, agent, or context state)
@clawsweeper

clawsweeper Bot commented May 20, 2026

Contributor

ClawSweeper PR egg

🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.

Where did the egg go?
  • The egg game starts only after the PR passes the real-behavior proof check.
  • Before that, no creature or rarity is rolled. The treat waits for real proof.
  • This is still just collectible flavor: proof affects review readiness, not creature quality.

@openclaw-barnacle

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

@openclaw-barnacle Bot added the stale label (Marked as stale due to inactivity) on Jun 9, 2026
@clawsweeper Bot added the rating: 🌊 off-meta tidepool label (PR readiness rating does not apply to this item) and removed the rating: 🧂 unranked krab (Not merge-ready due to missing proof or serious correctness/safety concerns) and status: 📣 needs proof (The PR needs real behavior proof before ClawSweeper can clear the contributor ask) labels on Jun 9, 2026
@openclaw-barnacle Bot removed the stale label (Marked as stale due to inactivity) on Jun 10, 2026

Labels

  • agents (Agent runtime and tooling)
  • merge-risk: 🚨 compatibility 🚨 (May break existing users, config, migrations, defaults, or upgrade paths)
  • merge-risk: 🚨 message-delivery 🚨 (May drop, duplicate, misroute, suppress, or wrongly target messages)
  • merge-risk: 🚨 session-state 🚨 (May lose, corrupt, stale, or mis-associate session, agent, or context state)
  • P2 (Normal backlog priority with limited blast radius)
  • rating: 🌊 off-meta tidepool (PR readiness rating does not apply to this item)
  • size: M
  • triage: needs-real-behavior-proof (Candidate: external PR needs after-fix proof from a real setup)
