fix: preserve persona and language continuity in compaction summaries by keepitmello · Pull Request #10456 · openclaw/openclaw

keepitmello · 2026-02-06T14:39:20Z

Background

I run a Korean-language persona agent on OpenClaw (custom SOUL.md + IDENTITY.md setup). After long conversations, when auto-compaction kicks in, the agent suddenly starts responding in English for a few turns before recovering. The English narration text also leaks into Telegram messages because of blockStreamingBreak: "text_end".

After digging into it, I found the root cause: the SDK's autoCompact() in agent-session.js hardcodes customInstructions: undefined when emitting session_before_compact. The summarization prompt and system prompt are both English-only, so the summary always comes out in English. Since the summary gets injected as a user message (via COMPACTION_SUMMARY_PREFIX), the large block of English text right before the model's next response biases it toward English output.

The system prompt (with SOUL.md etc.) is correctly re-injected every run, so the persona eventually recovers — but for the first few turns after compaction, the agent is broken.

Approach

The customInstructions parameter already exists in the SDK's generateSummary() pipeline — it's just never populated during auto-compaction. Since we can't change the SDK directly, this PR works within the safeguard extension layer:

- Config field: adds `compaction.customInstructions` to the agent config schema, so users can provide explicit instructions if needed.
- Default instructions: when no config value is set, a `DEFAULT_COMPACTION_INSTRUCTIONS` constant is injected that tells the summarizer to:
  - Write the summary body in the conversation's language
  - Focus on factual content (what was discussed, decisions made, current state)
  - Keep the SDK's required section headers unchanged
  - Not translate code, paths, or error messages
- Precedence chain: event (SDK) → config (runtime) → default constant, with normalization (trim, empty string to undefined) so that blank values cannot short-circuit the chain.
- All three summarization paths covered: dropped messages, history, and split-turn prefixes all go through the same resolver. The split-turn path composes the existing `TURN_PREFIX_INSTRUCTIONS` with the resolved instructions.
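A minimal sketch of the precedence chain and split-turn composition described above. It assumes `resolveCompactionInstructions(eventInstructions, configInstructions)` takes the two tiers in that order (the PR's tests call it with two arguments); the text of `DEFAULT_COMPACTION_INSTRUCTIONS` and `TURN_PREFIX_INSTRUCTIONS` below is hypothetical placeholder wording, not the PR's actual constants, and `normalizeInstructions`/`capCodePoints` are illustrative helper names.

```typescript
// Hypothetical stand-ins: the real text of both constants lives in the PR / SDK.
const DEFAULT_COMPACTION_INSTRUCTIONS = [
  "Write the summary body in the same language as the conversation.",
  "Focus on factual content: what was discussed, decisions made, and current state.",
  "Keep the required section headers unchanged.",
  "Do not translate code, file paths, or error messages.",
].join("\n");
const TURN_PREFIX_INSTRUCTIONS = "Summarize the prefix of the split turn."; // placeholder

const MAX_INSTRUCTION_CODE_POINTS = 800;

// Trim, and map empty or whitespace-only strings to undefined so they fall through.
function normalizeInstructions(value: string | undefined): string | undefined {
  const trimmed = value?.trim();
  return trimmed ? trimmed : undefined;
}

// Cap by code points so a surrogate pair is never cut in half.
function capCodePoints(s: string, max: number): string {
  const chars = Array.from(s);
  return chars.length <= max ? s : chars.slice(0, max).join("");
}

// Precedence: event (SDK) -> config (runtime) -> default constant.
function resolveCompactionInstructions(
  eventInstructions: string | undefined,
  configInstructions: string | undefined,
): string {
  const chosen =
    normalizeInstructions(eventInstructions) ??
    normalizeInstructions(configInstructions) ??
    DEFAULT_COMPACTION_INSTRUCTIONS;
  return capCodePoints(chosen, MAX_INSTRUCTION_CODE_POINTS);
}

// Split-turn path: keep the existing turn-prefix instructions, then append the resolved ones.
function composeSplitTurnInstructions(resolved: string): string {
  return `${TURN_PREFIX_INSTRUCTIONS}\n\n${resolved}`;
}

// Usage
const fromDefault = resolveCompactionInstructions(undefined, "   "); // blank config falls through to the default
const fromConfig = resolveCompactionInstructions("", "  Reply in Korean.  "); // "Reply in Korean."
```

Using `??` rather than `||` only matters after normalization has already turned blank strings into `undefined`; normalizing first is what keeps `""` from an upstream tier from hiding a usable value further down the chain.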

Changes

| File | What |
| --- | --- |
| `compaction-instructions.ts` | New: DEFAULT constant, `resolveCompactionInstructions()`, `composeSplitTurnInstructions()`, Unicode-safe truncation (800 char cap) |
| `compaction-instructions.test.ts` | New: 35 tests covering precedence, normalization edge cases, surrogate pair safety, composition |
| `zod-schema.agent-defaults.ts` | Add `customInstructions` to compaction schema |
| `types.agent-defaults.ts` | Add `customInstructions` to `AgentCompactionConfig` |
| `compaction-safeguard-runtime.ts` | Add `customInstructions` to `CompactionSafeguardRuntimeValue` |
| `extensions.ts` | Pass config value to runtime via `setCompactionSafeguardRuntime()` |
| `compaction-safeguard.ts` | Use `resolveCompactionInstructions()` across all three paths |
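The config → runtime hand-off in the table can be pictured as a WeakMap registry keyed by a session-scoped object (the squashed commit message below mentions a "WeakMap registry"). This is a hypothetical sketch: the real `setCompactionSafeguardRuntime()` signature, the key it uses, and the full `CompactionSafeguardRuntimeValue`/`AgentCompactionConfig` shapes are not shown in the PR, and `getCompactionSafeguardRuntime`/`registerCompactionRuntime` are invented names for illustration.

```typescript
// Hypothetical shapes: the real types carry more fields than shown here.
type AgentCompactionConfig = {
  customInstructions?: string;
};

type CompactionSafeguardRuntimeValue = {
  customInstructions?: string;
};

// Keyed by a session-scoped object, so an entry becomes collectible once the session object is.
const safeguardRuntime = new WeakMap<object, CompactionSafeguardRuntimeValue>();

function setCompactionSafeguardRuntime(sessionKey: object, value: CompactionSafeguardRuntimeValue): void {
  safeguardRuntime.set(sessionKey, value);
}

function getCompactionSafeguardRuntime(sessionKey: object): CompactionSafeguardRuntimeValue | undefined {
  return safeguardRuntime.get(sessionKey);
}

// Extension-builder side (assumed): copy the config field into the runtime value.
function registerCompactionRuntime(sessionKey: object, config: AgentCompactionConfig): void {
  setCompactionSafeguardRuntime(sessionKey, { customInstructions: config.customInstructions });
}

// Safeguard side (assumed): read it back and pass it to the resolver as the "config" tier.
const session = {};
registerCompactionRuntime(session, { customInstructions: "Summarize in Russian." });
const configTier = getCompactionSafeguardRuntime(session)?.customInstructions;
```

A WeakMap avoids keeping per-session state alive after the session ends, which is presumably why a registry rather than a module-level variable carries the value between the extension builder and the safeguard.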

Notes

- Only affects safeguard mode; default mode is untouched.
- This is an intentional behavior change for safeguard users: summaries will now include language preservation instructions by default. In my testing this significantly improved post-compaction continuity without affecting summary quality.
- The default instructions deliberately avoid persona-specific directives (e.g. "preserve character cues") to prevent the summarizer from injecting persona descriptions into the summary; persona context belongs in the system prompt, not the compaction summary.
- The 800-character cap on custom instructions (roughly 200 tokens) prevents prompt bloat in the multi-stage summarization pipeline.
- Truncation uses `Array.from()` to avoid splitting surrogate pairs (emoji, CJK supplementary characters, etc.).
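To illustrate the last note: `String.prototype.slice` counts UTF-16 code units, so it can cut a supplementary character in half, while `Array.from(s)` iterates the string by code points. The two helper names below are illustrative only, not functions from the PR.

```typescript
// Naive: slices by UTF-16 code units and may cut a surrogate pair in half.
function truncateByCodeUnits(s: string, max: number): string {
  return s.slice(0, max);
}

// Safe: the string iterator behind Array.from yields whole code points.
function truncateByCodePoints(s: string, max: number): string {
  return Array.from(s).slice(0, max).join("");
}

const input = "ab\u{1F600}"; // "ab" + one emoji: 3 code points, 4 UTF-16 code units
const naive = truncateByCodeUnits(input, 3); // "ab\uD83D": ends in a lone high surrogate
const safe = truncateByCodePoints(input, 3); // "ab\u{1F600}": intact
```

Code points are not the same as user-perceived characters: a ZWJ emoji sequence or a flag is several code points, so a code-point cap can still split it visually. Grapheme-level truncation would need something like `Intl.Segmenter`, which is probably unnecessary for an instruction-length cap.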

Test plan

- `tsc --noEmit` passes
- All 35 new unit tests pass (precedence, empty strings, whitespace, Unicode truncation, composition)
- Existing compaction-safeguard tests (17) and config tests (2) still pass
- Verified in a live Korean persona session: the post-compaction summary now preserves Korean, and the agent stays in character

greptile-apps

2 files reviewed, 2 comments

greptile-apps · 2026-02-06T14:41:32Z

+function truncateUnicodeSafe(s: string, maxCodePoints: number): string {
+  if (s.length <= maxCodePoints) return s;
+  const chars = Array.from(s);
+  return chars.length <= maxCodePoints ? s : chars.slice(0, maxCodePoints).join("");


Incorrect truncation condition (src/agents/pi-extensions/compaction-instructions.ts, lines 25-28)
`truncateUnicodeSafe` compares `s.length` (UTF-16 code units) to `maxCodePoints`. This will skip truncation for strings where `s.length <= 800` but the number of Unicode code points is > 800 (e.g., many combining-mark sequences), allowing instructions to exceed the intended 800 code point cap. Use `Array.from(s).length` (or a shared code-point count) for the early-return check so the limit is actually enforced for all Unicode inputs.

Already fixed in dc37ce4 — the early-return now uses Array.from(s).length (line 27), which counts code points instead of UTF-16 code units.
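A sketch of what the early return looks like when it counts code points, as the reply describes. This is a reconstruction from the description, not the code committed in dc37ce4.

```typescript
// Truncate to at most maxCodePoints Unicode code points without splitting surrogate pairs.
function truncateUnicodeSafe(s: string, maxCodePoints: number): string {
  // Count code points (not UTF-16 code units) before deciding whether to truncate.
  const chars = Array.from(s);
  if (chars.length <= maxCodePoints) return s;
  return chars.slice(0, maxCodePoints).join("");
}
```

Each code point occupies one or two UTF-16 code units, so `s.length` is never smaller than the code-point count. The earlier `s.length <= maxCodePoints` check was therefore a sound fast path that avoided allocating an array for short strings; the version above trades that for a check stated directly in the cap's unit.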

greptile-apps · 2026-02-06T14:41:33Z

+    it("does not split surrogate pair when cut lands inside a pair", () => {
+      const input = "X" + "\u{1F600}".repeat(800);
+      const result = resolveCompactionInstructions(input, undefined);
+      const codePoints = Array.from(result);
+      expect(codePoints).toHaveLength(800);
+      expect(codePoints[0]).toBe("X");
+      const lastCodeUnit = result.charCodeAt(result.length - 1);
+      const isLowSurrogate = lastCodeUnit >= 0xdc00 && lastCodeUnit <= 0xdfff;
+      expect(isLowSurrogate).toBe(true);
+    });


Surrogate-pair test is wrong (src/agents/pi-extensions/compaction-instructions.test.ts, lines 151-160)
This test asserts the last code unit is a low surrogate (0xDC00–0xDFFF) after truncation, but if truncation is “surrogate-safe” the string should never end with an unmatched surrogate at all. As written, it would pass for an output that ends with a dangling low surrogate (which is a broken string) and fail for a correctly-truncated output that ends on a complete code point. Please adjust the assertion to validate you *don't* end with a lone high/low surrogate (and/or that the result round-trips via `Array.from` without replacement chars).

Already fixed in dc37ce4 — the test now asserts no lone surrogates exist (lines 158–161). Each code point is checked against the 0xD800–0xDFFF range, so a dangling surrogate would fail the test.
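A sketch of the lone-surrogate check the reply describes, with a hypothetical helper name `hasLoneSurrogate`. For this particular input a correct cut ends on a complete emoji, whose last code unit is a low surrogate, so the original assertion did catch a dangling high surrogate; scanning every code point is stricter because it also rejects a stray surrogate anywhere in the string.

```typescript
// True if the string contains an unpaired UTF-16 surrogate anywhere.
function hasLoneSurrogate(s: string): boolean {
  for (const cp of Array.from(s)) {
    // A valid pair comes out of the iterator as one element with a code point >= 0x10000,
    // so only unpaired halves have a code point in the range 0xD800 to 0xDFFF.
    const code = cp.codePointAt(0) ?? 0;
    if (code >= 0xd800 && code <= 0xdfff) return true;
  }
  return false;
}

const input = "X" + "\u{1F600}".repeat(800);
const truncated = Array.from(input).slice(0, 800).join(""); // "X" + 799 complete emoji
const broken = input.slice(0, 800); // "X" + 399 complete emoji + a lone high surrogate
```

On runtimes that support it, `String.prototype.isWellFormed()` performs the same check natively.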

keepitmello · 2026-02-07T15:50:36Z

All Node.js checks are passing (lint, test, build, format, protocol on both Linux and Windows).

The two remaining failures are unrelated to this PR:

- macos-app (lint): Swift lint failure; this PR only modifies TypeScript files
- checks-windows (node, test): runner infrastructure issue (step ended without a conclusion, likely a timeout)

These same failures appear across other PRs in the repo as well.

openclaw-barnacle · 2026-02-21T04:32:03Z

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

SDK auto-compaction hardcodes customInstructions to undefined, causing summaries to always be generated in English. This breaks persona and language continuity for non-English agents after context compaction.

Add a DEFAULT_COMPACTION_INSTRUCTIONS constant that instructs the summarizer to preserve the conversation language and persona cues. Wire a config → runtime → safeguard fallback chain so users can override via compaction.customInstructions in the config.

Changes:
- New compaction-instructions.ts with resolve/compose utilities
- Config schema + types: add optional customInstructions field
- Runtime type: add customInstructions to WeakMap registry
- Extension builder: pass config value to runtime
- Safeguard extension: use precedence chain (event → config → default) across all three summarization paths (dropped/history/split-turn)
- 35 unit tests covering precedence, normalization, Unicode-safe truncation, and split-turn composition

Replace "Preserve persona, character, and speaking-style cues" with "Focus on factual content: what was discussed, decisions made, and current state" to prevent the summarizer from injecting persona descriptions into the summary (wasting tokens and potentially conflicting with system prompt persona).

darfaz · 2026-03-13T14:14:58Z

Running a Russian-language persona via SOUL.md and hit this exact problem — post-compaction the agent switches to English for several turns, narration text leaks into Telegram, and the persona tone goes flat until enough context rebuilds.

The current compaction prompt has no language or persona anchoring at all, so the summarizer defaults to English regardless of what the actual conversation language is. This PR's approach (inheriting language + persona from the active SOUL.md) is the right fix.

Bumping this — would be a shame to lose it to stale-bot.

@keepitmello

… (thanks @keepitmello)

jalehman · 2026-03-13T14:40:39Z

Merged via squash.

Prepared head SHA: 4518fb20e1037f87493e3668621cb1a45ab8233e
Merge commit: 72b6a11a832b73c9f68db09726e291bbc358fe71

Thanks @keepitmello!

@jalehman