fix: preserve persona and language continuity in compaction summaries #10456

Merged
jalehman merged 7 commits into openclaw:main from keepitmello:fix/compaction-persona-language-preservation
Mar 13, 2026

Conversation

@keepitmello keepitmello commented Feb 6, 2026

Background

I run a Korean-language persona agent on OpenClaw (custom SOUL.md + IDENTITY.md setup). After long conversations, when auto-compaction kicks in, the agent suddenly starts responding in English for a few turns before recovering. The English narration text also leaks into Telegram messages because of blockStreamingBreak: "text_end".

After digging into it, I found the root cause: the SDK's autoCompact() in agent-session.js hardcodes customInstructions: undefined when emitting session_before_compact. The summarization prompt and system prompt are both English-only, so the summary always comes out in English. Since the summary gets injected as a user message (via COMPACTION_SUMMARY_PREFIX), the large block of English text right before the model's next response biases it toward English output.

The system prompt (with SOUL.md etc.) is correctly re-injected every run, so the persona eventually recovers — but for the first few turns after compaction, the agent is broken.

Approach

The customInstructions parameter already exists in the SDK's generateSummary() pipeline — it's just never populated during auto-compaction. Since we can't change the SDK directly, this PR works within the safeguard extension layer:

  1. Config field — adds compaction.customInstructions to the agent config schema, so users can provide explicit instructions if needed.

  2. Default instructions — when no config is set, a DEFAULT_COMPACTION_INSTRUCTIONS constant is injected that tells the summarizer to:

    • Write the summary body in the conversation's language
    • Focus on factual content (what was discussed, decisions made, current state)
    • Keep the SDK's required section headers unchanged
    • Not translate code, paths, or error messages
  3. Precedence chain: event (SDK) → config (runtime) → default constant, with normalization (trim, empty-string-to-undefined) to prevent blank values from short-circuiting the chain.

  4. All three summarization paths covered — dropped messages, history, and split-turn prefixes all go through the same resolver. The split-turn path composes the existing TURN_PREFIX_INSTRUCTIONS with the resolved instructions.
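
The precedence chain and normalization described above can be sketched roughly as follows. This is a minimal illustration, not the merged implementation: the function and constant names come from the PR description, but the bodies, the wording of the default instructions, and the placeholder text for `TURN_PREFIX_INSTRUCTIONS` are assumptions, and the 800-character truncation is left out for brevity.

```typescript
// Assumed wording; the PR only lists what the default instructions cover.
const DEFAULT_COMPACTION_INSTRUCTIONS = [
  "Write the summary body in the primary language of the conversation.",
  "Focus on factual content: what was discussed, decisions made, and current state.",
  "Keep the required section headers unchanged.",
  "Do not translate code, file paths, or error messages.",
].join("\n");

// Placeholder for the existing split-turn prefix text (actual wording is not shown in the PR).
const TURN_PREFIX_INSTRUCTIONS = "Summarize the prefix of the current turn.";

// Trim, then map blank strings to undefined so they cannot short-circuit the chain.
function normalizeInstructions(value: string | undefined): string | undefined {
  const trimmed = value?.trim();
  return trimmed ? trimmed : undefined;
}

// Precedence: event (SDK) -> config (runtime) -> default constant.
function resolveCompactionInstructions(
  eventInstructions: string | undefined,
  runtimeInstructions: string | undefined,
): string {
  return (
    normalizeInstructions(eventInstructions) ??
    normalizeInstructions(runtimeInstructions) ??
    DEFAULT_COMPACTION_INSTRUCTIONS
  );
}

// Split-turn path: existing prefix instructions first, resolved instructions after.
function composeSplitTurnInstructions(turnPrefix: string, resolved: string): string {
  return `${turnPrefix}\n\n${resolved}`;
}

// A whitespace-only event value falls through to the config value.
console.log(resolveCompactionInstructions("   ", "Summarize in Korean.")); // "Summarize in Korean."
```

Normalizing before applying `??` is the point of the sketch: without it, an empty string from the event would count as "set" and hide both the config value and the default.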

Changes

| File | What |
| --- | --- |
| compaction-instructions.ts | New: DEFAULT constant, resolveCompactionInstructions(), composeSplitTurnInstructions(), Unicode-safe truncation (800 char cap) |
| compaction-instructions.test.ts | New: 35 tests covering precedence, normalization edge cases, surrogate pair safety, composition |
| zod-schema.agent-defaults.ts | Add customInstructions to compaction schema |
| types.agent-defaults.ts | Add customInstructions to AgentCompactionConfig |
| compaction-safeguard-runtime.ts | Add customInstructions to CompactionSafeguardRuntimeValue |
| extensions.ts | Pass config value to runtime via setCompactionSafeguardRuntime() |
| compaction-safeguard.ts | Use resolveCompactionInstructions() across all three paths |
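
To illustrate how a config value might travel from the extension builder to the safeguard extension, here is a hedged sketch of a WeakMap-backed runtime registry (the commit message describes the runtime type as a WeakMap registry). Only the names `CompactionSafeguardRuntimeValue`, `customInstructions`, and `setCompactionSafeguardRuntime` appear in this PR; the key type, the function signatures, and the getter are hypothetical.

```typescript
// Assumed shape: only customInstructions comes from the PR; other fields are omitted.
interface CompactionSafeguardRuntimeValue {
  customInstructions?: string;
}

// WeakMap keyed by a per-session object, so an entry does not keep its session alive.
const runtimeRegistry = new WeakMap<object, CompactionSafeguardRuntimeValue>();

// Assumed signature; the real setCompactionSafeguardRuntime() may differ.
function setCompactionSafeguardRuntime(
  sessionKey: object,
  value: CompactionSafeguardRuntimeValue | null,
): void {
  if (value === null) {
    runtimeRegistry.delete(sessionKey);
    return;
  }
  runtimeRegistry.set(sessionKey, value);
}

// Hypothetical getter the safeguard extension could use to read the config value.
function getCompactionSafeguardRuntime(
  sessionKey: object,
): CompactionSafeguardRuntimeValue | undefined {
  return runtimeRegistry.get(sessionKey);
}

const sessionKey = {}; // stand-in for a real session object
setCompactionSafeguardRuntime(sessionKey, { customInstructions: "Summarize in Korean." });
console.log(getCompactionSafeguardRuntime(sessionKey)?.customInstructions); // "Summarize in Korean."
```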

Notes

  • Only affects safeguard mode — default mode is untouched.
  • This is an intentional behavior change for safeguard users: summaries will now include language preservation instructions by default. In my testing this significantly improved post-compaction continuity without affecting summary quality.
  • The default instructions deliberately avoid persona-specific directives (e.g. "preserve character cues") to prevent the summarizer from injecting persona descriptions into the summary — persona context belongs in the system prompt, not the compaction summary.
  • The 800-char cap on custom instructions (~200 tokens) prevents prompt bloat in the multi-stage summarization pipeline.
  • Truncation uses Array.from() to avoid splitting surrogate pairs (emoji, CJK supplementary characters, etc).
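
As a rough illustration of the truncation note above, the sketch below caps a string by code points rather than UTF-16 code units. The helper name `truncateByCodePoints` and the constant name are hypothetical; only the 800 cap and the use of `Array.from()` come from the PR.

```typescript
const MAX_INSTRUCTION_CODE_POINTS = 800; // cap stated in the PR

function truncateByCodePoints(text: string, maxCodePoints: number): string {
  // Array.from iterates by code point, so a surrogate pair stays one element.
  const codePoints = Array.from(text);
  if (codePoints.length <= maxCodePoints) return text;
  return codePoints.slice(0, maxCodePoints).join("");
}

// 1 ASCII character + 800 emoji = 801 code points, 1601 UTF-16 code units.
const sample = "X" + "\u{1F600}".repeat(800);
const truncated = truncateByCodePoints(sample, MAX_INSTRUCTION_CODE_POINTS);

console.log(Array.from(truncated).length); // 800
console.log(truncated.length); // 1599
```

Slicing the raw string with `sample.slice(0, 800)` would instead cut the last emoji in half and leave a lone high surrogate at the end. Note that code-point iteration does not protect grapheme clusters: a combining sequence or a ZWJ emoji can still be split between its code points.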

Test plan

  • tsc --noEmit passes
  • All 35 new unit tests pass (precedence, empty strings, whitespace, Unicode truncation, composition)
  • Existing compaction-safeguard tests (17) and config tests (2) still pass
  • Verified in a live Korean persona session — post-compaction summary now preserves Korean, agent stays in character

@openclaw-barnacle openclaw-barnacle Bot added the agents Agent runtime and tooling label Feb 6, 2026

@greptile-apps greptile-apps Bot left a comment

2 files reviewed, 2 comments

Comment on lines +25 to +28
function truncateUnicodeSafe(s: string, maxCodePoints: number): string {
if (s.length <= maxCodePoints) return s;
const chars = Array.from(s);
return chars.length <= maxCodePoints ? s : chars.slice(0, maxCodePoints).join("");

Incorrect truncation condition
truncateUnicodeSafe compares s.length (UTF-16 code units) to maxCodePoints. This will skip truncation for strings where s.length <= 800 but the number of Unicode code points is > 800 (e.g., many combining-mark sequences), allowing instructions to exceed the intended 800 code point cap. Use Array.from(s).length (or a shared code-point count) for the early-return check so the limit is actually enforced for all Unicode inputs.

Reply from the PR author (Contributor):

Already fixed in dc37ce4 — the early-return now uses Array.from(s).length (line 27), which counts code points instead of UTF-16 code units.

Comment on lines +151 to +160
it("does not split surrogate pair when cut lands inside a pair", () => {
const input = "X" + "\u{1F600}".repeat(800);
const result = resolveCompactionInstructions(input, undefined);
const codePoints = Array.from(result);
expect(codePoints).toHaveLength(800);
expect(codePoints[0]).toBe("X");
const lastCodeUnit = result.charCodeAt(result.length - 1);
const isLowSurrogate = lastCodeUnit >= 0xdc00 && lastCodeUnit <= 0xdfff;
expect(isLowSurrogate).toBe(true);
});

Surrogate-pair test is wrong
This test asserts the last code unit is a low surrogate (0xDC00–0xDFFF) after truncation, but if truncation is “surrogate-safe” the string should never end with an unmatched surrogate at all. As written, it would pass for an output that ends with a dangling low surrogate (which is a broken string) and fail for a correctly-truncated output that ends on a complete code point. Please adjust the assertion to validate you don’t end with a lone high/low surrogate (and/or that the result round-trips via Array.from without replacement chars).

Reply from the PR author (Contributor):

Already fixed in dc37ce4 — the test now asserts no lone surrogates exist (lines 158–161). Each code point is checked against the 0xD800–0xDFFF range, so a dangling surrogate would fail the test.
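
A check of the kind described in this reply could look like the sketch below. It is an illustration only: the helper name `hasLoneSurrogate` is hypothetical and the actual test code in the PR is not shown here.

```typescript
// Returns true if code-point iteration yields any unpaired surrogate.
// A well-formed pair is yielded as one element with a code point >= 0x10000,
// so only a dangling high or low surrogate lands in the 0xD800-0xDFFF range.
function hasLoneSurrogate(text: string): boolean {
  return Array.from(text).some((ch) => {
    const cp = ch.codePointAt(0);
    return cp !== undefined && cp >= 0xd800 && cp <= 0xdfff;
  });
}

console.log(hasLoneSurrogate("X\u{1F600}")); // false
console.log(hasLoneSurrogate("\u{1F600}".slice(0, 1))); // true (dangling high surrogate)
```

Unlike an assertion on only the last code unit, this covers the whole string and fails for both lone high and lone low surrogates.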

@keepitmello keepitmello force-pushed the fix/compaction-persona-language-preservation branch 2 times, most recently from 5bcf43d to dc37ce4 Compare February 6, 2026 14:44
@keepitmello keepitmello changed the title from "fix: preserve conversation language in compaction summaries" to "fix: preserve persona and language continuity in compaction summaries" Feb 6, 2026
@keepitmello

All Node.js checks are passing (lint, test, build, format, protocol on both Linux and Windows).

The two remaining failures are unrelated to this PR:

  • macos-app (lint): Swift lint failure — this PR only modifies TypeScript files
  • checks-windows (node, test): Runner infrastructure issue (step ended without conclusion, likely timeout)

These same failures appear across other PRs in the repo as well.

@keepitmello keepitmello force-pushed the fix/compaction-persona-language-preservation branch from d54882f to be89676 Compare February 7, 2026 17:28
@openclaw-barnacle

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

@openclaw-barnacle openclaw-barnacle Bot added stale Marked as stale due to inactivity and removed stale Marked as stale due to inactivity labels Feb 21, 2026
@jalehman jalehman self-assigned this Mar 3, 2026
@jalehman jalehman force-pushed the fix/compaction-persona-language-preservation branch from be89676 to d73c2a5 Compare March 13, 2026 05:27
@jalehman jalehman force-pushed the fix/compaction-persona-language-preservation branch 2 times, most recently from 1ad3abf to 8ec2f80 Compare March 13, 2026 06:20
keepitmello and others added 6 commits March 13, 2026 07:05
SDK auto-compaction hardcodes customInstructions to undefined, causing
summaries to always be generated in English. This breaks persona and
language continuity for non-English agents after context compaction.

Add a DEFAULT_COMPACTION_INSTRUCTIONS constant that instructs the
summarizer to preserve the conversation language and persona cues.
Wire a config → runtime → safeguard fallback chain so users can
override via compaction.customInstructions in the config.

Changes:
- New compaction-instructions.ts with resolve/compose utilities
- Config schema + types: add optional customInstructions field
- Runtime type: add customInstructions to WeakMap registry
- Extension builder: pass config value to runtime
- Safeguard extension: use precedence chain (event → config → default)
  across all three summarization paths (dropped/history/split-turn)
- 35 unit tests covering precedence, normalization, Unicode-safe
  truncation, and split-turn composition

Replace "Preserve persona, character, and speaking-style cues" with
"Focus on factual content: what was discussed, decisions made, and
current state" to prevent the summarizer from injecting persona
descriptions into the summary (wasting tokens and potentially
conflicting with system prompt persona).
@jalehman jalehman force-pushed the fix/compaction-persona-language-preservation branch from 8ec2f80 to 3d432e4 Compare March 13, 2026 14:10
@darfaz

darfaz commented Mar 13, 2026

Running a Russian-language persona via SOUL.md and hit this exact problem — post-compaction the agent switches to English for several turns, narration text leaks into Telegram, and the persona tone goes flat until enough context rebuilds.

The current compaction prompt has no language or persona anchoring at all, so the summarizer defaults to English regardless of what the actual conversation language is. This PR's approach (inheriting language + persona from the active SOUL.md) is the right fix.

Bumping this — would be a shame to lose it to stale-bot.

@jalehman jalehman force-pushed the fix/compaction-persona-language-preservation branch from ea8bd5d to 4518fb2 Compare March 13, 2026 14:38
@jalehman jalehman merged commit 72b6a11 into openclaw:main Mar 13, 2026
28 checks passed
@jalehman

Merged via squash.

Thanks @keepitmello!

z-hao-wang pushed a commit to z-hao-wang/openclaw that referenced this pull request Mar 13, 2026
…openclaw#10456)

Merged via squash.

Prepared head SHA: 4518fb2
Co-authored-by: keepitmello <71975659+keepitmello@users.noreply.github.com>
Co-authored-by: jalehman <550978+jalehman@users.noreply.github.com>
Reviewed-by: @jalehman
frankekn pushed a commit to xinhuagu/openclaw that referenced this pull request Mar 14, 2026
…openclaw#10456)

Merged via squash.

Prepared head SHA: 4518fb2
Co-authored-by: keepitmello <71975659+keepitmello@users.noreply.github.com>
Co-authored-by: jalehman <550978+jalehman@users.noreply.github.com>
Reviewed-by: @jalehman
sbezludny pushed a commit to sbezludny/openclaw that referenced this pull request Mar 27, 2026
…openclaw#10456)

Merged via squash.

Prepared head SHA: 4518fb2
Co-authored-by: keepitmello <71975659+keepitmello@users.noreply.github.com>
Co-authored-by: jalehman <550978+jalehman@users.noreply.github.com>
Reviewed-by: @jalehman
lovewanwan pushed a commit to lovewanwan/openclaw that referenced this pull request Apr 28, 2026
…openclaw#10456)

Merged via squash.

Prepared head SHA: 4518fb2
Co-authored-by: keepitmello <71975659+keepitmello@users.noreply.github.com>
Co-authored-by: jalehman <550978+jalehman@users.noreply.github.com>
Reviewed-by: @jalehman
ogt-redknie pushed a commit to ogt-redknie/OPENX that referenced this pull request May 2, 2026
…openclaw#10456)

Merged via squash.

Prepared head SHA: 4518fb2
Co-authored-by: keepitmello <71975659+keepitmello@users.noreply.github.com>
Co-authored-by: jalehman <550978+jalehman@users.noreply.github.com>
Reviewed-by: @jalehman
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 9, 2026
…openclaw#10456)

Merged via squash.

Prepared head SHA: 4518fb2
Co-authored-by: keepitmello <71975659+keepitmello@users.noreply.github.com>
Co-authored-by: jalehman <550978+jalehman@users.noreply.github.com>
Reviewed-by: @jalehman
Labels

agents Agent runtime and tooling size: M

4 participants