fix(memory-core): use CJK-aware tokenizer for dreaming dedupe (#80613) #86645

Merged
clawsweeper[bot] merged 5 commits into main from clawsweeper/automerge-openclaw-openclaw-80620 on May 25, 2026

Conversation

@clawsweeper

@clawsweeper clawsweeper Bot commented May 25, 2026

Contributor

Makes #80620 merge-ready for the ClawSweeper automerge loop.
The edit pass should inspect the live PR diff, review comments, and failing checks; rebase if needed; keep the contributor branch credited; and stop only when validation is green or an external blocker is proven.

ClawSweeper 🐠 replacement reef notes:

Inherited issue-closing references from the source PR:
Closes #80613

Co-author credit kept:

fish notes: model gpt-5.5, reasoning high; reviewed against ca9c027.

MoerAI and others added 5 commits May 25, 2026 21:35
The dreaming-phases dedupe path's local `tokenizeSnippet` split on `/[^a-z0-9]+/i`, producing empty token sets for pure-CJK snippets and dropping all CJK content for mixed snippets. That had two failure modes:

1. Two close paraphrases of the same Chinese fact tokenized to empty sets, fell back to exact-string match, returned similarity 0, and ended up as duplicate candidates in MEMORY.md.

2. Two semantically distinct CJK snippets that happened to share ASCII tokens (e.g. `Plan` + `exRule`) returned similarity 1.0, so the dedupe path silently dropped one of the two distinct memories.
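
Both failure modes can be reproduced with a small sketch. Only the split regex `/[^a-z0-9]+/i` and the exact-string fallback come from the commit message above; the names `legacyTokenizeSnippet` and `legacySimilarity`, the Jaccard scoring, and the sample snippets are assumptions made for illustration and are not the repository's code.

```typescript
// Hypothetical reconstruction of the pre-fix tokenizer. Only the regex is
// taken from the commit message; everything else is an assumption.
function legacyTokenizeSnippet(text: string): Set<string> {
  const tokens = new Set<string>();
  for (const token of text.toLowerCase().split(/[^a-z0-9]+/i)) {
    if (token.length > 0) tokens.add(token);
  }
  return tokens;
}

function legacySimilarity(a: string, b: string): number {
  const left = legacyTokenizeSnippet(a);
  const right = legacyTokenizeSnippet(b);
  if (left.size === 0 || right.size === 0) {
    // Assumed fallback: exact string match when a side has no tokens.
    return a.trim() === b.trim() ? 1 : 0;
  }
  let shared = 0;
  left.forEach((token) => {
    if (right.has(token)) shared += 1;
  });
  return shared / (left.size + right.size - shared);
}

// Failure mode 1: pure-CJK paraphrases tokenize to empty sets, so the
// fallback compares raw strings and reports no similarity.
const paraphraseScore = legacySimilarity("用户喜欢喝绿茶", "用户喜欢绿茶"); // 0

// Failure mode 2: distinct CJK facts that share only ASCII tokens look identical.
const distinctScore = legacySimilarity("Plan 每月十元 exRule", "Plan 七天内退款 exRule"); // 1
```

In the second pair the only surviving tokens are `plan` and `exrule` on both sides, which is why the score saturates at 1.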

The memory MMR layer already has a CJK-aware tokenizer (`extensions/memory-core/src/memory/mmr.ts`: unigrams + adjacent bigrams + ASCII alphanumerics). This change extracts it into `extensions/memory-core/src/memory/tokenize.ts` and routes the dreaming dedupe path through the same helper via `textSimilarity`. `mmr.ts` re-exports `tokenize` / `jaccardSimilarity` / `textSimilarity` so existing imports (including `mmr.test.ts`) continue to work without churn.
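
A minimal sketch of a tokenizer in the shape described above (CJK unigrams, adjacent CJK bigrams, ASCII alphanumeric words) paired with Jaccard similarity. The function names match those mentioned in the commit message, but the bodies and the Unicode ranges in `CJK_CHAR` are assumptions; the real `tokenize.ts` may differ in both.

```typescript
// Assumed character ranges: kana, CJK Extension A, CJK Unified Ideographs, Hangul.
const CJK_CHAR = /[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]/;

function tokenize(text: string): Set<string> {
  const tokens = new Set<string>();
  const lower = text.toLowerCase();

  // ASCII alphanumeric runs become word tokens.
  for (const word of lower.match(/[a-z0-9]+/g) ?? []) tokens.add(word);

  // Each CJK character becomes a unigram; adjacent CJK pairs become bigrams.
  const chars = Array.from(lower);
  for (let i = 0; i < chars.length; i += 1) {
    const current = chars[i];
    if (current === undefined || !CJK_CHAR.test(current)) continue;
    tokens.add(current);
    const next = chars[i + 1];
    if (next !== undefined && CJK_CHAR.test(next)) tokens.add(current + next);
  }
  return tokens;
}

function jaccardSimilarity(a: ReadonlySet<string>, b: ReadonlySet<string>): number {
  if (a.size === 0 && b.size === 0) return 1; // two empty sets count as identical
  if (a.size === 0 || b.size === 0) return 0;
  let shared = 0;
  a.forEach((token) => {
    if (b.has(token)) shared += 1;
  });
  return shared / (a.size + b.size - shared);
}

// Paraphrases now share most unigrams and bigrams.
const paraphrase = jaccardSimilarity(tokenize("用户喜欢喝绿茶"), tokenize("用户喜欢绿茶"));

// Distinct facts share only the two ASCII tokens, so the score stays low.
const distinct = jaccardSimilarity(
  tokenize("Plan 每月十元 exRule"),
  tokenize("Plan 七天内退款 exRule"),
);
```

The sample snippets are invented for this sketch, so their scores will not match the figures reported for the reporter's scenarios.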

Verification with the patched module against the reporter's CJK scenarios:
- Pure-CJK paraphrase pair textSimilarity: 0 -> 0.622 (now above the 0.5 dedupe threshold, so the pair is merged).
- Mixed-CJK distinct pair textSimilarity: 1.000 -> 0.056 (now below the threshold, so both distinct facts are kept).
- English paraphrase: 0.600 (Latin behavior unchanged).
- Unrelated short snippets: 0.000 (no over-collapse).
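
To show how a similarity score interacts with the 0.5 threshold mentioned above, here is a hedged sketch of a greedy dedupe pass. `dedupeSnippets`, its `>=` comparison, and the stand-in `wordJaccard` scorer are hypothetical; the actual `dedupeEntries` in `dreaming-phases.ts` operates on recall entries and may apply the threshold differently.

```typescript
type Similarity = (a: string, b: string) => number;

// Hypothetical greedy dedupe: keep a snippet only if it is not similar
// enough to any snippet that was already kept.
function dedupeSnippets(
  snippets: readonly string[],
  similarity: Similarity,
  threshold = 0.5, // value quoted in the commit message
): string[] {
  const kept: string[] = [];
  for (const snippet of snippets) {
    const isDuplicate = kept.some((existing) => similarity(existing, snippet) >= threshold);
    if (!isDuplicate) kept.push(snippet);
  }
  return kept;
}

// Stand-in scorer for the demo: Jaccard over whitespace-separated words.
function wordJaccard(a: string, b: string): number {
  const left = new Set(a.split(/\s+/).filter((word) => word.length > 0));
  const right = new Set(b.split(/\s+/).filter((word) => word.length > 0));
  if (left.size === 0 && right.size === 0) return 1;
  let shared = 0;
  left.forEach((word) => {
    if (right.has(word)) shared += 1;
  });
  return shared / (left.size + right.size - shared);
}

const survivors = dedupeSnippets(["a b c", "a b c d", "x y z"], wordJaccard);
// "a b c d" scores 0.75 against "a b c" and is dropped; "x y z" scores 0 and is kept.
```

A scorer that wrongly returns 1 for distinct snippets makes this loop drop data, and one that wrongly returns 0 makes it keep duplicates, which matches the two failure modes described in the commit message.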

Scope: Bug 2 from issue #80613 only. Bug 1 (promotion rehydration leaks managed dreaming block lines into MEMORY.md) is a separate end-to-end fixture problem that clawsweeper flagged as not reproducible with high confidence from source alone; it should be addressed in a separate PR with a targeted promotion-path reproduction. This PR is the narrow CJK dedupe repair that clawsweeper directly endorsed.
oxlint flagged Array#sort() in the new regression test; use Array#toSorted() instead. Non-functional change — test logic and output are unchanged.
…e to empty (#80613)

Addresses chatgpt-codex-connector P1 review on #80620.

textSimilarity is used by dreaming dedupeEntries to merge near-duplicate
recall entries. The shared tokenize() only emits ASCII word-tokens and
CJK uni-/bigrams, so inputs in other scripts (Cyrillic, Arabic,
emoji-only, punctuation-only) tokenize to the empty set. Raw Jaccard
returns 1 for two empty sets — that is the correct, intentional
semantics for MMR re-ranking and is asserted by mmr.test.ts — but for
the dedupe path it would collapse distinct non-tokenized snippets into
one and drop data.

Add a literal normalized-string equality fallback inside textSimilarity
for the both-empty case only. Non-empty cases (the existing MMR path)
keep Jaccard semantics unchanged. Add a regression test in mmr.test.ts
covering Cyrillic, Arabic, emoji-only, and punctuation-only snippets:
distinct stays 0, identical stays 1.
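
The both-empty fallback described in this commit message can be sketched as follows. The compact `tokenize`, the `normalize` helper (trim, lowercase, collapse whitespace), and the CJK ranges are assumptions for illustration; only the rule itself (string equality when both token sets are empty, Jaccard otherwise) comes from the message.

```typescript
// Assumed CJK ranges; the shared tokenizer may cover a different set.
const CJK = /[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]/;

function tokenize(text: string): Set<string> {
  const tokens = new Set<string>();
  const lower = text.toLowerCase();
  for (const word of lower.match(/[a-z0-9]+/g) ?? []) tokens.add(word);
  const chars = Array.from(lower);
  for (let i = 0; i < chars.length; i += 1) {
    const current = chars[i];
    if (current === undefined || !CJK.test(current)) continue;
    tokens.add(current);
    const next = chars[i + 1];
    if (next !== undefined && CJK.test(next)) tokens.add(current + next);
  }
  return tokens;
}

function jaccardSimilarity(a: ReadonlySet<string>, b: ReadonlySet<string>): number {
  if (a.size === 0 && b.size === 0) return 1; // kept for the MMR path
  if (a.size === 0 || b.size === 0) return 0;
  let shared = 0;
  a.forEach((token) => {
    if (b.has(token)) shared += 1;
  });
  return shared / (a.size + b.size - shared);
}

// Hypothetical normalization used only by the fallback.
function normalize(text: string): string {
  return text.trim().toLowerCase().replace(/\s+/g, " ");
}

function textSimilarity(a: string, b: string): number {
  const left = tokenize(a);
  const right = tokenize(b);
  if (left.size === 0 && right.size === 0) {
    // Neither side produced tokens: fall back to literal equality so that
    // distinct untokenized snippets are not treated as duplicates.
    return normalize(a) === normalize(b) ? 1 : 0;
  }
  return jaccardSimilarity(left, right);
}

const distinctCyrillic = textSimilarity("Привет, мир", "Доброе утро"); // 0
const identicalCyrillic = textSimilarity("Привет, мир", "Привет, мир"); // 1
```

Keeping the fallback inside `textSimilarity` rather than `jaccardSimilarity` leaves the set-level semantics untouched for callers that pass token sets directly.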
…#80613)

Upstream lint sweep #83542 (chore(lint): remove underscore-dangle allow list) removed the `__testing` alias from the lint allow list, exposing that the 4 new CJK regression tests added in c497966 referenced `__testing.dedupeEntries` while the import statement only brought in `testing`. After upstream's rebase merge into this branch, tsgo reported TS2552 on dreaming-phases.test.ts:3028,3042,3054,3063 and a TS7006 implicit any on the inferred entry param (cascade from the missing identifier).

Fix: use the imported `testing.dedupeEntries` directly. The `testing as __testing` alias still exists in dreaming-phases.ts for any other consumers; this only adjusts the local test references.

Verification: pnpm tsgo:extensions:test reports 0 errors in dreaming-phases.test.ts (the 6 remaining errors are pre-existing infra issues unrelated to this branch: @openclaw/proxyline resolution, src/plugin-sdk/file-lock.ts type narrowing).
clawsweeper Bot added the following labels on May 25, 2026:

  • extensions: memory-core (Extension: memory-core)
  • size: M
  • clawsweeper:automerge (Maintainer opted this PR into bounded ClawSweeper-reviewed automerge)
  • proof: supplied (External PR includes structured after-fix real behavior proof.)
  • proof: sufficient (ClawSweeper judged the real behavior proof convincing.)
  • P2 (Normal backlog priority with limited blast radius.)
  • rating: 🐚 platinum hermit (Good normal PR readiness with ordinary maintainer review expected.)
  • merge-risk: 🚨 automation 🚨 (May affect CI, automerge, proof capture, label sync, or maintainer automation.)
  • merge-risk: 🚨 session-state 🚨 (May lose, corrupt, stale, or mis-associate session, agent, or context state.)
  • status: 🚀 automerge armed (This PR is in ClawSweeper's automerge lane.)
  • clawsweeper (Tracked by ClawSweeper automation)
@clawsweeper

clawsweeper Bot commented May 25, 2026

Contributor Author

Codex review: passed. Reviewed May 25, 2026, 5:47 PM ET / 21:47 UTC.

Summary
The PR extracts the CJK-aware memory tokenizer into a shared helper, routes dreaming dedupe through it, preserves MMR re-exports, and adds regression coverage for CJK and empty-token cases.

PR surface: Source +15, Tests +96. Total +111 across 5 files.

Reproducibility: yes. Current main has an ASCII-only tokenizeSnippet path in dreaming dedupe, and the source PR includes terminal before/after proof against production source bytes for the CJK failure modes; I did not run tests locally because this review is read-only.

Review metrics: 1 noteworthy metric.

  • Inherited closing reference: 1 closing reference to a two-part issue. The PR fixes the CJK dedupe half, but the linked report also covers a managed-block leak that would be closed automatically if the reference remains.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🦞 diamond lobster
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
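
As a rough illustration of the "weaker of the two" rule, the sketch below orders the ranks from the legend in this comment and returns the lower one. The `RANKS` array and `overallRank` function are hypothetical; ClawSweeper's actual scoring logic is not shown on this page, and the "off-meta tidepool" value is left out because the legend says it marks items where rating does not apply.

```typescript
// Ranks ordered from weakest to strongest, following the legend in this comment.
const RANKS = [
  "unranked krab",
  "silver shellfish",
  "gold shrimp",
  "platinum hermit",
  "diamond lobster",
  "challenger crab",
] as const;

type Rank = (typeof RANKS)[number];

// Hypothetical helper: the overall rank is whichever input sits lower in RANKS.
function overallRank(proof: Rank, patchQuality: Rank): Rank {
  return RANKS.indexOf(proof) <= RANKS.indexOf(patchQuality) ? proof : patchQuality;
}

// Matches this review: diamond lobster proof, platinum hermit patch quality.
const overall = overallRank("diamond lobster", "platinum hermit"); // "platinum hermit"
```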

Rank-up moves:

  • Replace the inherited closing reference with a non-closing reference or create a canonical follow-up for the managed-block leak before merge.

Risk before merge

  • The PR body still contains an inherited closing reference to [Bug]: dreaming pipeline leaks raw candidate content into MEMORY.md and CJK dedup is ineffective in tokenizeSnippet #80613, but the linked report has a second managed-block promotion leak that this PR explicitly leaves out of scope; merging as-is could auto-close remaining work.
  • The change intentionally alters persisted memory dreaming candidate dedupe behavior, so maintainers should accept that session/memory state impact based on the targeted tokenizer proof rather than broad end-to-end workspace proof.
  • A full light-dreaming sweep against a populated CJK workspace was not run in this review; the available proof is pure-function terminal output plus focused regression tests.

Maintainer options:

  1. Fix the closing reference before merge (recommended)
    Replace the inherited closing keyword with a non-closing reference, or split the managed-block leak into its own canonical issue before allowing automerge.
  2. Accept the scoped fix with explicit follow-up
    Maintainers can merge the tokenizer fix if they intentionally preserve the remaining promotion-leak work elsewhere before the issue auto-closes.
  3. Pause for a combined fix
    If maintainers want the linked two-part report closed by one PR, pause this branch until the promotion sanitizer regression is included too.

Next step before merge
A maintainer or trusted automation path should adjust the PR body or issue tracking before automerge; no code repair is needed from this review.

Security
Cleared: The diff only moves plugin-local tokenizer logic and updates tests; it does not change workflows, dependencies, lockfiles, secrets, package metadata, or downloaded code execution paths.

Review details

Best possible solution:

Land the tokenizer fix after changing the PR body so the remaining managed-block leak stays open or is tracked by a separate canonical issue.

Do we have a high-confidence way to reproduce the issue?

Yes. Current main has an ASCII-only tokenizeSnippet path in dreaming dedupe, and the source PR includes terminal before/after proof against production source bytes for the CJK failure modes; I did not run tests locally because this review is read-only.

Is this the best way to solve the issue?

Yes for the CJK dedupe portion: extracting the existing CJK-aware tokenizer and preserving mmr.ts re-exports is the narrow maintainable fix. The PR body should not close the whole linked two-part issue until the promotion leak is tracked or fixed.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 5b6d03e3e2f1.

Label changes

Label justifications:

  • P2: This is a normal-priority memory-core bug fix for duplicate or dropped dreaming candidates with limited blast radius.
  • merge-risk: 🚨 automation: The PR body's closing keyword can cause GitHub automation to close a broader issue whose promotion-leak portion remains unfixed.
  • merge-risk: 🚨 session-state: The diff changes memory dreaming dedupe arithmetic, which can affect persisted recall candidate consolidation and retention.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit.
  • status: 🚀 automerge armed: This PR is in ClawSweeper's automerge lane. Sufficient (terminal): The source PR provides terminal proof from a real checkout using production source bytes for the before/after tokenizer behavior, which is sufficient for this pure-function memory-dedupe change.
  • proof: sufficient: Contributor real behavior proof is sufficient. The source PR provides terminal proof from a real checkout using production source bytes for the before/after tokenizer behavior, which is sufficient for this pure-function memory-dedupe change.
Evidence reviewed

PR surface:

Source +15, Tests +96. Total +111 across 5 files.

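| Area | Files | Added | Removed | Net |
| --- | --- | --- | --- | --- |
| Source | 3 | 112 | 97 | +15 |
| Tests | 2 | 96 | 0 | +96 |
| Docs | 0 | 0 | 0 | 0 |
| Config | 0 | 0 | 0 | 0 |
| Generated | 0 | 0 | 0 | 0 |
| Other | 0 | 0 | 0 | 0 |
| Total | 5 | 208 | 97 | +111 |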

Acceptance criteria:

What I checked:

Likely related people:

  • IWhatsskill: git blame attributes the current dreaming dedupe function and the existing CJK-aware MMR tokenizer body to this author in commit c916906. (role: introduced current behavior; confidence: high; commits: c91690658443; files: extensions/memory-core/src/dreaming-phases.ts, extensions/memory-core/src/memory/mmr.ts)
  • steipete: Peter Steinberger committed c916906, authored the shared coercion refactor that touched the tokenizer line, and previously moved the memory engine into the plugin. (role: recent area committer and refactor author; confidence: high; commits: c91690658443, 77d9ac30bb8d, cad83db8b2f7; files: extensions/memory-core/src/dreaming-phases.ts, extensions/memory-core/src/memory/mmr.ts, src/plugin-sdk/string-coerce-runtime.ts)
  • Mariano: Commit 79348f7 added REM preview and safe promotion replay in the same dreaming-phases surface that now consumes dedupeEntries. (role: adjacent dreaming feature contributor; confidence: medium; commits: 79348f73c8b6; files: extensions/memory-core/src/dreaming-phases.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

openclaw-barnacle Bot removed the proof: supplied label (External PR includes structured after-fix real behavior proof.) on May 25, 2026
@clawsweeper

clawsweeper Bot commented May 25, 2026

Contributor Author

ClawSweeper PR egg

✨ Hatched: 🌱 uncommon Tiny Clawlet

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.

Rarity: 🌱 uncommon.
Trait: purrs at green checks.
Image traits: location artifact grotto; accessory commit compass; palette rose quartz and slate; mood bright-eyed; pose holding its accessory up for inspection; shell frosted glass shell; lighting calm overcast light; background smooth stones and checkmarks.

What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

@clawsweeper clawsweeper Bot merged commit 99d96c1 into main May 25, 2026
170 of 181 checks passed
@clawsweeper clawsweeper Bot deleted the clawsweeper/automerge-openclaw-openclaw-80620 branch May 25, 2026 21:50
@clawsweeper

clawsweeper Bot commented May 25, 2026

Contributor Author

🦞🧹
ClawSweeper automerge is enabled.

  • Head: ca9c02734c53
  • Label: clawsweeper:automerge
  • Action: exact-head review queued (workflow sweep.yml, event repository_dispatch).
  • Flow: review this head, repair/rebase only if needed, then re-review the exact repaired head before merge.

Draft PRs stay fix-only until GitHub marks them ready for review. Pause with /clawsweeper stop.

Automerge progress:

  • 2026-05-25 21:50:40 UTC review passed ca9c02734c53 (structured ClawSweeper verdict: pass (sha=ca9c02734c53c60ba9a482a825ae29214584f...)
  • 2026-05-25 21:50:59 UTC merged ca9c02734c53 (merged by ClawSweeper automerge)
  • 2026-05-25 21:51:09 UTC review queued ca9c02734c53 (queued)

github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 26, 2026
…aw#80613) (openclaw#86645)

Summary:
- The PR extracts the CJK-aware memory tokenizer into a shared helper, routes dreaming dedupe through it, preserves MMR re-exports, and adds regression coverage for CJK and empty-token cases.
- PR surface: Source +15, Tests +96. Total +111 across 5 files.
- Reproducibility: yes. Current main has an ASCII-only tokenizeSnippet path in dreaming dedupe, and the source ... ction source bytes for the CJK failure modes; I did not run tests locally because this review is read-only.

Automerge notes:
- PR branch already contained follow-up commit before automerge: fix(memory-core): use Array.toSorted for openclaw#80613 lint fix
- PR branch already contained follow-up commit before automerge: fix(memory-core): preserve dedupe identity when both snippets tokeniz…
- PR branch already contained follow-up commit before automerge: fix(memory-core): rename __testing to testing in CJK regression tests…
- PR branch already contained follow-up commit before automerge: fix(memory-core): use CJK-aware tokenizer for dreaming dedupe (openclaw#80613)

Validation:
- ClawSweeper review passed for head ca9c027.
- Required merge gates passed before the squash merge.

Prepared head SHA: ca9c027
Review: openclaw#86645 (comment)

Co-authored-by: MoerAI <friendnt@g.skku.edu>
Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 26, 2026
jameslcowan pushed a commit to jameslcowan/openclaw that referenced this pull request Jun 2, 2026
SYU8384 pushed a commit to SYU8384/openclaw that referenced this pull request Jun 3, 2026
sablehead pushed a commit to sablehead/openclaw that referenced this pull request Jun 10, 2026

Labels

  • clawsweeper:automerge (Maintainer opted this PR into bounded ClawSweeper-reviewed automerge)
  • clawsweeper (Tracked by ClawSweeper automation)
  • extensions: memory-core (Extension: memory-core)
  • merge-risk: 🚨 automation 🚨 (May affect CI, automerge, proof capture, label sync, or maintainer automation.)
  • merge-risk: 🚨 session-state 🚨 (May lose, corrupt, stale, or mis-associate session, agent, or context state.)
  • P2 (Normal backlog priority with limited blast radius.)
  • proof: sufficient (ClawSweeper judged the real behavior proof convincing.)
  • rating: 🐚 platinum hermit (Good normal PR readiness with ordinary maintainer review expected.)
  • size: M
  • status: 🚀 automerge armed (This PR is in ClawSweeper's automerge lane.)

Development

Successfully merging this pull request may close these issues.

[Bug]: dreaming pipeline leaks raw candidate content into MEMORY.md and CJK dedup is ineffective in tokenizeSnippet

1 participant