RTC: potential data corruption on block merge write path due to passing a rich-text text offset directly into a diff API that operates on HTML string indices #77532
Description
Hi,
After running into an RTC bug, I tried writing (vibe coding) a fuzzer to find other RTC bugs. Following up on a discussion with @dmsnell, @lynnbgriffith, and @alecgeatches, I'm writing this up here. The fuzzer has found quite a few potential bugs but (if there’s interest) I think it makes sense to start with one and look at them one at a time until we get an idea of what a good workflow would be.
I am not familiar with the code or the architecture of Gutenberg at all, so I’m very poorly qualified to tell if a bug is real or not. All of the text here is written by hand unless indicated otherwise (if I link to any AI generated text, I’ll note that it’s AI generated), but all of the fuzzer code, the bug diagnosis, etc., come directly from a coding agent. I’ve had good success using AI generated fuzzers for my own code but I haven’t tried something like this where I’m interacting with a completely unfamiliar codebase. I don’t want to spam the maintainers here, so if you don’t think this is a good use of time, I can move on to other projects; no hard feelings.
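That being said, @dmsnell took a look at this bug and thought that it was real, likely related to #76058.
rtc-rich-text-offset-real-user-repro-annotated-real-actions.mp4
Here’s the AI generated diagnosis and repro of the bug, which I’ll paste below (@dmsnell looked at a previous version of this; I also looked at it but, as noted above, I’m unqualified to really weigh in on it).
BEGIN AI GENERATED TEXT
The block merge write path passes a block-editor rich-text text offset directly into a diff API that operates on HTML string indices. That coordinate-space mismatch corrupts formatted rich-text content during real-time collaboration.
Root Cause
The relevant cursor and text APIs use two different coordinate spaces:
WPBlockSelection.offset counts visible rich-text positions.
mergeRichTextUpdate() applies diffWithCursor() to the HTML string.
The block merge write path forwards the editor offset into that diff without converting it to an HTML index first.
So the browser can feed valid editor selections into a lower-level API that interprets them in the wrong coordinate space.
Production Path
The production write path is:
1. useEntityBlockEditor().onInput() or onChange()
2. editEntityRecord()
3. SyncManager.update()
4. applyPostChangesToCRDTDoc()
5. mergeCrdtBlocks()
6. mergeRichTextUpdate()
7. Delta.diffWithCursor()
The bug is introduced in step 4:
applyPostChangesToCRDTDoc() reads changes.selection?.selectionStart?.offset ?? null.
That raw editor offset is passed to mergeCrdtBlocks().
It reaches mergeRichTextUpdate() as if it were an HTML index.
Other RTC selection paths already know this conversion is needed:
crdt-user-selections.ts uses richTextOffsetToHtmlIndex().
block-selection-history.ts uses richTextOffsetToHtmlIndex().
crdt-selection.ts uses htmlIndexToRichTextOffset() on the read path.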
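As a rough illustration of the two coordinate spaces (visible rich-text offsets vs. HTML string indices), the sketch below converts between them. It is a hypothetical simplification: the function names end in "Sketch" and are not the Gutenberg helpers richTextOffsetToHtmlIndex() or htmlIndexToRichTextOffset(), and it assumes tags are the only invisible markup (entities, comments, and "<" inside attribute values are not handled).

```typescript
// Hypothetical sketch: visible-character offset -> HTML string index.
// Assumes tags are the only non-visible markup.
function richTextOffsetToHtmlIndexSketch(html: string, offset: number): number {
  let visible = 0;
  let i = 0;
  while (i < html.length) {
    // Stop at the first index where `offset` visible characters precede it.
    if (visible === offset) return i;
    if (html[i] === "<") {
      // Skip the whole tag; it contributes no visible characters.
      const close = html.indexOf(">", i);
      i = close === -1 ? html.length : close + 1;
      continue;
    }
    visible++;
    i++;
  }
  return i;
}

// Hypothetical sketch of the inverse (read-path) direction:
// HTML string index -> number of visible characters before it.
function htmlIndexToRichTextOffsetSketch(html: string, index: number): number {
  let visible = 0;
  let i = 0;
  while (i < index && i < html.length) {
    if (html[i] === "<") {
      const close = html.indexOf(">", i);
      i = close === -1 ? html.length : close + 1;
      continue;
    }
    visible++;
    i++;
  }
  return visible;
}

const html = "<em>italic</em>beta";
console.log(richTextOffsetToHtmlIndexSketch(html, 10)); // 19
console.log(html.slice(10, 15)); // </em>
```

For the primary repro's new value, editor offset 10 (the end of "beta") corresponds to HTML index 19, while HTML index 10 is the start of "</em>", which is where an unconverted offset points. At a formatting boundary the sketch picks the earliest matching index (before a closing tag); the real helpers may resolve that ambiguity differently.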
Reproduction
Primary reproduction:
old value: "<em>italic</em><em>italic</em>"
new value: "<em>italic</em>beta"
block-editor selection offset: 10
Without the fix, the production path corrupts the content to:
<em>italic</em>bet>
That is not a benign serialization change. The closing "a" from "beta" is lost, and the stored HTML ends with a stray ">".
Real browser/user-action reproduction:
start value: "italic<em>betabetax</em>"
user presses End, then Backspace
both collaborators now have old value: "italic<em>betabeta</em>"
user presses End, then Shift+ArrowLeft four times to select the second beta
user presses the Italic keyboard shortcut to unitalicize the selected text
local editor value: "italic<em>beta</em>beta"
Without the fix, the collaborator receives:
italic<em>beta</em>/em>
The Playwright collaboration repro drives this with browser click and keyboard events. It only reads editor selection state for assertions; it does not use editEntityRecord() or another store dispatch for the repro mutation.
Independent confirmation:
old value: "<strong>é</strong>é<strong>é</strong><strong>é</strong>alphaalpha"
new value: "<strong>é</strong><em>italic</em>"
block-editor selection offset: 8
Without the fix, that path corrupts to:
<strong>é</strong><em>italic</eha
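One way to see that these outputs are corruption rather than a harmless serialization difference is to compare the visible text of the expected value with the stored value. The helpers below (visibleText, changesVisibleText) are hypothetical names, not Gutenberg APIs, and assume tags are the only invisible markup.

```typescript
// Hypothetical check: the text a reader would see, assuming tags are the
// only invisible markup (no entities, comments, or "<" in attribute values).
function visibleText(html: string): string {
  return html.replace(/<[^>]*>/g, "");
}

// A benign re-serialization keeps the visible text unchanged.
function changesVisibleText(expected: string, stored: string): boolean {
  return visibleText(expected) !== visibleText(stored);
}

// [expected value, corrupted value] pairs from the three repros above.
const cases: Array<[string, string]> = [
  ["<em>italic</em>beta", "<em>italic</em>bet>"],
  ["italic<em>beta</em>beta", "italic<em>beta</em>/em>"],
  ["<strong>é</strong><em>italic</em>", "<strong>é</strong><em>italic</eha"],
];

for (const [expected, stored] of cases) {
  console.log(JSON.stringify(visibleText(stored)), changesVisibleText(expected, stored));
}
```

All three pairs change the visible text. Note that this check alone would not catch corruption that only moves or drops formatting while leaving the text intact.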
Evidence That It Is A Real Production Bug
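The bug reproduces at multiple levels of the real production stack:
mergeCrdtBlocks() repro
applyPostChangesToCRDTDoc() repro
SyncManager.update() repro
useEntityBlockEditor().onInput() repro
The bug also has strong controls:
… richTextOffsetToHtmlIndex() are identical
Taken together, that shows the corruption is caused by the offset-space mismatch itself, not by malformed content or by a browserless-only test path.
Re-Running The Repros
Run these commands from the repo root with Node 20 active.
mergeCrdtBlocks() repro
npm run test:unit -- --runInBand \
packages/core-data/src/utils/test/rtc-rich-text-offset-space.test.js \
--testNamePattern="mergeCrdtBlocks"
applyPostChangesToCRDTDoc() repro
npm run test:unit -- --runInBand \
packages/core-data/src/utils/test/rtc-rich-text-offset-space.test.js \
--testNamePattern="applyPostChangesToCRDTDoc"
SyncManager.update() repro
npm run test:unit -- --runInBand \
packages/core-data/src/utils/test/rtc-rich-text-offset-space.test.js \
--testNamePattern="SyncManager.update"
useEntityBlockEditor().onInput() repro
Playwright collaboration repro in a real browser
The browser repro uses built plugin assets, so rebuild before running it.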
npm run build -- --skip-types
npm run wp-env-test -- start
WP_BASE_URL=http://localhost:8889 \
npm run test:e2e -- \
test/e2e/specs/editor/collaboration/collaboration-rich-text-offset-space.spec.ts
How The Bug Was Introduced
The bug was introduced on January 14, 2026 by commit 30c040ca841 ("Real-time collaboration: Use alternative diff in quill-delta, provide incremental text updates") from PR #73699.
That change:
replaced full-string replacement with cursor-guided incremental rich-text diffs
added the block merge handoff that forwards changes.selection?.selectionStart?.offset ?? null
Later, on March 17, 2026, commit 848188b4f12 ("RTC: Fix cursor index sync with rich text formatting") from PR #76418 added the conversion helpers used elsewhere in RTC, but the block merge write path was not updated to use them.
Proposed Fix Plan
Stop treating the block merge cursor as a bare number in the production path.
Carry the selected clientId, selected attributeKey, and editor-space rich-text offset through block merging.
Convert the offset with richTextOffsetToHtmlIndex() only when merging the exact rich-text field that corresponds to the selected block attribute.
Pass null for unrelated rich-text fields.
Add repro tests for the bug from the first production-relevant merge layer through a real Playwright collaboration test.
Add a defensive verification guard after diffWithCursor() so a bad cursor hint cannot silently corrupt stored content.
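The verification guard in the last step could look roughly like the sketch below. It is a hypothetical, simplified model: ops are plain retain/insert/delete objects applied to a string, loosely modeled on Quill Delta ops but not the actual quill-delta API, and applyOps, prefixSuffixDiff, and verifiedDiff are illustrative names.

```typescript
// Hypothetical op shape, loosely modeled on Quill Delta's retain/insert/delete.
type Op = { retain: number } | { insert: string } | { delete: number };

// Apply ops to a plain string; anything after the last op is retained.
function applyOps(text: string, ops: Op[]): string {
  let out = "";
  let pos = 0;
  for (const op of ops) {
    if ("retain" in op) {
      out += text.slice(pos, pos + op.retain);
      pos += op.retain;
    } else if ("delete" in op) {
      pos += op.delete;
    } else {
      out += op.insert;
    }
  }
  return out + text.slice(pos);
}

// Cursor-free fallback: keep the common prefix and suffix, replace the middle.
function prefixSuffixDiff(oldText: string, newText: string): Op[] {
  let start = 0;
  while (start < oldText.length && start < newText.length && oldText[start] === newText[start]) {
    start++;
  }
  let endOld = oldText.length;
  let endNew = newText.length;
  while (endOld > start && endNew > start && oldText[endOld - 1] === newText[endNew - 1]) {
    endOld--;
    endNew--;
  }
  const ops: Op[] = [];
  if (start > 0) ops.push({ retain: start });
  if (endOld > start) ops.push({ delete: endOld - start });
  if (endNew > start) ops.push({ insert: newText.slice(start, endNew) });
  return ops;
}

// Guard: keep the cursor-hinted ops only if they reproduce newText exactly.
function verifiedDiff(oldText: string, newText: string, hinted: Op[]): Op[] {
  if (applyOps(oldText, hinted) === newText) return hinted;
  return prefixSuffixDiff(oldText, newText);
}

const oldValue = "<em>italic</em><em>italic</em>";
const newValue = "<em>italic</em>beta";
// A bad hinted diff that would store the corrupted "<em>italic</em>bet>".
const badHint: Op[] = [{ retain: 15 }, { delete: 15 }, { insert: "bet>" }];
console.log(applyOps(oldValue, verifiedDiff(oldValue, newValue, badHint)));
```

Falling back gives up the cursor hint's disambiguation for that one update, but the stored value can no longer differ from the editor's value.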
Fix Rationale
The minimal conversion fix would be to rewrite one number right before calling mergeRichTextUpdate(). The stronger fix is to keep the cursor scoped until the exact rich-text merge that needs it.
That approach:
fixes the offset-space mismatch
avoids over-applying a cursor hint to unrelated rich-text fields
gives the merge layer enough context to stay correct as richer block schemas are added
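The scoped-cursor idea can be sketched as below. ScopedCursor, toHtmlIndex, and cursorForField are hypothetical names; the real change would call richTextOffsetToHtmlIndex(), whose signature is not shown in this report, so the sketch inlines a simplified tag-skipping conversion. The attribute keys in the usage lines are illustrative.

```typescript
// Hypothetical: selection context carried through block merging.
interface ScopedCursor {
  clientId: string;
  attributeKey: string;
  richTextOffset: number; // editor-space (visible character) offset
}

// Simplified stand-in for richTextOffsetToHtmlIndex(): skips tags only.
function toHtmlIndex(html: string, offset: number): number {
  let visible = 0;
  let i = 0;
  while (i < html.length && visible < offset) {
    if (html[i] === "<") {
      const close = html.indexOf(">", i);
      i = close === -1 ? html.length : close + 1;
    } else {
      visible++;
      i++;
    }
  }
  return i;
}

// Cursor hint for one rich-text field: converted to an HTML index only for
// the selected block attribute, null for every other field.
function cursorForField(
  cursor: ScopedCursor | null,
  clientId: string,
  attributeKey: string,
  html: string,
): number | null {
  if (cursor === null) return null;
  if (cursor.clientId !== clientId || cursor.attributeKey !== attributeKey) return null;
  return toHtmlIndex(html, cursor.richTextOffset);
}

const cursor: ScopedCursor = { clientId: "block-1", attributeKey: "content", richTextOffset: 10 };
console.log(cursorForField(cursor, "block-1", "content", "<em>italic</em>beta")); // 19
console.log(cursorForField(cursor, "block-1", "caption", "<em>other</em>")); // null
```

Unrelated fields receive null, so the diff for those fields falls back to its cursor-free behavior instead of reusing a hint that belongs to a different string.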
Branch Structure
This branch is intended to keep the bug report separate from the fix:
one commit for repro tests and documentation
one later commit for the implementation change
The final branch should pass the targeted unit and Playwright coverage with the fix applied.
END AI GENERATED TEXT
The text above explains the strategy for the fix and this commit has the AI-generated proposed fix. If people think this (or a modified version of this) would be reasonable to commit, I can submit a PR with that.
There’s a proposed regression test (AI generated) here
As noted above, I’ve had good success with AI generated fuzzing for my own code. But, even in those cases, there were false positives that I had to both directly reject as well as steer the design of the fuzzer away from. With me not being familiar with the code, this is much more difficult for me to do here. I’ve attempted to reduce the odds of a false positive by
Having the agent reproduce the “low-level” bug found by fuzzing in an end-to-end scenario in the browser.
Having someone who’s familiar with the project take a look at the AI generated report.
Let me know if you think this is or isn’t useful and we can figure out where to go from here. Like I said above, I won’t be offended if you don’t think it’s a good use of time to deal with AI-generated bugs from outsiders to the project, but as someone who has run into bugs in Gutenberg, I would love to be able to help improve the quality of Gutenberg.
Step-by-step reproduction instructions
See above.
Screenshots, screen recording, code snippet
See above.
Environment info
Gutenberg: 23.0.0-rc.1
WordPress core: 7.1-alpha-62247
Node: v20.19.0
Playwright: 1.58.2
PHP: 8.3.30
Please confirm that you have searched existing issues in the repo.
Yes
Please confirm that you have tested with all plugins deactivated except Gutenberg.
Yes
Please confirm which theme type you used for testing.