
RTC: potential data corruption on block merge write path due to passing a rich-text offset directly into a diff API that operates on HTML string indices #77532

@danluu

Description

Hi,

After running into an RTC bug, I tried writing (vibe coding) a fuzzer to find other RTC bugs. Following up on a discussion with @dmsnell, @lynnbgriffith, and @alecgeatches, I'm writing this up here. The fuzzer has found quite a few potential bugs but (if there’s interest) I think it makes sense to start with one and look at them one at a time until we get an idea of what a good workflow would be.

I am not familiar with the code or the architecture of Gutenberg at all, so I’m very poorly qualified to tell if a bug is real or not. All of the text here is written by hand unless indicated otherwise (if I link to any AI generated text, I’ll note that it’s AI generated), but all of the fuzzer code, the bug diagnosis, etc., come directly from a coding agent. I’ve had good success using AI generated fuzzers for my own code but I haven’t tried something like this where I’m interacting with a completely unfamiliar codebase. I don’t want to spam the maintainers here, so if you don’t think this is a good use of time, I can move on to other projects; no hard feelings.

That being said, @dmsnell took a look at this bug and thought that it was real, likely related to #76058.

rtc-rich-text-offset-real-user-repro-annotated-real-actions.mp4

Here’s the AI generated diagnosis and repro of the bug, which I’ll paste below (@dmsnell looked at a previous version of this; I also looked at it but, as noted above, I’m unqualified to really weigh in on it).

BEGIN AI GENERATED TEXT

The block merge write path passes a block-editor rich-text offset directly
into a diff API that operates on HTML string indices.

That coordinate-space mismatch corrupts formatted rich-text content during
real-time collaboration.

Root Cause

The relevant cursor and text APIs use two different coordinate spaces:

  • WPBlockSelection.offset counts visible rich-text positions.
  • mergeRichTextUpdate() applies diffWithCursor() to the HTML string.
  • the RTC block merge path forwards the rich-text offset directly, without
    converting it to an HTML index first.

So the browser can feed valid editor selections into a lower-level API that
interprets them in the wrong coordinate space.
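To make the two coordinate spaces concrete, here is a minimal TypeScript sketch of a text-offset-to-HTML-index conversion. `textOffsetToHtmlIndex` is a hypothetical helper for illustration, not Gutenberg's `richTextOffsetToHtmlIndex()`, whose exact signature and edge-case handling may differ. It assumes the value contains only tags and literal characters (no HTML entities, comments, or `>` inside attribute values).

```typescript
/**
 * Hypothetical sketch: map a visible rich-text offset to an index in the
 * serialized HTML string. Assumes no entities, comments, or ">" inside
 * attribute values; this is not Gutenberg's richTextOffsetToHtmlIndex().
 */
function textOffsetToHtmlIndex(html: string, offset: number): number {
  let visible = 0; // visible characters consumed so far
  let i = 0; // position in the HTML string
  while (i < html.length) {
    if (html[i] === "<") {
      // Tags occupy HTML indices but no visible rich-text positions.
      const close = html.indexOf(">", i);
      i = close === -1 ? html.length : close + 1;
      continue;
    }
    if (visible === offset) {
      return i;
    }
    visible++;
    i++;
  }
  // Offsets at (or past) the end of the text map to the end of the string.
  return html.length;
}

const value = "<em>italic</em>beta";
console.log(textOffsetToHtmlIndex(value, 10)); // 19
console.log(value.slice(0, 10)); // <em>italic
```

In the primary repro below, editor offset 10 is the end of the visible text `italicbeta`, which is HTML index 19. Read as an HTML index, the raw 10 instead points at the start of the closing `</em>` tag. At a formatting boundary this sketch places the index after any tags it skips; a real implementation has to choose that side deliberately.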

Production Path

The production write path is:

  1. useEntityBlockEditor().onInput() or onChange()
  2. editEntityRecord()
  3. SyncManager.update()
  4. applyPostChangesToCRDTDoc()
  5. mergeCrdtBlocks()
  6. mergeRichTextUpdate()
  7. Delta.diffWithCursor()

The bug is introduced in step 4:

  • applyPostChangesToCRDTDoc() reads
    changes.selection?.selectionStart?.offset ?? null.
  • that raw number is passed to mergeCrdtBlocks().
  • it eventually reaches mergeRichTextUpdate() as if it were an HTML index.
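The step-4 handoff, and the "minimal conversion fix" mentioned under Fix Rationale, can be contrasted in a small sketch. The `Changes` shape is a simplified assumption modeled on the `changes.selection?.selectionStart?.offset` expression quoted above, and `toHtmlIndex` is a stand-in parameter for a conversion such as `richTextOffsetToHtmlIndex()` (its real signature may differ). The HTML passed in is assumed to be the value the selection refers to, i.e. the new value.

```typescript
// Simplified, assumed shapes; the real Gutenberg types carry more fields.
interface SelectionPoint {
  offset?: number;
}
interface Changes {
  selection?: { selectionStart?: SelectionPoint };
}
type ToHtmlIndex = (html: string, offset: number) => number;

// Current behavior: the editor-space offset is forwarded unchanged and is
// later interpreted as an index into the HTML string.
function rawCursorHint(changes: Changes): number | null {
  return changes.selection?.selectionStart?.offset ?? null;
}

// Minimal conversion: translate the same number into HTML-index space right
// before it reaches the HTML-level diff.
function convertedCursorHint(
  changes: Changes,
  html: string,
  toHtmlIndex: ToHtmlIndex
): number | null {
  const offset = rawCursorHint(changes);
  return offset === null ? null : toHtmlIndex(html, offset);
}
```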

Other RTC selection paths already know this conversion is needed:

  • crdt-user-selections.ts uses richTextOffsetToHtmlIndex().
  • block-selection-history.ts uses richTextOffsetToHtmlIndex().
  • crdt-selection.ts uses htmlIndexToRichTextOffset() on the read path.
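The read direction is the inverse mapping. Here is a hypothetical sketch in the spirit of `htmlIndexToRichTextOffset()` (not its actual implementation, and with the same no-entities assumption): it counts only the characters outside tags that precede a given HTML index.

```typescript
/**
 * Hypothetical sketch: count the visible characters that precede `index`
 * in a serialized HTML string. Assumes no entities, comments, or ">"
 * inside attribute values.
 */
function htmlIndexToTextOffset(html: string, index: number): number {
  let visible = 0;
  let inTag = false;
  const end = Math.min(index, html.length);
  for (let i = 0; i < end; i++) {
    const ch = html[i];
    if (ch === "<") {
      inTag = true; // entering a tag: no visible positions until ">"
    } else if (ch === ">" && inTag) {
      inTag = false;
    } else if (!inTag) {
      visible++;
    }
  }
  return visible;
}

console.log(htmlIndexToTextOffset("<em>italic</em>beta", 19)); // 10
console.log(htmlIndexToTextOffset("<em>italic</em>beta", 10)); // 6
```

Reading the raw editor offset 10 back through this mapping yields 6, the position between `italic` and `beta`, rather than the end of the text.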

Reproduction

Primary reproduction:

  • old value: "<em>italic</em><em>italic</em>"
  • new value: "<em>italic</em>beta"
  • block-editor selection offset: 10

Without the fix, the production path corrupts the content to:

<em>italic</em>bet>

That is not a benign serialization change. The final a of beta is
lost, and the stored HTML ends with a stray >.

Real browser/user-action reproduction:

  • start value: "italic<em>betabetax</em>"
  • user presses End, then Backspace
  • both collaborators now have old value: "italic<em>betabeta</em>"
  • user presses End, then Shift+ArrowLeft four times to select the second
    beta
  • user presses the Italic keyboard shortcut to unitalicize the selected text
  • local editor value: "italic<em>beta</em>beta"

Without the fix, the collaborator receives:

italic<em>beta</em>/em>

The Playwright collaboration repro drives this with browser click and keyboard
events. It only reads editor selection state for assertions; it does not use
editEntityRecord() or another store dispatch for the repro mutation.

Independent confirmation:

  • old value:
    "<strong>é</strong>é<strong>é</strong><strong>é</strong>alphaalpha"
  • new value: "<strong>é</strong><em>italic</em>"
  • block-editor selection offset: 8

Without the fix, that path corrupts to:

<strong>é</strong><em>italic</eha

Evidence That It Is A Real Production Bug

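The bug reproduces at multiple levels of the real production stack:

  • mergeCrdtBlocks()
  • applyPostChangesToCRDTDoc()
  • SyncManager.update()
  • useEntityBlockEditor().onInput()
  • a Playwright collaboration test in a real browser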

The bug also has strong controls:

  • the same update succeeds when no cursor hint is passed
  • the same update succeeds when the rich-text offset is converted with
    richTextOffsetToHtmlIndex()
  • the same update succeeds on plain text where text offsets and HTML indices
    are identical

Taken together, that shows the corruption is caused by the offset-space
mismatch itself, not by malformed content or by a browserless-only test path.

Re-Running The Repros

Run these commands from the repo root with Node 20 active.

mergeCrdtBlocks() repro

npm run test:unit -- --runInBand \
  packages/core-data/src/utils/test/rtc-rich-text-offset-space.test.js \
  --testNamePattern="mergeCrdtBlocks"

applyPostChangesToCRDTDoc() repro

npm run test:unit -- --runInBand \
  packages/core-data/src/utils/test/rtc-rich-text-offset-space.test.js \
  --testNamePattern="applyPostChangesToCRDTDoc"

SyncManager.update() repro

npm run test:unit -- --runInBand \
  packages/core-data/src/utils/test/rtc-rich-text-offset-space.test.js \
  --testNamePattern="SyncManager.update"

useEntityBlockEditor().onInput() repro

npm run test:unit -- --runInBand \
  packages/core-data/src/test/rtc-rich-text-offset-space.test.js

Playwright collaboration repro in a real browser

The browser repro uses built plugin assets, so rebuild before running it.

npm run build -- --skip-types
npm run wp-env-test -- start
WP_BASE_URL=http://localhost:8889 \
npm run test:e2e -- \
  test/e2e/specs/editor/collaboration/collaboration-rich-text-offset-space.spec.ts

How The Bug Was Introduced

The bug was introduced on January 14, 2026 by commit 30c040ca841
("Real-time collaboration: Use alternative diff in quill-delta, provide
incremental text updates") from
PR #73699.

That change:

  • replaced full-string replacement with cursor-guided incremental rich-text
    diffs
  • added the block merge handoff that forwards
    changes.selection?.selectionStart?.offset ?? null

Later, on March 17, 2026, commit 848188b4f12
("RTC: Fix cursor index sync with rich text formatting") from
PR #76418 added the
conversion helpers used elsewhere in RTC, but the block merge write path was
not updated to use them.

Proposed Fix Plan

  1. Stop treating the block merge cursor as a bare number in the production
    path.
  2. Carry the selected clientId, selected attributeKey, and editor-space
    rich-text offset through block merging.
  3. Convert the offset with richTextOffsetToHtmlIndex() only when merging the
    exact rich-text field that corresponds to the selected block attribute.
  4. Pass null for unrelated rich-text fields.
  5. Add repro tests for the bug from the first production-relevant merge layer
    through a real Playwright collaboration test.
  6. Add a defensive verification guard after diffWithCursor() so a bad cursor
    hint cannot silently corrupt stored content.
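Step 6 could take roughly the following shape. This is a sketch under stated assumptions: the `Edit` splice type and `computeEdit` are hypothetical stand-ins for the quill-delta delta produced by `diffWithCursor()` (this does not reproduce its real API), and the whole-string fallback models the full-string replacement that PR #73699 moved away from. The guard only accepts an edit after checking that replaying it on the old value reproduces the new value exactly.

```typescript
// A single splice: delete `deleteCount` characters at `index`, then insert
// `insert`. A simplified stand-in for a quill-delta Delta.
interface Edit {
  index: number;
  deleteCount: number;
  insert: string;
}

// Hypothetical stand-in for a cursor-guided diff such as diffWithCursor().
type ComputeEdit = (
  oldValue: string,
  newValue: string,
  cursor: number | null
) => Edit;

function applyEdit(text: string, edit: Edit): string {
  return (
    text.slice(0, edit.index) +
    edit.insert +
    text.slice(edit.index + edit.deleteCount)
  );
}

/**
 * Return an edit that is verified to turn oldValue into newValue: first with
 * the cursor hint, then without it, then as a whole-string replacement.
 */
function guardedEdit(
  oldValue: string,
  newValue: string,
  cursor: number | null,
  computeEdit: ComputeEdit
): Edit {
  const hints: (number | null)[] = cursor === null ? [null] : [cursor, null];
  for (const hint of hints) {
    const edit = computeEdit(oldValue, newValue, hint);
    // Only accept the edit if replaying it locally reproduces newValue.
    if (applyEdit(oldValue, edit) === newValue) {
      return edit;
    }
  }
  // Last resort: replace the whole string, which is always correct.
  return { index: 0, deleteCount: oldValue.length, insert: newValue };
}
```

With a check like this, a bad hint degrades to a less incremental update instead of corrupted stored content; the cost is one extra string comparison, plus at most one extra diff, per rich-text merge.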

Fix Rationale

The minimal conversion fix would be to rewrite one number right before calling
mergeRichTextUpdate(). The stronger fix is to keep the cursor scoped until
the exact rich-text merge that needs it.

That approach:

  • fixes the offset-space mismatch
  • avoids over-applying a cursor hint to unrelated rich-text fields
  • gives the merge layer enough context to stay correct as richer block
    schemas are added
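The scoping described in steps 2 to 4 of the fix plan might look like the following. `ScopedCursor`, `cursorHintForField`, and the `toHtmlIndex` parameter are hypothetical names for illustration; `toHtmlIndex` stands in for `richTextOffsetToHtmlIndex()`, whose real signature may differ.

```typescript
// Hypothetical: the editor selection, kept in editor (rich-text) space and
// tagged with the block and attribute it belongs to.
interface ScopedCursor {
  clientId: string;
  attributeKey: string;
  offset: number;
}

type ToHtmlIndex = (html: string, offset: number) => number;

/**
 * Return an HTML-index cursor hint for one rich-text field, or null when the
 * selection belongs to a different block or attribute.
 */
function cursorHintForField(
  cursor: ScopedCursor | null,
  clientId: string,
  attributeKey: string,
  newHtml: string,
  toHtmlIndex: ToHtmlIndex
): number | null {
  if (
    cursor === null ||
    cursor.clientId !== clientId ||
    cursor.attributeKey !== attributeKey
  ) {
    // Unrelated field: pass no hint rather than a misapplied one.
    return null;
  }
  // Convert only where the exact field's HTML is known.
  return toHtmlIndex(newHtml, cursor.offset);
}
```

Because the match is on both `clientId` and `attributeKey`, a block with several rich-text attributes only receives the hint on the field the user is actually editing.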

Branch Structure

This branch is intended to keep the bug report separate from the fix:

  • one commit for repro tests and documentation
  • one later commit for the implementation change

The final branch should pass the targeted unit and Playwright coverage with the
fix applied.

END AI GENERATED TEXT

The text above explains the strategy for the fix and this commit has the AI-generated proposed fix. If people think this (or a modified version of this) would be reasonable to commit, I can submit a PR with that.

There’s a proposed regression test (AI generated) here

As noted above, I’ve had good success with AI generated fuzzing for my own code. But, even in those cases, there were false positives that I had to both directly reject as well as steer the design of the fuzzer away from. With me not being familiar with the code, this is much more difficult for me to do here. I’ve attempted to reduce the odds of a false positive by

  1. Having the agent reproduce the “low-level” bug found by fuzzing in an end-to-end scenario in the browser.
  2. Having someone who’s familiar with the project take a look at the AI generated report.

Let me know if you think this is or isn’t useful and we can figure out where to go from here. Like I said above, I won’t be offended if you don’t think it’s a good use of time to deal with AI-generated bugs from outsiders to the project, but as someone who has run into bugs in Gutenberg, I would love to be able to help improve the quality of Gutenberg.

Step-by-step reproduction instructions

See above.

Screenshots, screen recording, code snippet

See above.

Environment info

  • Gutenberg: 23.0.0-rc.1
  • WordPress core: 7.1-alpha-62247
  • Node: v20.19.0
  • Playwright: 1.58.2
  • PHP: 8.3.30

Please confirm that you have searched existing issues in the repo.

  • Yes

Please confirm that you have tested with all plugins deactivated except Gutenberg.

  • Yes

Please confirm which theme type you used for testing.

  • Block
  • Classic
  • Hybrid (e.g. classic with theme.json)
  • Not sure

Metadata


Assignees

No one assigned

    Labels

    [Feature] Real-time Collaboration: Phase 3 of the Gutenberg roadmap around real-time collaboration
    [Type] Bug: An existing feature does not function as intended

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
