RTC: Fix connection lost error modal when /wp-json/wp-sync/v1/updates exceeds 16 MiB limit by danluu · Pull Request #77724 · WordPress/gutenberg

danluu · 2026-04-27T22:39:04Z

What?

This is part of a series of issues and PRs created by a coding agent looking at the output of an AI generated fuzzer. See #77716 for the tracking issue.

This issue, where the 16 MiB /wp-json/wp-sync/v1/updates limit causes the connection lost modal, is thematically similar to #77669: both cause a connection lost modal and both are due to an update that's too large. This is a distinct failure mode, though.

The repro shown in the video below is a bit odd in that it's stuffing updates into the title, but the same failure mode should apply in the more reasonable case of putting updates into the body of posts.

annotated-repro.mp4

I don't know how realistic this is. I didn't file #77631 when the fuzzer surfaced it, before a human developer found it while fixing a bug encountered by a real user, because I didn't know whether 50 rooms was realistic. I've gotten the "connection lost" modal a number of times myself from normal use. In general, I think it's nice to get rid of cases that could cause it, so that fuzzing can find other cases that might be more realistic even if this one isn't, but I can also see the argument for not adding this much code to fix a corner case that might not happen in practice.

BEGIN AI GENERATED TEXT

Why this is distinct

Not PHP OOM.
Not browser offline / request abort.
Not the >50 rooms validation limit.
Not the 1 MiB single-update limit or oversized compaction case.
The server rejects the request at the route validator with 413 Request body is too large.

Repro shape

Baseline collaborative editor state contributes 4 rooms.
Load 40 additional numeric postType/post:* entity records into sync.
Edit all 40 extra post records with sub-1 MiB title changes (450 KiB each).
The next poll sends:
- 44 rooms total
- multiple queued updates
- request body around 24.6 MB
WordPress responds with repeated 413 Request Entity Too Large.
After retries are exhausted, the editor shows the Connection lost modal.

Reproduction levels

The full Connection lost modal is only visible at the browser/editor layer,
but the bug can be reproduced at lower levels by separating the failure into
the client payload shape and the server validator.

Payload-shape repro

This is the lowest-level client-side repro. It constructs the same SyncPayload
shape that the polling manager sends, while staying under the 50 room cap and
under the 1 MiB per-update cap. The resulting JSON body still exceeds the
server's 16 MiB body cap.

Run from any checkout:

node <<'NODE'
const MAX_BODY_SIZE = 16 * 1024 * 1024;
const ROOM_COUNT = 44;
const BASELINE_ROOMS = [
	'postType/post:1',
	'root/comment',
	'taxonomy/category',
	'root/site',
];
const ENCODED_UPDATE_SIZE = 600 * 1024;
const rooms = Array.from( { length: ROOM_COUNT }, ( _, index ) => ( {
	after: 0,
	awareness: { user: `client-${ index + 1 }` },
	client_id: index + 1,
	room: BASELINE_ROOMS[ index ] || `postType/post:${ index + 1 }`,
	updates:
		index < BASELINE_ROOMS.length
			? []
			: [ { type: 'update', data: 'x'.repeat( ENCODED_UPDATE_SIZE ) } ],
} ) );
const body = JSON.stringify( { rooms } );
console.log(
	JSON.stringify(
		{
			rooms: rooms.length,
			updates: rooms.reduce(
				( total, room ) => total + room.updates.length,
				0
			),
			encodedUpdateBytes: ENCODED_UPDATE_SIZE,
			bodyBytes: Buffer.byteLength( body, 'utf8' ),
			maxBodyBytes: MAX_BODY_SIZE,
			exceedsLimit:
				Buffer.byteLength( body, 'utf8' ) > MAX_BODY_SIZE,
		},
		null,
		2
	)
);
NODE

Expected result: rooms is 44, each encoded update is 614400 bytes, and
bodyBytes is about 24581413, which is greater than 16777216.
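
The 600 KiB encoded size in the script appears to correspond to the 450 KiB edits from the repro shape after base64 expansion, since base64 emits 4 characters for every 3 input bytes. A quick check of that arithmetic, as a sketch (the helper name `base64Length` is illustrative and not part of Gutenberg; a real Yjs update for a 450 KiB title change would be slightly larger than the raw text):

```javascript
// Base64 encodes every 3 input bytes as 4 output characters, padding the
// final group, so the encoded length is 4 * ceil( n / 3 ).
function base64Length( rawBytes ) {
	return 4 * Math.ceil( rawBytes / 3 );
}

const RAW_UPDATE_SIZE = 450 * 1024; // Raw edit size from the repro shape.
const predicted = base64Length( RAW_UPDATE_SIZE );

// Cross-check the formula against Node's own base64 encoder.
const actual = Buffer.alloc( RAW_UPDATE_SIZE ).toString( 'base64' ).length;

console.log( {
	predicted,
	actual,
	// Update data alone for the 40 extra rooms, before any JSON overhead.
	fortyUpdates: 40 * predicted,
	limit: 16 * 1024 * 1024,
} );
```

The update data alone already exceeds the 16 MiB cap; room metadata and JSON syntax only add to it.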

Server-validator repro

This isolates the server behavior: the route-level validator rejects an
oversized raw request body with rest_sync_body_too_large and status 413.
This does not exercise the editor modal, but it proves that the server rejects
the aggregate request before the sync handler stores updates.

vendor/bin/phpunit \
	--filter test_sync_rejects_oversized_request_body \
	phpunit/tests/collaboration/wpHttpPollingSyncServer.php

This test is in
phpunit/tests/collaboration/wpHttpPollingSyncServer.php.

Browser HTTP repro

This is the focused browser repro without relying on a fuzzer campaign. It
loads the extra synced entity rooms, edits them, observes a POST /wp-json/wp-sync/v1/updates request above 16 MiB, observes repeated 413
responses, and then asserts that the Connection lost modal appears.

WP_ENV_PORT=8893 npm run wp-env-test start
WP_ENV_PORT=8893 WP_BASE_URL=http://localhost:8893 npm exec \
	--workspace @wordpress/e2e-tests-playwright -- wp-scripts test-playwright \
	test/e2e/specs/editor/collaboration/collaboration-sync-body-size-connection-lost.spec.ts \
	--project=chromium
WP_ENV_PORT=8893 npm run wp-env-test stop

Use a different WP_ENV_PORT if 8893 is already occupied.

How this was introduced

This appears to be a composition bug in the original HTTP polling sync design,
not a bug introduced by the later 1 MiB single-update accounting fix.

Relevant history:

- #72114, c214929139f50337250efe2bb24ff82c3ff2b6aa, made syncing a
  side-concern in the core-data resolver. This is part of the path where
  resolved entity records can be loaded into the sync manager.
- #74564, 48ce44dac7981eb730079563a3a2975b89840fac, added the default HTTP
  polling sync provider. Its polling manager built one payload from all
  registered roomStates and drained each room's queued updates with
  state.updateQueue.get(), without an aggregate request-byte budget.
- #75699, c4e4fac0a26bfb2dc38f17c765f2e84266b68b99, removed the
  IS_GUTENBERG_PLUGIN guards around collaborative editing. In the current
  path, any resolved numeric entity record with entityConfig.syncConfig and
  no query is loaded into sync.
- #76987, 1be2ef27e6819597dacc3b395caa05670d494194, backported the server
  validation hardening from WordPress/wordpress-develop#11296. It added
  MAX_BODY_SIZE = 16 * MB_IN_BYTES, MAX_ROOMS_PER_REQUEST = 50,
  MAX_UPDATE_DATA_SIZE = MB_IN_BYTES, and the route-level
  validate_request() path that returns rest_sync_body_too_large with
  status 413.

Two later nearby fixes are related but do not fix this aggregate-body bug:

- #77631, 1642980d599be51c7cce7b2dc3a0c052b69ad367, rotates rooms when the
  registered room count exceeds MAX_ROOMS_PER_REQUEST. It addresses the
  >50 rooms failure, but a request with 50 or fewer rooms can still exceed
  the 16 MiB body cap.
- #77669, a54911b0c49e3b4abea4d6d7ce85c0e2c2bad11e, fixes the separate
  per-update base64 accounting mismatch. That prevents a single encoded update
  from exceeding the 1 MiB update limit, but does not limit the total body
  size of a multi-room poll.

The sync endpoint has three independent caps:

- at most 50 rooms per request
- at most 1 MiB of encoded data for one update
- at most 16 MiB for the whole request body

The client batches every registered room into one poll request and drains every
room's queued updates into that request. That makes the first two caps
insufficient to protect the third cap. For example, 40 rooms with updates far
below 1 MiB can still create a request body larger than 16 MiB once their
base64 update strings, awareness payloads, room metadata, and JSON overhead are
combined.
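
To make the interaction between the caps concrete, here is a small model in JavaScript. It is only a sketch: the constant names mirror the ones quoted above, but `checkCaps` is a hypothetical helper and not the server's PHP validator.

```javascript
// Illustrative model of the three caps. Not the server implementation.
const MB_IN_BYTES = 1024 * 1024;
const MAX_BODY_SIZE = 16 * MB_IN_BYTES;
const MAX_ROOMS_PER_REQUEST = 50;
const MAX_UPDATE_DATA_SIZE = MB_IN_BYTES;

function checkCaps( payload ) {
	// Measure the serialized body the way it would travel over HTTP.
	const bodyBytes = new TextEncoder().encode(
		JSON.stringify( payload )
	).length;
	return {
		roomsOk: payload.rooms.length <= MAX_ROOMS_PER_REQUEST,
		// Base64 data is ASCII, so string length equals byte length here.
		updatesOk: payload.rooms.every( ( room ) =>
			room.updates.every(
				( update ) => update.data.length <= MAX_UPDATE_DATA_SIZE
			)
		),
		bodyOk: bodyBytes <= MAX_BODY_SIZE,
		bodyBytes,
	};
}

// 40 rooms, each with one 600 KiB encoded update (well under 1 MiB).
const payload = {
	rooms: Array.from( { length: 40 }, ( _, index ) => ( {
		room: `postType/post:${ index + 1 }`,
		updates: [ { type: 'update', data: 'x'.repeat( 600 * 1024 ) } ],
	} ) ),
};

const result = checkCaps( payload );
console.log( result );
```

The payload passes the room-count and per-update checks and fails only the aggregate body check, which is the composition gap described above.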

The number of rooms can grow beyond what a user would perceive as the one
document they are editing because resolved numeric entity records are loaded
into the sync manager. A page that touches many synced post records can
therefore create many sync rooms, each with ordinary-sized queued updates.

The failure then becomes sticky. A 413 from the route-level body validator
means the server rejected the request before storing the updates, but the client
treats it like a generic retryable poll failure. It retries the same oversized
shape after backoff, eventually surfacing the generic Connection lost modal.

Issue analysis

This is an availability and scalability issue, not evidence of server-side
partial write corruption. The server body-size validator runs before the sync
handler stores updates, so the failing request is rejected as a unit.

The risky part is the client recovery behavior after rejection. The current
generic failure path cannot tell that the payload is structurally too large to
ever succeed. It may restore the same queued updates, or replace failed outgoing
updates with compaction updates in cases where the room already has a cursor.
That behavior is appropriate for ambiguous network errors where the server
might have committed the write, but it is the wrong shape for a deterministic
413: retrying or compacting does not reduce the aggregate request size and can
keep the session in a permanent retry loop.
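
The distinction could be expressed as a small classifier. This is a sketch under assumptions: `classifyPollFailure` is a hypothetical name, and the error shape (`code` plus `data.status`) follows the conventional WordPress REST error body rather than anything confirmed from the polling manager.

```javascript
// Hypothetical classifier: separates a deterministic "body too large"
// rejection from failures where the server may have committed the write.
function classifyPollFailure( error ) {
	const status = error?.data?.status ?? error?.status;
	if ( error?.code === 'rest_sync_body_too_large' || status === 413 ) {
		// Rejected before any update was stored: restore the exact
		// attempted updates and retry with a smaller batch.
		return 'body-too-large';
	}
	// Timeouts, aborted requests, 5xx: the write may or may not have landed,
	// so the existing generic recovery path still applies.
	return 'ambiguous';
}

console.log(
	classifyPollFailure( {
		code: 'rest_sync_body_too_large',
		data: { status: 413 },
	} )
);
console.log( classifyPollFailure( { code: 'fetch_error' } ) );
```

Keeping the check this narrow leaves the behavior for genuinely ambiguous failures unchanged.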

The fix should preserve these invariants:

- never drop local queued updates silently
- never split the bytes of a single Yjs update
- preserve FIFO ordering of updates within each room
- keep room cursors and awareness state scoped to the room that was actually
  sent
- avoid raising the server body limit as the primary fix, because that shifts
  the failure toward memory pressure and slower requests
- avoid disabling real-time collaboration for the whole editor when smaller
  poll batches can make progress

Relevant code path

Server-side body cap:
- class-wp-http-polling-sync-server.php
Numeric entity records are loaded into the sync manager when getEntityRecord() resolves:
- packages/core-data/src/resolvers.js
Generic poll failures become disconnected status and then the modal:
- packages/sync/src/providers/http-polling/polling-manager.ts
- packages/editor/src/utils/sync-error-messages.ts

Fix plan

The least risky fix is client-side request budgeting and batching, with a
specific 413 recovery path.

- Add a client constant for the server request-body limit and use a smaller
  soft budget, for example 15 MiB, to leave room for serialization details
  and future metadata.
- Build poll payloads in bounded batches instead of sending every registered
  room in one request. Enforce both:
  - MAX_ROOMS_PER_REQUEST
  - serialized JSON byte length under the soft body budget
- Measure the actual serialized payload size with the same object shape passed
  to apiFetch, rather than estimating from raw update sizes. The update data
  is already base64 at this point, so JSON.stringify( payload ) measured with
  TextEncoder should be close to the request body that the REST endpoint
  receives.
- Change update queue handling so batching can choose a sendable prefix
  without losing updates. A safe design is to add queue operations that can
  peek at pending updates and then take only the updates assigned to the
  current batch. Rooms and updates not assigned to the batch must remain queued.
- Preserve per-room update order. If one room has many queued updates, split
  that room across multiple polls at update boundaries. Do not split a single
  Yjs update byte array. A single encoded update should already be bounded by
  the per-update limit.
- Prioritize rooms with outgoing updates, but keep a round-robin cursor for
  rooms without outgoing updates so incoming updates and awareness do not
  starve for secondary rooms.
- Detect 413 / rest_sync_body_too_large separately from ambiguous network
  failures. Because the route validator rejected the request before storing
  updates, restore the exact attempted updates and retry with a smaller batch
  budget. Do not replace them with compaction updates on this path.
- Add regression coverage:
  - unit tests for payload batching by body size and room count
  - queue tests proving unsent updates remain queued and sent updates restore
    exactly on 413
  - a polling-manager test where many rooms with sub-1 MiB updates are sent
    across multiple successful polls, with every serialized payload below the
    budget
  - a 413 test proving the next retry shrinks the batch and does not trigger
    the disconnect modal
  - an end-to-end version of this repro once the lower-level behavior is
    stable
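
The batching steps in this plan can be sketched as follows. This is a minimal illustration, not the code in this PR: `planBatch`, `byteLength`, and the `pending` shape are hypothetical, and the round-robin cursor for rooms without outgoing updates is left out for brevity.

```javascript
// UTF-8 size of the JSON body that would be sent for `value`.
const byteLength = ( value ) =>
	new TextEncoder().encode( JSON.stringify( value ) ).length;

// Hypothetical planner. `pending` is [ { room, updates: [ { type, data } ] } ].
// Returns one payload within both caps, plus everything that stays queued.
function planBatch( pending, { maxRooms, maxBytes } ) {
	const payload = { rooms: [] };
	const remaining = [];
	for ( const { room, updates } of pending ) {
		const entry = { room, updates: [] };
		const roomFits =
			payload.rooms.length < maxRooms &&
			byteLength( { rooms: [ ...payload.rooms, entry ] } ) <= maxBytes;
		if ( ! roomFits ) {
			remaining.push( { room, updates } );
			continue;
		}
		payload.rooms.push( entry );
		// Take a FIFO prefix of whole updates; never split a single update.
		let taken = 0;
		for ( const update of updates ) {
			entry.updates.push( update );
			if ( byteLength( payload ) > maxBytes ) {
				entry.updates.pop(); // Over budget: leave it queued.
				break;
			}
			taken++;
		}
		if ( taken < updates.length ) {
			remaining.push( { room, updates: updates.slice( taken ) } );
		}
	}
	return { payload, remaining };
}

// Tiny demo with a deliberately small budget.
const pending = [
	{
		room: 'postType/post:1',
		updates: [
			{ type: 'update', data: 'a'.repeat( 40 ) },
			{ type: 'update', data: 'b'.repeat( 40 ) },
		],
	},
	{
		room: 'postType/post:2',
		updates: [ { type: 'update', data: 'c'.repeat( 40 ) } ],
	},
];
const first = planBatch( pending, { maxRooms: 50, maxBytes: 170 } );
console.log( byteLength( first.payload ), first.remaining );
```

Re-serializing the payload after each candidate update is quadratic in the worst case; a real implementation would more likely track a running size. A single update larger than the budget would never be sent by this sketch, which is why the plan relies on the per-update limit already bounding each update.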

This plan avoids new protocol semantics. It does not require server-side Yjs
chunk reassembly, does not alter sync storage, and does not raise server limits.
It makes the existing polling protocol respect the server's aggregate request
limit before sending.
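
The queue operations mentioned in the plan (peek, take, and exact restore on 413) might look like the sketch below. The class and method names are hypothetical; the existing queue is only known from the text above to expose a draining get().

```javascript
// Hypothetical queue: peek() inspects without removing, take( n ) removes the
// first n updates, and restore() puts a rejected batch back at the front.
class UpdateQueue {
	#items = [];

	add( update ) {
		this.#items.push( update );
	}

	peek() {
		return [ ...this.#items ];
	}

	take( count ) {
		return this.#items.splice( 0, count );
	}

	restore( updates ) {
		this.#items.unshift( ...updates );
	}

	get size() {
		return this.#items.length;
	}
}

const queue = new UpdateQueue();
[ 'u1', 'u2', 'u3' ].forEach( ( update ) => queue.add( update ) );

const attempted = queue.take( 2 ); // Assigned to the current batch.
queue.add( 'u4' ); // A local edit arrives while the request is in flight.
queue.restore( attempted ); // Server answered 413: nothing was stored.

console.log( queue.peek() ); // [ 'u1', 'u2', 'u3', 'u4' ]
```

Restoring at the front keeps FIFO order intact even when new local edits were queued while the rejected request was in flight.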

Five-pass confirmation

Focused isolated reruns on 8895:

iteration 1: request 24,617,807 bytes, modal at 22.7s, repeated 413
iteration 2: request 24,617,433 bytes, modal at 22.8s, repeated 413
iteration 3: request 24,617,457 bytes, modal at 22.6s, repeated 413
iteration 4: request 24,617,468 bytes, modal at 22.8s, repeated 413
iteration 5: request 24,617,453 bytes, modal at 23.4s, repeated 413

All 5/5 runs reproduced.

Conclusion

This is a real Connection lost cause: many ordinary-sized synced edits across
many synced entities can overflow the server's 16 MiB poll-body cap, yielding
repeated 413s and the generic disconnect modal.

END AI GENERATED TEXT

Use of AI Tools

Except for the text in this PR outside of the AI generated block, everything here was AI generated including the fuzzer code that surfaced this bug.

github-actions · 2026-04-27T22:39:16Z

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message.

Co-authored-by: danluu <danluu@git.wordpress.org>
Co-authored-by: alecgeatches <alecgeatches@git.wordpress.org>

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

github-actions · 2026-04-27T22:39:54Z

👋 Thanks for your first Pull Request and for helping build the future of Gutenberg and WordPress, @danluu! In case you missed it, we'd love to have you join us in our Slack community.

If you want to learn more about WordPress development in general, check out the Core Handbook full of helpful information.

alecgeatches · 2026-05-13T20:03:19Z

At first glance this PR describes a contrived situation. Because we already cap updates at MAX_UPDATE_SIZE_IN_BYTES (1 megabyte), we can't hit this issue by pasting a huge document or anything like that.

Currently, hitting the MAX_BODY_SIZE limit of 16 MB requires sending a lot of large updates on a ton of separate entities, from the same browser, within the same second-long poll cycle. The reproduction would look more like pasting a huge document into 40 different entities in the same second. That's what collaboration-sync-body-size.spec.ts does, sending ~450 KB updates on 40 entities simultaneously. Users are typically editing one main entity at a time (with largely read-only subscriptions to other entities), so sending a large number of huge updates to so many entities in a short window is unlikely to match real user behavior.

That said, if we adjust a handful of limits (possible in future changes), this situation likely becomes somewhat more reasonable and human-reproducible. For example, adjusting MAX_BODY_SIZE to close to 1 MB, and quickly making large changes to two entities could make this possible in a browser. I'm going to see if I can make a reproduction work for some version of server-side limit changes, and if I can reproduce I think this PR is still a reasonable hardening improvement.

alecgeatches · 2026-05-14T17:40:46Z

Successfully reproduced the issue! First, change a few limits:

In lib/compat/wordpress-7.0/class-wp-http-polling-sync-server.php:
```
const MAX_BODY_SIZE = 1.2 * MB_IN_BYTES;
```
and in packages/sync/src/providers/http-polling/config.ts:
```
export const POLLING_INTERVAL_WITH_COLLABORATORS_IN_MS = 10000;
```
These constants limit the max request body size to 1.2 MB and give us a 10-second window to produce multiple updates that combine to be larger than the 1.2 MB cap.
Then copy this big base64-encoded image into the clipboard. One of these images is below the 1.2 MB cap, but the image pasted twice will exceed it.

On `trunk`

First, this is what happens on trunk. By pasting the large image update across two entities in the same poll cycle (post and pattern block), we hit the server-side body size limit and see a 413 and then disconnection:

request-too-large.mov

On #77724 (client-side limits)

Here's the same test run on this PR, with this change to match the smaller server-side limits:

In packages/sync/src/providers/http-polling/config.ts:

export const MAX_SYNC_REQUEST_BODY_SIZE_IN_BYTES = 1.2 * 1024 * 1024;

request-fixed.mov

Here, the client-side code precalculates the update size, and responds by properly chunking it into two room updates. The 413 and disconnection dialog are avoided.
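
For reference, measuring the serialized size could be sketched like this. The name getJsonByteLength appears in one of the cleanup commit titles on this PR, but the body here is an assumed implementation, not the committed one.

```javascript
// Assumed implementation: UTF-8 byte length of the JSON-serialized value.
function getJsonByteLength( value ) {
	return new TextEncoder().encode( JSON.stringify( value ) ).length;
}

// Base64 update data is ASCII, but other fields (such as awareness state)
// may contain multi-byte characters, so string length can undercount.
const payload = {
	rooms: [
		{
			room: 'postType/post:1',
			awareness: { name: 'Zo\u00eb' },
			updates: [],
		},
	],
};
const json = JSON.stringify( payload );

console.log( {
	stringLength: json.length,
	byteLength: getJsonByteLength( payload ),
} );
```

The two numbers differ whenever the payload contains non-ASCII text, which is one reason to measure bytes rather than string length before comparing against a body-size budget.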

On #77724 (server-side limits)

There's a second mode covered by this PR if the client-side limits look okay but the server returns a 413. I was able to test this by changing this limit:

In packages/sync/src/providers/http-polling/config.ts:

export const MIN_SYNC_REQUEST_BODY_SIZE_LIMIT_IN_BYTES = 0.2 * 1024 * 1024;

and the server-side limit to 1.2 MB as before. In this path, we first see a 413 and then correctly adjust our update size to fit:

request-server-side.mov
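
A rough model of that 413 recovery path, for readers who want to see the shape of it. Everything here is an assumption for illustration: the halving strategy, the `nextBudgetAfter413` name, and the simulated sizes are not taken from the PR's code.

```javascript
// Hypothetical floor, mirroring the test value used above.
const MIN_BUDGET = Math.floor( 0.2 * 1024 * 1024 );

// After a 413, aim below whatever was just rejected, never below the floor.
function nextBudgetAfter413( currentBudget, attemptedBytes ) {
	const target = Math.floor(
		Math.min( currentBudget, attemptedBytes ) / 2
	);
	return Math.max( MIN_BUDGET, target );
}

// Simulation: two pasted updates of roughly 1 MiB each are pending, and the
// server accepts bodies up to 1.2 MiB.
const SERVER_LIMIT = Math.floor( 1.2 * 1024 * 1024 );
const pendingBytes = 2 * 1024 * 1024;
let budget = 16 * 1024 * 1024;
const attempts = [];

for ( let i = 0; i < 10; i++ ) {
	const attemptedBytes = Math.min( budget, pendingBytes );
	const rejected = attemptedBytes > SERVER_LIMIT;
	attempts.push( { attemptedBytes, status: rejected ? 413 : 200 } );
	if ( ! rejected ) {
		break;
	}
	budget = nextBudgetAfter413( budget, attemptedBytes );
}

console.log( attempts );
```

In this model, a floor above the server's limit would keep producing 413s, which may be why lowering the minimum made this path testable against a 1.2 MB server cap.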

It took a while to understand the failure modes and test here, but I'm confident this could solve a real issue if client-side or server-side constants changed to make it easier. There are a couple of small code changes I'd like to make in here before committing, but I can confirm the existing code works. Thank you @danluu!

alecgeatches

Looks good! I was able to reproduce and pushed a handful of cleanup changes and verified it still correctly chunks updates. Will merge when tests pass. Thank you!

danluu added 3 commits April 27, 2026 14:48

Add sync request body size unit tests

682c269

Add sync request body size browser test

c167779

Fix sync request body size batching

0cbf575

github-actions Bot added the [Package] Sync label Apr 27, 2026

github-actions Bot added the First-time Contributor label Apr 27, 2026

Fix sync body size CI failures

4208134

t-hamano added the [Feature] Real-time Collaboration and [Type] Bug labels Apr 28, 2026

alecgeatches added 6 commits May 14, 2026 11:42

Change constant name

6cd0b5b

Use getJsonByteLength instead of base64 string length defensively

8c65f36

Avoid push + overwrite on pending updates

f109819

Minor test updates

b840672

Merge branch 'trunk' into try/rtc-sync-body-size-pr

dbd73be

Revert MIN_SYNC_REQUEST_BODY_SIZE_LIMIT_IN_BYTES size change from test value

7415d53

alecgeatches approved these changes May 14, 2026

alecgeatches enabled auto-merge (squash) May 14, 2026 19:47

alecgeatches merged commit 2b5a7a9 into WordPress:trunk May 14, 2026
39 of 44 checks passed

github-actions Bot added this to the Gutenberg 23.3 milestone May 14, 2026

alecgeatches mentioned this pull request May 15, 2026

RTC: Fuzz testing #77716

Open

alecgeatches merged 10 commits into WordPress:trunk from danluu:try/rtc-sync-body-size-pr
