Limit amount of data send by full sync#34390
Conversation
…ded and chunk size is 1. we cannot make the chunk smaller
|
Are you an Automattician? Please test your changes on all WordPress.com environments to help mitigate accidental explosions.
Interested in more tips and information?
|
|
Thank you for your PR! When contributing to Jetpack, we have a few suggestions that can help us test and review your patch:
This comment will be updated as you work on your PR and make changes. If you think that some of those checks are not needed for your PR, please explain why you think so. Thanks for cooperation 🤖 The e2e test report can be found here. Please note that it can take a few minutes after the e2e tests checks are complete for the report to be available. Once your PR is ready for review, check one last time that all required checks appearing at the bottom of this PR are passing or skipped. Jetpack plugin: The Jetpack plugin has different release cadences depending on the platform:
If you have any questions about the release process, please ask in the #jetpack-releases channel on Slack. |
fgiannar
left a comment
There was a problem hiding this comment.
Amazing work, Juanma! Thanks for working on this!
I left some initial thoughts to start a discussion. Mostly focusing on the logic for now.
-Added trim of metadata into the logic since it wasn't done for Full Sync
…data-send-by-full-sync
…rwritten if $objects is of different type as happens in Post now
…Sync Legacy than for Full Sync Immediately
…e don't loop over it for the remaining posts.
…ra time when the data will not be used
…data-send-by-full-sync
… wasnot being incremented properly.
fgiannar
left a comment
There was a problem hiding this comment.
Many thanks for working on this, Juanma!
The logic is solid, I made sure to fully test in my local env and on a JN site with/without MAX_SIZE_FULL_SYNC enforced.
Without the MAX_SIZE_FULL_SYNC enforced, we are fully syncing 100 posts at the time as we used to.
With the MAX_SIZE_FULL_SYNC enforced, we are fully syncing in chunks < 100, depending on the limit.
I also compared the remote/Cache site posts/tags/relationships and they also match.
Appreciate the unit tests too 👍
As a reminder, lets make sure to refactor send_full_sync_actions in a follow up PR as we are currently making 1 redundant query/processing of items (not related to the current PR tho)
Proposed changes:
We are seeing many OOMs caused by Full Sync which tries to send as much data as possible, ignoring the
max_upload_sizerestriction, due to the nature how Jetpack Sync queues items.This is an approach to send smaller chunks when in Full Sync if they are bigger than a specific threshold only for Post objects since they are the ones that feel like might be a problem based on size.
Prior logic used to chunk Posts only based on defaults using
chunk_sizeandmax_chunks. This PR overrides theget_next_chunkmethod for thePostsModule and incorporates existing logic about expanding posts and adding metadata that was being done later down the road in the Full Sync Process. Also, metadata trimming has been added to the process since it did not exist for Full Sync.The idea is to make sure we limit the max amount of posts and their metadata associated to a maximum size. (For the moment it is
MAX_POST_META_LENGTH + MAX_POST_CONTENT_LENGTH) All of it after trimming both posts and metadata.In the edge case of the first post and its associated metadata being bigger than the max size allowed after trimming the content, we allow only that one post to be synced. Otherwise, we will never be able to sync that object. We might need to monitor that edge case to see if it still causes issues.
Other information:
Jetpack product discussion
pf5801-gq-p2
Does this pull request change what data or activity we track or use?
Testing instructions:
Let's start with a self-hosted site without this branch applied (be it your local or a JN)
immediate-send).Now move the site to this branch
immediate-send). The specific chunking size will depend on how big your posts are.