
Limit amount of data sent by full sync #34390

Merged

darssen merged 20 commits into trunk from update/limit-amount-data-send-by-full-sync on Dec 8, 2023



@darssen (Contributor) commented Nov 30, 2023

Proposed changes:

We are seeing many out-of-memory errors (OOMs) caused by Full Sync. Because of the way Jetpack Sync queues items, Full Sync tries to send as much data as possible and ignores the max_upload_size restriction.

This PR makes Full Sync send smaller chunks whenever a chunk would exceed a specific size threshold. For now, this applies only to Post objects, because their size makes them the most likely cause of the problem.

Previously, Posts were chunked only by count, using the default chunk_size and max_chunks values. This PR overrides the get_next_chunk method of the Posts module. It also moves into that method the existing logic for expanding posts and attaching their metadata, which used to run later in the Full Sync process. Metadata trimming has also been added, since Full Sync did not trim metadata before.
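The following is a minimal sketch of the trim-then-expand step. It is written in Python rather than Jetpack's PHP and is only an illustration. All names in it (`Post`, `Meta`, `trim_content`, `trim_meta`, `expand_post`) are hypothetical, and the limit values are made up. The sketch assumes that "trimming" means blanking any value that reaches its limit, which may not be the exact rule the Posts module applies. The point is that trimming happens before a post is measured, so the size check sees the trimmed payload.

```python
from dataclasses import dataclass, field

# Hypothetical limits in bytes; the real constants in the Posts module may differ.
MAX_POST_CONTENT_LENGTH = 1_000
MAX_POST_META_LENGTH = 500


@dataclass
class Meta:
    key: str
    value: str


@dataclass
class Post:
    post_id: int
    content: str
    meta: list[Meta] = field(default_factory=list)


def trim_content(post: Post) -> Post:
    """Blank the post content if it reaches the limit (assumed trimming rule)."""
    if len(post.content.encode("utf-8")) >= MAX_POST_CONTENT_LENGTH:
        return Post(post.post_id, "", post.meta)
    return post


def trim_meta(meta: list[Meta]) -> list[Meta]:
    """Blank every individual meta value that reaches the limit (assumed rule)."""
    return [
        Meta(m.key, "") if len(m.value.encode("utf-8")) >= MAX_POST_META_LENGTH else m
        for m in meta
    ]


def expand_post(post: Post) -> Post:
    """Trim content and attached metadata before the post is measured and chunked."""
    trimmed = trim_content(post)
    return Post(trimmed.post_id, trimmed.content, trim_meta(trimmed.meta))
```

For example, `expand_post(Post(1, "x" * 1000, [Meta("a", "y" * 500)]))` returns a post whose content and single meta value are both empty strings, while a small post passes through unchanged.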

The goal is to cap the combined size of a chunk's posts and their associated metadata at a maximum size. That size is currently MAX_POST_META_LENGTH + MAX_POST_CONTENT_LENGTH, measured after both the posts and the metadata have been trimmed.

In the edge case where the first post and its associated metadata still exceed the maximum size after trimming, that post is synced on its own in a single chunk. Otherwise, that object could never be synced. We may need to monitor this edge case to see whether it still causes issues.
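The size-bounded chunking described above can be sketched as a greedy loop. This is a Python illustration, not the PHP get_next_chunk override from this PR. The function names `next_chunk` and `all_chunks` and the byte values are hypothetical. `size_of` is assumed to return the size of an already-trimmed post plus its metadata. The first item is always admitted, which mirrors the oversized-first-post rule.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical values. The PR caps a chunk at
# MAX_POST_META_LENGTH + MAX_POST_CONTENT_LENGTH; these numbers are made up.
MAX_POST_CONTENT_LENGTH = 1_000
MAX_POST_META_LENGTH = 500
MAX_CHUNK_BYTES = MAX_POST_META_LENGTH + MAX_POST_CONTENT_LENGTH
CHUNK_SIZE = 100  # the previous count-based limit per chunk


def next_chunk(
    items: Sequence[T],
    size_of: Callable[[T], int],
    max_bytes: int = MAX_CHUNK_BYTES,
    max_items: int = CHUNK_SIZE,
) -> list[T]:
    """Return the longest prefix of `items` that fits within `max_bytes`.

    The first item is always included, even if it alone exceeds
    `max_bytes`, so an oversized post can still be synced.
    """
    chunk: list[T] = []
    total = 0
    for item in items[:max_items]:
        size = size_of(item)
        if chunk and total + size > max_bytes:
            break  # adding this item would overflow the chunk
        chunk.append(item)
        total += size
    return chunk


def all_chunks(items, size_of, **kwargs):
    """Split `items` into consecutive size-bounded chunks."""
    chunks = []
    start = 0
    while start < len(items):
        chunk = next_chunk(items[start:], size_of, **kwargs)
        chunks.append(chunk)
        start += len(chunk)  # always >= 1, so the loop terminates
    return chunks
```

With item sizes `[600, 600, 600, 2000, 100]` and the 1,500-byte cap, `all_chunks(sizes, lambda s: s)` yields `[[600, 600], [600], [2000], [100]]`. The 2,000-byte item ends up alone in its own chunk. Because every call admits at least one item, the loop always makes progress.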

Other information:

  • Have you written new tests for your changes, if applicable?
  • Have you checked the E2E test CI results, and verified that your changes do not break them?
  • Have you tested your changes on WordPress.com, if applicable (if so, you'll see a generated comment below with a script to run)?

Jetpack product discussion

pf5801-gq-p2

Does this pull request change what data or activity we track or use?

Testing instructions:

Start with a self-hosted site that does not have this branch applied (either your local site or a Jurassic Ninja (JN) site):

  • Create more than 100 posts with large content. One easy way is to create a single very large post and duplicate it with a plugin.
  • Go to the Jetpack Debugger and schedule a Full Sync for posts only.
  • Check the corresponding sync-audit logs to verify that the content was synced in chunks of 100 posts. The logs are linked from the Jetpack Debugger; change the queue to immediate-send.

Now switch the site to this branch:

  • Go to the Jetpack Debugger.
  • To be safe, clear the Full Sync queue by scheduling a Full Sync for options, constants, functions, and updates.
  • Schedule a Full Sync for posts only.
  • Check the corresponding sync-audit logs to verify that the content was synced in chunks of fewer than 100 posts. The logs are linked from the Jetpack Debugger; change the queue to immediate-send. The exact chunk size depends on how large your posts are.
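To anticipate what the sync-audit logs should show, you can estimate the chunk size when all test posts are duplicates and therefore the same size. The sketch below is a hypothetical helper, not part of Jetpack. It assumes greedy filling up to a byte cap, the 100-post count limit, and at least one post per chunk. The byte figures in the example are made up.

```python
def expected_chunk_size(post_bytes: int, max_bytes: int, max_items: int = 100) -> int:
    """Posts per chunk when every trimmed post (with metadata) has the same size.

    Assumes the chunker fills greedily up to `max_bytes`, caps a chunk at
    `max_items`, and always admits at least one post.
    """
    if post_bytes <= 0:
        return max_items
    return max(1, min(max_items, max_bytes // post_bytes))
```

With a hypothetical 1,500,000-byte cap, 40,000-byte posts give 37 posts per chunk. Posts of 10,000 bytes hit the 100-post limit first. A single 2,000,000-byte post is sent alone.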

@darssen requested a review from a team November 30, 2023 12:01
@darssen self-assigned this Nov 30, 2023
github-actions bot (Contributor) commented Nov 30, 2023

Are you an Automattician? Please test your changes on all WordPress.com environments to help mitigate accidental explosions.

  • To test on WoA, go to the Plugins menu on a WordPress.com Simple site. Click on the "Upload" button and follow the upgrade flow to be able to upload, install, and activate the Jetpack Beta plugin. Once the plugin is active, go to Jetpack > Jetpack Beta, select your plugin, and enable the update/limit-amount-data-send-by-full-sync branch.

  • To test on Simple, run the following command on your sandbox:

    bin/jetpack-downloader test jetpack update/limit-amount-data-send-by-full-sync
    

Interested in more tips and information?

  • In your local development environment, use the jetpack rsync command to sync your changes to a WoA dev blog.
  • Read more about our development workflow here: PCYsg-eg0-p2
  • Figure out when your changes will be shipped to customers here: PCYsg-eg5-p2

github-actions bot (Contributor) commented Nov 30, 2023

Thank you for your PR!

When contributing to Jetpack, we have a few suggestions that can help us test and review your patch:

  • ✅ Include a description of your PR changes.
  • ✅ Add a "[Status]" label (In Progress, Needs Team Review, ...).
  • ✅ Add testing instructions.
  • ✅ Specify whether this PR includes any changes to data or privacy.
  • ✅ Add changelog entries to affected projects

This comment will be updated as you work on your PR and make changes. If you think that some of those checks are not needed for your PR, please explain why you think so. Thanks for your cooperation 🤖


The e2e test report can be found here. Please note that it can take a few minutes after the e2e test checks complete for the report to be available.


Once your PR is ready for review, check one last time that all required checks appearing at the bottom of this PR are passing or skipped.
Then, add the "[Status] Needs Team Review" label and ask someone from your team to review the code. Once reviewed, it can then be merged.
If you need an extra review from someone familiar with the codebase, you can update the labels from "[Status] Needs Team Review" to "[Status] Needs Review", and in that case Jetpack Approvers will do a final review of your PR.


Jetpack plugin:

The Jetpack plugin has different release cadences depending on the platform:

  • WordPress.com Simple releases happen daily.
  • WoA releases happen weekly.
  • Releases to self-hosted sites happen monthly. The next release is scheduled for January 9, 2024 (scheduled code freeze on January 8, 2024).

If you have any questions about the release process, please ask in the #jetpack-releases channel on Slack.

@fgiannar (Contributor) left a comment

Amazing work, Juanma! Thanks for working on this!

I left some initial thoughts to start a discussion, mostly focusing on the logic for now.

github-actions bot added the [Plugin] Jetpack label Dec 8, 2023
@darssen added [Status] Needs Review and removed [Status] In Progress labels Dec 8, 2023
@fgiannar (Contributor) left a comment

Many thanks for working on this, Juanma!

The logic is solid. I fully tested it in my local environment and on a JN site, both with and without MAX_SIZE_FULL_SYNC enforced.
Without MAX_SIZE_FULL_SYNC enforced, we fully sync 100 posts at a time, as before.
With MAX_SIZE_FULL_SYNC enforced, we fully sync in chunks of fewer than 100 posts, depending on the limit.
I also compared the posts, tags, and relationships on the remote/Cache site, and they match as well.

Appreciate the unit tests too 👍

As a reminder, let's make sure to refactor send_full_sync_actions in a follow-up PR. It currently performs one redundant query and one redundant round of item processing (not related to this PR, though).

@fgiannar added [Status] Ready to Merge and removed [Status] Needs Review labels Dec 8, 2023
@darssen merged commit adb1d33 into trunk Dec 8, 2023
@darssen deleted the update/limit-amount-data-send-by-full-sync branch December 8, 2023 12:38
github-actions bot removed the [Status] Ready to Merge label Dec 8, 2023
github-actions bot added this to the jetpack/13.0 milestone Dec 8, 2023

Labels

[Package] Sync, [Plugin] Jetpack (Issues about the Jetpack plugin. https://wordpress.org/plugins/jetpack/), [Tests] Includes Tests

Projects

None yet


2 participants