Address long-running queries and OOM contributor in PHP replacement #176

Closed
brandonpayton wants to merge 1 commit into wp-cli:main from brandonpayton:make-php-replacement-more-efficient

Conversation

@brandonpayton
Contributor

This change makes the PHP replacement path's queries and memory usage more efficient, addressing the long-running queries and OOM conditions we've encountered with certain sites.

It improves the primary key SELECT queries by eliminating the use of OFFSET, because OFFSET requires the database to consider all rows up to the offset before taking rows up to the LIMIT, so queries grow slower as OFFSET increases. The new SELECT query instead relies on primary key conditions to exclude previously processed rows from consideration. This way, the database can use an index to seek directly to rows with keys greater than those of the previous chunk and then take rows from that set up to the LIMIT.
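To illustrate the difference, here is a minimal sketch of the keyset-pagination idea in Python with SQLite (not the PR's PHP/MySQL; the `posts` table and `chunks` helper are made up for this example). Each query resumes after the greatest key seen in the previous chunk, so the database seeks via the primary-key index instead of scanning past skipped rows:

```python
import sqlite3

# Illustrative sketch only: keyset pagination via a primary-key condition,
# in place of LIMIT/OFFSET. Table and helper names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, content TEXT)")
conn.executemany(
    "INSERT INTO posts (id, content) VALUES (?, ?)",
    [(i, f"row {i}") for i in range(1, 11)],
)

def chunks(conn, limit=3):
    """Yield chunks of rows, resuming each query after the last seen key."""
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id, content FROM posts WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, limit),
        ).fetchall()
        if not rows:
            break
        yield rows
        last_id = rows[-1][0]  # resume after the greatest key in this chunk

all_ids = [r[0] for chunk in chunks(conn) for r in chunk]
print(all_ids)  # prints [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

Because the `WHERE id > ?` predicate is index-satisfiable, each chunk query costs roughly the same regardless of how deep into the table it is, whereas an OFFSET query must discard all preceding rows every time.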

It improves memory usage by applying updates as it goes rather than storing all of a column's updated values in memory until the end. At Automattic, when we limit search-replace to 4GB of memory, we sometimes exceed that limit for large sites. It's possible other things contribute to high memory usage within the search-replace command, but as a first step, we can reduce memory requirements by no longer keeping all updated column values in memory simultaneously.
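The streaming approach can be sketched like this, again in Python/SQLite rather than the PR's PHP (the `search_replace` helper and table are illustrative assumptions). Memory holds at most one chunk of rows at a time because each replacement is written back before the next chunk is fetched:

```python
import sqlite3

# Illustrative sketch only: apply replacements chunk by chunk instead of
# collecting every updated value in memory until the end.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, content TEXT)")
conn.executemany(
    "INSERT INTO posts (id, content) VALUES (?, ?)",
    [(i, f"visit http://old.example/{i}") for i in range(1, 8)],
)

def search_replace(conn, old, new, chunk_size=3):
    """Stream updates so memory usage is bounded by one chunk of rows."""
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id, content FROM posts WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, chunk_size),
        ).fetchall()
        if not rows:
            break
        for row_id, content in rows:
            replaced = content.replace(old, new)
            if replaced != content:
                # Write the update immediately rather than queueing it.
                conn.execute(
                    "UPDATE posts SET content = ? WHERE id = ?",
                    (replaced, row_id),
                )
        last_id = rows[-1][0]

search_replace(conn, "http://old.example", "https://new.example")
```

Updating rows mid-iteration is safe here because the pagination key is the primary key, which the replacement never modifies.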

@brandonpayton brandonpayton requested a review from a team as a code owner January 26, 2023 21:15
@danielbachhuber
Member

@brandonpayton Has this been running in production for some time? Also, are there any particular test cases you think we should add to cover the change?

@brandonpayton
Contributor Author

@danielbachhuber, this is not running in production yet but should be next week. I'll withdraw this PR for now and plan to resubmit later after it has been in production for a while. I figured the current test cases would cover this code as the resulting replacements should not change, but it would be a good idea for me to review the tests and confirm that. I can do that before resubmitting the PR. Thanks!

@danielbachhuber
Member

> I'll withdraw this PR for now and plan to resubmit later after it has been in production for a while.

@brandonpayton Sounds good, thanks!

> I figured the current test cases would cover this code as the resulting replacements should not change, but it would be a good idea for me to review the tests and confirm that.

I think the existing functional tests are pretty good, but they may or may not be comprehensive enough to cover this. It could be fine, or there could be some minor idiosyncrasy that causes problems.


foreach ( $updates as $update ) {
	$wpdb->update( ...$update );
	// Because we are ordering by primary keys from least to greatest,

Love it, two wins: runs faster than LIMIT/OFFSET and solves the memory problems. Thanks for working on it. 🙇🏻

@brandonpayton
Contributor Author

The regex path of this change has been used in production for the last 3-4 months, and we have encountered no issues so far. So I am reopening this PR for consideration.

We already had tests for the chunked precise search-replace but none for the chunked regex path, so I added some based on the "precise" tests.

@brandonpayton
Contributor Author

Actually, I do not have permission to reopen. :) @danielbachhuber, are you up for reopening this PR?

@danielbachhuber
Copy link
Member

@brandonpayton It seems like the branch disappeared for me too:


Want to create a new PR?

@brandonpayton
Contributor Author

brandonpayton commented May 9, 2023

> I'll withdraw this PR for now and plan to resubmit later after it has been in production for a while.

That was the originally stated plan anyway, but I figured "why not just reopen?" I guess this is why. :)

Thanks for looking, @danielbachhuber!
