Address long-running queries and OOM contributor in PHP replacement#180
Merged
schlessera merged 2 commits intowp-cli:mainfrom May 11, 2023
Merged
Conversation
Member
|
Thanks, @brandonpayton ! I asked @schlessera for his thoughts about including this in the v2.8.0 release: https://wordpress.slack.com/archives/C02RP4T41/p1683671164356189 |
Contributor
Author
Thank you! |
schlessera
approved these changes
May 10, 2023
1b70852 to
3872c8d
Compare
Member
|
GHA is acting up... :/ |
Contributor
Author
|
Should I manually rebase and squash this one before merge? I guess it could be a way to try triggering GHA at least, but asking first so I don't get in your way. |
This change makes PHP replacement queries and memory usage more efficient. It improves the primary key SELECT queries by eliminating the use of OFFSET because OFFSET requires that the database consider all rows up to OFFSET before taking rows up to the LIMIT. The new query relies on primary key conditions to more efficiently eliminate previous rows from consideration. This way, the database can use an index to identify rows with keys greater than those of the previous chunk. It improves memory usage by doing updates along the way rather than storing all a column's updates in memory until the end. At Automattic, when we limit search-replace to 4GB of memory, we sometimes exceed that limit for large sites. It's possible there are other things that contribute to high memory usage within the search-replace command, but as a first step, we can reduce memory requirements by no longer keeping all updated column values in memory simultaneously.
c7d639b to
56faa0c
Compare
Contributor
Author
|
Squashed and pushed. Planning to leave this be for now unless you ask for other changes :) Didn't seem to trigger any actions. |
Member
|
Thanks for the PR and the extended testing, @brandonpayton ! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This change makes PHP replacement queries and memory usage more efficient in order to address issues we've encountered with long-running queries and OOM conditions with certain sites.
It improves the primary key SELECT queries by eliminating the use of OFFSET because OFFSET requires that the database consider all rows up to OFFSET before taking rows up to the LIMIT. This can make queries become slower as OFFSET increases. The new SELECT query relies on primary key conditions to more efficiently eliminate previous rows from consideration. This way, the database can use an index to identify rows with keys greater than those of the previous chunk and then take rows from that set up to the LIMIT.
It improves memory usage by doing updates along the way rather than storing all a column's updated values in memory until the end. At Automattic, when we limit search-replace to 4GB of memory, we sometimes exceed that limit for large sites. It's possible there are other things that contribute to high memory usage within the search-replace command, but as a first step, we can reduce memory requirements by no longer keeping all updated column values in memory simultaneously.
This is a resurrection of #176 which was closed so we could exercise these updates in production before sharing them with the community. The
--regexpath of this change has been used in production for the last 3-4 months, and we have encountered no issues so far.Prior to this PR, there were tests for the chunked
--precisesearch-replace but none for the chunked--regexpath, so this PR adds tests for chunked--regexreplacement based on the--precisetests.