vreplication timeout query optimizer hints#13840
Conversation
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
3a32cad to
12dd91b
Compare
c68c15b to
acad283
Compare
|
I got feedback to not move the flag to another package, but I get circular dependencies otherwise, not sure how to proceed here |
Sorry, meant to take a look at this earlier. My thought was to move the flag to tablet-level. I will take a look first thing tomorrow and push to this PR if that is OK. I can take a look at the failing unit tests too then. |
This commit moves the flag to the vttablet level.
Didn't look at the failing tests since I saw you had some recent commits to try to address them. Let me know if you want me to take a look at them. |
a4f7152 to
cabe21f
Compare
|
Thanks for that commit, I've pulled it in! I took a stab at the tests, but I would love some help with the remaining ones |
Only one other test related to Can you pull that and let's see if all tests pass? Thanks. |
|
Tests are all now passing! I'm also trying to find someone with permissions to update the hubspot repo settings so others can actually push to my branches 😅 |
|
@olyazavr, great, good to see that all tests are passing. Following up on the slack thread regarding this issue: did this PR fix your existing issue? Especially the fact that we are adding the timeout hints on the target for the insert/updates. Usually we have been seeing issues on the source while selecting rows in the copy phase. So just wanted to confirm that it does work. Can you also add your specific findings to the description in the issue, just so that we have it all in one place? Thanks! |
|
Yes, it did fix the issue, and we are now able to run migrations on larger/busier keyspaceshards that previously could not run schema migrations on ghost or vitess (: If the question is do we need the timeout hints for anything other than the selects - I confess I didn't try this out, and added the timeout hints in all places to be consistent, and because it seemed like there was no downside to having all vreplication queries be allowed to run longer I did not test this with any other usage of vreplication other than online ddl schema migrations |
Signed-off-by: Olga Shestopalova <oshestopalova@hubspot.com>
Signed-off-by: Olga Shestopalova <oshestopalova@hubspot.com>
Signed-off-by: Olga Shestopalova <oshestopalova@hubspot.com>
Signed-off-by: Olga Shestopalova <oshestopalova@hubspot.com>
Signed-off-by: Olga Shestopalova <oshestopalova@hubspot.com>
…r and tabletserver use it. Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Olga Shestopalova <oshestopalova@hubspot.com>
Signed-off-by: Olga Shestopalova <oshestopalova@hubspot.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Olga Shestopalova <oshestopalova@hubspot.com>
Signed-off-by: Olga Shestopalova <oshestopalova@hubspot.com>
Signed-off-by: Olga Shestopalova <oshestopalova@hubspot.com>
12b53eb to
507cfd9
Compare
Signed-off-by: Olga Shestopalova <oshestopalova@hubspot.com>
|
Rebased and dealt with merge conflicts |
|
Looks like all checks are passing, anything else for me to do here? |
|
I will open a PR to update the docs here: https://vitess.io/docs/19.0/reference/vreplication/flags/#vreplication_copy_phase_duration After this PR the flag is applicable on BOTH the source and target keyspaces:
|
(#1797) Signed-off-by: Matt Lord <mattalord@gmail.com>
Description
When we try to run online ddl operations using the Vitess strategy, we see certain migrations on specific keyspace/tables fail due to timeout. The suggestion in slack was to increase mysql's
max_execution_time, which did the trick, but we would like to limit this to only vreplication, not affecting any other query.To do so, I have used the
vreplication_copy_phase_durationflag and added/*+ MAX_EXECUTION_TIME(x) */wherexis the value ofvreplication_copy_phase_durationto the vreplication queries. This involved moving thevreplication_copy_phase_durationduration flag to avoid circular dependenciesI tested this in our own environment with the previously failing schema migration and saw it make progress and complete with a 1hr setting for this flag.
Update: This has now been running in production and we no longer see online ddl schema migrations fail because of query timeouts. We see large/busy keyspaceshards able to run migrations where they previously were unable with ghost or vitess
Related Issue(s)
Fixes #14081
Checklist
Deployment Notes