Skip to content

VReplication: Align VReplication and VTGate VStream Retry Logic#17783

Merged
mattlord merged 6 commits intovitessio:mainfrom
planetscale:vstream_manager_retries
Feb 19, 2025
Merged

VReplication: Align VReplication and VTGate VStream Retry Logic#17783
mattlord merged 6 commits intovitessio:mainfrom
planetscale:vstream_manager_retries

Conversation

@mattlord
Copy link
Copy Markdown
Member

@mattlord mattlord commented Feb 13, 2025

Description

In VTGate VStreams, we've been returning errors to the client in various cases where a retry is likely to succeed. The most egregious example is the query interrupted MySQL error (see issue) that we expect to get from each tablet during the copy phase — when multiple copy phase cycles are required — every vreplication_copy_phase_duration. We expect that for large tables in Vitess v19.0 and later after the work done in #13840 .

This PR addresses this case and a variety of others by leveraging retry logic that is similar to what we use for vreplication streams.

Related Issue(s)

Checklist

  • "Backport to:" labels have been added if this change should be back-ported to release branches
  • If this change is to be back-ported to previous releases, a justification is included in the PR description
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on CI?
  • Documentation was added or is not required

Signed-off-by: Matt Lord <mattalord@gmail.com>
@vitess-bot
Copy link
Copy Markdown
Contributor

vitess-bot bot commented Feb 13, 2025

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes), new features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test, enhancement and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot vitess-bot bot added NeedsBackportReason If backport labels have been applied to a PR, a justification is required NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsIssue A linked issue is missing for this Pull Request NeedsWebsiteDocsUpdate What it says labels Feb 13, 2025
@github-actions github-actions bot added this to the v22.0.0 milestone Feb 13, 2025
@mattlord mattlord changed the title VReplication: Unify vreplication and vtgate vstream retry logic VReplication: Unify VReplication and VTGate VStream Retry Logic Feb 13, 2025
@mattlord mattlord removed NeedsWebsiteDocsUpdate What it says NeedsIssue A linked issue is missing for this Pull Request NeedsBackportReason If backport labels have been applied to a PR, a justification is required labels Feb 13, 2025
@mattlord mattlord force-pushed the vstream_manager_retries branch 2 times, most recently from 47ef9af to b031939 Compare February 13, 2025 22:48
Signed-off-by: Matt Lord <mattalord@gmail.com>
@mattlord mattlord force-pushed the vstream_manager_retries branch from b031939 to 8f31754 Compare February 13, 2025 22:53
@codecov
Copy link
Copy Markdown

codecov bot commented Feb 13, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 67.45%. Comparing base (000fbbe) to head (7c5c747).
Report is 12 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #17783      +/-   ##
==========================================
- Coverage   67.95%   67.45%   -0.50%     
==========================================
  Files        1586     1592       +6     
  Lines      255208   258086    +2878     
==========================================
+ Hits       173423   174093     +670     
- Misses      81785    83993    +2208     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Signed-off-by: Matt Lord <mattalord@gmail.com>
@mattlord mattlord force-pushed the vstream_manager_retries branch 2 times, most recently from 6f2a0cc to dc93152 Compare February 13, 2025 23:56
Signed-off-by: Matt Lord <mattalord@gmail.com>
@mattlord mattlord force-pushed the vstream_manager_retries branch from dc93152 to 4ef0558 Compare February 13, 2025 23:57
Signed-off-by: Matt Lord <mattalord@gmail.com>
@mattlord mattlord removed the NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work label Feb 14, 2025
@mattlord mattlord marked this pull request as ready for review February 14, 2025 01:01
@mattlord mattlord requested review from notfelineit and removed request for harshit-gangal, systay and timvaillancourt February 14, 2025 01:02
return false, false
// For anything else, if this is a recoverable/ephemeral error -- such as a
// MAX_EXECUTION_TIME SQL error during the copy phase -- then retry.
return !vreplication.IsUnrecoverableError(err), false
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we move this to its own package or maybe to like vterrors so we don't pull in all of the vreplication package here?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind doing that, but from when I've looked at the binaries built in cases like this the binary only includes the necessary symbols.

Maybe you're thinking more about the package level management than the actual binaries though?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you're thinking more about the package level management than the actual binaries though?

More from a design perspective, but also even if it's not in the final binary, it stil compiles a lot more etc. And I think these are cases where having its own or putting it in a smaller more focused package is useful to do. It also makes it easier to reuse by avoiding things like import loops in the future which would force the same refactor then anyway too.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No disagreement there. It's not quite general purpose, but I can move it to vterrors or somewhere else and include VReplication in the name.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to create a subdir and vreplerror package but after thinking about it more... we don't need to re-use the same function in the vstream manager as our context is a little different (e.g. it's read-only SELECTs in that context so a much smaller set of relevant errors). So I switched over to using a similar, but more generic, function that I had created for this purpose some time back: 7c5c747

Signed-off-by: Matt Lord <mattalord@gmail.com>
@mattlord mattlord changed the title VReplication: Unify VReplication and VTGate VStream Retry Logic VReplication: Align VReplication and VTGate VStream Retry Logic Feb 19, 2025
@mattlord mattlord merged commit 5c27b40 into vitessio:main Feb 19, 2025
103 checks passed
@mattlord mattlord deleted the vstream_manager_retries branch February 19, 2025 15:59
twthorn pushed a commit to slackhq/vitess that referenced this pull request Apr 1, 2025
…ssio#17783)

Signed-off-by: Thomas Thornton <tthornton@salesforce.com>
tanjinx pushed a commit to slackhq/vitess that referenced this pull request Apr 8, 2025
* VStream API: validate that last PK has fields defined (vitessio#16478)

Signed-off-by: Thomas Thornton <tthornton@salesforce.com>

* Remove go/vt/vttablet/tabletserver/vstreamer/vstreamer_test.go

* VReplication: Align VReplication and VTGate VStream Retry Logic (vitessio#17783)

Signed-off-by: Thomas Thornton <tthornton@salesforce.com>

* Fix import

* Undo import changes

---------

Signed-off-by: Thomas Thornton <tthornton@salesforce.com>
Co-authored-by: Thomas Thornton <tthornton@salesforce.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug Report: VTGate VStream API Does not Properly Handle Multiple Copy Phase Cycles

3 participants