synchro: fix snapshot having an outdated synchro state#11858
I've checked out the commits and see no flaws; everything is clean and understandable. Let's just fix the EE and we're good to go. Thanks a lot for working on this, I really liked how the commits are split.
Serpentian
left a comment
Thank you for the patches! Clean and well split. I'm really glad that the protocol-related stuff is finally back in the relay's code!
Should be merged alongside https://github.com/tarantool/tarantool-ee/pull/1510
sergepetrenko
left a comment
@Gerold103, thanks for the patch!
I have no objections, LGTM.
Please rebase on top of the current master and we'll be good to go.
The test had a hardcoded value of ETIMEDOUT in one of the checked error messages. The value didn't match the one used on one of the macOS versions. Let's determine this value at runtime, so that the test is platform-agnostic.

NO_DOC=test
NO_CHANGELOG=test

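As an illustration of the idea, here is a minimal C sketch that builds an expected error message from the platform's own ETIMEDOUT definition instead of a hardcoded number or string. The function name and the message format are hypothetical and are not taken from the actual test; only `ETIMEDOUT`, `strerror()`, and `snprintf()` are standard.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Format an expected timeout message using the values this platform
 * defines for ETIMEDOUT. The numeric value and the description differ
 * between operating systems, so neither is hardcoded here.
 *
 * Hypothetical helper: the name and the "connect failed" prefix are
 * assumptions made for this sketch.
 */
int
expected_timeout_message(char *buf, size_t size)
{
	assert(buf != NULL && size > 0);
	return snprintf(buf, size, "connect failed: %s (errno %d)",
			strerror(ETIMEDOUT), ETIMEDOUT);
}
```

A test can then compare the observed message against this buffer, and the comparison holds on any platform regardless of which number it assigns to ETIMEDOUT.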
It was previously only available directly in the WAL API. This forced the checkpoint build code to rely on WAL, breaking the journal's API encapsulation. While it is not a big deal now, it is going to become one in the next commits, where the checkpoint build code is moved to another place that doesn't have WAL as a dependency. Let's not drag WAL there and generalize the journal checkpoint API instead.

Needed for #11754

NO_DOC=refactoring
NO_CHANGELOG=refactoring
NO_TEST=refactoring

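The encapsulation argument above can be sketched as a small virtual table: the caller asks an abstract journal for a checkpoint and never names the WAL. This is only a toy model under stated assumptions; every `toy_*` name is hypothetical and does not mirror Tarantool's real journal or WAL structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical checkpoint result: just the LSN the journal has reached. */
struct toy_journal_checkpoint {
	int64_t lsn;
};

/* Abstract journal: callers only see this interface. */
struct toy_journal {
	int (*begin_checkpoint)(struct toy_journal *journal,
				struct toy_journal_checkpoint *out);
};

/* One concrete journal implementation, standing in for the WAL. */
struct toy_wal {
	/* Must be the first member so the cast below is valid. */
	struct toy_journal base;
	int64_t last_lsn;
};

static int
toy_wal_begin_checkpoint(struct toy_journal *journal,
			 struct toy_journal_checkpoint *out)
{
	struct toy_wal *wal = (struct toy_wal *)journal;
	out->lsn = wal->last_lsn;
	return 0;
}

void
toy_wal_create(struct toy_wal *wal, int64_t last_lsn)
{
	wal->base.begin_checkpoint = toy_wal_begin_checkpoint;
	wal->last_lsn = last_lsn;
}

/* Generic entry point: depends on the journal interface only. */
int
toy_journal_begin_checkpoint(struct toy_journal *journal,
			     struct toy_journal_checkpoint *out)
{
	assert(journal != NULL && journal->begin_checkpoint != NULL);
	return journal->begin_checkpoint(journal, out);
}
```

With this shape, code that builds a checkpoint can live in a module that includes only the journal header, which is the kind of dependency cut the commit message describes.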
Those files were about the atomic build of the "global checkpoint" of all transaction-related control states with no observable after-effects (nothing changed on disk or in any of those control states). But this code conveniently contains a lot of intricate steps which could be reused for on-disk checkpoint creation (snapshot). That would turn it into something bigger than just a "txn" checkpoint. Let's prepare for that by first renaming the files. The next commit will change the contents. It is separated from the file rename in order not to confuse git into thinking that the old txn_checkpoint files are rewritten from scratch.

Needed for #11754

NO_DOC=refactoring
NO_CHANGELOG=refactoring
NO_TEST=refactoring

txn_checkpoint was about the atomic build of the "global checkpoint" of all transaction-related control states with no observable after-effects (nothing changed on disk or in any of those control states). But this code conveniently contains a lot of intricate steps which could be reused for on-disk checkpoint creation (snapshot). That would turn it into something bigger than just a "txn" checkpoint. Let's prepare for that by renaming the struct and related functions.

Needed for #11754

NO_DOC=refactoring
NO_CHANGELOG=refactoring
NO_TEST=refactoring

box/checkpoint.h+c already contains the in-memory control states checkpoint creation code. Let's also move the creation of the on-disk checkpoint here. This will make it easier to reuse the quite nontrivial common code between these two kinds of checkpoints. This commit only moves the code, though, and doesn't intend to make any functional changes.

Part of #11754

NO_DOC=refactoring
NO_CHANGELOG=refactoring
NO_TEST=refactoring

It contains the data that the specific engines need to know to create their checkpoints. Previously it was only a flag, but now it also contains the box_checkpoint object. This one will steal some work from the memtx checkpoint in the next commits, but its data is still going to be written into the "memtx checkpoint", aka the snapshot.

Part of #11754

NO_DOC=refactoring
NO_CHANGELOG=refactoring
NO_TEST=refactoring

The struct name already has the "checkpoint" suffix in it. Let's remove it from the member names. It was too verbose and long.

In scope of #11754

NO_DOC=refactoring
NO_CHANGELOG=refactoring
NO_TEST=refactoring

Previously it was collected in the memtx code. But it isn't done right there and doesn't belong to just one engine. Let's move this code into an engine-agnostic place, checkpoint.c. That is logically more natural and will allow reusing the existing code for the in-memory checkpoint build. The commit intends no functional changes and only moves the code, so that it is easier to actually fix it in the next commit.

Part of #11754

NO_DOC=refactoring
NO_CHANGELOG=refactoring
NO_TEST=refactoring

It is going to be needed higher in the file in the next commits. Let's move it, so that the next commits only change a small part of it instead of producing a 100% diff due to the move.

In scope of #11754

NO_DOC=refactoring
NO_CHANGELOG=refactoring
NO_TEST=refactoring

There was a bug: the snapshot creation code collected the synchronous transaction control state too early. The collection was fully done even before the engines began their checkpoints. As a result, the limbo and Raft checkpoints could have an outdated confirmation LSN, terms, and anything else that is supposed to be written into the snapshot. The easiest way to reproduce it was to start a snapshot while there was at least one pending synchronous transaction. Then the snap file would get the old confirmation LSN, but would also have this transaction's data. That didn't seem to lead to any noticeable bugs, except when one explicitly reads the snapshot file and observes the old LSN or term. But it still didn't look safe. The solution is to collect these control states the same way as the in-memory checkpoint is collected.

Closes #11754

NO_DOC=bugfix

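The ordering problem described above can be shown with a toy model. This sketch is not Tarantool code: all `toy_*` types and functions are hypothetical, and the confirmation of the pending transaction is reduced to a single assignment. It only demonstrates why reading the control state before the data view is taken yields a snapshot whose confirmation LSN lags behind its data.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the limbo control state. */
struct toy_limbo {
	int64_t confirmed_lsn;
};

/* Hypothetical snapshot contents. */
struct toy_snapshot {
	/* Control state written into the snapshot. */
	int64_t confirmed_lsn;
	/* LSN of the last transaction whose data is in the snapshot. */
	int64_t data_lsn;
};

/* The pending synchronous transaction gets confirmed. */
static void
toy_confirm(struct toy_limbo *limbo, int64_t lsn)
{
	assert(lsn >= limbo->confirmed_lsn);
	limbo->confirmed_lsn = lsn;
}

/* Buggy order: the control state is read before the engines begin. */
struct toy_snapshot
toy_checkpoint_early(struct toy_limbo *limbo, int64_t pending_lsn)
{
	struct toy_snapshot snap;
	/* Collected too early, before the pending transaction ends. */
	snap.confirmed_lsn = limbo->confirmed_lsn;
	toy_confirm(limbo, pending_lsn);
	/* The engine data view already includes the transaction. */
	snap.data_lsn = pending_lsn;
	return snap;
}

/* Fixed order: the control state is read together with the data view. */
struct toy_snapshot
toy_checkpoint_late(struct toy_limbo *limbo, int64_t pending_lsn)
{
	struct toy_snapshot snap;
	toy_confirm(limbo, pending_lsn);
	snap.data_lsn = pending_lsn;
	snap.confirmed_lsn = limbo->confirmed_lsn;
	return snap;
}
```

In the early variant the snapshot carries the transaction's data but the old confirmation LSN, which matches the reproduction scenario in the commit message. In the late variant both values agree.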
Previously the join/fetch-snapshot procedure would build checkpoints of the global engine-agnostic states (like limbo, Raft, journal) right inside the memtx_engine code. It didn't look right for memtx to take care of such broad things, which cover far more than memtx. It also forced the relay to pass some additional context into memtx via struct engine_join_ctx. Let's move those things out. It should be fine to do them right in the relay code. That makes the memtx join code more specific, and no longer even forces it to be executed first (it probably still should be, but for other reasons).

Follow-up for #11754

NO_DOC=refactoring
NO_CHANGELOG=refactoring
NO_TEST=refactoring

Backport failed for release/3.2. Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin release/3.2
git worktree add -d .worktree/backport/release/3.2/11858 origin/release/3.2
cd .worktree/backport/release/3.2/11858
git switch --create backport/release/3.2/11858
git cherry-pick -x 0d329244c74693a0ae6ed475b3ac7cca5fa9bafb 07770c7033aec3eab83f2ea604df614fdf9f5008 0773c30b93b2c9a674fdbf0e4f9484bc99f90860 a1dc2019e656abdc2fe3c99b8f5502eaf90b83c5 65882120dce9b100a589b7d0aeda3ca549df47e3 a09e4710a36bc4f53fe8a746bee2a419b757b0b2 7a587df06042bbad4d5a1350abb6984b1521c912 0a4aeb3b2ec3872a63adbbde2de37a93abaaa1ff ca7ad72765a191bca8a23c7903694e626e3f52b0 c9acf8a2ad173e831660bb3486a5d337679d5268 3fcf9ab43ea1fe4c62f650fdd2b727e271b77d71

Backport failed for release/3.3. Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin release/3.3
git worktree add -d .worktree/backport/release/3.3/11858 origin/release/3.3
cd .worktree/backport/release/3.3/11858
git switch --create backport/release/3.3/11858
git cherry-pick -x 0d329244c74693a0ae6ed475b3ac7cca5fa9bafb 07770c7033aec3eab83f2ea604df614fdf9f5008 0773c30b93b2c9a674fdbf0e4f9484bc99f90860 a1dc2019e656abdc2fe3c99b8f5502eaf90b83c5 65882120dce9b100a589b7d0aeda3ca549df47e3 a09e4710a36bc4f53fe8a746bee2a419b757b0b2 7a587df06042bbad4d5a1350abb6984b1521c912 0a4aeb3b2ec3872a63adbbde2de37a93abaaa1ff ca7ad72765a191bca8a23c7903694e626e3f52b0 c9acf8a2ad173e831660bb3486a5d337679d5268 3fcf9ab43ea1fe4c62f650fdd2b727e271b77d71

Backport failed for release/3.4. Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin release/3.4
git worktree add -d .worktree/backport/release/3.4/11858 origin/release/3.4
cd .worktree/backport/release/3.4/11858
git switch --create backport/release/3.4/11858
git cherry-pick -x 0d329244c74693a0ae6ed475b3ac7cca5fa9bafb 07770c7033aec3eab83f2ea604df614fdf9f5008 0773c30b93b2c9a674fdbf0e4f9484bc99f90860 a1dc2019e656abdc2fe3c99b8f5502eaf90b83c5 65882120dce9b100a589b7d0aeda3ca549df47e3 a09e4710a36bc4f53fe8a746bee2a419b757b0b2 7a587df06042bbad4d5a1350abb6984b1521c912 0a4aeb3b2ec3872a63adbbde2de37a93abaaa1ff ca7ad72765a191bca8a23c7903694e626e3f52b0 c9acf8a2ad173e831660bb3486a5d337679d5268 3fcf9ab43ea1fe4c62f650fdd2b727e271b77d71

Backport failed for release/3.5. Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin release/3.5
git worktree add -d .worktree/backport/release/3.5/11858 origin/release/3.5
cd .worktree/backport/release/3.5/11858
git switch --create backport/release/3.5/11858
git cherry-pick -x 0d329244c74693a0ae6ed475b3ac7cca5fa9bafb 07770c7033aec3eab83f2ea604df614fdf9f5008 0773c30b93b2c9a674fdbf0e4f9484bc99f90860 a1dc2019e656abdc2fe3c99b8f5502eaf90b83c5 65882120dce9b100a589b7d0aeda3ca549df47e3 a09e4710a36bc4f53fe8a746bee2a419b757b0b2 7a587df06042bbad4d5a1350abb6984b1521c912 0a4aeb3b2ec3872a63adbbde2de37a93abaaa1ff ca7ad72765a191bca8a23c7903694e626e3f52b0 c9acf8a2ad173e831660bb3486a5d337679d5268 3fcf9ab43ea1fe4c62f650fdd2b727e271b77d71

See #11754.