vadp-dumper: fix multithreaded backup/restore issues#1623
Merged
BareosBot merged 28 commits intobareos:bareos-23from Dec 13, 2023
Merged
vadp-dumper: fix multithreaded backup/restore issues#1623BareosBot merged 28 commits intobareos:bareos-23from
BareosBot merged 28 commits intobareos:bareos-23from
Conversation
pstorz
requested changes
Dec 12, 2023
Member
pstorz
left a comment
There was a problem hiding this comment.
Please see comments and let's talk.
7a8ba6d to
0b9c714
Compare
Contributor
Author
|
I think its best if we continue the conversation on the main pull request instead of the backport ( #1593 ) |
0b9c714 to
1302afe
Compare
Neither write nor read guarantee that they will finish everything in one call. One has to repeatedly call them in a loop to ensure that everything gets written/read.
This code path is only taken if multithreaded restore is selected.
Since the input size is a variable, we need to ensure that we hold enough capacity to actually read the data into the buffer. Otherwise, if we get unlucky, we will run into the situation were the first few cbts are very small, and the rest are really big, ensuring a buffer overflow; possibly causing a crash.
We need to ensure that the writer does not write into the block we are currently reading. Currently we only distinguish between data thats inside the queue (which can neither be written to nor read from) and data thats outside the queue (which can _both_ be written to and read from). To fix this data race we split the data into three categories instead: - The head of the queue (which can only be read from) - The rest of the queue (which can neither be read from nor written to) - The data outside the queue (which can only be written to) To accomplish this we split the normal dequeue() function into two parts. The normal dequeue() becomes peek-like, only returning the head element, and the new function FinishDeq() does the actual dequeuing.
The general gist before was this: * copy thread: - Wait for start - Wait for dequeue - if data: handle data; loop - if no data: break inner (this only happens on flush) - `signal` that queue was fushed and that the thread is waiting for work - goto begin * main thread - do some setup - start enqueueing data; signal start to thread if not started (this was done with an unsynchronized read!) - flush when done - During cleanup, cancel the thread while it is waiting on a start signal. with some locks & unlocks sprinkled in. One should note that the cleanup copy thread callback always unlockes the lock regardless of whether the copy thread locked the mutex or not! This was slightly changed: * Clearly there is no need to wait for "start" at all, since we wait on the dequeue anyways -- so this was removed. * Instead of just canceling the thread, we set a flag that tells the Copythread to exit, which the thread checks after every flush. As such we also make sure flush the queue on cleanup. * We also now properly initialize all thread_context members!
Since sectors_per_call is not actually bounded by SECTORS_PER_CALL, we need to ensure that the buffer has enough space at runtime.
This function returns -- up to a certain amount of precision -- a list of intervals of all sectors containing useful (i.e. allocated) data. For full backups its recommended to backup only the allocated blocks; for other kinds of backups (incremental/differential) it is only necessary to backup those sectors that are both changed & allocated. Since we currently cannot take advantage of information regarding unallocated, changed blocks, we just ignore them. In the future these could be used for faster restores & consolidation. The algorithm used for finding out the intersection is trivial as both lists are sorted (and the intervals themselves are disjoint in each set). As such it is enough to just go through both arrays linearly at the same time, sending the pairwise intersection if any, and finally advancing the pointer to the "smaller" interval.
Since we cannot meaningfully recover from a mutex error, we should just assert that they do not happen.
1302afe to
7c6ba5b
Compare
pstorz
approved these changes
Dec 13, 2023
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Thank you for contributing to the Bareos Project!
Backport of PR #1593 to bareos-23
Please check
If you have any questions or problems, please give a comment in the PR.
Helpful documentation and best practices
Checklist for the reviewer of the PR (will be processed by the Bareos team)
Make sure you check/merge the PR using
devtools/pr-toolto have some simple automated checks run and a proper changelog record added.General
Source code quality