vadp-dumper: fix multithreaded backup/restore issues #1623

Merged
BareosBot merged 28 commits into bareos:bareos-23 from
sebsura:dev/ssura/bareos-23/fix-vadp-dumper
Dec 13, 2023
Conversation

@sebsura
Contributor

@sebsura sebsura commented Dec 11, 2023

Thank you for contributing to the Bareos Project!

Backport of PR #1593 to bareos-23

Please check

  • Short description and the purpose of this PR is present above this paragraph
  • Your name is present in the AUTHORS file (optional)

If you have any questions or problems, please leave a comment in the PR.

Helpful documentation and best practices

Checklist for the reviewer of the PR (will be processed by the Bareos team)

Make sure you check/merge the PR using devtools/pr-tool to have some simple automated checks run and a proper changelog record added.

General
  • Is the PR title usable as CHANGELOG entry?
  • Purpose of the PR is understood
  • Commit descriptions are understandable and well formatted
  • Check backport line
  • Required backport PRs have been created
Source code quality
  • Source code changes are understandable
  • Variable and function names are meaningful
  • Code comments are correct (logically and spelling)
  • Required documentation changes are present and part of the PR

Member

@pstorz pstorz left a comment


Please see comments and let's talk.

@pstorz pstorz force-pushed the dev/ssura/bareos-23/fix-vadp-dumper branch from 7a8ba6d to 0b9c714 Compare December 12, 2023 19:38
@sebsura
Contributor Author

sebsura commented Dec 13, 2023

I think it's best if we continue the conversation on the main pull request instead of the backport (#1593)

@sebsura sebsura force-pushed the dev/ssura/bareos-23/fix-vadp-dumper branch from 0b9c714 to 1302afe Compare December 13, 2023 11:43
Neither write nor read guarantee that they will finish everything in
one call.  One has to repeatedly call them in a loop to ensure that
everything gets written/read.
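The retry loop described above can be sketched as follows. This is a minimal illustration of the pattern, not the actual patch; the helper name `WriteAll` is hypothetical, and it assumes POSIX `write(2)` semantics (partial writes and `EINTR` interruptions are both possible).

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: keep calling write(2) until the whole buffer has
// been written or an unrecoverable error occurs.  Returns true on success.
bool WriteAll(int fd, const char* data, std::size_t size) {
  std::size_t written = 0;
  while (written < size) {
    ssize_t n = write(fd, data + written, size - written);
    if (n < 0) {
      if (errno == EINTR) continue;  // interrupted by a signal: retry
      return false;                  // real I/O error
    }
    written += static_cast<std::size_t>(n);
  }
  return true;
}
```

The read side follows the same shape, additionally treating a return of 0 (end of file) as a stop condition.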
This code path is only taken if multithreaded restore is selected.
Since the input size is variable, we need to ensure that the buffer
has enough capacity to actually hold the data we read into it.
Otherwise, if we get unlucky, we run into the situation where the
first few CBTs are very small and the rest are really big, causing a
buffer overflow and possibly a crash.
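The fix amounts to sizing the buffer against the actual extent size at runtime instead of trusting the size of the first extents seen. A minimal sketch of the idea (the function name and signature are hypothetical, not the patch's code):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch: grow the buffer before copying an extent of arbitrary size
// into it, instead of assuming a capacity derived from earlier extents.
void ReadIntoBuffer(std::vector<char>& buffer, const char* src,
                    std::size_t size) {
  if (buffer.size() < size) {
    buffer.resize(size);  // ensure enough room for *this* extent
  }
  std::copy(src, src + size, buffer.begin());
}
```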
We need to ensure that the writer does not write into the block we are
currently reading.  Currently we only distinguish between data that is
inside the queue (which can neither be written to nor read from) and
data that is outside the queue (which can _both_ be written to and read
from).
To fix this data race we split the data into three categories instead:
- The head of the queue (which can only be read from)
- The rest of the queue (which can neither be read from nor written to)
- The data outside the queue (which can only be written to)
To accomplish this we split the normal dequeue() function into two
parts.  The normal dequeue() becomes peek-like, only returning the
head element, and the new function FinishDeq() does the actual
dequeuing.
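A simplified sketch of the split-dequeue idea, with the method names modelled on the commit message but the surrounding class entirely hypothetical: `dequeue()` only peeks at the head, so the slot stays owned by the reader until `FinishDeq()` removes it and makes the slot writable again.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>

// Sketch of a queue with the three ownership states described above:
// head (reader-owned), rest of queue (nobody), outside queue (writer).
template <typename T>
class PeekQueue {
 public:
  void enqueue(T value) {
    std::lock_guard<std::mutex> l(m_);
    q_.push_back(std::move(value));
    cv_.notify_one();
  }
  // Wait for data and return a copy of the head WITHOUT removing it:
  // the slot remains off-limits to the writer while the reader uses it.
  T dequeue() {
    std::unique_lock<std::mutex> l(m_);
    cv_.wait(l, [this] { return !q_.empty(); });
    return q_.front();
  }
  // Called once the reader is done with the head; only now does the
  // slot become writable again.
  void FinishDeq() {
    std::lock_guard<std::mutex> l(m_);
    q_.pop_front();
  }

 private:
  std::mutex m_;
  std::condition_variable cv_;
  std::deque<T> q_;
};
```

In a ring-buffer variant (closer to the real code), `FinishDeq()` is what advances the read index, so the writer's wrap-around check naturally excludes the head element.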
The general gist before was this:

* copy thread:
- Wait for start
- Wait for dequeue
  - if data: handle data; loop
  - if no data: break inner (this only happens on flush)
- `signal` that the queue was flushed and that the thread is
  waiting for work
- goto begin
* main thread
- do some setup
- start enqueueing data; signal start to thread if not started
  (this was done with an unsynchronized read!)
- flush when done
- During cleanup, cancel the thread while it is waiting on a
  start signal.
with some locks & unlocks sprinkled in.  One should note that the
copy thread's cleanup callback always unlocks the mutex, regardless of
whether the copy thread actually locked it or not!

This was slightly changed:
* Clearly there is no need to wait for "start" at all, since we wait on
  the dequeue anyways -- so this was removed.
* Instead of just canceling the thread, we set a flag that tells the
  copy thread to exit, which the thread checks after every flush.  As
  such we also make sure to flush the queue on cleanup.
* We also now properly initialize all thread_context members!
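The cooperative-shutdown change can be sketched like this. The struct and function names are hypothetical (the real `thread_context` holds the queue, buffers, and synchronization state); the point is that the main thread requests exit via a flag rather than `pthread_cancel`, and every member gets an initializer.

```cpp
#include <atomic>
#include <cassert>

// Sketch of the revised protocol: all members are initialized at the
// declaration, so no field is ever read uninitialized.
struct ThreadContext {
  std::atomic<bool> quit_requested{false};
  // ... queue, buffers, etc. live here in the real code ...
};

void CopyThread(ThreadContext& ctx) {
  for (;;) {
    // ... dequeue and handle data until the queue is flushed ...
    if (ctx.quit_requested.load()) break;  // checked after every flush
  }
}
```

Compared with cancellation, this guarantees the thread only ever stops at a well-defined point (a flush boundary), so no queue element is abandoned half-processed and no mutex is left in an ambiguous state.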
Since sectors_per_call is not actually bounded by SECTORS_PER_CALL, we
need to ensure that the buffer has enough space at runtime.
This function returns -- up to a certain amount of precision -- a list
of intervals of all sectors containing useful (i.e. allocated) data.

For full backups it is recommended to back up only the allocated
blocks; for other kinds of backups (incremental/differential) it is
only necessary to back up those sectors that are both changed &
allocated.

Since we currently cannot take advantage of information regarding
unallocated, changed blocks, we just ignore them.  In the future these
could be used for faster restores & consolidation.

The algorithm used for finding out the intersection is trivial as both
lists are sorted (and the intervals themselves are disjoint in each
set).  As such it is enough to just go through both arrays linearly at
the same time, sending the pairwise intersection if any, and finally
advancing the pointer to the "smaller" interval.
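The linear merge described above can be sketched as follows (names and the half-open `[start, end)` representation are assumptions, not the patch's types). Because both lists are sorted and internally disjoint, one simultaneous pass over them is enough.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Interval = std::pair<long, long>;  // [start, end) in sectors

// Intersect two sorted lists of disjoint intervals in one linear pass.
std::vector<Interval> Intersect(const std::vector<Interval>& a,
                                const std::vector<Interval>& b) {
  std::vector<Interval> out;
  std::size_t i = 0, j = 0;
  while (i < a.size() && j < b.size()) {
    long lo = std::max(a[i].first, b[j].first);
    long hi = std::min(a[i].second, b[j].second);
    if (lo < hi) out.emplace_back(lo, hi);  // pairwise intersection, if any
    // Advance past the interval that ends first; it cannot intersect
    // anything further in the other list.
    if (a[i].second < b[j].second) ++i; else ++j;
  }
  return out;
}
```

With `changed = {[0,10), [20,30)}` and `allocated = {[5,25)}`, this yields `{[5,10), [20,25)}`: exactly the changed-and-allocated sectors.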
Since we cannot meaningfully recover from a mutex error, we should
just assert that they do not happen.
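A minimal sketch of the assert-on-mutex-error idea (the wrapper name is hypothetical): `pthread_mutex_lock` returning nonzero indicates a programming error such as a deadlock or an invalid mutex, so there is nothing sensible to do but abort.

```cpp
#include <cassert>
#include <pthread.h>

// Sketch: since we cannot meaningfully recover from a mutex error,
// assert success instead of propagating an error code nobody can handle.
void LockMutex(pthread_mutex_t* m) {
  int rc = pthread_mutex_lock(m);
  assert(rc == 0);
  (void)rc;  // silence unused-variable warnings in NDEBUG builds
}
```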
@sebsura sebsura force-pushed the dev/ssura/bareos-23/fix-vadp-dumper branch from 1302afe to 7c6ba5b Compare December 13, 2023 13:19
@pstorz pstorz self-requested a review December 13, 2023 17:25
@BareosBot BareosBot merged commit 0f6e6ee into bareos:bareos-23 Dec 13, 2023