
stored: fix incoherent meta data when concurrently writing to the same volume #1495

Merged
BareosBot merged 19 commits into bareos:master from sebsura:dev/ssura/master/fix-parallel-jobs
Aug 1, 2023



@sebsura sebsura commented Jun 28, 2023

Thank you for contributing to the Bareos Project!

Currently, letting multiple jobs write to a device in parallel causes the Bareos database to go out of sync with reality, which in the worst case leads to data loss. This PR improves the communication between the storage daemon (sd) and the director so that this does not happen.

We do this by

  • creating "null" jobmedia records whenever a job writes to a volume for the first time, and
  • always sending the last jobmedia record to the director, even if the volume we wrote to is not the current volume.

This should stop the director from recycling volumes that are actually in use, and should ensure that the storage daemon sends all jobmedia records.

We also slim down the jobmedia table somewhat by deleting all null jobmedia records whenever a job finishes.
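The three points above can be illustrated with a small toy model. This is only a sketch and not the Bareos code: `JobMedia`, `Catalog`, `StorageDaemon`, the `is_null` flag, and the block numbers are all hypothetical names invented for this example, and it assumes that cleanup removes only the finished job's null records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobMedia:
    """Toy jobmedia row (hypothetical layout, not the real schema)."""
    job_id: int
    volume: str
    first_block: int = 0
    last_block: int = 0
    is_null: bool = False  # placeholder: "this job has touched this volume"


class Catalog:
    """Stands in for the director's view of the jobmedia table."""

    def __init__(self):
        self.jobmedia = []

    def add(self, record):
        self.jobmedia.append(record)

    def volume_in_use(self, volume):
        # Any record, null or not, marks the volume as used.
        return any(r.volume == volume for r in self.jobmedia)

    def finish_job(self, job_id):
        # Assumption: only the finished job's null records are dropped.
        self.jobmedia = [
            r for r in self.jobmedia
            if not (r.is_null and r.job_id == job_id)
        ]


class StorageDaemon:
    """Stands in for the sd side of the sd/director conversation."""

    def __init__(self, catalog):
        self.catalog = catalog
        self.current_volume = None
        self._announced = set()  # (job_id, volume) pairs already reported
        self._pending = {}       # (job_id, volume) -> record not yet sent

    def write(self, job_id, volume, first_block, last_block):
        self.current_volume = volume
        key = (job_id, volume)
        if key not in self._announced:
            # First write of this job to this volume: report a null record.
            self._announced.add(key)
            self.catalog.add(JobMedia(job_id, volume, is_null=True))
        self._pending[key] = JobMedia(job_id, volume, first_block, last_block)

    def end_job(self, job_id):
        # Send every outstanding record of the job, even when its volume
        # is no longer the current volume of the device.
        for key in [k for k in self._pending if k[0] == job_id]:
            self.catalog.add(self._pending.pop(key))
        self.catalog.finish_job(job_id)


def demo():
    catalog = Catalog()
    sd = StorageDaemon(catalog)
    sd.write(1, "Vol-1", 0, 99)
    sd.write(2, "Vol-1", 100, 199)
    in_use_early = catalog.volume_in_use("Vol-1")  # only null records so far
    sd.write(2, "Vol-2", 0, 49)  # the device has moved on to Vol-2
    sd.end_job(1)                # job 1 last wrote to Vol-1
    return in_use_early, list(catalog.jobmedia)
```

In this model, `Vol-1` counts as used as soon as the first job writes to it, job 1's final record for `Vol-1` still reaches the catalog after the device has switched to `Vol-2`, and job 1's null record is gone once the job has finished.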

Please check

  • Short description and the purpose of this PR is present above this paragraph
  • Your name is present in the AUTHORS file (optional)

If you have any questions or problems, please give a comment in the PR.

Helpful documentation and best practices

Checklist for the reviewer of the PR (will be processed by the Bareos team)

Make sure you check/merge the PR using devtools/pr-tool to have some simple automated checks run and a proper changelog record added.

General
  • Is the PR title usable as CHANGELOG entry?
  • Purpose of the PR is understood
  • Commit descriptions are understandable and well formatted
  • Check backport line
  • Required backport PRs have been created
Source code quality
  • Source code changes are understandable
  • Variable and function names are meaningful
  • Code comments are correct (logically and spelling)
Tests
  • Decision taken that a test is required (if not, then remove this paragraph)
  • The choice of the type of test (unit test or systemtest) is reasonable
  • Testname matches exactly what is being tested
  • On a fail, output of the test leads quickly to the origin of the fault

@sebsura sebsura force-pushed the dev/ssura/master/fix-parallel-jobs branch 3 times, most recently from 190ef05 to 956ee25 June 28, 2023 09:52
@pstorz pstorz requested a review from alaaeddineelamri June 29, 2023 10:23
@sebsura sebsura added the bugfix label Jul 3, 2023

@alaaeddineelamri alaaeddineelamri left a comment


You could make the PR title a bit more explicit about what is being fixed.
Maybe fix parallel jobs incoherent meta data when concurrently writing to the same volume, for example?

@sebsura sebsura changed the title from "stored: fix parallel jobs" to "stored: fix incoherent meta data when concurrently writing to the same volume" Jul 10, 2023
@sebsura sebsura requested a review from alaaeddineelamri July 10, 2023 09:53
@BareosBot BareosBot force-pushed the dev/ssura/master/fix-parallel-jobs branch from 544c88e to a9a0351 August 1, 2023 13:31
@BareosBot BareosBot merged commit 136248f into bareos:master Aug 1, 2023