
consolidate: disable vfull duplicate job check #1739

Merged
BareosBot merged 9 commits into bareos:master from SamuelBoerlin:consolidate-ignoreduplicatecheck
Jul 23, 2024

Conversation

@SamuelBoerlin
Contributor

SamuelBoerlin commented Mar 18, 2024

Thank you for contributing to the Bareos Project!

This PR sets IgnoreDuplicateJobChecking = true for consolidate vfulls (i.e. just like migration/copy jobs).
Currently, consolidate vfulls also take part in the "Allow Duplicate Job" logic, which can end up causing other jobs to be cancelled. In my opinion, consolidation should not affect other jobs like that, and I can't currently think of a case where you would want it to.
I ran into this problem when my incremental jobs were being cancelled due to a long-running consolidation job: https://groups.google.com/g/bareos-users/c/iqx4JSjSBxE

Please check

  • Short description and the purpose of this PR is present above this paragraph
  • Your name is present in the AUTHORS file (optional)

If you have any questions or problems, please give a comment in the PR.

Helpful documentation and best practices

Checklist for the reviewer of the PR (will be processed by the Bareos team)

Make sure you check/merge the PR using devtools/pr-tool to have some simple automated checks run and a proper changelog record added.

General
  • Is the PR title usable as CHANGELOG entry?
  • Purpose of the PR is understood
  • Commit descriptions are understandable and well formatted
  • Check backport line
  • Required backport PRs have been created
Source code quality
  • Source code changes are understandable
  • Variable and function names are meaningful
  • Code comments are correct (logically and spelling)
  • Required documentation changes are present and part of the PR
Tests
  • Decision taken that a test is required (if not, then remove this paragraph)
  • The choice of the type of test (unit test or systemtest) is reasonable
  • Testname matches exactly what is being tested
  • On a fail, output of the test leads quickly to the origin of the fault

@sebsura
Contributor

sebsura commented Mar 28, 2024

Are you still working on this PR, as it's labeled as draft?

@SamuelBoerlin
Contributor Author

> Are you still working on this PR, as it's labeled as draft?

Yes. I wanted to add systemtests to ensure that ignoreduplicatecheck is set for consolidate/migrate/copy jobs and works as intended, but haven't had the time yet.

@SamuelBoerlin SamuelBoerlin marked this pull request as ready for review April 10, 2024 13:55
@SamuelBoerlin
Contributor Author

Hi @sebsura, the PR would now be ready for review whenever you have time.

@sebsura sebsura removed the draft label Apr 15, 2024
@sebsura
Contributor

sebsura commented Apr 23, 2024

While this fixes the problem that you cannot start a virtual full job while another normal job is running, this does not fix the reverse: if you have a virtual full running, you still cannot start a normal job.

I think the best approach to fix the second issue is to ignore jobs that ignore duplicates when checking for duplicates.

Do you want to try to fix this?

@SamuelBoerlin
Contributor Author

> While this fixes the problem that you cannot start a virtual full job while another normal job is running, this does not fix the reverse: if you have a virtual full running, you still cannot start a normal job.
>
> I think the best approach to fix the second issue is to ignore jobs that ignore duplicates when checking for duplicates.
>
> Do you want to try to fix this?

Thanks for taking a look!

Hm, not quite sure I follow. From my understanding, IgnoreDuplicateJobChecking already goes both ways, no?
If it is set, then the job is ignored during the duplicate checks: https://github.com/bareos/bareos/blob/master/core/src/dird/job.cc#L885-L904

What you're describing is what I'm testing in the added system tests: a consolidate job is started, and then a normal job is started while the consolidate virtual full is still running. Without this change, the normal job would usually get cancelled. With this change, it is no longer cancelled.
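
To make the "goes both ways" reading concrete, here is a minimal, self-contained sketch. It is not the code in job.cc: `JobRecord`, `AllowedToRun`, and the field names are hypothetical stand-ins, and the sketch simply rejects a duplicate instead of applying the Cancel ... Duplicates directives.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a director job record.
// This is NOT the Bareos JobControlRecord.
struct JobRecord {
  int job_id;
  std::string job_name;
  bool ignore_duplicate_job_checking;
};

// Returns true if `candidate` may run while the jobs in `running` are active.
// Assumption: two jobs are "duplicates" when they share the same job name.
bool AllowedToRun(const JobRecord& candidate,
                  const std::vector<JobRecord>& running)
{
  // Direction 1: a job that opts out of the check is never rejected.
  if (candidate.ignore_duplicate_job_checking) { return true; }

  for (const JobRecord& other : running) {
    if (other.job_id == candidate.job_id) { continue; }
    // Direction 2: a running job that opts out never counts as a duplicate.
    if (other.ignore_duplicate_job_checking) { continue; }
    if (other.job_name == candidate.job_name) { return false; }
  }
  return true;
}
```

Under these assumptions, a normal job started while only a flagged virtual full with the same name is running is allowed, and a flagged job is allowed regardless of what else is running.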

@SamuelBoerlin
Contributor Author

Just realized that this change might actually cause an issue if you have consolidate jobs that run for a long time. Currently, if a consolidate VF is still running and a new duplicate consolidation VF is started, it would be cancelled if you have something like this:

  Allow Duplicate Jobs = no
  Cancel Lower Level Duplicates = yes
  Cancel Queued Duplicates = no
  Cancel Running Duplicates = no

After this change, this would of course no longer be the case, because duplicate job checking is disabled. I guess duplicate consolidations could still be mitigated by setting MaxConcurrentJobs = 1 in the Consolidate job.

It would probably be better to prevent duplicate/conflicting consolidation jobs in the first place, though. What do you think?
We could perhaps check at this point here https://github.com/bareos/bareos/blob/master/core/src/dird/consolidate.cc#L276 whether there is already another always-incremental VF job running with overlapping vf_jobids.
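
A rough sketch of what such an overlap test could look like, assuming the job ids are available as comma-separated strings such as "12,13,15". `ParseJobIds` and `JobIdsOverlap` are hypothetical helpers, not existing Bareos functions, and the sketch ignores how the ids of another running job would be read safely.

```cpp
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper: parses a comma-separated job id list into a set.
// Assumption: the list is well formed; std::stol throws on non-numeric tokens.
std::set<long> ParseJobIds(const std::string& list)
{
  std::set<long> ids;
  std::stringstream stream(list);
  std::string token;
  while (std::getline(stream, token, ',')) {
    if (token.empty()) { continue; }  // tolerate "1,,2" and trailing commas
    ids.insert(std::stol(token));
  }
  return ids;
}

// Returns true if the two job id lists have at least one id in common.
bool JobIdsOverlap(const std::string& a, const std::string& b)
{
  const std::set<long> ids_a = ParseJobIds(a);
  for (long id : ParseJobIds(b)) {
    if (ids_a.count(id) > 0) { return true; }
  }
  return false;
}
```

An empty list never overlaps with anything in this sketch, so a job without a known job id list would need separate handling.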

@sebsura
Contributor

sebsura commented Apr 24, 2024

> Hm, not quite sure I follow. From my understanding IgnoreDuplicateJobChecking already goes both ways, no?

You are right!

> We could perhaps check at this point here https://github.com/bareos/bareos/blob/master/core/src/dird/consolidate.cc#L276 whether there is already another always-incremental VF job running with overlapping vf_jobids.

I'll have to think about that for a bit. The code is currently not set up in a way where you can inspect another job's vf_jobids: there is no lock protecting this member, and the check might fail if vf_jobids is null (which is a special case, see GetVfJobids() in vbackup.cc) or if two of these jobs are started at almost the same time.

@sebsura
Contributor

sebsura commented Apr 29, 2024

I would suggest that we check inside AllowDuplicateJob() whether there are any other always incremental jobs that "run" on the same client and have the same fileset (dir_impl->res.client/fileset).
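
As a sketch of that suggestion only: the check below compares client and fileset names of running consolidation jobs. `ConsolidationJob` and `HasConflictingConsolidation` are hypothetical names chosen for illustration. The real check would live in AllowDuplicateJob() and use the resources under dir_impl->res.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical, simplified view of a running always-incremental
// consolidation job. Not a Bareos type.
struct ConsolidationJob {
  int job_id;
  std::string client;
  std::string fileset;
};

// Returns true if another running consolidation job targets the same
// client and fileset as `candidate`.
bool HasConflictingConsolidation(const ConsolidationJob& candidate,
                                 const std::vector<ConsolidationJob>& running)
{
  return std::any_of(running.begin(), running.end(),
                     [&candidate](const ConsolidationJob& other) {
                       return other.job_id != candidate.job_id
                              && other.client == candidate.client
                              && other.fileset == candidate.fileset;
                     });
}
```

Comparing client and fileset avoids having to read another job's vf_jobids, at the cost of also rejecting consolidations on the same client and fileset that would not actually overlap.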

@SamuelBoerlin
Contributor Author

I've now also added a systemtest for the duplicate consolidation job cancellation.

@SamuelBoerlin
Contributor Author

Ah whoops, only just now saw that the always-incremental-consolidate tests have been rewritten. I'll rebase the changes.

@SamuelBoerlin SamuelBoerlin force-pushed the consolidate-ignoreduplicatecheck branch from 9aba2a4 to 5f2b9c1 Compare May 21, 2024 10:03
@sebsura
Contributor

sebsura commented May 23, 2024

The changes look good. I split up your test as well. There is one thing I'm currently looking into, and afterwards I'll push my changes.

@sebsura sebsura force-pushed the consolidate-ignoreduplicatecheck branch from 3207b4c to facd0bc Compare May 23, 2024 12:06
@sebsura
Contributor

sebsura commented May 23, 2024

I squashed the fixup commits and fixed the copyright year on your new test (it was 2021-2024 before).
Let me know what you think of the split up test. Otherwise I would be happy to get this merged.

@SamuelBoerlin
Contributor Author

Looks good to me, thanks!

@SamuelBoerlin
Contributor Author

Hi! Are there problems with the tests? Or does the PR need some more changes?

@sebsura
Contributor

sebsura commented Jun 14, 2024

The PR is fine. Some weird errors popped up during the testing run. I am still trying to reproduce the issue.

@sebsura sebsura force-pushed the consolidate-ignoreduplicatecheck branch from 11728aa to 95d51b0 Compare June 20, 2024 04:37
@sebsura
Contributor

sebsura commented Jun 20, 2024

Just to give you a small update: your tests found another bug that should be fixed by #1859. As soon as that PR is merged, this should get merged as well.

@arogge arogge added this to the 24.0.0 milestone Jun 25, 2024
@arogge arogge force-pushed the consolidate-ignoreduplicatecheck branch from 4cf97b8 to 3c1a189 Compare July 19, 2024 17:16
@BareosBot BareosBot force-pushed the consolidate-ignoreduplicatecheck branch from da6a67d to 768cf2e Compare July 23, 2024 10:44
@BareosBot BareosBot merged commit 14afd0b into bareos:master Jul 23, 2024
4 participants