Skip to content

consolidate: make virtualfull jobs spawned by consolidate job inherit same priority and max concurrent jobs#1530

Merged
BareosBot merged 22 commits intobareos:masterfrom
SamuelBoerlin:consolidate-job-priority
Oct 19, 2023
Merged

consolidate: make virtualfull jobs spawned by consolidate job inherit same priority and max concurrent jobs#1530
BareosBot merged 22 commits intobareos:masterfrom
SamuelBoerlin:consolidate-job-priority

Conversation

@SamuelBoerlin
Copy link
Contributor

@SamuelBoerlin SamuelBoerlin commented Aug 16, 2023

Thank you for contributing to the Bareos Project!

This PR makes virtualfull jobs spawned by a consolidate job inherit the consolidate job's priority and allow mixed priority flag.
This change puts it more in line with e.g. migration/copy jobs which already behave this way.
It also allows users to give consolidate jobs a lower priority than the original jobs that are being consolidated, so that they don't block more important jobs (i.e. jobs with a higher priority than the consolidate jobs, but still lower priority than the original jobs being consolidated).

Example use case:
I've configured consolidate jobs to have the same priority as migrate/copy jobs and all of those have a lower priority than regular always incremental backups. This is because the consolidate/migrate/copy jobs can end up running for a very long time and they shouldn't block regular backups from happening. The issue with this is, that when a consolidation is started, it spaws virtualfull jobs with the same priority as the always incremental backups, which then end up blocking my migrate/copy jobs. This issue could be solved with this PR.

Please check

  • Short description and the purpose of this PR is present above this paragraph
  • Your name is present in the AUTHORS file (optional)

If you have any questions or problems, please give a comment in the PR.

Helpful documentation and best practices

Checklist for the reviewer of the PR (will be processed by the Bareos team)

Make sure you check/merge the PR using devtools/pr-tool to have some simple automated checks run and a proper changelog record added.

General
  • Is the PR title usable as CHANGELOG entry?
  • Purpose of the PR is understood
  • Commit descriptions are understandable and well formatted
  • Check backport line
  • Required backport PRs have been created
Source code quality
  • Source code changes are understandable
  • Variable and function names are meaningful
  • Code comments are correct (logically and spelling)
  • Required documentation changes are present and part of the PR
Tests
  • Decision taken that a test is required (if not, then remove this paragraph)
  • The choice of the type of test (unit test or systemtest) is reasonable
  • Testname matches exactly what is being tested
  • On a fail, output of the test leads quickly to the origin of the fault

@SamuelBoerlin SamuelBoerlin marked this pull request as ready for review August 16, 2023 17:02
@bruno-at-bareos bruno-at-bareos requested a review from arogge August 17, 2023 09:32
@bruno-at-bareos
Copy link
Contributor

Hello @SamuelBoerlin
We detected a case where extending the way consolidate handle the parameter "Spool Data = yes/no" would be useful and maybe not that hard to be included into this PR.

For example if you define a job with "SpoolData=Yes" when the consolidation run the virtual full, this parameter is always applied. If it can make sense on some setup (like consolidating a lots of small incrementals), some other setup with large full jobs maybe want to skip spooling.
So we would like to see consolidate job able to handle and propagate the "SpoolData=Yes|No" parameter so end-users will be able to override this.

@SamuelBoerlin
Copy link
Contributor Author

Hi @bruno-at-bareos
Indeed, that could also be useful. I've pushed another commit for spooldata.

@bruno-at-bareos
Copy link
Contributor

Hi @bruno-at-bareos Indeed, that could also be useful. I've pushed another commit for spooldata.

Great thanks. I know some people will be happy.

@SamuelBoerlin
Copy link
Contributor Author

I've also been experimenting with adding support for MaxConcurrentJobs now (master...SamuelBoerlin:bareos:consolidate-max-concurrent-jobs), which currently cannot be configured for consolidate jobs either. Not yet tested, but if it works I might add it to this PR too.

@SamuelBoerlin SamuelBoerlin force-pushed the consolidate-job-priority branch from c260ce7 to b779be0 Compare September 16, 2023 19:55
@arogge
Copy link
Member

arogge commented Sep 18, 2023

Did I see that correctly? The recently pushed changes are for the maximum concurrent jobs change?
Would you consider this being ready for a review?

@SamuelBoerlin
Copy link
Contributor Author

Yes please, thanks!

Copy link
Member

@arogge arogge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty good already.
I'm a bit unhappy wit the allow_mixed_priority and allow_mixed_priority_set, as it is hard to grasp what it is supposed to do at first.
I'm pretty confident we can make this work and merge it so it still goes into Bareos 23.

directordaemon::PoolResource* next_pool{}; /**< Next Pool used for migration/copy and virtual backup */
directordaemon::FilesetResource* fileset{}; /**< FileSet resource */
directordaemon::CatalogResource* catalog{}; /**< Catalog resource */
directordaemon::runtime_job_status_t* rjs{}; /**< Runtime Job Status. May point to the rjs of another resource (e.g. for consolidation vf jobs this points to the rjs of the parent consolidation job's resource) */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is the right place for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used to have it in the DirectorJcrImpl at first, but then moved it to the Resources struct because I thought it would be a better fit as the rjs is owned by a JobResource.
Should I move it back to DirectorJcrImpl?

Comment on lines +285 to +286
bool allow_mixed_priority = false;
bool allow_mixed_priority_set = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if I understand correctly then these two bools mimics a tri-state enum like this:

enum class allow_mixed_priority : int {
   unset = 0,
   enable,
   disable
}

or maybe more reusable:

enum class maybe_bool : int {
   unset = 0,
   enable,
   disable
}

or maybe more in the mindset of modern C++ a std::optional<bool> could be sufficient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good point.
However, it just occurred to me that the allowmixedpriority UA run command parameter (and therefore this code too) isn't even needed because I can just grab it later on from the rc.consolidate_job resource which simplifies things.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two bools have now been removed.

char* since = nullptr;
char* StoreName = nullptr;
char* verify_job_name = nullptr;
char* consolidate_job_name = nullptr;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you really need the name and the pointer to the resource stored here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, not really, it's just how verify_job and previous_job e.g. work too and so I followed that. But I can use a local variable instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name is now removed.

@SamuelBoerlin SamuelBoerlin changed the title consolidate: make virtualfull jobs spawned by consolidate job inherit same priority consolidate: make virtualfull jobs spawned by consolidate job inherit same priority and max concurrent jobs Sep 26, 2023
Copy link
Member

@arogge arogge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thank you for the effort!

Copy link
Member

@arogge arogge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I'm pretty happy with your changes, I'll have to request changes again. Sadly, in continuous integration your test fails at least sometimes. For the last build it failed on 9 out of 14 platforms. The requirement for a merge is to pass on all platforms.
I'm not sure what exactly went wrong and as I'll be out-of-office next week I won't have much time to look into it.
I would be glad if you could get in touch with me via e-mail (andreas.rogge@bareos.com) so we can discuss how we can provide access to the CI server's logs for you, so you can at least try to sort out the issue yourself.

@SamuelBoerlin
Copy link
Contributor Author

While I'm pretty happy with your changes, I'll have to request changes again. Sadly, in continuous integration your test fails at least sometimes. For the last build it failed on 9 out of 14 platforms. The requirement for a merge is to pass on all platforms. I'm not sure what exactly went wrong and as I'll be out-of-office next week I won't have much time to look into it. I would be glad if you could get in touch with me via e-mail (andreas.rogge@bareos.com) so we can discuss how we can provide access to the CI server's logs for you, so you can at least try to sort out the issue yourself.

I'll look into it. Yes, would be great if I could take a look at the test logs. Sent you an email.

@SamuelBoerlin
Copy link
Contributor Author

@arogge Thanks for sending me the logs. I couldn't figure out the actual cause yet, but for now I've just increased one of the job's wait time from 5s to 15s.

@SamuelBoerlin
Copy link
Contributor Author

SamuelBoerlin commented Oct 9, 2023

To elaborate on why I have a job that waits 15s: my test starts a priority 20 consolidate job, which then consolidates a usually priority 10 job A (but the consolidate job overrides its priority to be 20) that has the 15s wait built in. Once the consolidate VF job for A has spawned, it starts a regular backup job B with priority 15 (i.e. higher priority than the spawned consolidate/VF job of A but lower priority than A usually would have) and then waits for job B to finish. If job A is still running after B finishes then the test passes. If the priority were not correctly overridden by the consolidate job, then job B could only start after the VF of A has already finished, which would result in a test failure.

@SamuelBoerlin SamuelBoerlin force-pushed the consolidate-job-priority branch from cafafc8 to b7292b5 Compare October 10, 2023 20:35
@SamuelBoerlin
Copy link
Contributor Author

Couldn't reproduce the issue locally so I decided to just rewrite the test which ended up simplifying things anyways.

@SamuelBoerlin SamuelBoerlin requested a review from arogge October 12, 2023 11:15
@arogge
Copy link
Member

arogge commented Oct 12, 2023

Thank you. Build was successful. I'll take a look at the updated test tomorrow!

@arogge arogge force-pushed the consolidate-job-priority branch from 8d5fb3d to 9dde571 Compare October 19, 2023 14:18
@BareosBot BareosBot merged commit 46ba223 into bareos:master Oct 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants