Skip to content

lib: add source_location; fix test crash#2006

Merged
BareosBot merged 15 commits intobareos:masterfrom
sebsura:dev/ssura/master/enhance-cats
Dec 11, 2024
Merged

lib: add source_location; fix test crash#2006
BareosBot merged 15 commits intobareos:masterfrom
sebsura:dev/ssura/master/enhance-cats

Conversation

@sebsura
Copy link
Contributor

@sebsura sebsura commented Nov 7, 2024

Thank you for contributing to the Bareos Project!

This pr introduces a "polyfill" for std::source_location and uses it in our db code. This should make debugging this code easier.
Additionally it fixes a crash inside our DebugCode as it should not need to take a lock before accessing the database during a crash.

Please check

  • Short description and the purpose of this PR is present above this paragraph
  • Your name is present in the AUTHORS file (optional)

If you have any questions or problems, please give a comment in the PR.

Helpful documentation and best practices

Checklist for the reviewer of the PR (will be processed by the Bareos team)

Make sure you check/merge the PR using devtools/pr-tool to have some simple automated checks run and a proper changelog record added.

General
  • Is the PR title usable as CHANGELOG entry?
  • Purpose of the PR is understood
  • Commit descriptions are understandable and well formatted
  • Required backport PRs have been created
  • Correct milestone is set
Source code quality
  • Source code changes are understandable
  • Variable and function names are meaningful
  • Code comments are correct (logically and spelling)
  • Required documentation changes are present and part of the PR

@arogge arogge self-requested a review November 12, 2024 11:06
@arogge arogge added bugfix bug This addresses a bug labels Nov 12, 2024
@arogge arogge added this to the 24.0.0 milestone Nov 12, 2024
@sebsura sebsura force-pushed the dev/ssura/master/enhance-cats branch 3 times, most recently from 448390b to d4cf357 Compare November 29, 2024 09:25
Copy link
Member

@arogge arogge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd argue that when you're touching this, it would make sense to name the functions QueryDb(), InsertDb(), etc.

Comment on lines +78 to +81
#include <string_view>
static_assert(std::string_view{brs::source_location::current().file_name()}
== std::string_view{__FILE__});
static_assert(brs::source_location::current().line() == __LINE__);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if I remember correctly, you need one layer of indirection to detect the issue I saw on FreeBSD/Clang. I'll go and figure it out...

Comment on lines +62 to +65
void RwlAssertWriterIsMe(brwlock_t* rwl,
const char* function,
const char* file,
int line);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just out of curiosity: what's the reasoning of not passing the source_location object by const ref here?
I guess passing two pointers is cheaper than passing three pointers and an int.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iirc i did this before deciding on finally adding source_location to bareos. Now that we can use it, i should change it to just taking a source_location.

@sebsura sebsura force-pushed the dev/ssura/master/enhance-cats branch from 073a35b to 77a1914 Compare December 5, 2024 12:29
@sebsura sebsura requested a review from arogge December 9, 2024 07:52
Copy link
Member

@arogge arogge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found and reproduced the FreeBSD/Clang issue. Should be easy to catch.

@sebsura sebsura force-pushed the dev/ssura/master/enhance-cats branch 2 times, most recently from b217dc1 to 89ab621 Compare December 10, 2024 14:29
Copy link
Member

@arogge arogge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Thank you!
(but apparently your new test doesn't build)

}

static_assert(make_source_loc().line() == __LINE__);
static_assert(make_source_loc().file_name() == __FILE__);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
static_assert(make_source_loc().file_name() == __FILE__);
static_assert(make_source_loc().file_name() == std::string_view{__FILE__});

@sebsura sebsura force-pushed the dev/ssura/master/enhance-cats branch from 89ab621 to 4a8b648 Compare December 11, 2024 05:45
NativeRestoreCleanup is already done on error in job_thread (job.cc)
in case an error occured, so there is no need to do it twice.

This may cause a crash because pthread_cancel and C++ dont mix well,
especially if we cancel the same thread in a short time interval.
(pthread_cancel causes an exception + destructor calling, which may
lead to an exception being caused inside a destructor, leading to a
crash.)
We should never use db_handle_ without ensuring that the db is
locked (which infact is not the case here).
Behind the curtains pthread_cleanup_pop is a destructor and
pthread_cancel causes an exception (via a signal).

This means that if you try to cancel a job while that job is in the
process of ending, then you risk an unfixable crash.  We want to make
this less likely and move the CleanupCall outside the destructor.

Note that this is just bandaid and not a real fix.  There are still
multiple other destructors and pthread_cleanup_calls inside the call
stack, so this can always cause a crash.

The only way to prevent these crashes (even if they are very unlikely)
is to rewrite everything to not rely on pthread_cancel and instead use
some other mechanism for cancelation.   This is very hard as we also
have to be able to cancel stuff like write/read/sleep/etc.

We would have to rewrite them with non blocking alternatives to have a
chance of achieving this (or by using the TIMEOUT signal, but that has
other issues ~> signal (un)safety).
This function is only called when everything is stopped, so we should
not assert that we are the owner here ...
Now we get some line numbers in the assertion which should help us
figuring out the problem in case no backtrace is available.
We used to supply LockDb/UnlockDb with pretty useless line numbers as
they were always the same regardless of which DbLocker it actually
was.  Now we use the line information of the creation of the DbLocker.
The removed macros were just used to inject line information into the
db calls.  As this can now just be done by using source_location we
can remove them and just use proper functions.
This way the autodetection works correctly.  For some reason
check_cxx_source_compiles() uses C++20.  Maybe the check is performed
too early ?
@sebsura sebsura force-pushed the dev/ssura/master/enhance-cats branch from 4a8b648 to 8402d61 Compare December 11, 2024 07:42
@BareosBot BareosBot merged commit 86ba07d into bareos:master Dec 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug This addresses a bug

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants