Handle recursive serverAsserts and provide more information for recursive segfaults #12857
Conversation
zuiderkwast left a comment
LGTM
Do you want to add a DEBUG RECURSIVE-SEGFAULT command to test this?
I couldn't come up with anything outside of setting a flag that would artificially cause a segfault during the print, which seemed very specific to test. I did think about adding a module to do it, though.
I think adding a flag in one of the test modules is the way to test it. @madolson did you at least test it manually? @meiravgri FYI.
Ack. I did test it manually by adding crashes. I'll add a module test too then, since that is pretty easy.
wait_for_log_messages 0 {"*=== REDIS BUG REPORT END. Make sure to include from START to END. ===*"} $loglines 10 1000
assert_equal 1 [count_log_message 0 "=== REDIS BUG REPORT END. Make sure to include from START to END. ==="]
assert_equal 2 [count_log_message 0 "ASSERTION FAILED"]
let's also make sure we see the stack trace (check that assertCrash is mentioned), twice.
It's technically three times, but sure.
why 3 times?
also, despite already merged, can't / shouldn't we make a similar validation that the nested stack trace works for the segfault test too?
The second assertion still prints the stack trace, so the crash will be in the stack trace twice.
> also, despite already merged, can't / shouldn't we make a similar validation that the nested stack trace works for the segfault test too?
I didn't follow this.
> The second assertion still prints the stack trace, so the crash will be in the stack trace twice.
right, so why is it 3 times?
> also, despite already merged, can't / shouldn't we make a similar validation that the nested stack trace works for the segfault test too?
> I didn't follow this.
We have two tests, one "with an assertion" and one "with a segfault".
The first one checks that the assertCrash function appears in both the original stack trace and the nested one (and a 3rd?).
But the other test only verifies that we show "Crashed running the instruction at" twice, and it doesn't validate that the stack trace is printed twice too.
The second stack trace looks something like: there are two assert crashes, since it recovered and then asserted again.
Oh, I misunderstood the motivation. I thought it was just for making sure there weren't recursive crashes (which is why I didn't add it to the assertion case). Let me raise a quick PR.
@madolson The test gets stuck on macOS but doesn't happen on Ubuntu:
@sundb I can't seem to compile with SANITIZER=thread on my M2 Mac. I'll try it with my work laptop again (which is x86). I'm not sure what the issue is.
@madolson I got stuck on both M1 and M3; yesterday I used dtrace on the M3 to see what it was doing and broke the system. 😂
…sive segfaults (redis#12857)

This change is trying to make two failure modes a bit easier to deep dive:

1. If a serverPanic or serverAssert occurs during the info (or module) printing, it will recursively panic, which is a lot of fun as it will just keep recursively printing. It will eventually stack overflow, but will generate a lot of text in the process.
2. When a segfault happens during the segfault handler, no information is communicated other than that it happened. This can be problematic because `info` may help diagnose the real issue, but without fixing the recursive crash it might be hard to get at that info.