
Conversation

@madolson
Contributor

This is trying to make two failure modes a bit easier to deep dive into:

  1. If a serverPanic or serverAssert occurs during the info (or module) printing, it will recursively panic, which is a lot of fun as it will just keep recursively printing. It will eventually stack overflow, but will generate a lot of text in the process.
  2. When a segfault happens during the segfault handler, no information is communicated other than it happened. This can be problematic because info may help diagnose the real issue, but without fixing the recursive crash it might be hard to get at that info.

Contributor

@zuiderkwast zuiderkwast left a comment


LGTM

Do you want to add a DEBUG RECURSIVE-SEGFAULT command to test this?

@madolson
Contributor Author

madolson commented Dec 19, 2023

Do you want to add a DEBUG RECURSIVE-SEGFAULT command to test this?

I couldn't come up with anything outside of setting a flag that would artificially cause a segfault during the print, which seemed very specific to test. I did think about adding a module to do it, though.

@madolson madolson requested a review from oranagra December 19, 2023 21:08
@oranagra
Member

I think adding a flag in one of the test modules is the way to test it.

@madolson did you at least test it manually?

@meiravgri FYI.

@madolson
Contributor Author

madolson commented Dec 20, 2023

I think adding a flag in one of the test modules is the way to test it.

Ack. I did test it manually by adding crashes, I'll add a module test too then since that is pretty easy.


wait_for_log_messages 0 {"*=== REDIS BUG REPORT END. Make sure to include from START to END. ===*"} $loglines 10 1000
assert_equal 1 [count_log_message 0 "=== REDIS BUG REPORT END. Make sure to include from START to END. ==="]
assert_equal 2 [count_log_message 0 "ASSERTION FAILED"]
Member


let's also make sure we see the stack trace (check that assertCrash is mentioned), twice.

Contributor Author


It's technically three times, but sure.

Member


why 3 times?

also, despite this already being merged, can't / shouldn't we make a similar validation that the nested stack trace works for the segfault test too?

Contributor Author


The second assertion still prints the stack trace, so the crash will be in the stack trace twice.

also, despite this already being merged, can't / shouldn't we make a similar validation that the nested stack trace works for the segfault test too?

I didn't follow this.

Member


The second assertion still prints the stack trace, so the crash will be in the stack trace twice.

right, so why is it 3 times?

also, despite this already being merged, can't / shouldn't we make a similar validation that the nested stack trace works for the segfault test too?

I didn't follow this.

we have two tests, one "with an assertion" and one "with a segfault".
the first one checks that the assertCrash function appears in both the original stack trace and the nested one (and a 3rd?).
but the other test only verifies that we show `Crashed running the instruction at` twice; it doesn't validate that the stack trace is printed twice too.


@madolson madolson merged commit 068051e into redis:unstable Jan 3, 2024
@madolson
Contributor Author

madolson commented Jan 4, 2024

right, so why is it 3 times?

The second stack trace looks something like:

Backtrace:
0   redis-server                        0x0000000109f86439 RM__Assert + 9
1   crash.so                            0x000000010a149ce0 assertCrash + 32
2   redis-server                        0x0000000109f8b4b7 modulesCollectInfo + 167
3   redis-server                        0x0000000109f2a620 printCrashReport + 800
4   redis-server                        0x0000000109f29ec3 _serverAssert + 195
5   redis-server                        0x0000000109f86439 RM__Assert + 9
6   crash.so                            0x000000010a149ce0 assertCrash + 32

There are two assertCrash frames because it recovered, and then asserted again.

but the other test only verifies that we show `Crashed running the instruction at` twice; it doesn't validate that the stack trace is printed twice too.

Oh, I misunderstood the motivation; I thought it was just for making sure there weren't recursive crashes (which is why I didn't add it to the assertion case). Let me raise a quick PR.

@sundb sundb mentioned this pull request Jan 5, 2024
@sundb
Collaborator

sundb commented Jan 5, 2024

@madolson The test gets stuck on macOS but doesn't happen on Ubuntu:

make SANITIZER=thread
./runtest-moduleapi --only "Test module crash when info crashes with a segfault"

@madolson
Contributor Author

madolson commented Jan 5, 2024

The test gets stuck on macOS but doesn't happen on Ubuntu:

@sundb Was this local? We don't have a test for this AFAIK. I'll check it out.

@madolson
Contributor Author

madolson commented Jan 6, 2024

@sundb I can't seem to compile with SANITIZER=thread on my M2 Mac. I'll try it again on my work laptop (which is x86). I'm not sure what the issue is.

@sundb
Collaborator

sundb commented Jan 6, 2024

@madolson It got stuck on both M1 and M3; I then used dtrace on the M3 yesterday to see what it was doing and broke the system. 😂

roggervalf pushed a commit to roggervalf/redis that referenced this pull request Feb 11, 2024
…sive segfaults (redis#12857)

This change is trying to make two failure modes a bit easier to deep dive:
1. If a serverPanic or serverAssert occurs during the info (or module)
printing, it will recursively panic, which is a lot of fun as it will
just keep recursively printing. It will eventually stack overflow, but
will generate a lot of text in the process.
2. When a segfault happens during the segfault handler, no information
is communicated other than it happened. This can be problematic because
`info` may help diagnose the real issue, but without fixing the
recursive crash it might be hard to get at that info.
funny-dog pushed a commit to funny-dog/redis that referenced this pull request Sep 17, 2025