
Conversation

@madolson
Contributor

This is trying to make two failure modes a bit easier to deep dive into:

  1. If a serverPanic or serverAssert occurs during the info (or module) printing, it will recursively panic, which is a lot of fun as it will just keep recursively printing. It will eventually stack overflow, but will generate a lot of text in the process.
  2. When a segfault happens during the segfault handler, no information is communicated other than it happened. This can be problematic because info may help diagnose the real issue, but without fixing the recursive crash it might be hard to get at that info.

Contributor

@zuiderkwast zuiderkwast left a comment


LGTM

Do you want to add a DEBUG RECURSIVE-SEGFAULT command to test this?

@madolson
Contributor Author

madolson commented Dec 19, 2023

Do you want to add a DEBUG RECURSIVE-SEGFAULT command to test this?

I couldn't come up with anything outside of setting a flag that would artificially cause a segfault during the print, which seemed very specific to test. I did think about adding a module to do it, though.

@madolson madolson requested a review from oranagra December 19, 2023 21:08
@oranagra
Member

I think adding a flag in one of the test modules is the way to test it.

@madolson did you at least test it manually?

@meiravgri FYI.

@madolson
Contributor Author

madolson commented Dec 20, 2023

I think adding a flag in one of the test modules is the way to test it.

Ack. I did test it manually by adding crashes, I'll add a module test too then since that is pretty easy.


wait_for_log_messages 0 {"*=== REDIS BUG REPORT END. Make sure to include from START to END. ===*"} $loglines 10 1000
assert_equal 1 [count_log_message 0 "=== REDIS BUG REPORT END. Make sure to include from START to END. ==="]
assert_equal 2 [count_log_message 0 "ASSERTION FAILED"]
Member


let's also make sure we see the stack trace (check that assertCrash is mentioned), twice.

Contributor Author


It's technically three times, but sure.

Member


why 3 times?

also, despite this already being merged, can't / shouldn't we make a similar validation that the nested stack trace works for the segfault test too?

Contributor Author


The second assertion still prints the stack trace, so the crash will be in the stack trace twice.

also, despite this already being merged, can't / shouldn't we make a similar validation that the nested stack trace works for the segfault test too?

I didn't follow this.

Member


The second assertion still prints the stack trace, so the crash will be in the stack trace twice.

right, so why is it 3 times?

also, despite this already being merged, can't / shouldn't we make a similar validation that the nested stack trace works for the segfault test too?

I didn't follow this.

we have two tests, one "with an assertion" and one "with a segfault".
the first one checks that the assertCrash function appears in both the original stack trace and the nested one (and a 3rd?).
but the other test only verifies that we show `Crashed running the instruction at` twice; it doesn't validate that the stack trace is printed twice too.


@madolson madolson merged commit 068051e into redis:unstable Jan 3, 2024
@madolson
Contributor Author

madolson commented Jan 4, 2024

right, so why is it 3 times?

The second stack trace looks something like:

Backtrace:
0   redis-server                        0x0000000109f86439 RM__Assert + 9
1   crash.so                            0x000000010a149ce0 assertCrash + 32
2   redis-server                        0x0000000109f8b4b7 modulesCollectInfo + 167
3   redis-server                        0x0000000109f2a620 printCrashReport + 800
4   redis-server                        0x0000000109f29ec3 _serverAssert + 195
5   redis-server                        0x0000000109f86439 RM__Assert + 9
6   crash.so                            0x000000010a149ce0 assertCrash + 32

There are two assertCrash frames because it recovered, and then asserted again.

but the other test only verifies that we show `Crashed running the instruction at` twice; it doesn't validate that the stack trace is printed twice too.

Oh, I misunderstood the motivation; I thought it was just for making sure there weren't recursive crashes (which is why I didn't add it to the assertion case). Let me raise a quick PR.

@sundb sundb mentioned this pull request Jan 5, 2024
@sundb
Collaborator

sundb commented Jan 5, 2024

@madolson The test gets stuck on macOS but doesn't happen on Ubuntu:

make SANITIZER=thread
./runtest-moduleapi --only "Test module crash when info crashes with a segfault"

@madolson
Contributor Author

madolson commented Jan 5, 2024

The test gets stuck on macOS but doesn't happen on Ubuntu:

@sundb Was this local? We don't have a test for this AFAIK. I'll check it out.

@madolson
Contributor Author

madolson commented Jan 6, 2024

@sundb I can't seem to compile with SANITIZER=thread on my M2 Mac. I'll try it again on my work laptop (which is x86). I'm not sure what the issue is.

@sundb
Collaborator

sundb commented Jan 6, 2024

@madolson It got stuck on both M1 and M3; I then used dtrace on the M3 yesterday to see what it was doing and broke the system. 😂

roggervalf pushed a commit to roggervalf/redis that referenced this pull request Feb 11, 2024
…sive segfaults (redis#12857)

This change is trying to make two failure modes a bit easier to deep dive:
1. If a serverPanic or serverAssert occurs during the info (or module)
printing, it will recursively panic, which is a lot of fun as it will
just keep recursively printing. It will eventually stack overflow, but
will generate a lot of text in the process.
2. When a segfault happens during the segfault handler, no information
is communicated other than it happened. This can be problematic because
`info` may help diagnose the real issue, but without fixing the
recursive crash it might be hard to get at that info.
funny-dog pushed a commit to funny-dog/redis that referenced this pull request Sep 17, 2025