Kdump usability and reliability improvements#6113
- Allow platform specific reboot script to be called after crash kernel has finished copying the kernel vmcore
- Kdump configurations stored and manipulated in ConfigDB are now processed by hostcfgd and applied asynchronously
- Disable pcie advanced features when running crash kernel. This improves reliability of the crash kernel to successfully create a vmcore and also reboot
- Allow crash kernel to reboot if a panic is seen while it is generating a vmcore
- Fix crash kernel to use the SONiC specific /usr/local/bin/reboot script instead of the Linux reboot command /sbin/reboot

Signed-off-by: Rajendra Dendukuri <rajendra.dendukuri@broadcom.com>
| #KDUMP_KEXEC_ARGS="" | ||
| #KDUMP_CMDLINE="" | ||
| -#KDUMP_CMDLINE_APPEND="irqpoll nr_cpus=1 nousb systemd.unit=kdump-tools.service ata_piix.prefer_ms_hyperv=0" | ||
| +KDUMP_CMDLINE_APPEND="irqpoll nr_cpus=1 nousb systemd.unit=kdump-tools.service ata_piix.prefer_ms_hyperv=0 panic=10 debug hpet=disable pcie_port=compat pci=nommconf platform=__PLATFORM__" |
Can you explain why you pass PLATFORM=$platform on the kernel cmdline? What is the use case here? I do not see this in the description.
The PLATFORM string will be accessible to the reboot script. When the crash kernel reboots, it needs the $platform value to find any platform-specific reboot script, which is defined in /usr/share/sonic/device/$platform/platform_reboot.
"platform" seems like a common keyword. Do you know whether the kernel already uses it as a parameter? Is there a chance of conflict? It would be better to use sonic_platform to avoid such a potential conflict.
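To make the use case in this thread concrete, here is a minimal sketch of how a reboot script could recover the platform string from the kernel command line and build the path of the platform-specific reboot script. It is only an illustration: the helper names `get_cmdline_param` and `platform_reboot_path` are hypothetical, and the `sonic_platform` key follows the name suggested above rather than anything confirmed in the code.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not taken from the SONiC sources.

# Print the value of a key=value parameter found in a command line string.
# $1: parameter name, $2: command line string (e.g. contents of /proc/cmdline)
get_cmdline_param() {
    # Split into one token per line, keep only tokens that start with "key=".
    value=$(printf '%s\n' "$2" | tr ' ' '\n' | sed -n "s/^$1=//p" | head -n 1)
    [ -n "$value" ] && printf '%s\n' "$value"
}

# Build the platform reboot script path described in the discussion.
platform_reboot_path() {
    printf '/usr/share/sonic/device/%s/platform_reboot\n' "$1"
}

# Usage against the running kernel (prints nothing if the key is absent).
if [ -r /proc/cmdline ]; then
    if platform=$(get_cmdline_param sonic_platform "$(cat /proc/cmdline)"); then
        platform_reboot_path "$platform"
    fi
fi
```

Because the match is anchored at the start of each token, looking up `platform` does not pick up `sonic_platform=...`, which is one reason a prefixed name is less likely to collide with other parameters.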
Can you move the hostcfgd changes into a separate PR? That part is for usability. We can merge it together with the sonic-utilities PR you post.
| In case the capture kernel (the kdump kernel started with kexec) would | ||
| either crash or be stuck, the system should reboot. We need then to add | ||
| the "panic=X" option to the kernel. Without this option, the system could | ||
| stuck and not reboot. |
The patch description is not complete. You have added a few other options, such as debug, hpet, pcie_port, pci, and PLATFORM. Can you describe their purpose as well?
After taking a second look at these two proposed patches, I feel we should not add patches to the kdump-tools package. Instead, these additional configurations should be appended to /etc/default/kdump-config as part of build_debian.sh. This will make the SONiC kdump customizations easier to manage. I will also add an appropriate description for the options added.
Comments?
I agree with you. Okay to move to build_debian.sh.
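As a rough illustration of the approach agreed above, the following sketch appends the crash kernel options to a kdump-config style file instead of patching kdump-tools. The function name `append_kdump_cmdline` and the way the file path is passed in are assumptions, not the actual build_debian.sh code. The option string comes from the diff under review, with the platform key written as `sonic_platform` per the review suggestion, and `__PLATFORM__` is kept as the unresolved placeholder from the diff.

```shell
#!/bin/sh
# Sketch only: function name and file handling are hypothetical.

# Append the SONiC crash kernel command line options to a kdump-config file.
# $1: path to the config file (the real target would be /etc/default/kdump-config
#     inside the image root filesystem)
append_kdump_cmdline() {
    config_file="$1"
    # Options already present in the commented-out default line.
    opts="irqpoll nr_cpus=1 nousb systemd.unit=kdump-tools.service ata_piix.prefer_ms_hyperv=0"
    # panic=10          reboot 10 seconds after a panic in the crash kernel
    # debug             enable kernel debug messages
    # hpet=disable      do not use the HPET timer
    # pcie_port=compat  treat PCIe ports as plain PCI-to-PCI bridges
    # pci=nommconf      do not use MMCONFIG for PCI configuration access
    opts="$opts panic=10 debug hpet=disable pcie_port=compat pci=nommconf"
    # Platform identifier; __PLATFORM__ is a placeholder to be substituted later.
    opts="$opts sonic_platform=__PLATFORM__"
    printf 'KDUMP_CMDLINE_APPEND="%s"\n' "$opts" >> "$config_file"
}

# Example run against a scratch file rather than the real configuration.
tmp_config=$(mktemp)
append_kdump_cmdline "$tmp_config"
cat "$tmp_config"
```

Keeping the options in one place with a comment per option would also address the request to document what each added parameter is for.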
Agreed. I have renamed it to sonic_platform.
Yes, that is a very good suggestion. I will make the recommended changes.
…system creation Moved changes to hostcfgd to a different PR Signed-off-by: Rajendra Dendukuri <rajendra.dendukuri@broadcom.com>
hostcfgd changes moved to #6122
…orm identifier string Signed-off-by: Rajendra Dendukuri <rajendra.dendukuri@broadcom.com>
retest mellanox
retest mellanox please
- Why I did it
Improve Kdump usability and reliability
- How I did it
- How to verify it
config kdump enable
echo c > /proc/sysrq-trigger
show kdump status
show kdump log 1
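Before triggering the crash in the steps above, it can help to confirm that a crash kernel is actually loaded. The sketch below is one way to do that with standard Linux interfaces (`/sys/kernel/kexec_crash_loaded` and a `crashkernel=` boot parameter). The function names are hypothetical and this check is not part of the PR itself.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.

# Return 0 when the given state file reports that a crash kernel is loaded.
# $1: optional path, defaults to the standard sysfs file.
crash_kernel_loaded() {
    state_file="${1:-/sys/kernel/kexec_crash_loaded}"
    [ -r "$state_file" ] && [ "$(cat "$state_file")" = "1" ]
}

# Return 0 when the command line string contains a crashkernel= reservation.
# $1: command line string (e.g. contents of /proc/cmdline)
has_crashkernel_param() {
    case " $1 " in
        *" crashkernel="*) return 0 ;;
        *) return 1 ;;
    esac
}

if crash_kernel_loaded; then
    echo "crash kernel is loaded"
else
    echo "crash kernel is not loaded; a reboot may be needed after 'config kdump enable'"
fi
```

If the check reports that no crash kernel is loaded, `echo c > /proc/sysrq-trigger` would still panic the system but no vmcore would be captured.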
- Which release branch to backport (provide reason below if selected)
- Description for the changelog
Kdump usability and reliability improvements