[action] [PR:271] Improve LC reboot cause for supervisor heartbeat loss #291
Merged
mssonicbld merged 1 commit into sonic-net:202505 on Jul 16, 2025
When the supervisor reboots ungracefully (e.g., kernel panic), the line cards (LCs) lose their connection to it and are subsequently rebooted. This change adds 'heartbeat loss' as a software reboot cause for LCs. It also modifies the reboot-cause logic to prioritize this software heartbeat-loss cause over any hardware triggers that occur during the supervisor-initiated LC reboot. This ensures accurate reporting of the LC's reboot reason.

With the new changes, the output is:

1. When there is a graceful restart on the Supervisor

```
admin@nfc405-3:~$ show reboot-cause
User issued 'Reboot from Supervisor' command [User: Supervisor, Time: Tue Jul 15 07:03:09 PM UTC 2025]
admin@nfc405-3:~$
admin@nfc405-3:~$ show reboot-cause history
Name                 Cause                                    Time                             User        Comment
-------------------  ---------------------------------------  -------------------------------  ----------  -------
2025_07_15_19_07_16  Reboot from Supervisor                   Tue Jul 15 07:03:09 PM UTC 2025  Supervisor  N/A
```

2. When there is an ungraceful restart on the Supervisor

```
admin@nfc405-3:~$ show reboot-cause
Heartbeat with the Supervisor card lost
admin@nfc405-3:~$
admin@nfc405-3:~$ show reboot-cause history
Name                 Cause                                    Time                             User        Comment
-------------------  ---------------------------------------  -------------------------------  ----------  -------
2025_07_15_19_30_31  Heartbeat with the Supervisor card lost  N/A                              N/A         N/A
```
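The prioritization described above can be illustrated with a minimal sketch. This is not the code from the PR: the function name `pick_reboot_cause`, the constants, and the fallback order for non-heartbeat cases are all hypothetical and assumed for illustration. Only the cause string is taken from the CLI output above.

```python
from typing import Optional

# Cause string as it appears in the `show reboot-cause` output above.
HEARTBEAT_LOSS = "Heartbeat with the Supervisor card lost"
# Hypothetical placeholder for the case where nothing was recorded.
UNKNOWN = "Unknown"


def pick_reboot_cause(software_cause: Optional[str],
                      hardware_cause: Optional[str]) -> str:
    """Choose which reboot cause to report for a line card (illustrative only)."""
    # A recorded heartbeat loss wins over any hardware trigger: the hardware
    # event is treated as a side effect of the supervisor-initiated LC reboot.
    if software_cause == HEARTBEAT_LOSS:
        return HEARTBEAT_LOSS
    # Assumed fallback order for every other case (not stated in the PR):
    # hardware cause first, then software cause, then "Unknown".
    if hardware_cause:
        return hardware_cause
    if software_cause:
        return software_cause
    return UNKNOWN
```

With this sketch, a heartbeat loss recorded alongside a hardware trigger is reported as the heartbeat loss, while any other combination falls back to the assumed order.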
Author
Original PR: #271
Author
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
gpunathilell pushed a commit to gpunathilell/sonic-host-services that referenced this pull request Sep 24, 2025

```
* 5f79c8f - (HEAD -> 202506, origin/202505) Improve LC reboot cause for supervisor heartbeat loss (sonic-net#291) (2025-07-16) [mssonicbld]
* d86b612 - Fix ProcessStatsST column name issue and add test case to cover check (sonic-net#286) (2025-07-10) [mssonicbld]
```