[SmartSwitch] Add graceful shutdown and startup handling in platform daemons #703
yxieca merged 14 commits into sonic-net:master
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Pull Request Overview
This PR refactors module admin state management by introducing a new set_admin_state_gracefully method that encapsulates the pre-shutdown and post-startup hooks alongside the admin state change. The refactor simplifies the code by removing the ModuleTransitionFlagHelper class and duplicate logic for managing module state transitions.
- Replaces explicit `module_pre_shutdown`, `set_admin_state`, and `module_post_startup` calls with a single `set_admin_state_gracefully` method
- Removes the `ModuleTransitionFlagHelper` class and all transition flag tracking logic
- Updates tests to reflect the new simplified API
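The consolidation summarized above can be pictured with a minimal sketch. It is not the code from this PR: `SketchModule` is a hypothetical stand-in rather than the real `ModuleBase`, and the signature, return values, and failure handling of `set_admin_state_gracefully` are assumptions based only on the summary (a pre-shutdown hook before taking a module down, a post-startup hook after bringing it up).

```python
class SketchModule:
    """Hypothetical stand-in for a platform module; not the real ModuleBase."""

    def __init__(self, name):
        self.name = name
        self.admin_up = True
        self.calls = []  # records the order of operations for inspection

    def module_pre_shutdown(self):
        self.calls.append("pre_shutdown")
        return True

    def module_post_startup(self):
        self.calls.append("post_startup")
        return True

    def set_admin_state(self, up):
        self.calls.append("set_admin_state(%s)" % up)
        self.admin_up = up
        return True

    def set_admin_state_gracefully(self, up):
        """Assumed behavior: wrap the state change with the matching hook."""
        if not up and not self.module_pre_shutdown():
            return False  # assumption: abort if the pre-shutdown hook fails
        if not self.set_admin_state(up):
            return False
        if up:
            return self.module_post_startup()
        return True


module = SketchModule("DPU0")
module.set_admin_state_gracefully(False)  # shutdown path
module.set_admin_state_gracefully(True)   # startup path
print(module.calls)
```

With a single entry point like this, callers no longer need to sequence the hooks themselves, which is presumably why the separate transition flag helper could be dropped.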
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| sonic-chassisd/scripts/chassisd | Removes ModuleTransitionFlagHelper class, simplifies submit_callback and submit_dpu_callback methods to use set_admin_state_gracefully, removes duplicate initialization code |
| sonic-chassisd/tests/mock_platform.py | Adds mock implementation of set_admin_state_gracefully method |
| sonic-chassisd/tests/test_chassisd.py | Updates tests to mock set_admin_state_gracefully instead of individual pre/post hooks, adjusts assertions accordingly |
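As a rough illustration of the test change described in the table, the sketch below mocks `set_admin_state_gracefully` and asserts on it instead of on the individual hooks. `apply_admin_state` is a hypothetical caller standing in for the daemon's callback logic, and the boolean argument is an assumption; the real tests in `test_chassisd.py` may be structured differently.

```python
from unittest.mock import MagicMock


def apply_admin_state(module, up):
    """Hypothetical caller standing in for the daemon's callback logic."""
    return module.set_admin_state_gracefully(up)


mock_module = MagicMock()
mock_module.set_admin_state_gracefully.return_value = True

result = apply_admin_state(mock_module, False)

# The single graceful entry point is called exactly once...
mock_module.set_admin_state_gracefully.assert_called_once_with(False)
# ...and the caller no longer touches the individual hooks directly.
mock_module.module_pre_shutdown.assert_not_called()
mock_module.set_admin_state.assert_not_called()
```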
@rameshraghupathy @gpunathilell could you please review this latest PR?
Pull Request Overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
Pull Request Overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
sonic-chassisd/tests/test_chassisd.py:1738
- Lines 1652-1738 contain orphaned code that is not inside any function definition. This code appears to be leftover from an old test that was removed or refactored. Since this code is at module level, it will execute during test file import rather than as part of a test function, which could cause unintended side effects or test failures. This code block should be removed entirely.
# Test the chassisd run
chassis = MockSmartSwitchChassis()
# DPU0 details
index = 0
name = "DPU0"
desc = "DPU Module 0"
slot = 0
sup_slot = 0
serial = "DPU0-0000"
module_type = ModuleBase.MODULE_TYPE_DPU
module = MockModule(index, name, desc, module_type, slot, serial)
module.set_midplane_ip()
# Set initial state for DPU0
status = ModuleBase.MODULE_STATUS_PRESENT
module.set_oper_status(status)
chassis.module_list.append(module)
# Supervisor ModuleUpdater
module_updater = SmartSwitchModuleUpdater(SYSLOG_IDENTIFIER, chassis)
module_updater.module_db_update()
module_updater.modules_num_update()
# ChassisdDaemon setup
daemon_chassisd = ChassisdDaemon(SYSLOG_IDENTIFIER, chassis)
daemon_chassisd.module_updater = module_updater
daemon_chassisd.stop = MagicMock()
daemon_chassisd.stop.wait.return_value = True
daemon_chassisd.smartswitch = True
# Import platform and use chassis as platform_chassis
import sonic_platform.platform
platform_chassis = chassis
# Mock objects
mock_chassis = MagicMock()
mock_module_updater = MagicMock()
# Mock the module (DPU0)
mock_module = MagicMock()
mock_module.get_name.return_value = "DPU0"
# Mock chassis.get_module to return the mock_module for DPU0
def mock_get_module(index):
    if index == 0:  # For DPU0
        return mock_module
    return None  # No other modules available in this test case
# Apply the side effect for chassis.get_module
mock_chassis.get_module.side_effect = mock_get_module
# Mock state_db
mock_state_db = MagicMock()
# fvs_mock = [True, {CHASSIS_MIDPLANE_INFO_ACCESS_FIELD: 'True'}]
# mock_state_db.get.return_value = fvs_mock
# Mock db_connect
mock_db_connect = MagicMock()
mock_db_connect.return_value = mock_state_db
# Mock admin_status
# mock_module_updater.get_module_admin_status.return_value = 'up'
# Set access of DPU0 up
midplane_table = module_updater.midplane_table
module.set_midplane_reachable(False)
module_updater.check_midplane_reachability()
fvs = midplane_table.get(name)
assert fvs != None
if isinstance(fvs, list):
fvs = dict(fvs[-1])
assert module.get_midplane_ip() == fvs[CHASSIS_MIDPLANE_INFO_IP_FIELD]
assert str(module.is_midplane_reachable()) == fvs[CHASSIS_MIDPLANE_INFO_ACCESS_FIELD]
# Patching platform's Chassis object to return the mocked module
with patch.object(sonic_platform.platform.Chassis, 'is_smartswitch') as mock_is_smartswitch, \
        patch.object(sonic_platform.platform.Chassis, 'get_module', side_effect=mock_get_module):
    # Simulate that the system is a SmartSwitch
    mock_is_smartswitch.return_value = True
    # Patch num_modules for the updater
    with patch.object(daemon_chassisd.module_updater, 'num_modules', 1), \
            patch.object(daemon_chassisd.module_updater, 'get_module_admin_status', return_value='up'):
        # Now run the function that sets the initial admin state
        daemon_chassisd.set_initial_dpu_admin_state()
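One way to catch orphaned module-level code like the block above is a small static check. The sketch below is an illustration only, not part of this PR: it uses the standard `ast` module to list top-level statements other than imports, function definitions, class definitions, and docstrings, since those are the statements that execute when the test file is imported. The helper name `module_level_statements` is hypothetical.

```python
import ast


def module_level_statements(source):
    """Return (line, node type) for top-level statements that run at import."""
    allowed = (ast.Import, ast.ImportFrom, ast.FunctionDef,
               ast.AsyncFunctionDef, ast.ClassDef)
    flagged = []
    for node in ast.parse(source).body:
        if isinstance(node, allowed):
            continue
        # Skip a bare string expression such as a module docstring.
        if (isinstance(node, ast.Expr)
                and isinstance(node.value, ast.Constant)
                and isinstance(node.value.value, str)):
            continue
        flagged.append((node.lineno, type(node).__name__))
    return flagged


sample = '''
import os

def test_ok():
    assert True

chassis = object()
chassis.__class__
'''

print(module_level_statements(sample))
```

A check like this also flags legitimate module-level constants, so its output is a list of candidates to review rather than a list of errors.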
Cherry-pick PR to 202511: #726
Description
HLD: https://github.com/sonic-net/SONiC/blob/master/doc/smart-switch/graceful-shutdown/graceful-shutdown.md
These changes build upon enhancements in sonic-platform-daemons#667.
This PR introduces graceful shutdown and startup orchestration across SONiC platform daemons to ensure safe DPU and peripheral module transitions during reboot or administrative state changes.
Key updates include:
- `ModuleBase` lifecycle methods (`module_pre_shutdown`, `module_post_startup`, and `set_admin_state_gracefully`) into platform daemons.
- `CHASSIS_MODULE_TABLE` via `STATE_DB` to synchronize transition state across processes.
Motivation and Context
Platform daemons currently perform shutdown and startup independently, leading to:
This change introduces a unified graceful shutdown framework for SmartSwitch modules.
It ensures predictable module transitions, preserves hardware health, and supports orchestrated restarts without transient hardware errors.
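To make the idea of sharing transition state across processes concrete, here is a minimal sketch under stated assumptions. `FakeTable` is an in-memory stand-in, not the swsscommon table API, and the field names (`transition`, `start_time`) and helper functions are hypothetical, chosen only for illustration; the actual schema of `CHASSIS_MODULE_TABLE` is defined by the HLD and the PR code.

```python
import time


class FakeTable:
    """In-memory stand-in for a STATE_DB table; not the swsscommon API."""

    def __init__(self):
        self._rows = {}

    def set(self, key, fields):
        self._rows.setdefault(key, {}).update(fields)

    def get(self, key):
        return dict(self._rows.get(key, {}))


# Field names below are hypothetical, chosen only for illustration.
def mark_transition(table, module, transition):
    table.set(module, {"transition": transition,
                       "start_time": str(int(time.time()))})


def clear_transition(table, module):
    table.set(module, {"transition": "none"})


def should_report_pcie_errors(table, module):
    """A consumer such as a PCIe monitor could skip checks mid-transition."""
    return table.get(module).get("transition", "none") == "none"


chassis_module_table = FakeTable()
mark_transition(chassis_module_table, "DPU0", "shutdown")
during = should_report_pcie_errors(chassis_module_table, "DPU0")
clear_transition(chassis_module_table, "DPU0")
after = should_report_pcie_errors(chassis_module_table, "DPU0")
print(during, after)
```

The design point is that the writer and the readers never call each other; they agree only on the table contents, so a daemon that starts late can still see that a transition is in progress.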
How Has This Been Tested?
Testing performed on both DPU-enabled (SmartSwitch).
Functional validation
- detaching/attaching) reflected in `STATE_DB`.
- `pcied` daemon logs confirm ordered detach before reboot and reattach after startup.
Unit tests executed
Coverage includes:
Manual validation
Additional Information (Optional)