[warm-reboot][Multi-ASIC] Add Multi-ASIC warm-reboot HLD#2153
stepanblyschak wants to merge 6 commits into sonic-net:master
Conversation
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Moved the Multi ASIC topology section to a new position in the document and updated its formatting. Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
| ### 2. Scope |
| This document covers warm-reboot support on Multi-ASIC devices. |
Please explicitly state whether multi-ASIC warm-reboot applies to:
(a) Fixed multi-ASIC systems (e.g., leaf switch with N ASICs in single chassis)
(b) VOQ/Chassis systems (distributed line card + fabric card architecture)
(c) Both architectures
CONTEXT: The VOQ HLD (doc/voq/voq_hld.md) explicitly states "warm restart NOT
supported in Phase 1" with CHASSIS_APP_DB configured as persistence_for_warm_boot: "no".
If VOQ systems are NOT supported, this must be documented in:
- Scope section (as explicit limitation)
- Testing section (VOQ excluded from test cases)
- Release notes (to prevent user confusion)
If VOQ systems ARE supported, the following must be addressed:
- How is CHASSIS_APP_DB checkpoint handled despite persistence_for_warm_boot: "no"?
- System port state recovery mechanism
- VOQ scheduler state checkpoint (not exposed in SAI)
- Fabric ASIC coordination during warm-reboot
This HLD covers fixed multi-ASIC systems where all ASICs are frontend ASICs with no interconnect between them. We do not plan to support other flavors.
Does that mean this HLD is NOT for VOQ chassis with multi-ASIC line cards having iBGP interconnection? Please specify the flavors that will not be covered by this HLD.
| ### 5. Requirements |
What about the Warmboot_Manager_HLD.md (https://github.com/stepanblyschak/SONiC/blob/30b75ef6b5ee62edb4dd79b9b22d2d8a5ad5860e/doc/warm-reboot/Warmboot_Manager_HLD.md)? That seems to be an improved warmboot architecture; what will be the impact of that architecture on this HLD?
We do not plan to integrate with warmboot manager
| ├── Ethernet512 |
| └── PortChannel102 |
| ... |
| ``` |
Checkpoint atomicity? If database{0} succeeds but database{1} fails, what happens? If it is an all-or-nothing atomicity policy, please mention that.
In that case asic1 does a cold start while asic0 does a warm boot.
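The per-instance policy described in this reply could be sketched roughly as follows. This is NOT the actual SONiC implementation; `NUM_ASIC`, `CHECKPOINT_CMD`, and `COLD_LIST` are illustrative names, and the real checkpoint command is stubbed out:

```shell
# Sketch only: checkpoint each database instance independently; a failed
# instance downgrades only that ASIC to cold boot instead of aborting the
# whole warm-reboot operation.
NUM_ASIC=${NUM_ASIC:-2}
CHECKPOINT_CMD=${CHECKPOINT_CMD:-true}   # stand-in for the real checkpoint call
COLD_LIST=""

dev=0
while [ "$dev" -lt "$NUM_ASIC" ]; do
    if ! $CHECKPOINT_CMD "database$dev"; then
        echo "checkpoint of database$dev failed; ASIC$dev will cold boot"
        COLD_LIST="$COLD_LIST $dev"
    fi
    dev=$((dev + 1))
done
```

Under this policy there is no cross-instance rollback: each ASIC's outcome depends only on its own checkpoint result.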
| -v /host/warmboot$DEV:/var/warmboot/ |
| ``` |
| As a result, each container operates within its designated ASIC-specific or global directory. Applications inside the container do not require any modifications, as they continue to access warmboot data through the `/var/warmboot/` alias. |
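The mount scheme in the quoted snippet can be illustrated with a small sketch; `warmboot_mount` is a hypothetical helper, not part of SONiC:

```shell
# Hypothetical helper showing the per-ASIC bind-mount argument described above.
# $1 is the ASIC index ("" selects the global /host/warmboot directory).
warmboot_mount() {
    echo "-v /host/warmboot$1:/var/warmboot/"
}

# Each per-ASIC container gets its own host directory, while applications
# inside every container keep using the unchanged /var/warmboot/ alias:
warmboot_mount 0   # -v /host/warmboot0:/var/warmboot/
warmboot_mount ""  # -v /host/warmboot:/var/warmboot/
```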
How is consistency maintained?
e.g.
ASIC0 checkpoint → route via ASIC2 system port (saved)
Route update → path changes to via ASIC1
ASIC2 checkpoint → system port saved (now stale)
On restore: ASIC0 has stale route, ASIC2 has unused port
Result: Inconsistent routing table
All ASICs in this design are independent
| ``` |
| As a result, each container operates within its designated ASIC-specific or global directory. Applications inside the container do not require any modifications, as they continue to access warmboot data through the `/var/warmboot/` alias. |
Do the N databases checkpoint in parallel or sequentially? Please specify the execution model and failure handling.
| └── PortChannel102 |
| ... |
| ``` |
Do databases restore with dependencies? Must the global DB restore before the per-ASIC DBs?
Yes, this ordering is enforced by systemd.
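The ordering this reply refers to could be expressed through systemd unit dependencies. The fragment below is illustrative only: the unit names follow the multi-ASIC `database@<asic>.service` convention, and the actual SONiC unit files may differ.

```ini
# database@.service (per-ASIC database instance) -- illustrative fragment
[Unit]
Description=Per-ASIC database container
# Start only after the global database instance has restored its state.
Requires=database.service
After=database.service
```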
| - Every ASIC specific operation is executed for each ASIC in parallel: |
| - e.g: ```docker exec -it swss$dev orchagent_restart_check``` |
| - In case of ASIC failure (e.g: restart_check failure), that ASIC is removed from the list, ASIC is cold booted in startup path |
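The parallel execution with per-ASIC failure handling in the quoted bullets could look roughly like this sketch. `CHECK_CMD` stands in for the real `docker exec swss$dev orchagent_restart_check` call, and `WARM_LIST` is an illustrative name:

```shell
NUM_ASIC=${NUM_ASIC:-4}
CHECK_CMD=${CHECK_CMD:-true}   # real flow: docker exec swss$dev orchagent_restart_check
WARM_LIST=""

# Launch the restart check for every ASIC in parallel.
pids=""
dev=0
while [ "$dev" -lt "$NUM_ASIC" ]; do
    $CHECK_CMD "$dev" &
    pids="$pids $dev:$!"
    dev=$((dev + 1))
done

# Collect results: a failing ASIC is dropped from the warm-boot list and
# will instead be cold booted in the startup path.
for entry in $pids; do
    dev=${entry%%:*}
    if wait "${entry##*:}"; then
        WARM_LIST="$WARM_LIST $dev"
    else
        echo "ASIC$dev failed restart check; cold booting it on startup"
    fi
done
echo "warm boot list:$WARM_LIST"
```

Because each check runs and is evaluated independently, one ASIC's failure never rolls back the warm boot of the healthy ASICs.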
Do we need any synchronization between the Port→Interface→Neighbor→Route phases to prevent cross-ASIC race conditions? Without it, ASIC0 may program routes to ASIC2 before ASIC2 completes interface setup → traffic blackholing.
There are no cross-ASIC interactions in this design - all ASICs are frontend ASICs with no interconnect.
| | Pre-reboot checks | Abort operation | Abort operation | |
| | Orchagent Restart Check | Abort operation | Cold reboot failing ASIC (no rollback of healthy ASICs) | |
| | Syncd Pre-shutdown | Failure is ignored | Failure is ignored | |
Can you confirm the behavior of this feature during image upgrade/downgrade when a multi-asic warm-reboot is used?
Yes, this is applicable to upgrade. Please note that downgrade is not supported, even on single-ASIC systems.
Summary from community review meeting:
Added a warm boot path check for ASIC in the state diagram. Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Here is the HLD review recording link - https://zoom.us/rec/share/S1Re3IFE86dlG5FiBzLge0ktrlJdp-zJ1AZpySFTNrzksl7NEpgthkKiGOqX-3Rh._wH-5m_Pqz0gt-Pp?startTime=1769529435000