Update DOM and Status fields for CMIS Diagnostic Monitoring HLD#1954
Update DOM and Status fields for CMIS Diagnostic Monitoring HLD#1954prgeor merged 4 commits intosonic-net:masterfrom
Conversation
Signed-off-by: Mihir Patel <patelmi@microsoft.com>
|
/azp run |
|
No pipelines are associated with this pull request. |
There was a problem hiding this comment.
Pull Request Overview
This PR updates the diagnostic monitoring documentation for C-CMIS by adjusting field names and comments in the DOM and status diagnostic tables.
- Added C-CMIS specific field sections in multiple parts of the document.
- Updated field names and documentation comments to reflect that flag set/clear times are always initialized to N/A.
Comments suppressed due to low confidence (2)
doc/platform_api/CMIS_Diagnostic_Monitoring_Overview_in_SONiC.md:2413
- The PR description states that the flag last set time should always be initialized to 'N/A', but the documentation here uses 'never'. Please update the comment to reflect 'N/A' for consistency.
`TRANSCEIVER_DOM_FLAG_SET_TIME`: ... initialized, the timestamp is set to `never`.
doc/platform_api/CMIS_Diagnostic_Monitoring_Overview_in_SONiC.md:2416
- The PR description specifies that the flag clear time is initialized to 'N/A', yet the documentation comment states 'never'. Please update this to 'N/A' to match the requirement.
`TRANSCEIVER_DOM_FLAG_CLEAR_TIME`: ... initialized, the timestamp is set to `never`.
… the CMIS Initialization Phase
|
/azp run |
|
No pipelines are associated with this pull request. |
|
/azp run |
|
No pipelines are associated with this pull request. |
| - The timestamp at which the link status was changed, or | ||
| - The polling event timestamp if the flag was set during the routine polling by the `DomInfoUpdateTask` thread. | ||
| - `TRANSCEIVER_DOM_FLAG_CLEAR_TIME`: This table records the timestamp (in local timezone) when each DOM flag was cleared. The timestamp is recorded in the format `Day Mon DD HH:MM:SS YYYY`. During initialization, the timestamp is set to `never` if the flag is set. Since SONiC does not support flag-based interrupt handling, the timestamp refers to either: | ||
| - `TRANSCEIVER_DOM_FLAG_CLEAR_TIME`: This table records the timestamp (in local timezone) when each DOM flag was cleared. The timestamp is recorded in the format `Day Mon DD HH:MM:SS YYYY`. During initialization, the timestamp is set to `never`. Since SONiC does not support flag-based interrupt handling, the timestamp refers to either: |
There was a problem hiding this comment.
@mihirpat1 should be UTC since the syslog is in UTC and easier for correlation with syslog event logs?
| Specifically, the `TRANSCEIVER_STATUS` table contains the `diagnostics_update_interval` field to capture the interval period at which the diagnostic information is updated by the `DomInfoUpdateTask` thread for a port. This field is not present in the other diagnostic tables since the diagnostic information is updated for all ports in a sequential manner. | ||
|
|
||
| 1. **`table_last_update_time`**: | ||
| 1. **`last_update_time`**: |
There was a problem hiding this comment.
@mihirpat1 are we planning to make diagnostics_update_interval visible to user for info?
There was a problem hiding this comment.
@prgeor Yes, we plan to display in the Update interval: section of the CLI output
There was a problem hiding this comment.
Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.
Comments suppressed due to low confidence (2)
doc/platform_api/CMIS_Diagnostic_Monitoring_Overview_in_SONiC.md:1487
- [nitpick] The updated field name 'tx{lane_num}OutputStatus' uses CamelCase, which is inconsistent with the snake_case naming convention used in other keys. Consider renaming it to 'tx_{lane_num}_output_status' for consistency.
tx{lane_num}OutputStatus = BOOLEAN ; tx output status on media lane {lane_num}
doc/platform_api/CMIS_Diagnostic_Monitoring_Overview_in_SONiC.md:1488
- [nitpick] The updated field name 'rx{lane_num}OutputStatusHostlane' uses CamelCase, which is inconsistent with the snake_case naming used elsewhere. Consider renaming it to 'rx_{lane_num}_output_status_hostlane' for consistency.
rx{lane_num}OutputStatusHostlane = BOOLEAN ; rx output status on host lane {lane_num}
|
/azp run |
|
No pipelines are associated with this pull request. |
<!--
Please make sure you've read and understood our contributing guidelines:
https://github.com/Azure/SONiC/blob/gh-pages/CONTRIBUTING.md
** Make sure all your commits include a signature generated with `git commit -s` **
If this is a bug fix, make sure your description includes "closes #xxxx",
"fixes #xxxx" or "resolves #xxxx" so that GitHub automatically closes the related
issue when the PR is merged.
If you are adding/modifying/removing any command or utility script, please also
make sure to add/modify/remove any unit tests from the tests
directory as appropriate.
If you are modifying or removing an existing 'show', 'config' or 'sonic-clear'
subcommand, or you are adding a new subcommand, please make sure you also
update the Command Line Reference Guide (doc/Command-Reference.md) to reflect
your changes.
Please provide the following information:
-->
#### What I did
1. With the recent transceiver DOM and STATUS table related changes, the handles of the following CLIs need to be modified to ensure that the CLI function inline to the existing behavior
1. show int transceiver eeprom -d
2. show int transceiver status
3. show int transceiver pm
4. sfputil show eeprom -d -p EthernetXX
2. Also, all the diagnostics related tables will not be present only for the first subport for breakout ports. Hence, made the corresponding CLI handler related changes to support this
3. Modified the `sfputil show error-status` CLI handler to use `TRANSCEIVER_STATUS_SW` instead of `TRANSCEIVER_STATUS` table
#### How I did it
1. Replaced function call for get_transceiver_bulk_status with get_transceiver_dom_real_value
5. As part of displaying o/p for "show int transceiver status" CLI, we are now fetching data from TRANSCEIVER_DOM_FLAG, VDM flag tables and TRANSCEIVER_STATUS_FLAG tables as well since the relevant fields have been moved to these new tables
6. As part of display o/p for "show int transceiver pm" CLI, we are now fetching data from VDM threshold tables as well since the relevant fields have been moved to these new tables
Related PRs
sonic-net/SONiC#1954
[[cmis] Separate Flag-Specific Fields for DOM and Status APIs by mihirpat1 · Pull Request #556 · sonic-net/sonic-platform-common](sonic-net/sonic-platform-common#556)
[[xcvrd] Re-organize transceiver DOM and STATUS tables by mihirpat1 · Pull Request #604 · sonic-net/sonic-platform-daemons](sonic-net/sonic-platform-daemons#604)
#### How to verify it
Following tests were attempted and confirmed to pass
1. Diagnostic table related verification
1. Compare tables with nightly image and ensure non-CMIS transceivers do not change their mapping for DOM and status tables
2. Compare DOM sensor, threshold and status tables between current and nightly. Ensure units are not shown
3. Dump TRANSCEIVER_INFO table for CMIS transceiver to ensure is_replaceable field is present
2. CLI verification
1. sudo sfputil show error-status - This should dump output for all the ports and the behavior should not change between nightly and private image
2. Use invalid port name (Ethernet2033) to dump show int transceiver info, status CLI. Ensure to note new errors in PR description for invalid ports
3. For breakout scenario, keep 2 subports in different CMIS_STATES and ensure the CLI o/p is different for status CLI for both subports
3. Run transceiver onboarding test due to TRANSCEIVER_INFO table deletion change
4. XCVRD stop/start
1. Stop xcvrd and ensure all diagnostic tables are deleted
2. Start xcvrd and ensure all diagnostic tables are populated
5. Link flap event handling
1. Change link flap count for different types of transceivers and ensure no crash is seen. This is to ensure that the link event handler code does not cause any crash for non-CMIS transceivers
2. Shutdown peer port and ensure the tables get updated which are handled via link event update handler.
Also, execute sfputil reset on 10 different physical ports and ensure that all the corresponding flags on the remote side are updated for all 8 lanes of a DR8 optic
Specifically, we need to ensure that the following flags are set wherein X represents the lane number
1. LOSFlagRxX (page 11, byte 147)
2. CDRLOLFlagRxX (page 11, byte 148)
3. OpticalPowerLowAlarmFlagRxX (page 11, byte 150)
4. OpticalPowerLowWarningFlagRxX (page 11, byte 152)
6. Transceiver OIR test
1. Remove transceiver and ensure all diagnostic related tables are deleted
2. Insert transceiver and ensure all diagnostic tables are populated
3. Stop xcvrd and remove transceiver and ensure TRANSCEIVER_INFO table is present. Start xcvrd and ensure that the TRANSCEIVER_INFO table is now deleted
4. Insert the transceiver now
7. Ensure firmware upgrade is successfull and TRANSCEIVER_FIRMWARE_INFO table changes only for the primary subport eventhough, the upgrade was triggered for a non-primary subport
8. Ensure SFF manager crash is seen with nightly and is resolved with this PR (since the PR has the fix for [SffLoggerForPortUpdateEvent class does not have log_debug function · Issue #605 · sonic-net/sonic-platform-daemons](sonic-net/sonic-platform-daemons#605))
#### Previous command output (if the output of a command-line utility has changed)
The output of show int transceiver info and show int transceiver eeprom will change if an invalid port name is passed to the CLI
```
root@str4-sn5600-3:/home/admin# show int transceiver info Ethernet8
Ethernet8: SFP EEPROM Not detected
root@str4-sn5600-3:/home/admin#
```
#### New command output (if the output of a command-line utility has changed)
```
root@str4-sn5600-3:/home/admin# show int transceiver info Ethernet8
Error: Found KeyError while getting first subport for Ethernet8
Error: Unable to get first subport for Ethernet8 while converting SFP info
Ethernet8: SFP EEPROM Not detected
root@str4-sn5600-3:/home/admin#
```
MSFT ADO - 32238285
<!-- Provide a general summary of your changes in the Title above -->
#### Description
<!--
Describe your changes in detail
-->
The DOM and Status related APIs need to be modified to ensure that the fields which have latched registers are moved to the flag specific API.
Also, all the VDM specific fields have now been removed from DOM and Status related APIs since separate APIs pertaining to processing VDM data have now been created.
This change is in line to the requirements of [SONiC/doc/platform_api/CMIS_Diagnostic_Monitoring_Overview_in_SONiC.md at master · sonic-net/SONiC](https://github.com/sonic-net/SONiC/blob/master/doc/platform_api/CMIS_Diagnostic_Monitoring_Overview_in_SONiC.md)
#### Motivation and Context
<!--
Why is this change required? What problem does it solve?
If this pull request closes/resolves an open Issue, make sure you
include the text "fixes #xxxx", "closes #xxxx" or "resolves #xxxx" here
-->
N/A
#### How Has This Been Tested?
<!--
Please describe in detail how you tested your changes.
Include details of your testing environment, and the tests you ran to
see how your change affects other areas of the code, etc.
-->
Please refer to sonic-net/sonic-utilities#3844 for the test details
#### Additional Information (Optional)
MSFT ADO - 30090754
Related PRs
sonic-net/SONiC#1954
sonic-net/sonic-platform-daemons#604
sonic-net/sonic-utilities#3844
…es (#176) <!-- Please make sure you've read and understood our contributing guidelines: https://github.com/Azure/SONiC/blob/gh-pages/CONTRIBUTING.md failure_prs.log skip_prs.log Make sure all your commits include a signature generated with `git commit -s` ** If this is a bug fix, make sure your description includes "closes #xxxx", "fixes #xxxx" or "resolves #xxxx" so that GitHub automatically closes the related issue when the PR is merged. If you are adding/modifying/removing any command or utility script, please also make sure to add/modify/remove any unit tests from the tests directory as appropriate. If you are modifying or removing an existing 'show', 'config' or 'sonic-clear' subcommand, or you are adding a new subcommand, please make sure you also update the Command Line Reference Guide (doc/Command-Reference.md) to reflect your changes. Please provide the following information: --> #### What I did 1. With the recent transceiver DOM and STATUS table related changes, the handles of the following CLIs need to be modified to ensure that the CLI function inline to the existing behavior 1. show int transceiver eeprom -d 2. show int transceiver status 3. show int transceiver pm 4. sfputil show eeprom -d -p EthernetXX 2. Also, all the diagnostics related tables will not be present only for the first subport for breakout ports. Hence, made the corresponding CLI handler related changes to support this 3. Modified the `sfputil show error-status` CLI handler to use `TRANSCEIVER_STATUS_SW` instead of `TRANSCEIVER_STATUS` table #### How I did it 1. Replaced function call for get_transceiver_bulk_status with get_transceiver_dom_real_value 5. As part of displaying o/p for "show int transceiver status" CLI, we are now fetching data from TRANSCEIVER_DOM_FLAG, VDM flag tables and TRANSCEIVER_STATUS_FLAG tables as well since the relevant fields have been moved to these new tables 6. As part of display o/p for "show int transceiver pm" CLI, we are now fetching data from VDM threshold tables as well since the relevant fields have been moved to these new tables Related PRs sonic-net/SONiC#1954 [[cmis] Separate Flag-Specific Fields for DOM and Status APIs by mihirpat1 · Pull Request #556 · sonic-net/sonic-platform-common](sonic-net/sonic-platform-common#556) [[xcvrd] Re-organize transceiver DOM and STATUS tables by mihirpat1 · Pull Request #604 · sonic-net/sonic-platform-daemons](sonic-net/sonic-platform-daemons#604) #### How to verify it Following tests were attempted and confirmed to pass 1. Diagnostic table related verification 1. Compare tables with nightly image and ensure non-CMIS transceivers do not change their mapping for DOM and status tables 2. Compare DOM sensor, threshold and status tables between current and nightly. Ensure units are not shown 3. Dump TRANSCEIVER_INFO table for CMIS transceiver to ensure is_replaceable field is present 2. CLI verification 1. sudo sfputil show error-status - This should dump output for all the ports and the behavior should not change between nightly and private image 2. Use invalid port name (Ethernet2033) to dump show int transceiver info, status CLI. Ensure to note new errors in PR description for invalid ports 3. For breakout scenario, keep 2 subports in different CMIS_STATES and ensure the CLI o/p is different for status CLI for both subports 3. Run transceiver onboarding test due to TRANSCEIVER_INFO table deletion change 4. XCVRD stop/start 1. Stop xcvrd and ensure all diagnostic tables are deleted 2. Start xcvrd and ensure all diagnostic tables are populated 5. Link flap event handling 1. Change link flap count for different types of transceivers and ensure no crash is seen. This is to ensure that the link event handler code does not cause any crash for non-CMIS transceivers 2. Shutdown peer port and ensure the tables get updated which are handled via link event update handler. Also, execute sfputil reset on 10 different physical ports and ensure that all the corresponding flags on the remote side are updated for all 8 lanes of a DR8 optic Specifically, we need to ensure that the following flags are set wherein X represents the lane number 1. LOSFlagRxX (page 11, byte 147) 2. CDRLOLFlagRxX (page 11, byte 148) 3. OpticalPowerLowAlarmFlagRxX (page 11, byte 150) 4. OpticalPowerLowWarningFlagRxX (page 11, byte 152) 6. Transceiver OIR test 1. Remove transceiver and ensure all diagnostic related tables are deleted 2. Insert transceiver and ensure all diagnostic tables are populated 3. Stop xcvrd and remove transceiver and ensure TRANSCEIVER_INFO table is present. Start xcvrd and ensure that the TRANSCEIVER_INFO table is now deleted 4. Insert the transceiver now 7. Ensure firmware upgrade is successfull and TRANSCEIVER_FIRMWARE_INFO table changes only for the primary subport eventhough, the upgrade was triggered for a non-primary subport 8. Ensure SFF manager crash is seen with nightly and is resolved with this PR (since the PR has the fix for [SffLoggerForPortUpdateEvent class does not have log_debug function · Issue #605 · sonic-net/sonic-platform-daemons](sonic-net/sonic-platform-daemons#605)) #### Previous command output (if the output of a command-line utility has changed) The output of show int transceiver info and show int transceiver eeprom will change if an invalid port name is passed to the CLI ``` root@str4-sn5600-3:/home/admin# show int transceiver info Ethernet8 Ethernet8: SFP EEPROM Not detected root@str4-sn5600-3:/home/admin# ``` #### New command output (if the output of a command-line utility has changed) ``` root@str4-sn5600-3:/home/admin# show int transceiver info Ethernet8 Error: Found KeyError while getting first subport for Ethernet8 Error: Unable to get first subport for Ethernet8 while converting SFP info Ethernet8: SFP EEPROM Not detected root@str4-sn5600-3:/home/admin# ``` MSFT ADO - 32238285
<!-- Provide a general summary of your changes in the Title above --> #### Description <!-- Describe your changes in detail --> The DOM and Status related APIs need to be modified to ensure that the fields which have latched registers are moved to the flag specific API. Also, all the VDM specific fields have now been removed from DOM and Status related APIs since separate APIs pertaining to processing VDM data have now been created. This change is in line to the requirements of [SONiC/doc/platform_api/CMIS_Diagnostic_Monitoring_Overview_in_SONiC.md at master · sonic-net/SONiC](https://github.com/sonic-net/SONiC/blob/master/doc/platform_api/CMIS_Diagnostic_Monitoring_Overview_in_SONiC.md) #### Motivation and Context <!-- Why is this change required? What problem does it solve? If this pull request closes/resolves an open Issue, make sure you include the text "fixes #xxxx", "closes #xxxx" or "resolves #xxxx" here --> N/A #### How Has This Been Tested? <!-- Please describe in detail how you tested your changes. Include details of your testing environment, and the tests you ran to see how your change affects other areas of the code, etc. --> Please refer to sonic-net/sonic-utilities#3844 for the test details #### Additional Information (Optional) MSFT ADO - 30090754 Related PRs sonic-net/SONiC#1954 sonic-net/sonic-platform-daemons#604 sonic-net/sonic-utilities#3844
The current changes handle the following
Updating Diagnostic Information During the CMIS Initialization Phasesectionneveras part of the table creationtable_last_update_timetolast_update_timeMSFT ADO - 32080519