Skip to content

Update DOM and Status fields for CMIS Diagnostic Monitoring HLD#1954

Merged
prgeor merged 4 commits intosonic-net:masterfrom
mihirpat1:ccmis_field_separation
Apr 4, 2025
Merged

Update DOM and Status fields for CMIS Diagnostic Monitoring HLD#1954
prgeor merged 4 commits intosonic-net:masterfrom
mihirpat1:ccmis_field_separation

Conversation

@mihirpat1
Copy link
Copy Markdown
Contributor

@mihirpat1 mihirpat1 commented Apr 1, 2025

The current changes handle the following

  1. Added Updating Diagnostic Information During the CMIS Initialization Phase section
  2. Added more information on the FLAG related tables
  3. Specifying C-CMIS specific fields for DOM and status diagnostic tables
  4. Clearly stating that the flag last set and clear time will always be initialized to never as part of the table creation
  5. Renamed table_last_update_time to last_update_time

MSFT ADO - 32080519

Signed-off-by: Mihir Patel <patelmi@microsoft.com>
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@mihirpat1 mihirpat1 requested a review from Copilot April 1, 2025 02:36
@azure-pipelines
Copy link
Copy Markdown

No pipelines are associated with this pull request.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR updates the diagnostic monitoring documentation for C-CMIS by adjusting field names and comments in the DOM and status diagnostic tables.

  • Added C-CMIS specific field sections in multiple parts of the document.
  • Updated field names and documentation comments to reflect that flag set/clear times are always initialized to N/A.
Comments suppressed due to low confidence (2)

doc/platform_api/CMIS_Diagnostic_Monitoring_Overview_in_SONiC.md:2413

  • The PR description states that the flag last set time should always be initialized to 'N/A', but the documentation here uses 'never'. Please update the comment to reflect 'N/A' for consistency.
`TRANSCEIVER_DOM_FLAG_SET_TIME`: ... initialized, the timestamp is set to `never`.

doc/platform_api/CMIS_Diagnostic_Monitoring_Overview_in_SONiC.md:2416

  • The PR description specifies that the flag clear time is initialized to 'N/A', yet the documentation comment states 'never'. Please update this to 'N/A' to match the requirement.
`TRANSCEIVER_DOM_FLAG_CLEAR_TIME`: ... initialized, the timestamp is set to `never`.

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

No pipelines are associated with this pull request.

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

No pipelines are associated with this pull request.

- The timestamp at which the link status was changed, or
- The polling event timestamp if the flag was set during the routine polling by the `DomInfoUpdateTask` thread.
- `TRANSCEIVER_DOM_FLAG_CLEAR_TIME`: This table records the timestamp (in local timezone) when each DOM flag was cleared. The timestamp is recorded in the format `Day Mon DD HH:MM:SS YYYY`. During initialization, the timestamp is set to `never` if the flag is set. Since SONiC does not support flag-based interrupt handling, the timestamp refers to either:
- `TRANSCEIVER_DOM_FLAG_CLEAR_TIME`: This table records the timestamp (in local timezone) when each DOM flag was cleared. The timestamp is recorded in the format `Day Mon DD HH:MM:SS YYYY`. During initialization, the timestamp is set to `never`. Since SONiC does not support flag-based interrupt handling, the timestamp refers to either:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mihirpat1 should be UTC since the syslog is in UTC and easier for correlation with syslog event logs?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prgeor I have now changed it to UTC

Specifically, the `TRANSCEIVER_STATUS` table contains the `diagnostics_update_interval` field to capture the interval period at which the diagnostic information is updated by the `DomInfoUpdateTask` thread for a port. This field is not present in the other diagnostic tables since the diagnostic information is updated for all ports in a sequential manner.

1. **`table_last_update_time`**:
1. **`last_update_time`**:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mihirpat1 are we planning to make diagnostics_update_interval visible to user for info?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prgeor Yes, we plan to display in the Update interval: section of the CLI output

@mihirpat1 mihirpat1 requested a review from Copilot April 4, 2025 15:36
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (2)

doc/platform_api/CMIS_Diagnostic_Monitoring_Overview_in_SONiC.md:1487

  • [nitpick] The updated field name 'tx{lane_num}OutputStatus' uses CamelCase, which is inconsistent with the snake_case naming convention used in other keys. Consider renaming it to 'tx_{lane_num}_output_status' for consistency.
tx{lane_num}OutputStatus                = BOOLEAN           ; tx output status on media lane {lane_num}

doc/platform_api/CMIS_Diagnostic_Monitoring_Overview_in_SONiC.md:1488

  • [nitpick] The updated field name 'rx{lane_num}OutputStatusHostlane' uses CamelCase, which is inconsistent with the snake_case naming used elsewhere. Consider renaming it to 'rx_{lane_num}_output_status_hostlane' for consistency.
rx{lane_num}OutputStatusHostlane        = BOOLEAN           ; rx output status on host lane {lane_num}

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

No pipelines are associated with this pull request.

@prgeor prgeor merged commit 0f29929 into sonic-net:master Apr 4, 2025
1 check passed
mssonicbld added a commit to mssonicbld/sonic-utilities.msft that referenced this pull request May 9, 2025
<!--
    Please make sure you've read and understood our contributing guidelines:
    https://github.com/Azure/SONiC/blob/gh-pages/CONTRIBUTING.md

    ** Make sure all your commits include a signature generated with `git commit -s` **

    If this is a bug fix, make sure your description includes "closes #xxxx",
    "fixes #xxxx" or "resolves #xxxx" so that GitHub automatically closes the related
    issue when the PR is merged.

    If you are adding/modifying/removing any command or utility script, please also
    make sure to add/modify/remove any unit tests from the tests
    directory as appropriate.

    If you are modifying or removing an existing 'show', 'config' or 'sonic-clear'
    subcommand, or you are adding a new subcommand, please make sure you also
    update the Command Line Reference Guide (doc/Command-Reference.md) to reflect
    your changes.

    Please provide the following information:
-->

#### What I did
1. With the recent transceiver DOM and STATUS table related changes, the handles of the following CLIs need to be modified to ensure that the CLI function inline to the existing behavior
    1. show int transceiver eeprom -d
    2. show int transceiver status
    3. show int transceiver pm
    4. sfputil show eeprom -d -p EthernetXX

2. Also, all the diagnostics related tables will not be present only for the first subport for breakout ports. Hence, made the corresponding CLI handler related changes to support this
3. Modified the `sfputil show error-status` CLI handler to use `TRANSCEIVER_STATUS_SW` instead of `TRANSCEIVER_STATUS` table

#### How I did it
1. Replaced function call for get_transceiver_bulk_status with get_transceiver_dom_real_value
5. As part of displaying o/p for "show int transceiver status" CLI, we are now fetching data from TRANSCEIVER_DOM_FLAG, VDM flag tables and TRANSCEIVER_STATUS_FLAG tables as well since the relevant fields have been moved to these new tables
6. As part of display o/p for "show int transceiver pm" CLI, we are now fetching data from VDM threshold tables as well since the relevant fields have been moved to these new tables

Related PRs
sonic-net/SONiC#1954
[[cmis] Separate Flag-Specific Fields for DOM and Status APIs by mihirpat1 · Pull Request #556 · sonic-net/sonic-platform-common](sonic-net/sonic-platform-common#556)
[[xcvrd] Re-organize transceiver DOM and STATUS tables by mihirpat1 · Pull Request #604 · sonic-net/sonic-platform-daemons](sonic-net/sonic-platform-daemons#604)

#### How to verify it
Following tests were attempted and confirmed to pass

1. Diagnostic table related verification
	1. Compare tables with nightly image and ensure non-CMIS transceivers do not change their mapping for DOM and status tables
	2. Compare DOM sensor, threshold and status tables between current and nightly. Ensure units are not shown
	3. Dump TRANSCEIVER_INFO table for CMIS transceiver to ensure is_replaceable field is present
2. CLI verification
	1. sudo sfputil show error-status - This should dump output for all the ports and the behavior should not change between nightly and private image
	2. Use invalid port name (Ethernet2033) to dump show int transceiver info, status CLI. Ensure to note new errors in PR description for invalid ports
	3. For breakout scenario, keep 2 subports in different CMIS_STATES and ensure the CLI o/p is different for status CLI for both subports
3. Run transceiver onboarding test due to TRANSCEIVER_INFO table deletion change
4. XCVRD stop/start
	1. Stop xcvrd and ensure all diagnostic tables are deleted
	2. Start xcvrd and ensure all diagnostic tables are populated
5. Link flap event handling
	1. Change link flap count for different types of transceivers and ensure no crash is seen. This is to ensure that the link event handler code does not cause any crash for non-CMIS transceivers
	2. Shutdown peer port and ensure the tables get updated which are handled via link event update handler.
		Also, execute sfputil reset on 10 different physical ports and ensure that all the corresponding flags on the remote side are updated for all 8 lanes of a DR8 optic
		Specifically, we need to ensure that the following flags are set wherein X represents the lane number
		1. LOSFlagRxX (page 11, byte 147)
		2. CDRLOLFlagRxX (page 11, byte 148)
		3. OpticalPowerLowAlarmFlagRxX (page 11, byte 150)
		4. OpticalPowerLowWarningFlagRxX (page 11, byte 152)
6. Transceiver OIR test
	1. Remove transceiver and ensure all diagnostic related tables are deleted
	2. Insert transceiver and ensure all diagnostic tables are populated
	3. Stop xcvrd and remove transceiver and ensure TRANSCEIVER_INFO table is present. Start xcvrd and ensure that the TRANSCEIVER_INFO table is now deleted
	4. Insert the transceiver now
7. Ensure firmware upgrade is successfull and TRANSCEIVER_FIRMWARE_INFO table changes only for the primary subport eventhough, the upgrade was triggered for a non-primary subport
8. Ensure SFF manager crash is seen with nightly and is resolved with this PR (since the PR has the fix for [SffLoggerForPortUpdateEvent class does not have log_debug function · Issue #605 · sonic-net/sonic-platform-daemons](sonic-net/sonic-platform-daemons#605))

#### Previous command output (if the output of a command-line utility has changed)
The output of show int transceiver info and show int transceiver eeprom will change if an invalid port name is passed to the CLI

```
root@str4-sn5600-3:/home/admin# show int transceiver info Ethernet8
Ethernet8: SFP EEPROM Not detected
root@str4-sn5600-3:/home/admin#
```

#### New command output (if the output of a command-line utility has changed)
```
root@str4-sn5600-3:/home/admin# show int transceiver info Ethernet8
Error: Found KeyError while getting first subport for Ethernet8
Error: Unable to get first subport for Ethernet8 while converting SFP info
Ethernet8: SFP EEPROM Not detected
root@str4-sn5600-3:/home/admin#
```
MSFT ADO - 32238285
mssonicbld added a commit to mssonicbld/sonic-platform-common.msft that referenced this pull request May 9, 2025
<!-- Provide a general summary of your changes in the Title above -->

#### Description
<!--
     Describe your changes in detail
-->
The DOM and Status related APIs need to be modified to ensure that the fields which have latched registers are moved to the flag specific API.
Also, all the VDM specific fields have now been removed from DOM and Status related APIs since separate APIs pertaining to processing VDM data have now been created.

This change is in line to the requirements of [SONiC/doc/platform_api/CMIS_Diagnostic_Monitoring_Overview_in_SONiC.md at master · sonic-net/SONiC](https://github.com/sonic-net/SONiC/blob/master/doc/platform_api/CMIS_Diagnostic_Monitoring_Overview_in_SONiC.md)

#### Motivation and Context
<!--
     Why is this change required? What problem does it solve?
     If this pull request closes/resolves an open Issue, make sure you
     include the text "fixes #xxxx", "closes #xxxx" or "resolves #xxxx" here
-->
N/A

#### How Has This Been Tested?
<!--
     Please describe in detail how you tested your changes.
     Include details of your testing environment, and the tests you ran to
     see how your change affects other areas of the code, etc.
-->
Please refer to sonic-net/sonic-utilities#3844 for the test details

#### Additional Information (Optional)
MSFT ADO - 30090754

Related PRs
sonic-net/SONiC#1954
sonic-net/sonic-platform-daemons#604
sonic-net/sonic-utilities#3844
mssonicbld added a commit to Azure/sonic-utilities.msft that referenced this pull request May 9, 2025
…es (#176)

<!--
 Please make sure you've read and understood our contributing guidelines:
 https://github.com/Azure/SONiC/blob/gh-pages/CONTRIBUTING.md

 failure_prs.log skip_prs.log Make sure all your commits include a signature generated with `git commit -s` **

 If this is a bug fix, make sure your description includes "closes #xxxx",
 "fixes #xxxx" or "resolves #xxxx" so that GitHub automatically closes the related
 issue when the PR is merged.

 If you are adding/modifying/removing any command or utility script, please also
 make sure to add/modify/remove any unit tests from the tests
 directory as appropriate.

 If you are modifying or removing an existing 'show', 'config' or 'sonic-clear'
 subcommand, or you are adding a new subcommand, please make sure you also
 update the Command Line Reference Guide (doc/Command-Reference.md) to reflect
 your changes.

 Please provide the following information:
-->

#### What I did
1. With the recent transceiver DOM and STATUS table related changes, the handles of the following CLIs need to be modified to ensure that the CLI function inline to the existing behavior
 1. show int transceiver eeprom -d
 2. show int transceiver status
 3. show int transceiver pm
 4. sfputil show eeprom -d -p EthernetXX

2. Also, all the diagnostics related tables will not be present only for the first subport for breakout ports. Hence, made the corresponding CLI handler related changes to support this
3. Modified the `sfputil show error-status` CLI handler to use `TRANSCEIVER_STATUS_SW` instead of `TRANSCEIVER_STATUS` table

#### How I did it
1. Replaced function call for get_transceiver_bulk_status with get_transceiver_dom_real_value
5. As part of displaying o/p for "show int transceiver status" CLI, we are now fetching data from TRANSCEIVER_DOM_FLAG, VDM flag tables and TRANSCEIVER_STATUS_FLAG tables as well since the relevant fields have been moved to these new tables
6. As part of display o/p for "show int transceiver pm" CLI, we are now fetching data from VDM threshold tables as well since the relevant fields have been moved to these new tables

Related PRs
sonic-net/SONiC#1954
[[cmis] Separate Flag-Specific Fields for DOM and Status APIs by mihirpat1 · Pull Request #556 · sonic-net/sonic-platform-common](sonic-net/sonic-platform-common#556)
[[xcvrd] Re-organize transceiver DOM and STATUS tables by mihirpat1 · Pull Request #604 · sonic-net/sonic-platform-daemons](sonic-net/sonic-platform-daemons#604)

#### How to verify it
Following tests were attempted and confirmed to pass

1. Diagnostic table related verification
	1. Compare tables with nightly image and ensure non-CMIS transceivers do not change their mapping for DOM and status tables
	2. Compare DOM sensor, threshold and status tables between current and nightly. Ensure units are not shown
	3. Dump TRANSCEIVER_INFO table for CMIS transceiver to ensure is_replaceable field is present
2. CLI verification
	1. sudo sfputil show error-status - This should dump output for all the ports and the behavior should not change between nightly and private image
	2. Use invalid port name (Ethernet2033) to dump show int transceiver info, status CLI. Ensure to note new errors in PR description for invalid ports
	3. For breakout scenario, keep 2 subports in different CMIS_STATES and ensure the CLI o/p is different for status CLI for both subports
3. Run transceiver onboarding test due to TRANSCEIVER_INFO table deletion change
4. XCVRD stop/start
	1. Stop xcvrd and ensure all diagnostic tables are deleted
	2. Start xcvrd and ensure all diagnostic tables are populated
5. Link flap event handling
	1. Change link flap count for different types of transceivers and ensure no crash is seen. This is to ensure that the link event handler code does not cause any crash for non-CMIS transceivers
	2. Shutdown peer port and ensure the tables get updated which are handled via link event update handler.
		Also, execute sfputil reset on 10 different physical ports and ensure that all the corresponding flags on the remote side are updated for all 8 lanes of a DR8 optic
		Specifically, we need to ensure that the following flags are set wherein X represents the lane number
		1. LOSFlagRxX (page 11, byte 147)
		2. CDRLOLFlagRxX (page 11, byte 148)
		3. OpticalPowerLowAlarmFlagRxX (page 11, byte 150)
		4. OpticalPowerLowWarningFlagRxX (page 11, byte 152)
6. Transceiver OIR test
	1. Remove transceiver and ensure all diagnostic related tables are deleted
	2. Insert transceiver and ensure all diagnostic tables are populated
	3. Stop xcvrd and remove transceiver and ensure TRANSCEIVER_INFO table is present. Start xcvrd and ensure that the TRANSCEIVER_INFO table is now deleted
	4. Insert the transceiver now
7. Ensure firmware upgrade is successfull and TRANSCEIVER_FIRMWARE_INFO table changes only for the primary subport eventhough, the upgrade was triggered for a non-primary subport
8. Ensure SFF manager crash is seen with nightly and is resolved with this PR (since the PR has the fix for [SffLoggerForPortUpdateEvent class does not have log_debug function · Issue #605 · sonic-net/sonic-platform-daemons](sonic-net/sonic-platform-daemons#605))

#### Previous command output (if the output of a command-line utility has changed)
The output of show int transceiver info and show int transceiver eeprom will change if an invalid port name is passed to the CLI

```
root@str4-sn5600-3:/home/admin# show int transceiver info Ethernet8
Ethernet8: SFP EEPROM Not detected
root@str4-sn5600-3:/home/admin#
```

#### New command output (if the output of a command-line utility has changed)
```
root@str4-sn5600-3:/home/admin# show int transceiver info Ethernet8
Error: Found KeyError while getting first subport for Ethernet8
Error: Unable to get first subport for Ethernet8 while converting SFP info
Ethernet8: SFP EEPROM Not detected
root@str4-sn5600-3:/home/admin#
```
MSFT ADO - 32238285
mssonicbld added a commit to Azure/sonic-platform-common.msft that referenced this pull request May 9, 2025
<!-- Provide a general summary of your changes in the Title above -->

#### Description
<!--
 Describe your changes in detail
-->
The DOM and Status related APIs need to be modified to ensure that the fields which have latched registers are moved to the flag specific API.
Also, all the VDM specific fields have now been removed from DOM and Status related APIs since separate APIs pertaining to processing VDM data have now been created.

This change is in line to the requirements of [SONiC/doc/platform_api/CMIS_Diagnostic_Monitoring_Overview_in_SONiC.md at master · sonic-net/SONiC](https://github.com/sonic-net/SONiC/blob/master/doc/platform_api/CMIS_Diagnostic_Monitoring_Overview_in_SONiC.md)

#### Motivation and Context
<!--
 Why is this change required? What problem does it solve?
 If this pull request closes/resolves an open Issue, make sure you
 include the text "fixes #xxxx", "closes #xxxx" or "resolves #xxxx" here
-->
N/A

#### How Has This Been Tested?
<!--
 Please describe in detail how you tested your changes.
 Include details of your testing environment, and the tests you ran to
 see how your change affects other areas of the code, etc.
-->
Please refer to sonic-net/sonic-utilities#3844 for the test details

#### Additional Information (Optional)
MSFT ADO - 30090754

Related PRs
sonic-net/SONiC#1954
sonic-net/sonic-platform-daemons#604
sonic-net/sonic-utilities#3844
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants