Skip to content

[xcvrd] Re-organize transceiver DOM and STATUS tables + enable link event handling for DOM monitoring + DOM polling based on physical port#604

Merged
prgeor merged 18 commits intosonic-net:masterfrom
mihirpat1:dom_status_tbl_flag_support
May 6, 2025
Merged

[xcvrd] Re-organize transceiver DOM and STATUS tables + enable link event handling for DOM monitoring + DOM polling based on physical port#604
prgeor merged 18 commits intosonic-net:masterfrom
mihirpat1:dom_status_tbl_flag_support

Conversation

@mihirpat1
Copy link
Copy Markdown
Contributor

@mihirpat1 mihirpat1 commented Apr 3, 2025

Description

  1. The transceiver DOM and Status tables are now being enhanced to be in-line with HLD CMIS_Diagnostic_Monitoring_Overview_in_SONiC
  2. Relevant metadata are being added for DOM and Status related tables to capture all flag changes
  3. Ensuring that the change count is initialized to 0 and last set/clear time fields are set to N/A if the corresponding metadata table is created for the first time post xcvrd boot-up or transceiver insertion.
  4. All the diagnostic tables (except for metadata tables) will now have last_update_time field to capture the last table update time.
    Metadata tables update information only if there is a change in value between the current read and previously read value. Hence, not planning to add last_update_time field for metadata tables
  5. Created TRANSCEIVER_STATUS_SW table to store information for the following fields
    • cmis_state
    • status
    • error
  6. Removed the diagnostic tables for non-primary subports for breakout ports (except for TRANSCEIVER_STATUS_SW table). Also, modified _handle_port_add function to naturally sort the logical port list stored in physical_to_logical dictionary.
  7. Enabled DOM polling based on physical port and hence, populating diagnostic tables only for the first subports for breakout port configurations.
  8. Enabled support to capture flag relevant diagnostic tables as part of link change event handling. Please refer to https://github.com/sonic-net/SONiC/blob/master/doc/platform_api/CMIS_Diagnostic_Monitoring_Overview_in_SONiC.md#522-link-change-event-detection for further details

Unrelated issues being addressed

  1. Added back is_replaceable field to TRANSCEIVER_INFO table for CMIS optics. This field was removed by a recent commit
  2. Removed an error message if get_transceiver_vdm_thresholds raises NotImplementedError exception. This is done to avoid showing error message for non-CMIS modules since these module do not support VDM functionality
  3. Deleting TRANSCEIVER_INFO table upon xcvrd initialization if transceiver is not present anymore. This scenario can happen if the transceiver was removed while xcvrd was not running
  4. Fixes [Master] SffLoggerForPortUpdateEvent class does not have log_debug function #605
  5. Fixes Bug: non-CMIS transceivers will fail to get VDM thresholds and log error is printed sonic-buildimage#22220
  6. Using SysLogger for DomInfoUpdateTask instead of using xcvrd's helper_logger
  7. Modified post_port_active_apsel_to_db to initialize active_apsel_hostlaneX to N/A for unrelated lanes wherein X represents the lane number from 1 to 8

Motivation and Context

All the changes in this PR are done to ensure that xcvrd is in-line with the HLD CMIS_Diagnostic_Monitoring_Overview_in_SONiC

How Has This Been Tested?

Please refer to sonic-net/sonic-utilities#3844 for the test details

Additional Information (Optional)

Related PRs
sonic-net/SONiC#1954
sonic-net/sonic-platform-common#556
sonic-net/sonic-utilities#3844

Fixes sonic-net/sonic-buildimage#22150

MSFT ADO - 29932118

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@mihirpat1 mihirpat1 marked this pull request as ready for review April 3, 2025 23:17
@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@mihirpat1 mihirpat1 requested review from Copilot and prgeor April 3, 2025 23:17
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 13 out of 13 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (1)

sonic-xcvrd/xcvrd/dom/utilities/dom_sensor/db_utils.py:26

  • Consider using an f-string for proper variable interpolation in this error message. For example, change the line to: f"Post port dom sensor info to db failed for {logical_port_name} as no asic index found".
self.logger.log_error("Post port dom sensor info to db failed for {logical_port_name} as no asic index found")

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.


asic_index = self.port_mapping.get_asic_id_for_logical_port(logical_port_name)
if asic_index is None:
self.logger.log_error("Post port dom flags to db failed for {logical_port_name} "
Copy link

Copilot AI Apr 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The subsequent log_error call uses "{logical_port_name}" without f-string interpolation. Update the log message to use f"..." so that the actual logical_port_name is printed.

Suggested change
self.logger.log_error("Post port dom flags to db failed for {logical_port_name} "
self.logger.log_error(f"Post port dom flags to db failed for {logical_port_name} "

Copilot uses AI. Check for mistakes.
def post_port_dom_thresholds_to_db(self, logical_port_name, db_cache=None):
asic_index = self.port_mapping.get_asic_id_for_logical_port(logical_port_name)
if asic_index is None:
self.logger.log_error("Post port dom thresholds to db failed for {logical_port_name} "
Copy link

Copilot AI Apr 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The log_error message below also uses "{logical_port_name}" without proper interpolation. Converting the log message to an f-string will ensure the correct value is displayed.

Suggested change
self.logger.log_error("Post port dom thresholds to db failed for {logical_port_name} "
self.logger.log_error(f"Post port dom thresholds to db failed for {logical_port_name} "

Copilot uses AI. Check for mistakes.
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@mihirpat1 mihirpat1 requested a review from Copilot April 3, 2025 23:36
@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 13 out of 13 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (2)

sonic-xcvrd/xcvrd/xcvrd_utilities/xcvr_table_helper.py:21

  • For consistency with the other table constants that include a '_TABLE' suffix, consider renaming this constant to 'TRANSCEIVER_STATUS_FLAG_CHANGE_COUNT_TABLE'.
TRANSCEIVER_STATUS_FLAG_CHANGE_COUNT = 'TRANSCEIVER_STATUS_FLAG_CHANGE_COUNT'

sonic-xcvrd/xcvrd/xcvrd.py:252

  • The word 'execption' appears to be a typo; consider correcting it to 'exception'.
#continue to process next port since execption could be raised due to port reset, transceiver removal

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@mihirpat1 mihirpat1 requested a review from Copilot April 4, 2025 00:17
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 13 out of 13 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (1)

sonic-xcvrd/xcvrd/dom/utilities/vdm/db_utils.py:90

  • The variable 'flag_data' is used in the metadata update block without being defined in the current scope. Please review and define 'flag_data' or adjust the condition accordingly.
if flag_data and threshold_value_dict:

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@mihirpat1 mihirpat1 requested a review from Copilot April 4, 2025 17:13
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 13 out of 13 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (3)

sonic-xcvrd/xcvrd/xcvrd_utilities/xcvr_table_helper.py:14

  • Ensure that the naming convention for the new DOM and Status flag metadata tables remains consistent with existing table names and is clearly documented in the HLD.
TRANSCEIVER_DOM_FLAG_TABLE = 'TRANSCEIVER_DOM_FLAG'

sonic-xcvrd/xcvrd/dom/utilities/db/utils.py:124

  • [nitpick] For flags that are not already present in the database metadata, consider initializing their metadata (e.g., change count and timestamp fields) rather than skipping the update; this will ensure all new flags get proper metadata tracking.
if flag_key in db_flags_value_dict and db_flags_value_dict[flag_key] != str(curr_flag_value):

sonic-xcvrd/xcvrd/dom/utilities/vdm/db_utils.py:109

  • Verify that appending 'last_update_time' as a field in the FieldValuePairs conforms to the expected database schema and does not disrupt consumers of the table data.
[('last_update_time', self.get_current_time())]

OriTrabelsi added a commit to OriTrabelsi/sonic-mgmt that referenced this pull request Jun 12, 2025
Call check_transceiver_dom_sensor_basic only for first_port_in_split.
Call check_transceiver_dom_sensor_details only for first_port_in_split.

This aligns with the OS change where TRANSCEIVER_DOM_SENSOR values are available only for the first port in each split.
See PR: sonic-net/sonic-platform-daemons#604
mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this pull request Jun 19, 2025
Description of PR
Summary: Skip last_update_time per design.
Fixes # (issue) ADO: 32970571

Approach
What is the motivation for this PR?
Due to recent change in sonic-net/sonic-platform-daemons#604, the last_udpate_time was added. It should be skipped for comparison per design.
The last_update_time field was added in four place including post_diagnostic_values_to_db, post_port_dom_flags_to_db, post_port_transceiver_hw_status_flags_to_db, post_port_vdm_thresholds_to_db.
So there are many table involved such as TRANSCEIVER_STATUS, TRANSCEIVER_STATUS_FLAG. Thus skip in the val comparison.

How did you do it?
Skip the last_update_time for comparison

How did you verify/test it?
E2E
mssonicbld pushed a commit to sonic-net/sonic-mgmt that referenced this pull request Jun 20, 2025
Description of PR
Summary: Skip last_update_time per design.
Fixes # (issue) ADO: 32970571

Approach
What is the motivation for this PR?
Due to recent change in sonic-net/sonic-platform-daemons#604, the last_udpate_time was added. It should be skipped for comparison per design.
The last_update_time field was added in four place including post_diagnostic_values_to_db, post_port_dom_flags_to_db, post_port_transceiver_hw_status_flags_to_db, post_port_vdm_thresholds_to_db.
So there are many table involved such as TRANSCEIVER_STATUS, TRANSCEIVER_STATUS_FLAG. Thus skip in the val comparison.

How did you do it?
Skip the last_update_time for comparison

How did you verify/test it?
E2E
arlakshm pushed a commit to Azure/sonic-mgmt.msft that referenced this pull request Jun 24, 2025
The changes to the sonic-mgmt testcases are required due to the below
changes:
sonic-net/sonic-platform-common#556
sonic-net/sonic-platform-daemons#604
sonic-net/sonic-utilities#3844

This PR also reverts the changes made by #181 .
@yejianquan
Copy link
Copy Markdown
Contributor

Hi @mihirpat1 , please fix conflict and create PR to 202505 branch

@yejianquan
Copy link
Copy Markdown
Contributor

202505 branched out around 5/16, this commit is included when created, removing request labels.
and please ignore the conflict notification

opcoder0 pushed a commit to opcoder0/sonic-mgmt that referenced this pull request Dec 8, 2025
Description of PR
Summary: Skip last_update_time per design.
Fixes # (issue) ADO: 32970571

Approach
What is the motivation for this PR?
Due to recent change in sonic-net/sonic-platform-daemons#604, the last_udpate_time was added. It should be skipped for comparison per design.
The last_update_time field was added in four place including post_diagnostic_values_to_db, post_port_dom_flags_to_db, post_port_transceiver_hw_status_flags_to_db, post_port_vdm_thresholds_to_db.
So there are many table involved such as TRANSCEIVER_STATUS, TRANSCEIVER_STATUS_FLAG. Thus skip in the val comparison.

How did you do it?
Skip the last_update_time for comparison

How did you verify/test it?
E2E

Signed-off-by: opcoder0 <110003254+opcoder0@users.noreply.github.com>
AharonMalkin pushed a commit to AharonMalkin/sonic-mgmt that referenced this pull request Dec 16, 2025
Description of PR
Summary: Skip last_update_time per design.
Fixes # (issue) ADO: 32970571

Approach
What is the motivation for this PR?
Due to recent change in sonic-net/sonic-platform-daemons#604, the last_udpate_time was added. It should be skipped for comparison per design.
The last_update_time field was added in four place including post_diagnostic_values_to_db, post_port_dom_flags_to_db, post_port_transceiver_hw_status_flags_to_db, post_port_vdm_thresholds_to_db.
So there are many table involved such as TRANSCEIVER_STATUS, TRANSCEIVER_STATUS_FLAG. Thus skip in the val comparison.

How did you do it?
Skip the last_update_time for comparison

How did you verify/test it?
E2E

Signed-off-by: Aharon Malkin <amalkin@nvidia.com>
gshemesh2 pushed a commit to gshemesh2/sonic-mgmt that referenced this pull request Dec 21, 2025
Description of PR
Summary: Skip last_update_time per design.
Fixes # (issue) ADO: 32970571

Approach
What is the motivation for this PR?
Due to recent change in sonic-net/sonic-platform-daemons#604, the last_udpate_time was added. It should be skipped for comparison per design.
The last_update_time field was added in four place including post_diagnostic_values_to_db, post_port_dom_flags_to_db, post_port_transceiver_hw_status_flags_to_db, post_port_vdm_thresholds_to_db.
So there are many table involved such as TRANSCEIVER_STATUS, TRANSCEIVER_STATUS_FLAG. Thus skip in the val comparison.

How did you do it?
Skip the last_update_time for comparison

How did you verify/test it?
E2E

Signed-off-by: Guy Shemesh <gshemesh@nvidia.com>
gshemesh2 pushed a commit to gshemesh2/sonic-mgmt that referenced this pull request Jan 26, 2026
Description of PR
Summary: Skip last_update_time per design.
Fixes # (issue) ADO: 32970571

Approach
What is the motivation for this PR?
Due to recent change in sonic-net/sonic-platform-daemons#604, the last_udpate_time was added. It should be skipped for comparison per design.
The last_update_time field was added in four place including post_diagnostic_values_to_db, post_port_dom_flags_to_db, post_port_transceiver_hw_status_flags_to_db, post_port_vdm_thresholds_to_db.
So there are many table involved such as TRANSCEIVER_STATUS, TRANSCEIVER_STATUS_FLAG. Thus skip in the val comparison.

How did you do it?
Skip the last_update_time for comparison

How did you verify/test it?
E2E

Signed-off-by: Guy Shemesh <gshemesh@nvidia.com>
mssonicbld added a commit to mssonicbld/sonic-mgmt.msft that referenced this pull request Feb 4, 2026
<!--
Please make sure you've read and understood our contributing guidelines;
https://github.com/sonic-net/SONiC/blob/gh-pages/CONTRIBUTING.md

Please provide following information to help code review process a bit easier:
-->
### Description of PR
<!--
- Please include a summary of the change and which issue is fixed.
- Please also include relevant motivation and context. Where should reviewer start? background context?
- List any dependencies that are required for this change.
-->

Summary: Skip last_update_time per design.
Fixes # (issue) ADO: 32970571

### Type of change

<!--
- Fill x for your type of change.
- e.g.
- [x] Bug fix
-->

- [ ] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
    - [ ] Skipped for non-supported platforms
- [ ] Test case improvement

### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505

### Approach
#### What is the motivation for this PR?
Due to recent change in sonic-net/sonic-platform-daemons#604, the last_udpate_time was added. It should be skipped for comparison per design.
The `last_update_time` field was added in four place including `post_diagnostic_values_to_db, post_port_dom_flags_to_db, post_port_transceiver_hw_status_flags_to_db, post_port_vdm_thresholds_to_db`.
So there are many table involved such as `TRANSCEIVER_STATUS`, `TRANSCEIVER_STATUS_FLAG`. Thus skip in the val comparison.
#### How did you do it?
Skip the last_update_time for comparison
#### How did you verify/test it?
E2E
#### Any platform specific information?

#### Supported testbed topology if it's a new test case?

### Documentation
<!--
(If it's a new feature, new test case)
Did you update documentation/Wiki relevant to your implementation?
Link to the wiki page?
-->
mssonicbld added a commit to Azure/sonic-mgmt.msft that referenced this pull request Feb 4, 2026
…per design (#997)

<!--
Please make sure you've read and understood our contributing guidelines;
https://github.com/sonic-net/SONiC/blob/gh-pages/CONTRIBUTING.md

Please provide following information to help code review process a bit easier:
-->
### Description of PR
<!--
- Please include a summary of the change and which issue is fixed.
- Please also include relevant motivation and context. Where should reviewer start? background context?
- List any dependencies that are required for this change.
-->

Summary: Skip last_update_time per design.
Fixes # (issue) ADO: 32970571

### Type of change

<!--
- Fill x for your type of change.
- e.g.
- [x] Bug fix
-->

- [ ] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
 - [ ] Skipped for non-supported platforms
- [ ] Test case improvement

### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505

### Approach
#### What is the motivation for this PR?
Due to recent change in sonic-net/sonic-platform-daemons#604, the last_udpate_time was added. It should be skipped for comparison per design.
The `last_update_time` field was added in four place including `post_diagnostic_values_to_db, post_port_dom_flags_to_db, post_port_transceiver_hw_status_flags_to_db, post_port_vdm_thresholds_to_db`.
So there are many table involved such as `TRANSCEIVER_STATUS`, `TRANSCEIVER_STATUS_FLAG`. Thus skip in the val comparison.
#### How did you do it?
Skip the last_update_time for comparison
#### How did you verify/test it?
E2E
#### Any platform specific information?

#### Supported testbed topology if it's a new test case?

### Documentation
<!--
(If it's a new feature, new test case)
Did you update documentation/Wiki relevant to your implementation?
Link to the wiki page?
-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment