
[xcvrd] Skip VDM threshold DB update for flat memory transceivers #595

Merged
prgeor merged 4 commits into sonic-net:master from mihirpat1:vdm_threshold_copper_master
Mar 18, 2025

Conversation


@mihirpat1 mihirpat1 commented Mar 17, 2025

Description

The following traceback is seen with the latest image for DAC cables.

2025 Mar 17 17:24:19.889826 sonic-dut ERR pmon#xcvrd[67]: Traceback (most recent call last):
2025 Mar 17 17:24:19.889997 sonic-dut WARNING pmon#xcvrd[67]: *** ('Ethernet259', 'STATE_DB', 'PORT_TABLE') handle_port_update_event() fvp {'host_tx_ready': 'false', 'index': '-1', 'port_name': 'Ethernet259', 'asic_id': 0, 'op': 'SET'}
2025 Mar 17 17:24:19.890132 sonic-dut ERR pmon#xcvrd[67]:   File "/usr/local/lib/python3.11/dist-packages/xcvrd/xcvrd.py", line 1878, in run
2025 Mar 17 17:24:19.890201 sonic-dut ERR pmon#xcvrd[67]:     self.task_worker(self.task_stopping_event, self.sfp_error_event)
2025 Mar 17 17:24:19.890397 sonic-dut WARNING pmon#xcvrd[67]: *** ('Ethernet21', 'STATE_DB', 'PORT_TABLE') handle_port_update_event() fvp {'host_tx_ready': 'false', 'index': '-1', 'port_name': 'Ethernet21', 'asic_id': 0, 'op': 'SET'}
2025 Mar 17 17:24:19.890516 sonic-dut ERR pmon#xcvrd[67]:   File "/usr/local/lib/python3.11/dist-packages/xcvrd/xcvrd.py", line 1671, in task_worker
2025 Mar 17 17:24:19.890609 sonic-dut ERR pmon#xcvrd[67]:     self.init()
2025 Mar 17 17:24:19.890757 sonic-dut WARNING pmon#xcvrd[67]: *** ('Ethernet86', 'STATE_DB', 'PORT_TABLE') handle_port_update_event() fvp {'host_tx_ready': 'false', 'index': '-1', 'port_name': 'Ethernet86', 'asic_id': 0, 'op': 'SET'}
2025 Mar 17 17:24:19.890810 sonic-dut ERR pmon#xcvrd[67]:   File "/usr/local/lib/python3.11/dist-packages/xcvrd/xcvrd.py", line 1590, in init
2025 Mar 17 17:24:19.890856 sonic-dut ERR pmon#xcvrd[67]:     self.retry_eeprom_set = self._post_port_sfp_info_and_dom_thr_to_db_once(port_mapping_data, self.xcvr_table_helper, self.main_thread_stop_event)
2025 Mar 17 17:24:19.891031 sonic-dut WARNING pmon#xcvrd[67]: *** ('Ethernet294', 'STATE_DB', 'PORT_TABLE') handle_port_update_event() fvp {'host_tx_ready': 'false', 'index': '-1', 'port_name': 'Ethernet294', 'asic_id': 0, 'op': 'SET'}
2025 Mar 17 17:24:19.891059 sonic-dut ERR pmon#xcvrd[67]:                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025 Mar 17 17:24:19.891104 sonic-dut ERR pmon#xcvrd[67]:   File "/usr/local/lib/python3.11/dist-packages/xcvrd/xcvrd.py", line 1547, in _post_port_sfp_info_and_dom_thr_to_db_once
2025 Mar 17 17:24:19.891255 sonic-dut WARNING pmon#xcvrd[67]: *** ('Ethernet289', 'STATE_DB', 'PORT_TABLE') handle_port_update_event() fvp {'host_tx_ready': 'false', 'index': '-1', 'port_name': 'Ethernet289', 'asic_id': 0, 'op': 'SET'}
2025 Mar 17 17:24:19.891290 sonic-dut ERR pmon#xcvrd[67]:     self.vdm_db_utils.post_port_vdm_thresholds_to_db(logical_port_name)
2025 Mar 17 17:24:19.891332 sonic-dut ERR pmon#xcvrd[67]:   File "/usr/local/lib/python3.11/dist-packages/xcvrd/dom/utilities/vdm/db_utils.py", line 70, in post_port_vdm_thresholds_to_db
2025 Mar 17 17:24:19.891490 sonic-dut WARNING pmon#xcvrd[67]: *** ('Ethernet128', 'STATE_DB', 'PORT_TABLE') handle_port_update_event() fvp {'host_tx_ready': 'false', 'index': '-1', 'port_name': 'Ethernet128', 'asic_id': 0, 'op': 'SET'}
2025 Mar 17 17:24:19.891525 sonic-dut ERR pmon#xcvrd[67]:     return self._post_port_vdm_thresholds_or_flags_to_db(logical_port_name, self.xcvr_table_helper.get_vdm_threshold_tbl,
2025 Mar 17 17:24:19.891569 sonic-dut ERR pmon#xcvrd[67]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025 Mar 17 17:24:19.891721 sonic-dut WARNING pmon#xcvrd[67]: *** ('Ethernet2', 'STATE_DB', 'PORT_TABLE') handle_port_update_event() fvp {'host_tx_ready': 'false', 'index': '-1', 'port_name': 'Ethernet2', 'asic_id': 0, 'op': 'SET'}
2025 Mar 17 17:24:19.891757 sonic-dut ERR pmon#xcvrd[67]:   File "/usr/local/lib/python3.11/dist-packages/xcvrd/dom/utilities/vdm/db_utils.py", line 100, in _post_port_vdm_thresholds_or_flags_to_db
2025 Mar 17 17:24:19.891794 sonic-dut ERR pmon#xcvrd[67]:     vdm_values_dict = get_vdm_values_func(physical_port)
2025 Mar 17 17:24:19.891839 sonic-dut ERR pmon#xcvrd[67]:                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025 Mar 17 17:24:19.891990 sonic-dut WARNING pmon#xcvrd[67]: *** ('Ethernet471', 'STATE_DB', 'PORT_TABLE') handle_port_update_event() fvp {'host_tx_ready': 'false', 'index': '-1', 'port_name': 'Ethernet471', 'asic_id': 0, 'op': 'SET'}
2025 Mar 17 17:24:19.892029 sonic-dut ERR pmon#xcvrd[67]:   File "/usr/local/lib/python3.11/dist-packages/xcvrd/dom/utilities/vdm/utils.py", line 39, in get_vdm_thresholds
2025 Mar 17 17:24:19.892067 sonic-dut ERR pmon#xcvrd[67]:     return self.sfp_obj_dict[physical_port].get_transceiver_vdm_thresholds()
2025 Mar 17 17:24:19.892109 sonic-dut ERR pmon#xcvrd[67]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025 Mar 17 17:24:19.892262 sonic-dut WARNING pmon#xcvrd[67]: *** ('Ethernet118', 'STATE_DB', 'PORT_TABLE') handle_port_update_event() fvp {'host_tx_ready': 'false', 'index': '-1', 'port_name': 'Ethernet118', 'asic_id': 0, 'op': 'SET'}
2025 Mar 17 17:24:19.892302 sonic-dut ERR pmon#xcvrd[67]:   File "/usr/local/lib/python3.11/dist-packages/sonic_platform_base/sonic_xcvr/sfp_optoe_base.py", line 76, in get_transceiver_vdm_thresholds
2025 Mar 17 17:24:19.892340 sonic-dut ERR pmon#xcvrd[67]:     return api.get_transceiver_vdm_thresholds() if api is not None else None
2025 Mar 17 17:24:19.892381 sonic-dut ERR pmon#xcvrd[67]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025 Mar 17 17:24:19.892544 sonic-dut WARNING pmon#xcvrd[67]: *** ('Ethernet343', 'STATE_DB', 'PORT_TABLE') handle_port_update_event() fvp {'host_tx_ready': 'false', 'index': '-1', 'port_name': 'Ethernet343', 'asic_id': 0, 'op': 'SET'}
2025 Mar 17 17:24:19.892585 sonic-dut ERR pmon#xcvrd[67]:   File "/usr/local/lib/python3.11/dist-packages/sonic_platform_base/sonic_xcvr/api/public/cmis.py", line 2566, in get_transceiver_vdm_thresholds
2025 Mar 17 17:24:19.892625 sonic-dut ERR pmon#xcvrd[67]:     vdm_raw_dict = self.get_vdm(self.vdm.VDM_THRESHOLD)
2025 Mar 17 17:24:19.892666 sonic-dut ERR pmon#xcvrd[67]:                                 ^^^^^^^^^^^^^^^^^^^^^^
2025 Mar 17 17:24:19.892812 sonic-dut WARNING pmon#xcvrd[67]: *** ('Ethernet224', 'STATE_DB', 'PORT_TABLE') handle_port_update_event() fvp {'host_tx_ready': 'true', 'index': '-1', 'port_name': 'Ethernet224', 'asic_id': 0, 'op': 'SET'}
2025 Mar 17 17:24:19.892857 sonic-dut ERR pmon#xcvrd[67]: AttributeError: 'NoneType' object has no attribute 'VDM_THRESHOLD'
2025 Mar 17 17:24:19.893100 sonic-dut NOTICE pmon#xcvrd[67]: Stop daemon main loop
2025 Mar 17 17:24:19.893330 sonic-dut WARNING pmon#xcvrd[67]: *** ('Ethernet230', 'STATE_DB', 'PORT_TABLE') handle_port_update_event() fvp {'host_tx_ready': 'false', 'index': '-1', 'port_name': 'Ethernet230', 'asic_id': 0, 'op': 'SET'}
2025 Mar 17 17:24:19.893330 sonic-dut ERR pmon#xcvrd[67]: Xcvrd: exception found at child thread SfpStateUpdateTask due to AttributeError("'NoneType' object has no attribute 'VDM_THRESHOLD'")
2025 Mar 17 17:24:19.893412 sonic-dut ERR pmon#xcvrd[67]: Exiting main loop as child thread raised exception!
2025 Mar 17 17:24:19.904444 sonic-dut INFO pmon#supervisord 2025-03-17 17:24:19,904 WARN exited: xcvrd (terminated by SIGKILL; not expected)

Motivation and Context

With #582 merged, we are now updating the VDM threshold data for all types of transceivers.

However, CMIS-compliant transceivers with flat memory do not support VDM. The driver handler that fetches the VDM threshold data does not check whether the CMIS transceiver supports VDM, which causes xcvrd to crash.
https://github.com/sonic-net/sonic-platform-common/blob/e5aedb6bab10a16d0167488eb9e291805c397c8f/sonic_platform_base/sonic_xcvr/api/public/cmis.py#L2619

To address this issue, check whether a transceiver is flat memory based before reading its VDM threshold data, and skip the VDM threshold DB update if it is.
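The failure mode and the guard can be illustrated with a minimal sketch. This is not the code from this PR: `FakeVdm`, `FakeCmisApi`, and `read_vdm_thresholds_guarded` are hypothetical stand-ins, and the sketch assumes the transceiver API object exposes an `is_flat_memory()` method and leaves its `vdm` attribute as `None` on flat memory modules, as the traceback above suggests.

```python
class FakeVdm:
    """Hypothetical stand-in for the VDM handler of a module with paged memory."""
    VDM_THRESHOLD = "threshold"


class FakeCmisApi:
    """Hypothetical stand-in for a CMIS API object; not the real driver class."""

    def __init__(self, flat_memory):
        self._flat_memory = flat_memory
        # Assumption: flat memory modules (e.g. DAC cables) have no VDM handler.
        self.vdm = None if flat_memory else FakeVdm()

    def is_flat_memory(self):
        return self._flat_memory

    def get_transceiver_vdm_thresholds(self):
        # On a flat memory module this raises AttributeError, because
        # self.vdm is None, which mirrors the crash in the traceback.
        return {"kind": self.vdm.VDM_THRESHOLD}


def read_vdm_thresholds_guarded(api):
    """Return the VDM thresholds, or None when the read should be skipped."""
    if api is None or api.is_flat_memory():
        return None
    return api.get_transceiver_vdm_thresholds()
```

With the guard in place, a flat memory module yields `None` instead of raising, so the caller can simply skip the DB update for that port.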

How Has This Been Tested?

  1. Ensured that xcvrd is stable and the VDM threshold table is present for CMIS optics supporting VDM
  2. Ensured that xcvrd is stable and the VDM threshold table is present for C-CMIS optics supporting VDM
  3. Ensured that xcvrd is stable and the VDM threshold table is not present for
    3.1 CMIS optics that do not support VDM and do not have flat memory
    3.2 CMIS optics with flat memory
    3.3 10G SFP

Additional Information (Optional)

MSFT ADO - 31849344

…vers

Signed-off-by: Mihir Patel <patelmi@microsoft.com>
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mihirpat1 mihirpat1 changed the title from "[xcvrd] Skip VDM threshold DB update for copper or flat memory based transceivers" to "[xcvrd] Skip VDM threshold DB update for flat memory transceivers" Mar 17, 2025
@mihirpat1 mihirpat1 marked this pull request as ready for review March 17, 2025 22:30
@mihirpat1 mihirpat1 requested a review from Copilot March 17, 2025 22:48

Copilot AI left a comment

Pull Request Overview

This PR prevents updates of VDM threshold data in the DB for flat memory transceivers, addressing crashes caused when trying to read threshold values on optics that do not support VDM. Key changes include:

  • Skipping DB updates in db_utils when a transceiver is flat memory
  • Adding tests to verify behavior for both flat memory and non-flat memory transceivers
  • Introducing the is_transceiver_flat_memory utility method in the common utilities

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| sonic-xcvrd/tests/test_xcvrd.py | Added tests for flat memory handling and verifying that no DB update occurs when transceiver is flat memory |
| sonic-xcvrd/xcvrd/xcvrd_utilities/utils.py | Introduced is_transceiver_flat_memory to determine flat memory condition based on the transceiver API |
| sonic-xcvrd/xcvrd/dom/utilities/vdm/db_utils.py | Modified DB update function to skip updating when a transceiver is flat memory |

Comments suppressed due to low confidence (2)

sonic-xcvrd/xcvrd/dom/utilities/vdm/db_utils.py:94

  • [nitpick] Consider adding a log message before returning when a transceiver is identified as flat memory to provide better traceability in production.
if self.xcvrd_utils.is_transceiver_flat_memory(physical_port):

sonic-xcvrd/tests/test_xcvrd.py:622

  • [nitpick] Consider adding an assertion to verify that VDM thresholds are correctly updated when the transceiver supports VDM (i.e. when is_transceiver_flat_memory returns False).
vdm_db_utils.xcvrd_utils.is_transceiver_flat_memory = MagicMock(return_value=False)
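
A rough sketch of what such assertions might look like, using `unittest.mock`. It is not taken from `test_xcvrd.py`: `post_thresholds` and `run_case` are hypothetical simplifications of the DB update path, the threshold table is a plain `MagicMock` rather than a real DB table, and `example_key` is a made-up field name.

```python
from unittest.mock import MagicMock


def post_thresholds(xcvrd_utils, vdm_utils, threshold_tbl, physical_port, port_name):
    """Hypothetical, simplified stand-in for the DB update path in db_utils.py."""
    if xcvrd_utils.is_transceiver_flat_memory(physical_port):
        return  # flat memory: skip both the EEPROM read and the DB write
    values = vdm_utils.get_vdm_thresholds(physical_port)
    if values is not None:
        threshold_tbl.set(port_name, values)


def run_case(flat_memory):
    """Run the update once with mocked collaborators and return the mocks."""
    xcvrd_utils = MagicMock()
    xcvrd_utils.is_transceiver_flat_memory = MagicMock(return_value=flat_memory)
    vdm_utils = MagicMock()
    vdm_utils.get_vdm_thresholds = MagicMock(return_value={"example_key": "1.0"})
    threshold_tbl = MagicMock()
    post_thresholds(xcvrd_utils, vdm_utils, threshold_tbl, 1, "Ethernet0")
    return vdm_utils, threshold_tbl


# Flat memory: nothing is read and nothing is written.
vdm_utils, tbl = run_case(flat_memory=True)
vdm_utils.get_vdm_thresholds.assert_not_called()
tbl.set.assert_not_called()

# Non-flat memory: thresholds are read once and written for the port.
vdm_utils, tbl = run_case(flat_memory=False)
vdm_utils.get_vdm_thresholds.assert_called_once_with(1)
tbl.set.assert_called_once_with("Ethernet0", {"example_key": "1.0"})
```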

@mssonicbld
Collaborator

Cherry-pick PR to msft-202412: Azure/sonic-platform-daemons.msft#11

mihirpat1 added a commit to mihirpat1/sonic-platform-daemons that referenced this pull request May 6, 2025
…nic-net#595)

* Skip VDM threshold DB update for copper or flat memory based transceivers

Signed-off-by: Mihir Patel <patelmi@microsoft.com>

* Simplified if check

* Removed is_copper check

* Removed is_copper check from test case

---------

Signed-off-by: Mihir Patel <patelmi@microsoft.com>
lotus-nexthop pushed a commit to lotus-nexthop/sonic-platform-daemons that referenced this pull request Oct 28, 2025
…eric health ID for other brands (sonic-net#595)

Description
Fix health check for SSD vendors: add a parser for ATP, and add a generic health ID for other brands.
Each vendor stores health information in different SMART attributes.
ATP stores it in attribute ID 248, so we add a parser for it.
We also have SSDs that use attribute ID 231, which is commonly used, so we add it to the generic parser.
Skip obtaining vendor SSD info for ATP and Virtium NVMe SSDs because they are handled by parse_generic_ssd_info, and parse_vendor_ssd_info would overwrite the data with N/A.
Add unit test cases for ATP SATA/NVMe SSDs.

Motivation and Context
show platform ssdhealth shows N/A health for some qualified SSDs.

Back port request
 202412
 202505
How Has This Been Tested?
We have tested the code change on DUTs with different SSDs including all the qualified SSDs that show N/A in health and also on the ones that worked fine before.