Skip to content

xcvrd received an error "get redis type 3 got type 5, err: NON-STRING-REPLY" when it tried removing an entry in TRANSCEIVER_DOM_SENSOR table for an absent SFP module #10530

@stephenxs

Description

@stephenxs

Description

Error NON-STRING-REPLY: Input/output error occurred when xcvrd tries removing TRANSCEIVER_DOM_SENSOR table in test_check_sfputil_reset.

Mar 23 20:08:34.437008 r-leopard-41 INFO pmon#/supervisord: xcvrd Traceback (most recent call last):
Mar 23 20:08:34.437075 r-leopard-41 INFO pmon#/supervisord: xcvrd   File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
Mar 23 20:08:34.437075 r-leopard-41 INFO pmon#/supervisord: xcvrd     self.run()
Mar 23 20:08:34.437102 r-leopard-41 INFO pmon#/supervisord: xcvrd   File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
Mar 23 20:08:34.437156 r-leopard-41 INFO pmon#/supervisord: xcvrd     self._target(*self._args, **self._kwargs)
Mar 23 20:08:34.437156 r-leopard-41 INFO pmon#/supervisord: xcvrd   File "/usr/local/lib/python3.7/dist-packages/xcvrd/xcvrd.py", line 1140, in task_worker
Mar 23 20:08:34.437184 r-leopard-41 INFO pmon#/supervisord: xcvrd     del_port_sfp_dom_info_from_db(logical_port, self.port_mapping, xcvr_table_helper.get_intf_tbl(asic_index), xcvr_table_helper.get_dom_tbl(asic_index))
Mar 23 20:08:34.437242 r-leopard-41 INFO pmon#/supervisord: xcvrd   File "/usr/local/lib/python3.7/dist-packages/xcvrd/xcvrd.py", line 548, in del_port_sfp_dom_info_from_db
Mar 23 20:08:34.437242 r-leopard-41 INFO pmon#/supervisord: xcvrd     dom_tbl._del(port_name)
Mar 23 20:08:34.437242 r-leopard-41 INFO pmon#/supervisord: xcvrd   File "/usr/lib/python3/dist-packages/swsscommon/swsscommon.py", line 2474, in _del
Mar 23 20:08:34.437275 r-leopard-41 INFO pmon#/supervisord: xcvrd     return self.delete(*args, **kwargs)
Mar 23 20:08:34.437324 r-leopard-41 INFO pmon#/supervisord: xcvrd   File "/usr/lib/python3/dist-packages/swsscommon/swsscommon.py", line 2468, in delete
Mar 23 20:08:34.437324 r-leopard-41 INFO pmon#/supervisord: xcvrd     return _swsscommon.Table_delete(self, *args, **kwargs)
Mar 23 20:08:34.437324 r-leopard-41 INFO pmon#/supervisord: xcvrd RuntimeError: Expected to get redis type 3 got type 5, err: NON-STRING-REPLY: Input/output error

Steps to reproduce the issue:

  1. Run test_check_sfputil_reset test.

The issue is reproduced at a very low probability.

Describe the results you received:

See above.

Describe the results you expected:

There should not be an error.

Output of show version:


SONiC Software Version: SONiC.202111.36-d5354df3d_Internal
Distribution: Debian 11.2
Kernel: 5.10.0-8-2-amd64
Build commit: d5354df3d
Build date: Wed Mar 23 15:13:58 UTC 2022
Built by: sw-r2d2-bot@r-build-sonic-ci03-241

Platform: x86_64-mlnx_msn4700-r0
HwSKU: ACS-MSN4700
ASIC: mellanox
ASIC Count: 1
Serial Number: MT2047X05586
Model Number: MSN4700-WS2FO
Hardware Revision: A1
Uptime: 11:19:26 up  8:06,  1 user,  load average: 0.28, 0.30, 0.43

Docker images:
REPOSITORY                                         TAG                            IMAGE ID       SIZE
docker-dhcp-relay                                  latest                         b730b1609fe1   440MB
docker-teamd                                       202111.36-d5354df3d_Internal   7504fb9dfede   440MB
docker-teamd                                       latest                         7504fb9dfede   440MB
docker-syncd-mlnx                                  202111.36-d5354df3d_Internal   319ca0799459   919MB
docker-syncd-mlnx                                  latest                         319ca0799459   919MB
docker-sonic-telemetry                             202111.36-d5354df3d_Internal   fd34833c0e65   515MB
docker-sonic-telemetry                             latest                         fd34833c0e65   515MB
docker-sonic-mgmt-framework                        202111.36-d5354df3d_Internal   c6b6976eb7a6   582MB
docker-sonic-mgmt-framework                        latest                         c6b6976eb7a6   582MB
docker-snmp                                        202111.36-d5354df3d_Internal   d10b4d6808a2   470MB
docker-snmp                                        latest                         d10b4d6808a2   470MB
docker-sflow                                       202111.36-d5354df3d_Internal   a2d91e19d650   441MB
docker-sflow                                       latest                         a2d91e19d650   441MB
docker-router-advertiser                           202111.36-d5354df3d_Internal   c6405b50d408   426MB
docker-router-advertiser                           latest                         c6405b50d408   426MB
docker-platform-monitor                            202111.36-d5354df3d_Internal   5559e18327e9   811MB
docker-platform-monitor                            latest                         5559e18327e9   811MB
docker-orchagent                                   202111.36-d5354df3d_Internal   0846a835d5bb   459MB
docker-orchagent                                   latest                         0846a835d5bb   459MB
docker-nat                                         202111.36-d5354df3d_Internal   f91e91c53826   443MB
docker-nat                                         latest                         f91e91c53826   443MB
docker-mux                                         202111.36-d5354df3d_Internal   8181bb494c0c   479MB
docker-mux                                         latest                         8181bb494c0c   479MB
docker-macsec                                      202111.36-d5354df3d_Internal   d2037e3dfc19   444MB
docker-macsec                                      latest                         d2037e3dfc19   444MB
docker-lldp                                        202111.36-d5354df3d_Internal   a25eecb07e2e   466MB
docker-lldp                                        latest                         a25eecb07e2e   466MB
docker-fpm-frr                                     202111.36-d5354df3d_Internal   706fc3e4f24e   459MB
docker-fpm-frr                                     latest                         706fc3e4f24e   459MB
docker-database                                    202111.36-d5354df3d_Internal   1f1cc72c4651   426MB
docker-database                                    latest                         1f1cc72c4651   426MB
urm.nvidia.com/sw-nbu-sws-sonic-docker/sonic-wjh   1.2.0-202111-internal-20       d9e7b3422f7d   472MB

Output of show techsupport:

(paste your output here or download and attach the file here )

sonic_dump_r-leopard-41_20220323_201136.tar.gz
sonic_dump_r-tigon-17_20220323_202911.tar.gz

Additional information you deem important (e.g. issue happens only occasionally):

It is likely that the issue is caused by socket being shared between parent/child processes of xcvrd.
There are two processes in xcvrd
- A main (parent) process which is responsible for DOM information update. Implemented in class DomInfoUpdateTask
- A child process which is responsible for SFP change event handling. Implemented in class SfpStateUpdateTask
Both child and parent share the global xcvr_table_helper which is an instance of XcvrTableHelper. In XcvrTableHelper it will creates a db_connect object which is on top of a socket. It is created in DaemonXcvrd.init which is called by DaemonXcvrd.run. Later sfp_state_update is created and SfpStateUpdateTask.task_run is called to create the child process which will inherite all the resources, including the socket, with the parent.

When the system runs, both child and parent interact with redis-db server via the socket. The message can interleave between child and parent.
Normal flow should be like

P - parent
C - child
R - redis-db server

P sends request to R
R responds P with rp
P receives and handles rp and continue to run
C sends request to R
R responds P with rc
C receives and handles rc and continue to run
...

But in some rare case, the interleaved flow can be like this:

P send request p to R
C send request c to R
R responds P with rp
R responds C with rc
C receives and handles the response rp which was sent to parent. At this point it complains for receiving a mismatched redis-reply

Metadata

Metadata

Assignees

Labels

Triagedthis issue has been triaged

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions