
Handle error seen when interface counter is not available in COUNTERS_DB #341

Merged
SuvarnaMeenakshi merged 9 commits into sonic-net:master from SuvarnaMeenakshi:err_if_counter
Mar 4, 2025


@SuvarnaMeenakshi SuvarnaMeenakshi commented Dec 7, 2024

- What I did
Fix to handle error seen:

snmp-subagent [ax_interface] ERROR: SubtreeMIBEntry.__call__() caught an unexpected exception during _callable_.__call__()#012Traceback (most recent call last):#012  File "/usr/local/lib/python3.9/dist-packages/ax_interface/mib.py", line 194, in __call__#012    return self._callable_.__call__(sub_id, *self._callable_args)#012  File "/usr/local/lib/python3.9/dist-packages/sonic_ax_impl/mibs/vendor/cisco/ciscoPfcExtMIB.py", line 248, in indications_per_priority#012    counter_value += self._get_counter(mibs.get_index_from_str(lag_member), counter_name)#012TypeError: unsupported operand type(s) for +=

The above error log occurs when swss/syncd restart because of a crash, reboot, or config reload, which also causes the snmp service to restart. During this time, counters for some interfaces may be missing from COUNTERS_DB for a short period. If an SNMP query for interface/PFC counters is made at this point, this error appears in syslog until COUNTERS_DB is populated with counters for all interfaces.
MSFT ADO 26506804

- How I did it
Avoid adding up counters if _get_counter returns None.
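
The failure and the guard can be sketched in isolation. This is a minimal, self-contained illustration, not the code in ciscoPfcExtMIB.py: `get_counter` and the `COUNTERS` dict are hypothetical stand-ins for `_get_counter` and COUNTERS_DB, and the interface names are made up.

```python
# Hypothetical stand-in for COUNTERS_DB: Ethernet4 has no counters yet,
# as can happen briefly after a config reload.
COUNTERS = {"Ethernet0": {"SAI_PORT_STAT_PFC_0_RX_PKTS": 7}}


def get_counter(if_name, counter_name):
    """Return the counter value, or None if it is not populated yet."""
    return COUNTERS.get(if_name, {}).get(counter_name)


def sum_unguarded(members, counter_name):
    """Mirrors the failing pattern: adds whatever get_counter returns."""
    total = 0
    for member in members:
        total += get_counter(member, counter_name)  # TypeError if None
    return total


def sum_guarded(members, counter_name):
    """Skips members whose counter is not available."""
    total = 0
    for member in members:
        value = get_counter(member, counter_name)
        if value is not None:
            total += value
    return total


members = ["Ethernet0", "Ethernet4"]
name = "SAI_PORT_STAT_PFC_0_RX_PKTS"

try:
    sum_unguarded(members, name)
except TypeError as exc:
    print("unguarded:", exc)

print("guarded:", sum_guarded(members, name))
```

With the missing entry for the second member, the unguarded sum raises the same kind of `TypeError` as in the log, while the guarded sum only adds the values that are present.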

- How to verify it
Before the fix:
Send a continuous query to get PFC counters for any configured port-channel interface:

watch -n 1 snmpwalk -v2c -c <comm> <ip> 1.3.6.1.4.1.9.9.813.1.2.1.3.1103

On the device, execute config reload; the following error log should appear:

INFO snmp#snmp-subagent [ax_interface] INFO: Registering subID: [.1.3.6.1.2.1.2.2.1.20]
INFO snmp#snmp-subagent [ax_interface] INFO: Registering subID: [.1.3.6.1.2.1.2.2.1.21]
INFO snmp#snmp-subagent [ax_interface] INFO: Registering subID: [.1.3.6.1.2.1.2.2.1.22]
INFO snmp#snmp-subagent [ax_interface] INFO: OID registration complete. Waiting to receive PDUs...
NOTICE pmon#xcvrd[31]: Starting up...
NOTICE pmon#xcvrd[31]: XCVRD INIT: Start daemon init...
INFO lldp#supervisord 2025-01-15 19:15:36,067 INFO spawned: 'lldpmgrd' with pid 36
WARNING snmp#snmp-subagent [sonic_ax_impl] WARNING: SyncD 'COUNTERS_DB' missing attribute ''SAI_PORT_STAT_PFC_0_RX_PKTS''.
ERR snmp#snmp-subagent [ax_interface] ERROR: SubtreeMIBEntry.__call__() caught an unexpected exception during _callable_.__call__()#012Traceback (most recent call last):#012  File "/usr/local/lib/python3.9/dist-packages/ax_interface/mib.py", line 194, in __call__#012    return self._callable_.__call__(sub_id, *self._callable_args)#012  File "/usr/local/lib/python3.9/dist-packages/sonic_ax_impl/mibs/vendor/cisco/ciscoPfcExtMIB.py", line 248, in indications_per_priority#012    counter_value += self._get_counter(mibs.get_index_from_str(lag_member), counter_name)#012TypeError: unsupported operand type(s) for +=: 'int' and 'NoneType'

After the fix:
Perform the same snmpwalk as above; the SNMP ERR log is no longer seen.

- Description for the changelog

Signed-off-by: Suvarna Meenakshi <sumeenak@microsoft.com>
@mssonicbld

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

  for lag_member in self.lag_name_if_name_map[self.oid_lag_name_map[port_oid]]:
-     counter_value += self._get_counter(mibs.get_index_from_str(lag_member), counter_name)
+     member_counter = self._get_counter(mibs.get_index_from_str(lag_member), counter_name)
+     if member_counter:

@qiluo-msft qiluo-msft Jan 14, 2025

In the else branch, should we set counter_value = None, so that the client can still query it and see that there is an internal counter issue? #Closed


@SuvarnaMeenakshi SuvarnaMeenakshi Jan 15, 2025

I see that the "if member_counter" condition holds only when member_counter is neither None nor zero.
If a specific counter is 0, it also falls through to the else branch.
I can do either:

if member_counter:
    add ...

or

if member_counter is not None:
    add ...
else:
    return None  # handles the case where _get_counter returned None for any member interface
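
The difference between the two options can be seen in a small sketch. The helper names `sum_truthy` and `sum_or_none` are hypothetical, and the input list stands in for the per-member results of `_get_counter`; this is only an illustration of the two conditions, not the merged code.

```python
from typing import Iterable, Optional


def sum_truthy(values: Iterable[Optional[int]]) -> int:
    """Option 1: 'if member_counter:' skips both None and 0."""
    total = 0
    for value in values:
        if value:
            total += value
    return total


def sum_or_none(values: Iterable[Optional[int]]) -> Optional[int]:
    """Option 2: return None as soon as any member counter is missing."""
    total = 0
    for value in values:
        if value is not None:
            total += value
        else:
            return None
    return total


# One member has a counter, the other is not populated yet.
partial = [5, None]
print(sum_truthy(partial))   # partial sum; the missing member is silently skipped
print(sum_or_none(partial))  # None; the caller can tell that data is missing

# A genuine zero is still a valid counter under option 2.
print(sum_or_none([5, 0]))
```

Skipping a zero does not change the sum, so option 1 yields the same total whenever all counters are present. What it cannot do is distinguish "counter is 0" from "counter is missing", whereas option 2 surfaces the missing case to the caller.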


@SuvarnaMeenakshi, I think @qiluo-msft's original comment suggests taking the second option and returning None. Please go ahead and modify your PR accordingly, and we can ask @qiluo-msft to review it again.
Thanks!


Updated as suggested, along with the corresponding unit test.

Signed-off-by: Suvarna Meenakshi <sumeenak@microsoft.com>
(cherry picked from commit da06490cada1d4b83d44ec6e183e84b8fcd48b36)

@SuvarnaMeenakshi SuvarnaMeenakshi merged commit 25f9e4f into sonic-net:master Mar 4, 2025
5 checks passed
ssithaia-ebay pushed a commit to ssithaia-ebay/sonic-snmpagent that referenced this pull request May 23, 2025
…_DB (sonic-net#341)