Allow SNMP interface error/discard test to tolerate some live traffic#23272
Merged
wsycqyz merged 3 commits intosonic-net:masterfrom Mar 30, 2026
Merged
Allow SNMP interface error/discard test to tolerate some live traffic#23272wsycqyz merged 3 commits intosonic-net:masterfrom
wsycqyz merged 3 commits intosonic-net:masterfrom
Conversation
Collaborator
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Contributor
Author
|
/azpw run |
Collaborator
|
Retrying failed(or canceled) jobs... |
Contributor
Author
|
/azpw retry |
Collaborator
|
Retrying failed(or canceled) jobs... |
Signed-off-by: Dan Caugherty <dcaugher@cisco.com>
Signed-off-by: Dan Caugherty <dcaugher@cisco.com>
Signed-off-by: Dan Caugherty <dcaugher@cisco.com>
987f08a to
d717e18
Compare
Collaborator
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
yxieca
approved these changes
Mar 26, 2026
Collaborator
yxieca
left a comment
There was a problem hiding this comment.
AI agent on behalf of Ying. Reviewed; no issues found.
selldinesh
pushed a commit
to selldinesh/sonic-mgmt
that referenced
this pull request
Apr 1, 2026
…sonic-net#23272) * account for possible live traffic Signed-off-by: Dan Caugherty <dcaugher@cisco.com> * limit live traffic tolerance Signed-off-by: Dan Caugherty <dcaugher@cisco.com> * fix docstring nit Signed-off-by: Dan Caugherty <dcaugher@cisco.com> --------- Signed-off-by: Dan Caugherty <dcaugher@cisco.com> Signed-off-by: selldinesh <dinesh.sellappan@keysight.com>
albertovillarreal-keys
pushed a commit
to albertovillarreal-keys/sonic-mgmt
that referenced
this pull request
Apr 3, 2026
…sonic-net#23272) * account for possible live traffic Signed-off-by: Dan Caugherty <dcaugher@cisco.com> * limit live traffic tolerance Signed-off-by: Dan Caugherty <dcaugher@cisco.com> * fix docstring nit Signed-off-by: Dan Caugherty <dcaugher@cisco.com> --------- Signed-off-by: Dan Caugherty <dcaugher@cisco.com>
mssonicbld
pushed a commit
to mssonicbld/sonic-mgmt
that referenced
this pull request
Apr 10, 2026
…sonic-net#23272) * account for possible live traffic Signed-off-by: Dan Caugherty <dcaugher@cisco.com> * limit live traffic tolerance Signed-off-by: Dan Caugherty <dcaugher@cisco.com> * fix docstring nit Signed-off-by: Dan Caugherty <dcaugher@cisco.com> --------- Signed-off-by: Dan Caugherty <dcaugher@cisco.com> Signed-off-by: mssonicbld <sonicbld@microsoft.com>
Collaborator
|
Cherry-pick PR to 202511: #23807 |
Merged
12 tasks
12 tasks
rraghav-cisco
pushed a commit
to rraghav-cisco/sonic-mgmt
that referenced
this pull request
Apr 20, 2026
…sonic-net#23272) * account for possible live traffic Signed-off-by: Dan Caugherty <dcaugher@cisco.com> * limit live traffic tolerance Signed-off-by: Dan Caugherty <dcaugher@cisco.com> * fix docstring nit Signed-off-by: Dan Caugherty <dcaugher@cisco.com> --------- Signed-off-by: Dan Caugherty <dcaugher@cisco.com> Signed-off-by: Raghavendran Ramanathan <rraghav@cisco.com>
ZhaohuiS
added a commit
to ZhaohuiS/sonic-mgmt
that referenced
this pull request
Apr 20, 2026
PR sonic-net#23272 refactored verify_snmp_counter to use check_counter_with_margin but regressed the ifInErrors and ifOutErrors expected values back to COUNTER_VALUE instead of COUNTER_VALUE * num_port_intfs. For PortChannels with multiple member ports, COUNTER_VALUE is set on each member port, so SNMP reports the aggregate. The expected value must be multiplied by the number of member ports, consistent with how ifInDiscards and ifOutDiscards are already handled. This was originally fixed by PR sonic-net#22118 but lost during the sonic-net#23272 refactor. Signed-off-by: Zhaohui Sun <zhaohuisun@microsoft.com>
12 tasks
StormLiangMS
pushed a commit
that referenced
this pull request
Apr 21, 2026
…els (#24058) ### Description of PR Summary: Fix regression in `verify_snmp_counter` where `ifInErrors` and `ifOutErrors` expected values were not multiplied by `num_port_intfs` for multi-member PortChannel interfaces. PR #23272 refactored the counter validation to use `check_counter_with_margin` but regressed the `ifInErrors`/`ifOutErrors` expected values back to `COUNTER_VALUE` instead of `COUNTER_VALUE * num_port_intfs`, which was originally fixed by PR #22118. Fixes #36433389 failed reason: ``` > pytest_assert(wait_until(60, 10, 0, verify_snmp_counter, duthost, localhost, creds_all_duts, hostip, mg_facts, rif_interface, rif_counters, port_counters, num_port_intfs), "SNMP counter validate Failure") E Failed: SNMP counter validate Failure ``` test log: `14/04/2026 00:26:02 test_snmp_interfaces.check_counter_with_ L0277 INFO | ifInErrors value is 10000 but must not exceed 5100 (expected 5000 + margin 100)` ### Type of change - [x] Bug fix - [ ] Testbed and Framework(new/improvement) - [ ] New Test case - [ ] Skipped for non-supported platforms - [ ] Test case improvement ### Back port request - [ ] 202205 - [ ] 202305 - [ ] 202311 - [ ] 202405 - [ ] 202411 - [ ] 202505 - [x] 202511 ### Approach #### What is the motivation for this PR? For PortChannel interfaces with multiple member ports (e.g., 2-member LAG), `COUNTER_VALUE` (5000) is set on **each** member port's `SAI_PORT_STAT_IF_IN_ERRORS` and `SAI_PORT_STAT_IF_OUT_ERRORS`. SNMP aggregates these across all member ports, so the reported `ifInErrors`/`ifOutErrors` equals `COUNTER_VALUE * num_port_intfs`. PR #22118 correctly fixed this, but PR #23272 (which refactored to `check_counter_with_margin`) inadvertently regressed it back to comparing against just `COUNTER_VALUE`. This causes `test_snmp_interfaces_error_discard` to fail on any PortChannel with >1 member port. Note: `ifInDiscards` and `ifOutDiscards` are already correctly using aggregated port counters — only `ifInErrors`/`ifOutErrors` had this regression. #### How did you do it? Changed the expected values in `verify_snmp_counter` from `COUNTER_VALUE` to `COUNTER_VALUE * num_port_intfs` for both `ifInErrors` and `ifOutErrors`, matching the pattern already used for port counter assertions and discards. #### How did you verify/test it? Ran `test_snmp_interfaces_error_discard` on testbed-bjw2-can-t0-7260-1 (t0-116 topology, internal-202511) with this fix: - **Before fix**: FAILED with `SNMP counter validate Failure` (ifInErrors was 10000 but expected 5000, because PortChannel had 2 members) - **After fix**: **1 passed** in 365.74s #### Any platform specific information? Affects any platform where PortChannels have multiple member ports. Verified on Broadcom 7260CX3 (Memory:4096M, 64 ports). #### Supported testbed topology if it's a new test case? N/A (existing test fix) ### Documentation N/A Signed-off-by: Zhaohui Sun <zhaohuisun@microsoft.com>
Merged
12 tasks
mssonicbld
added a commit
that referenced
this pull request
Apr 21, 2026
…lti-member PortChannels (#24097) ### Description of PR Summary: Fix regression in `verify_snmp_counter` where `ifInErrors` and `ifOutErrors` expected values were not multiplied by `num_port_intfs` for multi-member PortChannel interfaces. PR #23272 refactored the counter validation to use `check_counter_with_margin` but regressed the `ifInErrors`/`ifOutErrors` expected values back to `COUNTER_VALUE` instead of `COUNTER_VALUE LICENSE Makefile README.md SECURITY.md ansible azure-pipelines.yml docs pylintrc pyproject.toml sdn_tests setup-container.sh sonic_dictionary.txt spytest test_reporting tests num_port_intfs`, which was originally fixed by PR #22118. Fixes #36433389 failed reason: ``` > pytest_assert(wait_until(60, 10, 0, verify_snmp_counter, duthost, localhost, creds_all_duts, hostip, mg_facts, rif_interface, rif_counters, port_counters, num_port_intfs), "SNMP counter validate Failure") E Failed: SNMP counter validate Failure ``` test log: `14/04/2026 00:26:02 test_snmp_interfaces.check_counter_with_ L0277 INFO | ifInErrors value is 10000 but must not exceed 5100 (expected 5000 + margin 100)` ### Type of change - [x] Bug fix - [ ] Testbed and Framework(new/improvement) - [ ] New Test case - [ ] Skipped for non-supported platforms - [ ] Test case improvement ### Back port request - [ ] 202205 - [ ] 202305 - [ ] 202311 - [ ] 202405 - [ ] 202411 - [ ] 202505 - [x] 202511 ### Approach #### What is the motivation for this PR? For PortChannel interfaces with multiple member ports (e.g., 2-member LAG), `COUNTER_VALUE` (5000) is set on **each** member port's `SAI_PORT_STAT_IF_IN_ERRORS` and `SAI_PORT_STAT_IF_OUT_ERRORS`. SNMP aggregates these across all member ports, so the reported `ifInErrors`/`ifOutErrors` equals `COUNTER_VALUE LICENSE Makefile README.md SECURITY.md ansible azure-pipelines.yml docs pylintrc pyproject.toml sdn_tests setup-container.sh sonic_dictionary.txt spytest test_reporting tests num_port_intfs`. PR #22118 correctly fixed this, but PR #23272 (which refactored to `check_counter_with_margin`) inadvertently regressed it back to comparing against just `COUNTER_VALUE`. This causes `test_snmp_interfaces_error_discard` to fail on any PortChannel with >1 member port. Note: `ifInDiscards` and `ifOutDiscards` are already correctly using aggregated port counters — only `ifInErrors`/`ifOutErrors` had this regression. #### How did you do it? Changed the expected values in `verify_snmp_counter` from `COUNTER_VALUE` to `COUNTER_VALUE LICENSE Makefile README.md SECURITY.md ansible azure-pipelines.yml docs pylintrc pyproject.toml sdn_tests setup-container.sh sonic_dictionary.txt spytest test_reporting tests num_port_intfs` for both `ifInErrors` and `ifOutErrors`, matching the pattern already used for port counter assertions and discards. #### How did you verify/test it? Ran `test_snmp_interfaces_error_discard` on testbed-bjw2-can-t0-7260-1 (t0-116 topology, internal-202511) with this fix: - **Before fix**: FAILED with `SNMP counter validate Failure` (ifInErrors was 10000 but expected 5000, because PortChannel had 2 members) - **After fix**: **1 passed** in 365.74s #### Any platform specific information? Affects any platform where PortChannels have multiple member ports. Verified on Broadcom 7260CX3 (Memory:4096M, 64 ports). #### Supported testbed topology if it's a new test case? N/A (existing test fix) ### Documentation N/A Signed-off-by: Zhaohui Sun <zhaohuisun@microsoft.com> Signed-off-by: mssonicbld <sonicbld@microsoft.com> Co-authored-by: Zhaohui Sun <94606222+ZhaohuiS@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description of PR
Summary:
Fix flaky
test_snmp_interfaces_error_discardtest by using >= comparison instead of exact equality when verifying SNMP counters.Fixes # (issue)
Type of change
Back port request
Approach
What is the motivation for this PR?
The
test_snmp_interfaces_error_discardtest fails intermittently with errors like:This occurs because live traffic on the test interface can cause additional discards between when the test sets counter values in Redis and when SNMP is queried. The test expects exact counter values, but real traffic on PortChannel member interfaces adds extra discards.
How did you do it?
Changed the counter verification from exact equality (
!=) to minimum threshold (<) comparison. The test now passes if the SNMP counter value is at least the expected value, tolerating additional discards from live traffic. The amount of live traffic, however, is limited to 100 packets or 5% of the expected total count, whichever is less.How did you verify/test it?
The fix addresses the root cause identified in test failure logs where
ifInDiscardsshowed 10018 instead of 10000 due to 18 extra discards on a PortChannel member interface from live traffic.Any platform specific information?
No platform-specific changes. The fix applies to all platforms.
Supported testbed topology if it's a new test case?
N/A - existing test case fix.