Skip to content

Allow SNMP interface error/discard test to tolerate some live traffic#23272

Merged
wsycqyz merged 3 commits intosonic-net:masterfrom
dcaugher:fix-snmp-interface-test
Mar 30, 2026
Merged

Allow SNMP interface error/discard test to tolerate some live traffic#23272
wsycqyz merged 3 commits intosonic-net:masterfrom
dcaugher:fix-snmp-interface-test

Conversation

@dcaugher
Copy link
Copy Markdown
Contributor

Description of PR

Summary:
Fix flaky test_snmp_interfaces_error_discard test by using >= comparison instead of exact equality when verifying SNMP counters.

Fixes # (issue)

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • New Test case
  • Skipped for non-supported platforms
  • Test case improvement

Back port request

  • 202205
  • 202305
  • 202311
  • 202405
  • 202411
  • 202505
  • 202511

Approach

What is the motivation for this PR?

The test_snmp_interfaces_error_discard test fails intermittently with errors like:

ifInDiscards value is 10018 but must be 10000

This occurs because live traffic on the test interface can cause additional discards between when the test sets counter values in Redis and when SNMP is queried. The test expects exact counter values, but real traffic on PortChannel member interfaces adds extra discards.

How did you do it?

Changed the counter verification from exact equality (!=) to minimum threshold (<) comparison. The test now passes if the SNMP counter value is at least the expected value, tolerating additional discards from live traffic. The amount of live traffic, however, is limited to 100 packets or 5% of the expected total count, whichever is less.

How did you verify/test it?

The fix addresses the root cause identified in test failure logs where ifInDiscards showed 10018 instead of 10000 due to 18 extra discards on a PortChannel member interface from live traffic.

Any platform specific information?

No platform-specific changes. The fix applies to all platforms.

Supported testbed topology if it's a new test case?

N/A - existing test case fix.

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@dcaugher dcaugher marked this pull request as ready for review March 24, 2026 19:29
@dcaugher
Copy link
Copy Markdown
Contributor Author

/azpw run

@mssonicbld
Copy link
Copy Markdown
Collaborator

⚠️ Notice: /azpw run only runs failed jobs now. If you want to trigger a whole pipline run, please rebase your branch or close and reopen the PR.
💡 Tip: You can also use /azpw retry to retry failed jobs directly.

Retrying failed(or canceled) jobs...

@dcaugher
Copy link
Copy Markdown
Contributor Author

/azpw retry

@mssonicbld
Copy link
Copy Markdown
Collaborator

Retrying failed(or canceled) jobs...

Signed-off-by: Dan Caugherty <dcaugher@cisco.com>
Signed-off-by: Dan Caugherty <dcaugher@cisco.com>
Signed-off-by: Dan Caugherty <dcaugher@cisco.com>
@dcaugher dcaugher force-pushed the fix-snmp-interface-test branch from 987f08a to d717e18 Compare March 24, 2026 19:52
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Copy Markdown
Collaborator

@yxieca yxieca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI agent on behalf of Ying. Reviewed; no issues found.

@wsycqyz wsycqyz added the Request for 202511 branch Request to backport a change to 202511 branch label Mar 30, 2026
@wsycqyz wsycqyz merged commit 30b9309 into sonic-net:master Mar 30, 2026
19 checks passed
selldinesh pushed a commit to selldinesh/sonic-mgmt that referenced this pull request Apr 1, 2026
…sonic-net#23272)

* account for possible live traffic

Signed-off-by: Dan Caugherty <dcaugher@cisco.com>

* limit live traffic tolerance

Signed-off-by: Dan Caugherty <dcaugher@cisco.com>

* fix docstring nit

Signed-off-by: Dan Caugherty <dcaugher@cisco.com>

---------

Signed-off-by: Dan Caugherty <dcaugher@cisco.com>
Signed-off-by: selldinesh <dinesh.sellappan@keysight.com>
albertovillarreal-keys pushed a commit to albertovillarreal-keys/sonic-mgmt that referenced this pull request Apr 3, 2026
…sonic-net#23272)

* account for possible live traffic

Signed-off-by: Dan Caugherty <dcaugher@cisco.com>

* limit live traffic tolerance

Signed-off-by: Dan Caugherty <dcaugher@cisco.com>

* fix docstring nit

Signed-off-by: Dan Caugherty <dcaugher@cisco.com>

---------

Signed-off-by: Dan Caugherty <dcaugher@cisco.com>
mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this pull request Apr 10, 2026
…sonic-net#23272)

* account for possible live traffic

Signed-off-by: Dan Caugherty <dcaugher@cisco.com>

* limit live traffic tolerance

Signed-off-by: Dan Caugherty <dcaugher@cisco.com>

* fix docstring nit

Signed-off-by: Dan Caugherty <dcaugher@cisco.com>

---------

Signed-off-by: Dan Caugherty <dcaugher@cisco.com>
Signed-off-by: mssonicbld <sonicbld@microsoft.com>
@mssonicbld
Copy link
Copy Markdown
Collaborator

Cherry-pick PR to 202511: #23807

rraghav-cisco pushed a commit to rraghav-cisco/sonic-mgmt that referenced this pull request Apr 20, 2026
…sonic-net#23272)

* account for possible live traffic

Signed-off-by: Dan Caugherty <dcaugher@cisco.com>

* limit live traffic tolerance

Signed-off-by: Dan Caugherty <dcaugher@cisco.com>

* fix docstring nit

Signed-off-by: Dan Caugherty <dcaugher@cisco.com>

---------

Signed-off-by: Dan Caugherty <dcaugher@cisco.com>
Signed-off-by: Raghavendran Ramanathan <rraghav@cisco.com>
ZhaohuiS added a commit to ZhaohuiS/sonic-mgmt that referenced this pull request Apr 20, 2026
PR sonic-net#23272 refactored verify_snmp_counter to use check_counter_with_margin
but regressed the ifInErrors and ifOutErrors expected values back to
COUNTER_VALUE instead of COUNTER_VALUE * num_port_intfs.

For PortChannels with multiple member ports, COUNTER_VALUE is set on
each member port, so SNMP reports the aggregate. The expected value
must be multiplied by the number of member ports, consistent with how
ifInDiscards and ifOutDiscards are already handled.

This was originally fixed by PR sonic-net#22118 but lost during the sonic-net#23272
refactor.

Signed-off-by: Zhaohui Sun <zhaohuisun@microsoft.com>
StormLiangMS pushed a commit that referenced this pull request Apr 21, 2026
…els (#24058)

### Description of PR

Summary:
Fix regression in `verify_snmp_counter` where `ifInErrors` and
`ifOutErrors` expected values were not multiplied by `num_port_intfs`
for multi-member PortChannel interfaces.

PR #23272 refactored the counter validation to use
`check_counter_with_margin` but regressed the `ifInErrors`/`ifOutErrors`
expected values back to `COUNTER_VALUE` instead of `COUNTER_VALUE *
num_port_intfs`, which was originally fixed by PR #22118.

Fixes #36433389

failed reason:
```
>       pytest_assert(wait_until(60, 10, 0, verify_snmp_counter, duthost, localhost, creds_all_duts, hostip, mg_facts,
                                 rif_interface, rif_counters, port_counters, num_port_intfs),
                      "SNMP counter validate Failure")
E       Failed: SNMP counter validate Failure
```
test log:
`14/04/2026 00:26:02 test_snmp_interfaces.check_counter_with_ L0277 INFO
| ifInErrors value is 10000 but must not exceed 5100 (expected 5000 +
margin 100)`

### Type of change

- [x] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
    - [ ] Skipped for non-supported platforms
- [ ] Test case improvement

### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [x] 202511

### Approach
#### What is the motivation for this PR?

For PortChannel interfaces with multiple member ports (e.g., 2-member
LAG), `COUNTER_VALUE` (5000) is set on **each** member port's
`SAI_PORT_STAT_IF_IN_ERRORS` and `SAI_PORT_STAT_IF_OUT_ERRORS`. SNMP
aggregates these across all member ports, so the reported
`ifInErrors`/`ifOutErrors` equals `COUNTER_VALUE * num_port_intfs`.

PR #22118 correctly fixed this, but PR #23272 (which refactored to
`check_counter_with_margin`) inadvertently regressed it back to
comparing against just `COUNTER_VALUE`. This causes
`test_snmp_interfaces_error_discard` to fail on any PortChannel with >1
member port.

Note: `ifInDiscards` and `ifOutDiscards` are already correctly using
aggregated port counters — only `ifInErrors`/`ifOutErrors` had this
regression.

#### How did you do it?

Changed the expected values in `verify_snmp_counter` from
`COUNTER_VALUE` to `COUNTER_VALUE * num_port_intfs` for both
`ifInErrors` and `ifOutErrors`, matching the pattern already used for
port counter assertions and discards.

#### How did you verify/test it?

Ran `test_snmp_interfaces_error_discard` on testbed-bjw2-can-t0-7260-1
(t0-116 topology, internal-202511) with this fix:
- **Before fix**: FAILED with `SNMP counter validate Failure`
(ifInErrors was 10000 but expected 5000, because PortChannel had 2
members)
- **After fix**: **1 passed** in 365.74s

#### Any platform specific information?
Affects any platform where PortChannels have multiple member ports.
Verified on Broadcom 7260CX3 (Memory:4096M, 64 ports).

#### Supported testbed topology if it's a new test case?
N/A (existing test fix)

### Documentation
N/A

Signed-off-by: Zhaohui Sun <zhaohuisun@microsoft.com>
mssonicbld added a commit that referenced this pull request Apr 21, 2026
…lti-member PortChannels (#24097)

### Description of PR

Summary:
Fix regression in `verify_snmp_counter` where `ifInErrors` and
`ifOutErrors` expected values were not multiplied by `num_port_intfs`
for multi-member PortChannel interfaces.

PR #23272 refactored the counter validation to use
`check_counter_with_margin` but regressed the `ifInErrors`/`ifOutErrors`
expected values back to `COUNTER_VALUE` instead of `COUNTER_VALUE
LICENSE Makefile README.md SECURITY.md ansible azure-pipelines.yml docs
pylintrc pyproject.toml sdn_tests setup-container.sh
sonic_dictionary.txt spytest test_reporting tests num_port_intfs`, which
was originally fixed by PR #22118.

Fixes #36433389

failed reason:
```
> pytest_assert(wait_until(60, 10, 0, verify_snmp_counter, duthost, localhost, creds_all_duts, hostip, mg_facts,
 rif_interface, rif_counters, port_counters, num_port_intfs),
 "SNMP counter validate Failure")
E Failed: SNMP counter validate Failure
```
test log:
`14/04/2026 00:26:02 test_snmp_interfaces.check_counter_with_ L0277 INFO
| ifInErrors value is 10000 but must not exceed 5100 (expected 5000 +
margin 100)`

### Type of change

- [x] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
 - [ ] Skipped for non-supported platforms
- [ ] Test case improvement

### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [x] 202511

### Approach
#### What is the motivation for this PR?

For PortChannel interfaces with multiple member ports (e.g., 2-member
LAG), `COUNTER_VALUE` (5000) is set on **each** member port's
`SAI_PORT_STAT_IF_IN_ERRORS` and `SAI_PORT_STAT_IF_OUT_ERRORS`. SNMP
aggregates these across all member ports, so the reported
`ifInErrors`/`ifOutErrors` equals `COUNTER_VALUE LICENSE Makefile
README.md SECURITY.md ansible azure-pipelines.yml docs pylintrc
pyproject.toml sdn_tests setup-container.sh sonic_dictionary.txt spytest
test_reporting tests num_port_intfs`.

PR #22118 correctly fixed this, but PR #23272 (which refactored to
`check_counter_with_margin`) inadvertently regressed it back to
comparing against just `COUNTER_VALUE`. This causes
`test_snmp_interfaces_error_discard` to fail on any PortChannel with >1
member port.

Note: `ifInDiscards` and `ifOutDiscards` are already correctly using
aggregated port counters — only `ifInErrors`/`ifOutErrors` had this
regression.

#### How did you do it?

Changed the expected values in `verify_snmp_counter` from
`COUNTER_VALUE` to `COUNTER_VALUE LICENSE Makefile README.md SECURITY.md
ansible azure-pipelines.yml docs pylintrc pyproject.toml sdn_tests
setup-container.sh sonic_dictionary.txt spytest test_reporting tests
num_port_intfs` for both `ifInErrors` and `ifOutErrors`, matching the
pattern already used for port counter assertions and discards.

#### How did you verify/test it?

Ran `test_snmp_interfaces_error_discard` on testbed-bjw2-can-t0-7260-1
(t0-116 topology, internal-202511) with this fix:
- **Before fix**: FAILED with `SNMP counter validate Failure`
(ifInErrors was 10000 but expected 5000, because PortChannel had 2
members)
- **After fix**: **1 passed** in 365.74s

#### Any platform specific information?
Affects any platform where PortChannels have multiple member ports.
Verified on Broadcom 7260CX3 (Memory:4096M, 64 ports).

#### Supported testbed topology if it's a new test case?
N/A (existing test fix)

### Documentation
N/A

Signed-off-by: Zhaohui Sun <zhaohuisun@microsoft.com>
Signed-off-by: mssonicbld <sonicbld@microsoft.com>
Co-authored-by: Zhaohui Sun <94606222+ZhaohuiS@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants