Skip to content

Handle stale transient FEC errors in test_verify_fec_stats_counters#22733

Open
kewei-arista wants to merge 2 commits intosonic-net:masterfrom
kewei-arista:pr-sonic.8
Open

Handle stale transient FEC errors in test_verify_fec_stats_counters#22733
kewei-arista wants to merge 2 commits intosonic-net:masterfrom
kewei-arista:pr-sonic.8

Conversation

@kewei-arista
Copy link
Copy Markdown
Contributor

Description of PR

test_verify_fec_stats_counters has a similar issue as test_verify_fec_histogram that has been fixed in #21685, where it may fail for a stable link that has stale transient FEC errors.

This change follows the same approach as #21685 to fix the issue for test_verify_fec_stats_counters by polling FEC counters for an interface for a longer time to determine whether the link has been stable, and not fail the test for a stable link with stale transient FEC errors.

This change also optimizes the code path to poll FEC counters so we only need to do it once for both test_verify_fec_stats_counters and test_verify_fec_histogram, so this can save the polling time and speed up the whole test.

This change also addresses the #21685 (comment) so the polling time is now test attribute driven.

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • New Test case
    • Skipped for non-supported platforms
  • Test case improvement

Back port request

  • 202205
  • 202305
  • 202311
  • 202405
  • 202411
  • 202505
  • 202511

Approach

What is the motivation for this PR?

Improve the pass rate by handling these corner cases

How did you do it?

Wait for enough long time to make sure the links are actually stable

How did you verify/test it?

Confirmed the test now can pass with a stable link but transient FEC symbol errors

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@StormLiangMS
Copy link
Copy Markdown
Collaborator

Hi @kewei-arista, thanks for the detailed fix!

I noticed this PR overlaps with #22829 (longhuan-cisco) — both address the same root issue: stale/historical FEC uncorrectable errors causing test_verify_fec_stats_counters to falsely fail.

A few issues I found in this PR:

Bug 1: NameError in except ValueError handlers (2 locations)

In get_interface_snapshots and test_verify_fec_stats_counters, the except ValueError block references the variable that failed to be assigned:

try:
    fec_uncorr_int = int(fec_uncorr)
except ValueError:
    pytest.fail("... fec_uncorr: {}".format(intf_name, fec_uncorr_int))  # NameError!

Should use fec_uncorr (the raw string) instead of fec_uncorr_int.

Bug 2: Global state never cleared — breaks multi-HWSKU testbeds

INTF_STATUS_SNAPSHOTS and INTF_FEC_HISTOGRAM_SNAPSHOTS are module-level lists that are never cleared. With enum_rand_one_per_hwsku_frontend_hostname, on a multi-HWSKU testbed:

  • Run 1 (dut1) → populates globals
  • Run 2 (dut2) → if not INTF_STATUS_SNAPSHOTS is False → silently uses dut1's data to validate dut2

This silently validates the wrong DUT's counters.

Design concern: Cross-test state sharing

The shared global mutable state between test_verify_fec_stats_counters and test_verify_fec_histogram makes tests order-dependent and fragile. Consider using a module-scoped pytest fixture instead.

Comparison with #22829:

PR #22829 takes a simpler approach (clear counters + wait 60s) that avoids these issues. However, this PR has the advantage of also optimizing the histogram test polling. Perhaps the histogram optimization could be a separate follow-up PR once the bugs above are addressed?

@prgeor
Copy link
Copy Markdown
Contributor

prgeor commented Mar 16, 2026

@kewei-arista can we use this PR? #22829

StormLiangMS

This comment was marked as duplicate.

StormLiangMS

This comment was marked as duplicate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants