
Fix test_advanced_reboot: configurable control_plane_down_timeout + SSH thread cleanup on failure #2

Draft
Copilot wants to merge 2 commits into master from copilot/run-python-test-case

Copilot AI commented Mar 15, 2026

test_fast_reboot fails with TimeoutError: DUT hasn't shutdown in 600 seconds because control_plane_down_timeout was hardcoded, and the resulting exception skipped sending quit to neighbor SSH threads, leaving them blocked on queue.get() until SIGTERM killed the PTF process.
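A minimal, self-contained sketch of the hang and the fix, assuming each neighbor SSH thread simply loops on a blocking `queue.get()` until it receives `"quit"`. The `ssh_worker` function, the one-queue-per-thread layout, and the `run_test` helper are hypothetical stand-ins, not the code in `advanced-reboot.py`; only the `put_nowait("quit")` loop in the `except` path mirrors the change this PR describes.

```python
import queue
import threading


def ssh_worker(q, finished, name):
    # Hypothetical neighbor SSH worker: blocks on get() with no timeout
    # until it is told to quit.
    while True:
        msg = q.get()
        if msg == "quit":
            finished.append(name)
            return


def run_test(num_threads=3, fail=True):
    queues = [queue.Queue() for _ in range(num_threads)]
    finished = []
    threads = [
        threading.Thread(target=ssh_worker, args=(q, finished, f"ssh-{i}"), daemon=True)
        for i, q in enumerate(queues)
    ]
    for t in threads:
        t.start()

    error = None
    try:
        if fail:
            # Stands in for the control-plane-down timeout path.
            raise TimeoutError("DUT hasn't shutdown in 600 seconds")
        # Normal path: quit is sent after the post-reboot health check.
        for q in queues:
            q.put("quit")
    except Exception as exc:
        error = str(exc)
        # Failure path: unblock every worker so none waits forever on get().
        for q in queues:
            q.put_nowait("quit")

    for t in threads:
        t.join(timeout=5)
    return error, sorted(finished), [t.is_alive() for t in threads]
```

Without the `put_nowait` loop in the `except` block, every `join(timeout=5)` would expire with its worker still alive, which is a small-scale version of the threads waiting until SIGTERM. The queues are unbounded, so `put_nowait` never raises `queue.Full` and the cleanup path itself never blocks.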

Changes

  • ptftests/py3/advanced-reboot.py

    • Replace hardcoded self.control_plane_down_timeout = 600 with a test parameter (check_param('control_plane_down_timeout', 600)), preserving default behavior
    • In the except block of runTest(), send quit via put_nowait to all SSH threads to unblock them when the DUT timeout path is taken. Previously these threads hung indefinitely because handle_post_reboot_health_check(), where quit is normally sent, was skipped
  • tests/common/platform/args/advanced_reboot_args.py

    • Add --control_plane_down_timeout CLI option (default 600) so slow-rebooting hardware can set a longer threshold
  • tests/common/fixtures/advanced_reboot.py

    • Read --control_plane_down_timeout and forward it as control_plane_down_timeout in the PTF params dict

```python
# Before: hardcoded, no way to override
self.control_plane_down_timeout = 600

# After: driven by test parameter, configurable via --control_plane_down_timeout CLI flag
self.check_param('control_plane_down_timeout', 600, required=False)
self.control_plane_down_timeout = self.test_params['control_plane_down_timeout']
```
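A sketch of how the option could flow from the command line to the PTF test, assuming pytest's `parser.addoption` behaves like `argparse` for this option (it accepts the same keyword arguments). `build_ptf_params` and `AdvancedRebootStub` are hypothetical; the real fixture hands the params dict to the PTF runner, which this sketch does not model, and the `check_param` body here only assumes the fill-in-a-default semantics implied by the `required=False` call in the PR.

```python
import argparse

DEFAULT_CONTROL_PLANE_DOWN_TIMEOUT = 600  # seconds; the previous hardcoded value


def build_parser():
    # argparse stand-in for the pytest option registered in advanced_reboot_args.py.
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--control_plane_down_timeout",
        type=int,
        default=DEFAULT_CONTROL_PLANE_DOWN_TIMEOUT,
        help="Seconds to wait for the DUT control plane to go down during reboot",
    )
    return parser


def build_ptf_params(options):
    # Hypothetical: mirrors the fixture forwarding the option into the PTF params dict.
    return {"control_plane_down_timeout": options.control_plane_down_timeout}


class AdvancedRebootStub:
    """Hypothetical stand-in for the PTF test's parameter handling."""

    def __init__(self, test_params):
        self.test_params = dict(test_params)
        self.check_param("control_plane_down_timeout",
                         DEFAULT_CONTROL_PLANE_DOWN_TIMEOUT, required=False)
        self.control_plane_down_timeout = self.test_params["control_plane_down_timeout"]

    def check_param(self, param, default, required=False):
        # Assumed semantics: fill in the default when the parameter is absent.
        if param not in self.test_params:
            if required:
                raise ValueError(f"Test parameter '{param}' is required")
            self.test_params[param] = default
```

For example, parsing `["--control_plane_down_timeout", "900"]` produces a stub with a 900-second threshold, while an empty params dict still falls back to 600, which is the "preserving default behavior" property the PR aims for.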


…e and fix SSH thread cleanup on failure

Co-authored-by: ravaliyel <227423972+ravaliyel@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Run advanced reboot test case with pytest" to "Fix test_advanced_reboot: configurable control_plane_down_timeout + SSH thread cleanup on failure" Mar 15, 2026
Copilot AI requested a review from ravaliyel March 15, 2026 07:26
ravaliyel pushed a commit that referenced this pull request Mar 27, 2026
…kets not received on collector interface (sonic-net#22186)

* [sonic-mgmt] Fix sflow/test_sflow.py failures with expected sflow packets not received on collector interface

Issue #1:
In some cases (such as sflow config being enabled for the first time, or a
device reboot), the hsflowd daemon takes a little over 3 minutes to fully
initialize and process the collector config. During this window, the hsflowd
service won't send sflow packets ('CounterSample', 'FlowSample', etc.) to the
collector interface, and the test can fail with either i) "Packets are not
received in active collector, collector\d+" or ii) "Expected Number of samples
are not collected from Interface Ethernet\d+ in collector collector\d+ , Received \d+"

The hsflowd service writes to "/etc/hsflowd.auto" once it has processed the
collector configuration, so waiting for the collector info to be present in
"/etc/hsflowd.auto" seems to be a safe option before proceeding with sflow
traffic verification.

Issue #2:
If the test expects flow samples/packets on the collector interface but they
aren't seen for some reason, it hits "KeyError: 'flow_port_count'". Because
counter samples are still seen on the collector interface, "data['total_samples']"
is non-zero while "data['total_flow_count']" is 0, which leads to a KeyError when
accessing "data['flow_port_count']". The fix is to assert on "total_flow_count"
and "total_counter_count" before calling the corresponding sample analysis functions.

Signed-off-by: Vinod <vkjammala@arista.com>

* Addressing review comments

1) Enhanced "wait_until_hsflowd_ready" to make it wait for all the
   collector IPs (instead of calling it sequentially for each IP)
2) Add docstring for "wait_until_hsflowd_ready" function
3) Updated "ast.literal_eval" usage to handle the case where
   "active_collectors" is passed as empty string ("" instead of "[]")

Signed-off-by: Vinod <vkjammala@arista.com>

* Fix pre-commit check failures

Signed-off-by: Vinod <vkjammala@arista.com>

* Revert PR#21674 partially to enable "sflow/test_sflow.py" test

Signed-off-by: Vinod <vkjammala@arista.com>

---------

Signed-off-by: Vinod <vkjammala@arista.com>
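A local sketch of the sflow fixes described in this commit, under stated assumptions: `read_auto` is a hypothetical callable returning the current contents of `/etc/hsflowd.auto` (on a real DUT this text would have to be fetched from the device), the file is assumed to list each collector IP as a standalone token, and `check_sample_counts` is a hypothetical helper that returns a failure message instead of asserting, so it is easy to exercise. It is not the code from sonic-net#22186.

```python
import ast
import re
import time


def parse_active_collectors(raw):
    """Parse a list literal such as "['collector0']"; treat "" as no collectors."""
    # ast.literal_eval("") raises SyntaxError, so guard the empty string first.
    return ast.literal_eval(raw) if raw else []


def wait_until_hsflowd_ready(read_auto, collector_ips, timeout=240, interval=5):
    """Poll until every collector IP appears in hsflowd.auto, or give up.

    read_auto is a hypothetical callable returning the file's current text.
    Returns True once all IPs are present, False if the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        content = read_auto()
        # Match each IP as a whole token so "10.0.0.2" does not match "10.0.0.20".
        if all(
            re.search(r"(?<![\w.:])" + re.escape(ip) + r"(?![\w.:])", content)
            for ip in collector_ips
        ):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def check_sample_counts(data, expect_flow=True, expect_counter=True):
    """Return a failure message if an expected sample type is missing, else None."""
    # Check the per-type totals: total_samples alone can be non-zero when only
    # counter samples arrived, and flow-specific keys would then be missing.
    if expect_flow and data.get("total_flow_count", 0) == 0:
        return "Expected flow samples but none were received"
    if expect_counter and data.get("total_counter_count", 0) == 0:
        return "Expected counter samples but none were received"
    return None
```

In a test, the caller would run `msg = check_sample_counts(data)` followed by `assert msg is None, msg` before calling the flow or counter analysis functions, so a missing flow sample surfaces as a clear assertion rather than `KeyError: 'flow_port_count'`. Waiting on all collector IPs in one polling loop matches the review change that replaced sequential per-IP calls.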