Skip to content

Disable loganalyzer for voq_disrupts test#15917

Merged
rlhui merged 1 commit intosonic-net:masterfrom
arista-hpandya:skip-loganalyzer-check-in-voq-disrupts
Dec 14, 2024
Merged

Disable loganalyzer for voq_disrupts test#15917
rlhui merged 1 commit intosonic-net:masterfrom
arista-hpandya:skip-loganalyzer-check-in-voq-disrupts

Conversation

@arista-hpandya
Copy link
Copy Markdown
Contributor

@arista-hpandya arista-hpandya commented Dec 5, 2024

Description of PR

Summary:
Fixes #15916

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Back port request

  • 202012
  • 202205
  • 202305
  • 202311
  • 202405

Approach

What is the motivation for this PR?

When an LC or a supervisor is rebooted it produces a lot of logs irrelevant to the individual testcases. This is one of the drawbacks faced by the voq disrupts tests that reboots the device and fails because of these unaccounted error logs. Following the theme seen in the below tests that call reboots, I have disabled the log analyzer in voq disrupt to make sure the test focuses more on catching the voq related failures.

How did you do it?

By adding the pytest.mark.disable_loganalyzer pytest mark

How did you verify/test it?

I have run the test on a T2 topology and verified that we no longer hit the loganalyzer errors

Any platform specific information?

NA

Supported testbed topology if it's a new test case?

NA

Documentation

@kenneth-arista
Copy link
Copy Markdown
Contributor

@arlakshm for awareness

@arista-hpandya
Copy link
Copy Markdown
Contributor Author

arista-hpandya commented Dec 9, 2024

Looks like the pipeline failed with an Azure authentication error:

##[error]Error Code: [1]
##[error]Error: Azure login failed
##[error]Script failed with error: ERROR: AADSTS500031: Cannot find signing certificate configured. Trace ID: b11efad7-3c3b-4607-9c3f-d81b4ce04500 Correlation ID: e76892ba-b02e-4a05-acb9-1a03c19be3b7 Timestamp: 2024-12-05 22:00:55Z
Interactive authentication is needed. Please run:
az login

@arista-hpandya
Copy link
Copy Markdown
Contributor Author

/Azp run Azure.sonic-mgmt

@azure-pipelines
Copy link
Copy Markdown

Commenter does not have sufficient privileges for PR 15917 in repo sonic-net/sonic-mgmt

@arista-hpandya
Copy link
Copy Markdown
Contributor Author

/azpw run Azure.sonic-mgmt

@mssonicbld
Copy link
Copy Markdown
Collaborator

/AzurePipelines run Azure.sonic-mgmt

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@yxieca
Copy link
Copy Markdown
Collaborator

yxieca commented Dec 11, 2024

@arista-hpandya @wenyiz2021 I suggest add the offending error logs to the issue and assess if we should white-list them instead? disabling the log analyzer will generally ignore all issues could be caught by log analyzer therefore leaving windows for regression.

@wenyiz2021
Copy link
Copy Markdown
Contributor

@arista-hpandya @wenyiz2021 I suggest add the offending error logs to the issue and assess if we should white-list them instead? disabling the log analyzer will generally ignore all issues could be caught by log analyzer therefore leaving windows for regression.

@yxieca , here the error logs happened due to reboot, since the other reboot cases are disabling loganalyzer, and this test are all reboot tests except 1 config reload test.

I think it would be hard to whitelist all errors due to reboot.
If we really want to do whitelist, then it should be applicable for other reboot tests?

@arista-hpandya
Copy link
Copy Markdown
Contributor Author

arista-hpandya commented Dec 11, 2024

@arista-hpandya @wenyiz2021 I suggest add the offending error logs to the issue and assess if we should white-list them instead? disabling the log analyzer will generally ignore all issues could be caught by log analyzer therefore leaving windows for regression.

Hi @yxieca and @wenyiz2021 , thank you for reviewing the change. To my knowledge the current process of ignoring logs is via adding them to ansible/roles/test/files/tools/loganalyzer/loganalyzer_common_ignore.txt. The issue with this approach is that it is not test specific. As you rightly mentioned many of these issues raised by loganalyzer are valid issues/regressions to look at. In fact many of them are reproducible in other tests and are being actively worked upon. Adding these logs in the ignore file will hide these errors from all tests and not just the voq_disrupts tests.

@arista-hpandya
Copy link
Copy Markdown
Contributor Author

If you wish I can open an issue for these untracked loganalyzer errors which would ensure that they don't slide away while giving the voq reboot tests the ability to run and verify reboot related checks

@yxieca
Copy link
Copy Markdown
Collaborator

yxieca commented Dec 11, 2024

@arista-hpandya @wenyiz2021 I suggest add the offending error logs to the issue and assess if we should white-list them instead? disabling the log analyzer will generally ignore all issues could be caught by log analyzer therefore leaving windows for regression.

Hi @yxieca and @wenyiz2021 , thank you for reviewing the change. To my knowledge the current process of ignoring logs is via adding them to ansible/roles/test/files/tools/loganalyzer/loganalyzer_common_ignore.txt. The issue with this approach is that it is not test specific. As you rightly mentioned many of these issues raised by loganalyzer are valid issues/regressions to look at. In fact many of them are reproducible in other tests and are being actively worked upon. Adding these logs in the ignore file will hide these errors from all tests and not just the voq_disrupts tests.

The log analyzer infrastructure support adding whitelist to a single test case.

With that said, reboot related error logs is long and Wenyi is right, other tests currently also choose to disable log analyzer.

@wenyiz2021
Copy link
Copy Markdown
Contributor

there is a way to ignore certain syslog on the testcase
https://github.com/sonic-net/sonic-mgmt/blob/4aa835e441f6c35a7c35f8f7ba62ae09971e3f76/tests/autorestart/test_container_autorestart.py#L82C5-L82C42

@arista-hpandya
Copy link
Copy Markdown
Contributor Author

Thank you @wenyiz2021 and @yxieca for the prompt response. Based on the discussion can we go ahead and merge this change for consistency across all reboot test cases.

@rlhui rlhui added the chassis label Dec 14, 2024
@rlhui rlhui merged commit f3fd9ba into sonic-net:master Dec 14, 2024
mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this pull request Dec 17, 2024
Disable loganalyzer for voq_disrupts test, error logs due to reboot
@mssonicbld
Copy link
Copy Markdown
Collaborator

Cherry-pick PR to 202405: #16098

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[Bug]: VOQ disrupts test log analyzer failures

8 participants