Verify generated coredump files between reboot cycles #2334

Merged
vaibhavhd merged 3 commits into sonic-net:master from vaibhavhd:cont-wb-improvements
Oct 14, 2020

Conversation

@vaibhavhd
Contributor

Description of PR

Summary: Verify coredump presence between continuous reboots
Fixes # (issue)

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Approach

Verify if new coredumps are created between reboot iterations

What is the motivation for this PR?

How did you do it?

Added a check for any files inside the /var/core directory. The test fails if the file count is greater than 0.

How did you verify/test it?

Tested on a T0 testbed. The test passes when no coredumps are generated, and fails when core files are generated inside /var/core.
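
A minimal sketch of the check described above, assuming the core directory can be listed as a local path. The helper names `count_core_files` and `check_no_coredumps` and the `core_dir` parameter are hypothetical and not taken from the PR; the real test inspects the DUT rather than the machine running the test.

```python
import os


def count_core_files(core_dir="/var/core"):
    """Return the number of regular files in core_dir (0 if it is missing)."""
    if not os.path.isdir(core_dir):
        return 0
    return sum(
        1
        for name in os.listdir(core_dir)
        if os.path.isfile(os.path.join(core_dir, name))
    )


def check_no_coredumps(core_dir="/var/core"):
    """Raise AssertionError if any file is present in core_dir."""
    count = count_core_files(core_dir)
    if count > 0:
        raise AssertionError(
            "Found {} core file(s) in {}".format(count, core_dir)
        )
```

Calling `check_no_coredumps()` after each reboot would fail the iteration as soon as a single file shows up, which matches the "files count > 0" rule stated in the description.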

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

@vaibhavhd vaibhavhd requested a review from a team October 13, 2020 02:23
# Wait until uptime reaches allowed value
self.wait_until_uptime()
# Perform additional post-reboot health-check
self.verify_no_coredumps()
Collaborator

How should we handle the case where core files already exist before the test runs?

Contributor Author

I have now added a check for pre-existing coredumps. I considered deleting (cleaning up) the cores before starting the test, but this could hinder core analysis and debugging of the test that created them. So the test simply fails if there are pre-existing cores.

Collaborator

@bingwang-ms bingwang-ms Oct 13, 2020

I don't think it's a good idea to fail the test immediately if core files exist, since some other program (such as python) may have crashed.
I have two possible ideas to work around this:

  1. Count core files before and after the test run;
  2. Filter core files by creation time.

BTW, we should be aware that at a given point /var/core can contain both UPLOADED_xxx.core.gz files and core dump files.
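
Both ideas can be sketched in a few lines. This is only an illustration: the function names are hypothetical, file modification time stands in for creation time (which is not portably available on Linux), and the assumption that an uploaded core keeps its name with an `UPLOADED_` prefix is inferred from the file name mentioned above, not confirmed.

```python
import os

UPLOADED_PREFIX = "UPLOADED_"  # assumption: prefix added once a core is uploaded


def normalize(name):
    """Strip the assumed upload prefix so both forms of a core compare equal."""
    if name.startswith(UPLOADED_PREFIX):
        return name[len(UPLOADED_PREFIX):]
    return name


def _core_paths(core_dir):
    """Yield (name, path) for every regular file in core_dir."""
    if not os.path.isdir(core_dir):
        return
    for name in os.listdir(core_dir):
        path = os.path.join(core_dir, name)
        if os.path.isfile(path):
            yield name, path


def count_cores(core_dir):
    """Idea 1: count distinct cores, so before/after counts can be compared."""
    return len({normalize(name) for name, _ in _core_paths(core_dir)})


def cores_newer_than(core_dir, start_time):
    """Idea 2: names of cores modified at or after start_time (epoch seconds)."""
    return sorted(
        {
            normalize(name)
            for name, path in _core_paths(core_dir)
            if os.path.getmtime(path) >= start_time
        }
    )
```

With idea 1, the test would record `count_cores()` before the run and fail only if the count grows. With idea 2, it would record the start time and fail only if `cores_newer_than()` returns a non-empty list.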

Contributor Author

I understand your concern. But should we go ahead with this test at all if a pre-existing core is found on the device?
My concern: the test supports installing images at the beginning and between reboots. If a new installation is to be done (current image != new image), the cores will be deleted, and this may cause a loss of information needed for debugging.
Should we make a backup of these core files before starting the test?

Collaborator

IMO, the test should run even if core files already exist. After all, we are attempting to verify continuous_reboot, and the test shouldn't be affected by cases that ran earlier.
And I don't think it's necessary to back up core files before upgrading for now, for two reasons: 1. the core files have probably been uploaded already; 2. backing up and restoring core files means a lot of code and logic, which is not worth doing.

Contributor Author

I agree that there is no harm in proceeding with the test if the core files have been uploaded.
I was only thinking about cores that have not been uploaded and get deleted because this test installs a new image. Losing cores may mean losing the RCA for the test that originally caused them. You are right, this test by itself isn't affected by earlier test cases or their cores, and can go ahead without an issue.

Contributor Author

When I start this test locally, I just manually clean /var/core/ if there are any core files before the test. So the local version of the test case isn't a problem anyway.
The problem is when this test runs in the nightly, along with other tests. But there we do not install a new image on the DUT, so the cores will not be deleted anyway.
I will let the test continue to run even if core files are present, and filter out the new cores (if any) to decide whether to fail the test.
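
One way to filter out only the new cores across repeated reboots is to keep a baseline of known core names and report whatever appears afterwards. The sketch below is an assumption-laden illustration, not the merged code: `CoreTracker` and `list_cores` are hypothetical names, the directory is read as a local path, and the `UPLOADED_` prefix handling assumes an uploaded core is the same file renamed.

```python
import os

UPLOADED_PREFIX = "UPLOADED_"  # assumption: prefix added once a core is uploaded


def list_cores(core_dir):
    """Return the set of core names in core_dir, ignoring the assumed prefix."""
    if not os.path.isdir(core_dir):
        return set()
    names = set()
    for name in os.listdir(core_dir):
        if os.path.isfile(os.path.join(core_dir, name)):
            if name.startswith(UPLOADED_PREFIX):
                name = name[len(UPLOADED_PREFIX):]
            names.add(name)
    return names


class CoreTracker(object):
    """Remember which cores existed and report only those that appear later."""

    def __init__(self, core_dir="/var/core"):
        self.core_dir = core_dir
        # Pre-existing cores are tolerated: they form the initial baseline.
        self.known = list_cores(core_dir)

    def new_cores(self):
        """Return cores seen since the last call, then add them to the baseline."""
        current = list_cores(self.core_dir)
        fresh = sorted(current - self.known)
        self.known |= current
        return fresh
```

A tracker would be created once before the first reboot, and `new_cores()` called after each cycle; a non-empty result fails that iteration while leaving earlier cores untouched for debugging.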

Contributor Author

Thanks @bingwang-ms for the suggestions. I have made the new changes now.

@vaibhavhd vaibhavhd requested a review from bingwang-ms October 14, 2020 07:12
@vaibhavhd vaibhavhd merged commit ad43809 into sonic-net:master Oct 14, 2020
@vaibhavhd vaibhavhd deleted the cont-wb-improvements branch October 14, 2020 07:23
kazinator-arista pushed a commit to kazinator-arista/sonic-mgmt that referenced this pull request Mar 4, 2026
Commits:

bee3684 - 2022-06-20 : Add BGP profile to Vnet routes (sonic-net#2339) [Prince Sunny]
f9af510 - 2022-06-16 : [intfmgr]: Set proxy_arp kernel param (sonic-net#2334) [Lawrence Lee]
725071f - 2022-06-08 : Fix test_warm_reboot issues blocking PR merge (sonic-net#2309) [Vaibhav Hemant Dixit]
0db6f15 - 2021-11-16 : [orchagent] Flush pipeline every 1 second, not only when select will timeout (sonic-net#2003) [Kamil Cudnik]
kazinator-arista pushed a commit to kazinator-arista/sonic-mgmt that referenced this pull request Mar 4, 2026
swss:
* a3bfd96 2022-06-18 | Enhance mock test for dynamic buffer manager for port removing and qos reload flows (sonic-net#2262) (HEAD -> 202205, github/202205) [Stephen Sun]
* b17d6c0 2022-05-28 | Support mock_test infra for dynamic buffer manager and fix issues found during mock test (sonic-net#2234) [Stephen Sun]
* 3fb23a1 2022-06-16 | [aclorch] Fix and simplify DTel watchlist tables and entries (sonic-net#2155) [Mickey Spiegel]
* 9ace643 2022-06-16 | [intfmgr]: Set proxy_arp kernel param (sonic-net#2334) [Lawrence Lee]
* 013609a 2022-06-14 | [crmorch] Prevent exceededLogCounter from resetting when low and high values are equal (sonic-net#2327) [Alexander Allen]
* 83a1306 2022-06-13 | Fix key generation in removeDecapTunnel (sonic-net#2322) [Myron Sosyak]
* 3d018ad 2022-06-15 | Apply `DSCP_TO_TC_MAP` from `PORT_QOS_MAP|global` to switch level (sonic-net#2314) [bingwang-ms]

Signed-off-by: Ying Xie <ying.xie@microsoft.com>