Verify generated coredump files between reboot cycles #2334
vaibhavhd merged 3 commits into sonic-net:master
Conversation
    # Wait until uptime reaches allowed value
    self.wait_until_uptime()
    # Perform additional post-reboot health-check
    self.verify_no_coredumps()
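The snippet waits for uptime to reach an allowed value before running the coredump check. One common way to implement such a wait is a polling loop with a timeout. The sketch below is a hypothetical illustration, not the PR's `wait_until_uptime`: `wait_until`, `uptime_seconds`, and the `min_uptime`/`timeout`/`interval` parameters are invented names, and it reads `/proc/uptime` on the local host, whereas a sonic-mgmt test would need to read it from the DUT.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float, interval: float) -> bool:
    """Poll `condition` every `interval` seconds until it is True or `timeout` elapses.

    Returns True if the condition was met, False if the timeout expired first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def uptime_seconds(path: str = "/proc/uptime") -> float:
    """Return system uptime in seconds from the first field of /proc/uptime (Linux)."""
    with open(path) as f:
        return float(f.read().split()[0])


def wait_until_uptime(min_uptime: float, timeout: float, interval: float = 5.0,
                      read_uptime: Callable[[], float] = uptime_seconds) -> bool:
    """Hypothetical helper: wait until uptime is at least `min_uptime` seconds."""
    return wait_until(lambda: read_uptime() >= min_uptime, timeout, interval)
```

Injecting `read_uptime` keeps the helper testable and would let the same loop wrap a remote read instead of a local file.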
How should we handle the case where core files already exist before the test runs?
I have now added a check for pre-existing coredumps. I considered deleting (cleaning up) the cores before starting the test, but that could hamper core analysis and debugging of the test that created them. So the test simply fails if there are pre-existing cores.
I don't think it's a good idea to fail the test immediately if core files exist, since they may come from other programs that crashed (such as python).
I have two possible workarounds:
- Count core files before and after the test run;
- Filter core files by creation time.
BTW, we should be aware that at a given point there may be both UPLOADED_xxx.core.gz files and core dump files under /var/core.
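The second workaround, filtering by time, could look roughly like this. `os.stat` does not report file creation time on Linux, so this hypothetical `cores_newer_than` uses modification time (`st_mtime`) as a proxy. It also skips `UPLOADED_*` files, on the assumption that they are uploader copies of cores that would otherwise be counted twice.

```python
import os
from typing import List


def cores_newer_than(core_dir: str, since: float) -> List[str]:
    """Return sorted names of core files in core_dir modified after `since`.

    `since` is a Unix timestamp, e.g. time.time() captured when the test starts.
    mtime stands in for creation time, which os.stat does not report on Linux.
    UPLOADED_* files are skipped on the assumption that they are uploader
    copies of cores, so one crash is not counted twice.
    """
    new_cores = []
    with os.scandir(core_dir) as entries:
        for entry in entries:
            if not entry.is_file() or entry.name.startswith("UPLOADED_"):
                continue
            if entry.stat().st_mtime > since:
                new_cores.append(entry.name)
    return sorted(new_cores)
```

Skipping the prefix avoids double counting, at the cost of missing a new core that gets uploaded before the check runs.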
I understand your concern. But should we go ahead with this test at all if a pre-existing core is found on the device?
My concern: the test supports installing images at the beginning and between reboots. If a new image is installed (current image != new image), the cores will be deleted, which may cause a loss of debugging information.
Should we make a backup of these core files before starting the test?
IMO, the test should run even if core files already exist. After all, we are verifying continuous_reboot, which shouldn't be affected by previously run cases.
And I don't think it's necessary to back up core files before upgrading, at least for now, for two reasons: 1. the core files have probably been uploaded; 2. backing up and restoring core files means a lot of code and logic, which is not worth it.
I agree that there is no harm in proceeding with the test if the core files have been uploaded.
I was only thinking about cores that have not been uploaded and that get deleted when this test installs a new image. Losing those cores may mean losing the RCA for the test that originally caused them. You are right that this test by itself isn't affected by previously run test cases or their cores, and can proceed without issue.
When I run this test locally, I manually clean /var/core/ beforehand if it contains any core files, so local runs aren't a problem anyway.
The problem arises when this test runs nightly alongside other tests. But there we do not install a new image on the DUT, so the cores will not be deleted anyway.
I will let the test run even if core files are present, and filter for new cores (if any) to decide whether to fail the test.
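Filtering for new cores can be sketched as a set difference between a baseline snapshot of the core directory, taken before the first reboot, and a snapshot taken after each iteration. All names here (`snapshot_cores`, `find_new_cores`, `assert_no_new_cores`) are hypothetical, the directory is read locally rather than on the DUT, and stripping the `UPLOADED_` prefix is an assumption about how uploaded cores are renamed, so that a core uploaded mid-test is not reported as new.

```python
import os
from typing import Set

UPLOADED_PREFIX = "UPLOADED_"  # assumed rename convention for uploaded cores


def _base_name(name: str) -> str:
    """Map UPLOADED_<name> back to <name> so renamed cores match the baseline."""
    return name[len(UPLOADED_PREFIX):] if name.startswith(UPLOADED_PREFIX) else name


def snapshot_cores(core_dir: str) -> Set[str]:
    """Return the normalized names of regular files currently in core_dir."""
    with os.scandir(core_dir) as entries:
        return {_base_name(e.name) for e in entries if e.is_file()}


def find_new_cores(core_dir: str, baseline: Set[str]) -> Set[str]:
    """Return cores present now that were not in the baseline snapshot."""
    return snapshot_cores(core_dir) - baseline


def assert_no_new_cores(core_dir: str, baseline: Set[str]) -> None:
    """Fail only on cores created after the baseline; pre-existing ones are ignored."""
    added = find_new_cores(core_dir, baseline)
    assert not added, f"New core files in {core_dir}: {sorted(added)}"
```

Typical use: take `baseline = snapshot_cores("/var/core")` once before the first reboot, then call `assert_no_new_cores("/var/core", baseline)` after every iteration. Unlike a plain before/after count, the set difference still flags a new core even if an old one disappears in the same iteration.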
Thanks @bingwang-ms for the suggestions. I have made the new changes now.
Description of PR
Summary: Verify coredump presence between continuous reboots
Fixes # (issue)
Type of change
Approach
Verify whether new coredumps are created between reboot iterations
What is the motivation for this PR?
How did you do it?
Added a check for any files inside the /var/core dir. Fail the test if the file count is > 0.
How did you verify/test it?
Tested on a T0 testbed. The test passes when no coredumps are generated and fails when core files are generated inside /var/core.
Any platform specific information?
Supported testbed topology if it's a new test case?
Documentation