[LogAnalyzer]: Accumulate syslog files rather than force logrotate#590
Conversation
ansible/roles/test/tasks/pfc_wd.yml
Outdated
There was a problem hiding this comment.
any references to this 60 seconds number?
There was a problem hiding this comment.
60 seconds is current logrotate cron job configuration
ansible/roles/test/tasks/pfc_wd.yml
Outdated
There was a problem hiding this comment.
this should not be just for pfcwd, it should generic.
736c429 to
e2371cd
Compare
|
also there is a catch here, since we disable log rotate, it might happen that we will get disk full duration the test. |
|
That should not be an issue, as we only disable cron job when we want to force logrotate, and enable default cron job right after that. |
There was a problem hiding this comment.
I feel like we should also check to make sure no logrotate processes are running before proceeding. Even though you disable the cron job, there's a possibility the previous cron job is still running.
There was a problem hiding this comment.
Good point. I updated the logic to wait 1min before proceed as cron checks configuration update every 1min. https://www.freebsd.org/cgi/man.cgi?cron
There was a problem hiding this comment.
Cron will definitely pick up the config change before the next run, as I have scheduled logrotate to run every minute. Therefore, the next time cron wakes up, it will reload its config and will not kick off logrotate.
My concern is not when cron will pick up the configuration change, but rather that we need to wait for any already-running instances of logrotate from previous cron jobs to exit before starting another process.
Signed-off-by: Sihui Han <sihan@microsoft.com>
9cb5a55 to
f1fd1cd
Compare
…onic-net#590)" This reverts commit 0454ca1.
…t#590) Signed-off-by: Sihui Han <sihan@microsoft.com>
…rotate (sonic-net#590)" (sonic-net#641)" This reverts commit c225eb5.
* Revert "Revert "[logrotate]: Accumulate syslogs rather than force logrotate (#590)" (#641)" This reverts commit c225eb5. * [loganalyze] Make sure logrotate will not run during logs accumulation Signed-off-by: stepanb <stepanb@mellanox.com> * Improve "extract_log" ansible module performance In most scenarios of using this module it is reasonable to search 'start_string' in the most recent log file instead of searching all appearence of 'start_string' in all logs in /var/log/syslog*. This improves performance of tests which use loganalyze frequently (e.g crm, pfc wd) Signed-off-by: stepanb <stepanb@mellanox.com>
…rotate (sonic-net#590)" (sonic-net#641)" This reverts commit c225eb5. Signed-off-by: stepanb <stepanb@mellanox.com> [logrotate] Make sure logrotate will not interfere with extract logs task for loganalyzer Signed-off-by: stepanb <stepanb@mellanox.com>
…er (sonic-net#590) <!-- Please make sure you've read and understood our contributing guidelines; https://github.com/sonic-net/SONiC/blob/gh-pages/CONTRIBUTING.md Please provide following information to help code review process a bit easier: --> ### Description of PR <!-- - Please include a summary of the change and which issue is fixed. - Please also include relevant motivation and context. Where should reviewer start? background context? - List any dependencies that are required for this change. --> Summary: Fixes # (issue) On t2 topo we observed the following failure: `Failed: Not all routes flushed from nexthop 10.0.0.25 on asic 0 on cmp210-4` in tests: ``` pc/test_po_update.py::test_po_update::test_po_update_io_no_loss pc/test_po_voq.py::test_po_voq::test_voq_po_member_update ``` Increasing the timeout of the check_no_routes_from_nexthop call, resolves these issues. ### Type of change <!-- - Fill x for your type of change. - e.g. - [x] Bug fix --> - [x] Bug fix - [ ] Testbed and Framework(new/improvement) - [ ] New Test case - [ ] Skipped for non-supported platforms - [ ] Test case improvement ### Back port request - [ ] 202012 - [ ] 202205 - [ ] 202305 - [ ] 202311 - [x] 202405 - [ ] 202411 ### Approach #### What is the motivation for this PR? #### How did you do it? #### How did you verify/test it? #### Any platform specific information? #### Supported testbed topology if it's a new test case? ### Documentation <!-- (If it's a new feature, new test case) Did you update documentation/Wiki relevant to your implementation? Link to the wiki page? -->
Signed-off-by: Sihui Han sihan@microsoft.com
Description of PR
Fixes # (issue)
Type of change
Approach
How did you do it?
Problem: In current design, before loganalyzer analyzes syslog file, it will force logrotate. Since we have default logrotate cron job every 1min, it can result in following two issues:
Solution:
Instead of force logrotate and only analyze the latest one, we change the logic to following:
How did you verify/test it?
Tested on DUT.
Any platform specific information?
Supported testbed topology if it's a new test case?
Documentation