Skip to content

[LogAnalyzer]: Accumulate syslog files rather than force logrotate#590

Merged
sihuihan88 merged 1 commit intosonic-net:masterfrom
sihuihan88:dev/sihan/logrotate
Jun 6, 2018
Merged

[LogAnalyzer]: Accumulate syslog files rather than force logrotate#590
sihuihan88 merged 1 commit intosonic-net:masterfrom
sihuihan88:dev/sihan/logrotate

Conversation

@sihuihan88
Copy link
Copy Markdown
Contributor

@sihuihan88 sihuihan88 commented Apr 27, 2018

Signed-off-by: Sihui Han sihan@microsoft.com

Description of PR

Fixes # (issue)

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Approach

How did you do it?
Problem: In current design, before loganalyzer analyzes syslog file, it will force logrotate. Since we have default logrotate cron job every 1min, it can result in following two issues:

  1. When default logrotate and force logrotate happen at the same time, due to the conflict, it will return a middle-stage result i.e. creating syslog.1.gz (or swss.1.gz/sairedis.1.gz), without actually rotating it. Afterwards, all logrotate attempt will fail.
  2. Loganalyzer place start_marker and end_marker in the syslog file so that it knows which part should be analyzed. Currently it only analyze the /var/log/syslog file and that is the reason why it forces logrotate before anything. However, when the default logrotate happens within this period, it could rotate start_marker or end_marker to other files, so that loganalyzer will fail as it could not find the markers.

Solution:
Instead of force logrotate and only analyze the latest one, we change the logic to following:

  1. Place start_marker before taking any action.
  2. Extract all syslogs after latest start_marker and store them to /tmp/syslog for future analyze.
  3. Analyze /tmp/syslog so that there is no need to force logrotate, and there will be no conflict with default logrotate cron job.

How did you verify/test it?
Tested on DUT.
Any platform specific information?
Supported testbed topology if it's a new test case?

Documentation

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any references to this 60 seconds number?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

60 seconds is current logrotate cron job configuration

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should not be just for pfcwd, it should generic.

@sihuihan88 sihuihan88 force-pushed the dev/sihan/logrotate branch from 736c429 to e2371cd Compare April 27, 2018 21:06
@sihuihan88 sihuihan88 changed the title [pfcwd]: disable logrotate during the test [LogAnalyzer]: disable logrotate cron job during loganalyzer Apr 27, 2018
@sihuihan88 sihuihan88 changed the title [LogAnalyzer]: disable logrotate cron job during loganalyzer [LogAnalyzer]: disable logrotate cron job during the test Apr 27, 2018
@lguohan
Copy link
Copy Markdown
Contributor

lguohan commented Apr 30, 2018

also there is a catch here, since we disable log rotate, it might happen that we will get disk full duration the test.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put into /tmp?

@sihuihan88
Copy link
Copy Markdown
Contributor Author

sihuihan88 commented Apr 30, 2018

That should not be an issue, as we only disable cron job when we want to force logrotate, and enable default cron job right after that.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we should also check to make sure no logrotate processes are running before proceeding. Even though you disable the cron job, there's a possibility the previous cron job is still running.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I updated the logic to wait 1min before proceed as cron checks configuration update every 1min. https://www.freebsd.org/cgi/man.cgi?cron

Copy link
Copy Markdown
Contributor

@jleveque jleveque May 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cron will definitely pick up the config change before the next run, as I have scheduled logrotate to run every minute. Therefore, the next time cron wakes up, it will reload its config and will not kick off logrotate.

My concern is not when cron will pick up the configuration change, but rather that we need to wait for any already-running instances of logrotate from previous cron jobs to exit before starting another process.

Signed-off-by: Sihui Han <sihan@microsoft.com>
@sihuihan88 sihuihan88 force-pushed the dev/sihan/logrotate branch from 9cb5a55 to f1fd1cd Compare May 17, 2018 21:55
@sihuihan88 sihuihan88 changed the title [LogAnalyzer]: disable logrotate cron job during the test [LogAnalyzer]: Accumulate syslog files rather than force logrotate May 17, 2018
@sihuihan88 sihuihan88 merged commit 0454ca1 into sonic-net:master Jun 6, 2018
sihuihan88 added a commit that referenced this pull request Jun 11, 2018
sihuihan88 added a commit that referenced this pull request Jun 11, 2018
stepanblyschak pushed a commit to stepanblyschak/sonic-mgmt that referenced this pull request Jul 4, 2018
stepanblyschak pushed a commit to stepanblyschak/sonic-mgmt that referenced this pull request Jul 4, 2018
stepanblyschak pushed a commit to stepanblyschak/sonic-mgmt that referenced this pull request Jul 19, 2018
maggiemsft pushed a commit that referenced this pull request Jul 24, 2018
* Revert "Revert "[logrotate]: Accumulate syslogs rather than force logrotate (#590)" (#641)"

This reverts commit c225eb5.

* [loganalyze] Make sure logrotate will not run during logs accumulation

Signed-off-by: stepanb <stepanb@mellanox.com>

* Improve "extract_log" ansible module performance

In most scenarios of using this module it is reasonable to search
'start_string' in the most recent log file instead of searching
all appearence of 'start_string' in all logs in /var/log/syslog*.
This improves performance of tests which use loganalyze frequently
(e.g crm, pfc wd)

Signed-off-by: stepanb <stepanb@mellanox.com>
stepanblyschak pushed a commit to stepanblyschak/sonic-mgmt that referenced this pull request Jan 17, 2019
…rotate (sonic-net#590)" (sonic-net#641)"

This reverts commit c225eb5.

Signed-off-by: stepanb <stepanb@mellanox.com>

[logrotate] Make sure logrotate will not interfere with extract logs task for loganalyzer

Signed-off-by: stepanb <stepanb@mellanox.com>
auspham pushed a commit to auspham/sonic-mgmt that referenced this pull request Feb 3, 2026
…er (sonic-net#590)

<!--
Please make sure you've read and understood our contributing guidelines;
https://github.com/sonic-net/SONiC/blob/gh-pages/CONTRIBUTING.md

Please provide following information to help code review process a bit easier:
-->
### Description of PR
<!--
- Please include a summary of the change and which issue is fixed.
- Please also include relevant motivation and context. Where should reviewer start? background context?
- List any dependencies that are required for this change.
-->

Summary:
Fixes # (issue)

On t2 topo we observed the following failure:
`Failed: Not all routes flushed from nexthop 10.0.0.25 on asic 0 on cmp210-4`
in tests:
```
pc/test_po_update.py::test_po_update::test_po_update_io_no_loss
pc/test_po_voq.py::test_po_voq::test_voq_po_member_update
```

Increasing the timeout of the check_no_routes_from_nexthop call, resolves these issues.

### Type of change

<!--
- Fill x for your type of change.
- e.g.
- [x] Bug fix
-->

- [x] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
 - [ ] Skipped for non-supported platforms
- [ ] Test case improvement

### Back port request
- [ ] 202012
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [x] 202405
- [ ] 202411

### Approach
#### What is the motivation for this PR?

#### How did you do it?

#### How did you verify/test it?

#### Any platform specific information?

#### Supported testbed topology if it's a new test case?

### Documentation
<!--
(If it's a new feature, new test case)
Did you update documentation/Wiki relevant to your implementation?
Link to the wiki page?
-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants