perf: log debug only if it is enabled to avoid heap allocations#2989
perf: log debug only if it is enabled to avoid heap allocations#2989
Conversation
|
Hi @ValarDragon, any updates on profiling these changes ? Please let us know when you have some results so we can decide if we will continue with this PR or not (currently it's on draft mode until we get some feedback on these changes). Thanks again ! |
|
Yes! Been mildly delayed due to this breaking an api in the sdk, and our tooling messed up on both sdk and comet patches at once. Going to try again today |
Amazing, please keep us posted ! 👍 |
|
I just did this on some known hot loop locations I saw due to merge conflicts, and saw a very significant improvement: Savings were over 20GB of RAM. We see that |
cason
left a comment
There was a problem hiding this comment.
This idea is great, but we don't need to test the log level for every produced log message. I left some comments where I think there is no gain at all in using this approach (actually, we check the log level twice).
| if vote := ps.PickVoteToSend(rs.LastCommit); vote != nil { | ||
| logger.Debug("Picked rs.LastCommit to send") | ||
| if logger.DebugOn() { | ||
| logger.Debug("Picked rs.LastCommit to send") |
There was a problem hiding this comment.
So, this kind of debug log does not constitute a problem. The string is a constant, included in the binary (whether debug is enabled or not). And the cost of checking the level can be paid inside the Debug() call.
| "round", prs.ProposalPOLRound) | ||
| if logger.DebugOn() { | ||
| logger.Debug("Picked rs.Prevotes(prs.ProposalPOLRound) to send", | ||
| "round", prs.ProposalPOLRound) |
There was a problem hiding this comment.
In this case, the only non constant field passed to the logger is an integer. Not sure whether this code around is need also in this case.
| "index", index) | ||
|
|
||
| if ps.logger.DebugOn() { | ||
| ps.logger.Debug("setHasProposalBlockPart", |
There was a problem hiding this comment.
Are you sure that the lazy print does not solve the problem here?
There was a problem hiding this comment.
Yes, the lazy print is adding new objects to heap
| if logger.DebugOn() { | ||
| logger.Debug("propose step; not proposing since node is not in the validator set", | ||
| "addr", addr, | ||
| "vals", cs.Validators) |
There was a problem hiding this comment.
This is a case that is worth considering, as cs.Validators is a complex struct with a string method etc. etc.
| "timestamp_difference", cs.ProposalReceiveTime.Sub(cs.Proposal.Timestamp)) | ||
| if logger.DebugOn() { | ||
| logger.Debug("prevote step: Proposal is timely", | ||
| "timestamp", cs.Proposal.Timestamp.Format(time.RFC3339Nano), |
There was a problem hiding this comment.
This is another case in which makes sense, as we expend time formatting timestamps.
| if logger.DebugOn() { | ||
| logger.Debug( | ||
| "entering prevote wait step with invalid args", | ||
| "current", log.NewLazySprintf("%v/%v/%v", cs.Height, cs.Round, cs.Step), |
There was a problem hiding this comment.
We don't need the log.NewLazySprintf in this case, right? The idea of this feature was to avoid paying the cost of printing when we are not printing because the logger level is not the desired one (debug).
|
That's amazing @ValarDragon, if it save 20GB of RAM 👀, please let us know if you re-profile I've added the code in all debugs. I've seen @cason comments that we might not need in all the places and I've seen in your commit the hot spot ones. If you confirm these are the ones causing issues I can revert some of the logic for Thanks again! |
|
Yeah I think just doing it on the hot spot ones sounds great to me! |
agree 👍 |
|
Hi @ValarDragon, just following up here. Are these debug logs something you had a chance to validate in performance gains? Should we proceed with these changes? Please let us know. Thanks ! |
|
Ah sorry missed this ping (please feel free to ping on slack if I miss something in the future) Definitely noticing the cost of logging in:
Can we please proceed with either removing those, or getting those costs not always paid? |
|
Thanks for the follow-up @ValarDragon. I'll try to constrain this PR only on those places that are hot spots that you mentioned above and then mark it ready for review. |
|
I've removed several more that are bottlenecks in the p2p stack. Forgot to post here, will reply with more as I remember / re-see them. (Maybe when we do our SDK v50 upgrade, since I'll have to move all the commits vo v0.38.x line) ch *Channel) recvPacketMsg is a big one to remove |
) Delete an expensive debug log that has already been slated for deletion. On the osmosis fork (with many other debug logs removed), this debug log ended up taking 1.5% of all CPU time when we had many large blocks. (All from outbound gossip curiously enough) This caused 12GB of total memory allocation over an hour in that benchmark (out of 180GB total. With other optimizations in PR's / merged since that benchmark, would have been 12GB / ~120GB total) I am just making a PR to delete it, since it already had a code comment indicating we should delete it later. (Rather than waiting for #2989 Note that under normal benchmarks (e.g. the last one I did), this is fairly low impact. (68MB over an hour, out of 69 GB, so .1%) --- #### PR checklist - [x] Tests written/updated - [ ] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - didn't add one, since its just deleting a debug log - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
) Delete an expensive debug log that has already been slated for deletion. On the osmosis fork (with many other debug logs removed), this debug log ended up taking 1.5% of all CPU time when we had many large blocks. (All from outbound gossip curiously enough) This caused 12GB of total memory allocation over an hour in that benchmark (out of 180GB total. With other optimizations in PR's / merged since that benchmark, would have been 12GB / ~120GB total) I am just making a PR to delete it, since it already had a code comment indicating we should delete it later. (Rather than waiting for #2989 Note that under normal benchmarks (e.g. the last one I did), this is fairly low impact. (68MB over an hour, out of 69 GB, so .1%) --- #### PR checklist - [x] Tests written/updated - [ ] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - didn't add one, since its just deleting a debug log - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec (cherry picked from commit 1964898)
…ckport #3412) (#3488) Delete an expensive debug log that has already been slated for deletion. On the osmosis fork (with many other debug logs removed), this debug log ended up taking 1.5% of all CPU time when we had many large blocks. (All from outbound gossip curiously enough) This caused 12GB of total memory allocation over an hour in that benchmark (out of 180GB total. With other optimizations in PR's / merged since that benchmark, would have been 12GB / ~120GB total) I am just making a PR to delete it, since it already had a code comment indicating we should delete it later. (Rather than waiting for #2989 Note that under normal benchmarks (e.g. the last one I did), this is fairly low impact. (68MB over an hour, out of 69 GB, so .1%) --- #### PR checklist - [x] Tests written/updated - [ ] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - didn't add one, since its just deleting a debug log - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec <hr>This is an automatic backport of pull request #3412 done by [Mergify](https://mergify.com). Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com> Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
|
I'll close this PR. The implementation here is not the best way to solve the problem in #2847. I'm working on a new PR that has a better solution to the problem. |
) Delete an expensive debug log that has already been slated for deletion. On the osmosis fork (with many other debug logs removed), this debug log ended up taking 1.5% of all CPU time when we had many large blocks. (All from outbound gossip curiously enough) This caused 12GB of total memory allocation over an hour in that benchmark (out of 180GB total. With other optimizations in PR's / merged since that benchmark, would have been 12GB / ~120GB total) I am just making a PR to delete it, since it already had a code comment indicating we should delete it later. (Rather than waiting for #2989 Note that under normal benchmarks (e.g. the last one I did), this is fairly low impact. (68MB over an hour, out of 69 GB, so .1%) --- #### PR checklist - [x] Tests written/updated - [ ] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - didn't add one, since its just deleting a debug log - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
) Delete an expensive debug log that has already been slated for deletion. On the osmosis fork (with many other debug logs removed), this debug log ended up taking 1.5% of all CPU time when we had many large blocks. (All from outbound gossip curiously enough) This caused 12GB of total memory allocation over an hour in that benchmark (out of 180GB total. With other optimizations in PR's / merged since that benchmark, would have been 12GB / ~120GB total) I am just making a PR to delete it, since it already had a code comment indicating we should delete it later. (Rather than waiting for #2989 Note that under normal benchmarks (e.g. the last one I did), this is fairly low impact. (68MB over an hour, out of 69 GB, so .1%) --- #### PR checklist - [x] Tests written/updated - [ ] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - didn't add one, since its just deleting a debug log - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
close: #2847
NOTE: Please don't merge this yet. This is an experimental WIP so @ValarDragon can do some testing for performance profiling. Depending on the outcome of these changes we then can decide to merge it or not. But it would still need a bit of work in the logging library.
PR checklist
.changelog(we use unclog to manage our changelog)docs/orspec/) and code comments