Skip to content

perf: log debug only if it is enabled to avoid heap allocations#2989

Closed
andynog wants to merge 8 commits intomainfrom
andy/2847-log-debug-only-if-on
Closed

perf: log debug only if it is enabled to avoid heap allocations#2989
andynog wants to merge 8 commits intomainfrom
andy/2847-log-debug-only-if-on

Conversation

@andynog
Copy link
Collaborator

@andynog andynog commented May 2, 2024

close: #2847

NOTE: Please don't merge this yet. This is an experimental WIP so @ValarDragon can do some testing for performance profiling. Depending on the outcome of these changes we then can decide to merge it or not. But it would still need a bit of work in the logging library.

PR checklist

  • Tests written/updated
  • Changelog entry added in .changelog (we use unclog to manage our changelog)
  • Updated relevant documentation (docs/ or spec/) and code comments
  • Title follows the Conventional Commits spec

@andynog andynog added enhancement New feature or request logs Anything relating to logging labels May 2, 2024
@andynog andynog self-assigned this May 2, 2024
@andynog andynog requested a review from a team as a code owner May 2, 2024 20:04
@andynog andynog requested a review from a team May 2, 2024 20:04
@andynog andynog added the wip Work in progress label May 2, 2024
@andynog andynog marked this pull request as draft May 2, 2024 20:06
@andynog
Copy link
Collaborator Author

andynog commented May 6, 2024

Hi @ValarDragon, any updates on profiling these changes ?

Please let us know when you have some results so we can decide if we will continue with this PR or not (currently it's on draft mode until we get some feedback on these changes). Thanks again !

@ValarDragon
Copy link
Contributor

ValarDragon commented May 6, 2024

Yes! Been mildly delayed due to this breaking an api in the sdk, and our tooling messed up on both sdk and comet patches at once. Going to try again today

@andynog
Copy link
Collaborator Author

andynog commented May 6, 2024

Yes! Been mildly delayed due to this breaking an api in the sdk, and our tooling messed up on both sdk and comet patches at once. Going to try again today

Amazing, please keep us posted ! 👍

@ValarDragon
Copy link
Contributor

I just did this on some known hot loop locations I saw due to merge conflicts, and saw a very significant improvement:
osmosis-labs@30a125b

Savings were over 20GB of RAM. We see that setHasVote's heap and time went away entirely, same for StringShort. I would need to reprofile to get something more apples to apples to understand improvements outside of these two functions. Would that be helpful? Anyways going to re-profile to see where performance with all merged improvements jointly together is.

Copy link

@cason cason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This idea is great, but we don't need to test the log level for every produced log message. I left some comments where I think there is no gain at all in using this approach (actually, we check the log level twice).

if vote := ps.PickVoteToSend(rs.LastCommit); vote != nil {
logger.Debug("Picked rs.LastCommit to send")
if logger.DebugOn() {
logger.Debug("Picked rs.LastCommit to send")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, this kind of debug log does not constitute a problem. The string is a constant, included in the binary (whether debug is enabled or not). And the cost of checking the level can be paid inside the Debug() call.

"round", prs.ProposalPOLRound)
if logger.DebugOn() {
logger.Debug("Picked rs.Prevotes(prs.ProposalPOLRound) to send",
"round", prs.ProposalPOLRound)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, the only non constant field passed to the logger is an integer. Not sure whether this code around is need also in this case.

"index", index)

if ps.logger.DebugOn() {
ps.logger.Debug("setHasProposalBlockPart",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure that the lazy print does not solve the problem here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the lazy print is adding new objects to heap

if logger.DebugOn() {
logger.Debug("propose step; not proposing since node is not in the validator set",
"addr", addr,
"vals", cs.Validators)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a case that is worth considering, as cs.Validators is a complex struct with a string method etc. etc.

"timestamp_difference", cs.ProposalReceiveTime.Sub(cs.Proposal.Timestamp))
if logger.DebugOn() {
logger.Debug("prevote step: Proposal is timely",
"timestamp", cs.Proposal.Timestamp.Format(time.RFC3339Nano),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is another case in which makes sense, as we expend time formatting timestamps.

if logger.DebugOn() {
logger.Debug(
"entering prevote wait step with invalid args",
"current", log.NewLazySprintf("%v/%v/%v", cs.Height, cs.Round, cs.Step),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need the log.NewLazySprintf in this case, right? The idea of this feature was to avoid paying the cost of printing when we are not printing because the logger level is not the desired one (debug).

@andynog
Copy link
Collaborator Author

andynog commented May 7, 2024

That's amazing @ValarDragon, if it save 20GB of RAM 👀, please let us know if you re-profile

I've added the code in all debugs. I've seen @cason comments that we might not need in all the places and I've seen in your commit the hot spot ones. If you confirm these are the ones causing issues I can revert some of the logic for DebugOn on places that wouldn't make much of a difference in performance.

Thanks again!

@ValarDragon
Copy link
Contributor

Yeah I think just doing it on the hot spot ones sounds great to me!

@melekes
Copy link
Collaborator

melekes commented May 14, 2024

Yeah I think just doing it on the hot spot ones sounds great to me!

agree 👍

@andynog
Copy link
Collaborator Author

andynog commented May 24, 2024

Hi @ValarDragon, just following up here. Are these debug logs something you had a chance to validate in performance gains? Should we proceed with these changes? Please let us know. Thanks !

@ValarDragon
Copy link
Contributor

ValarDragon commented May 30, 2024

Ah sorry missed this ping (please feel free to ping on slack if I miss something in the future)

Definitely noticing the cost of logging in:

  • peerstate.SetHasVote (which got worse when we lowered the gossip routines timeout!),
  • mconnection.Send
  • StringShort in AddVote

Can we please proceed with either removing those, or getting those costs not always paid?

@andynog
Copy link
Collaborator Author

andynog commented May 30, 2024

Thanks for the follow-up @ValarDragon. I'll try to constrain this PR only on those places that are hot spots that you mentioned above and then mark it ready for review.

@ValarDragon
Copy link
Contributor

I've removed several more that are bottlenecks in the p2p stack. Forgot to post here, will reply with more as I remember / re-see them. (Maybe when we do our SDK v50 upgrade, since I'll have to move all the commits vo v0.38.x line)

ch *Channel) recvPacketMsg is a big one to remove

github-merge-queue bot pushed a commit that referenced this pull request Jul 8, 2024
)

Delete an expensive debug log that has already been slated for deletion.
On the osmosis fork (with many other debug logs removed), this debug log
ended up taking 1.5% of all CPU time when we had many large blocks. (All
from outbound gossip curiously enough) This caused 12GB of total memory
allocation over an hour in that benchmark (out of 180GB total. With
other optimizations in PR's / merged since that benchmark, would have
been 12GB / ~120GB total)

I am just making a PR to delete it, since it already had a code comment
indicating we should delete it later. (Rather than waiting for #2989

Note that under normal benchmarks (e.g. the last one I did), this is
fairly low impact. (68MB over an hour, out of 69 GB, so .1%)

---

#### PR checklist

- [x] Tests written/updated
- [ ] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog) - didn't add one, since its just deleting a debug log
- [x] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
mergify bot pushed a commit that referenced this pull request Jul 10, 2024
)

Delete an expensive debug log that has already been slated for deletion.
On the osmosis fork (with many other debug logs removed), this debug log
ended up taking 1.5% of all CPU time when we had many large blocks. (All
from outbound gossip curiously enough) This caused 12GB of total memory
allocation over an hour in that benchmark (out of 180GB total. With
other optimizations in PR's / merged since that benchmark, would have
been 12GB / ~120GB total)

I am just making a PR to delete it, since it already had a code comment
indicating we should delete it later. (Rather than waiting for #2989

Note that under normal benchmarks (e.g. the last one I did), this is
fairly low impact. (68MB over an hour, out of 69 GB, so .1%)

---

#### PR checklist

- [x] Tests written/updated
- [ ] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog) - didn't add one, since its just deleting a debug log
- [x] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec

(cherry picked from commit 1964898)
melekes added a commit that referenced this pull request Jul 10, 2024
…ckport #3412) (#3488)

Delete an expensive debug log that has already been slated for deletion.
On the osmosis fork (with many other debug logs removed), this debug log
ended up taking 1.5% of all CPU time when we had many large blocks. (All
from outbound gossip curiously enough) This caused 12GB of total memory
allocation over an hour in that benchmark (out of 180GB total. With
other optimizations in PR's / merged since that benchmark, would have
been 12GB / ~120GB total)

I am just making a PR to delete it, since it already had a code comment
indicating we should delete it later. (Rather than waiting for #2989

Note that under normal benchmarks (e.g. the last one I did), this is
fairly low impact. (68MB over an hour, out of 69 GB, so .1%)

---

#### PR checklist

- [x] Tests written/updated
- [ ] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog) - didn't add one, since its just deleting a debug log
- [x] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
<hr>This is an automatic backport of pull request #3412 done by
[Mergify](https://mergify.com).

Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com>
Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
@andynog
Copy link
Collaborator Author

andynog commented Oct 29, 2024

I'll close this PR. The implementation here is not the best way to solve the problem in #2847. I'm working on a new PR that has a better solution to the problem.

@andynog andynog closed this Oct 29, 2024
@andynog andynog deleted the andy/2847-log-debug-only-if-on branch October 29, 2024 20:10
mattac21 pushed a commit that referenced this pull request Sep 9, 2025
)

Delete an expensive debug log that has already been slated for deletion.
On the osmosis fork (with many other debug logs removed), this debug log
ended up taking 1.5% of all CPU time when we had many large blocks. (All
from outbound gossip curiously enough) This caused 12GB of total memory
allocation over an hour in that benchmark (out of 180GB total. With
other optimizations in PR's / merged since that benchmark, would have
been 12GB / ~120GB total)

I am just making a PR to delete it, since it already had a code comment
indicating we should delete it later. (Rather than waiting for #2989

Note that under normal benchmarks (e.g. the last one I did), this is
fairly low impact. (68MB over an hour, out of 69 GB, so .1%)

---

#### PR checklist

- [x] Tests written/updated
- [ ] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog) - didn't add one, since its just deleting a debug log
- [x] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
mattac21 pushed a commit that referenced this pull request Sep 10, 2025
)

Delete an expensive debug log that has already been slated for deletion.
On the osmosis fork (with many other debug logs removed), this debug log
ended up taking 1.5% of all CPU time when we had many large blocks. (All
from outbound gossip curiously enough) This caused 12GB of total memory
allocation over an hour in that benchmark (out of 180GB total. With
other optimizations in PR's / merged since that benchmark, would have
been 12GB / ~120GB total)

I am just making a PR to delete it, since it already had a code comment
indicating we should delete it later. (Rather than waiting for #2989

Note that under normal benchmarks (e.g. the last one I did), this is
fairly low impact. (68MB over an hour, out of 69 GB, so .1%)

---

#### PR checklist

- [x] Tests written/updated
- [ ] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog) - didn't add one, since its just deleting a debug log
- [x] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request logs Anything relating to logging wip Work in progress

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Don't pay vote Debug logging cost in normal operation

4 participants