Skip to content

perf(p2p)!: Remove PeerSendBytesTotal metric (backport #3184)#3491

Merged
melekes merged 1 commit intov1.xfrom
mergify/bp/v1.x/pr-3184
Jul 10, 2024
Merged

perf(p2p)!: Remove PeerSendBytesTotal metric (backport #3184)#3491
melekes merged 1 commit intov1.xfrom
mergify/bp/v1.x/pr-3184

Conversation

@mergify
Copy link
Contributor

@mergify mergify bot commented Jul 10, 2024

Component of #2840. As we discussed in the issue, we want to deprecate the peer send bytes total metric because:

  • This is a costly metric (see pprof)
  • This increases the cardinality of labels in Prometheus, which can slow things down in Prometheus significantly.
  • Its not really useful at this level of abstraction. We don't have reported usecases of this being used yet. If we want to do it in the future, we should more likely architect it as updating a live go variable, and then sending to our metrics instance on a polling ticker (ala PeerSendQueue metric). The right levels that are interesting for outbound traffic could be the consensus gossip routines or blocksync.

To see its costly, 21% of all CPU time for live osmosis nodes in the latest Osmosis patch release is in peer.Send. over half of that time is in these metrics:
image

Please advise for how we should go about deprecating / removing this metric? Feel free to close this PR / say you want to handle it separately.


PR checklist

  • Tests written/updated - N/A
  • Changelog entry added in .changelog (we use unclog to manage our changelog)
  • Updated relevant documentation (docs/ or spec/) and code comments - I didn't find any relevant docs
  • Title follows the Conventional Commits spec

This is an automatic backport of pull request #3184 done by [Mergify](https://mergify.com).

Component of #2840. As we discussed in the issue, we want to deprecate
the peer send bytes total metric because:
- This is a costly metric (see pprof)
- This increases the cardinality of labels in Prometheus, which can slow
things down in Prometheus significantly.
- Its not really useful at this level of abstraction. We don't have
reported usecases of this being used yet. If we want to do it in the
future, we should more likely architect it as updating a live go
variable, and then sending to our metrics instance on a polling ticker
(ala PeerSendQueue metric). The right levels that are interesting for
outbound traffic could be the consensus gossip routines or blocksync.

To see its costly, 21% of all CPU time for live osmosis nodes in the
latest Osmosis patch release is in peer.Send. over half of that time is
in these metrics:

![image](https://github.com/cometbft/cometbft/assets/6440154/efba6e22-da43-4b97-ad1b-600a13738b5f)

Please advise for how we should go about deprecating / removing this
metric? Feel free to close this PR / say you want to handle it
separately.

---

#### PR checklist

- [X] Tests written/updated - N/A
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [x] Updated relevant documentation (`docs/` or `spec/`) and code
comments - I didn't find any relevant docs
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec

---------

Co-authored-by: Andy Nogueira <me@andynogueira.dev>
Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
(cherry picked from commit 80114b2)
@mergify mergify bot requested a review from a team as a code owner July 10, 2024 12:53
@mergify mergify bot requested a review from a team July 10, 2024 12:53
@melekes melekes merged commit d0e3aeb into v1.x Jul 10, 2024
@melekes melekes deleted the mergify/bp/v1.x/pr-3184 branch July 10, 2024 13:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants