[Merged by Bors] - collect bandwidth metrics per transport#4805
[Merged by Bors] - collect bandwidth metrics per transport#4805jxs wants to merge 5 commits intosigp:unstablefrom
Conversation
mxinden
left a comment
There was a problem hiding this comment.
A bit of a hack, but neat!
Long term, I would consider a proper solution in rust-libp2p the cleaner solution, especially in case you ever introduce a third transport.
That said, for now, ...
beacon_node/network/src/metrics.rs
Outdated
| pub static ref INBOUND_LIBP2P_QUIC_BYTES: Result<IntGauge> = | ||
| try_create_int_gauge("libp2p_inbound_quic_bytes", "The inbound quic bandwidth over libp2p"); | ||
|
|
||
| pub static ref INBOUND_LIBP2P_TCP_BYTES: Result<IntGauge> = | ||
| try_create_int_gauge("libp2p_inbound_tcp_bytes", "The inbound tcp bandwidth over libp2p"); |
There was a problem hiding this comment.
| pub static ref INBOUND_LIBP2P_QUIC_BYTES: Result<IntGauge> = | |
| try_create_int_gauge("libp2p_inbound_quic_bytes", "The inbound quic bandwidth over libp2p"); | |
| pub static ref INBOUND_LIBP2P_TCP_BYTES: Result<IntGauge> = | |
| try_create_int_gauge("libp2p_inbound_tcp_bytes", "The inbound tcp bandwidth over libp2p"); | |
| pub static ref INBOUND_LIBP2P_QUIC_BYTES: Result<IntGauge> = | |
| try_create_int_gauge("libp2p_inbound_bytes", "The inbound bandwidth over libp2p"); |
Best practice would be to use a metric family for all bandwidth, differentiated by a Prometheus label, e.g. transport="tcp|quic".
There was a problem hiding this comment.
yeah makes sense thanks, went ahead and also put the direction into the same metric
beacon_node/network/src/metrics.rs
Outdated
| * Bandwidth metrics | ||
| */ | ||
| pub static ref INBOUND_LIBP2P_QUIC_BYTES: Result<IntGauge> = | ||
| try_create_int_gauge("libp2p_inbound_quic_bytes", "The inbound quic bandwidth over libp2p"); |
There was a problem hiding this comment.
This metrics only ever goes up, right? Shouldn't it be a counter instead of a gauge?
There was a problem hiding this comment.
I think with counters we have to increase the counter, with gauges we can set the value. I think @jxs has to set the value based on the bandwidth at regular intervals. I'll let him comment tho.
Agree that a counter would be better if we can set it easily.
There was a problem hiding this comment.
You may use Counter::inner. Note that using a Gauge in a place where a Counter is correct might result in problems e.g. around Grafana's query suggestions.
There was a problem hiding this comment.
yup it was because of @AgeManning's arguments ehe, managed to switch to a Counter and reset and inc_by_value thanks Max, ptal
AgeManning
left a comment
There was a problem hiding this comment.
Nice, this is very helpful.
Agree with Max's comments
Co-authored-by: Jack McPherson <jmcph4.github@gmail.com>
mxinden
left a comment
There was a problem hiding this comment.
Cool. Again, long term I think a solution in rust-libp2p would be much cleaner. That said, this should provide us with the numbers we need to evaluate QUIC. 👍
AgeManning
left a comment
There was a problem hiding this comment.
Ok, looks good to me. Lets go
|
Just waiting for imminent deneb merge |
|
let's go! |
## Issue Addressed Following the conversation on libp2p/rust-libp2p#3666 the changes introduced in this PR will allow us to give more insights if the bandwidth limitations happen at the transport level, namely if quic helps vs yamux and it's [window size limitation](libp2p/rust-yamux#162) or if the bottleneck is at the gossipsub level. ## Proposed Changes introduce new quic and tcp bandwidth metric gauges. cc @mxinden (turned out to be easier, Thomas gave me a hint)
|
Pull request successfully merged into unstable. Build succeeded! The publicly hosted instance of bors-ng is deprecated and will go away soon. If you want to self-host your own instance, instructions are here. If you want to switch to GitHub's built-in merge queue, visit their help page.
|
Issue Addressed
Following the conversation on libp2p/rust-libp2p#3666 the changes introduced in this PR will allow us to give more insights if the bandwidth limitations happen at the transport level, namely if quic helps vs yamux and it's window size limitation or if the bottleneck is at the gossipsub level.
Proposed Changes
introduce new quic and tcp bandwidth metric gauges.
cc @mxinden (turned out to be easier, Thomas gave me a hint)