metrics: add metric for proposal timestamp difference (#7550)
Conversation
internal/consensus/metrics.go (outdated)

```diff
 	"message and the local time when the message was received. " +
 	"Only calculated when a new is block proposed.",
-	Buckets: []float64{-15, -8, -5, -1, -.5, -.100, 0, .5, 1, 3, 15},
+	Buckets: []float64{-15, -3, -1, -.5, 0, .100, 1, 5, 8, 15},
```
No real objection, but is there some reason we are not symmetric about zero?
One side is much more likely to occur than the other. We expect proposal message timestamps to be in the past much more frequently than the other way around.
That makes sense, but I'm not sure how having the bucket sizes differ helps. That said, it's fine—the only reason I mention it is that changing the bucket distribution later screws up correlations to existing timeseries from previous runs.
Yeah, this definitely wasn't decided using an exact science, but to expand a little on my thinking: proposal timestamps will only be ahead of your local time if your local clock is behind the proposer's, likely because one of you is not well synced to NTP. If a message is observed 'in the future' of your clock, that is likely just clock skew, which should be pretty small, hence the few tightly packed buckets. 15 seconds was added at the top of that range to help people differentiate between 'pretty skewed' and 'nowhere close at all'.
Ideally those numbers should be derived from the consensus parameters:
- d = delay
- p = precision = 2 * max_clock_skew

Then have the large buckets catch how many times and how badly the proposal was not timely, while the others show how accurate d and p are:

{.., -2dp, -dp, -.5dp, -.2dp, 0, .2p, .5p, p, 2p, ..}

where dp = d + p. Ideally nothing should show up outside (-dp, p).
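A hypothetical helper sketching that derivation (the function name `timelyBuckets` and its shape are illustrative assumptions, not code from this pull request):

```go
package main

import "fmt"

// timelyBuckets derives histogram bucket boundaries from the consensus
// synchrony parameters, following the scheme suggested above:
// {-2dp, -dp, -.5dp, -.2dp, 0, .2p, .5p, p, 2p} with dp = delay + precision.
// All values are in seconds. This is an illustrative sketch only.
func timelyBuckets(delay, precision float64) []float64 {
	dp := delay + precision
	return []float64{
		-2 * dp, -dp, -0.5 * dp, -0.2 * dp,
		0,
		0.2 * precision, 0.5 * precision, precision, 2 * precision,
	}
}

func main() {
	// p = 20ms and d = 1s, the "pretty conservative" values mentioned above.
	fmt.Println(timelyBuckets(1.0, 0.02))
}
```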
That said, it doesn't seem straightforward to get the consensus parameters here, and they can also change through governance, at which point @creachadair's comment comes in.
But I think we are pretty sure that p should be on the order of tens of msec, while d is an order of magnitude higher.
For example, p = 20ms and d = 1s seem pretty conservative.
So the buckets to the right of 0 seem a bit large IMHO; maybe ..., 0, .01, .02, .05, .1}, with everything above 0.1s caught in +Inf.
I consider this histogram in terms of what we want to learn from it. From this perspective, I can see five "regions" (intervals) of interest:
- timestamp - receiveTime < -Precision: definitely untimely proposal
- -Precision < timestamp - receiveTime < 0: awkward timestamp, as we shouldn't see this case occurring
- 0 < timestamp - receiveTime < msgDelay: the typical case
- msgDelay < timestamp - receiveTime < msgDelay + Precision: unlikely case, whose observation indicates that msgDelay has possibly been underestimated
- timestamp - receiveTime > msgDelay + Precision: untimely proposal

I'm not sure which of these regions should have more than one bucket, to retrieve additional data, nor how to put generic numbers to these intervals.
In terms of the proposed buckets, I think that 4-5 buckets on the negative side are too many. I think that for this situation a "little" negative (say, 20ms, as suggested) and more than "a little" negative (so, likely to be untimely) are enough.
For positive numbers, maybe a geometric distribution of buckets (say 0.1, 1, 10) would be ideal as a first pass. Then we should try to identify the "real" msgDelay, to somehow distinguish between something like 0.5, 1, 1.5, and 2s, assuming that most of the values will fall in the 0.1-1 and 1-10 buckets.
I don't know; it is an open question for me too. I can elaborate on this better and comment here.
My thinking with these values was to try to create some buckets that implied an action for node operators and networks.
In the case of precision, this roughly boils down to
- your ntp values are slightly out of alignment but to an acceptable degree -> no change
- your ntp servers are quite out of alignment -> consider changing servers or examining your ntp settings
- your values are so far apart as to imply one or both of you are not at all sync'd to ntp -> restart your ntp daemon/debug your ntp connection.
I think overly narrow buckets don't really help operators disambiguate between these cases. Perhaps reasonable buckets for this are .025, .5, and 10?
To @creachadair's point above, I'd prefer not to resize these in the future. We can just keep both ranges; for message delay buckets: .1, .5, 1, 1.5, 2, 10.
However, I have also opened #7551 to help identify the 'real' message delay on networks where applicable.
Negative values in this measurement will only occur if:
- the receiver's clock is ahead of the sender's clock by some skew s, and
- the message delay d satisfies d < s.

As previously commented, message delays are expected to be an order of magnitude higher than clock skews, so the situation above should be very uncommon...
Flipped the subtraction. Now, the 'negative' side is values where the receive time is before the proposal time, which should be quite rare.
To clarify, the buckets that I'm proposing would be:
[-10, -.5, -.025, 0, .1, .5, 1, 1.5, 2, 10]. I've updated the pull request to reflect this.
These metrics are now:
# HELP tendermint_consensus_proposal_timestamp_difference Difference in seconds between the timestamp in the proposal message and the local time when the message was received. Only calculated when a new block is proposed.
# TYPE tendermint_consensus_proposal_timestamp_difference histogram
tendermint_consensus_proposal_timestamp_difference_bucket{chain_id="test-chain-BCFLOg",is_timely="true",le="-10"} 0
tendermint_consensus_proposal_timestamp_difference_bucket{chain_id="test-chain-BCFLOg",is_timely="true",le="-0.5"} 0
tendermint_consensus_proposal_timestamp_difference_bucket{chain_id="test-chain-BCFLOg",is_timely="true",le="-0.025"} 0
tendermint_consensus_proposal_timestamp_difference_bucket{chain_id="test-chain-BCFLOg",is_timely="true",le="0"} 0
tendermint_consensus_proposal_timestamp_difference_bucket{chain_id="test-chain-BCFLOg",is_timely="true",le="0.1"} 21
tendermint_consensus_proposal_timestamp_difference_bucket{chain_id="test-chain-BCFLOg",is_timely="true",le="0.5"} 21
tendermint_consensus_proposal_timestamp_difference_bucket{chain_id="test-chain-BCFLOg",is_timely="true",le="1"} 21
tendermint_consensus_proposal_timestamp_difference_bucket{chain_id="test-chain-BCFLOg",is_timely="true",le="1.5"} 21
tendermint_consensus_proposal_timestamp_difference_bucket{chain_id="test-chain-BCFLOg",is_timely="true",le="2"} 21
tendermint_consensus_proposal_timestamp_difference_bucket{chain_id="test-chain-BCFLOg",is_timely="true",le="10"} 21
tendermint_consensus_proposal_timestamp_difference_bucket{chain_id="test-chain-BCFLOg",is_timely="true",le="+Inf"} 21
tendermint_consensus_proposal_timestamp_difference_sum{chain_id="test-chain-BCFLOg",is_timely="true"} 0.180253306
tendermint_consensus_proposal_timestamp_difference_count{chain_id="test-chain-BCFLOg",is_timely="true"} 21
tychoish left a comment:
lgtm modulo what @creachadair said, which I wish I had spotted on my own.
how did you get these? looks like proposals are counted in multiple buckets (?)

Yeah, this is the structure of Prometheus histogram metrics. The general structure is that each bucket keeps track of how many observations were less than or equal to the bucket's value, i.e. a single observation is counted in every bucket whose upper bound it does not exceed.
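That cumulative-bucket behavior can be reproduced in a few lines. This helper is purely illustrative (it is not part of the Prometheus client library), but it shows why the sample output above reports 21 in every bucket from le="0.1" upward:

```go
package main

import "fmt"

// cumulativeCounts mimics how a Prometheus histogram reports its
// buckets: the le="x" bucket counts every observation <= x, so one
// observation shows up in all buckets at or above its value.
func cumulativeCounts(bounds, observations []float64) []int {
	counts := make([]int, len(bounds))
	for _, obs := range observations {
		for i, le := range bounds {
			if obs <= le {
				counts[i]++
			}
		}
	}
	return counts
}

func main() {
	bounds := []float64{-10, -.5, -.025, 0, .1, .5, 1, 1.5, 2, 10}
	// 21 observations in (0, 0.1], as in the sample output above
	// (0.180253306 / 21 ≈ 8.6ms per proposal).
	obs := make([]float64, 21)
	for i := range obs {
		obs[i] = 0.0086
	}
	fmt.Println(cumulativeCounts(bounds, obs)) // [0 0 0 0 21 21 21 21 21 21]
}
```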
cason left a comment:
I believe the discussion regarding the histogram buckets is still open.
What we want to capture here is the distribution of a value that is the sum of two values: the clock skew and the delay of the proposal message. The clock skew can be represented as a normal distribution centered at zero. The latency for a message is more complex, in particular when using gossip, but traditional message delay distributions look like this: https://www.researchgate.net/figure/Examples-of-latency-distributions-on-a-wired-network-Starting-at-the-upper-left-the_fig2_253841639

The difference is that, in our case, as the proposer receives its own proposal, we should have a small (1/n) concentration of values very close to zero, yet positive. So we can consider such a distribution that also has a small peak around zero.
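A rough simulation can make the expected bucket shape of that model concrete. Everything in this sketch is an assumption for illustration, not a value from the discussion: the skew is drawn as N(0, 10ms), the delay as an exponential with mean 200ms, and `simulate` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"math/rand"
)

// simulate draws n samples of diff = clock skew + message delay, with
// skew ~ N(0, 10ms) and delay ~ Exp(mean 200ms), and bins them into the
// proposed buckets. The final slot plays the role of +Inf.
func simulate(n int, seed int64) []int {
	r := rand.New(rand.NewSource(seed))
	bounds := []float64{-10, -.5, -.025, 0, .1, .5, 1, 1.5, 2, 10}
	counts := make([]int, len(bounds)+1)
	for i := 0; i < n; i++ {
		diff := r.NormFloat64()*0.010 + r.ExpFloat64()*0.200
		j := 0
		for j < len(bounds) && diff > bounds[j] {
			j++
		}
		counts[j]++
	}
	return counts
}

func main() {
	// With these parameters, most of the mass lands in the (0, 0.1]
	// and (0.1, 0.5] bins, with essentially nothing on the negative side.
	fmt.Println(simulate(10000, 1))
}
```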
cason left a comment:
Approving the addition of metrics to measure the "delay" (adjusted by clock skew) of proposals. I am not convinced, but also don't have a strong opinion, regarding the choice of bucket boundaries for the histogram.
Moving this over from the proposer-based timestamps branch #7550.
This change adds a histogram that tracks the difference between the proposal timestamp and the local time of the validator. This metric aims to help validators determine, in aggregate, how far away their local time is from the times proposed by the rest of the network.
The metrics added take the following form:
Will merge into wb/proposer-based-timestamps once #7415 is merged.