Add basic cause analysis for delayed head blocks#3232
Add basic cause analysis for delayed head blocks#3232michaelsproul wants to merge 2 commits intosigp:unstablefrom
Conversation
| // Observe the delay between when we observed the block and when we imported it. | ||
| let block_delays = self.block_times_cache.read().get_block_delays( | ||
| block_root, | ||
| self.slot_clock | ||
| .start_of(current_slot) | ||
| .unwrap_or_else(|| Duration::from_secs(0)), | ||
| ); | ||
|
|
||
| metrics::observe_duration( | ||
| &metrics::BEACON_BLOCK_IMPORTED_OBSERVED_DELAY_TIME, | ||
| block_delays | ||
| .imported | ||
| .unwrap_or_else(|| Duration::from_secs(0)), | ||
| ); |
There was a problem hiding this comment.
I moved this section of code in with the other metrics because I was getting strange outliers for import delay that I thought might have been because of the difference in observation point. However I think this code is actually fine as is, and the outliers must have been because of blocks off the canonical chain.
I'll do some more measurements before we merge.
| // log a debug warning and increment a metric. | ||
| if late_head { | ||
| metrics::inc_counter(&metrics::BEACON_BLOCK_HEAD_SLOT_START_DELAY_EXCEEDED_TOTAL); | ||
| let missed_attestation_deadline = attestable_delay >= attestation_deadline; |
There was a problem hiding this comment.
Since this if statement is dependent on attestable_delay, and attestable_delay is only written to during the early_attester code path could we have cases where the block is not added to the early_attester cache while still being classed as late?
I'm not well versed with the inner workings of the early_attester cache, but I assume almost all head blocks run through it?
I'm envisioning a scenario where a block is very late, misses the early attester cache, therefore doesn't get the metrics below, or the debug log, but is still cached in block_delays, gets recorded in the above metrics and is also sent out via SSE.
|
I'm going to close this for now, I don't think it's as useful as I thought, and ironing out the kinks looks like it will take some time |
Proposed Changes
Add a
reasonfield to theDelayed head blocklog to better help us categorise avoidable vs unavoidable attestation misses. The reason can be one of:There are also 4 new metrics to count blocks in each of the above categories.
I also modified the block delay timer to be aware of the early attestation cache, so that it only considers a block late if it fails to make it to the cache before the deadline.