Skip to content

Add basic cause analysis for delayed head blocks#3232

Closed
michaelsproul wants to merge 2 commits intosigp:unstablefrom
michaelsproul:delayed-head-reason
Closed

Add basic cause analysis for delayed head blocks#3232
michaelsproul wants to merge 2 commits intosigp:unstablefrom
michaelsproul:delayed-head-reason

Conversation

@michaelsproul
Copy link
Member

Proposed Changes

Add a reason field to the Delayed head block log to better help us categorise avoidable vs unavoidable attestation misses. The reason can be one of:

  • late: if the block arrived on or after the attestation deadline (4s mainnet, 1.66s gnosis)
  • borderline: if the block arrived less than 50ms before the attestation deadline (3.95s mainnet, 1.61s gnosis)
  • slow: if the block arrived more than 50ms before the attestation deadline, but block processing pushed it over
  • other: if the block was enshrined as head after the deadline but arrived on time and was processed quickly (i.e. it became head during a later re-org)

There are also 4 new metrics to count blocks in each of the above categories.

I also modified the block delay timer to be aware of the early attestation cache, so that it only considers a block late if it fails to make it to the cache before the deadline.

@michaelsproul michaelsproul added the ready-for-review The code is ready for review label May 31, 2022
@michaelsproul michaelsproul requested a review from macladson May 31, 2022 01:22
Comment on lines -2943 to -2956
// Observe the delay between when we observed the block and when we imported it.
let block_delays = self.block_times_cache.read().get_block_delays(
block_root,
self.slot_clock
.start_of(current_slot)
.unwrap_or_else(|| Duration::from_secs(0)),
);

metrics::observe_duration(
&metrics::BEACON_BLOCK_IMPORTED_OBSERVED_DELAY_TIME,
block_delays
.imported
.unwrap_or_else(|| Duration::from_secs(0)),
);
Copy link
Member Author

@michaelsproul michaelsproul May 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this section of code in with the other metrics because I was getting strange outliers for import delay that I thought might have been because of the difference in observation point. However I think this code is actually fine as is, and the outliers must have been because of blocks off the canonical chain.

I'll do some more measurements before we merge.

// log a debug warning and increment a metric.
if late_head {
metrics::inc_counter(&metrics::BEACON_BLOCK_HEAD_SLOT_START_DELAY_EXCEEDED_TOTAL);
let missed_attestation_deadline = attestable_delay >= attestation_deadline;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this if statement is dependent on attestable_delay, and attestable_delay is only written to during the early_attester code path could we have cases where the block is not added to the early_attester cache while still being classed as late?

I'm not well versed with the inner workings of the early_attester cache, but I assume almost all head blocks run through it?

I'm envisioning a scenario where a block is very late, misses the early attester cache, therefore doesn't get the metrics below, or the debug log, but is still cached in block_delays, gets recorded in the above metrics and is also sent out via SSE.

@michaelsproul michaelsproul added work-in-progress PR is a work-in-progress and removed ready-for-review The code is ready for review labels May 31, 2022
@michaelsproul
Copy link
Member Author

I'm going to close this for now, I don't think it's as useful as I thought, and ironing out the kinks looks like it will take some time

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

work-in-progress PR is a work-in-progress

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants