[Merged by Bors] - Detailed validator monitoring#2151
[Merged by Bors] - Detailed validator monitoring#2151paulhauner wants to merge 37 commits intounstablefrom
Conversation
bb284d5 to
c2cb527
Compare
realbigsean
left a comment
There was a problem hiding this comment.
This is awesome! Can't wait till it's in 🚀
I left a bunch of comments, most of them are just typos. The only sort of major thing I would be concerned about is opening a read lock on ValidatorMonitor whenever we register something, and subsequently opening a write lock on the SummaryMap (which is held in the ValidatorMonitor struct) during this registration. I'm not exactly sure if/how it could deadlock but wanted to bring it up
| ); | ||
| } | ||
|
|
||
| fn u64_to_i64(n: impl Into<u64>) -> i64 { |
There was a problem hiding this comment.
This is a bit odd defined in a for loop, you could move it outside of the ValidatorMonitor impl
| metrics::set_int_gauge( | ||
| &metrics::VALIDATOR_MONITOR_SLASHED, | ||
| &[id], | ||
| if validator.slashed { 0 } else { 1 }, |
| .and_then(|slot_start| seen_timestamp.checked_sub(slot_start)) | ||
| .and_then(|gross_delay| { | ||
| let production_delay = slot_clock.slot_duration() / 3; | ||
| gross_delay.checked_sub(production_delay) |
There was a problem hiding this comment.
it might actually be interesting to track where this is negative (and others like it)
There was a problem hiding this comment.
I agree, I think I might leave it for later works though. I'm not exactly sure how Prom/Graf will handle negative a duration and I'm keen to get this out the door.
| .start_of(data.slot) | ||
| .and_then(|slot_start| seen_timestamp.checked_sub(slot_start)) | ||
| .and_then(|gross_delay| { | ||
| let production_delay = slot_clock.slot_duration() / 2; |
Co-authored-by: realbigsean <sean@sigmaprime.io>
Co-authored-by: realbigsean <sean@sigmaprime.io>
Co-authored-by: realbigsean <sean@sigmaprime.io>
|
All comments addressed! Thanks for the review, super helpful. I added a couple of extra metrics in 16d1f08, so I'll have to update the dashboard and replace the PR you made (sorry!). For your convenience, here's a diff including all my changes since your review (excluding merging in EDIT: I've merged sigp/lighthouse-metrics#20 and added the link to this PR. |
realbigsean
left a comment
There was a problem hiding this comment.
Nice! Everything looks good
|
bors r+ |
## Issue Addressed - Resolves #2064 ## Proposed Changes Adds a `ValidatorMonitor` struct which provides additional logging and Grafana metrics for specific validators. Use `lighthouse bn --validator-monitor` to automatically enable monitoring for any validator that hits the [subnet subscription](https://ethereum.github.io/eth2.0-APIs/#/Validator/prepareBeaconCommitteeSubnet) HTTP API endpoint. Also, use `lighthouse bn --validator-monitor-pubkeys` to supply a list of validators which will always be monitored. See the new docs included in this PR for more info. ## TODO - [x] Track validator balance, `slashed` status, etc. - [x] ~~Register slashings in current epoch, not offense epoch~~ - [ ] Publish Grafana dashboard, update TODO link in docs - [x] ~~#2130 is merged into this branch, resolve that~~
|
Pull request successfully merged into unstable. Build succeeded: |
Issue Addressed
Proposed Changes
Adds a
ValidatorMonitorstruct which provides additional logging and Grafana metrics for specific validators.Use
lighthouse bn --validator-monitorto automatically enable monitoring for any validator that hits the subnet subscription HTTP API endpoint.Also, use
lighthouse bn --validator-monitor-pubkeysto supply a list of validators which will always be monitored.See the new docs included in this PR for more info.
TODO
slashedstatus, etc.Register slashings in current epoch, not offense epoch[Merged by Bors] - Disallow attestation production earlier than head #2130 is merged into this branch, resolve that