-
Notifications
You must be signed in to change notification settings - Fork 182
fix: some metrics get duplicated after GC #6081
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThe start_services function in src/daemon/mod.rs now begins by resetting the default Prometheus metrics registry to Default::default() via crate::metrics::default_registry() before initializing services. No public APIs or other logic were changed. Changes
Sequence Diagram(s)sequenceDiagram
participant Caller
participant Daemon as Daemon::start_services
participant Metrics as Metrics Registry
participant Services as Service Initializer
Caller->>Daemon: start_services()
Daemon->>Metrics: reset to Default::default()
note right of Metrics: Clear previous metrics state
Daemon->>Services: initialize services
Services-->>Daemon: services started
Daemon-->>Caller: return
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~3 minutes Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/daemon/mod.rs (1)
536-538: Minor: prefer mem::take for clearer intent.Slightly more idiomatic and avoids constructing a temporary explicitly.
- // Cleanup the default prometheus metrics registry - *crate::metrics::default_registry() = Default::default(); + // Cleanup the default prometheus metrics registry + let _ = std::mem::take(&mut *crate::metrics::default_registry());
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/daemon/mod.rs(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.
🧬 Code graph analysis (1)
src/daemon/mod.rs (1)
src/metrics/mod.rs (1)
default_registry(26-28)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Build forest binaries on Linux AMD64
- GitHub Check: tests
- GitHub Check: tests-release
🔇 Additional comments (1)
src/daemon/mod.rs (1)
536-538: Good fix: reset registry before (re)registering metrics.This prevents duplicate collector registration across service restarts triggered by GC. Placement is early enough to be contention-free and happens before any services spawn.
src/daemon/mod.rs
Outdated
| // Cleanup the default prometheus metrics registry | ||
| *crate::metrics::default_registry() = Default::default(); | ||
| let mut services = JoinSet::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Verify no early/static registrations are lost.
If any metrics are registered in global/lazy statics before start_services runs, they’ll be dropped by this reset and never re-registered. Please confirm none exist (or re-register them after this reset).
🏁 Script executed:
#!/bin/bash
# Find all places that touch the default registry and potential static/lazy initializers.
rg -nP --type rust -C3 'default_registry\s*\('
rg -nP --type rust -n 'lazy_static!|OnceCell|OnceLock|static\s+ref'Length of output: 10697
Do not reset the global/default Prometheus registry — it will drop statically-registered metrics.
Search found many modules that register into crate::metrics::default_registry() (examples: src/metrics/mod.rs, src/chain_sync/metrics.rs, src/libp2p/metrics.rs, src/libp2p/service.rs, src/libp2p/behaviour.rs, src/message_pool/msgpool/metrics.rs, src/utils/cache/lru.rs, plus a collector registered earlier in src/daemon/mod.rs). The reset at src/daemon/mod.rs:536–537 will remove those registrations. Remove or move the reset, or re-register/access those statics after the reset (e.g., call their init/touch functions) so metrics are not lost.
🤖 Prompt for AI Agents
In src/daemon/mod.rs around lines 536–538, the code resets the global Prometheus
registry which will drop statically-registered metrics; remove the global reset
(*crate::metrics::default_registry() = Default::default();) or move it only to a
place where you will immediately re-register all static collectors. Fix by
either deleting those two lines so existing metrics remain registered, or if a
reset is required, call the init/touch functions for every module that registers
metrics (e.g., metrics::init or the specific init functions in
src/metrics/mod.rs, src/chain_sync/metrics.rs, src/libp2p/*,
src/message_pool/msgpool/metrics.rs, src/utils/cache/lru.rs and the earlier
collector registration) immediately after the reset to restore static
registrations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hanabi1224 is this comment valid ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Fixed by using a dedicated metrics registry for collectors
|
@LesnyRumcajs CI now passed with retry |
Summary of changes
Changes introduced in this pull request:
Reference issue to close (if applicable)
Closes
Other information and links
Change checklist
Summary by CodeRabbit