Skip to content

feat(l1): add peer table timing metrics (Kademlia baseline)#6459

Merged
azteca1998 merged 2 commits into
mainfrom
feat/kademlia-baseline-metrics
Apr 13, 2026
Merged

feat(l1): add peer table timing metrics (Kademlia baseline)#6459
azteca1998 merged 2 commits into
mainfrom
feat/kademlia-baseline-metrics

Conversation

@azteca1998

Copy link
Copy Markdown
Contributor

Summary

Add Prometheus histogram metrics to measure peer table operation latencies on the current flat IndexMap implementation. This provides the baseline for comparing against the k-bucket implementation in #6458.

Metrics added (behind #[cfg(feature = "metrics")]):

  • ethrex_kademlia_insert_contact_duration_seconds — times contact insertion via IndexMap::entry in do_new_contacts
  • ethrex_kademlia_iter_contacts_duration_seconds — times full table scan in do_get_closest_nodes

Grafana panels added to P2P Packets dashboard:

  • insert_contact p50/p99 latency
  • iter_contacts full-scan p50/p99 latency
  • Operations rate (ops/s)

Workflow

  1. Merge this PR → deploy to mainnet → collect baseline data
  2. Merge feat(l1): reintroduce proper Kademlia k-bucket routing table #6458 (k-bucket implementation) → deploy → compare latencies on the same Grafana panels

Test plan

  • cargo clippy passes with and without metrics feature
  • All 21 p2p tests pass
  • Verify metrics appear in Prometheus endpoint with --metrics flag

Add Prometheus histograms to measure peer table operation latencies on
the current flat IndexMap implementation. These metrics provide the
baseline for comparing against the k-bucket implementation in #6458.

Metrics added:
- ethrex_kademlia_insert_contact_duration_seconds: times contact
  insertion via IndexMap::entry in do_new_contacts
- ethrex_kademlia_iter_contacts_duration_seconds: times full table
  scan in do_get_closest_nodes

Also adds three Grafana panels to the P2P Packets dashboard for
visualizing p50/p99 latencies and operation rates.
@github-actions github-actions Bot added the L1 Ethereum client label Apr 8, 2026
@github-actions

github-actions Bot commented Apr 8, 2026

Copy link
Copy Markdown

🤖 Kimi Code Review

The PR looks good overall. The metrics are properly feature-gated, histogram buckets are appropriately sized for the expected latency ranges, and the Grafana dashboard JSON is syntactically correct.

Minor observation:

In crates/networking/p2p/peer_table.rs, consider moving the use ethrex_metrics::p2p::METRICS_P2P; import to the top of the file with a #[cfg(feature = "metrics")] attribute instead of declaring it inside the measurement blocks (lines 961-962 and 1013-1014). While the current approach prevents unused import warnings, module-level imports are more idiomatic and easier to maintain. For example:

#[cfg(feature = "metrics")]
use ethrex_metrics::p2p::METRICS_P2P;

Correctness check:

  • The timing scope for insert_contact correctly measures only the HashMap entry operation (line 999-1010), excluding the discarded contact check
  • The timing scope for iter_contacts correctly captures the full scan of self.contacts (lines 941-956)
  • Histogram buckets cover appropriate ranges: 1µs–10ms for insertions and 10µs–100ms for full scans

Security/Performance:

  • No unsafe code introduced
  • Histogram observations are lightweight; 8–9 buckets per metric is reasonable
  • Feature gating prevents metrics overhead in non-monitoring builds

The implementation is correct and ready to merge.


Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt

@github-actions

github-actions Bot commented Apr 8, 2026

Copy link
Copy Markdown

🤖 Codex Code Review

Findings:

  1. The new insert_contact histogram is timing unrelated async work, not just peer-table insertion. In peer_table.rs, the timer starts before the HashMap::entry mutation, but it is only observed after METRICS.record_new_discovery().await at peer_table.rs and peer_table.rs. That awaited call takes a mutex and updates the rolling rate window in metrics.rs, so the panel will spike on lock contention / event-queue work instead of actual contact insertion cost. If the goal is insertion latency, stop the timer before the await, or scope it only around entry.insert(...) / add_protocol(...).

  2. The new dashboard/panel naming overstates what is actually measured. The “full-scan” metric only covers do_get_closest_nodes in peer_table.rs, but discv5 also does a full-table scan in peer_table.rs and that path is not instrumented. Likewise, “contact insertion” excludes other insertion paths such as peer_table.rs and peer_table.rs. On a discv5-heavy node, these panels can look healthy while the dominant path regresses. Either instrument the missing paths or rename the metrics/panels to match the exact code paths being measured.

No consensus, EVM, trie, or validation-risk issues stood out in this diff beyond the observability accuracy concerns above.

I could not run a full cargo check in this environment because rustup is blocked by the sandbox’s read-only temp path.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@github-actions

github-actions Bot commented Apr 8, 2026

Copy link
Copy Markdown

Lines of code report

Total lines added: 53
Total lines removed: 0
Total lines changed: 53

Detailed view
+--------------------------------------------+-------+------+
| File                                       | Lines | Diff |
+--------------------------------------------+-------+------+
| ethrex/crates/blockchain/metrics/p2p.rs    | 202   | +35  |
+--------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/peer_table.rs | 986   | +18  |
+--------------------------------------------+-------+------+

@github-actions

github-actions Bot commented Apr 8, 2026

Copy link
Copy Markdown

🤖 Claude Code Review

Now I have everything I need to write the review.


Review: feat(l1): add peer table timing metrics (Kademlia baseline) (#6459)

The PR is clean and well-scoped. The metrics infrastructure is consistent with the existing pattern, the feature-flag gating is correct, and the Grafana panel IDs (30, 31, 32, 103) don't conflict with any existing panel IDs in the dashboard. A few concerns worth addressing before merging to mainnet for baseline collection:


Concern 1 — Async overhead contaminates insert_contact timing (correctness)

crates/networking/p2p/peer_table.rs, lines 999–1017

The measurement window around the entry() call includes the Vacant branch's METRICS.record_new_discovery().await:

#[cfg(feature = "metrics")]
let insert_start = std::time::Instant::now();

match self.contacts.entry(node_id) {
    Entry::Vacant(vacant_entry) => {
        vacant_entry.insert(Contact::new(node, protocol));
        METRICS.record_new_discovery().await;  // <-- inside the window
    }
    ...
}

#[cfg(feature = "metrics")]
METRICS_P2P.observe_insert_contact_duration(insert_start.elapsed().as_secs_f64());

record_new_discovery() acquires an async Mutex over new_contacts_events (metrics.rs:195). Under any lock contention, this inflates the observation by tens to hundreds of microseconds — far exceeding the nanosecond-scale cost of an IndexMap::entry() operation. This also produces a bimodal distribution: Occupied entries measure only the map operation, while Vacant entries include the full async lock overhead.

If the k-bucket PR (#6458) keeps record_new_discovery inside the same measurement boundary, the comparison is apples-to-apples but the individual values are inflated and noisy. If it wraps only the bucket operation, the comparison will be unfair.

Recommendation: Stop the timer before the await, and record the discovery metric afterward:

#[cfg(feature = "metrics")]
let insert_start = std::time::Instant::now();
let is_new_contact = match self.contacts.entry(node_id) {
    Entry::Vacant(vacant_entry) => {
        vacant_entry.insert(Contact::new(node, protocol));
        true
    }
    Entry::Occupied(mut occupied_entry) => {
        occupied_entry.get_mut().add_protocol(protocol);
        false
    }
};
#[cfg(feature = "metrics")]
METRICS_P2P.observe_insert_contact_duration(insert_start.elapsed().as_secs_f64());

if is_new_contact {
    METRICS.record_new_discovery().await;
}

Concern 2 — Lowest bucket too coarse for insert_contact (data quality)

crates/blockchain/metrics/p2p.rs, line 104

.buckets(vec![
    0.000_001, 0.000_005, ...  // starts at 1µs
])

An IndexMap::entry() on a small-to-medium table is a sub-microsecond operation on warm hardware. With the lowest bucket at 1µs, the vast majority of observations will fall in the le=0.000001 bin, making p50/p99 resolution essentially useless for this operation. The histogram will likely always report "≤ 1µs" for both quantiles, which isn't actionable for k-bucket comparison.

Consider adding 100ns and 500ns buckets at the lower end:

.buckets(vec![
    0.000_000_1, 0.000_000_5, 0.000_001, 0.000_005,
    0.000_01, 0.000_05, 0.000_1, 0.000_5, 0.001, 0.01,
])

This concern is secondary and linked to Concern 1 — resolving the async overhead issue first would reveal the true operating range and inform the right bucket layout.


Minor: use inside #[cfg] blocks

peer_table.rs, lines 962 and 1015

#[cfg(feature = "metrics")]
{
    use ethrex_metrics::p2p::METRICS_P2P;
    METRICS_P2P.observe_...(...);
}

Works correctly, but the use at the top of each local block is unusual for this codebase. Not a blocker.


Everything else looks good

  • The #[cfg(feature = "metrics")] gates are placed correctly — the Histogram fields are compiled in whenever p2p.rs is compiled (i.e., under any(feature = "api", feature = "metrics")), and prometheus is available in both cases.
  • gather_metrics() registers both histograms correctly alongside existing metrics.
  • The iter_contacts timing in do_get_closest_nodes (lines 941–964) is clean — it wraps only the pure computation loop, with no async calls inside.
  • The iter_contacts bucket range (10µs–100ms) is appropriate for a full table scan.
  • Grafana PromQL expressions use $__rate_interval correctly and the instance filter regex is consistent with the rest of the dashboard.
  • The Grafana panel IDs (30, 31, 32, 103) are unique within the dashboard — no collisions.

Automated review by Claude (Anthropic) · sonnet · custom prompt

@azteca1998 azteca1998 marked this pull request as ready for review April 8, 2026 18:02
@azteca1998 azteca1998 requested a review from a team as a code owner April 8, 2026 18:02
@ethrex-project-sync ethrex-project-sync Bot moved this to In Review in ethrex_l1 Apr 8, 2026
@github-actions

github-actions Bot commented Apr 8, 2026

Copy link
Copy Markdown

🤖 Kimi Code Review

The PR adds Prometheus histogram metrics for Kademlia peer table operations and corresponding Grafana dashboard panels. Overall, the implementation is correct and follows established patterns in the codebase.

Minor suggestions:

  1. Code organization in peer_table.rs (lines 956-961 and 1009-1014): The #[cfg(feature = "metrics")] blocks for observing metrics duplicate the import statement use ethrex_metrics::p2p::METRICS_P2P;. Consider moving this import to the top of the file with a #[cfg(feature = "metrics")] attribute to avoid repetition, though the current approach is also acceptable for locality.

  2. Histogram bucket granularity: The buckets for kademlia_insert_contact_duration_seconds (1µs to 10ms) and kademlia_iter_contacts_duration_seconds (10µs to 100ms) are well-chosen for hash map operations. No changes needed.

  3. Grafana dashboard JSON (line 462): The comma placement after the previous panel's closing brace is correct JSON syntax for array continuation.

Verification notes:

  • The timing instrumentation correctly excludes early continue statements in batch_insert_nodes (measurement starts only when actual insertion work begins).
  • The do_get_closest_nodes timing captures only the contact iteration phase, excluding the final into_iter().collect() transformation, which is appropriate for a "full-scan" metric.
  • Metric names follow Prometheus conventions (*_duration_seconds suffix).
  • Panel IDs (30, 31, 32, 103) do not conflict with existing IDs in the dashboard.

Conclusion: Code is correct, secure, and idiomatic. No blocking issues.


Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt

@greptile-apps

greptile-apps Bot commented Apr 8, 2026

Copy link
Copy Markdown

Greptile Summary

This PR adds Prometheus histogram metrics to measure peer table operation latencies (insert_contact and iter_contacts) as a baseline for comparing against the upcoming k-bucket implementation.

  • The insert_contact timer in do_new_contacts also captures the METRICS.record_new_discovery().await call for new contacts, causing the Vacant and Occupied arms to measure different things — this will inflate p50/p99 for new-contact events and make the baseline less reliable for comparison with feat(l1): reintroduce proper Kademlia k-bucket routing table #6458.

Confidence Score: 3/5

Safe to merge functionally, but the insert_contact metric will measure skewed latencies for new contacts, undermining its value as a baseline.

The code is correct and won't cause crashes or regressions. However, the stated purpose of the PR is to establish a reliable baseline for comparing with the k-bucket implementation, and the timer boundary bug means the insert metric conflates IndexMap time with async notification overhead for new contacts, reducing comparability.

crates/networking/p2p/peer_table.rs — timer placement in do_new_contacts

Vulnerabilities

No security concerns identified.

Important Files Changed

Filename Overview
crates/blockchain/metrics/p2p.rs Adds two Prometheus Histogram fields and their observe methods to MetricsP2P; buckets and registration look correct.
crates/networking/p2p/peer_table.rs Metrics instrumentation added, but the insert_contact timer encompasses the async record_new_discovery() call for new contacts, skewing the measurement.
metrics/provisioning/grafana/dashboards/common_dashboards/p2p_packets.json Three new Grafana panels added (insert p50/p99, iter p50/p99, ops rate); PromQL queries and panel IDs look correct.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant PeerTableServer
    participant IndexMap
    participant Prometheus

    Note over PeerTableServer: do_new_contacts()
    Caller->>PeerTableServer: new_contacts(nodes)
    loop for each node
        PeerTableServer->>PeerTableServer: insert_start = Instant::now()
        PeerTableServer->>IndexMap: contacts.entry(node_id)
        alt Vacant
            IndexMap-->>PeerTableServer: VacantEntry
            PeerTableServer->>IndexMap: insert(Contact)
            PeerTableServer->>PeerTableServer: record_new_discovery().await ⚠️ included in timer
        else Occupied
            IndexMap-->>PeerTableServer: OccupiedEntry
            PeerTableServer->>IndexMap: add_protocol()
        end
        PeerTableServer->>Prometheus: observe(insert_start.elapsed())
    end

    Note over PeerTableServer: do_get_closest_nodes()
    Caller->>PeerTableServer: get_closest_nodes(node_id)
    PeerTableServer->>PeerTableServer: scan_start = Instant::now()
    PeerTableServer->>IndexMap: iterate all contacts
    IndexMap-->>PeerTableServer: Vec<Node>
    PeerTableServer->>Prometheus: observe(scan_start.elapsed())
    PeerTableServer-->>Caller: closest nodes
Loading
Prompt To Fix All With AI
This is a comment left during a code review.
Path: crates/networking/p2p/peer_table.rs
Line: 999-1017

Comment:
**Timer includes async `record_new_discovery()` for new contacts**

`insert_start` is created before `self.contacts.entry(node_id)` but the observation happens after the entire `match` block — which, for the `Vacant` arm, contains `METRICS.record_new_discovery().await`. This means the measured duration for genuinely new contacts includes the async overhead of `record_new_discovery()`, not just the IndexMap entry operation, making the metric inconsistent across the two arms and potentially noisy compared to the Occupied arm baseline.

To measure only the IndexMap operation, move the observation inside the match arms before the async call:

```rust
match self.contacts.entry(node_id) {
    Entry::Vacant(vacant_entry) => {
        vacant_entry.insert(Contact::new(node, protocol));
        #[cfg(feature = "metrics")]
        {
            use ethrex_metrics::p2p::METRICS_P2P;
            METRICS_P2P.observe_insert_contact_duration(insert_start.elapsed().as_secs_f64());
        }
        METRICS.record_new_discovery().await;
    }
    Entry::Occupied(mut occupied_entry) => {
        occupied_entry.get_mut().add_protocol(protocol);
        #[cfg(feature = "metrics")]
        {
            use ethrex_metrics::p2p::METRICS_P2P;
            METRICS_P2P.observe_insert_contact_duration(insert_start.elapsed().as_secs_f64());
        }
    }
}
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "feat(l1): add peer table timing metrics ..." | Re-trigger Greptile

Comment on lines 999 to +1017
@@ -996,6 +1009,12 @@ impl PeerTableServer {
occupied_entry.get_mut().add_protocol(protocol);
}
}

#[cfg(feature = "metrics")]
{
use ethrex_metrics::p2p::METRICS_P2P;
METRICS_P2P.observe_insert_contact_duration(insert_start.elapsed().as_secs_f64());
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Timer includes async record_new_discovery() for new contacts

insert_start is created before self.contacts.entry(node_id) but the observation happens after the entire match block — which, for the Vacant arm, contains METRICS.record_new_discovery().await. This means the measured duration for genuinely new contacts includes the async overhead of record_new_discovery(), not just the IndexMap entry operation, making the metric inconsistent across the two arms and potentially noisy compared to the Occupied arm baseline.

To measure only the IndexMap operation, move the observation inside the match arms before the async call:

match self.contacts.entry(node_id) {
    Entry::Vacant(vacant_entry) => {
        vacant_entry.insert(Contact::new(node, protocol));
        #[cfg(feature = "metrics")]
        {
            use ethrex_metrics::p2p::METRICS_P2P;
            METRICS_P2P.observe_insert_contact_duration(insert_start.elapsed().as_secs_f64());
        }
        METRICS.record_new_discovery().await;
    }
    Entry::Occupied(mut occupied_entry) => {
        occupied_entry.get_mut().add_protocol(protocol);
        #[cfg(feature = "metrics")]
        {
            use ethrex_metrics::p2p::METRICS_P2P;
            METRICS_P2P.observe_insert_contact_duration(insert_start.elapsed().as_secs_f64());
        }
    }
}
Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/networking/p2p/peer_table.rs
Line: 999-1017

Comment:
**Timer includes async `record_new_discovery()` for new contacts**

`insert_start` is created before `self.contacts.entry(node_id)` but the observation happens after the entire `match` block — which, for the `Vacant` arm, contains `METRICS.record_new_discovery().await`. This means the measured duration for genuinely new contacts includes the async overhead of `record_new_discovery()`, not just the IndexMap entry operation, making the metric inconsistent across the two arms and potentially noisy compared to the Occupied arm baseline.

To measure only the IndexMap operation, move the observation inside the match arms before the async call:

```rust
match self.contacts.entry(node_id) {
    Entry::Vacant(vacant_entry) => {
        vacant_entry.insert(Contact::new(node, protocol));
        #[cfg(feature = "metrics")]
        {
            use ethrex_metrics::p2p::METRICS_P2P;
            METRICS_P2P.observe_insert_contact_duration(insert_start.elapsed().as_secs_f64());
        }
        METRICS.record_new_discovery().await;
    }
    Entry::Occupied(mut occupied_entry) => {
        occupied_entry.get_mut().add_protocol(protocol);
        #[cfg(feature = "metrics")]
        {
            use ethrex_metrics::p2p::METRICS_P2P;
            METRICS_P2P.observe_insert_contact_duration(insert_start.elapsed().as_secs_f64());
        }
    }
}
```

How can I resolve this? If you propose a fix, please make it concise.

@github-actions

github-actions Bot commented Apr 8, 2026

Copy link
Copy Markdown

🤖 Codex Code Review

  1. The new insert_contact histogram is counting more than insertions. In peer_table.rs line 1002 through peer_table.rs line 1016, observe_insert_contact_duration() runs for both Entry::Vacant and Entry::Occupied. Because Prometheus histograms increment _count on every observe, the Grafana “Peer Table Operations Rate” panel in p2p_packets.json line 610 is effectively reporting insertions plus updates-to-existing-contacts, while the metric name and panel title say “insert_contact”. That makes the panel semantically wrong and can hide regressions in actual insert behavior. Either move the observation into the vacant branch only, or rename the metric/panel to an “upsert” or “contact update” metric.

  2. The same timing point also includes unrelated async metrics work on the vacant path. In peer_table.rs line 999 through peer_table.rs line 1016, the timer starts before self.contacts.entry(...), but for vacant entries it is only observed after METRICS.record_new_discovery().await. That await can block on the metrics mutex and scheduler, so the histogram is not isolating peer-table insertion latency; it is partially measuring contention in the metrics subsystem. If the goal is Kademlia table latency, record the histogram immediately around the table mutation and keep record_new_discovery().await outside the timed region.

No consensus, gas-accounting, trie, RLP, or transaction-validation concerns in this PR; it is limited to observability. I couldn’t run cargo check in this environment because the sandboxed Rustup setup is read-only, so this review is based on static inspection of the diff and surrounding code.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@github-actions

github-actions Bot commented Apr 8, 2026

Copy link
Copy Markdown

🤖 Claude Code Review

Now I have full context. Let me write the review.


Review: feat(l1): add peer table timing metrics (Kademlia baseline)

Overall: Straightforward observability addition. The iter_contacts instrumentation is clean. The insert_contact instrumentation has a meaningful measurement correctness issue, and there are coverage gaps worth noting.


Bug: insert_contact timer includes async overhead (peer_table.rs ~1002–1015)

This is the main concern. In do_new_contacts, the timing window spans the entire match block including the METRICS.record_new_discovery().await call in the Vacant branch:

let insert_start = std::time::Instant::now();

match self.contacts.entry(node_id) {
    Entry::Vacant(vacant_entry) => {
        vacant_entry.insert(Contact::new(node, protocol));
        METRICS.record_new_discovery().await;   // ← inside timing window
    }
    Entry::Occupied(mut occupied_entry) => {
        occupied_entry.get_mut().add_protocol(protocol);
    }
}

METRICS_P2P.observe_insert_contact_duration(insert_start.elapsed().as_secs_f64());

The result is a bimodal distribution: Occupied entries measure a pure map operation (~ns), while Vacant entries include async mutex contention from record_new_discovery. The metric name insert_contact_duration_seconds implies it's measuring the map operation, but it's actually measuring something much larger for new contacts — which is exactly the interesting case when comparing against k-buckets.

Fix: Stop the timer before the await:

match self.contacts.entry(node_id) {
    Entry::Vacant(vacant_entry) => {
        vacant_entry.insert(Contact::new(node, protocol));
        #[cfg(feature = "metrics")]
        METRICS_P2P.observe_insert_contact_duration(insert_start.elapsed().as_secs_f64());
        METRICS.record_new_discovery().await;
    }
    Entry::Occupied(mut occupied_entry) => {
        occupied_entry.get_mut().add_protocol(protocol);
        #[cfg(feature = "metrics")]
        METRICS_P2P.observe_insert_contact_duration(insert_start.elapsed().as_secs_f64());
    }
}

This also lets you drop the outer #[cfg(feature = "metrics")] block at the end.


Coverage gap: handle_insert_if_new is not instrumented (peer_table.rs ~743)

There is a separate insertion path at handle_insert_if_new (~line 743) that calls self.contacts.entry(msg.node.node_id()) directly and is not timed. Depending on the intent, this may be acceptable if this handler is rarely exercised, but it's worth noting that insert_contact observations will miss this path.


Coverage gap: do_get_nodes_at_distances is not timed (peer_table.rs ~973)

do_get_nodes_at_distances is also a full self.contacts.iter() scan (used for discv5 FINDNODE responses), equivalent in complexity to do_get_closest_nodes. It is not included in iter_contacts observations. If this PR's goal is a baseline for the k-bucket comparison, leaving this path unmeasured will make the discv5 path invisible in the dashboard. This may be intentional for the baseline scope, but worth documenting.


Minor: insert_contact lowest bucket may be too coarse

The lowest explicit bucket is 1µs (0.000_001). On modern hardware, a pure IndexMap::entry() lookup typically completes in 50–300ns. With the current buckets, many Occupied-path observations will land in the le=0.000001 bucket, meaning the p50 for occupied entries will always appear as ≤1µs with no finer resolution. Consider adding 0.0000002 (200ns) and 0.0000005 (500ns) buckets if sub-microsecond resolution is useful for the comparison. (Less critical if the async-overhead fix above is applied, since the distribution will shift upward for vacant entries.)


What's correct

  • iter_contacts timing in do_get_closest_nodes is correct: the timer wraps the loop with no async calls, so it measures exactly the scan duration.
  • Histogram buckets for iter_contacts are well-chosen for an O(n) scan (10µs–100ms range).
  • PromQL expressions (histogram_quantile with rate(..._bucket[$__rate_interval])) and the _count-based ops/s panel are correct.
  • Panel IDs (30, 31, 32, 103) and grid positions are non-conflicting with the existing dashboard.
  • Feature-gating with #[cfg(feature = "metrics")] is consistently applied.

Automated review by Claude (Anthropic) · sonnet · custom prompt

#[cfg(feature = "metrics")]
{
use ethrex_metrics::p2p::METRICS_P2P;
METRICS_P2P.observe_insert_contact_duration(insert_start.elapsed().as_secs_f64());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timing window here includes the METRICS.record_new_discovery().await call inside the Vacant arm (line 1005 on the PR branch). That async operation will dominate the measurement for new contacts, making the histogram reflect "insertion + async metrics recording" rather than the IndexMap::entry cost you're trying to baseline.

Consider moving insert_start to just before the match and observing just after, but wrapping only the match itself — or narrower: start the timer immediately before contacts.entry() and observe right after the match closes but before record_new_discovery. Alternatively, you could move the record_new_discovery call to after the metrics observation:

let new_contact = match self.contacts.entry(node_id) {
    Entry::Vacant(vacant_entry) => {
        vacant_entry.insert(Contact::new(node, protocol));
        true
    }
    Entry::Occupied(mut occupied_entry) => {
        occupied_entry.get_mut().add_protocol(protocol);
        false
    }
};

#[cfg(feature = "metrics")]
{
    use ethrex_metrics::p2p::METRICS_P2P;
    METRICS_P2P.observe_insert_contact_duration(insert_start.elapsed().as_secs_f64());
}

if new_contact {
    METRICS.record_new_discovery().await;
}

]
}
,
{

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: The trailing comma before the new JSON block (}, on the previous context line followed by + ,) produces }, , which is unusual formatting. It works because the previous element already had its comma, but this leading-comma style is inconsistent with the rest of the file. Consider appending the comma to the preceding } instead (i.e., adding the new panels as a natural continuation of the array).

@github-project-automation github-project-automation Bot moved this from In Review to In Progress in ethrex_l1 Apr 9, 2026
@github-project-automation github-project-automation Bot moved this from In Progress to In Review in ethrex_l1 Apr 13, 2026
@azteca1998 azteca1998 added this pull request to the merge queue Apr 13, 2026
Merged via the queue into main with commit 5f41924 Apr 13, 2026
54 checks passed
@azteca1998 azteca1998 deleted the feat/kademlia-baseline-metrics branch April 13, 2026 15:05
@github-project-automation github-project-automation Bot moved this from In Review to Done in ethrex_l1 Apr 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

L1 Ethereum client

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants