feat(l1): reintroduce proper Kademlia k-bucket routing table by azteca1998 · Pull Request #6458 · lambdaclass/ethrex

azteca1998 · 2026-04-08T12:44:23Z

Summary

Replace flat IndexMap<H256, Contact> with proper Kademlia k-bucket routing table (256 buckets, 16 contacts per bucket, 10 replacements per bucket)
Use raw XOR distance (H256 comparison) for closest-node queries instead of logarithmic distance
Store local_node_id in PeerTableServer and remove redundant parameter from new_contacts/new_contact_records protocol methods
Automatic replacement promotion when contacts are pruned from buckets

Already addressed by prior work:

GenServer actor pattern (already in place)
Single struct ownership, no Arc<Mutex<...>> per field (already in place)

Test plan

All 21 existing p2p unit tests pass
RPC crate compiles cleanly
Full cargo check passes with no warnings
Manual testing with live network to verify peer discovery behavior

Replace the flat IndexMap-based contact storage with a proper Kademlia routing table using 256 k-buckets (16 contacts per bucket) with replacement lists, as described in the Kademlia paper. Key changes: - Add KBucket struct with main contact list and replacement list - Store local_node_id in PeerTableServer for bucket computation - Use XOR distance (raw H256 comparison) for closest-node queries instead of logarithmic distance for accurate ordering - Promote replacements when contacts are pruned from buckets - Remove redundant local_node_id parameter from new_contacts and new_contact_records protocol methods Closes #4245

github-actions · 2026-04-08T12:47:00Z

Lines of code report

Total lines added: 258
Total lines removed: 7
Total lines changed: 265

Detailed view

+-----------------------------------------------+-------+------+
| File                                          | Lines | Diff |
+-----------------------------------------------+-------+------+
| ethrex/cmd/ethrex/initializers.rs             | 667   | +1   |
+-----------------------------------------------+-------+------+
| ethrex/cmd/ethrex/l2/initializers.rs          | 367   | +4   |
+-----------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/discv4/server.rs | 693   | -7   |
+-----------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/discv5/server.rs | 1518  | +10  |
+-----------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/peer_table.rs    | 1229  | +243 |
+-----------------------------------------------+-------+------+

github-actions · 2026-04-08T12:47:05Z

🤖 Codex Code Review

Findings

High: replacement-cache entries are treated as “present” for dedup, but almost every read/update path ignores them, so saturated buckets will silently drop discovery state. contact_exists() checks both main and replacement entries (peer_table.rs:927), while get_contact(), get_contact_mut(), and iter_contacts() only traverse bucket.contacts (peer_table.rs:915, peer_table.rs:921, peer_table.rs:945). On top of that, insert_contact() discards the KBucket::insert() return value and always reports success (peer_table.rs:935). That means a full bucket can accept a node only into replacements, yet callers still behave as if it is active: insert_if_new() returns true (peer_table.rs:830), discv4::handle_ping() sends a validation ping (discv4/server.rs:510), and later handle_pong() / get_contact() cannot find that node (discv4/server.rs:542). The same bug hits ENR/session updates: new_contact_records() branches on contact_exists(), then fails to update anything if the node is only in replacements (peer_table.rs:1151). This breaks discovery behavior once any bucket fills up.
Medium: do_new_contacts() can accumulate duplicate replacement entries for the same node. It only checks get_contact_mut() before inserting (peer_table.rs:1129), so if a node is already in replacements but not in contacts, the code inserts it again. insert_replacement() then blindly push()es and evicts FIFO (peer_table.rs:105). Repeated Neighbors/NODES responses for the same peer will crowd out other replacement candidates and make promotion behavior nondeterministic.

I did not run cargo test; in this environment cargo/rustup tried to write under read-only ~/.cargo / ~/.rustup, so verification was blocked.

Automated review by OpenAI Codex · gpt-5.4 · custom prompt

github-actions · 2026-04-08T12:48:56Z

🤖 Claude Code Review

Here is the review of PR #6458:

Review: feat(l1): reintroduce proper Kademlia k-bucket routing table

Overall the approach is solid — the structural change from a flat IndexMap to 256 k-buckets is correct and the cleanup of redundant local_node_id parameters is a clear improvement. There are a few correctness issues worth addressing before merging.

Bug 1 — `do_new_contacts` may produce duplicate replacement-list entries

File: crates/networking/p2p/peer_table.rs

do_new_contacts uses get_contact_mut to detect whether a node already exists, but get_contact_mut only searches the main contact list, not the replacement list:

if let Some(contact) = self.get_contact_mut(&node_id) {
    contact.add_protocol(protocol);
} else {
    let contact = Contact::new(node, protocol);
    if self.insert_contact(node_id, contact) { ... }
}

When a node is already in the replacement list, get_contact_mut returns None, so the else branch runs. insert_contact → KBucket::insert finds the main list still full (that's why the node was in replacements), so it calls insert_replacement — adding a second copy of the same node to the replacement list.

do_new_contact_records correctly uses contact_exists (which checks both lists). The fix for do_new_contacts is to replace the get_contact_mut guard with contact_exists and only call add_protocol on the main-list node (or accept the skip for replacement-list nodes, as do_new_contact_records does):

if self.contact_exists(&node_id) {
    if let Some(contact) = self.get_contact_mut(&node_id) {
        contact.add_protocol(protocol);
    }
} else { ... }

Bug 2 — `insert_contact` discards `KBucket::insert`'s return value, over-counting metrics

File: crates/networking/p2p/peer_table.rs

KBucket::insert returns true only when the contact enters the main list, and false when it overflows to the replacement list. But insert_contact ignores that value:

fn insert_contact(&mut self, node_id: H256, contact: Contact) -> bool {
    let Some(idx) = self.bucket_for(&node_id) else { return false; };
    self.buckets[idx].insert(node_id, contact);  // return value discarded
    true  // always true if not local node
}

Every call site does if self.insert_contact(...) { METRICS.record_new_discovery().await; }, so the metric fires for contacts that only reached the replacement list and are not yet discoverable. If accurate metrics matter, propagate the inner return:

fn insert_contact(&mut self, node_id: H256, contact: Contact) -> bool {
    let Some(idx) = self.bucket_for(&node_id) else { return false; };
    self.buckets[idx].insert(node_id, contact)
}

The docstring currently says "Returns true if inserted (into main list or replacement list)" — which is technically consistent with the current implementation, but conflicts with the caller's intent of only counting genuine discoveries.

Correctness — ENR updates silently dropped for replacement-list nodes

File: crates/networking/p2p/peer_table.rs, do_new_contact_records

contact_exists returns true for replacement-list nodes, so the "update existing contact" branch is taken. But get_contact/get_contact_mut return None for those nodes, causing should_update to default to false and the mutable update block to be skipped entirely. A replacement-list contact whose ENR is refreshed before it gets promoted will therefore be promoted with stale ENR data.

A comment explaining this intentional skip would remove ambiguity. If updating replacement-list ENRs is desired, a separate get_replacement_mut accessor would be needed.

Correctness — `remove_and_promote` uses LIFO, not FIFO

File: crates/networking/p2p/peer_table.rs, KBucket::remove_and_promote

fn remove_and_promote(&mut self, node_id: &H256) -> Option<H256> {
    ...
    if let Some((replacement_id, replacement)) = self.replacements.pop() {  // LIFO

insert_replacement evicts from the front (oldest, remove(0)) but remove_and_promote promotes from the back (newest, pop()). Standard Kademlia promotes the longest-waiting candidate. The inconsistency should be intentional or corrected — self.replacements.remove(0) here if FIFO promotion is desired.

Minor — `do_get_nodes_at_distances` accepts a redundant `local_node_id` parameter

File: crates/networking/p2p/peer_table.rs

fn do_get_nodes_at_distances(&self, local_node_id: H256, distances: &[u32]) -> Vec<NodeRecord> {
    self.iter_contacts()
        .filter_map(|(contact_id, contact)| {
            let dist = distance(&local_node_id, contact_id) as u32;

Now that PeerTableServer stores self.local_node_id, the parameter is redundant. The caller passes msg.local_node_id from the message, which should always equal self.local_node_id. This should use self.local_node_id directly to remove the surface for divergence.

Minor — `KBucket` fields are `pub`

KBucket.contacts and KBucket.replacements are public, allowing callers to bypass all bucket invariants (capacity enforcement, replacement promotion). Unless external test introspection is required, these should be pub(crate) at most.

Positive notes

The fix for do_get_closest_nodes (finding the farthest node before replacing, instead of break-on-first) is a real correctness improvement.
The borrow-checker split in do_new_contact_records (immutable should_update check before mutable update) is clean.
Removal of the flat contacts: IndexMap and the old distance() wrapper is a good cleanup.
Keeping the logarithmic distance() function in do_get_nodes_at_distances for discv5 (where distances are specified as log₂ buckets) is correct and intentional.

Automated review by Claude (Anthropic) · sonnet · custom prompt

github-actions · 2026-04-08T13:11:49Z

🤖 Codex Code Review

Medium: replacement entries are treated as “already present” for deduplication, but most of the table logic cannot read or update them. contact_exists() checks both contacts and replacements at peer_table.rs:927, while get_contact() and get_contact_mut() only search contacts at peer_table.rs:916 and peer_table.rs:922. That breaks do_new_contact_records(): the contact_exists() branch at peer_table.rs:1151 can be taken for a replacement-only node, but the subsequent reads/writes become no-ops, so fresher ENRs, protocol flags, and fork-id validation never land. The same asymmetry can also make callers observe “known node” while get_contact() still returns None. I would either make the accessors search replacements too, or keep replacements out of contact_exists() and treat them as cache-only until promotion.
Medium: do_new_contacts() can insert duplicate node IDs into the replacement list. When a node already exists only in replacements, get_contact_mut() misses it at peer_table.rs:1129, so the code falls through to insert_contact() at peer_table.rs:1134. KBucket::insert_replacement() does not deduplicate at peer_table.rs:106, so repeated Neighbors/NODES messages can fill a bucket’s replacement cache with the same peer and evict distinct candidates. That is a correctness and discovery-quality regression; the simplest fix is to check contact_exists() before inserting, or dedup inside insert_replacement().

No EVM, gas-accounting, trie, or consensus-rule logic is touched here; the review surface is discovery/routing behavior only.

I could not run the affected tests in this sandbox because Cargo needs to resolve the git dependency libssz, and network access is blocked.

Automated review by OpenAI Codex · gpt-5.4 · custom prompt

greptile-apps · 2026-04-08T13:12:47Z

Greptile Summary

Replaces the flat IndexMap<H256, Contact> with a proper 256-bucket Kademlia routing table (KBucket with 16 main slots and 10 replacements each), uses raw XOR distance for get_closest_nodes, and removes the redundant local_node_id parameter from new_contacts/new_contact_records by storing it in PeerTableServer.

P1 — duplicate replacement entries: do_new_contacts uses get_contact_mut to check existence, but that helper only searches the main bucket list. A node already in the replacement list will pass the None branch and be inserted a second time into replacements on every subsequent re-announcement, silently corrupting the replacement queue.
P2 — metric overcounting: insert_contact ignores KBucket::insert's bool return (which signals main-list vs replacement-list placement) and always returns true, so METRICS.record_new_discovery() fires even for replacement-only insertions.

Confidence Score: 4/5

Safe to merge after fixing the replacement-list duplicate insertion bug in do_new_contacts.

One P1 defect (duplicate replacement entries in do_new_contacts) needs to be resolved before merge; remaining findings are P2 quality issues that do not block correctness of the primary discovery path.

crates/networking/p2p/peer_table.rs — specifically do_new_contacts (line ~1129) and insert_contact (line ~937)

Vulnerabilities

No security concerns identified. The IP-mismatch guard in do_validate_contact is preserved, XOR-distance comparisons are pure arithmetic with no external input reachable, and the routing table actor model keeps all mutations single-threaded.

Important Files Changed

Filename	Overview
crates/networking/p2p/peer_table.rs	Core Kademlia routing table refactor — adds KBucket struct with main/replacement lists; do_new_contacts has a duplicate-insertion bug for replacement-list entries and insert_contact discards bucket insert result causing metric overcounting
crates/networking/p2p/discv4/server.rs	Removes redundant local_node_id parameter from new_contacts calls; straightforward cleanup matching updated PeerTableServer API
crates/networking/p2p/discv5/server.rs	Removes redundant local_node_id parameter from new_contacts/new_contact_records calls; test helpers updated to pass local_node_id to PeerTableServer::spawn; get_nodes_at_distances still passes local_node_id redundantly
cmd/ethrex/initializers.rs	Passes local_node.node_id() to PeerTableServer::spawn as required by new API; trivial change
cmd/ethrex/l2/initializers.rs	Same local_node_id threading as L1 initializer; no issues
crates/networking/rpc/test_utils.rs	Test utility updated to pass a default H256 as local_node_id to PeerTableServer::spawn; acceptable for testing context
test/tests/p2p/discovery/discv5_server_tests.rs	Integration tests updated to work with new routing table; no functional changes to test logic

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[do_new_contacts / do_new_contact_records] --> B{discarded or local node?}
    B -- yes --> Z[skip]
    B -- no --> C{get_contact_mut found?\nmain list only}
    C -- yes --> D[add_protocol on existing contact]
    C -- no --> E{contact_exists?\nchecks replacements too}
    E -- yes, do_new_contact_records --> F[update replacement entry]
    E -- no / do_new_contacts skips this check --> G[insert_contact]
    G --> H[bucket_index XOR distance]
    H --> I{main list full?}
    I -- no --> J[push to contacts main list\nreturn true]
    I -- yes --> K[insert_replacement\nreturn false — IGNORED by insert_contact]
    K --> L[⚠ duplicate if node already in replacements]
    J --> M[record_new_discovery metric]
    K --> M

Comments Outside Diff (1)

crates/networking/p2p/peer_table.rs, line 387-391 (link)

Redundant local_node_id parameter in get_nodes_at_distances

PeerTableServer now owns self.local_node_id, so passing local_node_id from outside is redundant — callers must pass the same value that the routing table was constructed with, otherwise distance calculations diverge from bucket placement. do_get_nodes_at_distances could simply use self.local_node_id and the parameter could be dropped from both the protocol definition and the public API.

Prompt To Fix With AI

This is a comment left during a code review.
Path: crates/networking/p2p/peer_table.rs
Line: 387-391

Comment:
**Redundant `local_node_id` parameter in `get_nodes_at_distances`**

`PeerTableServer` now owns `self.local_node_id`, so passing `local_node_id` from outside is redundant — callers must pass the same value that the routing table was constructed with, otherwise distance calculations diverge from bucket placement. `do_get_nodes_at_distances` could simply use `self.local_node_id` and the parameter could be dropped from both the protocol definition and the public API.

How can I resolve this? If you propose a fix, please make it concise.

Prompt To Fix All With AI

This is a comment left during a code review.
Path: crates/networking/p2p/peer_table.rs
Line: 1129-1138

Comment:
**Replacement-list contacts are silently duplicated**

`get_contact_mut` only searches `bucket.contacts` (the main list), not `bucket.replacements`. When a node is already sitting in the replacement list and `do_new_contacts` is called for it again, the `if let Some(contact)` branch is skipped, a brand-new `Contact` is constructed, and `insert_contact` → `KBucket::insert_replacement` pushes a **second copy** of the same node ID into `replacements`. The protocol field on the original replacement entry is also never updated.

`do_new_contact_records` avoids this by checking `self.contact_exists(&node_id)` (which covers both lists). `do_new_contacts` should do the same:

```rust
async fn do_new_contacts(&mut self, nodes: Vec<Node>, protocol: DiscoveryProtocol) {
    for node in nodes {
        let node_id = node.node_id();
        if self.discarded_contacts.contains(&node_id) || node_id == self.local_node_id {
            continue;
        }
        if let Some(contact) = self.get_contact_mut(&node_id) {
            contact.add_protocol(protocol);
        } else if self.contact_exists(&node_id) {
            // Node is in the replacement list; update its protocol entry there.
            if let Some(idx) = self.bucket_for(&node_id) {
                if let Some((_, c)) = self.buckets[idx]
                    .replacements
                    .iter_mut()
                    .find(|(id, _)| id == &node_id)
                {
                    c.add_protocol(protocol);
                }
            }
        } else {
            let contact = Contact::new(node, protocol);
            if self.insert_contact(node_id, contact) {
                METRICS.record_new_discovery().await;
            }
        }
    }
}
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: crates/networking/p2p/peer_table.rs
Line: 937-943

Comment:
**`insert_contact` discards `KBucket::insert`'s return value, over-counting discoveries**

`KBucket::insert` returns `false` when the bucket is full and the contact is placed in the replacement list instead. `insert_contact` ignores that value and unconditionally returns `true` for any non-local node — so callers (e.g. `do_new_contacts`, `handle_insert_if_new`) record `METRICS.record_new_discovery()` even for contacts that only land in the replacement list.

```rust
fn insert_contact(&mut self, node_id: H256, contact: Contact) -> bool {
    let Some(idx) = self.bucket_for(&node_id) else {
        return false;
    };
    self.buckets[idx].insert(node_id, contact)   // propagate the bucket's return value
}
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: crates/networking/p2p/peer_table.rs
Line: 387-391

Comment:
**Redundant `local_node_id` parameter in `get_nodes_at_distances`**

`PeerTableServer` now owns `self.local_node_id`, so passing `local_node_id` from outside is redundant — callers must pass the same value that the routing table was constructed with, otherwise distance calculations diverge from bucket placement. `do_get_nodes_at_distances` could simply use `self.local_node_id` and the parameter could be dropped from both the protocol definition and the public API.

How can I resolve this? If you propose a fix, please make it concise.

_{Reviews (1): Last reviewed commit: "feat(l1): reintroduce proper Kademlia k-..." | Re-trigger Greptile}

greptile-apps · 2026-04-08T13:12:51Z

+            if let Some(contact) = self.get_contact_mut(&node_id) {
+                // Contact already exists, just add the protocol
+                contact.add_protocol(protocol);
+            } else {
+                let contact = Contact::new(node, protocol);
+                if self.insert_contact(node_id, contact) {
                    METRICS.record_new_discovery().await;
                }
-                Entry::Occupied(mut occupied_entry) => {
-                    // Contact already exists, just add the protocol
-                    occupied_entry.get_mut().add_protocol(protocol);
-                }
            }
        }


Replacement-list contacts are silently duplicated

get_contact_mut only searches bucket.contacts (the main list), not bucket.replacements. When a node is already sitting in the replacement list and do_new_contacts is called for it again, the if let Some(contact) branch is skipped, a brand-new Contact is constructed, and insert_contact → KBucket::insert_replacement pushes a second copy of the same node ID into replacements. The protocol field on the original replacement entry is also never updated.

do_new_contact_records avoids this by checking self.contact_exists(&node_id) (which covers both lists). do_new_contacts should do the same:

async fn do_new_contacts(&mut self, nodes: Vec<Node>, protocol: DiscoveryProtocol) { for node in nodes { let node_id = node.node_id(); if self.discarded_contacts.contains(&node_id) || node_id == self.local_node_id { continue; } if let Some(contact) = self.get_contact_mut(&node_id) { contact.add_protocol(protocol); } else if self.contact_exists(&node_id) { // Node is in the replacement list; update its protocol entry there. if let Some(idx) = self.bucket_for(&node_id) { if let Some((_, c)) = self.buckets[idx] .replacements .iter_mut() .find(|(id, _)| id == &node_id) { c.add_protocol(protocol); } } } else { let contact = Contact::new(node, protocol); if self.insert_contact(node_id, contact) { METRICS.record_new_discovery().await; } } } }

Prompt To Fix With AI

This is a comment left during a code review. Path: crates/networking/p2p/peer_table.rs Line: 1129-1138 Comment: **Replacement-list contacts are silently duplicated** `get_contact_mut` only searches `bucket.contacts` (the main list), not `bucket.replacements`. When a node is already sitting in the replacement list and `do_new_contacts` is called for it again, the `if let Some(contact)` branch is skipped, a brand-new `Contact` is constructed, and `insert_contact` → `KBucket::insert_replacement` pushes a **second copy** of the same node ID into `replacements`. The protocol field on the original replacement entry is also never updated. `do_new_contact_records` avoids this by checking `self.contact_exists(&node_id)` (which covers both lists). `do_new_contacts` should do the same: ```rust async fn do_new_contacts(&mut self, nodes: Vec<Node>, protocol: DiscoveryProtocol) { for node in nodes { let node_id = node.node_id(); if self.discarded_contacts.contains(&node_id) || node_id == self.local_node_id { continue; } if let Some(contact) = self.get_contact_mut(&node_id) { contact.add_protocol(protocol); } else if self.contact_exists(&node_id) { // Node is in the replacement list; update its protocol entry there. if let Some(idx) = self.bucket_for(&node_id) { if let Some((_, c)) = self.buckets[idx] .replacements .iter_mut() .find(|(id, _)| id == &node_id) { c.add_protocol(protocol); } } } else { let contact = Contact::new(node, protocol); if self.insert_contact(node_id, contact) { METRICS.record_new_discovery().await; } } } } ``` How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-04-08T13:12:52Z

+    fn insert_contact(&mut self, node_id: H256, contact: Contact) -> bool {
+        let Some(idx) = self.bucket_for(&node_id) else {
+            return false;
+        };
+        self.buckets[idx].insert(node_id, contact);
+        true
+    }


insert_contact discards KBucket::insert's return value, over-counting discoveries

KBucket::insert returns false when the bucket is full and the contact is placed in the replacement list instead. insert_contact ignores that value and unconditionally returns true for any non-local node — so callers (e.g. do_new_contacts, handle_insert_if_new) record METRICS.record_new_discovery() even for contacts that only land in the replacement list.

fn insert_contact(&mut self, node_id: H256, contact: Contact) -> bool { let Some(idx) = self.bucket_for(&node_id) else { return false; }; self.buckets[idx].insert(node_id, contact) // propagate the bucket's return value }

Prompt To Fix With AI

This is a comment left during a code review. Path: crates/networking/p2p/peer_table.rs Line: 937-943 Comment: **`insert_contact` discards `KBucket::insert`'s return value, over-counting discoveries** `KBucket::insert` returns `false` when the bucket is full and the contact is placed in the replacement list instead. `insert_contact` ignores that value and unconditionally returns `true` for any non-local node — so callers (e.g. `do_new_contacts`, `handle_insert_if_new`) record `METRICS.record_new_discovery()` even for contacts that only land in the replacement list. ```rust fn insert_contact(&mut self, node_id: H256, contact: Contact) -> bool { let Some(idx) = self.bucket_for(&node_id) else { return false; }; self.buckets[idx].insert(node_id, contact) // propagate the bucket's return value } ``` How can I resolve this? If you propose a fix, please make it concise.

- Fix clippy lints: collapsible if, let...else to ? operator - Fix duplicate replacement entries: use contact_exists() in do_new_contacts to check both main and replacement lists - Fix insert_contact: propagate KBucket::insert return value so metrics only count contacts inserted into main list - Fix remove_and_promote: use FIFO (promote oldest replacement) instead of LIFO, matching standard Kademlia behavior - Remove redundant local_node_id param from get_nodes_at_distances protocol method (now uses self.local_node_id) - Make KBucket fields pub(crate) instead of pub

github-actions · 2026-04-08T13:16:59Z

🤖 Claude Code Review

Now I have everything I need for a thorough review. Let me compile my findings.

Review: feat(l1): reintroduce proper Kademlia k-bucket routing table (#6458)

The overall structure is sound — moving from a flat IndexMap to proper k-buckets is the right direction, and the GenServer ownership pattern is preserved cleanly. The API cleanup (moving local_node_id into the struct) is a net improvement. However, there are a few correctness issues in the bucket mechanics that need attention before merge.

Bugs

1. do_new_contacts can create duplicate replacement-list entries

peer_table.rs:1129

if let Some(contact) = self.get_contact_mut(&node_id) {
    contact.add_protocol(protocol);
} else {
    // ...
    self.insert_contact(node_id, contact);  // ← runs even when node is in replacements
}

get_contact_mut (line 922) only searches bucket.contacts — it skips bucket.replacements. If a node was previously added to the replacement list (because the main bucket was full), get_contact_mut returns None, and the code inserts the node into replacements again, creating duplicates. Over time a replacement slot can contain the same node ID multiple times.

The correct guard is contact_exists() (which checks both lists), matching what do_new_contact_records already does (line 1151).

2. Replacement promotion is LIFO instead of FIFO — diverges from Kademlia spec

peer_table.rs:120

fn remove_and_promote(&mut self, node_id: &H256) -> Option<H256> {
    // ...
    if let Some((replacement_id, replacement)) = self.replacements.pop() {  // ← newest

insert_replacement (line 108) evicts the oldest entry (remove(0)) when the list is full, treating it as a FIFO queue. But remove_and_promote promotes the newest entry (pop()), making promotion LIFO. These two policies are inconsistent, and the LIFO promotion deviates from the Kademlia specification (§2.4) which requires the oldest queued candidate to be promoted first.

Fix: change self.replacements.pop() to self.replacements.remove(0) (or switch the storage to VecDeque for efficiency).

3. Silent ENR record loss for nodes that are only in the replacement list

peer_table.rs:1151–1179

When contact_exists returns true for a replacement-list node, the code then calls get_contact(&node_id) (line 1155), which returns None because it only searches main contacts. This causes should_update = false (via .unwrap_or(false) on line 1160). Then get_contact_mut on line 1166 also returns None, so the update block is silently skipped entirely. The incoming ENR record — including its fork-ID — is dropped with no diagnostic.

This is a consequence of the inconsistency between contact_exists (checks both lists) and get_contact/get_contact_mut (checks main list only). Either the replacement list needs its own update path, or the API contract must be clarified.

4. insert_contact discards KBucket::insert's return value, skewing metrics

peer_table.rs:937–943

fn insert_contact(&mut self, node_id: H256, contact: Contact) -> bool {
    let Some(idx) = self.bucket_for(&node_id) else { return false; };
    self.buckets[idx].insert(node_id, contact);  // return value ignored
    true   // ← true even when contact only reached the replacement list
}

KBucket::insert returns true for main-list insertions and false for replacement-list insertions (line 93–103), but insert_contact always returns true (as long as the bucket exists). Consequences:

METRICS.record_new_discovery() fires for replacement-list additions — those are not yet "discovered" from a connectivity standpoint.
InsertIfNew (used by discovery) returns true to the caller even when the node only reached the replacement list, which may cause incorrect decision-making upstream.

Minor Issues

5. do_get_nodes_at_distances still accepts local_node_id as a parameter

peer_table.rs:1109

fn do_get_nodes_at_distances(&self, local_node_id: H256, distances: &[u32]) -> Vec<NodeRecord> {

The whole point of this PR is to store local_node_id in self. This function was partially migrated (switched to iter_contacts) but still takes the local ID as a parameter instead of using self.local_node_id. Left as-is it creates an inconsistency that forces callers to track and pass the value.

6. Redundant variable in remove_and_promote

peer_table.rs:121

if let Some((replacement_id, replacement)) = self.replacements.pop() {
    let promoted_id = replacement_id;           // ← dead rename
    self.contacts.push((replacement_id, replacement));
    Some(promoted_id)

promoted_id is just a rename of replacement_id; the binding serves no purpose. Should be Some(replacement_id) directly.

7. pub fields on KBucket

peer_table.rs:66–67

pub struct KBucket {
    pub contacts: Vec<(H256, Contact)>,
    pub replacements: Vec<(H256, Contact)>,
}

Exposing both fields as fully pub bypasses encapsulation and allows external code to mutate the invariants (e.g., inserting duplicates, reordering). If external crates only need read access for diagnostics/tests, pub(crate) would be more appropriate. If true external visibility is needed, consider accessor methods.

What Looks Good

The bucket_index / xor_distance math is correct: U256::bits() - 1 correctly gives floor(log2(XOR)), producing bucket indices 0–255 matching NUMBER_OF_BUCKETS = 256.
The do_get_closest_nodes rewrite correctly identifies the farthest element (via max_by_key) before replacing, fixing the previous bug where only the first farther element was replaced.
Removing TARGET_CONTACTS = 100_000 from TargetReached is the right call — the new table caps at 256 × 16 = 4 096 main contacts, making the old threshold permanently unreachable.
The refactoring from IndexMap::Entry to get_contact_mut / insert_contact is clean and readable throughout the handler methods.
The API signature cleanup (local_node_id removed from new_contacts/new_contact_records) is an improvement.

Automated review by Claude (Anthropic) · sonnet · custom prompt

- Run rustfmt on all affected files - Add get_any/get_contact_or_replacement accessors that search both main and replacement lists - Fix do_new_contact_records: ENR updates for replacement-list nodes are no longer silently dropped

Add Prometheus histograms for measuring k-bucket operation latencies: - ethrex_kademlia_insert_contact_duration_seconds: times insert_contact - ethrex_kademlia_iter_contacts_duration_seconds: times full table scans (via collect_contacts, used by get_closest_nodes) Add three Grafana panels to the P2P Packets dashboard: - insert_contact p50/p99 latency - iter_contacts full-scan p50/p99 latency - Operations rate (ops/s) for both

Add Prometheus histograms to measure peer table operation latencies on the current flat IndexMap implementation. These metrics provide the baseline for comparing against the k-bucket implementation in #6458. Metrics added: - ethrex_kademlia_insert_contact_duration_seconds: times contact insertion via IndexMap::entry in do_new_contacts - ethrex_kademlia_iter_contacts_duration_seconds: times full table scan in do_get_closest_nodes Also adds three Grafana panels to the P2P Packets dashboard for visualizing p50/p99 latencies and operation rates.

Fixes bug where add_protocol was silently skipped for contacts in the replacement list, since get_contact_mut only searches the main bucket list.

ElFantasma

👍

## Summary Add Prometheus histogram metrics to measure peer table operation latencies on the current flat `IndexMap` implementation. This provides the **baseline** for comparing against the k-bucket implementation in #6458. **Metrics added** (behind `#[cfg(feature = "metrics")]`): - `ethrex_kademlia_insert_contact_duration_seconds` — times contact insertion via `IndexMap::entry` in `do_new_contacts` - `ethrex_kademlia_iter_contacts_duration_seconds` — times full table scan in `do_get_closest_nodes` **Grafana panels** added to P2P Packets dashboard: - insert_contact p50/p99 latency - iter_contacts full-scan p50/p99 latency - Operations rate (ops/s) ## Workflow 1. Merge this PR → deploy to mainnet → collect baseline data 2. Merge #6458 (k-bucket implementation) → deploy → compare latencies on the same Grafana panels ## Test plan - [x] `cargo clippy` passes with and without `metrics` feature - [x] All 21 p2p tests pass - [ ] Verify metrics appear in Prometheus endpoint with `--metrics` flag

avilagaston9

LGTM. It'd be great to add some unit tests for the KBucket struct itself — insert, remove_and_promote, bucket_index edge cases, replacement eviction, etc.

avilagaston9 · 2026-04-14T13:49:04Z


-        for (contact_id, contact) in &self.contacts {
-            let dist = Self::distance(&node_id, contact_id);
+        for (contact_id, contact) in self.collect_contacts() {


nit: collect_contacts() clones every contact across all 256 buckets into a new Vec just to iterate over them. You can use iter_contacts() here instead (yields references, zero allocations) and only clone the 16 selected nodes. After that collect_contacts() becomes dead code and can be removed.

…able

- Replace collect_contacts() with iter_contacts() in do_get_closest_nodes to avoid cloning all contacts across 256 buckets. Remove the now-dead collect_contacts() method. - Add unit tests for KBucket: insert, contains, get/get_any, remove_and_promote, replacement eviction, and bucket_index edge cases.

…talls Fixes snapsync failures where peer count stays constant and sync eventually fails with "Failed to receive block headers" after hours of operation. Root cause: After PR #6458 introduced Kademlia k-buckets, peers that became unresponsive during sync weren't marked as disposable, so they remained in the routing table indefinitely. New peers went into replacement lists but were never promoted because dead peers weren't pruned. Changes: - Enhanced prune() to remove disposable contacts from both main and replacement lists, with automatic promotion of replacements - Mark peers as disposable when they timeout during RLPx operations (block headers, block bodies, sync head requests) - Added periodic pruning in the snap_sync main loop to ensure dead peers are regularly removed and replaced Evidence from CI artifacts showed peer count stuck at 6 throughout 3h35m sync before failure. This fix enables peer rotation so healthy peers from replacement lists can take over when active peers become unresponsive.

…6458)" This reverts commit 648719f.

…e" (lambdaclass#6505) **Motivation** We observed that lambdaclass#6458 caused a performance regression in snapsync. **Description** We are reverting for now, until the cause is addressed.

github-actions Bot assigned azteca1998 Apr 8, 2026

github-actions Bot added the L1 Ethereum client label Apr 8, 2026

github-project-automation Bot added this to ethrex_l1 Apr 8, 2026

azteca1998 marked this pull request as ready for review April 8, 2026 13:09

azteca1998 requested review from a team, ManuelBilbao, avilagaston9 and ilitteri as code owners April 8, 2026 13:09

ethrex-project-sync Bot moved this to In Review in ethrex_l1 Apr 8, 2026

greptile-apps Bot reviewed Apr 8, 2026

View reviewed changes

azteca1998 added 2 commits April 8, 2026 15:51

azteca1998 mentioned this pull request Apr 8, 2026

feat(l1): add peer table timing metrics (Kademlia baseline) #6459

Merged

3 tasks

ElFantasma requested changes Apr 8, 2026

View reviewed changes

Comment thread crates/networking/p2p/peer_table.rs

github-project-automation Bot moved this from In Review to In Progress in ethrex_l1 Apr 8, 2026

fix(l1): use get_contact_or_replacement_mut for protocol updates

ac5f0f2

Fixes bug where add_protocol was silently skipped for contacts in the replacement list, since get_contact_mut only searches the main bucket list.

ElFantasma approved these changes Apr 10, 2026

View reviewed changes

edg-l approved these changes Apr 13, 2026

View reviewed changes

avilagaston9 approved these changes Apr 14, 2026

View reviewed changes

github-project-automation Bot moved this from In Progress to In Review in ethrex_l1 Apr 14, 2026

azteca1998 added 2 commits April 15, 2026 15:37

merge: resolve conflicts with main in metrics descriptions and peer_t…

1221d81

…able

azteca1998 enabled auto-merge April 15, 2026 13:46

azteca1998 disabled auto-merge April 15, 2026 14:37

style(l1): fix rustfmt in peer_table tests

e1c629e

azteca1998 enabled auto-merge April 15, 2026 14:40

azteca1998 added this pull request to the merge queue Apr 15, 2026

Merged via the queue into main with commit 648719f Apr 15, 2026
56 of 57 checks passed

azteca1998 deleted the feat/reintroduce-kademlia-table branch April 15, 2026 16:56

github-project-automation Bot moved this from In Review to Done in ethrex_l1 Apr 15, 2026

azteca1998 mentioned this pull request Apr 16, 2026

fix(l1): mark unresponsive peers as disposable to prevent snapsync stalls #6497

Closed

6 tasks

This was referenced Apr 20, 2026

perf(p2p): randomize contact selection for RLPx connection initiation #6503

Closed

perf(p2p): add flat connection pool decoupled from Kademlia routing table #6504

Closed

iovoid added a commit that referenced this pull request Apr 20, 2026

Revert "feat(l1): reintroduce proper Kademlia k-bucket routing table (#…

5bf3e6e

…6458)" This reverts commit 648719f.

iovoid mentioned this pull request Apr 20, 2026

revert(l1): revert "reintroduce proper Kademlia k-bucket routing table" #6505

Merged

azteca1998 mentioned this pull request Apr 21, 2026

feat(l1): kademlia k-bucket routing table v2 #6511

Merged

Conversation

azteca1998 commented Apr 8, 2026

Summary

Test plan

Uh oh!

github-actions Bot commented Apr 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Lines of code report

Uh oh!

github-actions Bot commented Apr 8, 2026

🤖 Codex Code Review

Uh oh!

github-actions Bot commented Apr 8, 2026

🤖 Claude Code Review

Review: feat(l1): reintroduce proper Kademlia k-bucket routing table

Bug 1 — do_new_contacts may produce duplicate replacement-list entries

Bug 2 — insert_contact discards KBucket::insert's return value, over-counting metrics

Correctness — ENR updates silently dropped for replacement-list nodes

Correctness — remove_and_promote uses LIFO, not FIFO

Minor — do_get_nodes_at_distances accepts a redundant local_node_id parameter

Minor — KBucket fields are pub

Positive notes

Uh oh!

github-actions Bot commented Apr 8, 2026

🤖 Codex Code Review

Uh oh!

greptile-apps Bot commented Apr 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Vulnerabilities

Important Files Changed

Flowchart

Comments Outside Diff (1)

Uh oh!

greptile-apps Bot Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Apr 8, 2026

🤖 Claude Code Review

Review: feat(l1): reintroduce proper Kademlia k-bucket routing table (#6458)

Bugs

Minor Issues

What Looks Good

Uh oh!

Uh oh!

ElFantasma left a comment

Choose a reason for hiding this comment

Uh oh!

avilagaston9 left a comment

Choose a reason for hiding this comment

Uh oh!

avilagaston9 Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

github-actions Bot commented Apr 8, 2026 •

edited

Loading

Bug 1 — `do_new_contacts` may produce duplicate replacement-list entries

Bug 2 — `insert_contact` discards `KBucket::insert`'s return value, over-counting metrics

Correctness — `remove_and_promote` uses LIFO, not FIFO

Minor — `do_get_nodes_at_distances` accepts a redundant `local_node_id` parameter

Minor — `KBucket` fields are `pub`

greptile-apps Bot commented Apr 8, 2026 •

edited

Loading