revert(l1): revert "reintroduce proper Kademlia k-bucket routing table"#6505
Conversation
Greptile SummaryThis PR reverts #6458 ("reintroduce proper Kademlia k-bucket routing table") due to a snapsync performance regression. The revert replaces the 256-bucket Kademlia structure in Confidence Score: 5/5Safe to merge — clean revert with all call sites updated consistently and no logic errors introduced. All findings are P2 style suggestions. The revert is mechanically consistent: spawn signatures, protocol call sites (discv4, discv5), test utilities, and Grafana dashboards are all updated in lock-step. No files require special attention.
|
| Filename | Overview |
|---|---|
| crates/networking/p2p/peer_table.rs | Core revert: replaces 256-bucket Kademlia structure with a flat IndexMap, removes local_node_id from server state (now passed per-call to distance-aware methods) |
| crates/networking/p2p/discv4/server.rs | Updated new_contacts call sites to pass local_node_id explicitly, matching the reverted peer_table API |
| crates/networking/p2p/discv5/server.rs | Updated new_contacts, new_contact_records, and get_nodes_at_distances call sites to pass local_node_id explicitly |
| cmd/ethrex/initializers.rs | PeerTableServer::spawn call updated to drop local_node_id argument |
| crates/blockchain/metrics/p2p.rs | Metric help-string descriptions updated to remove Kademlia terminology; metric names (used by Grafana) are intentionally unchanged |
| cmd/ethrex/l2/initializers.rs | PeerTableServer::spawn call updated to drop local_node_id argument (L2 path) |
| crates/networking/rpc/test_utils.rs | PeerTableServer::spawn call in test utilities updated to match new signature |
| test/tests/p2p/discovery/discv5_server_tests.rs | Test helpers updated to pass local_node_id to new_contacts, matching the reverted API |
| metrics/provisioning/grafana/dashboards/common_dashboards/p2p_packets.json | Grafana dashboard JSON updated to reflect the reverted peer table implementation |
Sequence Diagram
sequenceDiagram
participant DS as DiscoveryServer (v4/v5)
participant PT as PeerTableServer
participant IM as IndexMap (flat contacts)
DS->>PT: new_contacts(nodes, local_node_id, protocol)
PT->>IM: entry(node_id) — insert if Vacant
DS->>PT: get_nodes_at_distances(local_node_id, distances)
PT->>IM: iter all contacts
PT-->>DS: Vec<NodeRecord> within requested log-distances
DS->>PT: new_contact_records(records, local_node_id)
PT->>IM: entry(node_id) — insert or update record
Prompt To Fix All With AI
This is a comment left during a code review.
Path: crates/networking/p2p/peer_table.rs
Line: 1134-1135
Comment:
**Redundant wrapper around `utils::distance`**
`Self::distance` is a private static method that immediately delegates to `utils::distance`. This indirection adds no value — call sites using `Self::distance` (e.g. `get_closest_nodes`) could directly call the already-imported `distance` from utils, matching what `do_get_nodes_at_distances` does on line 977.
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "Revert "feat(l1): reintroduce proper Kad..." | Re-trigger Greptile
🤖 Claude Code ReviewHere is my review of PR #6505: PR #6505 —
|
| fn distance(node_id_1: &H256, node_id_2: &H256) -> usize { | ||
| distance(node_id_1, node_id_2) |
There was a problem hiding this comment.
Redundant wrapper around
utils::distance
Self::distance is a private static method that immediately delegates to utils::distance. This indirection adds no value — call sites using Self::distance (e.g. get_closest_nodes) could directly call the already-imported distance from utils, matching what do_get_nodes_at_distances does on line 977.
Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/networking/p2p/peer_table.rs
Line: 1134-1135
Comment:
**Redundant wrapper around `utils::distance`**
`Self::distance` is a private static method that immediately delegates to `utils::distance`. This indirection adds no value — call sites using `Self::distance` (e.g. `get_closest_nodes`) could directly call the already-imported `distance` from utils, matching what `do_get_nodes_at_distances` does on line 977.
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
🤖 Kimi Code ReviewCritical Issues 1. Bug in The function uses Additionally, the replacement logic is buggy: for (i, (_, d)) in &mut nodes.iter().enumerate() {
if dist < *d {
nodes[i] = (contact.node.clone(), dist);
break; // Wrong: doesn't find the farthest node
}
}This replaces the first node with greater distance, not the farthest node in the list. Fix: Use full XOR distance ( let dist = *node_id ^ *contact_id; // XOR as H256
if nodes.len() < MAX_NODES_IN_NEIGHBORS_PACKET {
nodes.push((contact.node.clone(), dist));
} else if let Some((max_idx, (_, max_dist))) = nodes.iter().enumerate().max_by_key(|(_, (_, d))| *d) {
if dist < *max_dist {
nodes[max_idx] = (contact.node.clone(), dist);
}
}2. Changed from: self.peers.len() >= self.target_peersTo: self.contacts.len() >= TARGET_CONTACTS && self.peers.len() >= self.target_peersThis requires 100,000 contacts (hardcoded Fix: Revert to checking only peer count, or use a much lower threshold (e.g., 3. Unbounded memory growth in The Fix: Enforce the limit in if self.contacts.len() >= TARGET_CONTACTS {
// Evict oldest or random entry, or skip insertion
return;
}Minor Issues 4. Inefficient Scanning all contacts (potentially 100k) for every 5. Missing tests ( The Kademlia-specific tests were removed (lines 1321-1514), but no tests were added for the new flat-map implementation. Add tests for:
6. Unused Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt |
Lines of code reportTotal lines added: Detailed view |
🤖 Codex Code Review
No consensus/EVM/RLP-sensitive code was touched here; the review findings are confined to discovery/peer-table behavior. I couldn’t run the Rust tests in this sandbox because Automated review by OpenAI Codex · gpt-5.4 · custom prompt |
Brings in main commits since the prior merge: #6516 EIP-8025 compliance (Electra-aligned ExecutionRequests typed container in NewPayloadRequest, MAX_CONSOLIDATION_REQUESTS_PER_PAYLOAD corrected from 1 to 2, to_encoded_requests() helper for EIP-7685 bytes, removal of ExecutionPayloadHeader/NewPayloadRequestHeader, new byte-oriented execution_program entrypoint that decodes the wire format internally and returns valid: false instead of erroring on post-decode failures), #6463 BAL withdrawal reverse check (DB->BAL direction so a malicious builder can't omit a withdrawal recipient from the BAL), #6505 Kademlia k-bucket revert (PeerTableServer::spawn no longer takes a node_id), plus snap-sync observability + dashboards (#6470), pivot-update crash fix (#6475), weighted peer selection (#6428), txpool_contentFrom/txpool_inspect RPC (#6446), block-by-block exec fallback (#6464), Amsterdam EELS branch pin (#6495), and rollup store SQLite v9->v10 migration (#6514). Conflict resolutions: - crates/common/types/stateless_ssz.rs: this branch had already moved the EIP-8025 SSZ types out of crates/common/types/eip8025_ssz.rs into stateless_ssz.rs and tucked the native-rollup containers below them. Kept that layout, applied #6516's content updates to the EIP-8025 section (renamed spec-limit constants, ExecutionRequests typed container with to_encoded_requests, dropped header types and their tests), pulled in the EncodedRequests import, and kept both the new test_execution_requests_to_encoded_bytes and the branch's stateless round-trip tests. - crates/guest-program/src/l1/program.rs: adopted #6516's new execution_program(bytes: &[u8], crypto) API with the internal decode_eip8025 call, the validate_eip8025_execution helper, and the decode-failure test. Rewrote all `eip-8025` feature gates as `experimental-devnet` and all `eip8025_ssz::` paths as `stateless_ssz::` to match this branch's renames. - crates/guest-program/bin/{sp1,risc0,zisk,openvm}/src/main.rs: applied #6516's simplification (drop decode_eip8025 import, pass &input straight to execution_program) under the experimental-devnet feature gate. Also flipped the rkyv::rancor::Error import gate from the old `eip-8025` name to `experimental-devnet` so the non-devnet build still has the import it needs. - crates/prover/src/backend/exec.rs: kept #6516's updated comment ("raw input bytes" instead of "(NewPayloadRequest, ExecutionWitness)") under the experimental-devnet feature gate. Auto-merged regions checked: crates/vm/backends/levm/mod.rs picked up all of #6463's Part B (DB->BAL) reverse check intact, and cmd/ethrex/l2/initializers.rs picked up #6505's PeerTableServer::spawn signature change. Verified cargo fmt --all clean, cargo check --workspace clean, cargo check --workspace --tests clean, and cargo check -p ethrex-guest-program --features experimental-devnet --tests clean.
Motivation
We observed that #6458 caused a performance regression in snapsync.
Description
We are reverting for now, until the cause is addressed.