Feature/Rework DiscV4#8616
Conversation
…nto feature/kademlia-discv4
- NodeSessionTests: collapse three flag-and-timeout tests into one [TestCaseSource] - IteratorNodeLookupTests: extract routing-table / find-neighbours stub helpers - KademliaDiscv4AdapterTests: reuse ConfigureBondCallback in ping test - KademliaSimulation: drop accidentally duplicated [TestFixture(3, 0)] Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- EnrResponseHandler / PongMsgHandler: collapse Handle to expression body - DoubleEndedLru.GetByHash: collapse to ternary expression body - IteratorNodeLookup.ShouldStop: drop redundant if, return condition directly - IteratorNodeLookup.FindNeighbour: collapse unreachable check + send to single return - NodeSession.RecordStatsFor*Msg: dedupe via shared helper with switch expression (relies on Discovery*Out/Discovery*In pairs being adjacent in NodeStatsEventType) - KademliaDiscv4Adapter: drop #region - ITaskCompleter<T>: move to its own file - TalkReqAndRespHandler.HandleResponse: collapse to one line Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LukaszRozmej
left a comment
There was a problem hiding this comment.
Deep review
Findings below are observations, not pushed changes — please decide which to apply.
Important (real defects, but not blockers)
-
KBucketTree.cs:59,153— events fire underMcsLock.OnNodeAdded/OnNodeRemovedare invoked synchronously while_lockis held. The internal subscriber (KademliaNodeSource.Handler) uses non-blockingChannel.Writer.TryWrite, butDiscoveryApp.OnKademliaNodeRemovedre-raises a publicNodeRemovedevent with external (peer-manager) subscribers — any one re-entering the tree deadlocks. Fix: capture the node, release theusingscope, thenInvoke. -
LookupKNearestNeighbour.cs:65,107— racyTaskCompletionSourcereassignment, and the token is misused. Multiple workers race onif (roundComplete.TrySetResult()) roundComplete = new TaskCompletionSource(token);outside any lock. Alsonew TaskCompletionSource(token)treatsCancellationTokenas opaquestate— cancellation is not actually wired in. Fix:Interlocked.Exchangefor the swap, and drop the(token)arg or register cancellation viaRegisterToCompletionSource. -
Kademlia.cs:103-115—Bootstraponly catchesOperationCanceledException. Any non-cancellation exception fromPing(transient network error, etc.) propagates throughParallel.ForEachAsync, killsBootstrap, killsRun'swhile (true), and lands inActivateAsync.catch(Exception)(DiscoveryApp.cs:244). Discovery then dies for the rest of the process lifetime. Wrap individual pings the wayIteratorNodeLookup.FindNeighbourdoes, or catch+log insideRun's outer loop and continue. -
EnrResponseHandler.cs:12— accepts any unsolicited ENR response. UnlikePongMsgHandler(which validatespong.PingMdc == ping.Mdc), this handler takes the first ENR seen on the(nodeId, EnrResponse)key.EnrResponseMsgalready carriesRequestKeccak; capture the request hash on send and compare. -
KademliaNodeSource.cs:118-122—finallyawaitsdiscoverTaskbefore unsubscribing. If anything inside the consumerawait foreachthrows before cancellation, the producer keeps running andOnNodeAddedstays subscribed until cancel. Either unsubscribe first, orch.Writer.TryComplete()before the await. -
KBucket.GetAllWithHash()/DoubleEndedLru.GetAllWithHash()return the liveLinkedList<>. All current callers happen to go throughKBucketTree._lock, butIRoutingTable.IterateBuckets()exposesKBucketto the outside — a future caller iterating outside the lock will hitInvalidOperationException: Collection was modified. Either materialize under lock or document the lock requirement.
Nits
- Typo on public surface:
LookupFindNeighbourHardTimoutinKademliaConfigandDiscV4KademliaModule.cs:43— rename to…Timeout. KademliaNodeSourcector paramlookup2— leftover name from refactor.KBucketTree.GetAllAtDistanceRecursiveandKBucketTree.GetKNearestNeighbouruse LINQ.Where/.Select/.Takechains on hot lookup paths — convert to manualforeachper project style (.agents/rules/coding-style.mdrule: no LINQ on hot paths).KBucketTree.cs:130.Reverse()onIEnumerableallocates a buffer on every split.Hash256XorUtils.MaxDistancecan bepublic const intinstead of a property.Kademlia.cs:61comment: "logid" → "logic".KademliaDiscv4Adapter.cs:175comment: "does not seems" → "does not seem".
TODO assessment
Discv4/DiscV4KademliaModule.cs:43 — LookupFindNeighbourHardTimout = SendNodeTimeout; // TODO: This seems very low. — Doable now. SendNodeTimeout defaults to 500ms, used here as the per-FindNode hard timeout inside Kademlia lookup. At 500ms many slow peers get cut off before they respond. Add a dedicated KademliaLookupHardTimeoutMs on IDiscoveryConfig (default ~2000ms), or use 2 * SendNodeTimeout. Combine with the Timout typo rename.
Kademlia/Hash256XorUtils.cs:70 — // TODO: Just add a min/max range per bucket and randomized between them. — Defer. Needs threading per-bucket prefix ranges through IRoutingTable/KBucketTree and changing IKeyOperator.CreateRandomKeyAtDistance signatures. No correctness risk today — the function is covered by Hash256XorUtilsTests.TestGetRandomHash. Code-shape TODO, follow-up.
Kademlia/KBucketTree.cs:28 — // TODO: Double check and probably make lockless. — Defer. Non-trivial lockfree design, would need a lockless DoubleEndedLru and new concurrent stress tests. The lock itself is correct; throughput is the concern. But the concrete defect the TODO hints at — event invocation under the lock — is Important #1 above and can be fixed independently of going lockless.
Test coverage gaps worth filling
KBucketTree.SplitBucketLRU order preservation (the.Reverse()claim) — no assertion.KBucketTree.GetAllAtDistancedepth > targetDepth"collect all leaves" branch — untested.NodeHealthTracker.TryRefresh— entire Task.Run path (ping head → refresh on success → evict on failure) not exercised.LookupKNearestNeighbour— no mid-flight cancellation test; only oneAlphavalue implicitly exercised.KademliaDiscv4Adapter.RemoveMessageHandler— CAS retry loop never stressed with concurrent add/remove.EnrResponseHandler— no test that an unsolicited ENR response is rejected (would catch Important #4).DiscoveryPersistenceManager.LoadPersistedNodes—Nodeconstruction throwing and ping throwing paths not asserted to skip and continue.
Important: - KBucketTree: raise OnNodeAdded/OnNodeRemoved after releasing _lock to avoid re-entrancy deadlocks via external subscribers. - LookupKNearestNeighbour: atomic CAS swap of roundComplete TCS and drop the misused (token) state argument; mark `finished` access with Volatile. - Kademlia.Bootstrap: catch non-OCE exceptions from individual ping calls and from the Bootstrap iteration in Run, so a transient network error no longer kills the discovery loop. - EnrResponseHandler: validate resp.RequestKeccak against the MDC of the EnrRequest just sent. EnrRequestMsg.Hash is now also set by the serializer on send. - KademliaNodeSource: unsubscribe Handler and complete the channel writer before awaiting the producer task in the finally block. - DoubleEndedLru/KBucket: GetAll / GetAllWithHash now materialize arrays under the inner lock rather than returning the live LinkedList. Nits: - Rename LookupFindNeighbourHardTimout -> LookupFindNeighbourHardTimeout. - Rename KademliaNodeSource ctor param lookup2 -> lookup. - Replace LINQ chains on KBucketTree hot paths (GetAllAtDistance, GetKNearestNeighbour, SplitBucket) with manual loops. - Hash256XorUtils.MaxDistance -> public const int. - Drop two stale comments and fix `does not seems` typo. Tests: - KBucketTreeTests: split preserves LRU order; GetAllAtDistance covers the deeper-bucket branch. - LookupKNearestNeighbourTests: mid-flight cancellation across Alpha=1/3. - NodeHealthTrackerTests: TryRefresh removes node on ping timeout, keeps it on success. - KademliaDiscv4AdapterTests: SendEnrRequest now wires request hash; added rejection test for unsolicited / wrong-keccak ENR responses. - DiscoveryPersistenceManagerTests: skip nodes that fail Node ctor. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Conflicts: # src/Nethermind/Nethermind.Xdc/XdcModule.cs
- Extract IdentityNodeHashProvider into a shared file so KBucketTreeTests and LookupKNearestNeighbourTests stop redefining it. - KBucketTreeTests: pull common KBucketTree construction into CreateTree and Add helpers; replace duplicated Assert.That index lookups with single Is.EqualTo array comparisons. - LookupKNearestNeighbourTests: extract CreateLookup factory and named seed/neighbour hash constants so the two test methods only differ in what they're actually asserting. - NodeHealthTrackerTests: extract a CreateTracker helper (and Self/Remote /Stale constants) so each test body only sets the bits unique to its scenario; promote StringNodeHashProvider to a singleton. - DiscoveryPersistenceManagerTests: shared NodeA / NodeB fixtures plus a CreateManager helper and a PingReceived assertion helper; drop the per-test ad-hoc NetworkNode arrays. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ature/kademlia-discv4
|
That.... is probably a debug leftover |

INodeSourcefrom discv4 now actively iterate the kad network for random target instead of randomly find nodes on another thread and waiting for new nodes event to send toINodeSource.IAsyncEnumerableto block further request when peer manager is full.PeerManager.Changes
How to review.
DiscoveryApp, followDiscV4KademliaModule, and thenKademliaModule.KademliaDiscv4Adapter.Types of changes
What types of changes does your code introduce?
Testing
Requires testing
If yes, did you write tests?
Notes on testing