libnetwork/networkdb: add property-based tests#50345
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR introduces property-based testing to the NetworkDB module and replaces the previous random-sample implementation with a uniform approach using math/rand/v2.
- Vendor updates: add
pgregory.net/rapidandgithub.com/montanaflynn/stats - Test enhancements: add
networkdb_property_test.go, augment existing helpers, and addTestMRandomNodes - Core changes: seed and store a
rand.RandinNetworkDB, removerandomOffset, and rewritemRandomNodes; add a debugging helperDebugDumpTable
Reviewed Changes
Copilot reviewed 8 out of 66 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| vendor.sum | Added checksums for new testing and stats libraries |
| vendor.mod | Required rapid and stats for property tests |
| libnetwork/networkdb/networkdb_test.go | Refactored helpers to use TestingT interface and dynamic timeout |
| libnetwork/networkdb/networkdb_property_test.go | New slow-test for convergence using rapid state machine |
| libnetwork/networkdb/networkdb.go | Initialized per-instance RNG and mutex |
| libnetwork/networkdb/debug.go | Added DebugDumpTable for introspection |
| libnetwork/networkdb/cluster_test.go | Added TestMRandomNodes to validate unbiased sampling |
| libnetwork/networkdb/cluster.go | Removed randomOffset, integrated rng for node selection |
Comments suppressed due to low confidence (3)
libnetwork/networkdb/networkdb_property_test.go:37
- Cannot range over integer
numNetworks; replace with a traditional loop such asfor i := 0; i < numNetworks; i++.
for i := range numNetworks {
libnetwork/networkdb/cluster_test.go:46
minis undefined; define a helper functionmin(a, b int) intor use an inline comparison to compute the minimum of two ints.
if len(sample) != min(m, len(nodes)-1) {
libnetwork/networkdb/cluster.go:9
- The imported alias
rndis unused in this file; remove the import to clean up dependencies.
rnd "math/rand"
77f33d3 to
e94053c
Compare
| google.golang.org/grpc v1.72.2 | ||
| google.golang.org/protobuf v1.36.6 | ||
| gotest.tools/v3 v3.5.2 | ||
| pgregory.net/rapid v1.2.0 |
There was a problem hiding this comment.
For others; this module has no dependencies, so doesn't impact our dependency-tree, other than what's used here; https://github.com/flyingmutant/rapid/blob/v1.2.0/go.mod
| github.com/moby/sys/user v0.4.0 | ||
| github.com/moby/sys/userns v0.1.0 | ||
| github.com/moby/term v0.5.2 | ||
| github.com/montanaflynn/stats v0.7.1 |
There was a problem hiding this comment.
Same for this one; https://github.com/montanaflynn/stats/blob/v0.7.1/go.mod#L1-L3
A well tested and comprehensive Golang statistics library package with no dependencies.
|
We need to look at this failure; wondering if there's a missing check whether the image was built successfully or smthing |
robmry
left a comment
There was a problem hiding this comment.
Nice!
If the test in the second commit fails without the changes in the third commit - they should probably be swapped?
The convergence test was a bit of a slog to get through, while trying to understand the testing package at the same time. Perhaps just me, but a few more comments probably wouldn't go amiss.
My rationale for committing the test first was to make it more straightforward to test both the old and new implementations. The second commit justifies the existence for the third commit. It lets the test itself be tested, in the vein of TDD: write the test, verify it fails, fix the implementation, verify the test passes. I think the benefits outweigh the downside of maybe having to skip over the commit when bisecting, particularly since when bisecting you generally will be picking a targeted subset of test cases to run, not entire suites. |
I think the way to deal with that is for the test to "xfail" (maybe skip instead of fail) when it's added. Then the commit that fixes the issue should also change the expected test result. That deals with the bisecting issue you mention - and makes it very clear what the fix fixes. (I don't know which part of the test fails at the moment.) |
I'll see what I can do. Go's testing package does not have any facility to flag a test as expected-to-fail, and even if it did it would not be appropriate. The test will sometimes pass with the bad implementation, because randomness and the implementation is just of low quality, not completely unfit for purpose. I could call
I documented that in the commit message. |
You documented that it can fail, but not where ... I doubt test
It's probably not worthwhile, that's why I suggested just swapping the commits. Not a blocker I guess, just not really good practice and it had a simple fix. For my comment about more comments in the convergence test - I was thinking of things like this (just to save a bit of head-scratching for anyone not familiar with the package, which is probably most people for now - and, as convergence is checked after the calls rather than as the "state machine" is updated, it's probably a bit unconventional even for those who are familiar) ... (I meant to ask about method |
e94053c to
256a1f6
Compare
I thought it would be self-evident from my description of the failure mode that it's the assertion of the sample size that failed. moby/libnetwork/networkdb/cluster_test.go Lines 49 to 51 in 256a1f6
It's to satisfy the rapid.StateMachine interface. I have added a comment to that effect, as well. |
256a1f6 to
7c0ff04
Compare
Add a property-based test which asserts that a cluster of NetworkDB nodes always eventually converges to a consistent state. As this test takes a long time to run it is build-tagged to be excluded from CI. Signed-off-by: Cory Snider <csnider@mirantis.com>
TestNetworkDBAlwaysConverges will occasionally find a failure where one entry is missing on one node even after waiting a full five minutes. One possible explanation is that the selection of nodes to gossip with is biased in some way. Test that the mRandomNodes function picks a uniformly distributed sample of node IDs of sufficient length. The new test reveals that mRandomNodes may sometimes pick out a sample of fewer than m nodes even when the number of nodes to pick from (excluding the local node) is >= m. Put the test behind an xfail tag so it is opt-in to run, without interfering with CI or bisecting. Signed-off-by: Cory Snider <csnider@mirantis.com>
The property test for the mRandomNodes function revealed that it may sometimes pick out a sample of fewer than m nodes even when the number of nodes to pick from (excluding the local node) is >= m. Rewrite it using a random shuffle or permutation so that it always picks a uniformly-distributed sample of the requested size whenever the population is large enough. Signed-off-by: Cory Snider <csnider@mirantis.com>
7c0ff04 to
ac5f464
Compare
vvoland
left a comment
There was a problem hiding this comment.
LGTM, I really like the property-based testing and it's always nice to have some 👍🏻
- What I did
I added property-based tests to NetworkDB to find any lingering issues before users encounter them. Testing revealed that the algorithm used to select a random sample of nodes to gossip with does not uphold the expected properties. I replaced the random-sample implementation with one that passes the tests.
- How I did it
I selected pgregory.net/rapid as the property-based testing library to adopt. I wrote a test which asserts that a cluster of NetworkDB instances always converges to the expected state after a series of joins, leaves and table updates. I also wrote a test for the
mRandomNodesfunction to try to narrow down the source of failures with the convergence test. I rewrote its implementation, leveraging themath/rand/v2package for unbiased uniform random shuffle, permutation and value-in-range algorithms.As the convergence test is slow, I build-tagged it so it does not run in CI. The
mRandomNodestest only takes a couple seconds to complete the default number of iterations so is left untagged.- How to verify it
go test -tags slowtests- Human readable description for the release notes
- A picture of a cute animal (not mandatory but encouraged)