test(e2e): replace hardcoded time.Sleep with condition-polling in gated tests (#998)#1010
Conversation
|
🤖 Claude Code Review Status: Complete This PR correctly replaces fixed No issues found. The PR description accurately documents changes, verification steps, and out-of-scope items. The intentional sleeps (ban TTL expiration, inter-block timestamp pacing) are correctly preserved with explanatory comments. |
Benchmark Comparison ReportBaseline: Current: Summary
All benchmark results (sec/op)
Threshold: >10% with p < 0.05 | Generated: 2026-06-02 06:49 UTC |
ordishs
left a comment
There was a problem hiding this comment.
Approve. Clean, test-only change replacing sleep-then-assert anti-patterns with poll-until-condition.
Verified:
go vetclean on both changed packages; no unusedtimeimports introduced.WaitForPruneris a genuine completion barrier here — both pruner tests setEnablePruner: true, which registers the observer, so the nil-observer no-op path can't be silently hit.require.Eventuallyconversions are sound (captured-error checked in predicate, value preserved for later assertions).
Good judgment on what to leave alone (real ban-TTL / timestamp-pacing sleeps) and honest scoping of skipped/wip tests.
Minor (non-blocking) follow-ups:
block_persisterlost the per-block height/hash diagnostic logging on failure — consider restoring it so a timeout shows which blocks lagged.- Port the
:204'waits for a prune cycle, not provably the child1-DAH cycle' caveat into a code comment, not just the PR description.
|



What
Replace hardcoded
time.Sleep"sleep-then-assert" waits with poll-until-condition (or delete redundant sleeps) across the running gated e2e tests. Cuts serial wall-clock (the gated suites run-parallel 1) and removes a flakiness source. Test-only — no production code, CI, or Makefile changes.Closes #998.
Changes
daemon.NewTestDaemonalready blocks on readiness before returning (readyCh30s +WaitForHealthLiveness10s + FSMRUNNING); the first client/RPC call self-errors if not ready. Kept thebanDuration + 1sban-TTL wait — genuine wall-clock, polling can't shorten it.require.EventuallypollingGetTransactionHashesforlen == 2.TestDeleteAtHeightHappyPath) — deleted a 10s sleep that guarded commented-out assertions (//require.Error,// require.Nil).WaitForTransactionInBlockAssembly(already polls).require.EventuallypollingGetBlocksNotPersistedempty (15s/500ms).:204→WaitForPruner(real completion barrier:EnablePruner: trueregisters the observer);:362→WaitForTransactionInBlockAssembly.Verified
go build+go vetclean acrosstest/e2e/daemon/ready/...andtest/sequentialtest/block_assembly/...;gofmtclean on all 7 files.-race, see note): banlist (3×), multi_node_send_2_tx, utxo, block_assembly_restart, block_persister (both), pruner_parent (both) — all PASS. Per-file wall-clock dropped: banlist ~27s/run, multi_node ~9s, utxo ~10s, block_assembly_restart ~6s.legacy-syncgate.Out of scope (left intentionally — no benefit)
t.Skip'd tests (TestDynamicSubtreeSize,TestTracing,TestSendTxDeleteParentResendTx,TestTransactionPurgeAndSyncConflicting). They never run, so converting saves zero wall-clock and can't be run-verified. The issue's ~99s estimate counted ~17s here that doesn't execute.Notes
-racecurrently fails on a pre-existing libp2p NAT data race (go-libp2pnatpmp.randomPort), unrelated to this change — fails identically before and after on these commits.pruner_parent_before_childhas a latentWaitForBlockPersistedtimeout under container contention (~line 390), pre-existing (akin to Flaky: services/blockassembly system tests (FSM-startup / gRPC connection races under CI load) #997).WaitForPrunerat:204waits for a completed prune cycle, not provably the child1-DAH-height cycle — same property as the sleep it replaced. The:204assertions only check transactions that must survive, so an early return can't mask a false negative; the same helper backs the stable:392"parent was pruned" assertion.