test: deterministic fixes for two flaky unit tests (#1024)#1026
Conversation
|
🤖 Claude Code Review Status: Complete Current Review: Summary:
|
Benchmark Comparison ReportBaseline: Current: Summary
All benchmark results (sec/op)
Threshold: >10% with p < 0.05 | Generated: 2026-06-03 07:13 UTC |
ordishs
left a comment
There was a problem hiding this comment.
Approving — test-only change, well-reasoned, and verified independently:
go test -race -count=10on both tests: green- Mutation check (remove the
mergeCtx.Err()short-circuit): netsync test correctly fails withexpected 0, got 1, confirming it guards the regression - Diff scope is test-only; root causes (
Validator.go:723collapsing spend errors intoERR_PROCESSING, andsqlitememorylackingbusy_timeout/bounded pool) are appropriately flagged out of scope — please file those as follow-ups.
One thing to fix before merge — linting:
golangci-lint fails on the doc comment:
services/legacy/netsync/handle_block_test.go:1333:1: File is not properly formatted (gci)
The space-indented continuation lines in the Short-circuit intact: / Short-circuit removed: comment block (~lines 1330-1336) get reinterpreted by gofmt as code blocks. gofmt -w (or gofumpt) on that file resolves it — easiest fix is to let the formatter rewrite those lines, or de-indent the continuations so they read as plain prose. Once gofmt -l reports clean, CI lint will pass.
|



Closes #1024.
Two unit tests flaked under full-suite CI load; both pass in isolation. Test-only fix — no product code changes (
git difftouches only*_test.go).Root causes
services/validator—TestValidateTransactionBatch_DuplicateOutpointCreatesConflicting(expected 36, got 4)ValidateTransactionBatchvalidates the two txs concurrently; both spend the same outpoint through one sharedsqlitememorystore. ThesqlitememoryDSN (util/sql.go:218) sets nobusy_timeoutand leaves the pool unbounded, so under CI CPU starvation concurrent writes hit"database is locked"; the spend batch's 3 short retries exhaust →StorageError, and the validator collapses any non-UTXO spend error intoERR_PROCESSING(Validator.go:723) instead of the conflict path (ERR_TX_CONFLICTING). The#1017batcher attribution in the issue is at most a contributing timing factor — the root vulnerability predates it.services/legacy/netsync—TestSyncManager_createUtxos_ChunkFailureCancelsSiblings(4 not <= 1)Racy by construction, unrelated to
#1017(no batcher on this path).errgroupcancelsmergeCtxonly after the failing worker returns, so the surviving worker keeps passing its loop-topmergeCtx.Err()check and draining iterations until cancellation propagates. The product short-circuit is correct; the test's "≤1 in flight" assumption is not.Fixes (test-only)
utxoStore.RawDB().SetMaxOpenConns(1)— serialise the dev-mode SQLite pool so concurrent batch validations can't surface a transient lock. The concurrent batch path is still exercised; only DB access serialises. Production uses Aerospike.SetMinedMulticancels the parent ctx and returns success; because one worker runs sequentially, iteration 2's loop-top check always observes the cancel. Asserts zero post-trigger calls. Removing the short-circuit fails the count assertion.Out of scope (flagged follow-ups, not bundled)
Validator.go:723collapses transient/storage spend errors intoERR_PROCESSING, discarding conflict semantics — worth a separate issue.sqlitememoryhas nobusy_timeoutand an unbounded pool; other concurrent SQL-backed tests share the same latent flakiness.Test plan
go test -race -count=50each test individually — passmergeCtx.Err()short-circuit fails the test at the count assertion (expected 0, got 1)-race -count=80 -cpu=8(got 4); fix holds 360 iterationsgo test -race ./services/legacy/netsync/ ./services/validator/— both greengo vetclean