Flaky unit tests under full-suite CI load: validator DuplicateOutpoint + netsync ChunkFailureCancelsSiblings #1024

## Summary

Two unit tests fail intermittently in the `test` CI job under full-suite load, but pass deterministically in isolation (including `-race -count=20` locally). Both assert on batch/iteration counts that appear sensitive to batcher flush timing, which changed recently in #1017 (per-batcher fixed-cadence flushing / `SetTickInterval`, commit `bbb70638b`).

These are **flaky**, not deterministically broken — a re-run of the same commit passed green.

## Failing tests

1. `services/validator` — `TestValidateTransactionBatch_DuplicateOutpointCreatesConflicting`
   `Validator_test.go:395`: `Not equal: expected: 36, actual: 4`

2. `services/legacy/netsync` — `TestSyncManager_createUtxos_ChunkFailureCancelsSiblings`
   `handle_block_test.go:1384`: `"4" is not less than or equal to "1"` —
   "mergeCtx short-circuit should suppress sibling iterations after a chunk fails; observed 4 post-trigger call(s)."

## Where observed

CI `test` job on PR #1023 — run [26839805772](https://github.com/bsv-blockchain/teranode/actions/runs/26839805772) (`DONE 10305 tests, 45 skipped, 2 failures`). PR #1023 does not touch `services/validator` or `services/legacy/netsync`, and a re-run of the identical commit passed — so the failure is not attributable to that PR.

## Reproduction attempts (local)

Both pass in isolation, single run and stressed:

```
go test ./services/validator/ -run '^TestValidateTransactionBatch_DuplicateOutpointCreatesConflicting$' -count=20        # ok
go test ./services/legacy/netsync/ -run '^TestSyncManager_createUtxos_ChunkFailureCancelsSiblings$' -count=20 -race       # ok
```

So far the flake has surfaced only under the CI runner's concurrent full-suite load, which is consistent with timing/scheduling sensitivity rather than a logic bug.

## Suspected cause

Both assertions count emitted/observed items:
- validator expects 36 conflicting registrations but sees 4. This looks like a batch that flushed early, so fewer items were grouped and most conflicts were never observed together.
- netsync expects ≤1 post-trigger sibling iteration but sees 4. This looks like the short-circuit racing an in-flight batch.

#1017 changed batcher flushing to a fixed cadence (`SetTickInterval`). A timing-driven flush boundary would plausibly change how many items land per batch under load, perturbing both count assertions. Worth confirming whether these tests pin the batcher tick / use a deterministic flush trigger rather than relying on wall-clock cadence.

## Suggested fix direction

Make the two tests deterministic with respect to batch flushing: drive flushes explicitly (batch size 1, manual flush, or an injected clock) rather than relying on the timer cadence, so the results do not vary with CI load. This is test flakiness, not a release blocker.
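One possible shape for the "injected clock" option is sketched below. It assumes a batcher whose flush trigger is a channel the test can own; `tickBatcher`, `Put`, and `batchSizeWithManualFlush` are hypothetical names for illustration and do not reflect the actual batcher API or what `SetTickInterval` exposes.

```go
package main

import "fmt"

// tickBatcher is a hypothetical batcher that flushes whenever tick fires.
// Production code would feed tick from a time.Ticker; a test sends on it directly.
type tickBatcher struct {
	items   chan int
	tick    <-chan struct{}
	flushed chan []int
}

func newTickBatcher(tick <-chan struct{}) *tickBatcher {
	b := &tickBatcher{
		items:   make(chan int), // unbuffered: Put returns only after run has the item
		tick:    tick,
		flushed: make(chan []int),
	}
	go b.run()
	return b
}

// run accumulates items and emits the pending batch on every tick.
func (b *tickBatcher) run() {
	var pending []int
	for {
		select {
		case it, ok := <-b.items:
			if !ok {
				close(b.flushed)
				return
			}
			pending = append(pending, it)
		case <-b.tick:
			b.flushed <- pending
			pending = nil
		}
	}
}

// Put hands one item to the batcher.
func (b *tickBatcher) Put(v int) { b.items <- v }

// batchSizeWithManualFlush enqueues n items, triggers exactly one flush,
// and returns the size of the resulting batch.
func batchSizeWithManualFlush(n int) int {
	tick := make(chan struct{})
	b := newTickBatcher(tick)
	for i := 0; i < n; i++ {
		b.Put(i)
	}
	tick <- struct{}{} // explicit flush: no wall-clock involved
	batch := <-b.flushed
	close(b.items) // lets the run goroutine exit
	return len(batch)
}

func main() {
	fmt.Println(batchSizeWithManualFlush(36)) // 36
}
```

Because both channels are unbuffered, the flush cannot be received until every earlier `Put` has been appended, so the batch size is independent of scheduler load. A real fix would need to check whether the existing batcher allows this kind of injection or needs a small test-only hook.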

