legacy: createUtxos calls SetMinedMulti with unbounded slice — stalls aerospike on fat blocks (regression from #854)

## Summary

`services/legacy/netsync/handle_block.go:createUtxos` calls `SetMinedMulti` with one unbounded slice containing every pre-existing tx in the block. On fat blocks (mainnet 755,880: 2.87M txs, almost all pre-existing via propagation) this becomes a monolithic 2.87M-record aerospike `BatchOperate` that exhausts the client connection pool, hits `MAX_RETRIES_EXCEEDED` with `NETWORK_ERROR` / `connection reset by peer`, and stalls sync in an infinite retry loop.

Regression introduced by #854 (`fix(legacy): merge blockID into pre-existing tx in createUtxos`, merged 2026-05-13).

## Reproduction

Mainnet sync past block 755,880 (2.3 GB, 2,867,288 txs) on a single-node aerospike `.docker.m` deployment.

Observed on `bsva-ovh-teranode-eu-3` running `v0.15.2-beta-1` (commit `f44080f06`, includes #929 arena fix).

Log signature, repeating every ~30s:

```
ERROR | netsync/handle_block.go:698 | legacy| [HandleBlockDirect][000000000000000002c365bec2f13cb3ba6334ebee5a0325201464e67b7fcecc 755880] 2867288 txs, peer ... DONE in 34.5s with error: PROCESSING (4): failed to merge blockID into 2867287 pre-existing txs -> STORAGE_ERROR (69): aerospike BatchOperate error -> UNKNOWN (0): ResultCode: MAX_RETRIES_EXCEEDED, Iteration: 5, InDoubt: true, Node: A1 172.18.0.5:3000: command execution timed out on client: Exceeded number of retries.
  ResultCode: NETWORK_ERROR, ...
  write tcp 172.18.0.10:53374->172.18.0.5:3000: write: connection reset by peer
```

Block download + decode succeeds (34s). The `SetMinedMulti` merge step is where it dies. After the failure, sync resets to 755,879, downloads the block again, and repeats.

## Root cause

`services/legacy/netsync/handle_block.go:643-700` (post-#854):

```go
var (
    existingTxsMu    sync.Mutex
    existingTxHashes []*chainhash.Hash
)

// create all the utxos first
for _, txHash := range txMap.Keys() {           // iterates every tx in the block
    g.Go(func() error {
        if _, err := sm.utxoStore.Create(...); err != nil {
            if errors.Is(err, errors.ErrTxExists) {
                existingTxsMu.Lock()
                existingTxHashes = append(existingTxHashes, &txHash)   // accumulates
                existingTxsMu.Unlock()
                return nil
            }
            ...
        }
        ...
    })
}
g.Wait()

if len(existingTxHashes) > 0 {
    if _, err = sm.utxoStore.SetMinedMulti(ctx, existingTxHashes, utxo.MinedBlockInfo{...}); err != nil {   // monolithic call
        return errors.NewProcessingError("failed to merge blockID into %d pre-existing txs", len(existingTxHashes), err)
    }
}
```

`SetMinedMulti` itself (`stores/utxo/aerospike/set_mined.go:158`) does not chunk — it submits `len(hashes)` records in a single `executeBatchOperation`. The aerospike client splits internally at its default `BatchSize=5000`, producing ~574 sub-requests for 2.87M entries. With `ConnectionQueueSize=16` (current `utxostore.docker.m` URL setting) and `LimitConnectionsToQueueSize=true`, the connection pool saturates and sub-requests time out / reset → whole BatchOperate fails after 5 retries.
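
The arithmetic behind the saturation can be checked in a few lines of Go. This is a back-of-the-envelope sketch, not client code: `subRequests` is a hypothetical helper, and the constants are simply the figures quoted above (5000-record client split, 16 pooled connections, 2,867,287 pre-existing txs from the log).

```go
package main

import "fmt"

// subRequests returns how many sub-requests a batch of `records` entries
// splits into when each sub-request carries at most `batchSize` records
// (ceiling division). Hypothetical helper, for illustration only.
func subRequests(records, batchSize int) int {
	return (records + batchSize - 1) / batchSize
}

func main() {
	const (
		preExisting = 2_867_287 // pre-existing txs in block 755,880 (from the log)
		clientSplit = 5000      // client-side split size quoted above
		connections = 16        // ConnectionQueueSize on utxostore.docker.m
	)

	n := subRequests(preExisting, clientSplit)
	fmt.Printf("sub-requests: %d\n", n)
	fmt.Printf("sub-requests per pooled connection: %.1f\n", float64(n)/connections)
}
```

With the numbers above this reports 574 sub-requests, roughly 36 per pooled connection, all belonging to a single logical `BatchOperate` that succeeds or fails as a unit.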

## Why the #854 reference pattern works in its original site

#854 mirrored `services/blockvalidation/quick_validate.go:1090-1160` `createAndSpendUTXOsForBatch`. That function is invoked **per-batch** (`batch *SubtreeProcessingBatch`), so `existingTxHashes` is naturally bounded by batch size — typically thousands at most, not millions. The legacy implementation re-uses the same `SetMinedMulti` call but lost the per-batch invocation boundary.

`stores/utxo/aerospike/longest_chain.go:51-53` already demonstrates the chunked pattern for the closely related `MarkTransactionsOnLongestChain` flow:

```go
batchSize := s.settings.UtxoStore.MaxMinedBatchSize             // 1024
numChunks := (len(txHashes) + batchSize - 1) / batchSize
numWorkers := min(s.settings.UtxoStore.MaxMinedRoutines, numChunks)   // 8 on docker.m
```

`createUtxos` should adopt this.

## Pre-#854 behaviour for context

Pre-v0.15 the async `setTxMinedStatus` → `SetMinedMulti` path handled this merge after block accept. PR #711 added a `quickValidation` fast path that skipped that step; PR #854 reinstated the merge but moved it into the synchronous critical-path `createUtxos` without chunking. So this hot path went from async-and-tolerated to synchronous-and-unbounded in one step.

## Proposed fix

Caller-side chunk in `createUtxos`. Use existing settings (`UtxoStore.MaxMinedBatchSize`, `UtxoStore.MaxMinedRoutines`). Roughly:

```go
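if len(existingTxHashes) > 0 {
    // Reuse the limits that longest_chain.go already applies to mined-status updates.
    batchSize := sm.settings.UtxoStore.MaxMinedBatchSize
    numWorkers := sm.settings.UtxoStore.MaxMinedRoutines

    if batchSize <= 0 {
        // A zero or negative setting would make the loop below spin forever.
        return errors.NewProcessingError("invalid UtxoStore.MaxMinedBatchSize %d", batchSize)
    }

    g, gCtx := errgroup.WithContext(ctx)
    util.SafeSetLimit(g, numWorkers)

    for i := 0; i < len(existingTxHashes); i += batchSize {
        // Each SetMinedMulti call now carries at most batchSize records.
        chunk := existingTxHashes[i:min(i+batchSize, len(existingTxHashes))]
        g.Go(func() error {
            _, err := sm.utxoStore.SetMinedMulti(gCtx, chunk, utxo.MinedBlockInfo{...})
            return err
        })
    }
    if err := g.Wait(); err != nil {
        return errors.NewProcessingError("failed to merge blockID into %d pre-existing txs", len(existingTxHashes), err)
    }
}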
```

Alternative: push the chunking into `SetMinedMulti` itself, which fixes every caller (there are at least two: this one and the per-batch one in `quick_validate.go`). The trade-off is small: the per-batch caller doesn't need the chunking, but wouldn't be harmed by it either.

## Affected hosts

- `bsva-ovh-teranode-eu-3` (mainnet sync, currently stuck on 755,880)

## Captured artifacts (local, available on request)

- Heap-raw + goroutines + allocs profile during the stall:
  - 942 goroutines, 916 aerospike-related
  - 70 TiB alloc churn over 2-day uptime (driven by retries)
  - 50% of alloc churn = `go-bt.Output.appendTo` + `Tx.toBytesHelper` (mitigated by #929 in v0.15.2)

## Workarounds while waiting for the fix

- Increase `ConnectionQueueSize` in `utxostore.docker.m` URL from 16 to 64+
- Add `BatchSize=1024` and `SocketTimeout=120s` to `aerospike_batchPolicy`
- Server-side: set `proto-fd-max 30000` and explicit `service-threads` in `config/aerospike.conf`

None of these eliminate the underlying monolithic batch; they just buy headroom.

## Related

- #854 — introduced this code path
- #711 — added the quickValidation fast path that bypassed the original async merge
- #929 — already-merged go-bt arena fix (eliminated ~50% of allocation churn on retries; helps recovery but not the stall)
- #920 — original go-bt heap issue (closed by #929)

