fix(utxo): verify spend ownership in Unspend#766
Conversation
|
🤖 Claude Code Review Status: Complete SummaryThis PR implements a critical security fix for UTXO spend ownership verification in the Unspend operation. The implementation prevents a scenario where stale or incorrect spend records could silently wipe legitimate spends, potentially leading to UTXO corruption or double-spend vulnerabilities. Key Changes:
Security Impact: High-severity fix — closes #4562 Issues Found[Minor] Unused Error Type Notes
|
Benchmark Comparison ReportBaseline: Current: Summary
All benchmark results (sec/op)
Threshold: >10% with p < 0.05 | Generated: 2026-05-15 10:42 UTC |
…ests (refs #4562)
SQL Unspend collapsed 'row missing', 'UTXO unspent', and 'ownership mismatch' into a single ProcessingError, while Aerospike's Lua already distinguished the three. Callers couldn't tell a genuinely- unknown output from a mismatched spend. - SQL: a diagnostic SELECT after a 0-row UPDATE dispatches the three cases into NotFoundError / 'UTXO is unspent' / 'ownership mismatch'. - Aerospike: map LuaErrorCodeTxNotFound to NotFoundError so both backends return the same Go error type for missing records. Adds TestUnspendOwnership_NotFound on both backends; tightens the soft 'might not error' assertion in TestSpendAndUnspendEdgeCases and the message check in TestUnspendOwnership_MismatchOnUnspent.
…562) Unspend's "row exists, spending_data is nil" branch is functionally idempotent — the caller wants the row cleared and it already is. Returning a generic ProcessingError amplified a successful-but-response- lost Unspend into a hard failure inside validator.reverseSpends' 3-retry loop. - New typed ErrUtxoUnspent (errors/error.proto code 76) so callers detect this branch via errors.Is without parsing messages. - Aerospike: split Lua dispatch into ERROR_CODE_SPEND_OWNERSHIP_UNSPENT and _MISMATCH, mapped to ErrUtxoUnspent and ProcessingError in Go. - SQL: diagnostic-SELECT case-2 returns ErrUtxoUnspent. - validator.reverseSpends short-circuits on errors.Is(err, ErrUtxoUnspent) — exactly-once behaviour covered by new TestValidator_ReverseSpends_UtxoUnspentIsIdempotent. - Aerospike TestUnspendOwnership_MismatchOnUnspent fills the parity gap with the SQL test for that branch. - qDiag race window documented in sql.go — best-effort message; safety guarantee lives in q1's AND spending_data = $3 condition.
The Lua script (teranode.lua) changed in this PR — both the ownership check in unspend and the split into distinct SPEND_OWNERSHIP_UNSPENT / _MISMATCH error codes. Bump the package identifier so Aerospike re- registers the UDF on next startup instead of running the previously cached v57 bytecode.
oskarszoon
left a comment
There was a problem hiding this comment.
Approve. Caller audit is complete — all 9 production callers populate SpendingData via GetSpends / SetConflicting / spendsForTx, the SQL and Lua backends have matching three-branch diagnostic semantics (NotFound / unspent / ownership-mismatch), and the validator's reverseSpends correctly treats ErrUtxoUnspent as idempotent-success for retry safety. DoS surface is net-positive (visible failures replace silent corruption). No new state-changing path was added.
Sonar quality gate now passes (97.5% new-code coverage, 0 hotspots) and the previously-failing sequential-postgres / sequential-aerospike checks are re-running — assuming they go green, this is mergeable as-is.
One optional follow-up (non-gating): the SQL q1+qDiag race window is acknowledged inline and the safety guarantee correctly comes from q1's AND spending_data = $3 atomicity — worst case is a stale error message, not a wrong action. Could later be tightened by combining into a single SELECT spending_data FROM outputs ... FOR UPDATE + branch. Optimisation, not correctness.
ProcessConflicting builds affectedParentSpends with the loser's SpendingData, but the parent's stored spending_data is often nil (loser never actually spent — winner did) or belongs to the winner (post-reorg). The strict ownership check from the previous commits erred on these cases, breaking sequential-postgres and sequential-aerospike conflict-resolution tests (double-spend forks, grandparent conflicts, deep chain conflicts). The safety property is "never wipe a spend we don't own", not "error on every no-op". This change keeps the match-on-spending_data guard but makes the non-matching paths a silent success — the stored value is unchanged either way. - Aerospike Lua: gate the utxo clear on callerOwnsSpend; return STATUS_OK without bin mutation when stored is nil or differs. setDeleteAtHeight + aerospike:update still run for housekeeping. - SQL: q1's AND spending_data = $3 still gates the clear. On 0 rows, look up transaction_id directly so the locked + DAH housekeeping still runs. Only a missing row is NotFoundError. - Bump LuaPackage v58 → v59 (Lua content changed). - Drop now-dead validator.reverseSpends ErrUtxoUnspent short-circuit — Unspend never returns that error anymore. - Drop the now-unused LuaErrorCodeSpendOwnership* constants and the matching ERROR_CODE / ERR_ locals in the Lua module. - Rewrite TestUnspendOwnership_Mismatch and _MismatchOnUnspent on both backends to assert no error + stored data unchanged. Remove the now-redundant TestValidator_ReverseSpends_UtxoUnspentIsIdempotent. ErrUtxoUnspent / NewUtxoUnspentError / proto code UTXO_UNSPENT=76 are kept as unused-but-available API; no caller currently emits them.
|
Brings in fff6d3e..origin/main (4 commits since previous merge): - #897 blockassembly columnar->row fallback on Unimplemented - #896 asset on-demand subtree-data admission control + 503/Retry-After - #766 utxo: verify spend ownership in Unspend - #700 health: disk space check Conflict resolutions: - services/blockassembly/Client.go, util/http.go, util/http_test.go: take theirs. Main has the post-Copilot-fix versions of these files from #896 and #897 (parseRetryAfter strict-Atoi, debug log on columnar fallback, extended TestParseRetryAfter cases) — they supersede the pre-fix versions still living on pr-828. - stores/utxo/aerospike/un_spend.go: take ours (native-op call, no SpendingData arg). The ownership check #766 added to the UDF path requires a coordinated update to the BSV-forked aerospike- server's subOpUnspend dispatcher. Until that lands, the native-op path on this branch skips ownership verification. Tracking issue filed.
Two related items from the external security review:
1. The aerospike_use_native_teranode_ops longdesc previously said "Safe
to enable on mixed clusters: capability probe falls back to UDF when
the fork is absent". That's misleading — the probe writes a single
key whose digest pins it to one partition, so a mixed-version
cluster can produce false-positive (probe lands on a BSV-fork
master, runtime writes to partitions still on stock Aerospike fail
with PARAMETER_ERROR) or false-negative (cached for Store lifetime
even after the cluster finishes upgrading). Rewrite the rolling-
deploy section to be explicit, drop the "Safe to enable" claim, and
tell operators to keep the setting false until the rolling upgrade
completes.
2. Probe key was a fixed string ("_teranode-native-op-probe") shared
across every teranode pointing at the same cluster. Two instances
booting concurrently could interleave PutBins / Operate / Get
sequences and observe each other's writes, producing a false-
negative for one of them. Add PID + time.Now().UnixNano() to the
key so every process gets its own; the 60s TTL still bounds the
record's residency if Delete fails.
Follow-ups still tracked in the PR body: per-call PARAMETER_ERROR ->
UDF fallback in teranodeBatchRecord (separate PR), and the un_spend.go
SpendingData ownership check landed for the UDF path in #766 but is
not yet wired into subOpUnspend on the native-op side — issue #899.
…ranch The merge from main (#766: verify spend ownership in Unspend) needed two catch-up changes on this branch's native-op-migrated unspendLua: 1. executeTeranodeOp now receives spend.SpendingData.Bytes() as a third arg + a nil-guard. The UDF fallback path (useNativeTeranodeOps=false, default) forwards the arg verbatim into client.Execute, so the ownership check the Lua "unspend" added in #766 works end-to-end. The native operate-path forwards the arg over the wire too, but the BSV-forked server's subOpUnspend dispatcher must be updated before the check is actually applied server-side — tracked in #899. 2. unspendLua's response-handling branch for LuaStatusError now distinguishes res.ErrorCode == LuaErrorCodeTxNotFound and returns errors.NewNotFoundError. Without this branch, #766's TestUnspendOwnership_NotFound saw the Lua-side "TX not found" response wrapped as StorageError. The merge conflict landed at the call site only; this response-handling change from main slipped past.
The native subOpUnspend=3 dispatcher in the BSV fork of aerospike-server now enforces the #766 SpendingData ownership check. Update the stale un_spend.go comment that said the dispatcher still needed the fix, document the minimum server-build requirement in the UseNativeTeranodeOps longdesc, and drop the private-repo reference from the native_op.go header comment.



Closes #4562. Also closes #4589 (the audit explicitly cross-references — same SQL
AND spending_data = $3fix).Summary
Both Aerospike and SQL
Unspendclearspending_databased only on(TxID, Vout, UTXOHash)— without verifying the caller's expected spending data matches what's stored. If any caller (reorg, conflict resolution, batcher) passes a wrong/stale Spend record, the legitimate spend is silently wiped, opening a path to UTXO corruption / double-spend.Fix
stores/utxo/sql/sql.go): extend theq1UPDATE withAND spending_data = $3. When 0 rows are updated, run a diagnosticSELECT spending_datato distinguish three cases (matches the three branches instores/utxo/aerospike/teranode.lua::unspend):NotFoundError("output %s:%d not found").spending_data IS NULL(UTXO already unspent) →ProcessingError("expected spending data but UTXO is unspent").spending_datadiffers →ProcessingError("ownership mismatch — caller's SpendingData does not match stored spend").stores/utxo/aerospike/teranode.lua,un_spend.go): addexpectedSpendingDataparameter to theunspendLua function and a byte-equality check against the stored value. The Go layer maps Lua error codes to typed Go errors:ERROR_CODE_TX_NOT_FOUND→NotFoundError(parity with SQL).ERROR_CODE_SPEND_OWNERSHIP_MISMATCH(withERR_SPEND_OWNERSHIP_UNSPENTorERR_SPEND_OWNERSHIP_MISMATCHmessage) →ProcessingError.SpendingDatais mandatory: both backends rejectnilwithProcessingErrorbefore reaching the database. No escape hatch — every production caller derives spends fromSpend()/SetConflicting()/GetSpends(), all of which populate it (audit below).Caller audit
All production callers of
Unspendalready passSpendrecords withSpendingDatapopulated:ProcessConflicting→MarkConflictingRecursively→SetConflictingpopulates viaspendpkg.NewSpendingData(tx.TxIDChainHash(), i)(aerospikeconflicting.go:109, sqlsql.go:4017).Validator.reverseSpends, the SQL/AerospikeSpendrollback paths → all consumeSpend()results, which are built fromGetSpends(tx)(stores/utxo/utils.go:184-189).A handful of test-only callers were silently relying on the buggy behaviour by passing mismatched SpendingData. They've been corrected to pass the SpendingData that
GetSpendsactually produces:stores/utxo/tests/tests.go::Restorestores/utxo/sql/sql_test.go::TestUnspend,TestTombstoneAfterSpendAndUnspend,TestSpendAndUnspendEdgeCasesstores/utxo/aerospike/aerospike_server_test.go::TestAerospike/aerospike_reset,TestAerospike/LUAScriptsTest plan
sql_unspend_ownership_test.go):TestUnspendOwnership_HappyPath/_Mismatch/_NotFound/_MismatchOnUnspent/_NilSpendingDataRejected— exercise all three dispatch branches plus the nil guard, and assert that mismatched/rejected calls leavespending_dataunchanged.unspend_ownership_test.go):TestUnspendOwnership_HappyPath/_Mismatch/_NotFound/_NilSpendingDataRejected— same branches as SQL, plus parity for theERROR_CODE_TX_NOT_FOUND→NotFoundErrormapping.ProcessConflictingtests still pass.go test -race ./stores/utxo/sql/...,go test -race ./stores/utxo/,go test -race ./services/validator/all green.