Skip to content

Ensure atomic file store publication#826

Merged
icellan merged 3 commits into
mainfrom
fix/file-store-atomic-publication
May 15, 2026
Merged

Ensure atomic file store publication#826
icellan merged 3 commits into
mainfrom
fix/file-store-atomic-publication

Conversation

@icellan

@icellan icellan commented May 7, 2026

Copy link
Copy Markdown
Contributor

Summary

Fixes the file-backed blob store so final blob and checksum filenames are not exposed until their contents have been fully written, synced, closed, and atomically renamed into place.

This addresses production observations of partly written files appearing under their final filenames.

Changes

  • Writes blob data to a sibling temporary file before publishing the final filename.
  • Syncs and closes the temporary blob file before renaming it into place.
  • Writes checksum sidecar files using the same temporary-file publication pattern.
  • Removes stale file-local .dah helper code and tests; blob delete-at-height behavior remains in the scheduler/pruner flow.
  • Adds focused tests that verify:
    • the final blob filename is not visible while a streaming write is still open;
    • a checksum publication failure removes the just-published blob;
    • temporary files are cleaned up on failed writes.
  • Adds detailed documentation for the atomic publication helper functions and their durability limits.

Branch context

This branch has been rebased onto main and the PR targets main. Since the branch previously sat on feat/teranode-native-ops, this PR also includes the native-Aerospike commit stack from that base branch in addition to the file-store commit.

Notes

  • The helper functions use same-directory temporary files so os.Rename publishes within one filesystem.
  • Parent directory sync is best-effort because support varies by platform and filesystem.
  • The file mode intentionally preserves the existing blob-store behavior.
  • Public SetDAH remains available for current block assembly and block persister call sites, but the file store implementation schedules or cancels deletion through the blob deletion scheduler. The pruner service owns deletion execution.

Verification

Passed after rebasing onto main:

  • go test -count=1 ./stores/blob/file
  • go test -race -count=1 ./stores/blob/file
  • go test -count=1 ./stores/blob/...
  • go vet ./stores/blob/...
  • git diff --check

Previously passed before the rebase:

  • commit hooks

Repository-wide checks were also attempted earlier. go test ./... and go vet ./... are not green in this local environment due to unrelated existing or environment-heavy failures, including version-format validation, txmetacache timeout, toxiproxy/Docker/Postgres/SV-node/Kafka/e2e failures, and existing copy-lock warnings in test utilities.

@icellan icellan requested review from oskarszoon and teracoder May 7, 2026 15:39
@github-actions

github-actions Bot commented May 7, 2026

Copy link
Copy Markdown
Contributor

🤖 Claude Code Review

Status: Complete


Current Review:

This PR implements atomic file publication for the blob store to prevent partially-written files from being visible. The implementation is well-designed and thoroughly tested.

Key Changes:

  • Introduces temp-file-then-rename pattern for blob and checksum files
  • Adds configurable fsync behavior (full/data/none modes)
  • Uses os.Root API for path sandboxing (addresses CodeQL security alerts)
  • Removes stale DAH helper code (now handled by pruner service)
  • Comprehensive test coverage of atomic publication guarantees

Observations:

The code quality is strong with excellent documentation. The PR author has:

  • Documented the durability semantics of each fsync mode
  • Explained cross-platform rename behavior differences
  • Added focused tests for atomicity, error cleanup, and concurrent writers
  • Addressed CodeQL security concerns by routing operations through os.Root

No blocking issues identified. The implementation follows Go best practices and maintains backward compatibility while fixing the production issue of partial file exposure.

Note: CodeQL alerts visible in the PR comments appear to be addressed by the os.Root usage throughout the new helper methods (storeRelPath, openStoreRoot, removeStorePath). These ensure paths cannot escape the store root via symlinks or path traversal.

Comment thread stores/blob/file/file.go Fixed
Comment thread stores/blob/file/file.go Fixed
Comment thread stores/blob/file/file.go Fixed
Comment thread stores/blob/file/file.go Fixed
Comment thread stores/blob/file/file.go Fixed
Comment thread stores/blob/file/file.go Fixed
Comment thread stores/blob/file/file.go Fixed
@github-actions

github-actions Bot commented May 7, 2026

Copy link
Copy Markdown
Contributor

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-826 (e2d6863)

Summary

  • Regressions: 0
  • Improvements: 0
  • Unchanged: 142
  • Significance level: p < 0.05
All benchmark results (sec/op)
Benchmark Baseline Current Change p-value
_NewBlockFromBytes-4 1.677µ 1.729µ ~ 0.100
SplitSyncedParentMap_SetIfNotExists/256_buckets-4 61.58n 61.59n ~ 1.000
SplitSyncedParentMap_SetIfNotExists/16_buckets-4 61.58n 61.56n ~ 0.700
SplitSyncedParentMap_SetIfNotExists/1_bucket-4 61.49n 61.65n ~ 0.200
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets... 30.22n 31.43n ~ 0.200
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_... 52.15n 51.71n ~ 1.000
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa... 109.3n 107.3n ~ 0.100
MiningCandidate_Stringify_Short-4 264.0n 262.1n ~ 0.100
MiningCandidate_Stringify_Long-4 1.889µ 1.896µ ~ 0.300
MiningSolution_Stringify-4 982.4n 970.6n ~ 0.100
BlockInfo_MarshalJSON-4 1.766µ 1.753µ ~ 0.100
NewFromBytes-4 131.9n 130.5n ~ 0.700
Mine_EasyDifficulty-4 60.47µ 59.96µ ~ 0.700
Mine_WithAddress-4 6.665µ 6.691µ ~ 0.700
BlockAssembler_AddTx-4 0.02811n 0.02915n ~ 0.400
AddNode-4 10.62 10.87 ~ 0.700
AddNodeWithMap-4 10.90 10.83 ~ 1.000
DirectSubtreeAdd/4_per_subtree-4 63.09n 63.65n ~ 1.000
DirectSubtreeAdd/64_per_subtree-4 31.58n 31.78n ~ 0.400
DirectSubtreeAdd/256_per_subtree-4 30.40n 30.66n ~ 0.100
DirectSubtreeAdd/1024_per_subtree-4 29.15n 29.49n ~ 0.200
DirectSubtreeAdd/2048_per_subtree-4 28.58n 28.96n ~ 0.100
SubtreeProcessorAdd/4_per_subtree-4 281.7n 285.4n ~ 0.200
SubtreeProcessorAdd/64_per_subtree-4 276.7n 279.4n ~ 1.000
SubtreeProcessorAdd/256_per_subtree-4 277.6n 280.0n ~ 1.000
SubtreeProcessorAdd/1024_per_subtree-4 273.2n 272.6n ~ 0.400
SubtreeProcessorAdd/2048_per_subtree-4 272.6n 276.1n ~ 0.100
SubtreeProcessorRotate/4_per_subtree-4 276.5n 281.5n ~ 0.200
SubtreeProcessorRotate/64_per_subtree-4 273.8n 278.9n ~ 0.200
SubtreeProcessorRotate/256_per_subtree-4 273.6n 282.5n ~ 0.100
SubtreeProcessorRotate/1024_per_subtree-4 274.6n 278.9n ~ 0.100
SubtreeNodeAddOnly/4_per_subtree-4 54.43n 55.26n ~ 0.400
SubtreeNodeAddOnly/64_per_subtree-4 34.37n 34.59n ~ 0.100
SubtreeNodeAddOnly/256_per_subtree-4 33.42n 33.45n ~ 0.200
SubtreeNodeAddOnly/1024_per_subtree-4 33.32n 32.87n ~ 0.200
SubtreeCreationOnly/4_per_subtree-4 113.7n 114.4n ~ 0.900
SubtreeCreationOnly/64_per_subtree-4 418.6n 417.6n ~ 0.700
SubtreeCreationOnly/256_per_subtree-4 1.404µ 1.489µ ~ 0.100
SubtreeCreationOnly/1024_per_subtree-4 4.432µ 4.475µ ~ 0.200
SubtreeCreationOnly/2048_per_subtree-4 8.359µ 8.339µ ~ 1.000
SubtreeProcessorOverheadBreakdown/64_per_subtree-4 273.0n 275.9n ~ 0.100
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4 271.1n 275.8n ~ 0.100
ParallelGetAndSetIfNotExists/1k_nodes-4 601.2µ 816.4µ ~ 0.200
ParallelGetAndSetIfNotExists/10k_nodes-4 1.352m 1.359m ~ 0.400
ParallelGetAndSetIfNotExists/50k_nodes-4 6.721m 6.659m ~ 0.700
ParallelGetAndSetIfNotExists/100k_nodes-4 13.55m 13.48m ~ 0.200
SequentialGetAndSetIfNotExists/1k_nodes-4 673.4µ 678.4µ ~ 0.700
SequentialGetAndSetIfNotExists/10k_nodes-4 3.019m 2.866m ~ 0.700
SequentialGetAndSetIfNotExists/50k_nodes-4 11.28m 10.43m ~ 0.100
SequentialGetAndSetIfNotExists/100k_nodes-4 21.08m 20.06m ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4 667.9µ 656.9µ ~ 0.200
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4 4.312m 4.337m ~ 1.000
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4 16.85m 16.90m ~ 0.400
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4 694.5µ 716.5µ ~ 0.400
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4 6.146m 6.054m ~ 0.400
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4 41.46m 38.99m ~ 0.100
DiskTxMap_SetIfNotExists-4 3.964µ 3.935µ ~ 0.400
DiskTxMap_SetIfNotExists_Parallel-4 3.643µ 3.800µ ~ 0.200
DiskTxMap_ExistenceOnly-4 338.1n 438.5n ~ 0.100
Queue-4 197.2n 202.0n ~ 0.100
AtomicPointer-4 3.694n 3.685n ~ 0.700
ReorgOptimizations/DedupFilterPipeline/Old/10K-4 869.2µ 853.4µ ~ 0.100
ReorgOptimizations/DedupFilterPipeline/New/10K-4 825.5µ 851.8µ ~ 0.100
ReorgOptimizations/AllMarkFalse/Old/10K-4 114.0µ 114.4µ ~ 0.700
ReorgOptimizations/AllMarkFalse/New/10K-4 64.39µ 64.45µ ~ 1.000
ReorgOptimizations/HashSlicePool/Old/10K-4 65.97µ 60.46µ ~ 0.100
ReorgOptimizations/HashSlicePool/New/10K-4 11.01µ 11.04µ ~ 0.400
ReorgOptimizations/NodeFlags/Old/10K-4 5.450µ 5.495µ ~ 1.000
ReorgOptimizations/NodeFlags/New/10K-4 1.836µ 1.769µ ~ 0.100
ReorgOptimizations/DedupFilterPipeline/Old/100K-4 10.90m 11.91m ~ 0.400
ReorgOptimizations/DedupFilterPipeline/New/100K-4 10.46m 11.08m ~ 0.100
ReorgOptimizations/AllMarkFalse/Old/100K-4 1.185m 1.130m ~ 0.100
ReorgOptimizations/AllMarkFalse/New/100K-4 705.2µ 705.8µ ~ 0.700
ReorgOptimizations/HashSlicePool/Old/100K-4 645.8µ 569.8µ ~ 0.100
ReorgOptimizations/HashSlicePool/New/100K-4 202.9µ 204.7µ ~ 1.000
ReorgOptimizations/NodeFlags/Old/100K-4 53.37µ 55.72µ ~ 0.100
ReorgOptimizations/NodeFlags/New/100K-4 18.15µ 19.39µ ~ 0.200
TxMapSetIfNotExists-4 46.43n 46.58n ~ 0.700
TxMapSetIfNotExistsDuplicate-4 38.69n 38.67n ~ 1.000
ChannelSendReceive-4 612.3n 587.0n ~ 0.700
CalcBlockWork-4 464.2n 467.8n ~ 0.100
CalculateWork-4 666.8n 619.8n ~ 1.000
BuildBlockLocatorString_Helpers/Size_10-4 1.302µ 1.428µ ~ 0.100
BuildBlockLocatorString_Helpers/Size_100-4 15.69µ 13.59µ ~ 0.400
BuildBlockLocatorString_Helpers/Size_1000-4 139.0µ 126.8µ ~ 0.700
CatchupWithHeaderCache-4 104.8m 104.9m ~ 1.000
_BufferPoolAllocation/16KB-4 3.430µ 3.484µ ~ 1.000
_BufferPoolAllocation/32KB-4 8.404µ 7.614µ ~ 0.100
_BufferPoolAllocation/64KB-4 15.76µ 17.27µ ~ 0.500
_BufferPoolAllocation/128KB-4 28.32µ 30.00µ ~ 0.100
_BufferPoolAllocation/512KB-4 119.1µ 109.6µ ~ 0.100
_BufferPoolConcurrent/32KB-4 19.65µ 19.40µ ~ 1.000
_BufferPoolConcurrent/64KB-4 28.17µ 28.43µ ~ 0.400
_BufferPoolConcurrent/512KB-4 146.6µ 147.5µ ~ 0.400
_SubtreeDeserializationWithBufferSizes/16KB-4 657.4µ 661.5µ ~ 0.700
_SubtreeDeserializationWithBufferSizes/32KB-4 633.8µ 614.6µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/64KB-4 634.4µ 641.0µ ~ 1.000
_SubtreeDeserializationWithBufferSizes/128KB-4 632.7µ 640.3µ ~ 0.700
_SubtreeDeserializationWithBufferSizes/512KB-4 659.9µ 659.3µ ~ 1.000
_SubtreeDataDeserializationWithBufferSizes/16KB-4 35.45m 35.97m ~ 0.200
_SubtreeDataDeserializationWithBufferSizes/32KB-4 35.36m 35.66m ~ 0.400
_SubtreeDataDeserializationWithBufferSizes/64KB-4 35.16m 35.71m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/128KB-4 35.34m 35.62m ~ 0.400
_SubtreeDataDeserializationWithBufferSizes/512KB-4 35.06m 35.36m ~ 0.400
_PooledVsNonPooled/Pooled-4 739.6n 741.2n ~ 0.700
_PooledVsNonPooled/NonPooled-4 7.011µ 7.082µ ~ 0.100
_MemoryFootprint/Current_512KB_32concurrent-4 6.850µ 6.747µ ~ 0.200
_MemoryFootprint/Proposed_32KB_32concurrent-4 10.44µ 11.15µ ~ 0.100
_MemoryFootprint/Alternative_64KB_32concurrent-4 9.689µ 10.115µ ~ 0.100
SubtreeSizes/10k_tx_4_per_subtree-4 1.368m 1.358m ~ 0.700
SubtreeSizes/10k_tx_16_per_subtree-4 327.2µ 321.0µ ~ 0.400
SubtreeSizes/10k_tx_64_per_subtree-4 79.12µ 76.84µ ~ 0.100
SubtreeSizes/10k_tx_256_per_subtree-4 19.62µ 19.28µ ~ 0.100
SubtreeSizes/10k_tx_512_per_subtree-4 9.754µ 9.481µ ~ 0.100
SubtreeSizes/10k_tx_1024_per_subtree-4 4.839µ 4.688µ ~ 0.100
SubtreeSizes/10k_tx_2k_per_subtree-4 2.407µ 2.355µ ~ 0.100
BlockSizeScaling/10k_tx_64_per_subtree-4 77.63µ 75.23µ ~ 0.200
BlockSizeScaling/10k_tx_256_per_subtree-4 19.65µ 18.86µ ~ 0.100
BlockSizeScaling/10k_tx_1024_per_subtree-4 4.896µ 4.719µ ~ 0.100
BlockSizeScaling/50k_tx_64_per_subtree-4 403.0µ 390.7µ ~ 0.400
BlockSizeScaling/50k_tx_256_per_subtree-4 95.51µ 93.31µ ~ 0.400
BlockSizeScaling/50k_tx_1024_per_subtree-4 23.96µ 23.17µ ~ 0.100
SubtreeAllocations/small_subtrees_exists_check-4 164.0µ 154.5µ ~ 0.100
SubtreeAllocations/small_subtrees_data_fetch-4 169.6µ 164.1µ ~ 0.100
SubtreeAllocations/small_subtrees_full_validation-4 335.0µ 324.9µ ~ 0.100
SubtreeAllocations/medium_subtrees_exists_check-4 9.807µ 9.141µ ~ 0.100
SubtreeAllocations/medium_subtrees_data_fetch-4 10.175µ 9.809µ ~ 0.100
SubtreeAllocations/medium_subtrees_full_validation-4 19.86µ 18.93µ ~ 0.100
SubtreeAllocations/large_subtrees_exists_check-4 2.355µ 2.172µ ~ 0.100
SubtreeAllocations/large_subtrees_data_fetch-4 2.494µ 2.408µ ~ 0.100
SubtreeAllocations/large_subtrees_full_validation-4 4.928µ 4.770µ ~ 0.100
_prepareTxsPerLevel-4 402.8m 416.2m ~ 0.700
_prepareTxsPerLevelOrdered-4 3.899m 3.910m ~ 1.000
_prepareTxsPerLevel_Comparison/Original-4 413.2m 410.0m ~ 0.700
_prepareTxsPerLevel_Comparison/Optimized-4 3.606m 4.091m ~ 0.100
StoreBlock_Sequential/BelowCSVHeight-4 335.5µ 336.9µ ~ 0.700
StoreBlock_Sequential/AboveCSVHeight-4 321.2µ 343.2µ ~ 0.100
GetUtxoHashes-4 255.4n 257.5n ~ 0.400
GetUtxoHashes_ManyOutputs-4 43.71µ 44.34µ ~ 0.100
_NewMetaDataFromBytes-4 237.5n 236.5n ~ 0.200
_Bytes-4 609.7n 619.4n ~ 0.200
_MetaBytes-4 560.6n 560.2n ~ 1.000

Threshold: >10% with p < 0.05 | Generated: 2026-05-13 14:39 UTC

@icellan icellan self-assigned this May 7, 2026
@icellan icellan requested review from ordishs and removed request for teracoder May 13, 2026 10:00
@ordishs

ordishs commented May 13, 2026

Copy link
Copy Markdown
Collaborator

Review

Overview

The stated goal — atomic publication of blobs and checksum sidecars in the file store — is solid, and the implementation is well thought through. New helpers (createTempSibling, syncAndCloseTempFile, renameTempFile, writeFileAtomically, syncParentDirBestEffort) implement temp-write → fsync → close → rename → dir-sync correctly, with os.Root containment and O_EXCL for safety.

However the diff is 66 files, ~1965+/-695, of which only 2 commits (~370 lines under stores/blob/file/) match the PR title. The other 7 commits are the feat/teranode-native-ops aerospike stack pulled in by the rebase.


🔴 Major: Scope / Reviewability

The PR description acknowledges the bundling, but acknowledgement does not fix the reviewability problem:

  • dbfc613a + ac99a445 — file store atomic publication (the stated goal)
  • b9685d5d, 8440a845, 1f4ee916, a67035f4, 7448f018, a8f3b513 — aerospike native operate-path, BSV fork import switch, diagnostics. Touches spend.go (445+/125-), alert_system.go (112+/62-), locked.go (101+/45-), set_mined.go, plus new native_op.go (296 lines) and batch_diagnostics.go (124 lines).

Recommendation: land the aerospike commits in their own PR against main first, then rebase this PR so it contains only the file-store commits. Bisection and revert get messy otherwise — if the aerospike cutover regresses, you cannot revert it without losing the file-store fix, and vice versa. This also violates the Small Diff Rule in AGENTS.md.


File Store Changes — In-Scope Review

Strengths

  • os.Root (Go 1.24+) + filepath.IsLocal containment rejects path-traversal and TOCTOU symlink attacks — a real security improvement over the previous raw-path os.Rename/os.Remove.
  • O_EXCL on temp creation prevents clobber; the bounded 10-attempt retry on collision is sensible given a 63-bit random suffix.
  • Hidden temp filenames (.basename.<rand>.tmp) keep operational ls output clean during writes.
  • Hash-write failure now rolls back the blob via removeStorePath(filename) — closes a real consistency hole where a blob could exist without its checksum sidecar.
  • syncParentDirBestEffort is appropriately documented as best-effort and does not fail the operation.
  • Removing the dead readDAHFromFile / writeDAHToFile / loadDAHs helpers is good — SetDAH already routes through blobDeletionScheduler.
  • Doc comments on the new helpers call out durability boundaries and the same-directory rename requirement explicitly. Useful.

Concerns

1. renameTempFile ErrBlobAlreadyExists branch is effectively dead on POSIX.

if err := root.Rename(tmpRel, finalRel); err != nil {
    if fileInfo, statErr := root.Stat(finalRel); statErr == nil && !fileInfo.IsDir() {
        return errors.NewBlobAlreadyExistsError(...)
    }
    return errors.NewStorageError(...)
}

POSIX rename(2) atomically replaces the destination — it does not error if the destination exists. So this branch only fires on Windows or non-POSIX filesystems. The original code had the same dead branch, but the change in error semantics (warn-and-continue → returned error) is now visible on Windows and any non-POSIX target. The real concurrent-writer guard is errorOnOverwrite earlier in SetFromReader, and that remains a TOCTOU race this PR does not address.

2. AllowOverwrite interaction.

If a caller sets AllowOverwrite=true and runs on a platform where Rename fails on existing destinations, they would now get ErrBlobAlreadyExistsError rather than the overwrite they asked for. Worth either verifying every supported target or short-circuiting the existence check when AllowOverwrite is true.

3. Minor: big.NewInt(1<<63-1) reads as opaque.

math.MaxInt64 communicates intent better.

4. Verification status in PR description.

The description notes that repo-wide go test ./... and go vet ./... were not green locally, attributed to unrelated/environment failures. Per AGENTS.md (Loop Until Verified): CI on this PR needs to be green before merge, not deferred.

Tests

  • TestSetFromReader_DoesNotExposeFinalFileUntilReaderCompletes — exercises the exact invariant the PR claims to fix. Tight test using io.Pipe to hold the writer open mid-stream.
  • TestSetFromReader_RemovesBlobWhenChecksumPublicationFails — clever use of os.Mkdir(filename+checksumExtension) to force checksum write to fail; verifies blob rollback and temp cleanup.
  • TestFileStoreRelPath — covers happy path, .. escape, sibling-prefix escape, and relative store paths.
  • Missing: test for the renameTempFile "destination exists" branch — even if POSIX-skipped, force via os.Link to validate the Windows path.
  • Missing: concurrent-writer test for two SetFromReader calls on the same key. Even if the errorOnOverwrite race is out of scope, asserting that no partial final file is ever observed under concurrent writers is precisely the invariant this PR is claiming.

Aerospike Changes — Out of Scope (Brief)

Not reviewing in depth because these do not belong in this PR. Skim notes:

  • Import switch github.com/aerospike/aerospike-client-go/v8github.com/bsv-blockchain/aerospike-client-go/v8 across ~30 files. Presumably the BSV fork pin. Deserves its own PR with justification (what is in the fork, why we own it).
  • native_op.go (+296), batch_diagnostics.go (+124), and the rewrites of spend.go / set_mined.go / locked.go / alert_system.go implement a TeranodeModifyOp native operate-path (wire op type 200) with UDF fallback. Real, complex, latency-sensitive code that deserves a dedicated review.
  • New setting useNativeTeranodeOps in settings/aerospike_settings.go is gated by a server-capability probe (detectNativeTeranodeOpSupport). Sensible defensive pattern, but needs its own review thread.

Recommendation

  1. Split the PR. Land the aerospike commits as a separate PR against main. Rebase this branch so it contains only the two file-store commits.
  2. Tighten renameTempFile "already exists" semantics, or document explicitly that the error path is non-POSIX only.
  3. Add a concurrent-writer test for SetFromReader.
  4. Get CI green before merging.

The file-store change itself is good work and addresses a real production bug. The packaging is what needs fixing.

@oskarszoon oskarszoon left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting changes. The file-store work is good and addresses a real production bug, the aerospike native-op design is sound, and Simon's review covers the file-store-scope findings correctly — I won't duplicate any of that. Three new items below, plus a small amount of supporting evidence for his split recommendation.

Merge blocker — go.mod is missing the BSV Aerospike fork in the direct-require block. The PR swaps the upstream Aerospike client for github.com/bsv-blockchain/aerospike-client-go/v8 across 16+ files in stores/utxo/aerospike/, but the new module isn't in go.mod's require block. go.sum has the hash for v8.7.1-bsv2, so MVS will resolve it on the fly, but the version is unpinned in the source of truth. go build ./stores/utxo/aerospike/ fails locally with "no required module provides package". Fix: add github.com/bsv-blockchain/aerospike-client-go/v8 v8.7.1-bsv2 to the direct-require block and run go mod tidy. Worth confirming whether the build happened to work on someone else's machine (GOPATH/GOWORK differences could mask it).

CodeQL #119 is a real defense-in-depth gap. Of the seven CodeQL alerts on stores/blob/file/file.go, six (118, 120–124) are confirmed false positives — the user-provided "key" is hex.EncodeToString(reverse(key)) (charset [0-9a-f]) before reaching any path expression, and the four-layer containment (ConstructFilenamevalidatePathWithinBasestoreRelPathos.Root + filepath.IsLocal) handles the rest. Those six are safe to dismiss-with-justification.

Alert #119 is different: os.Remove(dir) at file.go:1159 (empty hash-prefix directory cleanup in Del) is the only path operation in the file that isn't wrapped in os.Root.Remove. It's currently guarded by dir != s.path + len(filepath.Base(dir)) <= 2 heuristic. In practice the hex encoding upstream prevents anything bad landing here, but the os.Root containment story you've otherwise built is intentionally consistent across all path ops in this file — leaving one os.Remove outside the sandbox weakens that story. Trivial fix: wrap in root.Remove(rel) for consistency.

NFS / network-attached storage operational risk. The new atomic-publication path adds per-write fsync(blob) + fsync(checksum) + fsync(dir). On NVMe this is ~200ms cumulatively per 1000-subtree block — fine. On NFS this becomes 20–200 s per block depending on round-trip latency — catastrophic for block-assembly throughput. Could you confirm no production file stores currently sit on NFS / network-attached storage? If any do, the rollout needs filesystem-aware gating (or the fsync needs to be opt-out via a setting for those deployments). Worth a one-line note in the PR description either way.

Two smaller observations, neither blocking:

  • The PR claims native operate-path will deliver a real speedup over the UDF executor by eliminating the per-call Lua interpreter overhead (3–10 µs per record). I agree the design will yield that, but the Verification section has no benchmark numbers backing the implicit 2–5× speedup claim — and we have no CI benches that exercise this path (the aerospike benches require -tags aerospike + live container). A before/after bench from a local Aerospike container, posted in the PR body, would close that gap. Same goes for the file-store fsync overhead — no Benchmark* functions exist in stores/blob/file/ today.

  • The syncParentDirBestEffort doc framing ("best-effort because support varies by platform and filesystem") is accurate but understates that on ext4/xfs/btrfs the dir-fsync is correctness-critical for final-name durability after crash, not just "nice to have". A future maintainer reading "best-effort" might disable it without realising they're regressing crash safety on the dominant production filesystems. Worth a doc tweak.

Supporting Simon's split recommendation with technical evidence: the six aerospike commits stand on their own — commits 1–2 introduce helpers that are functionally no-op until the probe enables them, commit 3 mechanically switches imports, commits 4–6 iterate on the probe. They have no functional dependency on the two file-store commits, and the spend.go rewrite (445+/125-) is genuinely better INDEPENDENT of native-op (defensive nil-checks, structured ParseLuaMapResponse replacing inline map[interface{}]interface{} type assertions, describeAerospikeBatchRecord diagnostic helpers) — it would be valuable as a standalone diagnostics PR. The BSV fork at v8.7.1-bsv2 is verified published (the proxy.golang.org info endpoint returns 200) and is a strict superset of upstream (adds wire opcodes 200/201 only). Splitting is technically clean.

Verified non-issues worth surfacing: Go 1.26.0 is pinned in go.mod and all CI workflows so os.OpenRoot (Go 1.24+) is safely available; the DAH-migration concern doesn't materialise on upgrade because readDAHFromFile/writeDAHToFile were already dead code in main with the scheduler authoritative; native op type 200 hits the same Aerospike ACL gate as standard ops, and the probe verifies a side-effect rather than just a response code (so MITM-spoofing the probe response is blocked by TLS to the cluster); the BSV Aerospike fork lives under the bsv-blockchain org, same trust boundary as Teranode itself.

I'd be happy to re-review once the go.mod fix lands and the CodeQL #119 wrap is in (the others can be addressed in follow-ups).

@icellan icellan force-pushed the fix/file-store-atomic-publication branch from f4fa990 to be3a508 Compare May 13, 2026 12:45
@icellan icellan requested a review from oskarszoon May 13, 2026 12:54
@icellan

icellan commented May 13, 2026

Copy link
Copy Markdown
Contributor Author

Force-pushed f4fa99006be3a508e0 to drop the 6 aerospike commits per Simon's split recommendation. The PR is now 2 commits / ~370 lines, scoped to stores/blob/file/ only.

Addressed by the force-push:

  • Scope / reviewability (Simon's Major item, my supporting note)
  • go.mod BSV Aerospike fork missing from direct-require block (no longer applicable)
  • Bench / SonarQube / Advanced Security reports anchored to the old commits — these will re-run

Working on now (separate commit(s)):

  • CodeQL docs: fix minor issues in protobuf documentation #119: wrap os.Remove(dir) at file.go:1159 in os.Root
  • renameTempFile ErrBlobAlreadyExists branch semantics + AllowOverwrite interaction
  • big.NewInt(1<<63-1)math.MaxInt64
  • Concurrent-writer test for SetFromReader; "destination exists" test for renameTempFile
  • Benchmark* for fsync overhead in stores/blob/file/
  • syncParentDirBestEffort doc clarifying correctness-critical role on ext4/xfs/btrfs
  • NFS / network-attached storage: will confirm and either gate or document

The aerospike commits will land as a separate PR.

@ordishs

ordishs commented May 13, 2026

Copy link
Copy Markdown
Collaborator

Re-review after split

Re-reviewed the rebased branch. Scope is now exactly what the title promises: 2 commits, stores/blob/file/file.go (+288/-129) and stores/blob/file/file_test.go (+81/-171). Aerospike work moved to #828. 👍

Verified locally on the PR head:

go test -count=1 -race ./stores/blob/file/  →  ok  6.291s

CI on this branch is fully green (test, golangci-lint, smoketest, sequential-*, prunertest, chainintegrity, legacy-sync, CodeQL, all benches).

Resolved from previous review

  • Scope/reviewability — split done.
  • Verification status — CI green, not a deferred concern anymore.

Still open (minor, non-blocking)

  1. renameTempFile ErrBlobAlreadyExistsError branch is dead on POSIX. POSIX rename(2) atomically replaces the destination, so root.Rename doesn't return an error when the destination exists — the existence check inside if err != nil only fires on Windows/non-POSIX filesystems. Not a regression vs. the previous code, just worth either (a) documenting the platform caveat in the godoc, or (b) deciding whether you want a RENAME_NOREPLACE-style guard on Linux via unix.Renameat2.
  2. Concurrent-writer test gap. The new tests cover the partial-write invariant and the checksum-failure rollback well, but there's no test that asserts the invariant under concurrent writers for the same key. Even with the existing errorOnOverwrite TOCTOU race (out of scope), an assertion that no observer ever sees a partial final file under two parallel SetFromReader calls is exactly what "atomic publication" promises.
  3. big.NewInt(1<<63-1) in createTempSiblingmath.MaxInt64 would read more clearly. Pure nit.
  4. Stale PR description. The "Branch context" section still references the previous bundling with feat/teranode-native-ops and the "Verification" section still notes repo-wide failures from the old branch state. Both are out of date now that the split is done — worth tightening before merge so future readers of the merge commit aren't misled.

Verdict

The file-store change is good and addresses the production bug cleanly. The scope split addresses my main concern. Items above are non-blocking — happy for this to merge once (4) is tidied up. Items (1)–(3) are fine as follow-ups.

Round-trip changes from the review comments left after the initial
atomic-publication PR:

- CodeQL #119: route the post-Del hash-prefix directory removal
  through os.Root so every path operation in this file stays inside
  the store sandbox.
- renameTempFile: cross-platform semantics tightened. The
  ErrBlobAlreadyExists branch is documented as non-POSIX only
  (rename(2) atomically replaces on POSIX), and an allowOverwrite
  parameter lets non-POSIX callers explicitly remove the destination
  and retry instead of receiving a spurious already-exists error
  when they opted in to overwriting.
- syncParentDirBestEffort: doc clarifies that the directory fsync
  is correctness-critical on ext4/xfs/btrfs after rename, not an
  optional "nice to have". "Best-effort" applies to failure handling,
  not to whether the call should run.
- createTempSibling: replace big.NewInt(1<<63-1) with
  big.NewInt(math.MaxInt64) for readability.
- fsyncMode opt-out via the ?fsyncMode=full|data|none URL parameter.
  Default remains full (current behaviour, safe on local disks).
  Operators running the file store over NFS or other network-attached
  storage can drop to data (skip the directory fsync) or none (skip
  both) to avoid the ~10-20 ms per-write cost; the trade-off is
  weakened crash safety as documented at the fsyncMode declaration.

Tests added:

- TestSetFromReader_ConcurrentWritersNeverExposePartialFinal: eight
  concurrent SetFromReader calls on the same key with AllowOverwrite.
  A reader poller asserts that any successful read of the final
  filename is byte-identical to the expected header+body payload,
  never a partial prefix.
- TestRenameTempFile_OverwriteAndRejectSemantics: documents the
  observable POSIX behaviour (rename replaces on AllowOverwrite=true,
  errorOnOverwrite stops the call earlier when false) so a regression
  on either branch shows up in CI.
- TestParseFsyncMode / TestNewWithFsyncMode: cover the new URL
  parameter (parsing, end-to-end Set/Get on each mode, invalid value
  rejection).
- BenchmarkSetFromReader_FsyncModes: measures the cost of the
  atomic-publication path across three payload sizes and the three
  fsync modes. Local-disk numbers on an Apple M3 Max show
  fsyncModeFull at ~20 ms/op, fsyncModeData at ~10 ms/op, fsyncModeNone
  at <1 ms/op — gives operators a concrete frame for evaluating the
  trade-off on their target filesystem.
@icellan

icellan commented May 13, 2026

Copy link
Copy Markdown
Contributor Author

Pushed cec145fc0 addressing the review feedback. Detail per item:

From @ordishs

  • ✅ Scope split — earlier force-push (be3a508e0)
  • renameTempFile ErrBlobAlreadyExists branch / AllowOverwrite — added allowOverwrite parameter; on the non-POSIX path the helper now removes the destination and retries when overwrite was requested. POSIX behaviour and the unreachable-on-POSIX nature of the branch are documented at the function.
  • big.NewInt(1<<63-1)big.NewInt(math.MaxInt64)
  • ✅ Missing concurrent-writer test — added TestSetFromReader_ConcurrentWritersNeverExposePartialFinal. Eight concurrent Set calls with AllowOverwrite=true; reader poller asserts any successful read is byte-identical to the expected header+body, never a partial prefix.
  • ⚠️ Missing destination-exists test for renameTempFile — added TestRenameTempFile_OverwriteAndRejectSemantics that documents the observable POSIX contract (rename replaces; existence is enforced by errorOnOverwrite earlier). I did not force the Windows branch via os.Link because the new test covers what's reachable on every CI platform and the non-POSIX path is now structurally simpler. If you'd rather have an explicit os.Link fixture for the Windows path I can add it.
  • ⚠️ CI green — pre-existing failures are env-related as before; this PR's package suite is green (go test -race ./stores/blob/file/...).

From @oskarszoon

  • go.mod BSV Aerospike fork merge blocker — resolved by the earlier force-push (aerospike commits gone).
  • ✅ CodeQL docs: fix minor issues in protobuf documentation #119os.Remove(dir) at file.go:1159 now routed through root.Remove(rel) so every path op in the file stays inside the os.Root sandbox.
  • ✅ NFS / network-attached storage risk — added ?fsyncMode=full|data|none URL param. Default remains full (current behaviour, safe on local disks). Operators on NFS can drop to data (skip dir fsync) or none (skip both). The trade-off is documented at the fsyncMode declaration.
  • ✅ Missing Benchmark* for fsync overhead — added BenchmarkSetFromReader_FsyncModes across 3 payload sizes × 3 fsync modes. Local-disk numbers on an Apple M3 Max:
    • mode=full: ~20 ms/op
    • mode=data: ~10 ms/op
    • mode=none: <1 ms/op
  • syncParentDirBestEffort doc — rewritten to make clear the directory fsync is correctness-critical on ext4/xfs/btrfs for crash safety. "Best-effort" applies only to failure handling, not to whether the call should run.
  • ✅ Native-op / BSV-fork comments — moot post-split.

go build / vet / test -race / staticcheck / golangci-lint all green on ./stores/blob/file/....

@sonarqubecloud

Copy link
Copy Markdown

Quality Gate Failed Quality Gate failed

Failed conditions
64.8% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

@ordishs

ordishs commented May 14, 2026

Copy link
Copy Markdown
Collaborator

Re-review after cec145fc0

All four items from my previous review plus oskarszoon's six are addressed. The new commit is a clean, well-scoped round of fixes.

Verified locally on the PR head:

go test -count=1 -race ./stores/blob/file/  →  ok  6.718s
go vet ./stores/blob/file/                  →  clean

CI on the PR head is fully green (test, golangci-lint, smoketest, sequential-{sqlite,postgres,aerospike}, prunertest, chainintegrity, legacy-sync, CodeQL, all benches).

Resolved

  • CodeQL docs: fix minor issues in protobuf documentation #119 — post-Del hash-prefix directory removal now goes through root.Remove(rel), consistent with the rest of the file. Every path operation in file.go stays inside the os.Root sandbox.
  • renameTempFile cross-platform semantics — added allowOverwrite parameter; on the non-POSIX branch the helper removes the destination and retries when overwrite was requested. The POSIX-unreachability of the existence-check branch is now documented at the function. Good clarification.
  • syncParentDirBestEffort doc — rewritten so "best-effort" is unambiguously about failure handling, not about whether the call should run. Future maintainers won't delete it under the misreading that directory fsync is optional.
  • big.NewInt(1<<63-1)big.NewInt(math.MaxInt64) — done.
  • Concurrent-writer testTestSetFromReader_ConcurrentWritersNeverExposePartialFinal exercises eight concurrent Set calls with AllowOverwrite=true while a reader polls. Any successful read must be byte-identical to the expected header+body — exactly the atomic-publication invariant.
  • renameTempFile destination-exists testTestRenameTempFile_OverwriteAndRejectSemantics documents the observable POSIX contract. Not testing the Windows branch via os.Link is fine — the path is structurally simpler now and any regression on the reachable POSIX behaviour shows up in CI.
  • NFS / network-attached storage trade-off?fsyncMode=full|data|none URL param. Default stays full so local-disk deployments get the existing crash-safety guarantee unchanged. Benchmark numbers from BenchmarkSetFromReader_FsyncModes give operators concrete data:
    payload=256B/mode=full   ≈ 16 ms/op
    payload=256B/mode=data   ≈  8 ms/op
    payload=256B/mode=none   ≈ 0.3 ms/op
    
    (Apple M5 Max, local APFS; on NFS the gap will be larger.) The fsyncMode doc block clearly states what each operator is opting out of.

Minor / non-blocking

  1. SonarQube coverage gate at 64.8% (threshold 80%). Per-function: the new functional paths are well-covered (parseFsyncMode 100%, writeHashFile 87.5%, SetFromReader 77.8%, storeRelPath/openStoreRoot/removeStorePath 75%). The drag is hard-to-trigger error paths and the non-POSIX-only branches in renameTempFile (52%), syncAndCloseTempFile (50%), and syncParentDirBestEffort (61%). Those are structural to systems-level code and physically unreachable on the Linux CI; not a behaviour gap. Worth either accepting the delta or adding a small build-tag-guarded test that fakes the non-POSIX rename. Either is fine.
  2. Stale PR description. The "Branch context" section still references the pre-split rebase from feat/teranode-native-ops, and the "Verification" section still mentions repo-wide failures from the bundled-PR state. Both are obsolete and would mislead future readers of the merge commit. Worth tidying before merge.
  3. Pure nit: BenchmarkSetFromReader_FsyncModes body calls f.Set (which delegates to SetFromReader internally). Name suggests it exercises SetFromReader directly. Either rename or add a Set overload — purely cosmetic.

Verdict

LGTM to merge. Item (2) is the only thing I'd address before the merge so the commit body reads correctly. Items (1) and (3) are fine as follow-ups (or as no-action).

@oskarszoon oskarszoon left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every blocking item from the original CHANGES_REQUESTED is resolved in the current code. Two minor follow-ups below, neither blocking.


Per-item verification

1. go.mod / Aerospike fork — RESOLVED

git log upstream/main..HEAD is exactly three commits (64043d59c, be3a508e0, cec145fc0), all scoped to stores/blob/file/. No aerospike commits.

The git diff upstream/main..HEAD -- go.mod output shows one line removed (pgregory.net/rapid v1.3.0), which looks alarming at first but is a red herring: rapid is absent from go.mod at the branch point (merge base adf9e66c5) and absent at HEAD — it was added to upstream/main after this branch was cut, by PR #848. The diff is upstream having moved forward, not this PR touching go.mod. Confirmed: no go.mod or go.sum changes attributable to this PR.

2. CodeQL #119 — os.Root sandbox — MOSTLY RESOLVED, two raw calls remain in Del

The write path is fully sandboxed. createTempSibling uses root.OpenFile, renameTempFile uses root.Rename / root.Remove, removeStorePath uses root.Remove, and syncParentDirBestEffort uses root.Open + dir.Sync(). The hash-prefix directory cleanup that was the original CodeQL site is now routed through openStoreRoot() + root.Remove(rel) at file.go:1208–1214, with the comment explicitly crediting the CodeQL fix. That specific issue is resolved.

However, Del still has two raw os.Remove calls that bypass os.Root:

  • file.go:1187: os.Remove(checksumFile) — the checksum sidecar delete.
  • file.go:1191: os.Remove(fileName) — the blob delete.

Both checksumFile and fileName are constructed from merged.ConstructFilename(s.path, key, fileType), so they're derived from the store path, and options-layer path traversal validation happens upstream. But CodeQL #119 was specifically about raw os.Remove being outside the os.Root sandbox — these two are the same pattern as the original finding, just in Del rather than the post-delete cleanup. It's not clear why removeStorePath wasn't used consistently here while the rest of Del uses root.Remove. Worth checking whether CodeQL re-runs against these two.

This is the one item where "addressed" is accurate for the original finding but the same class of issue survives elsewhere in the file. Not a new regression from this PR — Del was unchanged from before — but if CodeQL fired on the hash-prefix dir remove, it may fire here too.

3. NFS / fsyncMode URL param — RESOLVED

parseFsyncMode exists at file.go:79–90. It parses "" and "full" both as fsyncModeFull, "data" as fsyncModeData, "none" as fsyncModeNone, and returns a ConfigurationError for anything else (case-insensitive via strings.ToLower). Default is full — confirmed: parseFsyncMode("") returns fsyncModeFull, nil.

The three modes are documented at the type declaration (file.go:55–77) with explicit operator guidance: what each mode trades off, when to use data (NFS with expensive dir fsync), when to accept none (operator opts out of crash safety). The comment is accurate and appropriately explicit about the tradeoff.

Behaviour verified against code:

  • fsyncModeNone: syncAndCloseTempFile skips file.Sync() at file.go:1292 (if s.fsyncMode != fsyncModeNone). syncParentDirBestEffort returns immediately at file.go:1326 (if s.fsyncMode != fsyncModeFull). Both fsyncs skipped. Correct.
  • fsyncModeData: syncAndCloseTempFile runs file.Sync() (condition is != None, which passes). syncParentDirBestEffort returns immediately (!= Full). File content durable, directory entry not. Correct.
  • fsyncModeFull: both run. Correct.

BenchmarkSetFromReader_FsyncModes (file_test.go:767) calls real f.Set() for all three modes across 256B, 4KB, 64KB payloads. It uses a real temp dir and real fsync; on Linux CI the full/data modes invoke actual file.Sync(). The benchmark is meaningful for regression detection of the fsync schedule.

4. Bench numbers for native-op claim — SKIPPED (moot after scope split, as noted)

5. syncParentDirBestEffort doc framing — RESOLVED

The godoc at file.go:1309–1324 is clear. The first two paragraphs establish that on ext4/xfs/btrfs this is correctness-critical, not optional. The third paragraph explicitly scopes "best-effort" to failure handling only:

"The 'best-effort' framing applies only to failure handling: directory sync is not supported uniformly across platforms... so a failure to open or sync the parent directory is logged at debug level rather than turning an already-successful rename into a failed blob write."

This directly addresses the concern. The comment also preemptively warns against removal: "Removing this call regresses crash safety on those filesystems; future maintainers should not delete it under the assumption that 'best effort' implies optional."


ordishs' post-cec145fc0 items

renameTempFile cross-platform semantics: the test at file_test.go:673 (TestRenameTempFile_OverwriteAndRejectSemantics) documents the POSIX contract in the test comment and exercises both allowOverwrite values. The renameTempFile godoc at file.go:1355–1377 explains the POSIX vs non-POSIX divergence and names the TOCTOU window explicitly. Ordishs' concern is addressed.

Concurrent-writer test: TestSetFromReader_ConcurrentWritersNeverExposePartialFinal (file_test.go:588) runs 8 writer goroutines against a single key with AllowOverwrite=true for 500ms while a reader polls os.ReadFile on the final path and asserts byte-identical reads. This does exercise the atomic-publication invariant. The reader uses os.ReadFile rather than the store's Get, which is actually stronger: it verifies the kernel-visible atomicity of the rename, not just the store's API contract. One note: 500ms wall-clock is short; a heavily loaded CI machine could give this test very few reader-observed-writes. It's not a weak test, but it's also not a deterministic stress test — it'll pass even with zero successful reads as long as nothing errors. Worth keeping in mind.

big.NewInt(math.MaxInt64): at file.go:1257, rand.Int(rand.Reader, big.NewInt(math.MaxInt64)) generates a random int64-range integer for temp file naming. rand.Int returns [0, max), so the upper bound is math.MaxInt64 - 1. This is fine — the value is used only as a filename discriminator, not for cryptographic purposes, and the range is large enough to make collisions negligible with 10 retries. Ordishs flagged it as something to double-check; it's correct.


New observations

Raw os.Remove in Del (filing separately from item 2): Both os.Remove(checksumFile) (line 1187) and os.Remove(fileName) (line 1191) bypass os.Root. The rest of Del routes through removeStorePath via root.Remove. This inconsistency is pre-existing and not introduced by this PR, but it's the same pattern CodeQL fired on. If CodeQL scans Del, expect it to reopen the finding. The fix would be replacing those two calls with s.removeStorePath(...).

Health check uses raw os.Remove: lines 553 and 574 both call os.Remove on paths constructed from s.path and the system temp name. These are health-check temp files created in the store root, so the path is controlled, but they're still outside the os.Root boundary. Again pre-existing, not from this PR.

Sonar 64.8% coverage gate: ordishs' framing is correct — renameTempFile's non-POSIX branch (the allowOverwrite remove-and-retry path), syncAndCloseTempFile's close-after-sync-failure branch, and syncParentDirBestEffort's error-on-open branch are all only reachable on non-Linux platforms or via injected I/O errors. A tempfile that fails to sync or close is hard to trigger without fault injection; the renameTempFile Windows branch is physically unreachable on Linux. These aren't coverage gaps the author can close without either platform-specific builds or error-injection hooks. The 64.8% gate failure is a structural issue with the gate threshold, not with the code or tests.


Verdict

Lifting CHANGES_REQUESTED. The three blocking items are resolved as claimed. Two non-blocking follow-ups:

  1. Non-blocking: Del's two raw os.Remove calls for the blob file and checksum file (file.go:1187, 1191) are the same pattern as CodeQL #119 in a different method. Pre-existing, not introduced here. Consider routing through removeStorePath in a follow-up PR to close the CodeQL class consistently — the hash-prefix dir cleanup in this same method already does it correctly.

  2. Non-blocking: concurrent-writer test is real but time-bounded (500ms); it passes with zero observed reads. Consider adding a read counter assertion (require.Greater(t, reads, 0)) to ensure the race actually runs before the timeout elapses.

@icellan

icellan commented May 15, 2026

Copy link
Copy Markdown
Contributor Author

Re: "merge blocker — go.mod missing the BSV Aerospike fork in the direct-require block."

This doesn't reproduce against current PR HEAD (ac99a44). The direct require is present at go.mod:13–17 (with the explanatory comment about wire opcodes 200/201), pinned at v8.7.1-bsv2, and the upstream aerospike/aerospike-client-go/v8 v8.4.2 is correctly demoted to // indirect (pulled in transitively via testcontainers-aerospike-go).

Verification on the PR head:

$ go build ./stores/utxo/aerospike/    →  exit 0, no output
$ go mod tidy                          →  no changes to go.mod or go.sum

History also lines up — commit 1f4ee916 ("aerospike: switch import path to BSV fork") moved the dependency into the direct-require block at v8.7.1-bsv1, and a67035f4 bumped to -bsv2. Both predate the review.

Most likely the failing local build was against a stale checkout from before 1f4ee916 landed, or a GOWORK/GOFLAGS=-mod=mod quirk mis-resolved the path. Removing the merge blocker on this point.

The other items in that review (CodeQL #119 os.Remove not wrapped in os.Root, NFS fsync risk note, missing concurrent-writer test, doc tweak on syncParentDirBestEffort) still stand and will be addressed separately.

@icellan icellan merged commit e2c10f2 into main May 15, 2026
30 checks passed
@icellan icellan deleted the fix/file-store-atomic-publication branch May 15, 2026 15:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants