Skip to content

refactor: rewrite batch mode to use temp directory#142

Merged
gammazero merged 12 commits intomasterfrom
batch-temp-dir
Jan 13, 2026
Merged

refactor: rewrite batch mode to use temp directory#142
gammazero merged 12 commits intomasterfrom
batch-temp-dir

Conversation

@gammazero
Copy link
Contributor

@gammazero gammazero commented Nov 14, 2025

Continued from #136
Credit to @requilence for most of this PR

Problem

When uploading large files through UnixFS or other IPLD structures, the current batch implementation keeps everything in memory and writes blocks individually. If the application crashes mid-process, partial blocks remain in FlatFS, making garbage collection extremely difficult since there's no way to identify which blocks belong to incomplete operations.

Solution

This PR rewrites the batch mode to use a temporary directory approach:

  1. Atomic batch operations: All blocks in a batch are written to a temporary directory first, then renamed to their final locations only on commit. If the process crashes, no partial data remains in the main datastore - the temp directory is cleaned on restart.
  2. Memory efficiency: Instead of keeping all data in memory, blocks are written directly to temp files, enabling large file uploads without high RAM usage.
  3. Read operations in batch: Added BatchReader interface implementing Get/Has/GetSize methods. This is essential for IPLD structures which need to verify block links during construction. Without batch reads, you cannot build complex IPLD structures that reference blocks added earlier in the same batch. This follows standard database transaction semantics where reads within a transaction see uncommitted writes.

Implementation Details

  • Each batch creates a unique temp directory under .temp/
  • Files are written without sharding in temp
  • On commit, files are atomically renamed to their sharded destinations
  • On discard or crash, temp files are cleaned up (on the next startup). This code was already existing
  • Thread-safe with mutex protection
  • Batch can be reused after commit/discard (because this behavior of Batch was before my changes)

requilence and others added 2 commits November 14, 2025 11:10
* Rewrite batch mode to use temp directory

Write batch data to temp directory instead of memory to avoid high RAM usage with large files. Each batch creates its own temp directory, writes blocks directly to disk, and renames all files atomically on commit.

* Use slice instead of map for batch puts
* Add DiscardableBatch interface with Discard method
* add read operations for batch iface
* fix: remove unused function and variable to fix CI checks
* Removed unused tempFileOnce() method
* Replaced unused 'done' variable with '_' in Commit method
* remove some local files
* perf: optimize batch read operations to skip filesystem checks

Only check temp directory for keys that exist in puts slice

* feat: implement batch Query with temp directory merging
* make batch Put operations async

Address PR review feedback

* fix Batch init
* Update flatfs.go
@lidel lidel mentioned this pull request Nov 18, 2025
67 tasks
@hsanjuan hsanjuan requested review from hsanjuan and lidel November 25, 2025 15:24
@gammazero
Copy link
Contributor Author

Triage: Open PR in kubo and test on kubo staging.

gammazero added a commit to ipfs/kubo that referenced this pull request Dec 4, 2025
Upgrade go-ds-flatfs to version that uses uses temproary files to store items added to batches.

See: ipfs/go-ds-flatfs#142
- add godoc for maxConcurrentPuts explaining why 16 was chosen
- fix incorrect O(n) claims for Get/Has/GetSize (actually O(1) via putSet map)
Copy link
Member

@lidel lidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm, as long we address two things:

  1. Updated godoc (bd468ee) to reflect the latest version, but unsure about one thing – see inline.
  2. IIUC two concurrent goroutines calling os.Create() and Write() on the same file can error or corrupt data – see inline.

gammazero and others added 2 commits December 8, 2025 19:43
Co-authored-by: Marcin Rataj <lidel@lidel.org>
flatfs.go Outdated
return
}

file, err := os.Create(tempFile)
Copy link
Member

@lidel lidel Dec 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bit worried about this on Windows: i suspect concurrent async writes (multiple ipfs add) may cause file handle issues, things like golang/go#34681 – flatfs uses https://github.com/alexbrainman/goissue34681 for a reason, we likely need similar wrapper on windows here (old code used tempFileOnce from util_windows.go).

@gammazero too late on my end to write code, but we probably need to create createFile equivalent that does something similar (pseudocode):

util_unix.go:

  func createFile(name string) (*os.File, error) {
  	return os.Create(name)
  }

util_windows.go:

  func createFile(name string) (*os.File, error) {
  	return goissue34681.OpenFile(name, os.O_RDWR|os.O_CREATE|os.O_TRUNC, 0666)
  }

Then finally here:

  file, err := createFile(tempFile)

Windows concurrency is janky, unsure if we need a retry wrapper, maybe can go without it, but in case, it could be something like:

util_windows.go:

  func createFile(name string) (*os.File, error) {
  	var file *os.File
  	var err error
  	for i := 0; i < RetryAttempts; i++ {
  		file, err = createFileOnce(name)
  		if err == nil || !isTooManyFDError(err) {
  			break
  		}
  		time.Sleep(time.Duration(i+1) * RetryDelay)
  	}
  	return file, err
  }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, without retry wrapper.

@hsanjuan
Copy link
Contributor

hsanjuan commented Dec 9, 2025

Triage:

  • Should test windows well. Perhaps stress-test with a test in this repo. Potential problem with file with duplicate blocks writing in parallel etc.

@gammazero gammazero requested a review from lidel December 20, 2025 04:21
@gammazero
Copy link
Contributor Author

@lidel Included batch concurrency test TestConcurrentDuplicateBatchWrites (last test in batch_test.go). Let me know if this looks sufficient.

Comment on lines +1209 to +1216
// Skip duplicate keys to prevent concurrent goroutines from writing to the
// same temp file. Without this check, two Put calls with the same key
// would spawn goroutines that race on os.Create/Write, potentially
// corrupting the file contents.
if _, exists := bt.putSet[key]; exists {
bt.mu.Unlock()
<-bt.asyncPutGate // Release semaphore slot acquired above
return nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm.. the Query method documentation says:

Keys Put multiple times appear only once (last write wins)

BUT the duplicate key handling in here seem to silently discard updates?

This means:

batch.Put(ctx, key, []byte("first"))
batch.Put(ctx, key, []byte("second"))
batch.Commit(ctx)  // writes "first", not "second"

@gammazero late friday so maybe I misunderstood intent here, but would it be more intuitive for the batch to implement "last write wins" semantics instead of "first write wins"?

My main worry is that the current behavior could cause subtle bugs for callers expecting standard transaction semantics where later operations override earlier ones.

Copy link
Contributor Author

@gammazero gammazero Jan 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking was that duplicate keys should have the same data (assuming key is CID), so instead of blocking/overwriting when key already exists, discard the write instead. If "last write wins" is needed within a batch, because a write with same key could have different data than previous write, then we need change to change behavior to block/overwrite implementation or not do asynchronous writes.

Should the batch behavior be configurable? Last-write-wins or discard-duplicate-write

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I stepped back and did some research on this.

TL;DR: Seems that I was wrong, this is not a genera-purpose datastore: the current behavior (first-successful-writer-wins, discard duplicates) is correct for flatfs's intended use case (limited to CID→block), but we should document it more explicitly (for humans and LLMs).

Why this is fine for flatfs

  1. flatfs is explicitly a special-purpose datastore for CID:block pairs only - Kubo's docs already state: "flatfs must only be used as a block store (mounted at /blocks) as it only partially implements the datastore interface"

  2. CIDs guarantee value determinism - A CID is a cryptographic hash of the content. If Put(cid, A) and Put(cid, B) are called with the same CID, then A == B by definition (or one is corrupted). First-writer-wins and last-writer-wins produce identical results.

  3. Non-batch Put already has this behavior - The existing doWriteOp documents: "we assume that the first succeeding operation on that key was the last one to happen after all successful others"

  4. go-datastore Batch interface doesn't guarantee ordering - The interface docs say batches "do NOT have transactional semantics"

Next steps

I've pushed 410cfd2 with documentation updates that make the "first-successful-writer-wins" semantics explicit in :

  • README.md restrictions section
  • Batch type documentation
  • Put method documentation

It also points users needing last-writer-wins for non-deterministic values should use leveldb, or pebble instead.

If we are ok with "first-successful-writer-wins" for flatfs, and it being limited to cid→block use, I think this should be enough. Thoughts @gammazero?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with "first-successful-writer-wins" for this limited purpose datastore, and given rule 2 above, it is the most correct behavior.

@gammazero gammazero requested a review from lidel January 12, 2026 15:29
lidel added 2 commits January 12, 2026 19:58
clarify flatfs is for content-addressed storage only and point users
to leveldb/pebble for mutable data needing last-writer-wins.
- document batch reuse contract per go-datastore spec
- fix TOCTOU race in temp dir creation using MkdirAll
- log warnings on temp dir cleanup failures
@lidel lidel changed the title Rewrite batch mode to use temp directory (#136) refactor: rewrite batch mode to use temp directory (#136) Jan 12, 2026
@lidel lidel changed the title refactor: rewrite batch mode to use temp directory (#136) refactor: rewrite batch mode to use temp directory Jan 12, 2026
Copy link
Member

@lidel lidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Previous concerns were addressed:

  • docs clarified
  • TOCTOU race fixed with MkdirAll
  • temp dir cleanup now logs warnings on failure (so users debugging get useful info)
  • batch reuse contract documented (#142 (comment))
  • did some manual testing and races, so far all good

@gammazero if #142 (comment) sounds right, feel free to merge. Up to you if you wan to tag a release, or switch kubo to latest commit from master for now for RC testign, until final release.

@gammazero gammazero merged commit 5f67d0e into master Jan 13, 2026
9 checks passed
@gammazero gammazero deleted the batch-temp-dir branch January 13, 2026 04:26
gammazero added a commit to ipfs/kubo that referenced this pull request Jan 13, 2026
* datastore: upgrade go-ds-flatfs to v0.6.0
See: ipfs/go-ds-flatfs#142
* docs(changelog): add go-ds-flatfs atomic batch writes
*documents the new flatfs batch implementation that uses atomic
operations via temp directory, preventing orphan blocks on interrupted
imports and reducing memory usage.
* includes improved tests, batch cleanup fixes, and docs
* docs(changelog): reframe go-ds-flatfs entry for users
focus on user benefits instead of implementation details
@lidel lidel added the iteration/2026-q1 On maintainer radar for Q1 2026 label Feb 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

iteration/2026-q1 On maintainer radar for Q1 2026

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants