refactor: rewrite batch mode to use temp directory #142
Conversation
* Rewrite batch mode to use temp directory: write batch data to a temp directory instead of memory to avoid high RAM usage with large files. Each batch creates its own temp directory, writes blocks directly to disk, and renames all files atomically on commit.
* Use slice instead of map for batch puts
* Add DiscardableBatch interface with Discard method
* Add read operations for batch iface
* fix: remove unused function and variable to fix CI checks
  * Removed unused tempFileOnce() method
  * Replaced unused 'done' variable with '_' in Commit method
  * Removed some local files
* perf: optimize batch read operations to skip filesystem checks. Only check the temp directory for keys that exist in the puts slice.
* feat: implement batch Query with temp directory merging
* Make batch Put operations async (address PR review feedback)
* fix Batch init
* Update flatfs.go
Triage: Open PR in kubo and test on kubo staging.
Upgrade go-ds-flatfs to a version that uses temporary files to store items added to batches. See: ipfs/go-ds-flatfs#142
- add godoc for maxConcurrentPuts explaining why 16 was chosen
- fix incorrect O(n) claims for Get/Has/GetSize (actually O(1) via putSet map)
lidel
left a comment
LGTM, as long as we address two things:
- Updated godoc (bd468ee) to reflect the latest version, but unsure about one thing – see inline.
- IIUC two concurrent goroutines calling `os.Create()` and `Write()` on the same file can error or corrupt data – see inline.
Co-authored-by: Marcin Rataj <lidel@lidel.org>
flatfs.go (Outdated)

```go
	return
}

file, err := os.Create(tempFile)
```
Bit worried about this on Windows: I suspect concurrent async writes (multiple ipfs add) may cause file handle issues, things like golang/go#34681 – flatfs uses https://github.com/alexbrainman/goissue34681 for a reason; we likely need a similar wrapper on Windows here (the old code used tempFileOnce from util_windows.go).
@gammazero too late on my end to write code, but we probably need to create a createFile equivalent that does something similar (pseudocode):

util_unix.go:

```go
func createFile(name string) (*os.File, error) {
	return os.Create(name)
}
```

util_windows.go:

```go
func createFile(name string) (*os.File, error) {
	return goissue34681.OpenFile(name, os.O_RDWR|os.O_CREATE|os.O_TRUNC, 0666)
}
```

Then finally here:

```go
file, err := createFile(tempFile)
```

Windows concurrency is janky; unsure if we need a retry wrapper, maybe we can go without it, but in case, it could be something like:

util_windows.go:

```go
func createFile(name string) (*os.File, error) {
	var file *os.File
	var err error
	for i := 0; i < RetryAttempts; i++ {
		file, err = createFileOnce(name)
		if err == nil || !isTooManyFDError(err) {
			break
		}
		time.Sleep(time.Duration(i+1) * RetryDelay)
	}
	return file, err
}
```
Done, without retry wrapper.
Triage:
@lidel Included batch concurrency test
```go
// Skip duplicate keys to prevent concurrent goroutines from writing to the
// same temp file. Without this check, two Put calls with the same key
// would spawn goroutines that race on os.Create/Write, potentially
// corrupting the file contents.
if _, exists := bt.putSet[key]; exists {
	bt.mu.Unlock()
	<-bt.asyncPutGate // Release semaphore slot acquired above
	return nil
```
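The guard above can be illustrated in isolation. Below is a minimal, self-contained sketch (not the PR's actual code: the miniBatch type and its fields are hypothetical stand-ins mirroring the names in the diff, and the "write" is simulated with a counter):

```go
package main

import (
	"fmt"
	"sync"
)

// miniBatch sketches the duplicate-key guard: a putSet map records keys
// already scheduled, and a buffered channel acts as a semaphore bounding
// the number of concurrent writer goroutines.
type miniBatch struct {
	mu           sync.Mutex
	putSet       map[string]struct{}
	asyncPutGate chan struct{}
	wg           sync.WaitGroup
	writes       int // counts actual writes; guarded by mu
}

func newMiniBatch(maxConcurrent int) *miniBatch {
	return &miniBatch{
		putSet:       make(map[string]struct{}),
		asyncPutGate: make(chan struct{}, maxConcurrent),
	}
}

func (b *miniBatch) Put(key string) {
	b.asyncPutGate <- struct{}{} // acquire a semaphore slot
	b.mu.Lock()
	if _, exists := b.putSet[key]; exists {
		// Duplicate key: discard, so two goroutines never race on the
		// same temp file (first-successful-writer-wins).
		b.mu.Unlock()
		<-b.asyncPutGate // release the slot acquired above
		return
	}
	b.putSet[key] = struct{}{}
	b.mu.Unlock()

	b.wg.Add(1)
	go func() {
		defer b.wg.Done()
		defer func() { <-b.asyncPutGate }()
		// Stand-in for writing the temp file to disk.
		b.mu.Lock()
		b.writes++
		b.mu.Unlock()
	}()
}

func main() {
	b := newMiniBatch(16)
	for i := 0; i < 100; i++ {
		b.Put("same-key") // all duplicates after the first
	}
	b.wg.Wait()
	fmt.Println(b.writes) // 1
}
```

Only the first Put for a given key spawns a writer; every later Put with that key drains its semaphore slot and returns without touching the file.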
Hm.. the Query method documentation says:

> Keys Put multiple times appear only once (last write wins)

BUT the duplicate key handling here seems to silently discard updates? This means:

```go
batch.Put(ctx, key, []byte("first"))
batch.Put(ctx, key, []byte("second"))
batch.Commit(ctx) // writes "first", not "second"
```

@gammazero late Friday so maybe I misunderstood the intent here, but would it be more intuitive for the batch to implement "last write wins" semantics instead of "first write wins"?
My main worry is that the current behavior could cause subtle bugs for callers expecting standard transaction semantics where later operations override earlier ones.
The thinking was that duplicate keys should have the same data (assuming the key is a CID), so instead of blocking/overwriting when the key already exists, we discard the write. If "last write wins" is needed within a batch, because a write with the same key could carry different data than a previous write, then we would need to switch to a blocking/overwriting implementation or stop doing asynchronous writes.
Should the batch behavior be configurable? Last-write-wins or discard-duplicate-write?
Thanks, I stepped back and did some research on this.
TL;DR: Seems that I was wrong; this is not a general-purpose datastore. The current behavior (first-successful-writer-wins, discard duplicates) is correct for flatfs's intended use case (limited to CID→block), but we should document it more explicitly (for humans and LLMs).
Why this is fine for flatfs

1. flatfs is explicitly a special-purpose datastore for CID:block pairs only. Kubo's docs already state: "flatfs must only be used as a block store (mounted at /blocks) as it only partially implements the datastore interface".
2. CIDs guarantee value determinism. A CID is a cryptographic hash of the content. If Put(cid, A) and Put(cid, B) are called with the same CID, then A == B by definition (or one is corrupted). First-writer-wins and last-writer-wins produce identical results.
3. Non-batch Put already has this behavior. The existing doWriteOp documents: "we assume that the first succeeding operation on that key was the last one to happen after all successful others".
4. The go-datastore Batch interface doesn't guarantee ordering. The interface docs say batches "do NOT have transactional semantics".
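Point 2 can be made concrete: under content addressing, the key is derived from the value, so two Puts to the same key necessarily carry the same bytes. A sketch using a raw sha256 digest in place of a full CID (a real CID adds multihash/multicodec framing on top of the digest):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// key derives a content address from the block bytes, standing in for a
// real CID computation.
func key(block []byte) string {
	sum := sha256.Sum256(block)
	return hex.EncodeToString(sum[:])
}

func main() {
	a := []byte("hello block")
	b := []byte("hello block")
	c := []byte("different block")

	// Identical content always yields the identical key. So whenever
	// key(a) == key(b), Put(key(a), a) and Put(key(b), b) store the same
	// bytes, and first-writer-wins and last-writer-wins are equivalent.
	fmt.Println(key(a) == key(b)) // true
	fmt.Println(key(a) == key(c)) // false
}
```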
Next steps
I've pushed 410cfd2 with documentation updates that make the "first-successful-writer-wins" semantics explicit in:
- README.md restrictions section
- Batch type documentation
- Put method documentation
It also notes that users needing last-writer-wins for non-deterministic values should use leveldb or pebble instead.
If we are ok with "first-successful-writer-wins" for flatfs, and it being limited to cid→block use, I think this should be enough. Thoughts @gammazero?
I agree with "first-successful-writer-wins" for this limited purpose datastore, and given rule 2 above, it is the most correct behavior.
- clarify flatfs is for content-addressed storage only and point users to leveldb/pebble for mutable data needing last-writer-wins
- document batch reuse contract per go-datastore spec
- fix TOCTOU race in temp dir creation using MkdirAll
- log warnings on temp dir cleanup failures
lidel
left a comment
LGTM. Previous concerns were addressed:
- docs clarified
- TOCTOU race fixed with MkdirAll
- temp dir cleanup now logs warnings on failure (so users debugging get useful info)
- batch reuse contract documented (#142 (comment))
- did some manual testing and race checks; so far all good
@gammazero if #142 (comment) sounds right, feel free to merge. Up to you if you want to tag a release, or switch kubo to the latest commit from master for now for RC testing, until the final release.
* datastore: upgrade go-ds-flatfs to v0.6.0. See: ipfs/go-ds-flatfs#142
* docs(changelog): add go-ds-flatfs atomic batch writes
  * documents the new flatfs batch implementation that uses atomic operations via a temp directory, preventing orphan blocks on interrupted imports and reducing memory usage
  * includes improved tests, batch cleanup fixes, and docs
* docs(changelog): reframe go-ds-flatfs entry for users; focus on user benefits instead of implementation details
Continued from #136
Credit to @requilence for most of this PR
Problem
When uploading large files through UnixFS or other IPLD structures, the current batch implementation keeps everything in memory and writes blocks individually. If the application crashes mid-process, partial blocks remain in FlatFS, making garbage collection extremely difficult since there's no way to identify which blocks belong to incomplete operations.
Solution
This PR rewrites the batch mode to use a temporary directory approach:
- BatchReader interface implementing Get/Has/GetSize methods. This is essential for IPLD structures, which need to verify block links during construction. Without batch reads, you cannot build complex IPLD structures that reference blocks added earlier in the same batch. This follows standard database transaction semantics where reads within a transaction see uncommitted writes.

Implementation Details