Conversation
1. Modern storage devices (e.g., SSDs) tend to be highly parallel.
2. Allows us to read and write at the same time (avoids pausing while flushing).

fixes #898 (comment)

License: MIT
Signed-off-by: Steven Allen <steven@stebalien.com>
We may want to reduce the parallelism. However, we should probably test badger first (it may work better with increased parallelism).
This makes adds with sync enabled almost as fast as adds with sync disabled with the badger datastore (and the same
This is really nice, great find here :) |
merkledag/batch.go
    )

    // ParallelBatchCommits is the number of batch commits that can be in-flight before blocking.
    // TODO: Experiment with multiple datastores, storage devices, and CPUs to find
Can you create issues for these instead of in-code TODOs? (They just get forgotten.)
Fine! (...grumble... we'll never get to it anyways)
That might be true, but it's still better than having TODOs in code.
As you said in your issue, someone might just take a stab at it out of pure boredom.
You're right, I was just being lazy 🙂.
merkledag/batch.go
    }(t.blocks)

    t.activeCommits++
    t.blocks = nil
I would preallocate a buffer of MaxBlocks here, as appending will expand the buffer and cause more allocations and copies.
Technically, the max size of this array is 128 pointers to blocks (2KiB). However, it will likely never be greater than 32 pointers (0.5KiB) assuming that we have 256KiB blocks. Does 32 sound like a reasonable default size?
Personally, I don't think that will make much of a difference. We already do 1 allocation per block so this will only add another log(n) allocations.
Actually, I just preallocated a blocks array of the same size as the one we just filled. That should be a reasonable guess.
As Go allocates the next power of two, getting to 128 would be 9 reallocations and copies. IMO it is worth it.
You're right, my log(n) estimate was incorrect anyways: it's log(n) per batch but still O(n) overall (7-15% allocation overhead depending on the block sizes).
It's probably safe to assume that this buffer will be about the same size each flush. This could cause 1 extra allocation (if this is the last commit) but that's unlikely to be an issue.

License: MIT
Signed-off-by: Steven Allen <steven@stebalien.com>
After further testing, the effect isn't nearly so pronounced for medium-size files (the tests above were on single large files) and is probably non-existent for small files, as we create a new batch per file. We should consider using the same batch when adding multiple small files.
(ipfs/kubo#4296) 1. Modern storage devices (i.e., SSDs) tend to be highly parallel. 2. Allows us to read and write at the same time (avoids pausing while flushing). fixes ipfs/kubo#898 (comment)
This makes `ipfs add --local` ~3.5x faster with the flatfs datastore (untested with badger).

fixes #898 (comment)
License: MIT
Signed-off-by: Steven Allen <steven@stebalien.com>