FUSE: reflect deduplication in allocated blocks #184
dnnr wants to merge 1 commit into jborg:master from dnnr:fuse-filesizes
Instead of giving all files a fixed block count of 1, this assigns each deduplicated chunk to a certain file. In effect, the cumulative file size that is shown in the mountpoint accurately reflects the amount of actual disk space needed for the repository (barring metadata overhead). Although the block assignment is done arbitrarily, depending on the user's access pattern, the sizes will be consistent within the entire mount point. This facilitates the use of tools like du and ncdu for inspecting the actual disk usage in a repository as opposed to just looking at the original, uncompressed, non-deduplicated file sizes.
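The assignment described above can be sketched roughly as follows. This is only an illustration, not the code from this PR: the `BlockAllocator` class, its `blocks_for` method, and the `(chunk_id, stored_size)` pairs are hypothetical names, and it assumes that `stored_size` is the on-disk size of a chunk and that blocks are counted in 512-byte units, as `st_blocks` conventionally is.

```python
import math

BLOCK_SIZE = 512  # assumption: st_blocks is counted in 512-byte units


class BlockAllocator:
    """Hypothetical helper that charges each deduplicated chunk to one file."""

    def __init__(self):
        # chunk id -> path of the file that was charged for that chunk
        self.owner = {}

    def blocks_for(self, path, chunks):
        """Return a block count for `path` from its (chunk_id, stored_size) pairs."""
        charged = 0
        seen = set()  # a chunk repeated inside one file is counted only once
        for chunk_id, stored_size in chunks:
            # The first file that references a chunk claims it. Which file
            # that is depends on access order, so the split is arbitrary.
            self.owner.setdefault(chunk_id, path)
            if self.owner[chunk_id] == path and chunk_id not in seen:
                seen.add(chunk_id)
                charged += stored_size
        return math.ceil(charged / BLOCK_SIZE)


alloc = BlockAllocator()
a_blocks = alloc.blocks_for("a.txt", [("c1", 1024), ("c2", 512)])
# b.txt shares chunk c1 with a.txt, so only c3 is charged to it
b_blocks = alloc.blocks_for("b.txt", [("c1", 1024), ("c3", 100)])
print(a_blocks, b_blocks)
```

Because the owner of a chunk is remembered, asking for the same file again yields the same count, and summing the counts over all files approximates the space taken by the unique chunks, which is what a tool like du adds up.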
Can we have some opinions on this PR? Is there a chance that this might confuse users if the block counts are more or less random compared to the original file size?
On the one hand, yes. But on the other hand, those values are currently simply set to 1, i.e., they're mostly wrong and meaningless anyway. More importantly, I'd say that the semantics of that field are actually correct this way. It's supposed to represent the "size used on disk" and is therefore supposed to be potentially arbitrarily different from the nominal file size, precisely because of the effects of compression, deduplication, sparse files, or whatever else is going on in the underlying file system. So of course someone might claim to be confused by those values, but I actually can't think of any better way of populating