Add CacheFile#fsync() method to ensure cached data are written on disk #64201
tlrx merged 9 commits into elastic:master
Conversation
Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)
    try {
        // check again if the file is evicted before doing expensive I/O
        ensureOpen();
        IOUtils.fsync(file, false); // TODO don't forget to fsync parent directory
This method is intended to be called periodically by a task that will fsync all cache files of a given shard before fsyncing the parent directory.
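The periodic per-shard workflow described here — fsync every cache file of a shard, then the parent directory once at the end — could look roughly like the sketch below. This is illustrative only; the class and method names (`ShardCacheSync`, `fsyncShard`) are not the PR's code, and directory fsync is treated as best effort the way Lucene's `IOUtils` does, since not every platform allows opening a directory channel.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ShardCacheSync {

    // Fsync a regular cache file's data (metadata sync is not required here).
    static void fsyncFile(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
            channel.force(false);
        }
    }

    // Fsync the directory itself so newly created cache files survive a crash.
    // Some platforms cannot open a directory for fsync; ignore failures there.
    static void fsyncDirectory(Path dir) {
        try (FileChannel channel = FileChannel.open(dir, StandardOpenOption.READ)) {
            channel.force(true);
        } catch (IOException e) {
            // best effort: directory fsync is unsupported on e.g. Windows
        }
    }

    // Fsync all cache files of a shard, then the parent directory once.
    static int fsyncShard(Path shardCacheDir) throws IOException {
        int synced = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(shardCacheDir)) {
            for (Path file : files) {
                fsyncFile(file);
                synced++;
            }
        }
        fsyncDirectory(shardCacheDir); // parent directory last
        return synced;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("shard-cache");
        Files.write(dir.resolve("chunk-0"), new byte[] {1, 2, 3});
        Files.write(dir.resolve("chunk-1"), new byte[] {4, 5, 6});
        System.out.println("synced " + fsyncShard(dir) + " cache files");
    }
}
```

Syncing the directory after the files matters on POSIX filesystems: a file's data can be durable while its directory entry is not, so a crash could lose the file name even though its bytes were flushed.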
original-brownbear left a comment
Just two questions, looks good overall
    try {
        // check again if the file is evicted before doing expensive I/O
        ensureOpen();
        IOUtils.fsync(file, false); // TODO don't forget to fsync parent directory
I wonder, can't we just call force() on the file channel that we already have open, if we have one, and avoid the fsync if we don't have one by fsyncing before closing the channel? (We would have to enhance acquireFileChannelReference to return null in case there are no eviction listeners on the current instance, I guess.) There's no need to get a new file descriptor here, maybe?
Yes, I also thought about borrowing the current file channel if we have one, but I found that it added more complexity (mostly in incref/decref-ing the file channel another time) to save a file descriptor that would be quickly released.
I can maybe give it another try by implementing something that uses the file channel if there is one, or IOUtils.fsync() otherwise. Deferring the fsync to closing time does not seem like the right thing to do, as we don't want to fsync every time a chunk is written but instead control when cache files are fsynced.
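The trade-off being discussed — reusing an already-open channel versus paying for a short-lived descriptor — can be sketched as follows. This is a hypothetical illustration, not the PR's code; `FsyncChannelReuse` and its return values exist only for this example.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class FsyncChannelReuse {

    // If the cache file already has an open channel, force() it directly;
    // otherwise open (and immediately release) a descriptor just for the fsync.
    static String fsync(Path file, FileChannel openChannelOrNull) throws IOException {
        if (openChannelOrNull != null) {
            openChannelOrNull.force(false); // no new file descriptor needed
            return "reused-channel";
        }
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
            channel.force(false); // short-lived descriptor, released right away
        }
        return "fresh-descriptor";
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("cache", ".bin");
        Files.write(file, new byte[] {42});
        System.out.println(fsync(file, null));
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
            System.out.println(fsync(file, channel));
        }
    }
}
```

As the thread notes, the second branch is simpler to reason about (no extra incref/decref on the shared channel) at the cost of one descriptor that is released immediately.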
    reference.decRef();
    }
    gap.onCompletion();
    fsynced.set(false);
Might it be better to set this to false as early as possible (i.e. directly after the write to the file) to reduce contention, statistically speaking? (Since fsync and write will always block each other anyway, even if we use multiple FileChannel instances, I think.)
I like that fsynced not only indicates that something was written to the file but also that the ranges are updated - although that is not a requirement. I'm not sure it makes a big difference, but I can move this earlier.
Ah that's a good point :)
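The flag ordering being debated can be sketched with a small stand-in class (illustrative names, not the PR's code): the dirty flag is cleared immediately after the write, before the gap/range bookkeeping, and the fsync side flips it back with a compare-and-set so only one thread performs the sync.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class CacheFileDirtyFlag {

    // true means "nothing to sync"; a freshly created file starts clean
    private final AtomicBoolean fsynced = new AtomicBoolean(true);

    void writeChunk() {
        // ... bytes are written to the file channel here ...
        fsynced.set(false); // mark dirty as early as possible, directly after the write
        // ... gap completion / completed-ranges bookkeeping happens after ...
    }

    boolean fsyncIfNeeded() {
        if (fsynced.compareAndSet(false, true)) {
            // ... an IOUtils.fsync-style call would go here ...
            return true; // this thread performed the fsync
        }
        return false; // already clean, nothing to do
    }

    public static void main(String[] args) {
        CacheFileDirtyFlag cacheFile = new CacheFileDirtyFlag();
        System.out.println(cacheFile.fsyncIfNeeded()); // false: nothing written yet
        cacheFile.writeChunk();
        System.out.println(cacheFile.fsyncIfNeeded()); // true: first sync after a write
        System.out.println(cacheFile.fsyncIfNeeded()); // false: already synced
    }
}
```

Clearing the flag right after the write shrinks the window in which a concurrent fsync could observe the file as clean while new bytes are already on their way to it.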
original-brownbear left a comment
Either way (regarding the file channels) LGTM; that comment only matters if the fsync interval we end up using is small, I guess :)
henningandersen left a comment
I added a few comments, looking good otherwise.
    final List<Tuple<Long, Long>> completedRanges = tracker.getCompletedRanges();
    assert completedRanges != null;

    if (fsynced.compareAndSet(false, true)) {
I think there is an assumption that only one thread ever gets here, since otherwise we risk a concurrent thread seeing the completedRanges inconsistently. I think that is fair, but perhaps for safety we should either lock the entire method or assert something to capture this?
I think this is OK, but your comment and your suggestion about returning an empty set of ranges if the cache file got evicted made me change the semantics of this new fsync method a bit.
I pushed d55278e, which makes the method return a non-empty set of completed ranges only if the cache file was actually fsynced on disk; otherwise it returns an empty collection. This addresses the case of concurrent threads and also simplifies the logic for knowing whether new ranges of data were effectively synchronized on disk.
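The changed semantics can be sketched like this (a hypothetical stand-in, not the code from d55278e; `Long` range starts stand in for `Tuple<Long, Long>` ranges): fsync() returns completed ranges only when this particular call performed the sync, while concurrent callers and evicted files observe an empty set.

```java
import java.util.Collections;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.concurrent.atomic.AtomicBoolean;

public class FsyncSemantics {

    private final AtomicBoolean needsFsync = new AtomicBoolean(false);
    private volatile boolean evicted = false;
    private final SortedSet<Long> completedRangeStarts = new TreeSet<>();

    synchronized void markWritten(long rangeStart) {
        completedRangeStarts.add(rangeStart);
        needsFsync.set(true);
    }

    void evict() {
        evicted = true;
    }

    SortedSet<Long> fsync() {
        if (evicted) {
            return Collections.emptySortedSet(); // nothing to persist for an evicted file
        }
        if (needsFsync.compareAndSet(true, false)) {
            // ... the actual disk fsync would happen here ...
            synchronized (this) {
                return new TreeSet<>(completedRangeStarts); // ranges this call made durable
            }
        }
        return Collections.emptySortedSet(); // another thread already synced, or no new data
    }

    public static void main(String[] args) {
        FsyncSemantics cacheFile = new FsyncSemantics();
        cacheFile.markWritten(0L);
        cacheFile.markWritten(10L);
        System.out.println(cacheFile.fsync()); // [0, 10]
        System.out.println(cacheFile.fsync()); // []
        cacheFile.evict();
        System.out.println(cacheFile.fsync()); // []
    }
}
```

The compare-and-set on `needsFsync` is what resolves the concurrency concern: at most one thread wins the transition and reads the ranges, so callers never need to coordinate externally.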
    try {
        // check again if the file is evicted before doing expensive I/O
        ensureOpen();
        IOUtils.fsync(file, false); // TODO don't forget to fsync parent directory
This also fsyncs the metadata of the file. I think that is unnecessary?
Agreed - it is an extra I/O that we don't need. I pushed 6eea900
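The distinction in question maps to the `boolean` argument of `FileChannel.force`: `force(true)` also flushes file metadata (size, timestamps), while `force(false)` flushes the data only, saving one I/O when only the contents matter. A minimal demonstration (illustrative, not the PR's code):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ForceDataOnly {

    // Write a few bytes and flush the data only; returns the resulting size.
    static long writeAndForce(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
            channel.write(ByteBuffer.wrap(new byte[] {1, 2, 3}));
            channel.force(false); // data only: sufficient for cached chunks
        }
        return Files.size(file);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("cache", ".bin");
        System.out.println(writeAndForce(file)); // 3
    }
}
```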
    public void testFSync() throws Exception {
        try (FSyncTrackingFileSystemProvider fileSystem = setupFSyncCountingFileSystem()) {
            final CacheFile cacheFile = new CacheFile("test", randomLongBetween(100, 1000), fileSystem.resolve("test"));
            assertFalse(cacheFile.isFSynced());
I think counting the file as not fsync'ed when it is just created could lead to fsync'ing the empty file, which I think is unnecessary? We care mostly about the data, not the files.
    if (randomBoolean()) {
        final long position = i * 10L; // simplify the test by completing small non-contiguous ranges
        final Tuple<Long, Long> range = Tuple.tuple(position, position + 1L);
        cacheFile.populateAndRead(
Would be good to also verify multiple populateAndRead calls within one fsync?
Sure, I randomized this in the test now.
    * @throws AlreadyClosedException if the cache file is evicted
    * @throws java.nio.file.NoSuchFileException if the cache file does not exist
    */
    public List<Tuple<Long, Long>> fsync() throws IOException {
Should this return a SortedSet instead to signal the expected order verified by the test, as well as uniqueness?
This is a good suggestion, thanks. I pushed 7b3590e
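The point of the suggestion is that a `SortedSet` makes ordering and uniqueness part of the return type's contract instead of a property only the tests verify. A minimal sketch (illustrative; `long[2]` arrays stand in for `Tuple<Long, Long>` and the comparator keys only on the range start):

```java
import java.util.Comparator;
import java.util.SortedSet;
import java.util.TreeSet;

public class SortedRanges {

    // A set of [start, end] ranges, iterated in ascending order of start.
    // Note: keying on start alone treats two ranges with the same start as
    // duplicates, which fits ranges produced by a gap tracker.
    static SortedSet<long[]> newRangeSet() {
        return new TreeSet<>(Comparator.comparingLong((long[] range) -> range[0]));
    }

    public static void main(String[] args) {
        SortedSet<long[]> ranges = newRangeSet();
        ranges.add(new long[] {20, 30});
        ranges.add(new long[] {0, 10});
        ranges.add(new long[] {0, 10}); // duplicate start: not added again
        System.out.println(ranges.size());     // 2
        System.out.println(ranges.first()[0]); // 0: iteration follows range order
    }
}
```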
    }

    cacheFile.startEviction();
    expectThrows(AlreadyClosedException.class, cacheFile::fsync);
This can definitely be handled in a follow-up instead, when you start using it. But it might be more appropriate to just return the empty set of ranges in this case, since we essentially have nothing cached for it, rather than having to handle the exception in the fsync'er thread?
I agree. In case of an evicted file the method will now return an empty collection. See #64201 (comment)
    if (fsynced.compareAndSet(false, true)) {
        boolean success = false;
        try {
            // check again if the file is evicted before doing expensive I/O
I wonder if this is necessary, since there is no waiting above?
@henningandersen Thanks for your review. I updated the code and changed the semantics of the fsync method a bit. Can you please have another look and tell me what you think? Thanks!
henningandersen left a comment
LGTM, thanks for the extra iteration.
    * considered as fsynced (the initialization value is {@code true}) when it is created; and writing new data to the cache file
    * will toggle the flag to {@code false}.
    **/
    private final AtomicBoolean fsynced = new AtomicBoolean(true);
I wonder if calling this needsFsync (and reverting the boolean) would be slightly more intuitive?
With the current name, a reader would be inclined to think the boolean is only true when the file is fsync'ed, but this is not true (it is also still true while writing and completing the gap).
Thanks, it makes sense. I pushed d8fdaaa2aab25e78033906f2fa8c956a6e340441.
Thanks Armin and Henning! I'm merging this only on master for now and will backport this change and the future one about the persistent cache in one go.
…iles (#64696) This commit changes the searchable snapshot's CacheService so that it now periodically fsyncs cache files using the method introduced in #64201. The synchronization is executed every 10 seconds by default (this interval can be changed using a new xpack.searchable.snapshot.cache.sync.interval setting).
elastic#64201) This commit adds a new CacheFile.fsync() method to fsync the cache file on disk. This method uses the utility method IOUtils.fsync(Path, boolean) that executes a FileChannel.force() call under the hood.
…files #64696 (#66216) This commit changes the searchable snapshot's CacheService so that it now periodically fsyncs cache files using the method introduced in #64201. The synchronization is executed every 10 seconds by default (this interval can be changed using a new xpack.searchable.snapshot.cache.sync.interval setting). Backport of #64696 for 7.11.
This pull request adds a new CacheFile.fsync() method to fsync the cache file on disk. This method uses the utility method IOUtils.fsync(Path, boolean) that executes a FileChannel.force() call under the hood.
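The periodic synchronization described in the follow-up commits — a repeating task that fsyncs cache files at a configurable interval, defaulting to 10 seconds — could be sketched as below. This is a hypothetical illustration using a plain ScheduledExecutorService, not Elasticsearch's actual scheduler; a short interval is used so the demo finishes quickly.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PeriodicCacheSync {

    // Schedule a repeating "fsync dirty cache files" task and wait until it has
    // run the requested number of times (or a safety timeout expires).
    static boolean runScheduled(int runs, long periodMillis) throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(runs);
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        try {
            scheduler.scheduleAtFixedRate(() -> {
                // ... the CacheService would fsync its dirty cache files here ...
                latch.countDown();
            }, 0, periodMillis, TimeUnit.MILLISECONDS);
            return latch.await(5, TimeUnit.SECONDS);
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runScheduled(3, 50) ? "synced 3 times" : "timed out");
    }
}
```

Combined with the fsync() semantics above (an empty result when there is nothing new to sync), such a task is cheap on idle shards: each tick only pays for the compare-and-set check on clean files.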