Copy checkpoint atomically when rolling generation#35407
Copy checkpoint atomically when rolling generation#35407DaveCTurner merged 4 commits intoelastic:masterfrom
Conversation
Today when rolling a transog generation we copy the checkpoint from `translog.ckp` to `translog-nnnn.ckp` using a simple `Files.copy()` followed by appropriate `fsync()` calls. The copy operation is not atomic, so if we crash at the wrong moment we can leave an incomplete checkpoint file on disk. In practice the checkpoint is so small that it's either empty or fully written. However, we do not correctly handle the case where it's empty when the node restarts. In contrast, in `recoverFromFiles()` we _do_ copy the checkpoint atomically. This commit extracts the atomic copy operation from `recoverFromFiles()` and re-uses it in `rollGeneration()`.
|
Pinging @elastic/es-distributed |
|
This situation occurred in https://discuss.elastic.co/t/failed-shard-after-ooming-corrupt-index/155612/3. I recognise there's no tests for this change yet, because I don't know a good way to simulate this situation. Any ideas? |
|
God I don’t know how often I looked at this code and I missed that?! Code LGTM, regarding testing I think we can add a randomly throwing FS impl that’s corrupting files if they are not fsynced. We do this in some lucene testing directories but that’s a bigger change. I am ok with getting this in as is and start the conversation on a follow up issue |
bleskes
left a comment
There was a problem hiding this comment.
LGTM. It terms of testing, we have various tests that inject errors in TranslogTests. Did you check if we already cover the case where files can be created, we run into out of disk space and leave them empty?
Today when rolling a transog generation we copy the checkpoint from `translog.ckp` to `translog-nnnn.ckp` using a simple `Files.copy()` followed by appropriate `fsync()` calls. The copy operation is not atomic, so if we crash at the wrong moment we can leave an incomplete checkpoint file on disk. In practice the checkpoint is so small that it's either empty or fully written. However, we do not correctly handle the case where it's empty when the node restarts. In contrast, in `recoverFromFiles()` we _do_ copy the checkpoint atomically. This commit extracts the atomic copy operation from `recoverFromFiles()` and re-uses it in `rollGeneration()`.
Today when rolling a transog generation we copy the checkpoint from `translog.ckp` to `translog-nnnn.ckp` using a simple `Files.copy()` followed by appropriate `fsync()` calls. The copy operation is not atomic, so if we crash at the wrong moment we can leave an incomplete checkpoint file on disk. In practice the checkpoint is so small that it's either empty or fully written. However, we do not correctly handle the case where it's empty when the node restarts. In contrast, in `recoverFromFiles()` we _do_ copy the checkpoint atomically. This commit extracts the atomic copy operation from `recoverFromFiles()` and re-uses it in `rollGeneration()`.
Today when rolling a transog generation we copy the checkpoint from
translog.ckptotranslog-nnnn.ckpusing a simpleFiles.copy()followed byappropriate
fsync()calls. The copy operation is not atomic, so if we crashat the wrong moment we can leave an incomplete checkpoint file on disk. In
practice the checkpoint is so small that it's either empty or fully written.
However, we do not correctly handle the case where it's empty when the node
restarts.
In contrast, in
recoverFromFiles()we do copy the checkpoint atomically.This commit extracts the atomic copy operation from
recoverFromFiles()andre-uses it in
rollGeneration().