
Remove Specific Atomic Write Method from BlobContainer#43329

Closed
original-brownbear wants to merge 2 commits into elastic:master from original-brownbear:remove-non-atomic-write

Conversation

@original-brownbear
Contributor

  • Extracted from Remove BlobContainer APIs that Are not Universally Applicable #42886
  • Having both write and writeAtomic makes no sense. On cloud providers all writes are atomic, and on FS and HDFS the added cost of a move and a directory fsync operation is hardly relevant.
  • On the other hand, allowing for partial writes (though extremely unlikely in practice) simply has the potential of creating more garbage blobs that aren't even deserializable. This change neatly ensures that every blob with a non-temp name has been fully written and fsynced (in the case of NFS + HDFS).

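The write-temp-blob-then-move pattern this PR makes the default can be sketched roughly as follows. This is a minimal illustration under assumed names (AtomicWriteSketch, writeAtomic, the temp- prefix), not the actual FsBlobContainer implementation; the directory fsync is best-effort, as it is not supported on all platforms:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public final class AtomicWriteSketch {

    // Hypothetical helper showing the pattern: write to a temp- blob,
    // fsync it, atomically rename it, then fsync the parent directory.
    public static void writeAtomic(Path dir, String blobName, byte[] data) throws IOException {
        Path tempBlob = dir.resolve("temp-" + blobName);
        Path finalBlob = dir.resolve(blobName);
        try (FileChannel channel = FileChannel.open(tempBlob,
                StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE)) {
            channel.write(ByteBuffer.wrap(data));
            channel.force(true); // fsync file contents and metadata before the rename
        }
        // Atomic rename: readers see either the whole blob under its final name or nothing
        Files.move(tempBlob, finalBlob, StandardCopyOption.ATOMIC_MOVE);
        // Best-effort fsync of the parent directory so the rename itself is durable;
        // opening a directory for READ and forcing it works on Linux but may fail elsewhere
        try (FileChannel dirChannel = FileChannel.open(dir, StandardOpenOption.READ)) {
            dirChannel.force(true);
        } catch (IOException e) {
            // ignore: directory fsync is not supported on this platform
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("blobs");
        writeAtomic(dir, "snap-1.dat", new byte[] {1, 2, 3});
    }
}
```

The point of the PR is that if this is the only write path, any crash mid-write leaves behind only a temp- blob, never a partially written blob under its final name.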
@original-brownbear original-brownbear added >non-issue :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v8.0.0 v7.3.0 labels Jun 18, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

Member

@tlrx tlrx left a comment


LGTM, thanks!

@original-brownbear
Contributor Author

@tlrx thanks!

@ywelsch can you double check this one please? I think this needs 2 reviewers :)

@ywelsch
Contributor

ywelsch commented Jun 20, 2019

I think this brings the risk now of creating more garbage files at the top-level, which are not cleaned up.

@original-brownbear
Contributor Author

original-brownbear commented Jun 21, 2019

@ywelsch

I think this brings the risk now of creating more garbage files at the top-level, which are not cleaned up.

I don't think that's true. We are currently not cleaning up top-level files anyway; on broken snapshots or deletes we still leak top-level snap-{}.dat blobs etc. All this changes is that we'd be leaking temp- blobs instead of non-temp-named blobs. If anything, I'd say that makes it easier to run cleanup :) The number of blobs definitely does not go up, in my opinion.

PS: I will open a PR to clean up the top level situation very shortly anyway :)

@jpountz jpountz added v7.4.0 and removed v7.3.0 labels Jul 3, 2019
@ywelsch
Contributor

ywelsch commented Jul 4, 2019

I think this brings the risk now of creating more garbage files at the top-level, which are not cleaned up.

I don't think that's true. We are currently not cleaning up top level files anyway, on broken snapshots or deletes we still leak top level snap-{}.dat etc

You're right.

on FS and HDFS the added cost of a move and directory fsync operation is hardly relevant.

On HDFS the cost will be higher. Each metadata request can take a few hundred milliseconds to be processed by a NameNode, so this might make snapshotting on HDFS slower.

Another problem is that the write_once semantics of blobs will be harder to guarantee. The move operation on FS as well as HDFS does not guarantee the same semantics as CREATE_NEW.

The Javadocs for Files.move say that

If the target file exists then it is implementation specific if the existing file is replaced or this method fails by throwing an {@link IOException}.

This is also why we're doing a best-effort check in FSBlobContainer.moveBlobAtomic.
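The best-effort check being referenced can be illustrated as follows. This is a simplified sketch of the idea, not the real FSBlobContainer.moveBlobAtomic; the class name MoveBlobSketch is assumed for illustration:

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public final class MoveBlobSketch {

    // Best-effort guard against overwriting an existing blob. Because
    // ATOMIC_MOVE does not provide CREATE_NEW semantics, another writer
    // can still create `target` between the exists() check and the move
    // (a check-then-act race), so this only approximates write_once.
    public static void moveBlobAtomic(Path source, Path target) throws IOException {
        if (Files.exists(target)) {
            throw new FileAlreadyExistsException(target.toString());
        }
        Files.move(source, target, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

By contrast, a plain write with StandardOpenOption.CREATE_NEW fails atomically at the filesystem level if the target already exists, which is the stronger guarantee the rename-based path gives up.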

I'm a bit torn on this. On one hand I think removing the extra method is good. On the other hand, it gives us worse guarantees and slower backups on HDFS.

@original-brownbear
Contributor Author

On HDFS the cost will be higher. Each metadata request can take a few hundred milliseconds to be processed by a NameNode, so this might make snapshotting on HDFS slower.

Yeah, that's true, but in the end, how relevant is this to overall performance in relative terms?

In the end we're doubling the number of metadata requests (not even that, because we already use atomic writes in a bunch of spots) while keeping everything else constant. Especially if we also get #42791 in (parallelizing snapshots on the file level rather than the shard level), I think the effect will not be that relevant from the user's perspective. But even without it, I'd be surprised if metadata requests were the bulk of the snapshot cost at the moment, and if they aren't, then the effect isn't that big.

Another problem is that the write_once semantics of blobs will be harder to guarantee.

See my other PR on this topic: #42886. I don't think we should work with write_once any more, due to the difficulties it introduces when it comes to S3 (and some NFS setups).

@colings86 colings86 added v7.5.0 and removed v7.4.0 labels Aug 30, 2019
@original-brownbear
Contributor Author

original-brownbear commented Sep 24, 2019

Closing this, as it will become irrelevant as a result of #46250 and follow-ups (since we'll use the cluster state for consistency of the repository contents), and we can simply remove the atomic-write methods once they become unused.

@original-brownbear original-brownbear removed :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >non-issue v7.5.0 v8.0.0 labels Sep 24, 2019