Skip to content

Frozen Prototype File Chunk Writing/Reading Infrastructure#68270

Merged
original-brownbear merged 8 commits intoelastic:frozen-protofrom
original-brownbear:frozen-proto-just-infra
Feb 1, 2021
Merged

Frozen Prototype File Chunk Writing/Reading Infrastructure#68270
original-brownbear merged 8 commits intoelastic:frozen-protofrom
original-brownbear:frozen-proto-just-infra

Conversation

@original-brownbear
Copy link
Copy Markdown
Contributor

This adds all the necessary infrastructure to use the reusable, single-file cache in practice:

  • Create cache file in a data directory instead of a temp directory
  • Fully pre-allocate it (the existing solution would at least on Linux still do a sparse allocation)
  • Manage file channel resource by ref counting
  • Add minimal abstraction in place of exposing FileChannel to allow for adjusting the concrete paging approach under the hood in a follow-up

I'd open a follow-up for the more complicated logical changes introducing two page sizes based on this (almost done with those, but I hope/think this gives us what we need for initial experiments meanwhile and is easier to review in isolation).

@original-brownbear original-brownbear added the :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs label Feb 1, 2021
@elasticmachine elasticmachine added the Team:Distributed Meta label for distributed team. label Feb 1, 2021
@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@Override
protected void closeInternal() {
try {
IOUtils.close(fileChannel, path == null ? null : () -> Files.deleteIfExists(path));
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't super sure here, as long as we don't persist the contents of this file in a way that allows for reusing it I figured there was no point in keeping it around across restarts (except for making start-up faster). It also felt safer to delete here. Otherwise configuring a large file and then failing to allocate it (because of concurrent disk space usage changes or so) would cause the disk to be left full even though the node fails to start up.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we don't need to persist this file across restarts. We should also make sure it's cleaned up when fileSize == 0 after restart

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ I pushed 046f998

final CacheKey cacheKey = createCacheKey(file.physicalName());
cacheService.removeFromCache(cacheKey);
frozenCacheService.removeFromCache(cacheKey);
if (partial) {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to add this to make some tests pass with today's changes.

@original-brownbear
Copy link
Copy Markdown
Contributor Author

Jenkins run elasticsearch-ci/bwc (known JDBC issue)

// write one byte at the end of the file to make sure all bytes are allocated
fileChannel.write(ByteBuffer.allocate(1), fileSize - 1);
for (Path path : environment.dataFiles()) {
// TODO: be resilient to this check failing and try next path?
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we would split this up in some form across the available data paths?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess so (on the splitting up) but I wonder if we should even continue to start up if a given data path is broken?
I certainly see the potential performance benefit of using multiple disks and it's probably nice to be a little more even in the disk use as well.
But we should do this in the follow-up if we want it once we're using multiple file channels I think to not blow this up too much.

if (io == null || io.tryIncRef() == false) {
final IO newIO;
boolean success = false;
incRef();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tryIncref already incremented by 1?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for SharedBytes not IO. We have a ref count on the SharedBytes here that is incremented for each IO we created and one for the IO itself. The IO will decrement shared bytes once its fully released.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, we should have tests for these eventually as well.

@Override
protected void closeInternal() {
try {
IOUtils.close(fileChannel, path == null ? null : () -> Files.deleteIfExists(path));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we don't need to persist this file across restarts. We should also make sure it's cleaned up when fileSize == 0 after restart

Copy link
Copy Markdown
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@original-brownbear
Copy link
Copy Markdown
Contributor Author

Jenkins run elasticsearch-ci/1 (unrelated but important https://gradle-enterprise.elastic.co/s/whnbotirrvizq) I'll look into it

Copy link
Copy Markdown
Member

@tlrx tlrx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@original-brownbear
Copy link
Copy Markdown
Contributor Author

Thanks Yannick + Tanguy!

@original-brownbear original-brownbear merged commit fdfbe5d into elastic:frozen-proto Feb 1, 2021
@original-brownbear original-brownbear deleted the frozen-proto-just-infra branch February 1, 2021 13:11
@original-brownbear original-brownbear restored the frozen-proto-just-infra branch April 18, 2023 21:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs Team:Distributed Meta label for distributed team.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants