Do not execute listeners when progress is updated to the end of the range #108095

Merged
tlrx merged 6 commits into elastic:main from
tlrx:2024/04/30/do-not-execute-listeners-when-progress-is-updated-to-end
May 30, 2024

Conversation

@tlrx
Member

@tlrx tlrx commented Apr 30, 2024

ProgressListenableActionFuture executes listeners once the progress is updated to a value the listener is waiting for.

But when a listener waits for the exact end of a range/gap, there is no need to execute it at the time the progress is updated to the end of the range. Instead, we can leave the listener in place; it will be executed when the range/gap is completed (which should happen just after).

I'd like to propose this change because we can have read listeners that use CacheFile#tryRead, which relies on SparseFileTracker#checkAvailable and the volatile SparseFileTracker#complete field, which is only updated when the range is completed, just before executing listeners.

If listeners are executed when the progress is updated to the end of the range, they may call tryRead before the SparseFileTracker#complete field is updated, causing the fast read path to fail.
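The intended behavior can be illustrated with a minimal, hypothetical model (simplified names, a single range, no concurrency; not the real ProgressListenableActionFuture, which extends PlainActionFuture<Long>):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.LongConsumer;

/** Hypothetical, simplified sketch of the change; not the real class. */
class ProgressFutureSketch {
    private record Waiter(long position, LongConsumer listener) {}

    private final long end;
    private long progress;
    private final List<Waiter> waiters = new ArrayList<>();

    ProgressFutureSketch(long start, long end) {
        this.end = end;
        this.progress = start;
    }

    /** Registers a listener waiting for {@code position} to be reached. */
    void addListener(long position, LongConsumer listener) {
        if (position <= progress) {
            listener.accept(progress); // already satisfied
        } else {
            waiters.add(new Waiter(position, listener));
        }
    }

    /** Progress update: listeners waiting for the exact end are left alone. */
    void onProgress(long progressValue) {
        progress = progressValue;
        if (progressValue == end) {
            return; // reached the end of the range: onResponse will complete them
        }
        fireListenersUpTo(progressValue);
    }

    /** Range/gap completed: the remaining listeners are executed now. */
    void onResponse(long result) {
        progress = end;
        fireListenersUpTo(end);
    }

    int pendingListeners() {
        return waiters.size();
    }

    private void fireListenersUpTo(long value) {
        Iterator<Waiter> it = waiters.iterator();
        while (it.hasNext()) {
            Waiter w = it.next();
            if (w.position() <= value) {
                it.remove();
                w.listener().accept(value);
            }
        }
    }
}
```

In this sketch, a listener waiting for the exact end of the range is only ever executed from onResponse, i.e. after the gap is marked complete, which is the point of the change.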

@tlrx tlrx marked this pull request as ready for review May 7, 2024 09:12
@elasticsearchmachine elasticsearchmachine added the needs:triage Requires assignment of a team area label label May 7, 2024
@tlrx tlrx added >non-issue :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs labels May 7, 2024
@elasticsearchmachine elasticsearchmachine added Team:Distributed Meta label for distributed team. and removed needs:triage Requires assignment of a team area label labels May 7, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@tlrx tlrx requested review from fcofdez, kingherc and ywangd May 30, 2024 10:48
@arteam arteam self-requested a review May 30, 2024 11:19
*/
class ProgressListenableActionFuture extends PlainActionFuture<Long> {

private record PositionAndListener(Long position, ActionListener<Long> listener) {}
Contributor

Nit: I believe position can be a primitive long?

Member Author

Yes, I pushed 9b857ab. Thanks!

Contributor

@henningandersen henningandersen left a comment

LGTM.

This makes sense to do. I wonder if we should also separately try to progress the complete marker to allow the fast path read while filling the rest of the region with data, before the gap completes (i.e., based on progress signals)?

*/
class ProgressListenableActionFuture extends PlainActionFuture<Long> {

private record PositionAndListener(Long position, ActionListener<Long> listener) {}
Contributor

Suggested change
private record PositionAndListener(Long position, ActionListener<Long> listener) {}
private record PositionAndListener(long position, ActionListener<Long> listener) {}

Member Author

I pushed 9b857ab, thanks

@tlrx tlrx merged commit d367c11 into elastic:main May 30, 2024
@tlrx tlrx deleted the 2024/04/30/do-not-execute-listeners-when-progress-is-updated-to-end branch May 30, 2024 15:00
@tlrx
Member Author

tlrx commented May 30, 2024

Thanks Artem and Henning!

I wonder if we should also separately try to progress the complete marker to allow the fast path read while filling the rest of the region with data, before the gap completes (i.e., based on progress signals)?

I was looking at something similar, yes. In the meantime, I opened #109212.

Comment on lines +82 to +84
if (progressValue == end) {
return; // reached the end of the range, listeners will be completed by {@link #onResponse(Long)}
}
Member

Sorry for my ignorance. If a listener waits for a progressValue lower than the end, it will still be invoked before onResponse is called. Is this not a problem? If the listener also uses tryRead to access data, can it fail at SparseFileTracker#checkAvailable because complete is not updated? Or maybe it is not possible for such a listener to go through the tryRead route? In fact, I am not even sure how a listener waiting for the entire range can subsequently go through the tryRead route. Any pointer is appreciated!

Member Author

Sorry for my ignorance.

No ignorance here, just good questions :)

If a listener waits for a progressValue lower than the end, it will still be invoked before onResponse is called. Is this not a problem?

Not really a problem for the current read operation; it waited for a given range because the range wasn't available, and it will be executed as soon as it is. It may be an issue for a following tryRead (if any), which may also go down the slow read path (because complete may not be updated at the time it executes): it will also wait for the range to be available, or execute immediately if the range is covered by the write progress. Note that Lucene often uses 1 kB or 4 kB buffers to read files, so Lucene can usually make one or more reads before having to wait a bit more.

I'm looking at ways to optimize the complete field update in my spare time. Last time we talked about this, it was too costly to update it on every write on all ranges, but we can maybe find a good compromise by updating it on a write only when the written range is guaranteed to make complete advance (that is, when all ranges before the one we write are already available).

Contributor

A couple options spring to mind:

  1. Only update complete in association with firing listeners; if we tell nobody about the progress, there is no need to update complete.
  2. We can check that the start of the range/gap is >= complete and update complete to progress if it is.
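One reading of option 2 can be sketched as a guard on the progress signal (hypothetical names, not the real SparseFileTracker API; the condition below assumes "start of the gap is at or before complete" is the intended check, so the marker never skips over an unfilled hole):

```java
/** Hypothetical sketch of advancing a completion marker on progress signals. */
class CompleteMarkerSketch {
    /**
     * Returns the new value of the complete marker after a write reported
     * {@code progress} into a gap starting at {@code gapStart}.
     */
    static long maybeAdvanceComplete(long complete, long gapStart, long progress) {
        // Everything before gapStart is available only if gapStart <= complete,
        // so advancing is safe exactly in that case (and only forward).
        if (gapStart <= complete && progress > complete) {
            return progress; // contiguous from the start: safe to advance
        }
        return complete; // a hole remains, or no forward progress: unchanged
    }
}
```

With this guard, a write into a gap that begins beyond the current marker leaves complete untouched, while a write that extends the contiguous prefix moves it forward.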

Member Author

Thanks Henning, those are two good ideas which I had not thought of. Worth investigating.

Member Author

@henningandersen I created the draft #109247, let me know what you think 🙏🏻

tlrx added a commit to tlrx/elasticsearch that referenced this pull request May 31, 2024
…le failing

The test required some adjustments now that listeners
are not completed on a progress update if they are
waiting for the exact end of the gap.

Relates elastic#108095
Closes elastic#109237
tlrx added a commit that referenced this pull request May 31, 2024
…le failing (#109239)

The test required some adjustments now that listeners
are not completed on a progress update if they are
waiting for the exact end of the gap.

Relates #108095
Closes #109237

Labels

:Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >non-issue Team:Distributed Meta label for distributed team. v8.15.0

5 participants