Do not execute listeners when progress is updated to the end of the range #108095

Merged
tlrx merged 6 commits into elastic:main from
tlrx:2024/04/30/do-not-execute-listeners-when-progress-is-updated-to-end
May 30, 2024

Conversation

@tlrx
Member

@tlrx tlrx commented Apr 30, 2024

ProgressListenableActionFuture executes listeners once the progress is updated to a value the listener is waiting for.

But when a listener waits for the exact end of a range/gap, there is no need to execute it at the time the progress is updated to the end of the range. Instead, we can leave the listener in place; it will be executed when the range/gap is completed (which should happen just after).

I'd like to propose this change because we can have read listeners that use CacheFile#tryRead, which relies on SparseFileTracker#checkAvailable and the volatile SparseFileTracker#complete field, which is only updated when the range is completed, just before executing listeners.

If listeners are executed when the progress is updated to the end of the range, they may call tryRead before the SparseFileTracker#complete field is updated, causing the fast read path to fail.
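The intended behavior can be illustrated with a minimal, hypothetical model (simplified names, a single range, no concurrency; not the real ProgressListenableActionFuture, which extends PlainActionFuture<Long>):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.LongConsumer;

/** Hypothetical, simplified sketch of the change; not the real class. */
class ProgressFutureSketch {
    private record Waiter(long position, LongConsumer listener) {}

    private final long end;
    private long progress;
    private final List<Waiter> waiters = new ArrayList<>();

    ProgressFutureSketch(long start, long end) {
        this.end = end;
        this.progress = start;
    }

    /** Registers a listener waiting for {@code position} to be reached. */
    void addListener(long position, LongConsumer listener) {
        if (position <= progress) {
            listener.accept(progress); // already satisfied
        } else {
            waiters.add(new Waiter(position, listener));
        }
    }

    /** Progress update: listeners waiting for the exact end are left alone. */
    void onProgress(long progressValue) {
        progress = progressValue;
        if (progressValue == end) {
            return; // reached the end of the range: onResponse will complete them
        }
        fireListenersUpTo(progressValue);
    }

    /** Range/gap completed: the remaining listeners are executed now. */
    void onResponse(long result) {
        progress = end;
        fireListenersUpTo(end);
    }

    int pendingListeners() {
        return waiters.size();
    }

    private void fireListenersUpTo(long value) {
        Iterator<Waiter> it = waiters.iterator();
        while (it.hasNext()) {
            Waiter w = it.next();
            if (w.position() <= value) {
                it.remove();
                w.listener().accept(value);
            }
        }
    }
}
```

In this sketch, a listener waiting for the exact end of the range is only ever executed from onResponse, i.e. after the gap is marked complete, which is the point of the change.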

@tlrx tlrx marked this pull request as ready for review May 7, 2024 09:12
@elasticsearchmachine elasticsearchmachine added the needs:triage Requires assignment of a team area label label May 7, 2024
@tlrx tlrx added >non-issue :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs labels May 7, 2024
@elasticsearchmachine elasticsearchmachine added Team:Distributed Meta label for distributed team. and removed needs:triage Requires assignment of a team area label labels May 7, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@tlrx tlrx requested review from fcofdez, kingherc and ywangd May 30, 2024 10:48
@arteam arteam self-requested a review May 30, 2024 11:19
*/
class ProgressListenableActionFuture extends PlainActionFuture<Long> {

private record PositionAndListener(Long position, ActionListener<Long> listener) {}
Contributor

Nit: I believe position can be a primitive long?

Member Author

Yes, I pushed 9b857ab. Thanks!

Contributor

@henningandersen henningandersen left a comment

LGTM.

This makes sense to do. I wonder if we should also separately try to progress the complete marker to allow the fast path read while filling the rest of the region with data, before the gap completes (i.e., based on progress signals)?

*/
class ProgressListenableActionFuture extends PlainActionFuture<Long> {

private record PositionAndListener(Long position, ActionListener<Long> listener) {}
Contributor

Suggested change
private record PositionAndListener(Long position, ActionListener<Long> listener) {}
private record PositionAndListener(long position, ActionListener<Long> listener) {}

Member Author

I pushed 9b857ab, thanks

@tlrx tlrx merged commit d367c11 into elastic:main May 30, 2024
@tlrx tlrx deleted the 2024/04/30/do-not-execute-listeners-when-progress-is-updated-to-end branch May 30, 2024 15:00
@tlrx
Member Author

tlrx commented May 30, 2024

Thanks Artem and Henning!

I wonder if we should also separately try to progress the complete marker to allow the fast path read while filling the rest of the region with data, before the gap completes (i.e., based on progress signals)?

I was looking at something similar, yes. In the meantime, I opened #109212.

Comment on lines +82 to +84
if (progressValue == end) {
return; // reached the end of the range, listeners will be completed by {@link #onResponse(Long)}
}
Member

Sorry for my ignorance. If a listener waits for a progressValue lower than the end, it will still be invoked before onResponse is called. Is this not a problem? If the listener also uses tryRead to access data, can it fail at SparseFileTracker#checkAvailable because complete is not updated? Or maybe it is not possible for such a listener to go through the tryRead route? In fact, I am not even sure how a listener waiting for the entire range can subsequently go through the tryRead route. Any pointer is appreciated!

Member Author

Sorry for my ignorance.

No ignorance here, just good questions :)

If a listener waits for a progressValue lower than the end, it will still be invoked before onResponse is called. Is this not a problem?

Not really a problem for the current read operation; it waited for a given range because the range wasn't available, and it will be executed as soon as it is. It may be an issue for a following tryRead (if any), which may also go down the slow read path (because complete may not be updated at the time it executes): it will also wait for the range to be available, or execute immediately if the range is covered by the write progress. Note that Lucene often uses 1 kB or 4 kB buffers to read files, so Lucene can usually make one or more reads before having to wait a bit more.

I'm looking at ways to optimize the complete field update in my spare time. Last time we talked about this, it was too costly to update it on every write on all ranges, but we can maybe find a good compromise by updating it on a write only when the written range is guaranteed to make complete advance (that is, when all ranges before the one we write are already available).

Contributor

A couple options spring to mind:

  1. Only update complete in association with firing listeners; if we tell nobody about the progress, there is no need to update complete.
  2. We can check that the start of the range/gap is >= complete and update complete to progress if it is.
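One reading of option 2 can be sketched as a guard on the progress signal (hypothetical names, not the real SparseFileTracker API; the condition below assumes "start of the gap is at or before complete" is the intended check, so the marker never skips over an unfilled hole):

```java
/** Hypothetical sketch of advancing a completion marker on progress signals. */
class CompleteMarkerSketch {
    /**
     * Returns the new value of the complete marker after a write reported
     * {@code progress} into a gap starting at {@code gapStart}.
     */
    static long maybeAdvanceComplete(long complete, long gapStart, long progress) {
        // Everything before gapStart is available only if gapStart <= complete,
        // so advancing is safe exactly in that case (and only forward).
        if (gapStart <= complete && progress > complete) {
            return progress; // contiguous from the start: safe to advance
        }
        return complete; // a hole remains, or no forward progress: unchanged
    }
}
```

With this guard, a write into a gap that begins beyond the current marker leaves complete untouched, while a write that extends the contiguous prefix moves it forward.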

Member Author

Thanks Henning, those are two good ideas which I had not thought of. Worth investigating.

Member Author

@henningandersen I created the draft #109247, let me know what you think 🙏🏻

tlrx added a commit to tlrx/elasticsearch that referenced this pull request May 31, 2024
…le failing

The test required some adjustments now that listeners
are not completed on a progress update if they are
waiting for the exact end of the gap.

Relates elastic#108095
Closes elastic#109237
tlrx added a commit that referenced this pull request May 31, 2024
…le failing (#109239)

The test required some adjustments now that listeners
are not completed on a progress update if they are
waiting for the exact end of the gap.

Relates #108095
Closes #109237

Labels

:Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >non-issue Team:Distributed Meta label for distributed team. v8.15.0

5 participants