Skip to content

Make SourceProvider using stored fields segment-thread-safe#95082

Merged
romseygeek merged 7 commits intoelastic:mainfrom
romseygeek:source/segment-thread-safety
Apr 12, 2023
Merged

Make SourceProvider using stored fields segment-thread-safe#95082
romseygeek merged 7 commits intoelastic:mainfrom
romseygeek:source/segment-thread-safety

Conversation

@romseygeek
Copy link
Copy Markdown
Contributor

This adds a per-segment cache to the SourceProvider returned by calling
SourceProvider.fromStoredFields(). Previously, this source provider could
only be safely accessed from a single thread. With this new cache, we now
have a separate saved Source for each segment, making this safe to use for
concurrent searches that still use a single thread for each segment.

@romseygeek romseygeek added >enhancement :Search/Search Search-related issues that do not fall into other categories v8.8.0 labels Apr 6, 2023
@romseygeek romseygeek requested review from iverase and javanna April 6, 2023 14:57
@romseygeek romseygeek self-assigned this Apr 6, 2023
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Hi @romseygeek, I've created a changelog YAML for you.

@elasticsearchmachine elasticsearchmachine added the Team:Search Meta label for search team label Apr 6, 2023
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-search (Team:Search)

@romseygeek
Copy link
Copy Markdown
Contributor Author

In terms of synchronization cost, this adds a single volatile read to each call to getSource(), which given all the IO access and decompression happening elsewhere I think is unlikely to make an appreciable difference to performance.

@romseygeek
Copy link
Copy Markdown
Contributor Author

This will only effect search-time lookups, btw, because the FetchPhase doesn't use this standalone SourceProvider implementation (except in some fairly unusual circumstances - so it does help, but isn't a complete solution). We'll need to add a similar per-segment cache to the ProvidedSourceProvider in FetchPhase to get that working, but that won't need any locking at all so should be simpler to implement.

public Source getSource(LeafReaderContext ctx, int doc) throws IOException {
LeafStoredFieldSourceProvider[] leaves = getLeavesUnderLock(ctx.parent);
if (leaves[ctx.ord] == null) {
// individual segments are only accessed on one thread so there's no need
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is true as far as concurrency is done at segment level. If/When inter segment concurrency is implemented, then we might need extra locking here? If this is true we might want to add it in the comment.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we implement within-segment concurrency then this will have to work completely differently I think. I'll add a comment though.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! 👍

Copy link
Copy Markdown
Contributor

@iverase iverase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@romseygeek romseygeek merged commit 458b937 into elastic:main Apr 12, 2023
@romseygeek romseygeek deleted the source/segment-thread-safety branch April 12, 2023 13:24
@javanna
Copy link
Copy Markdown
Contributor

javanna commented Apr 17, 2023

Quick question: does this effectively solve the concurrency issues for source based runtime fields when the search is executed concurrently across segments?

@romseygeek
Copy link
Copy Markdown
Contributor Author

@javanna it should do, yes. I remember that we muted some tests that were failing due to a lack of concurrency, but I haven't been able to find which ones they were to unmute them - @cbuescher might remember?

@cbuescher
Copy link
Copy Markdown
Member

I only remember touching concurrency with the cache cleaning around the _id metadata field recently, however afaik there wasn't a muted test for that. I will try to think about it some more but don't recall anything atm.

@iverase
Copy link
Copy Markdown
Contributor

iverase commented Apr 17, 2023

Maybe this?: #90712

romseygeek added a commit that referenced this pull request Apr 17, 2023
This commit reverts #90712, which disabled concurrency on runtime
field scripts. Since #95082 these scripts should be thread-safe, and so
we can use the standard test searcher methods which may randomly
add multi-threaded search.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>enhancement :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team v8.8.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants