Skip to content

Realtime get from in-memory segment when possible#64504

Merged
dnhatn merged 14 commits intoelastic:masterfrom
dnhatn:get-in-memory
Nov 10, 2020
Merged

Realtime get from in-memory segment when possible#64504
dnhatn merged 14 commits intoelastic:masterfrom
dnhatn:get-in-memory

Conversation

@dnhatn
Copy link
Copy Markdown
Member

@dnhatn dnhatn commented Nov 2, 2020

If the reader wrapper is specified, then we can't perform a realtime get using operations from translog. With this change, we will create an in-memory Lucene segment from that indexing operation and perform a realtime get from that segment to avoid refresh storms.

@dnhatn dnhatn added >enhancement :Distributed/Engine Anything around managing Lucene and the Translog in an open shard. v8.0.0 v7.11.0 labels Nov 2, 2020
@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-distributed (:Distributed/Engine)

@elasticmachine elasticmachine added the Team:Distributed Meta label for distributed team. label Nov 2, 2020
Copy link
Copy Markdown
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dnhatn , this looks good. I have only looked at the main part of the PR for now and have a single comment/question.

Copy link
Copy Markdown
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few smaller comments, otherwise looking good.


@Override
public CacheHelper getReaderCacheHelper() {
return in.getReaderCacheHelper();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might mostly be for my understanding, but since the wrapped directory reader could return different results from the inner reader, should we not return null here?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. We should return null here as we modify the liveDocs. I addressed it in eb968e4.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to reverted this and added a comment instead. Please see: e176bf2

@dnhatn
Copy link
Copy Markdown
Member Author

dnhatn commented Nov 9, 2020

@henningandersen Are you okay with the latest changes?

Copy link
Copy Markdown
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@dnhatn
Copy link
Copy Markdown
Member Author

dnhatn commented Nov 10, 2020

@henningandersen @jpountz Thanks for your reviews.

@dnhatn dnhatn merged commit 33b408f into elastic:master Nov 10, 2020
@dnhatn dnhatn deleted the get-in-memory branch November 10, 2020 00:38
dnhatn added a commit that referenced this pull request Nov 10, 2020
If the reader wrapper is specified, then we can't perform a realtime get
using operations from translog. With this change, we will create an
in-memory Lucene segment from that indexing operation and perform a
realtime get from that segment to avoid refresh storms.
dnhatn added a commit that referenced this pull request Nov 10, 2020
ywelsch added a commit that referenced this pull request Jul 1, 2021
Reading from translog during a realtime get requires special handling in some higher level components, e.g.
ShardGetService, where we're doing a bunch of tricks to extract other stored fields from the source. Another issue with
the current approach relates to #74227 where we introduce a new "field usage tracking" directory wrapper that's always
applied, and we want to make sure that we can still quickly do realtime gets from translog without creating an in-memory
index of the document, even when this directory wrapper exists.

This PR introduces a directory reader that contains a single translog indexing operation. This can be used during a
realtime get to access documents that haven't been refreshed yet. In the normal case, all information relevant to resolve
the realtime get is mocked out to provide fast access to _id and _source. In case where more values are requested (e.g.
access to other stored fields) etc., this reader will index the document into an in-memory Lucene segment that is
created on-demand.

Relates #64504
ywelsch added a commit that referenced this pull request Jul 1, 2021
Reading from translog during a realtime get requires special handling in some higher level components, e.g.
ShardGetService, where we're doing a bunch of tricks to extract other stored fields from the source. Another issue with
the current approach relates to #74227 where we introduce a new "field usage tracking" directory wrapper that's always
applied, and we want to make sure that we can still quickly do realtime gets from translog without creating an in-memory
index of the document, even when this directory wrapper exists.

This PR introduces a directory reader that contains a single translog indexing operation. This can be used during a
realtime get to access documents that haven't been refreshed yet. In the normal case, all information relevant to resolve
the realtime get is mocked out to provide fast access to _id and _source. In case where more values are requested (e.g.
access to other stored fields) etc., this reader will index the document into an in-memory Lucene segment that is
created on-demand.

Relates #64504
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed/Engine Anything around managing Lucene and the Translog in an open shard. >enhancement Team:Distributed Meta label for distributed team. v7.11.0 v8.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants