Realtime get from in-memory segment when possible#64504
Realtime get from in-memory segment when possible#64504dnhatn merged 14 commits intoelastic:masterfrom
Conversation
|
Pinging @elastic/es-distributed (:Distributed/Engine) |
henningandersen
left a comment
There was a problem hiding this comment.
Thanks @dnhatn , this looks good. I have only looked at the main part of the PR for now and have a single comment/question.
server/src/main/java/org/elasticsearch/index/engine/InternalEngine.java
Outdated
Show resolved
Hide resolved
henningandersen
left a comment
There was a problem hiding this comment.
A few smaller comments, otherwise looking good.
server/src/main/java/org/elasticsearch/index/engine/SingleDocDirectoryReader.java
Outdated
Show resolved
Hide resolved
|
|
||
| @Override | ||
| public CacheHelper getReaderCacheHelper() { | ||
| return in.getReaderCacheHelper(); |
There was a problem hiding this comment.
Might mostly be for my understanding, but since the wrapped directory reader could return different results from the inner reader, should we not return null here?
There was a problem hiding this comment.
Good catch. We should return null here as we modify the liveDocs. I addressed it in eb968e4.
There was a problem hiding this comment.
I had to reverted this and added a comment instead. Please see: e176bf2
server/src/main/java/org/elasticsearch/index/engine/Engine.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/engine/Engine.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/engine/SingleDocDirectoryReader.java
Outdated
Show resolved
Hide resolved
|
@henningandersen Are you okay with the latest changes? |
server/src/test/java/org/elasticsearch/index/engine/InternalEngineTests.java
Show resolved
Hide resolved
|
@henningandersen @jpountz Thanks for your reviews. |
If the reader wrapper is specified, then we can't perform a realtime get using operations from translog. With this change, we will create an in-memory Lucene segment from that indexing operation and perform a realtime get from that segment to avoid refresh storms.
Reading from translog during a realtime get requires special handling in some higher level components, e.g. ShardGetService, where we're doing a bunch of tricks to extract other stored fields from the source. Another issue with the current approach relates to #74227 where we introduce a new "field usage tracking" directory wrapper that's always applied, and we want to make sure that we can still quickly do realtime gets from translog without creating an in-memory index of the document, even when this directory wrapper exists. This PR introduces a directory reader that contains a single translog indexing operation. This can be used during a realtime get to access documents that haven't been refreshed yet. In the normal case, all information relevant to resolve the realtime get is mocked out to provide fast access to _id and _source. In case where more values are requested (e.g. access to other stored fields) etc., this reader will index the document into an in-memory Lucene segment that is created on-demand. Relates #64504
Reading from translog during a realtime get requires special handling in some higher level components, e.g. ShardGetService, where we're doing a bunch of tricks to extract other stored fields from the source. Another issue with the current approach relates to #74227 where we introduce a new "field usage tracking" directory wrapper that's always applied, and we want to make sure that we can still quickly do realtime gets from translog without creating an in-memory index of the document, even when this directory wrapper exists. This PR introduces a directory reader that contains a single translog indexing operation. This can be used during a realtime get to access documents that haven't been refreshed yet. In the normal case, all information relevant to resolve the realtime get is mocked out to provide fast access to _id and _source. In case where more values are requested (e.g. access to other stored fields) etc., this reader will index the document into an in-memory Lucene segment that is created on-demand. Relates #64504
If the reader wrapper is specified, then we can't perform a realtime get using operations from translog. With this change, we will create an in-memory Lucene segment from that indexing operation and perform a realtime get from that segment to avoid refresh storms.