Speed up PK lookups at index time.#19856
Merged
jpountz merged 2 commits intoelastic:masterfrom Jun 15, 2017
Merged
Conversation
Contributor
Author
|
This is basically a revert of #14070. I'm stalling this on making cache keys / closed listeners less trappy. |
Collaborator
|
Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually? |
Member
|
@jpountz Is this PR worth reviving? |
Contributor
Author
|
Absolutely! |
At index time Elasticsearch needs to look up the version associated with the `_id` of the document that is being indexed, which is often the bottleneck for indexing. While reviewing the output of the `jfr` telemetry from a Rally benchmark, I saw that significant time was spent in `ConcurrentHashMap#get` and `ThreadLocal#get`. The reason is that we cache lookup objects per thread and segment, and for every indexed document, we first need to look up the cache associated with this segment (`ConcurrentHashMap#get`) and then get a state that is local to the current thread (`ThreadLocal#get`). So if you are indexing N documents per second and have S segments, both these methods will be called N*S times per second. This commit changes version lookup to use a cache per index reader rather than per segment. While this makes cache entries live for less long, we now only need to do one call to `ConcurrentHashMap#get` and `ThreadLocal#get` per indexed document.
896fa1f to
fc1db9e
Compare
Contributor
Author
|
I rebased this PR! |
s1monw
approved these changes
Jun 15, 2017
Contributor
s1monw
left a comment
There was a problem hiding this comment.
left a small comment LGTM otherwise
| IndexReader.CacheHelper cacheHelper = reader.getCoreCacheHelper(); | ||
| CloseableThreadLocal<PerThreadIDVersionAndSeqNoLookup> ctl = lookupStates.get(cacheHelper.getKey()); | ||
| private static PerThreadIDVersionAndSeqNoLookup[] getLookupState(IndexReader reader, String uidField) throws IOException { | ||
| // We cache on the top level |
Contributor
There was a problem hiding this comment.
can you leave a comment why we moved to this? you can also reference this PR for documentation purposes
Contributor
Author
There was a problem hiding this comment.
+1 I had more comments initially but they apparently got lost with the rebase
jasontedor
added a commit
to jasontedor/elasticsearch
that referenced
this pull request
Jun 15, 2017
* master: Upgrade icu4j for the ICU analysis plugin to 59.1 (elastic#25243) move assertBusy to use CheckException (elastic#25246) Use SPI in High Level Rest Client to load XContent parsers (elastic#25098) [TEST] test that low level REST client leaves path untouched (elastic#25193) Speed up PK lookups at index time. (elastic#19856) [Docs] Fix documentation for percentiles bucket aggregation (elastic#25229) Upgrade to lucene-7.0.0-snapshot-92b1783. (elastic#25222) Build: Add master flag for disabling bwc tests (elastic#25230) Scripting: Rename SearchScript.needsScores to needs_score (elastic#25235) Support script context stateful factory in Painless. (elastic#25233) FastVectorHighlighter should not cache the field query globally (elastic#25197) Remove QUERY_AND_FETCH BWC for pre-5.3.0 nodes (elastic#25223) Add more missing AggregationBuilder getters (elastic#25198) Extract the snapshot/restore full cluster restart tests from the translog full cluster restart tests (elastic#25204)
jasontedor
added a commit
to glefloch/elasticsearch
that referenced
this pull request
Jun 15, 2017
* master: (44 commits) Upgrade icu4j for the ICU analysis plugin to 59.1 (elastic#25243) move assertBusy to use CheckException (elastic#25246) Use SPI in High Level Rest Client to load XContent parsers (elastic#25098) [TEST] test that low level REST client leaves path untouched (elastic#25193) Speed up PK lookups at index time. (elastic#19856) [Docs] Fix documentation for percentiles bucket aggregation (elastic#25229) Upgrade to lucene-7.0.0-snapshot-92b1783. (elastic#25222) Build: Add master flag for disabling bwc tests (elastic#25230) Scripting: Rename SearchScript.needsScores to needs_score (elastic#25235) Support script context stateful factory in Painless. (elastic#25233) FastVectorHighlighter should not cache the field query globally (elastic#25197) Remove QUERY_AND_FETCH BWC for pre-5.3.0 nodes (elastic#25223) Add more missing AggregationBuilder getters (elastic#25198) Extract the snapshot/restore full cluster restart tests from the translog full cluster restart tests (elastic#25204) Refactor TransportShardBulkAction.executeUpdateRequest and add tests Make sure range queries are correctly profiled. (elastic#25108) Test: allow setting socket timeout for rest client (elastic#25221) Migration docs for elastic#25080 (elastic#25218) Remove `discovery.type` BWC layer from the EC2/Azure/GCE plugins elastic#25080 When stopping via systemd only kill the JVM, not its control group (elastic#25195) ...
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
At index time Elasticsearch needs to look up the version associated with the
_uidof the document that is being indexed, which is often the bottleneck forindexing.
While reviewing the output of the
jfrtelemetry from a Rally benchmark, I sawthat significant time was spent in
ConcurrentHashMap#getandThreadLocal#get.The reason is that we cache lookup objects per thread and segment, and for every
indexed document, we first need to look up the cache associated with this
segment (
ConcurrentHashMap#get) and then get a state that is local to thecurrent thread (
ThreadLocal#get). So if you are indexing N documents persecond and have S segments, both these methods will be called N*S times per
second.
This commit changes version lookup to use a cache per index reader rather than
per segment. While this makes cache entries live for less long, we now only need
to do one call to
ConcurrentHashMap#getandThreadLocal#getper indexeddocument.
NOTE: I had to remove
IndexReader#getCombinedCoreAndDeletesKeyfrom theforbidden APIs in order to make it work.
Here are some screenshots of the hot methods as reported when indexing 100M empty documents with rally using 1 shard, 0 replicas and otherwise default settings.


Before:
After:
ThreadLocal#getis no longer among the hottest methods after the change, and whileConcurrentHashMap#getis still there, its main callers are now the live version map and the circuit breaker, not the state we maintain for version lookups in the index.Here is the report from
rally compareon two races that have been run without telemetry (so that it does not affect the benchmark results). The speedup looks real.