Speed up MappingStats Computation on Coordinating Node#82830
Speed up MappingStats Computation on Coordinating Node#82830original-brownbear merged 2 commits intoelastic:masterfrom original-brownbear:speed-up-mapping-stats
Conversation
|
Pinging @elastic/es-search (Team:Search) |
|
Pinging @elastic/es-data-management (Team:Data Management) |
henningandersen
left a comment
There was a problem hiding this comment.
Thanks @original-brownbear this is a nice optimization. I think our testing may be a bit thin towards this though, perhaps you can add a bit of randomized unittests for cases of shared and non-shared mappings for both analysis-stats and mapping-stats (or maybe I missed it, happy to be pointed to it instead).
There was a problem hiding this comment.
I would find it more intuitive to use an IdentityHashMp or just a hash-map (since hash-code/equals compare on the hash anyway). Is there a reason to use an explicit hash-value as key here?
There was a problem hiding this comment.
Right ... IdentityHashMap sounds nice and simplifies the loop a little as well saving another round of lookup :)
There was a problem hiding this comment.
I think we should also do ensureNotCancelled.run() here to ensure we can cancel here too.
There was a problem hiding this comment.
++ adding it back
server/src/main/java/org/elasticsearch/action/admin/cluster/stats/MappingStats.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
I think we can add a multiplier param to FieldScriptStats.update too to avoid the loop?
There was a problem hiding this comment.
++ that's cuter :)
server/src/test/java/org/elasticsearch/action/admin/cluster/stats/MappingStatsTests.java
Outdated
Show resolved
Hide resolved
|
Thanks for taking a look Henning!
Right it was quite thin indeed. We had a pretty extensive test case for the mapping stats that I reused for a non-shard test (not the most beautiful solution but I figured it was a reasonable cost-return tradeoff). |
henningandersen
left a comment
There was a problem hiding this comment.
LGTM, thanks for the extra work on the tests.
server/src/main/java/org/elasticsearch/action/admin/cluster/stats/MappingStats.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
I wonder if we should just expose the size of the map, since that is all we need? Small point, but would be nice to keep this internal to this class.
There was a problem hiding this comment.
Hmm we could here but I have another PR inbound that needs this map. I left it as is for now, hope that's ok.
server/src/test/java/org/elasticsearch/action/admin/cluster/stats/AnalysisStatsTests.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/action/admin/cluster/stats/MappingStatsTests.java
Outdated
Show resolved
Hide resolved
|
Hi @original-brownbear, I've created a changelog YAML for you. |
|
@elasticmachine update branch |
|
Hi @original-brownbear, I've created a changelog YAML for you. |
|
@elasticmachine update branch (sorry some changelog madness here) |
We can exploit the mapping deduplication logic to save deserializing the same mapping repeatedly here. This should fix extremly long running computations when the cache needs to be refreshed for these stats in the common case of many duplicate mappings in a cluster. In a follow-up we can probably do the same for `AnalysisStats` as well.
|
Hi @original-brownbear, I've created a changelog YAML for you. |
|
Jenkins run elasticsearch-ci/part-2 |
|
Thanks Henning! |
We can exploit the mapping deduplication logic to save deserializing the
same mapping repeatedly here. This should fix extremely long running
computations when the cache needs to be refreshed for these stats
in the common case of many duplicate mappings in a cluster.
Also, removed some confusing and needless set creation in the constructor here.
We could go even further here probably and merge the logic for analysis and mapping stats parsing into one, but it doesn't matter much. With this fix the time to get a response in a 10k indices cluster with many repeated but very large (Beats) mappings (as you would expect them to be in the real world) goes from ~10s down to sub-second in the uncached case.
This removes a very long running task from the management pool and also ensures we don't burn endless CPU responding to monitoring in clusters that go through frequent metadata updates and thus won't see all that much benefit from the caching in
TransportClusterStatsAction.relates #77466