Save a little space in agg tree by nik9000 · Pull Request #53730 · elastic/elasticsearch

nik9000 · 2020-03-18T14:09:42Z

This drops the "top level" pipeline aggregators from the aggregation
result tree, which should save a little memory and a few serialization
bytes. Perhaps more importantly, this provides a mechanism by which we
can remove all pipelines from the aggregation result tree. This will
save quite a bit of space when pipelines are deep in the tree.

Sadly, doing this isn't simple because of backwards compatibility. Nodes
before 7.8.0 need those pipelines. We provide them by passing a
Supplier<PipelineTree> into the root of the aggregation tree that we
only call if we need to serialize to a version before 7.8.0.
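A minimal sketch of the lazy-supplier idea, assuming heavily simplified stand-ins: `PipelineTree`, `AggregationsRoot`, the integer version encoding (7.8.0 as `7_08_00`) and the `List<String>` used in place of a `StreamOutput` are all hypothetical, not the actual Elasticsearch types.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical, simplified stand-ins; NOT the real Elasticsearch classes.
final class LazyPipelineSketch {
    // Versions encoded as plain ints (7.8.0 -> 70800); an assumption for this sketch.
    static final int V_7_8_0 = 7_08_00;

    // Placeholder for the pipeline aggregators described by the request.
    static final class PipelineTree {
        final List<String> topLevel;

        PipelineTree(List<String> topLevel) {
            this.topLevel = List.copyOf(topLevel);
        }
    }

    // Placeholder for the root of the aggregation result tree.
    static final class AggregationsRoot {
        final List<String> results;
        // The tree is not stored; this supplier is only invoked when an old node needs it.
        final Supplier<PipelineTree> pipelineTreeSource;

        AggregationsRoot(List<String> results, Supplier<PipelineTree> pipelineTreeSource) {
            this.results = List.copyOf(results);
            this.pipelineTreeSource = pipelineTreeSource;
        }

        // "out" stands in for a versioned StreamOutput.
        void writeTo(int remoteVersion, List<String> out) {
            out.addAll(results);
            if (remoteVersion < V_7_8_0) {
                // Pre-7.8.0 readers expect the top-level pipelines on the wire,
                // so the tree is only materialized on this path.
                out.addAll(pipelineTreeSource.get().topLevel);
            }
        }
    }
}
```

In this sketch the supplier is never invoked when writing to 7.8.0 or later, so new-to-new traffic never materializes the pipelines; only the mixed-version path pays for building them.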

This solution works for cross cluster search because we always reduce
the aggregations in each remote cluster and then forward them back to
the coordinating node. It's quite possible that the coordinating node
needs the pipeline (say it is version 7.1.0) and the gateway node in the
remote cluster doesn't (version 7.8.0). In that case the data nodes
won't send the pipeline aggregations back to the gateway node.
Critically, the gateway node will send the pipeline aggregations back
to the coordinating node. This is all managed with that
Supplier<PipelineTree>, but how it is managed is a bit tricky.
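The cross cluster search hops can be simulated the same way. This is a hypothetical sketch: `Partial`, `serialize` and `reduce` are illustrative names, and it assumes the reducing node can rebuild the pipelines from its own copy of the request. It shows why the reduce step matters: a result that arrived from a 7.8.0 data node carries no pipelines, and it can only be forwarded to a 7.1.0 coordinating node after the reduce attaches a new supplier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical simulation of the CCS hops; names are illustrative only.
final class CcsPipelineSketch {
    // Versions encoded as plain ints (7.8.0 -> 70800); an assumption for this sketch.
    static final int V_7_8_0 = 7_08_00;

    // A partial result plus an optional, lazily supplied pipeline list.
    static final class Partial {
        final List<String> results;
        final Supplier<List<String>> pipelineSource; // null when nothing was sent

        Partial(List<String> results, Supplier<List<String>> pipelineSource) {
            this.results = results;
            this.pipelineSource = pipelineSource;
        }
    }

    // What goes over the wire to a receiver of the given version.
    static List<String> serialize(Partial p, int receiverVersion) {
        List<String> wire = new ArrayList<>(p.results);
        if (receiverVersion < V_7_8_0) {
            if (p.pipelineSource == null) {
                throw new IllegalStateException("old receiver needs pipelines but none are available");
            }
            wire.addAll(p.pipelineSource.get());
        }
        return wire;
    }

    // Reduce on the gateway: the data nodes sent no pipelines, so the reduced
    // result gets a fresh supplier backed by the gateway's own copy of the request.
    static Partial reduce(List<Partial> shardResults, List<String> requestPipelines) {
        List<String> merged = new ArrayList<>();
        for (Partial s : shardResults) {
            merged.addAll(s.results);
        }
        return new Partial(merged, () -> requestPipelines);
    }
}
```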


elasticmachine · 2020-03-18T14:09:44Z

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

polyfractal

Left a few comments mainly around testing. Think this looks good! Serialization changes always give me the cold sweats but I can't see any flaws currently with the approach.

polyfractal · 2020-03-23T19:43:46Z

qa/multi-cluster-search/src/test/resources/rest-api-spec/test/multi_cluster/10_basic.yml

  - match: { aggregations.cluster.buckets.0.key: "local_cluster" }
  - match: { aggregations.cluster.buckets.0.doc_count: 5 }

+  # once more, this time with a pipeline agg


Hmm, should we maybe split this out into its own test? Just thinking that long multi-step yaml tests can be tricky to debug sometimes.

Haven't looked at the rest of the tests in this yaml though, so it might not be easy to move the indexing (or whatever else) steps up to the setup though.

I'll move it, sure! They can get hard to debug.

Ok - this is a test that doesn't clear indices after it runs. So moving things around is a bit more complex than we'd like, to be honest. I can do it, but I think it should wait for a followup.

Ah ok, no worries then. Not worth rearranging everything for 👍

polyfractal · 2020-03-23T19:57:36Z

server/src/main/java/org/elasticsearch/search/aggregations/InternalAggregations.java

+         * Setting the pipeline tree source to null here is correct but
+         * only because we don't immediately pass the InternalAggregations
+         * off to another node. Instead, we always reduce together with
+         * many aggregations and that always adds the 


Looks like the comment trails off without finishing its thought :)

Thanks! I do that somet

polyfractal · 2020-03-24T17:40:24Z

qa/multi-cluster-search/src/test/resources/rest-api-spec/test/multi_cluster/10_basic.yml

  - match: { aggregations.cluster.buckets.0.doc_count: 5 }

+  # once more, this time with a pipeline agg
+  - do:


Should we have a similar test for the rolling-upgrade module? Theoretically it should be the same as CCS, but it might also smoke out different issues due to heterogeneous serialization inside the same cluster (instead of funneling through a gateway).

It'd be great to have a "mixed cluster CCS" test. I talked that one through with @javanna and we don't have one now and probably don't want to build one just for this.

polyfractal · 2020-03-24T17:41:35Z

qa/multi-cluster-search/src/test/resources/rest-api-spec/test/multi_cluster/10_basic.yml

+              terms:
+                field: f1.keyword
+              aggs:
+                s:


Should we add a non-top-level pipeline agg just to confirm they aren't affected?

I figured I'd get it in my next PR about non-top-level pipeline aggs, but I'm happy to do it now!

nik9000 · 2020-03-24T19:05:16Z

@polyfractal, I think this is ready for another round!

polyfractal

👍

nik9000 · 2020-03-24T20:17:36Z

Thanks @polyfractal !

nik9000 · 2020-03-25T13:11:47Z

I've updated all of the version numbers in the description to 7.8.0 because this is not making the 7.7.0 release train.


nik9000 · 2020-03-25T20:12:45Z

Backport done! I'm going to leave the backport_pending label until I've reenabled bwc tests on master.


This fixes pipeline aggregations used in cross cluster search from an older version of Elasticsearch to a newer version of Elasticsearch. I broke this in elastic#53730 when I was too aggressive in shutting off serialization of pipeline aggs. In particular, this comes up when the coordinating node is pre-7.8.0 and the gateway node is on or after 7.8.0. The fix is another step down the line to remove pipeline aggregators from the aggregation tree. Sort of. It creates a new `List<PipelineAggregator>` member in `InternalAggregation` *but* it is only used for bwc serialization and it is fed by the mechanism established in elastic#53730 to read the pipelines from the
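The idea of a member that exists only for bwc serialization could look roughly like this. `Node`, `PipelineTree`, `fillForBwc` and the integer versions are hypothetical; this sketch does not reproduce how the real `List<PipelineAggregator>` on `InternalAggregation` is populated, only the shape of "empty by default, filled and written only for old receivers".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; NOT the real Elasticsearch classes.
final class BwcPipelineListSketch {
    // Versions encoded as plain ints (7.8.0 -> 70800); an assumption for this sketch.
    static final int V_7_8_0 = 7_08_00;

    // Stand-in for a pipeline tree: pipelines at this level plus sub-trees by agg name.
    static final class PipelineTree {
        static final PipelineTree EMPTY = new PipelineTree(List.of(), Map.of());
        final List<String> pipelines;
        final Map<String, PipelineTree> subTrees;

        PipelineTree(List<String> pipelines, Map<String, PipelineTree> subTrees) {
            this.pipelines = pipelines;
            this.subTrees = subTrees;
        }
    }

    // Stand-in for one aggregation result in the tree.
    static final class Node {
        final String name;
        final List<Node> children;
        // Used ONLY for backwards-compatible serialization; empty unless filled.
        final List<String> pipelinesForBwc = new ArrayList<>();

        Node(String name, List<Node> children) {
            this.name = name;
            this.children = children;
        }

        // Walks the result tree and the pipeline tree in parallel.
        void fillForBwc(PipelineTree tree) {
            pipelinesForBwc.clear();
            pipelinesForBwc.addAll(tree.pipelines);
            for (Node child : children) {
                child.fillForBwc(tree.subTrees.getOrDefault(child.name, PipelineTree.EMPTY));
            }
        }

        // "out" stands in for a versioned StreamOutput.
        void writeTo(int remoteVersion, List<String> out) {
            out.add(name);
            if (remoteVersion < V_7_8_0) {
                out.addAll(pipelinesForBwc);
            }
            for (Node child : children) {
                child.writeTo(remoteVersion, out);
            }
        }
    }
}
```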


nik9000 added :Analytics/Aggregations Aggregations >refactoring v8.0.0 v7.7.0 labels Mar 18, 2020

replace nocommit

4c9db02

nik9000 requested review from not-napoleon and polyfractal March 18, 2020 14:11

nik9000 mentioned this pull request Mar 18, 2020

Pipeline aggregations are weird #53742

Closed

3 tasks

nik9000 added 6 commits March 18, 2020 14:44

Test CCS

5b36280

Merge branch 'master' into pipeline_drop_serialization

bd890bc

Merge branch 'master' into pipeline_drop_serialization

aa33bbe

Merge branch 'master' into pipeline_drop_serialization

b9f4cda

Merge branch 'master' into pipeline_drop_serialization

1562e76

Boo

cff7a67


polyfractal reviewed Mar 24, 2020


nik9000 added 2 commits March 24, 2020 13:46

Merge branch 'master' into pipeline_drop_serialization

f335863

Update

feb194c


polyfractal approved these changes Mar 24, 2020


Switch version

12dae79

bpintea added v7.8.0 and removed v7.7.0 labels Mar 25, 2020

nik9000 merged commit e8c54c7 into elastic:master Mar 25, 2020

nik9000 added the backport pending label Mar 25, 2020

nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Mar 25, 2020

Reenable BWC after backporting elastic#53730

4be2f9b

Updates a few versions in serialization because we didn't make the 7.7.0 release train.

nik9000 added a commit that referenced this pull request Mar 25, 2020

Reenable BWC after backporting #53730 (#54230)

53c6278

Updates a few versions in serialization because we didn't make the 7.7.0 release train.

nik9000 removed the backport pending label Mar 25, 2020

nik9000 mentioned this pull request Mar 26, 2020

Fix pipeline agg serialization for ccs #54282

Merged

nik9000 mentioned this pull request Mar 30, 2020

Fix pipeline agg serialization for ccs (backport of #54282) #54468

Merged

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021


5 participants
