Skip to content

Remove deprecated wrapper from scripted_metric#57627

Merged
nik9000 merged 2 commits intoelastic:masterfrom
nik9000:scripted_metric_mem
Jun 5, 2020
Merged

Remove deprecated wrapper from scripted_metric#57627
nik9000 merged 2 commits intoelastic:masterfrom
nik9000:scripted_metric_mem

Conversation

@nik9000
Copy link
Copy Markdown
Member

@nik9000 nik9000 commented Jun 3, 2020

This removes the deprecated asMultiBucketAggregator wrapper from
scripted_metric. Unlike most other such removals, this isn't likely to
save much memory. But it does make the internals of the aggregator
slightly less twisted.

Relates to #56487

This removes the deprecated `asMultiBucketAggregator` wrapper from
`scripted_metric`. Unlike most other such removals, this isn't likely to
save much memory. But it does make the internals of the aggregator
slightly less twisted.

Relates to elastic#56487
@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jun 3, 2020
@nik9000
Copy link
Copy Markdown
Member Author

nik9000 commented Jun 3, 2020

@elasticmachine run elasticsearch-ci/1

* tracked by the circuit breakers properly. This is sad. So we pick a big
* number and estimate that each bucket costs that. It could be wildly
* inaccurate. We're sort of hoping that the real memory breaker saves
* us here. Or that folks just don't use the aggregation.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haha

* inaccurate. We're sort of hoping that the real memory breaker saves
* us here. Or that folks just don't use the aggregation.
*/
private static final long BUCKET_COST_ESTIMATE = 1024 * 5;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what made you choose this number beyond just "make it large"?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drive-by comment: I wonder if it would make sense to expose this as a configurable option for advanced users? (perhaps in a followup PR)

Reasoning being that we can't figure it out easily, it's already an advanced agg, so a user might actually be able to give us a reasonable estimate. And if they don't set it, we fall back to Big Number.

Dunno, might be a terrible idea :)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The number is the original default "weight" of an aggregator. So it is basically what we used to have. I didn't want to just set it to the default weight because it doesn't really have anything to do with it. Other than it is what we were using.

Making it configurable is certainly interesting! I don't folks are going to have a good idea of what to set it to without having done some java hacking though. We try not to leak stuff like that to our users. OTOH scripted_metric is pretty bonkers so if we are going to leak that sort of thing anywhere, here is it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I don't feel super great about the idea either. But then again, we're essentially allowing a user to write a custom map-reduce job that runs on ES, so exposing this kind of accounting isn't too crazy in that context.

There's probably a case for forcing the user to estimate it too, since I've definitely seen very bad scripts knock over clusters before (maps holding high cardinality IDs, etc). I think that's probably a bridge too far, alas :)

@nik9000 nik9000 merged commit 2b82551 into elastic:master Jun 5, 2020
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Jun 5, 2020
…7627)

This removes the deprecated `asMultiBucketAggregator` wrapper from
`scripted_metric`. Unlike most other such removals, this isn't likely to
save much memory. But it does make the internals of the aggregator
slightly less twisted.

Relates to elastic#56487
nik9000 added a commit that referenced this pull request Jun 5, 2020
scripts don't have permission to call `parallelStream` because it'll
sometimes make a fork/join pool and script can't start threads. In this
case I added it by accident in #57627 and should have just used
`stream`.
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Jun 5, 2020
scripts don't have permission to call `parallelStream` because it'll
sometimes make a fork/join pool and script can't start threads. In this
case I added it by accident in elastic#57627 and should have just used
`stream`.
nik9000 added a commit that referenced this pull request Jun 5, 2020
…57763)

This removes the deprecated `asMultiBucketAggregator` wrapper from
`scripted_metric`. Unlike most other such removals, this isn't likely to
save much memory. But it does make the internals of the aggregator
slightly less twisted.

Relates to #56487
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Jun 25, 2020
Fixes two bugs introduced by elastic#57627:
1. We were not properly letting go of memory from the request breaker
   when the aggregation finished.
2. We no longer supported totally arbitrary stuff produced by the init
   script because we *assumed* that it'd be ok to run the script once
   and clone its results. Sadly, cloning can't clone *anything* that the
   init script can make, like `String` arrays. This runs the init script
   once for every new bucket so we don't need to clone.
nik9000 added a commit that referenced this pull request Jun 25, 2020
Fixes two bugs introduced by #57627:
1. We were not properly letting go of memory from the request breaker
   when the aggregation finished.
2. We no longer supported totally arbitrary stuff produced by the init
   script because we *assumed* that it'd be ok to run the script once
   and clone its results. Sadly, cloning can't clone *anything* that the
   init script can make, like `String` arrays. This runs the init script
   once for every new bucket so we don't need to clone.
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Jun 25, 2020
Fixes two bugs introduced by elastic#57627:
1. We were not properly letting go of memory from the request breaker
   when the aggregation finished.
2. We no longer supported totally arbitrary stuff produced by the init
   script because we *assumed* that it'd be ok to run the script once
   and clone its results. Sadly, cloning can't clone *anything* that the
   init script can make, like `String` arrays. This runs the init script
   once for every new bucket so we don't need to clone.
nik9000 added a commit that referenced this pull request Jun 25, 2020
Fixes two bugs introduced by #57627:
1. We were not properly letting go of memory from the request breaker
   when the aggregation finished.
2. We no longer supported totally arbitrary stuff produced by the init
   script because we *assumed* that it'd be ok to run the script once
   and clone its results. Sadly, cloning can't clone *anything* that the
   init script can make, like `String` arrays. This runs the init script
   once for every new bucket so we don't need to clone.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/Aggregations Aggregations >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v7.9.0 v8.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants