Allow evaluation to consist of multiple steps. by przemekwitek · Pull Request #46653 · elastic/elasticsearch

przemekwitek · 2019-09-12T10:09:22Z

This PR makes it possible for an evaluation to consist of more than one search step.
This is needed when the results of one aggregation are the input to another aggregation and pipeline aggregations cannot be used.
TypedChainTaskExecutor is used to execute a (dynamically built) sequence of steps.
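
For readers unfamiliar with the idea, here is a minimal sketch of running a dynamically built sequence of steps where each step can see the results of the earlier ones. It is an illustration only: `StepChain` and `Step` are hypothetical names and do not reproduce the API of TypedChainTaskExecutor, and the callbacks here are invoked synchronously, whereas real search steps would complete asynchronously.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch: runs steps one after another, handing earlier results to later steps. */
class StepChain<T> {

    /** A step sees the results gathered so far and reports its own result through a callback. */
    interface Step<T> {
        void run(List<T> previousResults, Consumer<T> onResult, Consumer<Exception> onFailure);
    }

    private final List<Step<T>> steps = new ArrayList<>();

    /** Steps can be appended at any time before execute(), so the chain is built dynamically. */
    StepChain<T> add(Step<T> step) {
        steps.add(step);
        return this;
    }

    /** Runs all steps in order; onDone receives every result, onFailure stops the chain. */
    void execute(Consumer<List<T>> onDone, Consumer<Exception> onFailure) {
        runFrom(0, new ArrayList<>(), onDone, onFailure);
    }

    private void runFrom(int index, List<T> results, Consumer<List<T>> onDone, Consumer<Exception> onFailure) {
        if (index == steps.size()) {
            onDone.accept(results);
            return;
        }
        steps.get(index).run(
            new ArrayList<>(results), // defensive copy so a step cannot alter the gathered results
            result -> {
                results.add(result);
                runFrom(index + 1, results, onDone, onFailure);
            },
            onFailure);
    }
}
```

With this shape, a second step can, for example, build its search from whatever the first step returned, which is the situation described above where one aggregation's output feeds another.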

Relates #46735

elasticmachine · 2019-09-16T09:05:23Z

Pinging @elastic/ml-core

dimitris-athanasiou · 2019-09-16T09:50:15Z

This should certainly live under a new evaluation type. We need to determine a suitable name for it.

przemekwitek · 2019-09-16T09:59:32Z

> This should certainly live under a new evaluation type. We need to determine a suitable name for it.

Ok, we can discuss the final naming offline.
I'll start with "hard_classification" and will introduce the new type in this PR.
Codewise, it will reuse most of the code for "regression" (the difference will be that "hard_classification" will require categorical actual and predicted fields and the default metric will be multiclass confusion matrix).

dimitris-athanasiou · 2019-09-16T10:24:32Z

Let's just start with classification. I think we'll have some metrics that require a probability and some that don't, and I can't imagine how we're helping our users by forcing them to make two different API calls to gather them all. I think there's a chance we'll find that a classification evaluation can also do what our binary_soft_classification is doing now, and in that way replace it.

benwtrent

Please add 1-2 yaml tests for coverage.

...java/org/elasticsearch/xpack/core/ml/dataframe/evaluation/classification/Classification.java

benwtrent · 2019-09-16T11:26:20Z

...asticsearch/xpack/core/ml/dataframe/evaluation/classification/MulticlassConfusionMatrix.java

This is an interesting hard limit to have.

@tveasey when it comes to classification, do you think we should support > 100 classes?

@przemekwitek if we do need to support > 100 classes, I think chaining together callbacks to scroll through the composite aggregation would be necessary. It is not overly complicated, but may cause some frustrating refactoring in the search execution.

There are certainly cases where people would have more than 100 classes, but I think they'll be rare. We could consider this as an enhancement.

przemekwitek · 2019-09-16T11:42:33Z

run elasticsearch-ci/2

przemekwitek

> Please add 1-2 yaml tests for coverage.

Done

...java/org/elasticsearch/xpack/core/ml/dataframe/evaluation/classification/Classification.java

przemekwitek · 2019-09-16T12:48:28Z

...asticsearch/xpack/core/ml/dataframe/evaluation/classification/MulticlassConfusionMatrix.java

Ok, let's stick to the limit of 100 classes for now.
Increasing that limit may require some refactoring but should be invisible from the user's perspective.

dimitris-athanasiou · 2019-09-16T14:24:50Z

I haven't had the chance to look through this as closely as I'd like. Could you please hold off merging it?

dimitris-athanasiou

Looks good! Left a couple of points to consider.

...asticsearch/xpack/core/ml/dataframe/evaluation/classification/MulticlassConfusionMatrix.java

dimitris-athanasiou · 2019-09-17T08:06:05Z

Also, could you please add documentation for this?

dimitris-athanasiou · 2019-09-18T16:46:02Z

...asticsearch/xpack/core/ml/dataframe/evaluation/classification/MulticlassConfusionMatrix.java

I am a bit confused about this part and how it'd work. Here are my thoughts.

The set of actual classes may differ from that of the predicted classes. We're working with the first 1000 actual classes. For each of them, we're working with the first 1000 predicted classes for a given class.

I think it's fine in terms of the result matrix. It won't be a symmetric matrix, but I don't think it matters as we can still answer the question "how many times was class X classified as Y?".

However, when it comes to reporting the number of unhandled classes, I think what we do now may be confusing. There are 2 different counts at play. First, the count of unhandled actual classes which we get from the outer aggregation. Second, the count of unhandled predicted classes for each actual class we handle. I am not sure how helpful the max of all those is. Let's think a bit about this and discuss a solution.

I like the general idea...

I think the natural way to implement this would be as follows:

1. Order the classes by frequency (unless there is some extrinsic notion of importance, e.g. a user-defined list).
2. Limit to 100 classes, subject to the order defined in 1.
3. Introduce a new class "other", which covers every class not selected in 2.
4. Report error statistics for "actual is a selected class, prediction is other" and "actual is other, prediction is a selected class".

I'd probably omit the other vs other diagonal entry. Filling this in implies the classification is correct, whereas of course we can't determine that by examining the actual classes.

I think that would need 2 searches: the first to figure out the most frequent actual classes and the second to get the predicted classes after filtering out classes not in the above set.

We'll need to stretch the framework a bit to allow multiple searches, but it might be worth doing anyway to pave the way for auc_roc, etc.
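
To make the proposal above concrete, here is a rough in-memory sketch of the two passes: first pick the most frequent actual classes, then count (actual, predicted) pairs while folding everything else into "other" and skipping the other vs other cell. Everything here is an assumption made for illustration: the class and method names are hypothetical, each document is modelled as a two-element array `{actual, predicted}`, and plain collections stand in for the two searches. It also ignores the case where a real class is literally named "other".

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for the two searches discussed above. */
class TopClassesConfusionMatrix {
    static final String OTHER = "other";

    /** Pass 1: the `limit` most frequent actual classes (ties resolved alphabetically). */
    static Set<String> topActualClasses(List<String[]> docs, int limit) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String[] doc : docs) {
            counts.merge(doc[0], 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // List.sort is stable, so equal counts keep the alphabetical order of the TreeMap.
        entries.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        Set<String> top = new LinkedHashSet<>();
        for (Map.Entry<String, Integer> e : entries.subList(0, Math.min(limit, entries.size()))) {
            top.add(e.getKey());
        }
        return top;
    }

    /** Pass 2: actual -> predicted -> count, with unselected classes folded into "other". */
    static Map<String, Map<String, Long>> matrix(List<String[]> docs, Set<String> selected) {
        Map<String, Map<String, Long>> result = new TreeMap<>();
        for (String[] doc : docs) {
            String actual = selected.contains(doc[0]) ? doc[0] : OTHER;
            String predicted = selected.contains(doc[1]) ? doc[1] : OTHER;
            if (actual.equals(OTHER) && predicted.equals(OTHER)) {
                continue; // the other vs other cell is omitted: correctness cannot be inferred
            }
            result.computeIfAbsent(actual, k -> new TreeMap<>()).merge(predicted, 1L, Long::sum);
        }
        return result;
    }
}
```

In a real implementation the first pass would presumably be an aggregation over the actual field and the second a filtered aggregation over the predicted field, which is why two searches are needed.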

This is now done. Metric evaluation can consist of many steps, and the evaluation process gathers the results. PTAL

I'm also exploring whether using TypedChainTaskExecutor would make sense here.

Update:
I used TypedChainTaskExecutor to simplify the task chaining code.

dimitris-athanasiou · 2019-09-25T10:12:12Z

I have discussed offline with @przemekwitek to do the following changes:

separate metric processing into 3 steps: 1. build search (aggs), 2. extract data from search response, 3. evaluate result
eventually split the multi-search refactoring into a separate PR
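
As a rough picture of what the three-step split could look like, here is a sketch with one method per step. The interface, its method names, the aggregation name and the use of a plain `Map<String, Double>` in place of a search response are all assumptions made for this example; they are not taken from the actual change.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical three-step metric contract; not the actual Elasticsearch interface. */
interface SketchMetric {
    /** Step 1: names of the aggregations this metric wants included in the search. */
    List<String> aggs();

    /** Step 2: extract the values this metric needs from a (simplified) search response. */
    void process(Map<String, Double> aggregationValues);

    /** Step 3: the evaluated result; empty until process() has seen a response. */
    Optional<Double> getResult();
}

/** Example metric that needs a single aggregation value. */
class SketchMeanSquaredError implements SketchMetric {
    // Assumed aggregation name, chosen only for this sketch.
    static final String AGG_NAME = "sketch_mean_squared_error";

    private Double result;

    @Override
    public List<String> aggs() {
        return Collections.singletonList(AGG_NAME);
    }

    @Override
    public void process(Map<String, Double> aggregationValues) {
        result = aggregationValues.get(AGG_NAME);
    }

    @Override
    public Optional<Double> getResult() {
        return Optional.ofNullable(result);
    }
}
```

Keeping the result behind a separate accessor means a metric that needs several searches can stay "not ready" until its last step has been processed.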

przemekwitek · 2019-09-26T07:32:12Z

> I have discussed offline with @przemekwitek to do the following changes:
>
> separate metric processing into 3 steps: 1. build search (aggs), 2. extract data from search response, 3. evaluate result
>
> eventually split the multi-search refactoring into a separate PR

Done.
This PR is now the refactoring PR and, as such, is ready for review.
The actual work on classification evaluation is in a separate follow-up PR: #47126

dimitris-athanasiou

I think this looks much better! A few minor comments.

dimitris-athanasiou · 2019-09-26T13:07:32Z

...gin/ml/src/main/java/org/elasticsearch/xpack/ml/action/TransportEvaluateDataFrameAction.java

Perhaps we should add the first task in the constructor of the executor and then we won't need this at all.

dimitris-athanasiou · 2019-09-26T13:08:04Z

...gin/ml/src/main/java/org/elasticsearch/xpack/ml/action/TransportEvaluateDataFrameAction.java

Shall we call this nextTask? In a way this is like an iterator of sorts.

dimitris-athanasiou · 2019-09-26T13:25:30Z

...n/java/org/elasticsearch/xpack/core/ml/dataframe/evaluation/regression/MeanSquaredError.java

Do we need this method now? We could inline that in process, no?

dimitris-athanasiou · 2019-09-26T13:25:47Z

.../src/main/java/org/elasticsearch/xpack/core/ml/dataframe/evaluation/regression/RSquared.java

przemekwitek · 2019-09-26T14:24:05Z

run elasticsearch-ci/bwc

dimitris-athanasiou

LGTM

This is groundwork for introducing classification evaluation which actually needs multistep evaluation.

przemekwitek · 2019-09-27T06:57:57Z

run elasticsearch-ci/2

przemekwitek added the WIP label Sep 12, 2019

przemekwitek force-pushed the classification_evaluation branch 6 times, most recently from 5915583 to 72e6565, September 16, 2019 09:01

przemekwitek added :ml Machine learning and removed WIP labels Sep 16, 2019

przemekwitek added >feature v7.5.0 v8.0.0 labels Sep 16, 2019

przemekwitek marked this pull request as ready for review September 16, 2019 09:05

przemekwitek force-pushed the classification_evaluation branch 2 times, most recently from 50177fd to c269036, September 16, 2019 09:54

przemekwitek force-pushed the classification_evaluation branch 4 times, most recently from f47e3c9 to ae64b7f, September 16, 2019 11:26

benwtrent reviewed Sep 16, 2019

przemekwitek commented Sep 16, 2019

benwtrent approved these changes Sep 16, 2019

dimitris-athanasiou reviewed Sep 17, 2019

przemekwitek force-pushed the classification_evaluation branch from 486b5ec to fc56c98, September 18, 2019 11:49

dimitris-athanasiou reviewed Sep 18, 2019

przemekwitek force-pushed the classification_evaluation branch 2 times, most recently from 4ee034b to af09c1d, September 25, 2019 15:54

przemekwitek added >non-issue and removed >feature labels Sep 25, 2019

przemekwitek changed the title ~~Implement evaluation API for multiclass classification problem~~ Allow evaluation to consist of multiple steps. Sep 25, 2019

przemekwitek force-pushed the classification_evaluation branch 4 times, most recently from 75a1955 to 1162769, September 26, 2019 07:27

przemekwitek force-pushed the classification_evaluation branch from 1162769 to 955c712, September 26, 2019 07:40

przemekwitek mentioned this pull request Sep 26, 2019

Implement evaluation API for multiclass classification problem #47126

Merged

dimitris-athanasiou reviewed Sep 26, 2019

dimitris-athanasiou approved these changes Sep 26, 2019

przemekwitek force-pushed the classification_evaluation branch from ec88b42 to 782443b, September 26, 2019 19:22

Allow evaluation to consist of multiple steps.

aaf8206

This is groundwork for introducing classification evaluation which actually needs multistep evaluation.

przemekwitek force-pushed the classification_evaluation branch from 782443b to aaf8206, September 27, 2019 04:39

przemekwitek mentioned this pull request Sep 27, 2019

[ML] Introduce classification analysis type #46735

Closed

9 tasks

przemekwitek merged commit 41d82f6 into elastic:master Sep 27, 2019

przemekwitek deleted the classification_evaluation branch September 27, 2019 07:29

przemekwitek mentioned this pull request Sep 27, 2019

[7.x] Allow evaluation to consist of multiple steps. (#46653) #47194

Merged

przemekwitek pushed a commit to przemekwitek/elasticsearch that referenced this pull request Sep 27, 2019

Allow evaluation to consist of multiple steps. (elastic#46653)

dccc717

This is groundwork for introducing classification evaluation which actually needs multistep evaluation.

przemekwitek pushed a commit that referenced this pull request Sep 27, 2019

[7.x] Allow evaluation to consist of multiple steps. (#46653) (#47194)

3fbd58d

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
