Allow evaluation to consist of multiple steps.#46653
Allow evaluation to consist of multiple steps.#46653przemekwitek merged 1 commit intoelastic:masterfrom
Conversation
5915583 to
72e6565
Compare
|
Pinging @elastic/ml-core |
|
This should certainly live under a new evaluation type. We need to determine a suitable name for it. |
50177fd to
c269036
Compare
Ok, we can discuss the final naming offline. |
|
Let's just start with |
f47e3c9 to
ae64b7f
Compare
benwtrent
left a comment
There was a problem hiding this comment.
Please add 1-2 yaml tests for coverage.
...java/org/elasticsearch/xpack/core/ml/dataframe/evaluation/classification/Classification.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
This is an interesting hard limit to have.
@tveasey when it comes to classification, do you think we should support > 100 classes?
@przemekwitek if we do need to support > 100 classes, I think chaining together callbacks to scroll through the composite aggregation would be necessary. It is not overly complicated, but may cause some frustrating refactoring in the search execution.
There was a problem hiding this comment.
There are certainly cases where people would have more than 100 classes, but I think they'll be rare. We could consider this as an enhancement
There was a problem hiding this comment.
Ok, let's stick to the limit of 100 classes for now.
Increasing that limit may cause code refactoring but should be invisible from user's perspective.
|
run elasticsearch-ci/2 |
przemekwitek
left a comment
There was a problem hiding this comment.
Please add 1-2
yamltests for coverage.
Done
...java/org/elasticsearch/xpack/core/ml/dataframe/evaluation/classification/Classification.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Ok, let's stick to the limit of 100 classes for now.
Increasing that limit may cause code refactoring but should be invisible from user's perspective.
|
I haven't had the chance to look through this as closely as I'd like. Could you please hold off merging it? |
dimitris-athanasiou
left a comment
There was a problem hiding this comment.
Looks good! Left a couple of points to consider.
...asticsearch/xpack/core/ml/dataframe/evaluation/classification/MulticlassConfusionMatrix.java
Outdated
Show resolved
Hide resolved
...asticsearch/xpack/core/ml/dataframe/evaluation/classification/MulticlassConfusionMatrix.java
Outdated
Show resolved
Hide resolved
|
Also, could you please add documentation for this? |
486b5ec to
fc56c98
Compare
There was a problem hiding this comment.
I am a bit confused about this part and how it'd work. Here are my thoughts.
The set of actual classes may differ from that of the predicted classes. We're working with the first 1000 actual classes. For each of them, we're working with the first 1000 predicted classes for a given class.
I think it's fine in terms of the result matrix. It won't be a symmetric matrix, but I don't think it matters as we can still answer the question "how many times was class X classified as Y?".
However, when it comes to reporting the number of unhandled classes, I think what we do now may be confusing. There are 2 different counts at play. First, the count of unhandled actual classes which we get from the outer aggregation. Second, the count of unhandled predicted classes for each actual class we handle. I am not sure how helpful the max of all those is. Let's think a bit about this and discuss a solution.
There was a problem hiding this comment.
I like the general idea...
I think the natural way to implement this would be as follows:
- Order the classes by frequency (unless there is some extrinsic notion of importance, i.e. user defined list),
- Limit to 100 classes subject to the order defined in 1,
- Introduce a new class "other" which is every class not selected in 2,
- Report errors statistics for "actual is selected class prediction is other" and "actual is other prediction is selected type"
I'd probably omit the other vs other diagonal entry. Filling this in implies the classification is correct, where as of course we can't determine that by examining the actual classes.
There was a problem hiding this comment.
I think that would need 2 searches: the first to figure out the most frequent actual classes and the second to get the predicted classes after filtering out classes not in the above set.
We'll need to stretch the framework a bit to allow multiple searches but it might be good to do anyhow for paving the road for auc_roc, etc.
There was a problem hiding this comment.
This is now done. Metric evaluation can consist of many steps. Evaluation process gathers the results. PTAL
I'm also exploring if using TypedChainTaskExecutor would make sense here.
There was a problem hiding this comment.
Update:
I used TypedChainTaskExecutor to simplify the task chaining code.
|
I have discussed offline with @przemekwitek to do the following changes:
|
4ee034b to
af09c1d
Compare
75a1955 to
1162769
Compare
Done. |
1162769 to
955c712
Compare
dimitris-athanasiou
left a comment
There was a problem hiding this comment.
I think this looks much better! A few minor comments.
There was a problem hiding this comment.
Perhaps we should add the first task in the constructor of the executor and then we won't need this at all.
There was a problem hiding this comment.
Shall we call this nextTask? In a way this is like an iterator of sorts.
There was a problem hiding this comment.
Do we need this method now? We could inline that in process, no?
|
run elasticsearch-ci/bwc |
ec88b42 to
782443b
Compare
This is groundwork for introducing classification evaluation which actually needs multistep evaluation.
782443b to
aaf8206
Compare
|
run elasticsearch-ci/2 |
This is groundwork for introducing classification evaluation which actually needs multistep evaluation.
This PR adds a possibility for an evaluation to consist of more than one search step.
This is needed when the results of one aggregation are input to another aggregation and pipeline aggregations cannot be used.
TypedChainExecutor is used to execute a (dynamically built) sequence of steps.
Relates #46735