I've run a classification analysis on a synthetic dataset that tries to detect a circle on a plane.
I've indexed docs with points on a 2D plane as well as a dependent variable ("is the point inside a unit circle"). The analysis finished correctly, but then I tried to evaluate the results using the following request:
{
  "index": "circle-ml",
  "query": {
    "term": {
      "ml.is_training": false
    }
  },
  "evaluation": {
    "classification": {
      "actual_field": "in_unit_circle",
      "predicted_field": "ml.in_unit_circle_prediction.keyword",
      "metrics": {
        "accuracy": {},
        "multiclass_confusion_matrix": {}
      }
    }
  }
}
The evaluation reported an accuracy of 0 because it could not find any point for which dependent_variable was equal to the prediction.
The problem is that the dependent variable is a boolean while the prediction is a string, and the Painless script compares the two values directly:
doc[''{0}''].value == doc[''{1}''].value
Two solutions I see here are:
- (simpler) relax the equality check so that it treats boolean true and string "true" as equal
- (more involved) make the C++ code report the prediction using the type of the dependent variable. The type of the dependent variable can be passed down from Java.
Also, the same scenario should be reproduced for integer types.
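To illustrate the simpler option, here is a minimal sketch in plain Java (not the actual evaluation code) that compares the two values by their string form. The class name RelaxedEquality and the method matches are hypothetical names made up for this example; a Painless version would presumably apply the same String.valueOf comparison to the two doc values.

```java
public class RelaxedEquality {

    // Hypothetical helper: treats values as equal when their string
    // representations match, so boolean true equals the string "true"
    // and integer 1 equals the string "1".
    static boolean matches(Object actual, Object predicted) {
        if (actual == null || predicted == null) {
            // Only two missing values are considered equal to each other.
            return actual == predicted;
        }
        return String.valueOf(actual).equals(String.valueOf(predicted));
    }

    public static void main(String[] args) {
        // Strict equality between a Boolean and a String never matches.
        System.out.println(Boolean.TRUE.equals("true"));
        // The relaxed check matches on the string form instead.
        System.out.println(matches(true, "true"));
        System.out.println(matches(1, "1"));
        System.out.println(matches(false, "true"));
    }
}
```

One caveat of this approach: a floating-point value such as 1.0 has the string form "1.0" and would not match the prediction "1", so it only covers boolean and integer dependent variables as written.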