I would like to discuss adding a multivalued metrics aggregation that will apply unpaired and paired two-sample t-tests to two samples selected based on filters or fields or a combination of both.
An unpaired t-test might look like this:
GET logs/_search
{
  "size": 0,
  "aggs": {
    "test": {
      "t_test": {
        "filters": [
          { "match": { "group": "A" } },
          { "match": { "group": "B" } }
        ],
        "field": "value"
      }
    }
  }
}
The paired t-test might look something like this:
GET logs/_search
{
  "size": 0,
  "aggs": {
    "test": {
      "t_test": {
        "fields": ["before", "after"]
      }
    }
  }
}
We can also add support for scripts.
The user can specify the type of the test, with the default depending on the presence or absence of filters. We can support a type parameter that takes one of three values: paired (the default when no filters are present, and only supported in that case), homoscedastic (equal variance), or heteroscedastic (unequal variance, the default when filters are present).
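To make the three test types concrete, here is a rough Python sketch of the statistic each one would compute. The function name t_test and its test_type argument are hypothetical and only mirror the proposed type parameter; this is not the aggregation's implementation. It returns the t value and the degrees of freedom only; turning those into a p value additionally needs the Student's t CDF, which is left out here.

```python
import math
from statistics import mean, variance  # variance() is the sample variance (n - 1)


def t_test(a, b, test_type="heteroscedastic"):
    """Return (t, degrees_of_freedom) for two samples.

    Hypothetical helper: test_type mirrors the proposed `type` parameter.
    """
    if test_type == "paired":
        # Paired test: a one-sample t-test on the per-pair differences.
        if len(a) != len(b):
            raise ValueError("paired test needs samples of equal length")
        diffs = [x - y for x, y in zip(a, b)]
        n = len(diffs)
        t = mean(diffs) / math.sqrt(variance(diffs) / n)
        return t, n - 1

    n1, n2 = len(a), len(b)
    m1, m2 = mean(a), mean(b)
    v1, v2 = variance(a), variance(b)

    if test_type == "homoscedastic":
        # Student's t-test: pool the two variances, assuming they are equal.
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        t = (m1 - m2) / math.sqrt(pooled * (1 / n1 + 1 / n2))
        return t, n1 + n2 - 2

    if test_type == "heteroscedastic":
        # Welch's t-test: no equal-variance assumption,
        # degrees of freedom from the Welch-Satterthwaite approximation.
        se1, se2 = v1 / n1, v2 / n2
        t = (m1 - m2) / math.sqrt(se1 + se2)
        df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
        return t, df

    raise ValueError(f"unknown test type: {test_type}")
```

For example, t_test(before, after, "paired") would correspond to the fields request above, while the filters request would map to one of the two unpaired variants.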
The output will be a typical metrics aggregation with t and p values.
Alternatively, we can implement this as a pipeline aggregation. That would simplify the implementation, but might make usage a bit more difficult and could complicate Kibana adoption. We can also consider implementing it as both a pipeline and a metric aggregation, similar to stats.
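One way to see why a pipeline variant could be simpler: the unpaired tests only need the count, mean, and variance of each sample, which are the kind of values a stats-like aggregation already produces per bucket. A minimal sketch, assuming the variances passed in are sample variances (n - 1 denominator); the function name welch_from_stats is hypothetical:

```python
import math


def welch_from_stats(n1, mean1, var1, n2, mean2, var2):
    """Welch's t and degrees of freedom from per-sample summary statistics.

    Hypothetical helper; var1 and var2 are assumed to be sample variances.
    """
    se1, se2 = var1 / n1, var2 / n2
    t = (mean1 - mean2) / math.sqrt(se1 + se2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df
```

Note that the paired test cannot be derived this way from per-field summaries alone, because it depends on the per-document differences between the two fields.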
cc: @jtibshirani, @polyfractal