Add Student's t-test aggregation support

I would like to discuss adding a multi-value metrics aggregation that applies unpaired or paired two-sample t-tests to two samples selected by filters, by fields, or by a combination of both.

An unpaired t-test might look like this:
```
GET logs/_search
{
  "size": 0,
  "aggs" : {
    "test" : {
      "t_test" : {
        "filters" : [
          { "match" : { "group" : "A" }},
          { "match" : { "group" : "B" }}
        ],
        "field": "value"
      }
    }
  }
}
```

The paired t-test might look something like this:

```
GET logs/_search
{
  "size": 0,
  "aggs" : {
    "test" : {
      "t_test" : {
        "fields" : ["before", "after"]
      }
    }
  }
}
```

We can also add support for scripts. 

The type of the test can be specified by the user, with defaults based on the presence or absence of filters. We can support a `type` parameter that accepts `paired` (the default when filters are absent, and only supported in that case), `homoscedastic` (equal variance), or `heteroscedastic` (unequal variance, the default when filters are present).

The output will be a typical metrics aggregation with t and p values.
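
To make the three `type` values concrete, here is a minimal Python sketch of how the t statistic and degrees of freedom differ between them. It is not the proposed implementation; the function name `t_test` and its signature are hypothetical, and it uses the textbook formulas (paired differences, pooled variance, Welch's approximation). The p value would then come from the Student's t distribution with the returned degrees of freedom, which is left out here to keep the sketch dependency-free.

```python
import math
from statistics import mean, variance


def t_test(a, b, test_type="heteroscedastic"):
    """Return (t, degrees_of_freedom) for two samples.

    Hypothetical helper for illustration only; `test_type` mirrors the
    values of the proposed `type` parameter.
    """
    if test_type == "paired":
        # Paired test: one-sample t-test on the per-document differences.
        if len(a) != len(b):
            raise ValueError("paired test needs samples of equal length")
        diffs = [x - y for x, y in zip(a, b)]
        n = len(diffs)
        return mean(diffs) / math.sqrt(variance(diffs) / n), n - 1

    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)  # sample variances (n - 1 denominator)

    if test_type == "homoscedastic":
        # Equal variances assumed: pool the two sample variances.
        df = na + nb - 2
        pooled = ((na - 1) * va + (nb - 1) * vb) / df
        se = math.sqrt(pooled * (1 / na + 1 / nb))
    elif test_type == "heteroscedastic":
        # Unequal variances (Welch): Welch-Satterthwaite degrees of freedom.
        sa, sb = va / na, vb / nb
        se = math.sqrt(sa + sb)
        df = (sa + sb) ** 2 / (sa ** 2 / (na - 1) + sb ** 2 / (nb - 1))
    else:
        raise ValueError(f"unknown test type: {test_type}")

    return (mean(a) - mean(b)) / se, df
```

Note that the paired variant needs both values from the same document, which is why it only makes sense with `fields` and no filters, while the two unpaired variants only need per-sample counts, means, and variances.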

Alternatively, we can implement this as a pipeline aggregation. That would simplify the implementation, but it might make usage a bit more difficult and could complicate Kibana adoption. We can also consider implementing it as both a pipeline and a metrics aggregation, similar to stats.
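
One reason the pipeline variant is simpler is that an unpaired test only needs summary statistics from each sample, not the raw values. The sketch below illustrates this for the unequal-variance case. The function name `welch_from_stats` is hypothetical, and it assumes each sibling bucket already exposes a count, a mean, and a sample variance (n - 1 denominator); how those would actually be wired up is not decided here.

```python
import math


def welch_from_stats(n1, mean1, var1, n2, mean2, var2):
    """Welch's t statistic and degrees of freedom from summary statistics.

    Hypothetical helper: the inputs stand in for per-bucket count, mean,
    and sample variance that a pipeline aggregation would read.
    """
    s1, s2 = var1 / n1, var2 / n2  # squared standard errors of each mean
    t = (mean1 - mean2) / math.sqrt(s1 + s2)
    df = (s1 + s2) ** 2 / (s1 ** 2 / (n1 - 1) + s2 ** 2 / (n2 - 1))
    return t, df
```

A paired test cannot be computed this way, because the variance of the differences is not recoverable from the two samples' separate summaries, so a pipeline-only implementation would likely be limited to the unpaired types.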

cc: @jtibshirani, @polyfractal



Add Student's t-test aggregation support #53692
