New Matrix Stats Aggregation module#18300
Conversation
f3b1159 to
bce9bbf
Compare
|
This PR has been updated to include MultiValuesSource classes (per comment in #18285) @colings86, @jpountz could one (or both) of you have a look if you get a chance? |
There was a problem hiding this comment.
I think this should actually be in ValuesSourceAggregatorBuilder since this CommonFields interface is for the response side of the aggregations but value_type is for the request side, so it doesn't really belong here
There was a problem hiding this comment.
Ah, so is InternalAggregation specifically for the response side? I'm not as intimate with the aggs framework so I thought it was just an internal framework class (request / response agnostic).
So then what do you think about having a CommonFields static class to AggregatorBuilder for common fields used on the request side?
There was a problem hiding this comment.
+1 to having CommonFields class in AggregatorBuilder
|
@nknize I left some comments on the aggregation type stuff but I don't know enough about the actual statistics we are calculating here to review those bits. Maybe @brwe or @polyfractal would be able to review those bits? |
There was a problem hiding this comment.
should we rather do it in onModule? cc @colings86
|
I tried, and largely failed, to grok the stats calculations. My math skills are pretty rusty and the streaming formulas look different from what I'm used to. I'd try to get a second opinion from @brwe :) |
89d8c39 to
2fa3320
Compare
|
Great suggestions! I incorporated all feedback and the PR is ready for another review. Here are the main changes:
/cc @colings86, @jpountz |
There was a problem hiding this comment.
to make this class more efficient, we might want to give an ordinal to every field and then use field ordinals as keys rather than field names? (I don't mind it being done as a follow-up, I'm just suggesting)
There was a problem hiding this comment.
+1 for followup PR. I'll open an issue
|
Thanks for the latest round of review @jpountz. I went ahead and made the minor code changes, added support for missing values, and updated documentation. Let me know what you think. |
2a811b5 to
f33b04e
Compare
|
I'd like @colings86 to have a final look, at least for the core refactorings that you did. But otherwise this looks good to me for a first iteration! |
There was a problem hiding this comment.
Should we make all the fields of this class private now we have setters/getters for them so there is only one way of accessing/modifying the values? (maybe in a followup PR?)
There was a problem hiding this comment.
I went ahead and did it in this PR but will keep it as a separate commit when I merge.
|
@jpountz @nknize I took a look at the core changes and they LGTM. I also took a brief look at the module, there are unit and REST tests but we are missing integration tests for this. Is there a reason we have not done an integration tests for this aggregation? All other aggregations have integration tests. |
|
@colings86 I had asked @rjernst about integration testing in modules. I could be wrong but I think to avoid the overhead of starting/stopping integration test clusters in plugins and modules the |
f33b04e to
2f66665
Compare
REST tests are real integration tests. No need for fantasy land tests. |
b85644a to
1706782
Compare
This commit adds a new aggs-matrix-stats module. The module presents a new class of aggregations called Matrix Aggregations. Matrix aggregations work on multiple fields and produce a matrix as output. The first matrix aggregation provided by the module is matrix_stats aggregation. This aggregation computes the following statistics over a set of fields: * Covariance * Correlation For completeness (and interpretation purposes) the following per-field statistics are also provided: * sample count * population mean * population variance * population skewness * population kurtosis
1706782 to
54575e5
Compare
This PR adds a new aggs-matrix-stats module. The module introduces a new class of aggregations called
Matrix Aggregations. Matrix aggregations work on multiple fields and produce a matrix as output. The matrix aggregation provided by this module is thematrix_statsaggregation which computes the following descriptive statistics over a set of fields:For completeness (and interpretation purposes) the following per-field statistics are also provided:
Example request:
Example response:
This PR replaces #16826. The differences include:
multifield_statstomatrix_stats