add corrMatrix,covarSampMatrix,covarPopMatrix AggregateFunction#44680
pufit merged 10 commits into ClickHouse:master
| INSERT INTO fh(a_value, b_value, c_value, d_value) VALUES (1, 5.6,-4.4, 2.6),(2, -9.6, 3, 3.3),(3, -1.3,-4, 1.2),(4, 5.3,9.7,2.3),(5, 4.4,0.037,1.222),(6, -8.6,-7.8,2.1233),(7, 5.1,9.3,8.1222),(8, 7.9,-3.6,9.837),(9, -8.2,0.62,8.43555),(10, -3,7.3,6.762); | ||
|
|
||
| SELECT corrMatrix(a_value) FROM (select a_value from fh limit 0); | ||
|
|
Why not just select corrMatrix(a_value) from fh?
These functions are numerically unstable, so their output has some randomness.
The randomness is probably caused by floating-point error. This can be dealt with by applying the round function in the test query.
OK, I'll add some tests later.
|
|
||
| SELECT corrMatrix(a_value) FROM (select a_value from fh limit 0); | ||
|
|
||
| SELECT corrMatrix(a_value) FROM (select a_value from fh limit 1); |
When the data has zero rows or only one row, the output is stable.
| { | ||
| assertNoParameters(name, parameters); | ||
| for (const auto & argument_type : argument_types) | ||
| if (!isNumber(argument_type)) |
What about the Decimal data type?
Maybe just isNativeNumber is enough?
Yes, I'll fix this later.
| template <StatisticsFunctionKind _kind> | ||
| struct StatFuncArbitraryArgData | ||
| { | ||
| using DataType = std::conditional_t<_kind == StatisticsFunctionKind::corr, CorrMoments<Float64>, CovarMoments<Float64>>; |
If all the arguments are of type Float32, the result type had better be Float32 as well.
Forwarding all argument types to the template is a problem.
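A rough sketch of what the Float32 suggestion could look like if the argument types were available at compile time. Everything here is a toy stand-in: the member layout of `CorrMoments` and `CovarMoments`, the enumerators other than `corr`, and the names `ResultFloat` and `StatFuncArbitraryArgDataSketch` are assumptions, not the code in this PR. It also shows why forwarding is awkward: the argument types are presumably only known at runtime, so each combination would need its own template instantiation.

```cpp
#include <type_traits>

// Toy stand-ins for the moment accumulators named in the diff.
template <typename T> struct CorrMoments { T m0{}, x1{}, y1{}, xy{}, x2{}, y2{}; };
template <typename T> struct CovarMoments { T m0{}, x1{}, y1{}, xy{}; };

// Only `corr` appears in the diff; the other enumerators are assumed.
enum class StatisticsFunctionKind { corr, covarPop, covarSamp };

// Hypothetical helper: float only if every argument type is float,
// otherwise double. (An empty argument pack also yields float.)
template <typename... Args>
using ResultFloat = std::conditional_t<(std::is_same_v<Args, float> && ...), float, double>;

template <StatisticsFunctionKind kind, typename... Args>
struct StatFuncArbitraryArgDataSketch
{
    using Float = ResultFloat<Args...>;
    // Same selection as in the diff, but parameterized on the chosen float type.
    using DataType = std::conditional_t<kind == StatisticsFunctionKind::corr,
                                        CorrMoments<Float>, CovarMoments<Float>>;
};
```

With this sketch, `StatFuncArbitraryArgDataSketch<StatisticsFunctionKind::corr, float, float>::DataType` is `CorrMoments<float>`, while mixing in a `double` argument falls back to `CorrMoments<double>`.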
| template <typename StatFuncData> | ||
| class AggregateFunctionVarianceSimpleMatrix final | ||
| : public IAggregateFunctionDataHelper<StatFuncData, AggregateFunctionVarianceSimpleMatrix<StatFuncData>> | ||
| { |
Maybe there is no need to add a new class: we can add this to AggregateFunctionVarianceSimple and control it with a template parameter.
Add functions to
OK👌
@alexey-milovidov Hi, could you take a look at his first pull request?
| @@ -0,0 +1,24 @@ | |||
| [[nan]] | |||
Because corr uses an unstable algorithm, you could have a look at corr's test queries in 00181_aggregate_functions_statistics.sql and 00181_aggregate_functions_statistics.reference.
Those tests look very old. But I suppose something like SELECT arrayMap(x -> arrayMap(y -> round(y, 5), x), corrMatrix(...)) FROM fh; should do the trick and return consistent results. It will also validate that the inner math logic works correctly in this case.
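The nested arrayMap idea above rounds every cell of the result matrix before it is compared with the reference file. A C++ analogue, as a sketch only (`Matrix` and `roundMatrix` are hypothetical names, and this is not how the SQL round function is implemented):

```cpp
#include <cmath>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Round every cell to `digits` decimal places, mirroring
// arrayMap(x -> arrayMap(y -> round(y, digits), x), matrix).
Matrix roundMatrix(Matrix m, int digits)
{
    const double scale = std::pow(10.0, digits);
    for (auto & row : m)
        for (double & cell : row)
            cell = std::round(cell * scale) / scale;
    return m;
}
```

Two matrices that differ only in the last bits of a cell compare equal after `roundMatrix(m, 5)`. NaN cells stay NaN, because `std::round` of NaN is NaN.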
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Add the corrMatrix aggregate function, which calculates the correlation for each pair of columns. In addition, since the aggregate functions covarSamp and covarPop are similar to corr, I added covarSampMatrix and covarPopMatrix as well.
@alexey-milovidov closes #44587
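For readers unfamiliar with what "each pair of columns" means here, the following is a minimal C++ sketch of a correlation matrix built from raw moment sums. It uses the textbook Pearson formula and hypothetical names (`corr`, `corrMatrixSketch`); it is not presented as the implementation merged in this PR.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Pearson correlation of two equally long columns from raw moment sums.
double corr(const std::vector<double> & x, const std::vector<double> & y)
{
    double n = 0, sx = 0, sy = 0, sxy = 0, sxx = 0, syy = 0;
    for (std::size_t i = 0; i < x.size(); ++i)
    {
        n += 1;
        sx += x[i];
        sy += y[i];
        sxy += x[i] * y[i];
        sxx += x[i] * x[i];
        syy += y[i] * y[i];
    }
    // With zero rows (and, in this sketch, with a single row) this is 0/0,
    // which IEEE 754 arithmetic evaluates to NaN.
    return (n * sxy - sx * sy) / std::sqrt((n * sxx - sx * sx) * (n * syy - sy * sy));
}

// Cell [i][j] holds the correlation between column i and column j.
Matrix corrMatrixSketch(const std::vector<std::vector<double>> & columns)
{
    const std::size_t k = columns.size();
    Matrix result(k, std::vector<double>(k));
    for (std::size_t i = 0; i < k; ++i)
        for (std::size_t j = 0; j < k; ++j)
            result[i][j] = corr(columns[i], columns[j]);
    return result;
}
```

Passing a single column with zero or one value yields a 1x1 matrix containing NaN, which is consistent with the `[[nan]]` line in the reference file shown earlier in the review.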