Conversation
rschu1ze
left a comment
There was a problem hiding this comment.
Thanks! I hope that the comments will be useful. Please add some functional SQL tests which demonstrate the new SQL functions.
src/Common/ErrorCodes.cpp
Outdated
| M(679, IO_URING_SUBMIT_ERROR) \ | ||
| M(690, MIXED_ACCESS_PARAMETER_TYPES) \ | ||
| M(691, UNKNOWN_ELEMENT_OF_ENUM) \ | ||
| M(692, ILLEGAL_VALUE_OF_ARGUMENT) \ |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
|
|
||
| #include <cmath> | ||
| #include <limits> | ||
| #include <stdexcept> |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
|
|
||
| /// count of applied values. Using in calculating exponential smoothing. | ||
|
|
||
| unsigned long long int count = 0; |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
|
|
||
| double value = 0; | ||
|
|
||
| /// count of applied values. Using in calculating exponential smoothing. |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
| /// How much value decays after count_passed. | ||
| static double scale(unsigned long long int count_passed, double alpha) | ||
| { | ||
| /// using binary power because of low precision of pow(). |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
| } | ||
|
|
||
| /// Merge two counters. It is done by moving to the same point of reference and summing the values. | ||
| /// First counter will be 'main' one, and second will be 'additional' one. |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
|
|
||
| /// first applied value. Using to avoid multiplying first value on alpha. | ||
|
|
||
| struct { |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
| struct { | ||
| double value = 0; | ||
| unsigned long long int timestamp = 0; | ||
| bool was = false; |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
| ExponentiallySmoothedAlphaWithTime(double current_value, unsigned long long int current_time, double first_value_, unsigned long long int first_timestamp_) | ||
| : value(current_value), timestamp(current_time) | ||
| { | ||
| first_value.value = first_value_; |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
| double alpha) | ||
| { | ||
| unsigned long long int max_time = std::max(a.timestamp, b.timestamp); | ||
| if (!a.first_value.was || !b.first_value.was) |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
3bd0def to
c3b6553
Compare
c3b6553 to
e3c0986
Compare
9c3b434 to
4ba5009
Compare
| name, maximal_arity); | ||
| } | ||
|
|
||
| template<std::size_t minimal_arity> |
There was a problem hiding this comment.
(kind of related: performance is not a concern for assertArityAtLeast() + assertArityAtMost() (these functions are called once during some initialization). Therefore, templatization is overkill. We could just pass the min/max arity as a normal size_t parameter and reduce the code size a little bit.)
| #include <limits> | ||
|
|
||
| #include <stdexcept> | ||
| #include <optional> |
There was a problem hiding this comment.
(Cosmetic: let's keep sorted things sorted (l. 3-6))
| }; | ||
|
|
||
| /// Helper struct contains functions for all Counters | ||
| struct DataHelper |
There was a problem hiding this comment.
The name is too generic, what about ExponentiallySmoothedAlphaBase?
Or maybe even better: Don't use virtual inheritance at all (especially rather uncommonprivate inheritance, l. 204), instead name this struct ExponentialSmootingHelper and call the static methods directly, e.g. ExponentialSmootingHelper::scale_one_minus_value(...).
| while (count) | ||
| { | ||
| if (count & 1) | ||
| { |
There was a problem hiding this comment.
(cosmetic: it is common in ClickHouse to omit parentheses in single-line if/for/while statements)
| struct DataHelper | ||
| { | ||
| /// equivalent of pow(value, count). | ||
| /// using binary power for better precision |
There was a problem hiding this comment.
Minor: "binary power" --> "binary exponentiation"
And: is this trick really used for "better precision" or also for performance reasons? (I am actually not sure how std::pow is implemented, maybe it uses a similar technique?)
| @@ -0,0 +1,42 @@ | |||
| /* exponentialSmoothingAlpha tests */ | |||
| SELECT exponentialSmoothingAlpha()(value) over (ORDER BY timestamp ASC) from (SELECT number as timestamp, number * 2 - (number % 7 - 3) / 7 as value from numbers_mt(100)) OFFSET 99; -- { serverError BAD_ARGUMENTS } | |||
There was a problem hiding this comment.
Instead of generating the same table data over and over, we could create them once at the beginning of the test (use INSERT SELECT syntax) and then only reference these tables by the queries.
(also, numbers(100) will do the job just fine, the amount of data is too small to require multithreading)
| SELECT exponentialSmoothingAlpha(0.5) over (ORDER BY timestamp ASC) from (SELECT number as timestamp, number * 2 - (number % 7 - 3) / 7 as value from numbers_mt(100)) OFFSET 99; -- { serverError BAD_ARGUMENTS } | ||
| SELECT exponentialSmoothingAlpha(0)(value) over (ORDER BY timestamp ASC) from (SELECT number as timestamp, number * 2 - (number % 7 - 3) / 7 as value from numbers_mt(100)) OFFSET 99; | ||
| SELECT exponentialSmoothingAlpha(0.2)(value) over (ORDER BY timestamp ASC) from (SELECT number as timestamp, number * 2 - (number % 7 - 3) / 7 as value from numbers_mt(100)) OFFSET 99; | ||
| SELECT exponentialSmoothingAlpha(0.5)(value) over (ORDER BY timestamp ASC) from (SELECT number as timestamp, number * 3 - (number % 7 - 3) / 7 as value from numbers_mt(100)) OFFSET 99; |
There was a problem hiding this comment.
Why are we using different values here (number * 2 in l. 9 vs. number * 3 in l. 10)? I think the results will be easier to verify when all queries use the same data.
EDIT: Thinking about it, maybe some manual test data will be sufficient already and more intuitive. E.g. CREATE a table, insert (e.g.) 6 rows manually, then run the queries with different alpha against that data. We would not need an OFFSET clause because there is only few rows. But because of that, it will also be easy to verify that everything is correct by looking at the expected results file.
| SELECT exponentialSmoothingAlpha(0)(value) over (ORDER BY timestamp ASC) from (SELECT number as timestamp, number * 2 - (number % 7 - 3) / 7 as value from numbers_mt(100)) OFFSET 99; | ||
| SELECT exponentialSmoothingAlpha(0.2)(value) over (ORDER BY timestamp ASC) from (SELECT number as timestamp, number * 2 - (number % 7 - 3) / 7 as value from numbers_mt(100)) OFFSET 99; | ||
| SELECT exponentialSmoothingAlpha(0.5)(value) over (ORDER BY timestamp ASC) from (SELECT number as timestamp, number * 3 - (number % 7 - 3) / 7 as value from numbers_mt(100)) OFFSET 99; | ||
| SELECT exponentialSmoothingAlpha(0.8)(value) over (ORDER BY timestamp ASC) from (SELECT number as timestamp, number * 4 - (number % 7 - 3) / 7 as value from numbers_mt(100)) OFFSET 99; |
There was a problem hiding this comment.
I would find one or two tests with more "advanced" window function syntax interesting: PARTITION BY clause, and a limited frame Rows BETWEEN ... PRECEDING AND CURRENT ROW) - just to check that it works as expected.
| SELECT exponentialSmoothingAlpha(0.8)(value) over (ORDER BY timestamp ASC) from (SELECT number as timestamp, number * 4 - (number % 7 - 3) / 7 as value from numbers_mt(100)) OFFSET 99; | ||
| SELECT exponentialSmoothingAlpha(1)(value) over (ORDER BY timestamp ASC) from (SELECT number as timestamp, number * 4 - (number % 7 - 3) / 7 as value from numbers_mt(100)) OFFSET 99; | ||
|
|
||
| /* exponentialSmoothingAlphaWithTime tests */ |
There was a problem hiding this comment.
Tip: Use SELECT('exponentialSmoothingAlphaWithTime') instead of comments (l. 14). This will conveniently separate expected results for different sections in the reference file.
| * Exponentially smoothed value is weighted average with weight proportional to some function of the time passed. | ||
| * In this class timestamps exist, so time is biggest timestamp minus value timestamp. | ||
| * Skipped values fill by current value of counter. | ||
| * For example, if alpha = 1/3 and it's values timestamps (x0, 0), (x1, 2), (x2, 4) added, result will be x0 * 36/81 + x1 * 18/81 + x2 * 27/81. |
There was a problem hiding this comment.
Sorry, the example is not clear. How are 36, 18 and 27 calculated?
Changelog category:
Changelog entry:
Added a new aggregate function: exponentialSmoothingAlpha. It can be used to calculate exponential smoothing with given alpha parameter.
Documentation entry for user-facing changes