add Window Function ExponentialSmoothingAlpha by aleks5d · Pull Request #1 · aleks5d/ClickHouse

aleks5d · 2023-05-06T20:16:44Z

Changelog category:

New Feature

Changelog entry:

Added a new aggregate function: exponentialSmoothingAlpha. It calculates exponential smoothing with a given alpha parameter.

Documentation entry for user-facing changes

Documentation is written (mandatory for new features)

Information about CI checks: https://clickhouse.com/docs/en/development/continuous-integration/

rschu1ze

Thanks! I hope that the comments will be useful. Please add some functional SQL tests which demonstrate the new SQL functions.

src/Common/ErrorCodes.cpp

    M(679, IO_URING_SUBMIT_ERROR) \
    M(690, MIXED_ACCESS_PARAMETER_TYPES) \
    M(691, UNKNOWN_ELEMENT_OF_ENUM) \
+    M(692, ILLEGAL_VALUE_OF_ARGUMENT) \


src/Common/ExponentiallySmoothedCounter.h


 #include <cmath>
 #include <limits>
+#include <stdexcept>


src/Common/ExponentiallySmoothedCounter.h

+
+    /// count of applied values. Using in calculating exponential smoothing.
+
+    unsigned long long int count = 0;


src/Common/ExponentiallySmoothedCounter.h

+
+    double value = 0;
+
+    /// count of applied values. Using in calculating exponential smoothing.


src/Common/ExponentiallySmoothedCounter.h

+    /// How much value decays after count_passed.
+    static double scale(unsigned long long int count_passed, double alpha)
+    {
+        /// using binary power because of low precision of pow().


src/Common/ExponentiallySmoothedCounter.h

+    }
+
+    /// Merge two counters. It is done by moving to the same point of reference and summing the values.
+    /// First counter will be 'main' one, and second will be 'additional' one.


src/Common/ExponentiallySmoothedCounter.h

+
+    /// first applied value. Using to avoid multiplying first value on alpha. 
+
+    struct {


src/Common/ExponentiallySmoothedCounter.h

+    struct {
+        double value = 0;
+        unsigned long long int timestamp = 0;
+        bool was = false;


src/Common/ExponentiallySmoothedCounter.h

+    ExponentiallySmoothedAlphaWithTime(double current_value, unsigned long long int current_time, double first_value_, unsigned long long int first_timestamp_)
+        : value(current_value), timestamp(current_time)
+    {
+        first_value.value = first_value_;


src/Common/ExponentiallySmoothedCounter.h

+                                                            double alpha)
+    {
+        unsigned long long int max_time = std::max(a.timestamp, b.timestamp);
+        if (!a.first_value.was || !b.first_value.was)


rschu1ze · 2023-05-23T16:41:28Z

src/AggregateFunctions/FactoryHelpers.h

                    name, maximal_arity);
 }

+template<std::size_t minimal_arity>


(kind of related: performance is not a concern for assertArityAtLeast() + assertArityAtMost() (these functions are called once during some initialization). Therefore, templatization is overkill. We could just pass the min/max arity as a normal size_t parameter and reduce the code size a little bit.)
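A minimal sketch of what the non-templated variant could look like. The signature here is an assumption for illustration: it takes the arity as a plain number and throws std::invalid_argument, whereas the real helpers in FactoryHelpers.h work with ClickHouse's own types and exception class.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical, simplified stand-in for the helper under review.
// The minimal arity is an ordinary runtime argument, not a template parameter.
inline void assertArityAtLeast(const std::string & name, std::size_t actual_arity, std::size_t minimal_arity)
{
    if (actual_arity < minimal_arity)
        throw std::invalid_argument(
            "Aggregate function " + name + " requires at least " + std::to_string(minimal_arity)
            + " argument(s), got " + std::to_string(actual_arity));
}
```

A call site would then read assertArityAtLeast(name, argument_types.size(), 1) instead of instantiating one template per arity.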

rschu1ze · 2023-05-23T16:42:17Z

src/Common/ExponentiallySmoothedCounter.h

 #include <limits>
-
+#include <stdexcept>
+#include <optional>


(Cosmetic: let's keep sorted things sorted (l. 3-6))

rschu1ze · 2023-05-23T17:09:07Z

src/Common/ExponentiallySmoothedCounter.h

 };

+/// Helper struct contains functions for all Counters 
+struct DataHelper


The name is too generic, what about ExponentiallySmoothedAlphaBase?

Or maybe even better: don't use virtual inheritance at all (especially the rather uncommon private inheritance, l. 204). Instead, name this struct ExponentialSmoothingHelper and call the static methods directly, e.g. ExponentialSmoothingHelper::scale_one_minus_value(...).

rschu1ze · 2023-05-23T17:10:55Z

src/Common/ExponentiallySmoothedCounter.h

+        while (count)
+        {
+            if (count & 1)
+            {


(cosmetic: it is common in ClickHouse to omit braces around single-line bodies of if/for/while statements)

rschu1ze · 2023-05-23T17:21:34Z

src/Common/ExponentiallySmoothedCounter.h

+struct DataHelper
+{
+    /// equivalent of pow(value, count).
+    /// using binary power for better precision


Minor: "binary power" --> "binary exponentiation"

And: is this trick really used for "better precision" or also for performance reasons? (I am actually not sure how std::pow is implemented, maybe it uses a similar technique?)
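For reference, a sketch of binary exponentiation next to a comparison against std::pow. The names binaryPow and relativeDifferenceFromStdPow are hypothetical and not taken from the PR. The sketch does not settle the precision question; it only gives a way to measure the difference for concrete inputs.

```cpp
#include <cmath>
#include <cstdint>

// Computes base^exponent using O(log exponent) multiplications:
// square the base on every step, multiply it in when the current bit is set.
inline double binaryPow(double base, std::uint64_t exponent)
{
    double result = 1.0;
    while (exponent)
    {
        if (exponent & 1)
            result *= base;
        base *= base;
        exponent >>= 1;
    }
    return result;
}

// Relative difference between binaryPow and std::pow.
// Assumes std::pow returns a nonzero, finite reference value.
inline double relativeDifferenceFromStdPow(double base, std::uint64_t exponent)
{
    const double reference = std::pow(base, static_cast<double>(exponent));
    return std::fabs(binaryPow(base, exponent) - reference) / std::fabs(reference);
}
```

Running relativeDifferenceFromStdPow over the alpha values and counts that the tests use would show whether the two approaches diverge in practice.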

rschu1ze · 2023-05-23T20:24:07Z

tests/queries/0_stateless/02715_exponential_smoothing_alpha.sql

@@ -0,0 +1,42 @@
+/* exponentialSmoothingAlpha tests */
+SELECT exponentialSmoothingAlpha()(value) over (ORDER BY timestamp ASC) from (SELECT number as timestamp, number * 2 - (number % 7 - 3) / 7 as value from numbers_mt(100)) OFFSET 99; -- { serverError BAD_ARGUMENTS }


Instead of generating the same table data over and over, we could create them once at the beginning of the test (use INSERT SELECT syntax) and then only reference these tables by the queries.

(also, numbers(100) will do the job just fine, the amount of data is too small to require multithreading)

rschu1ze · 2023-05-23T20:31:37Z

tests/queries/0_stateless/02715_exponential_smoothing_alpha.sql

+SELECT exponentialSmoothingAlpha(0.5) over (ORDER BY timestamp ASC) from (SELECT number as timestamp, number * 2 - (number % 7 - 3) / 7  as value from numbers_mt(100)) OFFSET 99; -- { serverError BAD_ARGUMENTS }
+SELECT exponentialSmoothingAlpha(0)(value) over (ORDER BY timestamp ASC) from (SELECT number as timestamp, number * 2 - (number % 7 - 3) / 7  as value from numbers_mt(100)) OFFSET 99;
+SELECT exponentialSmoothingAlpha(0.2)(value) over (ORDER BY timestamp ASC) from (SELECT number as timestamp, number * 2 - (number % 7 - 3) / 7  as value from numbers_mt(100)) OFFSET 99;
+SELECT exponentialSmoothingAlpha(0.5)(value) over (ORDER BY timestamp ASC) from (SELECT number as timestamp, number * 3 - (number % 7 - 3) / 7  as value from numbers_mt(100)) OFFSET 99;


Why are we using different values here (number * 2 in l. 9 vs. number * 3 in l. 10)? I think the results will be easier to verify when all queries use the same data.

EDIT: Thinking about it, maybe some manual test data will already be sufficient and more intuitive. E.g. CREATE a table, insert (e.g.) 6 rows manually, then run the queries with different alpha against that data. We would not need an OFFSET clause because there are only a few rows. For the same reason, it will be easy to verify that everything is correct by looking at the expected results file.
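To make such a small reference file checkable by hand, the expected values could be cross-checked with a few lines of code. The sketch below assumes the usual recurrence s_0 = x_0, s_i = alpha * x_i + (1 - alpha) * s_{i-1}, which matches the "first applied value" comment in the diff but is not confirmed by the PR. The function name is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Running exponential smoothing under the assumed recurrence:
//   s_0 = x_0,  s_i = alpha * x_i + (1 - alpha) * s_{i-1}.
// Returns one smoothed value per input row, as a window function over
// ORDER BY with the default frame would.
inline std::vector<double> exponentialSmoothing(const std::vector<double> & values, double alpha)
{
    assert(alpha >= 0.0 && alpha <= 1.0);
    std::vector<double> smoothed;
    smoothed.reserve(values.size());
    for (double x : values)
    {
        if (smoothed.empty())
            smoothed.push_back(x); // the first value is taken as is
        else
            smoothed.push_back(alpha * x + (1.0 - alpha) * smoothed.back());
    }
    return smoothed;
}
```

With the six values 1, 2, 3, 4, 5, 6 and alpha = 0.5, this yields 1, 1.5, 2.25, 3.125, 4.0625, 5.03125, which is short enough to verify by eye in a reference file.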

rschu1ze · 2023-05-23T20:33:53Z

tests/queries/0_stateless/02715_exponential_smoothing_alpha.sql

+SELECT exponentialSmoothingAlpha(0)(value) over (ORDER BY timestamp ASC) from (SELECT number as timestamp, number * 2 - (number % 7 - 3) / 7  as value from numbers_mt(100)) OFFSET 99;
+SELECT exponentialSmoothingAlpha(0.2)(value) over (ORDER BY timestamp ASC) from (SELECT number as timestamp, number * 2 - (number % 7 - 3) / 7  as value from numbers_mt(100)) OFFSET 99;
+SELECT exponentialSmoothingAlpha(0.5)(value) over (ORDER BY timestamp ASC) from (SELECT number as timestamp, number * 3 - (number % 7 - 3) / 7  as value from numbers_mt(100)) OFFSET 99;
+SELECT exponentialSmoothingAlpha(0.8)(value) over (ORDER BY timestamp ASC) from (SELECT number as timestamp, number * 4 - (number % 7 - 3) / 7  as value from numbers_mt(100)) OFFSET 99;


I would find one or two tests with more "advanced" window function syntax interesting: a PARTITION BY clause, and a limited frame (ROWS BETWEEN ... PRECEDING AND CURRENT ROW), just to check that it works as expected.

rschu1ze · 2023-05-23T20:41:51Z

tests/queries/0_stateless/02715_exponential_smoothing_alpha.sql

+SELECT exponentialSmoothingAlpha(0.8)(value) over (ORDER BY timestamp ASC) from (SELECT number as timestamp, number * 4 - (number % 7 - 3) / 7  as value from numbers_mt(100)) OFFSET 99;
+SELECT exponentialSmoothingAlpha(1)(value) over (ORDER BY timestamp ASC) from (SELECT number as timestamp, number * 4 - (number % 7 - 3) / 7  as value from numbers_mt(100)) OFFSET 99;
+
+/* exponentialSmoothingAlphaWithTime tests */


Tip: Use SELECT('exponentialSmoothingAlphaWithTime') instead of comments (l. 14). This will conveniently separate expected results for different sections in the reference file.

rschu1ze · 2023-05-23T20:47:38Z

src/Common/ExponentiallySmoothedCounter.h

+  * Exponentially smoothed value is weighted average with weight proportional to some function of the time passed.
+  * In this class timestamps exist, so time is biggest timestamp minus value timestamp.
+  * Skipped values fill by current value of counter.
+  * For example, if alpha = 1/3 and it's values timestamps (x0, 0), (x1, 2), (x2, 4) added, result will be x0 * 36/81 + x1 * 18/81 + x2 * 27/81.


Sorry, the example is not clear. How are 36, 18 and 27 calculated?
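One reading that reproduces the quoted weights, offered as an assumption rather than the author's confirmed intent: the first value is taken as is, every later observation is applied as s = alpha * x + (1 - alpha) * s, and every skipped timestamp is filled with the current counter value s, which leaves s unchanged. With alpha = 1/3 this gives x0 * 4/9 + x1 * 2/9 + x2 * 1/3, i.e. 36/81, 18/81 and 27/81. The sketch below tracks the weight of each input through that process. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the weight of each input value in the final smoothed result,
// under the assumed gap-filling semantics described above.
inline std::vector<double> smoothingWeights(const std::vector<std::uint64_t> & timestamps, double alpha)
{
    std::vector<double> weights(timestamps.size(), 0.0);
    if (timestamps.empty())
        return weights;

    weights[0] = 1.0; // the first value is not multiplied by alpha
    for (std::size_t i = 1; i < timestamps.size(); ++i)
    {
        assert(timestamps[i] > timestamps[i - 1]);

        // Skipped ticks: the filler is the counter itself, so
        // s = alpha * s + (1 - alpha) * s, which keeps every weight (up to rounding).
        for (std::uint64_t tick = timestamps[i - 1] + 1; tick < timestamps[i]; ++tick)
            for (std::size_t j = 0; j < i; ++j)
                weights[j] = alpha * weights[j] + (1.0 - alpha) * weights[j];

        // Real observation: older inputs decay by (1 - alpha), the new one enters with alpha.
        for (std::size_t j = 0; j < i; ++j)
            weights[j] *= (1.0 - alpha);
        weights[i] = alpha;
    }
    return weights;
}
```

Under this reading the gaps do not change the result at all, so if the example is meant to show the effect of timestamps, a case where the weights differ from the gap-free ones would be clearer.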

add AggregateFunctionExponentialSmoothingAlpha

b21d5f6

aleks5d self-assigned this May 6, 2023

aleks5d added 3 commits May 6, 2023 21:34

add AggregateFunctionExponentialSmoothingAlphaFillGaps

2e49f57

fix bug in FillGaps function

8acd66b

fix unsigned in Counters

22a8494

rschu1ze reviewed May 13, 2023


change to WindowFunction

4b54735

aleks5d force-pushed the exponential-smoothing branch from 3bd0def to c3b6553 Compare May 15, 2023 21:49

fix little issues

e3c0986


aleks5d force-pushed the exponential-smoothing branch from c3b6553 to e3c0986 Compare May 16, 2023 14:18

aleks5d added 2 commits May 16, 2023 14:19

change ull to uint64_t

d19702f

tests

4ba5009

aleks5d force-pushed the exponential-smoothing branch from 9c3b434 to 4ba5009 Compare May 16, 2023 17:42

aleks5d added 2 commits May 17, 2023 12:39

usefull comments

7e5d6de

remove rand() from tests

fa3944d

aleks5d changed the title ~~add AggregateFunctionExponentialSmoothingAlpha~~ add Window Function ExponentialSmoothingAlpha May 18, 2023

rschu1ze reviewed May 23, 2023


alexey-milovidov mentioned this pull request Dec 31, 2023

Intern Tasks 2023/2024 ClickHouse/ClickHouse#58394

Closed

