builtins: used Decimals to calculate transition values for aggregate function#74372
Conversation
|
Thank you for contributing to CockroachDB. Please ensure you have followed the guidelines for creating a PR. Before a member of our team reviews your PR, I have some potential action items for you:
I have added a few people who may be able to assist in reviewing:
🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
yuzefovich
left a comment
There was a problem hiding this comment.
Thanks for taking this on!
Reviewed 4 of 4 files at r1, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @mneverov)
-- commits, line 2 at r1:
nit: use builtins: as the prefix of the commit title and use the lower case for Used.
-- commits, line 17 at r1:
nit: missing Release note: None.
pkg/sql/sem/builtins/aggregate_builtins.go, line 1849 at r1 (raw file):
} func newRegressionAccumulatorDecimalBase(
super nit: there is a slight difference between new and make for the constructor methods naming (at least in CockroachDB code). new is used for pointer returns whereas make for value returns, so we should use make here and for Empty below as well.
pkg/sql/sem/builtins/aggregate_builtins.go, line 1866 at r1 (raw file):
singleDatumAggregateBase: singleDatumAggregateBase, ed: ed, n: &apd.Decimal{},
Not sure if it matters, but it might be better to have a single large allocation, so something like
var decimals [16]apd.Decimal
b := regressionAccumulatorDecimalBase{
...
n: &decimals[0],
sx: &decimals[1],
...
}
Or maybe the compiler is smart enough to do something like this itself.
In any case, it'd be good to take a quick look at the benchmarks.
pkg/sql/sem/builtins/aggregate_builtins.go, line 1979 at r1 (raw file):
// sums to NaN; otherwise we will falsely report variance zero when // there are no more inputs. if x.Form == apd.NaN || isInf(x) {
super nit: might be cleaner to introduce isNaN helper method as well.
pkg/sql/sem/builtins/aggregate_builtins.go, line 2034 at r1 (raw file):
func mapToDFloat(d *apd.Decimal, err error) (tree.Datum, error) { if err != nil { return tree.DNull, nil
Shouldn't this return err?
Update: our linter caught it too :)
pkg/sql/sem/builtins/aggregate_builtins.go:2034:3: unexpected nil error return after checking for a non-nil error; if this is not a mistake, add a "//nolint:returnerrcheck" comment
pkg/sql/sem/builtins/aggregate_builtins.go, line 2039 at r1 (raw file):
res, err := d.Float64() if err != nil && err.(*strconv.NumError).Err == strconv.ErrRange {
nit: can we replace the second part of the conditional with something like errors.Is(err, strconv.ErrRange)?
Update: linter is suggesting this as well :)
pkg/sql/sem/builtins/aggregate_builtins.go:2039:47: use errors.Is instead of a direct comparison
For example:
if errors.Is(err, errMyOwnErrReference) {
...
}
pkg/sql/sem/builtins/aggregate_builtins.go:2039:23: invalid direct cast on error object
Alternatives:
if _, ok := err.(*T); ok -> if errors.HasType(err, (*T)(nil))
if _, ok := err.(I); ok -> if errors.HasInterface(err, (*I)(nil))
if myErr, ok := err.(*T); ok -> if myErr := (*T)(nil); errors.As(err, &myErr)
if myErr, ok := err.(I); ok -> if myErr := (I)(nil); errors.As(err, &myErr)
switch err.(type) { case *T:... -> switch { case errors.HasType(err, (*T)(nil)): ...
pkg/sql/sem/builtins/aggregate_builtins.go, line 2074 at r1 (raw file):
return tree.DNull, nil } res := &apd.Decimal{}
IIUC we could avoid the allocation of a new decimal here and reuse a.tmp, right?
1bd683c to
bce7cf4
Compare
|
Thank you for updating your pull request. My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
mneverov
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @mneverov and @yuzefovich)
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: missing
Release note: None.
👍
pkg/sql/sem/builtins/aggregate_builtins.go, line 1849 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
super nit: there is a slight difference between
newandmakefor the constructor methods naming (at least in CockroachDB code).newis used for pointer returns whereasmakefor value returns, so we should usemakehere and forEmptybelow as well.
OK
pkg/sql/sem/builtins/aggregate_builtins.go, line 1979 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
super nit: might be cleaner to introduce
isNaNhelper method as well.
Done.
pkg/sql/sem/builtins/aggregate_builtins.go, line 2034 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Shouldn't this return
err?Update: our linter caught it too :)
pkg/sql/sem/builtins/aggregate_builtins.go:2034:3: unexpected nil error return after checking for a non-nil error; if this is not a mistake, add a "//nolint:returnerrcheck" comment
fixed, thanks
pkg/sql/sem/builtins/aggregate_builtins.go, line 2039 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: can we replace the second part of the conditional with something like
errors.Is(err, strconv.ErrRange)?Update: linter is suggesting this as well :)
pkg/sql/sem/builtins/aggregate_builtins.go:2039:47: use errors.Is instead of a direct comparison For example: if errors.Is(err, errMyOwnErrReference) { ... } pkg/sql/sem/builtins/aggregate_builtins.go:2039:23: invalid direct cast on error object Alternatives: if _, ok := err.(*T); ok -> if errors.HasType(err, (*T)(nil)) if _, ok := err.(I); ok -> if errors.HasInterface(err, (*I)(nil)) if myErr, ok := err.(*T); ok -> if myErr := (*T)(nil); errors.As(err, &myErr) if myErr, ok := err.(I); ok -> if myErr := (I)(nil); errors.As(err, &myErr) switch err.(type) { case *T:... -> switch { case errors.HasType(err, (*T)(nil)): ...
Done.
pkg/sql/sem/builtins/aggregate_builtins.go, line 2074 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
IIUC we could avoid the allocation of a new decimal here and reuse
a.tmp, right?
you're right, replaced where possible.
yuzefovich
left a comment
There was a problem hiding this comment.
Last question from me, but otherwise
Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @mneverov)
pkg/sql/sem/builtins/aggregate_builtins.go, line 1842 at r2 (raw file):
// Variables used across iterations. ed *apd.ErrDecimal n, sx, sxx, sy, syy, sxy *apd.Decimal
Actually, rather than storing decimals by pointer, can we store them by value here? Then we won't need to explicitly set them in the constructor.
mneverov
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @yuzefovich)
pkg/sql/sem/builtins/aggregate_builtins.go, line 1842 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Actually, rather than storing decimals by pointer, can we store them by value here? Then we won't need to explicitly set them in the constructor.
Yes we can. I decided against it because then in the add funcion and in all result functions I would need to use & operator.
Don't have strong opinion about it, let me know if the values in the struct are preferable.
yuzefovich
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @mneverov)
pkg/sql/sem/builtins/aggregate_builtins.go, line 1842 at r2 (raw file):
Previously, mneverov (Max Neverov) wrote…
Yes we can. I decided against it because then in the add funcion and in all result functions I would need to use
&operator.
Don't have strong opinion about it, let me know if the values in the struct are preferable.
I think the values are preferable in this case. My reasoning is that then all apd.Decimal objects don't have to be explicitly instantiated and all of them are also in the same memory location as regressionAccumulatorDecimalBase struct as a whole. Taking the pointer from value (i.e. & operator) is essentially free AFAIK.
…functions. Related to cockroachdb#58347. Previously, when numbers in input exceeded <X>e+16 that led to loss of significance in calculation of the transition values and the result of aggregate functions. This in its turn led to result of aggregate function depends on the order of the numbers in a table. This was OK since this behaviour was identical to what Postgres has. Calculations in the distributed mode comprises two stages: split the data with the calculating transition values locally and the subsequent merge of the results from the first step. Merging operations can happen in any order which means that the order of operation should not affect the result. In cockroachdb#58347 it was decided to use Decimal type to calculate transition values for aggregate functions. This solves the problem with the loss of significance, and hence with the order of the merge operations in distributed mode. Release note: None
bce7cf4 to
494df3f
Compare
mneverov
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @yuzefovich)
pkg/sql/sem/builtins/aggregate_builtins.go, line 1842 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
I think the values are preferable in this case. My reasoning is that then all
apd.Decimalobjects don't have to be explicitly instantiated and all of them are also in the same memory location asregressionAccumulatorDecimalBasestruct as a whole. Taking the pointer from value (i.e.&operator) is essentially free AFAIK.
Sure, & is free, I was thinking from the code aesthetics perspective :)
yuzefovich
left a comment
There was a problem hiding this comment.
bors r=yuzefovich
Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mneverov)
pkg/sql/sem/builtins/aggregate_builtins.go, line 1842 at r2 (raw file):
Previously, mneverov (Max Neverov) wrote…
Sure,
&is free, I was thinking from the code aesthetics perspective :)
I see :)
|
Build failed (retrying...): |
|
Build failed (retrying...): |
|
Build failed (retrying...): |
|
Build succeeded: |
Related to #58347.
Previously, when numbers in input exceeded e+16 that led to loss of
significance in calculation of the transition values and the result of aggregate
functions. This in its turn led to result of aggregate function depends on the
order of the numbers in a table. This was OK since this behaviour was identical
to what Postgres has.
Calculations in the distributed mode comprises two stages: split the data with
the calculating transition values locally and the subsequent merge of the
results from the first step. Merging operations can happen in any order which
means that the order of operation should not affect the result. In #58347 it was
decided to use Decimal type to calculate transition values for aggregate
functions. This solves the problem with the loss of significance, and hence with
the order of the merge operations in distributed mode.
Release note: None