Skip to content

builtins: used Decimals to calculate transition values for aggregate function#74372

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
mneverov:decimal_aggregate
Jan 6, 2022
Merged

builtins: used Decimals to calculate transition values for aggregate function#74372
craig[bot] merged 1 commit intocockroachdb:masterfrom
mneverov:decimal_aggregate

Conversation

@mneverov
Copy link
Copy Markdown
Contributor

@mneverov mneverov commented Jan 1, 2022

Related to #58347.

Previously, when numbers in input exceeded e+16 that led to loss of
significance in calculation of the transition values and the result of aggregate
functions. This in its turn led to result of aggregate function depends on the
order of the numbers in a table. This was OK since this behaviour was identical
to what Postgres has.
Calculations in the distributed mode comprises two stages: split the data with
the calculating transition values locally and the subsequent merge of the
results from the first step. Merging operations can happen in any order which
means that the order of operation should not affect the result. In #58347 it was
decided to use Decimal type to calculate transition values for aggregate
functions. This solves the problem with the loss of significance, and hence with
the order of the merge operations in distributed mode.

Release note: None

@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented Jan 1, 2022

Thank you for contributing to CockroachDB. Please ensure you have followed the guidelines for creating a PR.

Before a member of our team reviews your PR, I have some potential action items for you:

  • Please ensure your git commit message contains a release note.
  • When CI has completed, please ensure no errors have appeared.

I have added a few people who may be able to assist in reviewing:

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl blathers-crl bot added O-community Originated from the community X-blathers-triaged blathers was able to find an owner labels Jan 1, 2022
@blathers-crl blathers-crl bot requested a review from yuzefovich January 1, 2022 16:54
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this on!

Reviewed 4 of 4 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mneverov)


-- commits, line 2 at r1:
nit: use builtins: as the prefix of the commit title and use the lower case for Used.


-- commits, line 17 at r1:
nit: missing Release note: None.


pkg/sql/sem/builtins/aggregate_builtins.go, line 1849 at r1 (raw file):

}

func newRegressionAccumulatorDecimalBase(

super nit: there is a slight difference between new and make for the constructor methods naming (at least in CockroachDB code). new is used for pointer returns whereas make for value returns, so we should use make here and for Empty below as well.


pkg/sql/sem/builtins/aggregate_builtins.go, line 1866 at r1 (raw file):

		singleDatumAggregateBase: singleDatumAggregateBase,
		ed:                       ed,
		n:                        &apd.Decimal{},

Not sure if it matters, but it might be better to have a single large allocation, so something like

var decimals [16]apd.Decimal
b := regressionAccumulatorDecimalBase{
  ...
  n: &decimals[0],
  sx: &decimals[1],
  ...
}

Or maybe the compiler is smart enough to do something like this itself.

In any case, it'd be good to take a quick look at the benchmarks.


pkg/sql/sem/builtins/aggregate_builtins.go, line 1979 at r1 (raw file):

		// sums to NaN; otherwise we will falsely report variance zero when
		// there are no more inputs.
		if x.Form == apd.NaN || isInf(x) {

super nit: might be cleaner to introduce isNaN helper method as well.


pkg/sql/sem/builtins/aggregate_builtins.go, line 2034 at r1 (raw file):

func mapToDFloat(d *apd.Decimal, err error) (tree.Datum, error) {
	if err != nil {
		return tree.DNull, nil

Shouldn't this return err?

Update: our linter caught it too :)

        pkg/sql/sem/builtins/aggregate_builtins.go:2034:3: unexpected nil error return after checking for a non-nil error; if this is not a mistake, add a "//nolint:returnerrcheck" comment

pkg/sql/sem/builtins/aggregate_builtins.go, line 2039 at r1 (raw file):

	res, err := d.Float64()

	if err != nil && err.(*strconv.NumError).Err == strconv.ErrRange {

nit: can we replace the second part of the conditional with something like errors.Is(err, strconv.ErrRange)?

Update: linter is suggesting this as well :)

        pkg/sql/sem/builtins/aggregate_builtins.go:2039:47: use errors.Is instead of a direct comparison
        For example:
           if errors.Is(err, errMyOwnErrReference) {
             ...
           }
        
        pkg/sql/sem/builtins/aggregate_builtins.go:2039:23: invalid direct cast on error object
        Alternatives:
           if _, ok := err.(*T); ok        ->   if errors.HasType(err, (*T)(nil))
           if _, ok := err.(I); ok         ->   if errors.HasInterface(err, (*I)(nil))
           if myErr, ok := err.(*T); ok    ->   if myErr := (*T)(nil); errors.As(err, &myErr)
           if myErr, ok := err.(I); ok     ->   if myErr := (I)(nil); errors.As(err, &myErr)
           switch err.(type) { case *T:... ->   switch { case errors.HasType(err, (*T)(nil)): ...

pkg/sql/sem/builtins/aggregate_builtins.go, line 2074 at r1 (raw file):

		return tree.DNull, nil
	}
	res := &apd.Decimal{}

IIUC we could avoid the allocation of a new decimal here and reuse a.tmp, right?

@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented Jan 4, 2022

Thank you for updating your pull request.

My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

Copy link
Copy Markdown
Contributor Author

@mneverov mneverov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mneverov and @yuzefovich)


-- commits, line 17 at r1:

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: missing Release note: None.

👍


pkg/sql/sem/builtins/aggregate_builtins.go, line 1849 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

super nit: there is a slight difference between new and make for the constructor methods naming (at least in CockroachDB code). new is used for pointer returns whereas make for value returns, so we should use make here and for Empty below as well.

OK


pkg/sql/sem/builtins/aggregate_builtins.go, line 1979 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

super nit: might be cleaner to introduce isNaN helper method as well.

Done.


pkg/sql/sem/builtins/aggregate_builtins.go, line 2034 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Shouldn't this return err?

Update: our linter caught it too :)

        pkg/sql/sem/builtins/aggregate_builtins.go:2034:3: unexpected nil error return after checking for a non-nil error; if this is not a mistake, add a "//nolint:returnerrcheck" comment

fixed, thanks


pkg/sql/sem/builtins/aggregate_builtins.go, line 2039 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: can we replace the second part of the conditional with something like errors.Is(err, strconv.ErrRange)?

Update: linter is suggesting this as well :)

        pkg/sql/sem/builtins/aggregate_builtins.go:2039:47: use errors.Is instead of a direct comparison
        For example:
           if errors.Is(err, errMyOwnErrReference) {
             ...
           }
        
        pkg/sql/sem/builtins/aggregate_builtins.go:2039:23: invalid direct cast on error object
        Alternatives:
           if _, ok := err.(*T); ok        ->   if errors.HasType(err, (*T)(nil))
           if _, ok := err.(I); ok         ->   if errors.HasInterface(err, (*I)(nil))
           if myErr, ok := err.(*T); ok    ->   if myErr := (*T)(nil); errors.As(err, &myErr)
           if myErr, ok := err.(I); ok     ->   if myErr := (I)(nil); errors.As(err, &myErr)
           switch err.(type) { case *T:... ->   switch { case errors.HasType(err, (*T)(nil)): ...

Done.


pkg/sql/sem/builtins/aggregate_builtins.go, line 2074 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

IIUC we could avoid the allocation of a new decimal here and reuse a.tmp, right?

you're right, replaced where possible.

@yuzefovich yuzefovich changed the title Used Decimals to calculate transition values for aggregate functions. builtins: used Decimals to calculate transition values for aggregate function Jan 4, 2022
Copy link
Copy Markdown
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last question from me, but otherwise :lgtm:

Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mneverov)


pkg/sql/sem/builtins/aggregate_builtins.go, line 1842 at r2 (raw file):

	// Variables used across iterations.
	ed                       *apd.ErrDecimal
	n, sx, sxx, sy, syy, sxy *apd.Decimal

Actually, rather than storing decimals by pointer, can we store them by value here? Then we won't need to explicitly set them in the constructor.

Copy link
Copy Markdown
Contributor Author

@mneverov mneverov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/sql/sem/builtins/aggregate_builtins.go, line 1842 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Actually, rather than storing decimals by pointer, can we store them by value here? Then we won't need to explicitly set them in the constructor.

Yes we can. I decided against it because then in the add funcion and in all result functions I would need to use & operator.
Don't have strong opinion about it, let me know if the values in the struct are preferable.

Copy link
Copy Markdown
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mneverov)


pkg/sql/sem/builtins/aggregate_builtins.go, line 1842 at r2 (raw file):

Previously, mneverov (Max Neverov) wrote…

Yes we can. I decided against it because then in the add funcion and in all result functions I would need to use & operator.
Don't have strong opinion about it, let me know if the values in the struct are preferable.

I think the values are preferable in this case. My reasoning is that then all apd.Decimal objects don't have to be explicitly instantiated and all of them are also in the same memory location as regressionAccumulatorDecimalBase struct as a whole. Taking the pointer from value (i.e. & operator) is essentially free AFAIK.

…functions.

Related to cockroachdb#58347.

Previously, when numbers in input exceeded <X>e+16 that led to loss of
significance in calculation of the transition values and the result of aggregate
functions. This in its turn led to result of aggregate function depends on the
order of the numbers in a table. This was OK since this behaviour was identical
to what Postgres has.
Calculations in the distributed mode comprises two stages: split the data with
the calculating transition values locally and the subsequent merge of the
results from the first step. Merging operations can happen in any order which
means that the order of operation should not affect the result. In cockroachdb#58347 it was
decided to use Decimal type to calculate transition values for aggregate
functions. This solves the problem with the loss of significance, and hence with
the order of the merge operations in distributed mode.

Release note: None
Copy link
Copy Markdown
Contributor Author

@mneverov mneverov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @yuzefovich)


pkg/sql/sem/builtins/aggregate_builtins.go, line 1842 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

I think the values are preferable in this case. My reasoning is that then all apd.Decimal objects don't have to be explicitly instantiated and all of them are also in the same memory location as regressionAccumulatorDecimalBase struct as a whole. Taking the pointer from value (i.e. & operator) is essentially free AFAIK.

Sure, & is free, I was thinking from the code aesthetics perspective :)

Copy link
Copy Markdown
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bors r=yuzefovich

Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mneverov)


pkg/sql/sem/builtins/aggregate_builtins.go, line 1842 at r2 (raw file):

Previously, mneverov (Max Neverov) wrote…

Sure, & is free, I was thinking from the code aesthetics perspective :)

I see :)

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jan 5, 2022

Build failed (retrying...):

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jan 6, 2022

Build failed (retrying...):

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jan 6, 2022

Build failed (retrying...):

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jan 6, 2022

Build succeeded:

@craig craig bot merged commit 3401e4d into cockroachdb:master Jan 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

O-community Originated from the community X-blathers-triaged blathers was able to find an owner

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants