Skip to content

util: improve allocation of combine functions#92396

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
maryliag:improve-combine
Nov 24, 2022
Merged

util: improve allocation of combine functions#92396
craig[bot] merged 1 commit intocockroachdb:masterfrom
maryliag:improve-combine

Conversation

@maryliag
Copy link
Copy Markdown
Contributor

@maryliag maryliag commented Nov 23, 2022

Part Of #89918

Previously, the functions CombineUniqueString and CombineUniqueInt64 were always allocating a new object. This change makes improvements on both functions.

Results after running YCSB for a 10 minutes on each implementation:
https://www.loom.com/share/0073101092ae4dffacb254c0363f2132

Before
Screen Shot 2022-11-23 at 6 17 20 PM

After
Screen Shot 2022-11-23 at 6 17 01 PM

Release note: None

@maryliag maryliag requested a review from a team November 23, 2022 16:39
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Contributor

@amyyq2 amyyq2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!
:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

Copy link
Copy Markdown
Contributor

@xinhaoz xinhaoz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @maryliag)


pkg/util/slices.go line 15 at r1 (raw file):

// CombineUniqueInt64 combines two ordered int64 slices and returns
// the result without duplicates.
func CombineUniqueInt64(a []int64, b []int64) []int64 {

Am i right in seeing that if a and b have the same elements (edit: if a is a superset of b), then just the slice a will be returned. Is that what we want?
It might be unexpected that only sometimes a new slice will be returned.

@xinhaoz
Copy link
Copy Markdown
Contributor

xinhaoz commented Nov 23, 2022

pkg/util/slices.go line 24 at r1 (raw file):

			aIter++
		} else {
			a = append(a, 0)

Commenting mostly to learn, but in most cases we use this function, I'm assuming a and b dont have sufficient capacity for the combined result, so aren't we reallocating in most cases anyway? If that's the case, does this really improve things all that much?

Copy link
Copy Markdown
Contributor Author

@maryliag maryliag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @xinhaoz)


pkg/util/slices.go line 15 at r1 (raw file):
that is what we want, we're combining without duplicates, so if
a = 1,2,3
b = 1,2,3

I want the final value to be 1,2,3

It might be unexpected that only sometimes a new slice will be returned.
I don't follow what you mean here


pkg/util/slices.go line 24 at r1 (raw file):

Previously, xinhaoz (Xin Hao Zhang) wrote…

Commenting mostly to learn, but in most cases we use this function, I'm assuming a and b dont have sufficient capacity for the combined result, so aren't we reallocating in most cases anyway? If that's the case, does this really improve things all that much?

the majority of our uses A will be a big slice and B will only have 1 (sometimes a few elements), so what we want to do is adding the elements of B that don't yet exist on A, and keeping A in order.

The first time a statement is executed we don't have the stats, so we're creating the array for the first time, meaning A will be empty, and we just add everything from B. All the following executions we will try add B. If all B is already contained in A, nothing needs to be done and we just return A, if we need to add an element from B, the code is doing:

A = 1,2,4,5
B = 3,6,7

a = append(a, 0)                     -->  A = 1,2,4,5,0
copy(a[aIter+1:], a[aIter:])     -->  A = 1,2,4,4,5
a[aIter] = b[bIter]                   -->  A = 1,2,3,4,5

...
a = append(a, b[bIter:]...)       -->  A = 1,2,3,4,5,6,7

On the previous approach it was always creating a new slice, and then adding elements of A and B, meaning we would always have one extra allocation (the final slice), even on the cases that returning A without any changes was the right answer.
With this approach we barely touch anything on the majority of cases, and for the ones that needs to be added we do the append.

Copy link
Copy Markdown
Contributor Author

@maryliag maryliag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @xinhaoz)


pkg/util/slices.go line 15 at r1 (raw file):

Previously, maryliag (Marylia Gutierrez) wrote…

that is what we want, we're combining without duplicates, so if
a = 1,2,3
b = 1,2,3

I want the final value to be 1,2,3

It might be unexpected that only sometimes a new slice will be returned.
I don't follow what you mean here

Btw, this is one of the test cases: https://github.com/cockroachdb/cockroach/pull/92396/files#diff-8937929eae9d09d7fc334c884257ce21095b2a706777347a6fb078be564c93a4R60-R62

Copy link
Copy Markdown
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think we don't have scheduled 21.2 releases anymore, so only security fixes should be backported.

:lgtm:

Reviewed 2 of 2 files at r1, all commit messages.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @maryliag and @xinhaoz)


pkg/util/slices.go line 14 at r1 (raw file):

// CombineUniqueInt64 combines two ordered int64 slices and returns
// the result without duplicates.

It'd be good to explicitly call out here that the assumption is such that this function is efficient when b is small or has mostly the same elements as a so that small number of new elements from b need to be added.

One could construct an example where a and b are both large and have almost no duplicate elements in which the additional copies we perform when inserting an element from b into a with the new algorithm would lead to a quadratic behavior and overall increase in CPU.


pkg/util/slices.go line 16 at r1 (raw file):

// the result without duplicates.
func CombineUniqueInt64(a []int64, b []int64) []int64 {
	aIter, bIter := 0, 0

nit: we could have a quick check and if b is larger than a, we'd swap them. Up to you if this is needed though.

Copy link
Copy Markdown
Contributor Author

@maryliag maryliag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @xinhaoz and @yuzefovich)


pkg/util/slices.go line 14 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

It'd be good to explicitly call out here that the assumption is such that this function is efficient when b is small or has mostly the same elements as a so that small number of new elements from b need to be added.

One could construct an example where a and b are both large and have almost no duplicate elements in which the additional copies we perform when inserting an element from b into a with the new algorithm would lead to a quadratic behavior and overall increase in CPU.

Good point! Added the comments on both functions


pkg/util/slices.go line 16 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: we could have a quick check and if b is larger than a, we'd swap them. Up to you if this is needed though.

Not only is needed, but adding that is what made the missing allocation no longer exist! 😄
I updated the PR images and loom to show the new results, where we no longer see allocations coming from these functions!

Part Of cockroachdb#89918

Previously, the functions `CombineUniqueString` and
`CombineUniqueInt64` were always allocating a new object.
This change makes improvements on both functions.

Release note: None
Copy link
Copy Markdown
Contributor Author

@maryliag maryliag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @xinhaoz and @yuzefovich)


pkg/util/slices.go line 15 at r1 (raw file):
from your edited message

if a is a superset of b, then just the slice a will be returned

If all elements of B are already on A, there is nothing to be added and we can just return A
If some or all elements of B are not on A, they will be added to A and then A will be returned
And that is what we want

Copy link
Copy Markdown
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @xinhaoz)

@maryliag
Copy link
Copy Markdown
Contributor Author

TFTRs!
bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Nov 24, 2022

Build succeeded:

@craig craig bot merged commit 39250f6 into cockroachdb:master Nov 24, 2022
@maryliag maryliag deleted the improve-combine branch November 24, 2022 14:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants