coldata: change Batch interface to operate with int length#45534

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
yuzefovich:batch-int
Mar 2, 2020

Conversation

@yuzefovich (Member) commented Feb 28, 2020

This commit changes the Batch interface to operate with an int length,
which unifies the way we store batches: we now have a single
implementation, MemBatch (with an exception for coldata.ZeroBatch). This
allows us to remove bufferedBatch.

Release note: None
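As a rough illustration of the interface change, here is a hedged sketch with simplified names (the real coldata.Batch has many more methods, and MemBatch carries actual column vectors): the point is that Length/SetLength now use a plain int rather than uint16, so a single in-memory implementation serves all callers.

```go
package main

import "fmt"

// Batch is a simplified sketch of the interface after this change:
// length is a plain int, not uint16. (Names simplified; see
// pkg/col/coldata for the real definitions.)
type Batch interface {
	Length() int
	SetLength(int)
}

// MemBatch is the single in-memory implementation in this sketch.
type MemBatch struct {
	n int
}

func (m *MemBatch) Length() int     { return m.n }
func (m *MemBatch) SetLength(n int) { m.n = n }

func main() {
	var b Batch = &MemBatch{}
	b.SetLength(1024) // would have overflowed a uint16-based API's old limits less gracefully
	fmt.Println(b.Length()) // 1024
}
```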

@cockroach-teamcity (Member)

This change is Reviewable

@asubiotto (Contributor)

Is this RFAL (ready for another look)?

@yuzefovich yuzefovich force-pushed the batch-int branch 2 times, most recently from a6d3205 to dab5c35 Compare February 28, 2020 23:19
@yuzefovich yuzefovich requested review from a team and asubiotto February 28, 2020 23:19
@yuzefovich (Member, Author)

RFAL.

@yuzefovich yuzefovich force-pushed the batch-int branch 2 times, most recently from abd9434 to ee985b1 Compare February 29, 2020 04:04
@yuzefovich (Member, Author) commented Feb 29, 2020

Ran some benchmarks, and it seems like int is less performant (with more variance) than uint64.

The uint16 vs int (with nulls assertion) benchmarks compare master against only the first commit in this PR.

The uint16 vs int (without nulls assertion) benchmarks compare master against both commits in this PR (the motivation was that the nulls cases seemed to have become slower; since I had changed the arguments from uint64 to int and added an assertion, I wanted to verify that wasn't the issue).

@yuzefovich (Member, Author)

Ran some more benchmarks.

Reran on a gceworker: uint16 vs uint64.

Also compared the two choices directly to each other: uint64 vs int (without nulls assertion). uint64 seems slightly more performant on average here as well.

@asubiotto (Contributor)

I think part of the problem with nulls is that we simplified the NullAt code when we had uint64s and do a division by 8. In the uint64 case this gets optimized into bit operations, while in the int case it doesn't, since signed division rounds toward zero and cannot be lowered to a plain shift. I wonder what happens if you do the division using bit operations in the int case.
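The idea can be sketched like this (a hypothetical bitmap type, not the actual coldata.Nulls implementation): with an unsigned index, `i/8` and `i%8` compile down to a shift and a mask for free; with a signed int index the compiler must account for negative operands (e.g. `-1/8 == 0`, but `-1>>3 == -1`), so spelling out the bit operations recovers the fast path.

```go
package main

import "fmt"

// Nulls is a hypothetical bitmap sketch (not the actual coldata type):
// bit i of the bitmap is set when row i is NULL.
type Nulls struct {
	nulls []byte
}

// NullAtDiv uses division and modulo. For a signed int, the compiler
// cannot lower i/8 to a plain shift, because signed division rounds
// toward zero (-1/8 == 0, while -1>>3 == -1).
func (n *Nulls) NullAtDiv(i int) bool {
	return n.nulls[i/8]&(1<<uint(i%8)) != 0
}

// NullAtShift spells out the bit operations explicitly, which is what
// the compiler emits automatically when the index is unsigned.
func (n *Nulls) NullAtShift(i int) bool {
	return n.nulls[i>>3]&(1<<uint(i&7)) != 0
}

func main() {
	n := &Nulls{nulls: []byte{0b0000_0100}} // only row 2 is NULL
	fmt.Println(n.NullAtDiv(2), n.NullAtShift(2)) // true true
	fmt.Println(n.NullAtDiv(3), n.NullAtShift(3)) // false false
}
```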

@yuzefovich yuzefovich force-pushed the batch-int branch 2 times, most recently from 329c5f4 to 9fc4806 Compare March 1, 2020 20:12
@yuzefovich (Member, Author)

I think you're right about nulls. I switched division by 8 and modulo by 8 to explicit bit operations. I also removed some of the unnecessary int() conversions from _tmpl files (apparently, they are not checked by the linter).

Here are the new benchmarks:

Seems like we can proceed with int; the only worrisome benchmark is the hash joiner one, but I think that can be addressed separately. We might want to switch hashTable to operate on ints as well; that might be the issue.

@yuzefovich (Member, Author)

One of the nice things about using ints for the selection vector is that we might be able to remove some of the templated if statements that we needed to convert the previous uint16 to int: now we can iterate with for _, idx := range sel[:n] and with for idx := 0; idx < n; idx++, and idx will have the same int type in both loops. I actually made such a simplification in one of the templates.
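A minimal sketch of the pattern (illustrative only, not the actual template code): with an int selection vector, the selected and unselected branches share an int loop index, so templated code no longer needs an int(idx) conversion in one branch.

```go
package main

import "fmt"

// sum is a hypothetical kernel showing the two loop shapes. With []int
// for sel, idx is an int in both branches, so the body can index col
// identically with no per-branch conversion.
func sum(col []int64, sel []int, n int) int64 {
	var s int64
	if sel != nil {
		for _, idx := range sel[:n] { // idx is int
			s += col[idx]
		}
	} else {
		for idx := 0; idx < n; idx++ { // idx is int here too
			s += col[idx]
		}
	}
	return s
}

func main() {
	col := []int64{10, 20, 30, 40}
	fmt.Println(sum(col, []int{0, 2}, 2)) // 40
	fmt.Println(sum(col, nil, 3))         // 60
}
```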

@yuzefovich yuzefovich force-pushed the batch-int branch 2 times, most recently from 0a7de14 to 6a42708 Compare March 1, 2020 23:24
@yuzefovich (Member, Author)

By accident, when updating the uint64 branch to use ints instead, I replaced the OpStr of the hash overloads in overloads.go. This caused the hash operations to contain uint64(int(...)) conversions, which showed up in the benchmarks. I updated the hash joiner benchmark with this change reverted, and the numbers now look reasonable, so I don't see any worrisome regressions. RFAL.

@asubiotto (Contributor)

Awesome, will take a look today.

@asubiotto (Contributor) left a comment


:lgtm: just to double check, you disabled batch size randomization in the benchmarks, right?

Reviewed 5 of 120 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto and @yuzefovich)


pkg/col/coldata/nulls_test.go, line 162 at r1 (raw file):

		for _, i := range swapPos {
			for _, j := range swapPos {
				n.swap(uint64(i), uint64(j))

Can we change swap to take in int arguments as well?
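A hedged sketch of what an int-based swap could look like (hypothetical helpers, not the actual coldata code): taking int arguments means test loops over []int positions, like the one quoted above, no longer need uint64(i) conversions.

```go
package main

import "fmt"

// Nulls is a hypothetical bitmap sketch (not the actual coldata type).
type Nulls struct{ nulls []byte }

func (n *Nulls) nullAt(i int) bool { return n.nulls[i>>3]&(1<<uint(i&7)) != 0 }

func (n *Nulls) setNullTo(i int, null bool) {
	if null {
		n.nulls[i>>3] |= 1 << uint(i&7)
	} else {
		n.nulls[i>>3] &^= 1 << uint(i&7)
	}
}

// swap exchanges the null bits at positions i and j, taking int
// arguments directly so callers iterating with int indices need no
// unsigned conversions.
func (n *Nulls) swap(i, j int) {
	bi, bj := n.nullAt(i), n.nullAt(j)
	n.setNullTo(i, bj)
	n.setNullTo(j, bi)
}

func main() {
	n := &Nulls{nulls: []byte{0b0000_0001}} // only row 0 is NULL
	n.swap(0, 5)
	fmt.Println(n.nullAt(0), n.nullAt(5)) // false true
}
```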

@yuzefovich (Member, Author) left a comment


Yes, batch size randomization was disabled on both gceworker and MBP.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto)


pkg/col/coldata/nulls_test.go, line 162 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Can we change swap to take in int arguments as well?

Done.

@asubiotto (Contributor)

:shipit:

@yuzefovich (Member, Author)

TFTR!

bors r+

@craig (bot) commented Mar 2, 2020

Build succeeded

@craig craig bot merged commit 003fa43 into cockroachdb:master Mar 2, 2020
@yuzefovich yuzefovich deleted the batch-int branch March 2, 2020 18:22