Skip to content

colexec: improve sort operator a bit#67638

Merged
craig[bot] merged 2 commits into
cockroachdb:masterfrom
yuzefovich:sort-cleanup
Jul 15, 2021
Merged

colexec: improve sort operator a bit#67638
craig[bot] merged 2 commits into
cockroachdb:masterfrom
yuzefovich:sort-cleanup

Conversation

@yuzefovich

@yuzefovich yuzefovich commented Jul 15, 2021

Copy link
Copy Markdown
Member

colexec: improve sort operator a bit

Previously, we were missing memory accounting around some slices
allocated internally by the sort operator which is now added.
Additionally, the references to those slices are now kept by the
operator which will be useful when the sorter is used by the external
sorter.

Additionally, this commit eliminates some bounds checks.

Release note: None

colexec: add some BCE assertions for distinct and sort

This is achieved by templating the inlined execgen functions. Note that
in some cases this templating leaves redundant _ = true lines, but
figuring out what's up with that is left as a TODO.

Additionally, this commit removes some redundant attempts at achieving
BCE for non-sliceable vectors (which had a negative impact in case of
JSONs) and clarifies the name of a utility function.

Release note: None

@yuzefovich yuzefovich requested review from a team and mgartner July 15, 2021 00:05
@cockroach-teamcity

Copy link
Copy Markdown
Member

This change is Reviewable

Previously, we were missing memory accounting around some slices
allocated internally by the sort operator which is now added.
Additionally, the references to those slices are now kept by the
operator which will be useful when the sorter is used by the external
sorter.

Additionally, this commit eliminates some bounds checks.

Release note: None
@DrewKimball

Copy link
Copy Markdown
Collaborator

@yuzefovich where are the assertion comments omitted? I probably have enough context to fix it fairly easily.

@yuzefovich

Copy link
Copy Markdown
Member Author

I think the best way to see them is through the web browser on github.com, if you look at Files changed and click Load diff for distinct.eg.go.

@yuzefovich

yuzefovich commented Jul 15, 2021

Copy link
Copy Markdown
Member Author

As Drew correctly pointed out, the gcassert comments are always added into the generated code wherever appropriate, but sometimes there are useless _ = true lines. I think it's ok to leave them as is since they should be compiled away.

I also ran the microbenchmarks for the external sort, and they didn't show much of change (we're only using int64 though).

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're missing a BCE here - colBCE should be true.

@DrewKimball DrewKimball left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After my two comments are resolved, LGTM.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@mgartner mgartner left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/sql/colexec/sort.go, line 419 at r1 (raw file):

		// that were true in the distinct vector.
		sizeBefore := memsize.Int * int64(cap(p.scratch.partitions))
		p.scratch.partitions = boolVecToSel64(partitionsCol, p.scratch.partitions[:0])

Not related to your change, but why is it called boolVecToSel64 if []int is returned. I'd expect []int64 to be returned.

This is achieved by templating the inlined execgen functions. Note that
in some cases this templating leaves redundant `_ = true` lines, but
figuring out what's up with that is left as a TODO.

Additionally, this commit removes some redundant attempts at achieving
BCE for non-sliceable vectors (which had a negative impact in case of
JSONs) and clarifies the name of a utility function.

Release note: None

@yuzefovich yuzefovich left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @mgartner)


pkg/sql/colexec/sort.go, line 419 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Not related to your change, but why is it called boolVecToSel64 if []int is returned. I'd expect []int64 to be returned.

The selection vector used to be []int64 but it was refactored a couple of releases ago. Fixed.


pkg/sql/colexec/colexecbase/distinct_tmpl.go, line 273 at r2 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

I think we're missing a BCE here - colBCE should be true.

No, I don't think so - actually, we're accessing col with checkIdx == order[outputIdx], and the compiler cannot prove that values in order slice are in bounds. Removed the redundant attempt at BCE for it.


pkg/sql/colexec/colexecbase/distinct_tmpl.go, line 279 at r2 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

ditto

Done.

@yuzefovich

Copy link
Copy Markdown
Member Author

TFTRs!

bors r+

@craig

craig Bot commented Jul 15, 2021

Copy link
Copy Markdown
Contributor

Build succeeded:

@craig craig Bot merged commit 21e78bb into cockroachdb:master Jul 15, 2021
@yuzefovich yuzefovich deleted the sort-cleanup branch July 15, 2021 23:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants