colexec: make router outputs and external operators more dynamic#56935

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
yuzefovich:router-dynamic
Nov 21, 2020
@yuzefovich
Member

@yuzefovich yuzefovich commented Nov 19, 2020

This commit improves the last uses of batches with a fixed
coldata.BatchSize() size:

  • it moves the allocations of some batches in external operators from
    the constructor into the Init method. Note that previously we were
    allocating these batches even if spilling to disk never occurred
    (we additionally delay one map creation in the external hash joiner
    until Init too)
  • it makes the allocation of pendingBatch in the router outputs
    dynamic.
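
As a rough illustration of the first bullet, here is a minimal sketch of moving allocations from a constructor into Init. The `batch` and `externalOp` types and the `newExternalOp` constructor are hypothetical stand-ins invented for this example; they are not the actual colexec types or signatures.

```go
package main

// batch is a hypothetical stand-in for a columnar batch; it is not
// coldata.Batch.
type batch struct {
	vals []int64
}

// externalOp is a hypothetical operator that needs a scratch batch and a
// lookup map only once it actually starts running.
type externalOp struct {
	capacity int
	scratch  *batch             // allocated in Init, not in the constructor
	seen     map[int64]struct{} // creation likewise delayed until Init
}

// newExternalOp only records configuration; it allocates neither the
// scratch batch nor the map.
func newExternalOp(capacity int) *externalOp {
	return &externalOp{capacity: capacity}
}

// Init performs the allocations that an eager constructor would have done,
// so an operator that is constructed but never initialized pays nothing
// for them.
func (o *externalOp) Init() {
	o.scratch = &batch{vals: make([]int64, 0, o.capacity)}
	o.seen = make(map[int64]struct{})
}
```

With this shape, `newExternalOp(1024)` leaves both `scratch` and `seen` nil until `Init` is called.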

Additionally, this commit makes tupleHashDistributor more dynamic too
by allocating the slices at run time.
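
A minimal sketch of what allocating scratch slices at run time can look like, assuming a simplified distributor. The `hashDistributor` type is hypothetical and uses a plain modulo in place of real hashing; it is not the actual tupleHashDistributor.

```go
package main

// hashDistributor is a hypothetical, simplified distributor. It is not the
// real tupleHashDistributor and uses key % numOutputs instead of hashing.
type hashDistributor struct {
	numOutputs int
	buckets    []uint64 // scratch space, sized on demand in distribute
	selections [][]int  // per-output tuple indices, grown by append
}

// newHashDistributor allocates only the outer slice of selections; the
// per-tuple scratch space is left nil until the first batch arrives.
func newHashDistributor(numOutputs int) *hashDistributor {
	return &hashDistributor{
		numOutputs: numOutputs,
		selections: make([][]int, numOutputs),
	}
}

// distribute assigns tuple i to output keys[i] % numOutputs and returns,
// for each output, the indices of the tuples routed to it.
func (d *hashDistributor) distribute(keys []uint64) [][]int {
	n := len(keys)
	// Grow the scratch slice only when the incoming batch is larger than
	// anything seen so far, rather than preallocating a maximum size.
	if cap(d.buckets) < n {
		d.buckets = make([]uint64, n)
	}
	d.buckets = d.buckets[:n]
	for i, k := range keys {
		d.buckets[i] = k % uint64(d.numOutputs)
	}
	// Reuse the previously allocated selection slices.
	for i := range d.selections {
		d.selections[i] = d.selections[i][:0]
	}
	for i, b := range d.buckets {
		d.selections[b] = append(d.selections[b], i)
	}
	return d.selections
}
```

The returned slices are reused across calls, so a caller in this sketch must consume them before calling `distribute` again.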

Release note: None

@yuzefovich yuzefovich requested review from a team and asubiotto November 19, 2020 23:26
@cockroach-teamcity
Member

This change is Reviewable

@yuzefovich
Member Author

The benchmarks are here.

Contributor

@asubiotto asubiotto left a comment

:lgtm:

Reviewed 4 of 4 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/sql/colexec/routers.go, line 419 at r1 (raw file):

	for toAppend := len(selection); toAppend > 0; {
		if o.mu.pendingBatch == nil {
			if o.pendingBatchCapacity < coldata.BatchSize() {

Is there enough overlap with other dynamically increasing behavior up to coldata.BatchSize to put a helper function that all code uses in coldata?

Member Author

@yuzefovich yuzefovich left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto)


pkg/sql/colexec/routers.go, line 419 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Is there enough overlap with other dynamically increasing behavior up to coldata.BatchSize to put a helper function that all code uses in coldata?

I don't think so. ResetMaybeReallocate works dynamically based on the assumption that we have an "old batch" whose capacity we can check and double if necessary, but in this case we never have that old batch.

We could have reused ResetMaybeReallocate method as follows:

diff --git a/pkg/sql/colexec/routers.go b/pkg/sql/colexec/routers.go
index 8688d46357..dbadf0a66e 100644
--- a/pkg/sql/colexec/routers.go
+++ b/pkg/sql/colexec/routers.go
@@ -426,12 +426,9 @@ func (o *routerOutputOp) addBatch(ctx context.Context, batch coldata.Batch, sele
                                        o.pendingBatchCapacity = len(selection)
                                } else {
                                        o.pendingBatchCapacity *= 2
-                                       if o.pendingBatchCapacity > coldata.BatchSize() {
-                                               o.pendingBatchCapacity = coldata.BatchSize()
-                                       }
                                }
                        }
-                       o.mu.pendingBatch = o.mu.unlimitedAllocator.NewMemBatchWithFixedCapacity(o.types, o.pendingBatchCapacity)
+                       o.mu.pendingBatch, _ = o.mu.unlimitedAllocator.ResetMaybeReallocate(o.types, o.mu.pendingBatch, o.pendingBatchCapacity)
                }
                available := o.mu.pendingBatch.Capacity() - o.mu.pendingBatch.Length()
                numAppended := toAppend

but we would still be responsible for calculating the minimum capacity dynamically.

Hm, actually it might still be worth using this diff since it indicates that we have dynamic batch size behavior (although of a slightly different variant). I'll make that change.
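
For readers following along, the capacity calculation under discussion can be sketched roughly as below. `nextCapacity` is a hypothetical helper written for this illustration, not a function in colexec or coldata, and it assumes that the first allocation is sized to the number of tuples to append (the branch condition is not visible in the diff context above).

```go
package main

// nextCapacity is a hypothetical helper that returns the capacity to use
// for the next pending batch.
//
// Assumptions made for this sketch:
//   - current == 0 means no batch has been allocated yet, in which case the
//     first batch is sized to the number of tuples waiting to be appended;
//   - otherwise the capacity doubles on each new allocation;
//   - the result never exceeds maxCapacity (playing the role of
//     coldata.BatchSize()) and is never less than 1.
func nextCapacity(current, needed, maxCapacity int) int {
	if current >= maxCapacity {
		return maxCapacity
	}
	next := current * 2
	if current == 0 {
		next = needed
	}
	if next > maxCapacity {
		next = maxCapacity
	}
	if next < 1 {
		next = 1
	}
	return next
}
```

Starting from zero with three tuples to append and a maximum of 1024, successive calls under these assumptions yield 3, 6, 12, and so on until the value is capped at 1024.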

@yuzefovich
Member Author

TFTR!

bors r+

@craig
Contributor

craig bot commented Nov 20, 2020

Build failed (retrying...):

@craig
Contributor

craig bot commented Nov 20, 2020

Build failed:

@yuzefovich
Member Author

bors r+

@craig
Contributor

craig bot commented Nov 21, 2020

Build succeeded:

@craig craig bot merged commit f5f0f11 into cockroachdb:master Nov 21, 2020
@yuzefovich yuzefovich deleted the router-dynamic branch November 21, 2020 05:08