colexec: make router outputs and external operators more dynamic #56935

craig[bot] merged 1 commit into cockroachdb:master
Force-pushed from 2ded6f9 to 3402660.

The benchmarks are here.

asubiotto left a comment:

Reviewed 4 of 4 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @yuzefovich)
pkg/sql/colexec/routers.go, line 419 at r1:

```go
for toAppend := len(selection); toAppend > 0; {
	if o.mu.pendingBatch == nil {
		if o.pendingBatchCapacity < coldata.BatchSize() {
```

Is there enough overlap with other dynamically increasing behavior up to `coldata.BatchSize` to put a helper function that all code uses in `coldata`?
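For illustration, the growth policy in the quoted loop (first allocation sized to the selection, later allocations doubling, all capped at `coldata.BatchSize()`) can be isolated into a pure function. This is a minimal sketch: `nextPendingCapacity` is a hypothetical name, the first-allocation branch is inferred from the diff further down in this thread, and `maxBatchSize = 1024` is an assumed stand-in for `coldata.BatchSize()`.

```go
package main

import "fmt"

// maxBatchSize stands in for coldata.BatchSize(); the value 1024 is an
// assumption for this sketch.
const maxBatchSize = 1024

// nextPendingCapacity is a hypothetical helper (not part of coldata) that
// mirrors the growth policy in the quoted routerOutputOp.addBatch loop:
//   - the first pending batch is sized to the incoming selection,
//   - each later one doubles the previous capacity,
//   - the capacity never exceeds maxBatchSize.
//
// selLen is assumed to be positive, since the loop only runs while there are
// tuples left to append.
func nextPendingCapacity(cur, selLen int) int {
	var next int
	switch {
	case cur >= maxBatchSize:
		// Already at the maximum: keep using full-size batches.
		return maxBatchSize
	case cur == 0:
		// First batch for this output: just enough room for these tuples.
		next = selLen
	default:
		next = cur * 2
	}
	if next > maxBatchSize {
		next = maxBatchSize
	}
	return next
}

func main() {
	capacity := 0
	for i := 0; i < 6; i++ {
		capacity = nextPendingCapacity(capacity, 100)
		fmt.Println(capacity) // 100, 200, 400, 800, 1024, 1024
	}
}
```

Keeping the policy in a side-effect-free function like this would make it easy to share or unit-test, which is essentially what the question above is probing.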
This commit improves the remaining places that allocate batches with a fixed `coldata.BatchSize()` capacity:
- it moves the allocation of some batches in the external operators from the constructor into the `Init` method. Previously, these batches were allocated even if spilling to disk never occurred. (One map creation in the external hash joiner is also deferred until `Init`.)
- it makes the allocation of `pendingBatch` in the router outputs dynamic.

Additionally, this commit makes `tupleHashDistributor` more dynamic by allocating its slices at run time.

Release note: None
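A minimal sketch of the constructor-versus-`Init` change, in Go. Everything here is hypothetical (`externalOp`, the toy `batch` type, the `allocations` counter), and it assumes, as the commit message implies, that an external operator is constructed up front but only initialized once spilling to disk actually happens.

```go
package main

import "fmt"

// batch is a toy stand-in for coldata.Batch with a single int column.
type batch struct {
	col []int
}

// externalOp is a hypothetical external (disk-backed) operator. Its scratch
// batch is allocated in Init instead of in the constructor, so an operator
// that is constructed but never initialized allocates no batch.
type externalOp struct {
	batchSize   int
	initialized bool
	scratch     *batch
	allocations int // counts allocations, for illustration only
}

// newExternalOp only records configuration; it allocates no batches.
func newExternalOp(batchSize int) *externalOp {
	return &externalOp{batchSize: batchSize}
}

// Init performs the deferred allocation. Calling it more than once is a no-op.
func (o *externalOp) Init() {
	if o.initialized {
		return
	}
	o.initialized = true
	o.scratch = &batch{col: make([]int, 0, o.batchSize)}
	o.allocations++
}

func main() {
	// Assumption: one external operator is constructed per disk-spilling
	// stage, but Init is called only for the stages that actually spill.
	ops := []*externalOp{newExternalOp(1024), newExternalOp(1024), newExternalOp(1024)}
	spilled := []bool{true, false, false}

	total := 0
	for i, op := range ops {
		if spilled[i] {
			op.Init()
		}
		total += op.allocations
	}
	fmt.Println("scratch batches allocated:", total) // 1
}
```

Because the constructor allocates nothing, the two operators that never spill cost only their struct; with the allocation in the constructor, all three would have paid for a full scratch batch.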
Force-pushed from 3402660 to 829177f.

yuzefovich left a comment:

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto)
pkg/sql/colexec/routers.go, line 419 at r1:

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

> Is there enough overlap with other dynamically increasing behavior up to `coldata.BatchSize` to put a helper function that all code uses in `coldata`?
I don't think so. `ResetMaybeReallocate` grows batches dynamically on the assumption that there is an "old batch" whose capacity it can check and double if necessary, but in this case we never have that old batch (`pendingBatch` is always `nil` at the point where we allocate).
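The distinction can be illustrated with a toy model. The function `resetMaybeReallocate` below is hypothetical: it models a batch as an `[]int`, follows only the behavior described in this comment (inspect the old batch's capacity and double it if necessary), and is not the real allocator method. `maxBatchSize = 1024` is an assumed stand-in for `coldata.BatchSize()`.

```go
package main

import "fmt"

// maxBatchSize stands in for coldata.BatchSize(); 1024 is assumed.
const maxBatchSize = 1024

// resetMaybeReallocate is a toy model (not the real allocator API):
//   - no old batch: allocate one with minCapacity;
//   - old batch below maxBatchSize: allocate a new one with double its
//     capacity (at least minCapacity, at most maxBatchSize);
//   - old batch already at maxBatchSize: reset it and reuse it.
func resetMaybeReallocate(old []int, minCapacity int) (b []int, reallocated bool) {
	if minCapacity < 1 {
		minCapacity = 1
	}
	if minCapacity > maxBatchSize {
		minCapacity = maxBatchSize
	}
	if old == nil {
		return make([]int, 0, minCapacity), true
	}
	if cap(old) < maxBatchSize {
		newCap := 2 * cap(old)
		if newCap < minCapacity {
			newCap = minCapacity
		}
		if newCap > maxBatchSize {
			newCap = maxBatchSize
		}
		return make([]int, 0, newCap), true
	}
	return old[:0], false
}

func main() {
	// With an old batch to inspect, capacity grows on its own: 100, 200, 400.
	var b []int
	for i := 0; i < 3; i++ {
		b, _ = resetMaybeReallocate(b, 100)
		fmt.Println(cap(b))
	}
	// Router-output case: the old batch is always nil, so every call just
	// returns minCapacity (100, 100, 100) and the caller must grow it.
	for i := 0; i < 3; i++ {
		nb, _ := resetMaybeReallocate(nil, 100)
		fmt.Println(cap(nb))
	}
}
```

The second loop shows why the router output still has to track and grow `pendingBatchCapacity` itself: with no old batch, the only growth signal is the capacity the caller passes in.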
We could reuse the `ResetMaybeReallocate` method as follows:
```diff
diff --git a/pkg/sql/colexec/routers.go b/pkg/sql/colexec/routers.go
index 8688d46357..dbadf0a66e 100644
--- a/pkg/sql/colexec/routers.go
+++ b/pkg/sql/colexec/routers.go
@@ -426,12 +426,9 @@ func (o *routerOutputOp) addBatch(ctx context.Context, batch coldata.Batch, sele
 					o.pendingBatchCapacity = len(selection)
 				} else {
 					o.pendingBatchCapacity *= 2
-					if o.pendingBatchCapacity > coldata.BatchSize() {
-						o.pendingBatchCapacity = coldata.BatchSize()
-					}
 				}
 			}
-			o.mu.pendingBatch = o.mu.unlimitedAllocator.NewMemBatchWithFixedCapacity(o.types, o.pendingBatchCapacity)
+			o.mu.pendingBatch, _ = o.mu.unlimitedAllocator.ResetMaybeReallocate(o.types, o.mu.pendingBatch, o.pendingBatchCapacity)
 		}
 		available := o.mu.pendingBatch.Capacity() - o.mu.pendingBatch.Length()
 		numAppended := toAppend
```
but we would still be responsible for computing the minimum capacity dynamically ourselves.

Hm, actually it might still be worth using this diff, since it signals that we have dynamic batch size behavior here (albeit a slightly different variant). I'll make that change.
TFTR!

bors r+

Build failed (retrying...)

Build failed

bors r+

Build succeeded