sql/bulkmerge: reuse SST iterator across bulk merge tasks #160632

Merged
craig[bot] merged 1 commit into cockroachdb:master from spilchen:general/260107/1043/merge/one-bulkmerge-iterator
Jan 10, 2026

Conversation

@spilchen
Contributor

@spilchen spilchen commented Jan 7, 2026

This change reduces overhead in the bulk merge processor by initializing a single iterator over all input SSTs at startup, rather than creating a new one per task. The iterator is reused across tasks, seeking only when needed.

Informs #159414
Epic: CRDB-48845
Release note: none

Co-authored by: @jeffswenson
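
For readers unfamiliar with the pattern, here is a minimal sketch of the idea described above, not the code in this PR. It assumes tasks are half-open key spans and uses a sorted in-memory slice in place of real SSTs; `kv`, `span`, `mergedIter`, `processor`, `seekGE`, and `runTask` are all hypothetical names invented for illustration. The processor builds one iterator at startup and seeks only when a task does not begin where the previous task ended.

```go
package main

import (
	"fmt"
	"sort"
)

// kv is a hypothetical key/value pair standing in for an SST entry.
type kv struct {
	key, value string
}

// span is a hypothetical half-open key range [start, end) handled by one task.
type span struct {
	start, end string
}

// mergedIter is a hypothetical stand-in for an iterator over all input SSTs.
// It is a cursor over one sorted slice and counts how often it is repositioned.
type mergedIter struct {
	data  []kv
	pos   int
	seeks int
}

// seekGE moves the cursor to the first entry whose key is >= target.
func (it *mergedIter) seekGE(target string) {
	it.pos = sort.Search(len(it.data), func(i int) bool { return it.data[i].key >= target })
	it.seeks++
}

// processor owns a single iterator for its whole lifetime.
type processor struct {
	iter       *mergedIter
	positioned bool   // true once at least one task has run
	lastEnd    string // end key of the most recently completed task
}

// newProcessor sorts a copy of the input once and builds the shared iterator.
func newProcessor(input []kv) *processor {
	sorted := append([]kv(nil), input...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].key < sorted[j].key })
	return &processor{iter: &mergedIter{data: sorted}}
}

// runTask returns the entries in sp, reusing the iterator's current position
// when sp starts exactly where the previous task stopped.
func (p *processor) runTask(sp span) []kv {
	if !p.positioned || p.lastEnd != sp.start {
		p.iter.seekGE(sp.start)
	}
	var out []kv
	for p.iter.pos < len(p.iter.data) && p.iter.data[p.iter.pos].key < sp.end {
		out = append(out, p.iter.data[p.iter.pos])
		p.iter.pos++
	}
	// The cursor now rests on the first key >= sp.end.
	p.positioned, p.lastEnd = true, sp.end
	return out
}

func main() {
	p := newProcessor([]kv{{"a", "1"}, {"b", "2"}, {"c", "3"}, {"d", "4"}, {"e", "5"}, {"f", "6"}})
	for _, sp := range []span{{"a", "c"}, {"c", "e"}, {"e", "g"}, {"b", "d"}} {
		out := p.runTask(sp)
		fmt.Printf("task [%s, %s): %d entries, %d seeks so far\n", sp.start, sp.end, len(out), p.iter.seeks)
	}
}
```

In this sketch the three contiguous tasks share one seek, and only the out-of-order task `[b, d)` forces a second one. Whether the real processor detects "already positioned" this way is an assumption.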

@spilchen spilchen self-assigned this Jan 7, 2026
@cockroach-teamcity
Member

This change is Reviewable

@spilchen spilchen marked this pull request as ready for review January 7, 2026 17:22
@spilchen spilchen requested review from jeffswenson and mw5h January 7, 2026 17:22
@spilchen
Contributor Author

spilchen commented Jan 7, 2026

Contributor

@mw5h mw5h left a comment

:lgtm:

@mw5h reviewed 2 files and all commit messages, and made 1 comment.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jeffswenson).

@spilchen
Contributor Author

spilchen commented Jan 9, 2026

TFTR!

bors r+

craig bot pushed a commit that referenced this pull request Jan 9, 2026
160632: sql/bulkmerge: reuse SST iterator across bulk merge tasks r=spilchen a=spilchen

160760: execbuilder: fix a stats-related flake in a new test r=yuzefovich a=yuzefovich

Fixes: #160752.
Fixes: #160753.
Release note: None

160764: sql/copy: fix rare flake in TestLargeCopy r=yuzefovich a=yuzefovich

We have an automatic retry mechanism for COPY, but it can only be used for non-atomic COPY. If an atomic COPY hits a txn retry error, the error is bubbled up to the client. We now adjust `TestLargeCopy` to match this behavior, fixing a rare flake where the test failed on a txn retry error that it should have ignored.

Fixes: #160537.
Release note: None

Co-authored-by: Matt Spilchen <matt.spilchen@cockroachlabs.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
@craig
Contributor

craig bot commented Jan 10, 2026

Build failed (retrying...):

craig bot pushed a commit that referenced this pull request Jan 10, 2026
160632: sql/bulkmerge: reuse SST iterator across bulk merge tasks r=spilchen a=spilchen

Co-authored-by: Matt Spilchen <matt.spilchen@cockroachlabs.com>
@craig
Contributor

craig bot commented Jan 10, 2026

Build failed:

@yuzefovich
Member

courtesy merge

bors retry

craig bot pushed a commit that referenced this pull request Jan 10, 2026
160632: sql/bulkmerge: reuse SST iterator across bulk merge tasks r=spilchen a=spilchen

Co-authored-by: Matt Spilchen <matt.spilchen@cockroachlabs.com>
@craig
Contributor

craig bot commented Jan 10, 2026

Build failed:

@yuzefovich
Member

courtesy merge 2

bors retry

craig bot pushed a commit that referenced this pull request Jan 10, 2026
160632: sql/bulkmerge: reuse SST iterator across bulk merge tasks r=spilchen a=spilchen

Co-authored-by: Matt Spilchen <matt.spilchen@cockroachlabs.com>
@craig
Contributor

craig bot commented Jan 10, 2026

Build failed:

@yuzefovich
Member

courtesy merge 3

bors retry

craig bot pushed a commit that referenced this pull request Jan 10, 2026
160580: opt: fix PruneUnionAllCols panic with outer scope columns r=michae2 a=DrewKimball

Before this change, the PruneUnionAllCols normalization rule would panic in crdb-test builds when the projection above a UnionAll referenced columns from an outer scope (e.g., due to an apply-join or routine). This occurred because the rule computed the needed column set by combining ProjectionOuterCols and passthrough columns, which could include outer scope columns not present in the UnionAll's output. These outer columns were then passed to NeededColMapLeft/Right, which call TranslateColSetStrict and panic when given unknown columns.

This change fixes the issue by intersecting the needed column set with the UnionAll's output columns before passing it to NeededColMapLeft/Right. This ensures only columns actually present in the UnionAll are translated, preventing the panic.

Fixes #159793

Release note: None

Co-Authored-By: Claude <noreply@anthropic.com>

160632: sql/bulkmerge: reuse SST iterator across bulk merge tasks r=spilchen a=spilchen

Co-authored-by: Drew Kimball <drewk@cockroachlabs.com>
Co-authored-by: Matt Spilchen <matt.spilchen@cockroachlabs.com>
@craig
Contributor

craig bot commented Jan 10, 2026

Build failed (retrying...):

craig bot pushed a commit that referenced this pull request Jan 10, 2026
158029: colfetcher: emit periodic query progress update metadata r=yuzefovich a=yuzefovich

This commit extends the query progress reporting that we do in the row-by-row tableReader to the vectorized scan operators too. Namely, after about 20k rows have been output, we'll emit the RowsRead metadata that we then use in DistSQLReceiver to update progressAtomic. The result then shows up in the `phase` column of SHOW QUERIES.

Fixes: #26639.

Release note (sql change): Queries executed via the vectorized engine now display their progress in the `phase` column of SHOW QUERIES. Previously, this feature was only available in the row-by-row engine.

160632: sql/bulkmerge: reuse SST iterator across bulk merge tasks r=spilchen a=spilchen

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Matt Spilchen <matt.spilchen@cockroachlabs.com>
@craig
Contributor

craig bot commented Jan 10, 2026

Build failed (retrying...):

craig bot pushed a commit that referenced this pull request Jan 10, 2026
160632: sql/bulkmerge: reuse SST iterator across bulk merge tasks r=spilchen a=spilchen

160842: schemachange: fix recent flake in TestWorkload r=yuzefovich a=yuzefovich

The extension to the test recently merged in ad868ab is flaky; fix up a couple of minor bugs.

Fixes: #160814.
Release note: None

Co-authored-by: Matt Spilchen <matt.spilchen@cockroachlabs.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
@craig
Contributor

craig bot commented Jan 10, 2026

@craig craig bot merged commit df498b8 into cockroachdb:master Jan 10, 2026
28 checks passed
4 participants