
release-22.2: sql: parallelize FK and UNIQUE constraints#100520

Merged
yuzefovich merged 5 commits into cockroachdb:release-22.2 from yuzefovich:backport22.2-92859-96123-98713
Apr 4, 2023

Conversation

@yuzefovich
Member

Backport 1/1 commits from #92859.
Backport 3/3 commits from #96123.
Backport 1/1 commits from #98713.

/cc @cockroachdb/release


rowcontainer: support concurrent reads and some cleanup

This commit refactors the row container infrastructure in order to make
it possible to read from a single container concurrently (via separate
iterators). It required moving some of the scratch space from the
container directly into the iterators (so that the iterators don't share
anything between each other) as well as making a similar adjustment to
the `pebbleMapIterator`. As a result, it is now possible to safely
concurrently iterate over all types of row containers; however, the
iterators need to be created and closed under mutex protection (which
seems like an acceptable limitation). This commit also makes a minor
change to make the container a named parameter inside of the iterator
(which makes it easier to see when an iterator is touching the
container).

The ability to read from the same row container (after all rows have
been added to it) concurrently will be utilized by the follow-up commit
in order to support validation of multiple FK and UNIQUE checks in
parallel.
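As a rough sketch of the pattern (with illustrative names only, not the actual rowcontainer API): each iterator owns its scratch space, and iterator creation happens under the container's mutex, so readers can then iterate concurrently without further coordination.

```go
package main

import (
	"fmt"
	"sync"
)

// rowContainer is a hypothetical stand-in for a row container:
// rows are appended once, then read concurrently.
type rowContainer struct {
	mu   sync.Mutex // guards iterator creation/closure only
	rows [][]int
}

// iterator carries its own scratch space so that concurrent
// iterators share nothing with each other.
type iterator struct {
	c       *rowContainer
	pos     int
	scratch []int // per-iterator scratch, never shared
}

// newIterator acquires the container's mutex, mirroring the
// requirement that iterators be created under mutex protection.
func (c *rowContainer) newIterator() *iterator {
	c.mu.Lock()
	defer c.mu.Unlock()
	return &iterator{c: c}
}

func (it *iterator) next() ([]int, bool) {
	if it.pos >= len(it.c.rows) {
		return nil, false
	}
	// Copy into iterator-local scratch so readers never mutate
	// shared container state.
	it.scratch = append(it.scratch[:0], it.c.rows[it.pos]...)
	it.pos++
	return it.scratch, true
}

func main() {
	c := &rowContainer{rows: [][]int{{1}, {2}, {3}}}
	var wg sync.WaitGroup
	sums := make([]int, 2)
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			it := c.newIterator()
			for row, ok := it.next(); ok; row, ok = it.next() {
				sums[i] += row[0]
			}
		}(i)
	}
	wg.Wait()
	fmt.Println(sums[0], sums[1]) // both readers see all rows: 6 6
}
```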

Additionally, this commit improves the `RowIterator` interface a bit by
renaming `Row` to `EncRow` (since the method returns `EncDatumRow`s) and
introducing a new `Row` implementation which returns `tree.Datums`. The
new method is now utilized in `sql.rowContainerIterator`, which
removes some of the unnecessary conversions between different row types.

Release note: None

sql: miscellaneous cleanup and improvement

This commit performs a bunch of minor cleanups and improvements in
preparation for the follow-up commit, in which we parallelize the
execution of post-query checks. The main rationale for changes in this
commit is making it easier to reason about concurrency-safety of
DistSQL infra.

The following changes are made:

  • remove unused receiver from FinalizePlan and
    finalizePlanWithRowCount methods
  • remove an unused parameter from getPlanDistribution
  • remove the reference to planner from getPlanDistribution in favor
    of passing a single boolean
  • refactor resetEvalCtx to make it concurrency-safe by extracting out
    the modification of extraTxnState into resetPlanner
  • refactor how topLevelQueryStats are aggregated across all parts of
    the stmt (i.e. across the main query, subqueries, and postqueries) to
    make it easier to introduce concurrency-safety to it in a follow-up
  • set planFlagVectorized only on the main query setup and explicitly
    propagate whether the flow is vectorized into saveFlows
  • fix a couple of typos in comments.
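The `getPlanDistribution` change is the easiest of these to illustrate. The signatures below are simplified stand-ins for the real ones, showing why passing a single boolean instead of the whole `planner` makes the function's concurrency footprint obvious:

```go
package main

import "fmt"

// planner is a simplified stand-in; the real planner carries far
// more state, which is exactly why referencing it from concurrent
// code is hard to reason about.
type planner struct{ distSQLAllowed bool }

// Before: the dependency on the planner hides which fields are
// actually read.
func getPlanDistributionOld(p *planner) string {
	if p.distSQLAllowed {
		return "distributed"
	}
	return "local"
}

// After: a single boolean, trivially safe to compute up front and
// hand to concurrent planning code.
func getPlanDistributionNew(distSQLAllowed bool) string {
	if distSQLAllowed {
		return "distributed"
	}
	return "local"
}

func main() {
	p := &planner{distSQLAllowed: true}
	fmt.Println(getPlanDistributionOld(p), getPlanDistributionNew(p.distSQLAllowed))
}
```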

Release note: None

sql: parallelize FK and UNIQUE constraints

This commit makes it so that we parallelize the execution of multiple
"check plans" (which include FOREIGN KEY and UNIQUE constraint checks
that run as "postqueries"). The main idea is that we use the LeafTxns
(derived from the RootTxn that was used for the main mutation query) and
execute each check in a separate task. As a result, the checks should be
completed faster, especially so in multi-region environments.

This required introducing mutex protection for several
planning-infra-related objects:

  • `saveFlows` function that populates miscellaneous info, needed for the
    stmt bundle
  • `associateNodeWithComponents` function that creates a mapping from
    `planNode`s to DistSQL processors, needed for different EXPLAIN variants
  • `topLevelQueryStats.add` function that aggregates some execution stats
    across all "queries" of a stmt.

Additionally, this commit also needed to make `scanBufferNode`s safe for
concurrent use. The way this works is that in the main query we have
`bufferNode` which has already accumulated some rows into a row
container. That row container can be read from concurrently as long as
the separate iterators are created (the creation and the closure must be
mutex-protected though).

All of these things are synchronized via a single "global" mutex. We
could have introduced a separate mutex for each of these things, but
a single one seems acceptable given that these operations should be
pretty quick and occur at different points throughout the checks'
execution.
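The single-mutex approach can be sketched like this (field and method names are illustrative, not the actual planner types); note the one embedded mutex guarding all of the otherwise unrelated pieces of state:

```go
package main

import (
	"fmt"
	"sync"
)

// planState sketches the "single global mutex" approach: one mutex
// guards every planning-infra structure that parallel checks can
// touch, since each critical section is short and they fire at
// different points of a check's execution.
type planState struct {
	mu struct {
		sync.Mutex
		savedFlows int            // stand-in for stmt-bundle flow info
		components map[string]int // stand-in for planNode->processor mapping
		rowsRead   int            // stand-in for topLevelQueryStats
	}
}

func (p *planState) saveFlows() {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.mu.savedFlows++
}

func (p *planState) addStats(rows int) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.mu.rowsRead += rows
}

func main() {
	p := &planState{}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			p.saveFlows()
			p.addStats(10)
		}()
	}
	wg.Wait()
	fmt.Println(p.mu.savedFlows, p.mu.rowsRead) // 8 80
}
```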

Fixes: #95184.

Release note (performance improvement): The execution of multiple
FOREIGN KEY and UNIQUE constraint checks can be parallelized in some
cases. As a result, these checks can be completed faster, especially
so in multi-region environments where the checks require cross-region
reads. The feature is enabled by default in the 23.1 release; to
enable it on 22.2, one must change the private
`sql.distsql.parallelize_checks.enabled` cluster setting to `true`.

sql,kv: bubble up retry errors when creating leaf transactions

Previously, if we detected that the transaction was aborted when
trying to construct leaf transaction state, we would handle the retry
error instead of bubbling it up to the caller. When a transaction is
aborted, the `TransactionRetryWithProtoRefreshError` carries with it a
new transaction that should be used for subsequent attempts. Handling
the retry error entailed swapping out the old `TxnCoordSender` with a
new one -- one that is associated with this new transaction.

This is bug-prone when trying to create multiple leaf transactions in
parallel if the root has been aborted. We would expect the first leaf
transaction to handle the error and all subsequent leaf transactions to
point to the new transaction, as the `TxnCoordSender` has been swapped
out. This wasn't an issue before, as we never created multiple
leaf transactions in parallel. That recently changed in
0f4b431, which started parallelizing FK
and uniqueness checks. With this change, we could see FK or uniqueness
violations when in fact the transaction needed to be retried.

This patch fixes the issue described above by not handling the retry
error when creating leaf transactions. Instead, we expect the
ConnExecutor to retry the entire transaction and prepare it for another
iteration.

Fixes #97141

Epic: none

Release note: None

Release justification: low-risk high-impact performance improvement disabled by default.

This commit addresses an old TODO about figuring out why we cannot reuse
the same buffer in `diskRowIterator.Row` method. The contract of that
method was previously confusing - it says that the call to `Row`
invalidates the row returned on the previous call; however, the
important piece was missing - that the datums themselves cannot be
mutated (this is what we assume elsewhere and perform only the "shallow"
copies). This commit clarifies the contract of the method and explicitly
explains why we need to allocate fresh byte slices (which is done via
`ByteAllocator` to reduce the number of allocations).

Additional context can be found in cockroachdb#43145 which added this copy in the
first place. Here is a relevant quote from Alfonso:
```
I think what's going on here is that this type of contract (you may only
reuse the row until the next call) is a bit unclear. `CopyRow`s of
`EncDatum`s are only implemented as shallow copies, so the reference to
this reuse only applies to the `EncDatumRow`, but not to the `encoded`
field, which is what is rewritten in this case. The `encoded` field will
not be copied, so I think the tests are probably failing due to that.
This is unfortunate and there doesn't seem to be a good reason for it.
To implement deep copies, we will have to implement deep copies for
`Datum`s as well.
```
I think we were under the impression that we'd implement the "deep copy"
in `CopyRow`, but I highly doubt we'll do so given that we're mostly
investing in the columnar infrastructure nowadays, and the current way
of performing shallow copying has worked well long enough.
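A small sketch of the hazard (illustrative code, not the actual `diskRowIterator`): reusing one scratch buffer retroactively corrupts rows returned earlier, while carving each row out of a growing buffer, in the spirit of `ByteAllocator`, keeps them intact with few allocations.

```go
package main

import "fmt"

// rowsWithReuse demonstrates the bug: callers keep only shallow
// references to the returned bytes, so rewriting one scratch
// buffer overwrites rows that were handed out earlier.
func rowsWithReuse(encoded [][]byte) [][]byte {
	var out [][]byte
	scratch := make([]byte, 0, 8)
	for _, e := range encoded {
		scratch = append(scratch[:0], e...) // reuse clobbers prior rows
		out = append(out, scratch)
	}
	return out
}

// rowsWithFreshAlloc hands out a fresh, capacity-capped slice per
// row from one growing buffer, so earlier rows are never rewritten.
func rowsWithFreshAlloc(encoded [][]byte) [][]byte {
	var out [][]byte
	var alloc []byte // batched allocation, few mallocs overall
	for _, e := range encoded {
		start := len(alloc)
		alloc = append(alloc, e...)
		// Full slice expression caps the row so later appends can
		// never write into it.
		out = append(out, alloc[start:len(alloc):len(alloc)])
	}
	return out
}

func main() {
	in := [][]byte{[]byte("aa"), []byte("bb")}
	reused := rowsWithReuse(in)
	fresh := rowsWithFreshAlloc(in)
	// With reuse, row 0 has been silently overwritten by row 1.
	fmt.Println(string(reused[0]), string(fresh[0])) // bb aa
}
```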

Epic: None

Release note: None
@blathers-crl

blathers-crl bot commented Apr 3, 2023

Thanks for opening a backport.

Please check the backport criteria before merging:

  • Patches should only be created for serious issues or test-only changes.
  • Patches should not break backwards-compatibility.
  • Patches should change as little code as possible.
  • Patches should not change on-disk formats or node communication protocols.
  • Patches should not add new functionality.
  • Patches must not add, edit, or otherwise modify cluster versions; or add version gates.
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
  • There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
  • The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
  • New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters.
  • The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.

Add a brief release justification to the body of your PR to justify this backport.

Some other things to consider:

  • What did we do to ensure that a user that doesn’t know & care about this backport, has no idea that it happened?
  • Will this work in a cluster of mixed patch versions? Did we test that?
  • If a user upgrades a patch version, uses this feature, and then downgrades, what happens?

@cockroach-teamcity
Member

This change is Reviewable

@yuzefovich force-pushed the backport22.2-92859-96123-98713 branch from 1b8ab37 to 509c1bf on April 3, 2023 22:10
@yuzefovich force-pushed the backport22.2-92859-96123-98713 branch from 509c1bf to ec2c237 on April 4, 2023 00:27
@yuzefovich marked this pull request as ready for review April 4, 2023 01:05
@yuzefovich requested review from a team as code owners April 4, 2023 01:05
@yuzefovich requested review from jbowens, mgartner, rhu713 and samiskin and removed request for a team April 4, 2023 01:05
@yuzefovich requested review from arulajmani, msirek, nvb and rharding6373 and removed request for a team, jbowens, mgartner, rhu713 and samiskin April 4, 2023 01:06
@yuzefovich
Member Author

First four commits have already been approved in #97091 by Rachael and Mark, but it'd still be a good idea to take a closer look at this backport. This parallelization of FKs is disabled by default, so it should be safe in a point release.

Only the last (fifth) commit needs to be reviewed by Arul and / or Nathan.

Collaborator

@arulajmani left a comment


Thanks for creating the combined backport. Fifth commit :lgtm:

Reviewed 3 of 3 files at r1, 20 of 20 files at r2, 22 of 22 files at r3, 7 of 7 files at r4, 6 of 6 files at r5, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @msirek, @nvanbenschoten, and @rharding6373)

Contributor

@msirek left a comment


:lgtm:

Reviewed 3 of 3 files at r1, 20 of 20 files at r2, 22 of 22 files at r3, 7 of 7 files at r4, 6 of 6 files at r5.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @nvanbenschoten and @rharding6373)

@yuzefovich
Member Author

TFTRs!

@yuzefovich merged commit 6fd4d76 into cockroachdb:release-22.2 Apr 4, 2023
@yuzefovich deleted the backport22.2-92859-96123-98713 branch April 4, 2023 21:03