Skip to content

crosscluster/logical: commit planner txn early#134997

Merged
craig[bot] merged 2 commits intocockroachdb:masterfrom
msbutler:butler-planning-txn-explore
Nov 13, 2024
Merged

crosscluster/logical: commit planner txn early#134997
craig[bot] merged 2 commits intocockroachdb:masterfrom
msbutler:butler-planning-txn-explore

Conversation

@msbutler
Copy link
Copy Markdown
Collaborator

@msbutler msbutler commented Nov 12, 2024

During large scale testing, we have seen the CreateForTables remote rpc take
several seconds or even minutes in LDR planning, and was previously within
scope of the planner txn, which could in turn cause txn refresh errors. This
patch commits the planner txn earlier in the job hook, before the
CreateForTables call, to prevent the planner txn from retrying. Now, the
destination side schema validation, locking, and job writing occur in a
subsequent txn.

Epic: none

Release note: none

@msbutler msbutler self-assigned this Nov 12, 2024
@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented Nov 12, 2024

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@msbutler msbutler force-pushed the butler-planning-txn-explore branch from a5c58f6 to df6f002 Compare November 12, 2024 19:37
@msbutler msbutler changed the title Butler planning txn explore crosscluster/logical: commit planner txn early Nov 12, 2024
@msbutler msbutler requested a review from dt November 12, 2024 19:38
@msbutler msbutler marked this pull request as ready for review November 12, 2024 19:38
@msbutler msbutler requested a review from a team as a code owner November 12, 2024 19:38
@msbutler msbutler requested a review from rafiss November 12, 2024 19:53
@msbutler
Copy link
Copy Markdown
Collaborator Author

unrelated flake.

@msbutler msbutler force-pushed the butler-planning-txn-explore branch from df6f002 to 0c53dad Compare November 13, 2024 12:51
During large scale testing, we have seen the CreateForTables remote rpc take
several seconds or even minutes in LDR planning, and was previously within
scope of the planner txn, which could in turn cause txn refresh errors. This
patch commits the planner txn earlier in the job hook, before the
CreateForTables call, to prevent the planner txn from retrying. Now, the
destination side schema validation, locking, and job writing occur in a
subsequent txn.

Epic: none

Release note: none
Release the planner txn leases, so the subsequent txn which writes the job
record and locks the replicating tables is guaranteed to not wait for the
planner txn leases to expire.

Epic: none

Release note: none
@msbutler msbutler force-pushed the butler-planning-txn-explore branch from 0c53dad to 0488e61 Compare November 13, 2024 13:27
@msbutler
Copy link
Copy Markdown
Collaborator Author

TFTR!

bors r=dt

@msbutler msbutler added backport-24.3.x Flags PRs that need to be backported to 24.3 backport-24.3.0-rc labels Nov 13, 2024
@craig
Copy link
Copy Markdown
Contributor

craig bot commented Nov 13, 2024

@craig craig bot merged commit 615462e into cockroachdb:master Nov 13, 2024
msbutler added a commit to msbutler/cockroach that referenced this pull request Nov 22, 2024
PR cockroachdb#134997 introduced a bug where CREATE LOGICAL REPLICATION could return early
with a bogus job id if the txn which writes the job record retried. This patch
prevents this.

Note: I plan to open a follow up PR that moves the ldr planning txn into its
own helper func.

Fixes cockroachdb#135176
Fixes cockroachdb#135490

Release note: none
msbutler added a commit to msbutler/cockroach that referenced this pull request Nov 22, 2024
PR cockroachdb#134997 introduced a bug where CREATE LOGICAL REPLICATION could return early
with a bogus job id if the txn which writes the job record retried. This patch
prevents this.

Note: I plan to open a follow up PR that moves the ldr planning txn into its
own helper func.

Fixes cockroachdb#135176
Fixes cockroachdb#135490

Release note: none
craig bot pushed a commit that referenced this pull request Nov 22, 2024
135994: crosscluster/logical: only return jobID if planner txn succeeded r=kev-cao a=msbutler

PR #134997 introduced a bug where CREATE LOGICAL REPLICATION could return early with a bogus job id if the txn which writes the job record retried. This patch prevents this.

Note: I plan to open a follow up PR that moves the ldr planning txn into its own helper func.

Fixes #135176
Fixes #135490

Release note: none

Co-authored-by: Michael Butler <butler@cockroachlabs.com>
blathers-crl bot pushed a commit that referenced this pull request Nov 22, 2024
PR #134997 introduced a bug where CREATE LOGICAL REPLICATION could return early
with a bogus job id if the txn which writes the job record retried. This patch
prevents this.

Note: I plan to open a follow up PR that moves the ldr planning txn into its
own helper func.

Fixes #135176
Fixes #135490

Release note: none
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport-24.3.x Flags PRs that need to be backported to 24.3 T-disaster-recovery

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants