Skip to content

streamingccl: switch to CREATE TENANT ... FROM REPLICATION#92070

Merged
craig[bot] merged 3 commits intocockroachdb:masterfrom
adityamaru:create-replication-stream
Nov 19, 2022
Merged

streamingccl: switch to CREATE TENANT ... FROM REPLICATION#92070
craig[bot] merged 3 commits intocockroachdb:masterfrom
adityamaru:create-replication-stream

Conversation

@adityamaru
Copy link
Copy Markdown
Contributor

@adityamaru adityamaru commented Nov 17, 2022

This change is a refactor of the replication logic
to support the new syntax that addresses tenants by name
instead of ID.

When setting up the stream ingestion job on the destination
cluster we do not know the source tenant ID. This information is necessary
for us to setup our rekeyer that is used to rekey every KV streamed
from the source before ingestion. The SourceTenantID is obtained from
the source cluster as part of the Topology that governs how we
setup our streaming partitions.

The rest of the streaming logic is largely unchanged.

Fixes: #91240

Release note: None

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@adityamaru
Copy link
Copy Markdown
Contributor Author

adityamaru commented Nov 17, 2022

I want CI to shake out stuff so please hold off on reviewing this PR for now!

@adityamaru adityamaru force-pushed the create-replication-stream branch from 6c71ef0 to 29d4e0e Compare November 17, 2022 19:30
@adityamaru adityamaru changed the title [WIP,DNM] parser: add CREATE TENANT...FROM REPLICATION OF syntax streamingccl: switch to CREATE TENANT ... FROM REPLICATION Nov 17, 2022
@adityamaru adityamaru force-pushed the create-replication-stream branch 2 times, most recently from 9ba1dae to 4df6ece Compare November 17, 2022 19:52
@adityamaru adityamaru marked this pull request as ready for review November 17, 2022 19:53
@adityamaru adityamaru requested review from a team as code owners November 17, 2022 19:53
@adityamaru adityamaru requested review from a team, dt, herkolategan, lidorcarmel, rhu713, smg260 and stevendanna and removed request for a team, herkolategan, rhu713 and smg260 November 17, 2022 19:53
@shermanCRL shermanCRL requested a review from baoalvin1 November 17, 2022 19:59
Copy link
Copy Markdown
Contributor

@lidorcarmel lidorcarmel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a quick pass.. just a few questions.

Reviewed 5 of 5 files at r1, 36 of 36 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @baoalvin1, @dt, and @stevendanna)


-- commits line 12 at r1:
did we say no release notes? so that we will not have docs for now?

Code quote:

  Release note (sql change): add syntax to create a tenant whose data will be
  streamed from a tenant on the source cluster

pkg/ccl/streamingccl/streamclient/client.go line 119 at r2 (raw file):

	SrcAddr       streamingccl.PartitionAddress
	SrcLocality   roachpb.Locality
	SrcTenantID   roachpb.TenantID

can/should we use TenantName here instead of ID?

Code quote:

SrcTenantID   roachpb.TenantID

pkg/sql/sem/tree/create.go line 2190 at r1 (raw file):

	// ReplicationSourceAddress is the address of the source cluster that we are
	// replicating data from.
	ReplicationSourceAddress Expr

those probably are not needed here?

can we drop CreateTenant completely and have only CreateTenantFromReplication? (and rename CreateTenantFromReplication to CreateTenant... I think you see where I'm going..)

Code quote:

	// ReplicationSourceTenantName is the name of the tenant that we are
	// replicating into the newly created tenant.
	ReplicationSourceTenantName Name
	// ReplicationSourceAddress is the address of the source cluster that we are
	// replicating data from.
	ReplicationSourceAddress Expr

Copy link
Copy Markdown
Collaborator

@stevendanna stevendanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Overall looks reasonable.

I think there is at least 1 more use of the old syntax over in the roachtest.

In a follow up PR, we probably want to rip out the RESTORE TENANT ... FROM REPLICATION syntax from sql.y and friends as well.

Comment on lines +37 to +38
sourceTenantID roachpb.TenantID,
destinationTenantID roachpb.TenantID,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 much better names.

{
$$.val = &tree.CreateTenant{Name: tree.Name($3)}
}
| CREATE TENANT name FROM REPLICATION OF name ON string_or_placeholder
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this statement might be the one where we really end up wanting to use placeholders for these names. But, no need to change it now, we can do a follow-up PR where we just change them.

Copy link
Copy Markdown
Contributor Author

@adityamaru adityamaru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @baoalvin1, @dt, @lidorcarmel, and @stevendanna)


-- commits line 12 at r1:

Previously, lidorcarmel (Lidor Carmel) wrote…

did we say no release notes? so that we will not have docs for now?

Old habits die hard, removing!


pkg/ccl/streamingccl/streamclient/client.go line 119 at r2 (raw file):

Previously, lidorcarmel (Lidor Carmel) wrote…

can/should we use TenantName here instead of ID?

nice catch, its now an unused field. Deleted.


pkg/ccl/streamingccl/streamproducer/stream_lifetime.go line 35 at r2 (raw file):

Previously, stevendanna (Steven Danna) wrote…

Looking at this file, it looks like this is done? We create the PTS record in this function, advanced in updateReplicationStreamProgress, and then released in the producer job's OnFailOrCancel routine.

indeed, removed!


pkg/sql/parser/sql.y line 4166 at r2 (raw file):

Previously, stevendanna (Steven Danna) wrote…

I think this statement might be the one where we really end up wanting to use placeholders for these names. But, no need to change it now, we can do a follow-up PR where we just change them.

👍 makes sense

@adityamaru adityamaru force-pushed the create-replication-stream branch from 4df6ece to 24bdc3a Compare November 18, 2022 14:51
This change adds the following syntax:

`CREATE TENANT <dest-name> FROM REPLICATION OF <source-name> ON <source-cluster>`

The syntax will be hooked up in a future commit.

Informs: cockroachdb#91240

Release note:None
This change is a refactor of the replication logic
to support the new syntax that addresses tenants by name
instead of ID.

When setting up the stream ingestion job on the destination
cluster we do not know the source tenant ID. This information is necessary
for us to setup our rekeyer that is used to rekey every KV streamed
from the source before ingestion. The `SourceTenantID` is obtained from
the source cluster as part of the `Topology` that governs how we
setup our streaming partitions.

The rest of the streaming logic is largely unchanged.

Fixes: cockroachdb#91240

Release note: None
@adityamaru adityamaru force-pushed the create-replication-stream branch from 24bdc3a to cd00c95 Compare November 18, 2022 15:51
@adityamaru
Copy link
Copy Markdown
Contributor Author

I think there is at least 1 more use of the old syntax over in the roachtest.

@stevendanna i tacked on one more commit with changes to the roachtest. I'm just running it to confirm I didn't break anything but it looks to be running smoothly as of now.

Copy link
Copy Markdown
Contributor Author

@adityamaru adityamaru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, I'll fast follow with the deletion.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @baoalvin1, @dt, @lidorcarmel, and @stevendanna)


pkg/sql/sem/tree/create.go line 2190 at r1 (raw file):

Previously, lidorcarmel (Lidor Carmel) wrote…

those probably are not needed here?

can we drop CreateTenant completely and have only CreateTenantFromReplication? (and rename CreateTenantFromReplication to CreateTenant... I think you see where I'm going..)

my bad, this is detritus from when I was debating between introducing a new tree.Node or re-using CreateTenant.

The reason we need two distinct nodes is that CreateTenantFromReplication is a CCL node which means it relies on having a planHook that is used to plan the statement (see stream_ingestion_planning.go). This is the pattern we follow for calling CCL code from a non-CCL package like pkg/sqlsince having a non-CCL package import a CCL package is not allowed except for testing purposes.CreateTenant` OTOH is a non-ccl node and doesn't require a planhook for execution.

Copy link
Copy Markdown
Contributor

@lidorcarmel lidorcarmel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 38 of 38 files at r3, 37 of 37 files at r4, 1 of 1 files at r5, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @baoalvin1, @dt, and @stevendanna)


pkg/cmd/roachtest/tests/cluster_to_cluster.go line 109 at r5 (raw file):

			t.Status("starting replication stream")
			streamReplStmt := fmt.Sprintf("CREATE TENANT \"%s\" FROM REPLICATION OF \"%s\" ON '%s'",

tiny nit: maybe use %q to avoid the \"? or quote everything with ' instead of "?

Code quote:

\"%s\" FROM REPLICATION OF \"%s\" 

pkg/sql/sem/tree/create.go line 2190 at r1 (raw file):

Previously, adityamaru (Aditya Maru) wrote…

my bad, this is detritus from when I was debating between introducing a new tree.Node or re-using CreateTenant.

The reason we need two distinct nodes is that CreateTenantFromReplication is a CCL node which means it relies on having a planHook that is used to plan the statement (see stream_ingestion_planning.go). This is the pattern we follow for calling CCL code from a non-CCL package like pkg/sqlsince having a non-CCL package import a CCL package is not allowed except for testing purposes.CreateTenant` OTOH is a non-ccl node and doesn't require a planhook for execution.

good to know, thanks for explaining!

@adityamaru adityamaru force-pushed the create-replication-stream branch from cd00c95 to 1b42eb4 Compare November 18, 2022 20:30
Copy link
Copy Markdown
Contributor Author

@adityamaru adityamaru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @baoalvin1, @dt, @lidorcarmel, and @stevendanna)


pkg/cmd/roachtest/tests/cluster_to_cluster.go line 109 at r5 (raw file):

Previously, lidorcarmel (Lidor Carmel) wrote…

tiny nit: maybe use %q to avoid the \"? or quote everything with ' instead of "?

Done.

@adityamaru
Copy link
Copy Markdown
Contributor Author

TFTRs!

bors r=lidorcarmel,stevendanna

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Nov 19, 2022

Build succeeded:

@craig craig bot merged commit 3c9c4aa into cockroachdb:master Nov 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

c2c: Extend CREATE TENANT to support FROM REPLICATION

4 participants