Skip to content

sql/tenant: synch clear range data on tenant drop request#55935

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
spaskob:tenant-synchronous-destroy
Oct 29, 2020
Merged

sql/tenant: synch clear range data on tenant drop request#55935
craig[bot] merged 1 commit intocockroachdb:masterfrom
spaskob:tenant-synchronous-destroy

Conversation

@spaskob
Copy link
Copy Markdown
Contributor

@spaskob spaskob commented Oct 24, 2020

Informs #47904.

This is a stop gap solution that will perform the operation
synchronously inside the transaction that marks the tenant row
as DROP and finally deletes the row.

This PR is intended to be backported to 20.2 release branch.
The long-term solution that implements #48775 is done in #55756
and will be our long term approach for release 21.0 and beyond.

Release note: none.

@spaskob spaskob requested review from a team and dt and removed request for a team October 24, 2020 09:14
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@spaskob spaskob force-pushed the tenant-synchronous-destroy branch from 9673e32 to edb51f2 Compare October 24, 2020 09:15
@spaskob spaskob requested a review from ajwerner October 24, 2020 09:17
@spaskob spaskob force-pushed the tenant-synchronous-destroy branch 8 times, most recently from 2383bfa to f0e42e4 Compare October 24, 2020 15:16
@spaskob spaskob requested a review from tbg October 26, 2020 08:51
@spaskob spaskob force-pushed the tenant-synchronous-destroy branch from f0e42e4 to 497018f Compare October 27, 2020 10:26
@spaskob spaskob requested a review from a team as a code owner October 27, 2020 10:26
@spaskob spaskob force-pushed the tenant-synchronous-destroy branch from 497018f to cc3027a Compare October 27, 2020 11:23
Copy link
Copy Markdown
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this what we want or do we want to have a separate function from destroy tenant to actually do the gc and row removal? That function should validate the state of the tenant row. My feeling is the latter because it will more closely mimic behavior of an async job. Will give us flexibility to remove tenants later and to deal with already dropped tenants which have not yet been GC'd.

Reviewed 1 of 7 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @dt, @spaskob, and @tbg)


pkg/ccl/backupccl/backup_test.go, line 5864 at r1 (raw file):

		restoreDB.Exec(t, `SELECT crdb_internal.destroy_tenant(10)`)
		restoreDB.CheckQueryResults(t, `select * from system.tenants`, [][]string{})

Can you use the kv.DB to scan the entire tenant prefix and ensure that it is empty?

@spaskob spaskob force-pushed the tenant-synchronous-destroy branch from cc3027a to 88296b6 Compare October 27, 2020 15:15
Copy link
Copy Markdown
Contributor Author

@spaskob spaskob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @dt, and @tbg)


pkg/ccl/backupccl/backup_test.go, line 5864 at r1 (raw file):

Previously, ajwerner wrote…

Can you use the kv.DB to scan the entire tenant prefix and ensure that it is empty?

Done.

Copy link
Copy Markdown
Contributor Author

@spaskob spaskob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @dt, and @tbg)

@spaskob spaskob force-pushed the tenant-synchronous-destroy branch from 88296b6 to 4a1ce62 Compare October 27, 2020 15:25
@spaskob
Copy link
Copy Markdown
Contributor Author

spaskob commented Oct 27, 2020

Is this what we want or do we want to have a separate function from destroy tenant to actually do the gc and row removal? That function should validate the state of the tenant row. My feeling is the latter because it will more closely mimic behavior of an async job. Will give us flexibility to remove tenants later and to deal with already dropped tenants which have not yet been GC'd.

Discussed offline - the functions for these are separated, if there's a need it will be trivial to define a new builtin to just GC the tenant.

@dt
Copy link
Copy Markdown
Contributor

dt commented Oct 27, 2020

oh, whoops, looks like it dropped my other comment because it was submitted after an update but it said:

ditto @ajwerner -- I think we should gc separately, in part because GC isn't txn-safe. If we gc in standalone cmd, we can assert we see state drop first (in its own txn that commits), then do the drop. We should also asset p.Txn is implicit and/or commit it in batch to avoid accidentally appearing transactional

@spaskob spaskob force-pushed the tenant-synchronous-destroy branch 2 times, most recently from 3ba44fc to 2bda1fb Compare October 28, 2020 12:30
Copy link
Copy Markdown
Contributor Author

@spaskob spaskob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @dt, and @tbg)


pkg/sql/tenant.go, line 238 at r1 (raw file):

Previously, dt (David Taylor) wrote…

I think it is weird to pass this a txn and then we clearRange not using that txn. Indeed, passing a txn is weird since clear range isn't transaction-safe.

Done.


return nil
}
if err := p.execCfg.DB.Txn(ctx, func(ctx context.Context, txn *kv.Txn) error {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be worth a comment explaining why we're db.Txn'ing, e.g. something like "confirm tenant is ready to be cleared. This uses a standalone txn that must commit before we move on, since we will be acting non-transactionally on the result when we ClearRange."

@spaskob spaskob force-pushed the tenant-synchronous-destroy branch 2 times, most recently from 853ac8f to e820719 Compare October 28, 2020 14:06
Copy link
Copy Markdown
Contributor Author

@spaskob spaskob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @dt, @spaskob, and @tbg)


pkg/sql/tenant.go, line 287 at r2 (raw file):

Previously, dt (David Taylor) wrote…

These checks are repeated in DeleteTenantRecord, which got me looking at who calls it outside here, and I'm not sure about the benefit of splitting out and exporting both DeleteTenantRecord and ClearTenant as standalone functions: we really should never be calling the latter without the former first.

I'm thinking it might make sense to un-export them or just inlining them in this method instad, and then making the majority of this method (form this line onwards) an exported function, with this planner method just becoming a wrapper around it (that checks p's txnImplicit and then calls the function). Then RESTORE can call this function to check, clear and delete all at once, and we don't need the repeated checks?

Done.


pkg/sql/tenant.go, line 313 at r2 (raw file):

Previously, dt (David Taylor) wrote…

should we return p.txn.Commit() before we let go of txn? we did non-txn'l side effects (clearing range) so I'm wary of leaving p.txn uncommitted, but I might just be paranoid.

Alternatively, we could just fire up our own txn again and completely ignore p.Txn (other than asserting it is implicit) which might be cleaner if we do the method/function re-org I mention above, because then this function isn't passed a txn at all which makes clear that it isn't transactional.

Done.

return errors.Errorf("gc_tenant cannot be used inside a transaction")
}

info, err := getTenantRecord(ctx, p.execCfg, p.txn, tenID)
Copy link
Copy Markdown
Contributor

@dt dt Oct 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: if we're going to use p.Txn here, I'd Commit() it here, after err check before moving on to call GCTenant using its results?

@spaskob spaskob force-pushed the tenant-synchronous-destroy branch 4 times, most recently from 75794f5 to fad24d5 Compare October 29, 2020 05:29
Informs cockroachdb#47904.

This is a stop gap solution that will perform the operation
synchronously inside the transaction that marks the tenant row
as DROP and finally deletes the row.

This PR is intended to be backported to 20.2 release branch.
The long-term solution that implements cockroachdb#48775 is done in cockroachdb#55756
and will be our long term approach for release 21.0 and beyond.

Release note: none.
@spaskob spaskob force-pushed the tenant-synchronous-destroy branch from fad24d5 to 3145bc4 Compare October 29, 2020 07:41
@spaskob
Copy link
Copy Markdown
Contributor Author

spaskob commented Oct 29, 2020

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Oct 29, 2020

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants