Skip to content

backup: include tenant usage data in backup#70595

Merged
craig[bot] merged 3 commits intocockroachdb:masterfrom
RaduBerinde:tenant-usage-backup
Sep 28, 2021
Merged

backup: include tenant usage data in backup#70595
craig[bot] merged 3 commits intocockroachdb:masterfrom
RaduBerinde:tenant-usage-backup

Conversation

@RaduBerinde
Copy link
Copy Markdown
Member

@RaduBerinde RaduBerinde commented Sep 22, 2021

backup: minor refactoring of tenant backup planning

This commit extracts the code that generates the tenant metadata into
a separate file.

Release note: None

Release justification: Part of a larger change that is necessary to
fully support backup and restore of non-system tenants. The change
only concerns multi-tenant deployments and should not affect single
tenant deployments.

backup: include tenant usage data in backup

We now back up and restore relevant data from tenant_usage: the last
token bucket state, and (more importantly) total all-time tenant
consumption.

The backup manifest proto had to change, but we still support
restoring tenant data from older manifests.

Release note: None

Release justification: This change is necessary to fully support
backup and restore of non-system tenants. The change only concerns
multi-tenant deployments and should not affect single tenant
deployments.

Informs #68479.

tenantcostserver: check tenant when initializing state

The tenant_usage data is initialized as needed unless it is configured
explicitly. This commit adds validation of the tenant in this case,
preventing the state from appearing for a deleted tenant.

Release note: None

Release justification: Part of a larger change that is necessary to
fully support backup and restore of non-system tenants. The change
only concerns multi-tenant deployments and should not affect single
tenant deployments.

@RaduBerinde RaduBerinde requested review from a team, ajwerner, andy-kimball, dt, nvb and tbg September 22, 2021 21:17
@RaduBerinde RaduBerinde requested a review from a team as a code owner September 22, 2021 21:17
@RaduBerinde RaduBerinde requested a review from a team September 22, 2021 21:17
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@RaduBerinde RaduBerinde force-pushed the tenant-usage-backup branch 3 times, most recently from 6b02efd to 6ada78d Compare September 24, 2021 18:26
Copy link
Copy Markdown
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @andy-kimball, @dt, @nvanbenschoten, and @tbg)


pkg/jobs/jobspb/jobs.proto, line 199 at r2 (raw file):

Previously, dt (David Taylor) wrote…

was this supposed to be 21?

Indeed, good catch!

@RaduBerinde
Copy link
Copy Markdown
Member Author

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Sep 24, 2021

Build failed (retrying...):

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Sep 25, 2021

Build failed:

Copy link
Copy Markdown
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @andy-kimball, @dt, @nvanbenschoten, and @tbg)


pkg/sql/tenant.go, line 95 at r5 (raw file):

		return errors.Wrap(err, "inserting new tenant")
	} else if num != 1 {
		log.Fatalf(ctx, "unexpected number of rows affected: %d", num)

I'm investigating a test failure under stressrace. While trying to repro, I hit this. I'll try to figure out what's wrong with the test.

But should this be a Fatal? Seems like if you somehow try to destroy a tenant two times in parallel, you could hit this; doesn't seem right to crash the node..

@RaduBerinde
Copy link
Copy Markdown
Member Author

Added a commit - we now check the tenant ID when we initialize the tenant usage state. This eliminates a race in the test, where existing tenant processes could cause re-creation of the usage state after the tenant was deleted.

This commit extracts the code that generates the tenant metadata into
a separate file.

Release note: None

Release justification: Part of a larger change that is necessary to
fully support backup and restore of non-system tenants. The change
only concerns multi-tenant deployments and should not affect single
tenant deployments.
We now back up and restore relevant data from tenant_usage: the last
token bucket state, and (more importantly) total all-time tenant
consumption.

The backup manifest proto had to change, but we still support
restoring tenant data from older manifests.

Release note: None

Release justification: This change is necessary to fully support
backup and restore of non-system tenants. The change only concerns
multi-tenant deployments and should not affect single tenant
deployments.
The tenant_usage data is initialized as needed unless it is configured
explicitly. This commit adds validation of the tenant in this case,
preventing the state from appearing for a deleted tenant.

Release note: None

Release justification: Part of a larger change that is necessary to
fully support backup and restore of non-system tenants. The change
only concerns multi-tenant deployments and should not affect single
tenant deployments.
Copy link
Copy Markdown
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @andy-kimball, @dt, @nvanbenschoten, and @tbg)


pkg/sql/tenant.go, line 95 at r5 (raw file):

Previously, RaduBerinde wrote…

I'm investigating a test failure under stressrace. While trying to repro, I hit this. I'll try to figure out what's wrong with the test.

But should this be a Fatal? Seems like if you somehow try to destroy a tenant two times in parallel, you could hit this; doesn't seem right to crash the node..

Sorry, I pointed to the wrong code, the question was about the Fatalf after the DELETE TENANT query in GCTenantSync below

@RaduBerinde
Copy link
Copy Markdown
Member Author

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Sep 28, 2021

Build succeeded:

@craig craig bot merged commit 04f75fc into cockroachdb:master Sep 28, 2021
@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented Sep 28, 2021

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 6ce23d0 to blathers/backport-release-21.2-70595: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 21.2.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants