Skip to content

sql: change system.tenant_usage to use a single consumption column#70778

Merged
craig[bot] merged 2 commits intocockroachdb:masterfrom
RaduBerinde:tenant-usage-proto
Sep 29, 2021
Merged

sql: change system.tenant_usage to use a single consumption column#70778
craig[bot] merged 2 commits intocockroachdb:masterfrom
RaduBerinde:tenant-usage-proto

Conversation

@RaduBerinde
Copy link
Copy Markdown
Member

@RaduBerinde RaduBerinde commented Sep 27, 2021

There are ongoing discussions about adding more components to the RU
cost. The prospect of having to change the columns of
system.tenant_usage each time is daunting.

This commit modifies the table to use a single total_consumption
column which encodes the TenantConsumption proto. We can add fields to
the proto in the future without changing the schema.

The values are still readable via SQL using pb_to_json:

crdb_internal.pb_to_json('cockroach.roachpb.TenantConsumption', total_consumption)

Informs #68479.

Release note: None

Release justification: Necessary change for supporting changes to the
cost model for the upcoming Serverless MVP release. This functionality
is only enabled in multi-tenant scenarios and should have no impact on
our dedicated customers.

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine. I do get the drive to avoid the schema changes. The counter-argument to the pain of adding such columns is the relative ease of this change. When we go and do things like this, it does tend to leave me wondering about the experience that our customers have writing sql.

The change itself :lgtm:.

This is going to break the clusters which @darinpp has been prototyping with, no? That's probably fine, just want to call it out.

Reviewed 3 of 7 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball, @celiala, and @RaduBerinde)


pkg/migration/migrations/tenant_usage.go, line 61 at r1 (raw file):

		  DROP COLUMN total_sql_pod_cpu_seconds,
		  ADD COLUMN total_consumption BYTES FAMILY "primary"`,
		schemaExistsFn: hasColumn,

A paranoid part of me might make me feel better if there were some way to compose this hasColumn check with some form of notHasColumn that ensures the other columns are gone. I'm not too worried about it. This framework is a bit brittle for my liking but alas, it's better than nothing.


pkg/sql/catalog/systemschema/system.go, line 591 at r1 (raw file):

	-- Current shares value for this instance.
  instance_shares FLOAT,

nit while you're here: align the indentation of this column and generally within this statement


pkg/sql/catalog/systemschema/system.go, line 591 at r1 (raw file):

	-- Current shares value for this instance.
  instance_shares FLOAT,

would we ever be in the game of protobuf-ing instance state too?

@RaduBerinde RaduBerinde force-pushed the tenant-usage-proto branch 2 times, most recently from 5e130e2 to 20033b9 Compare September 27, 2021 19:26
Copy link
Copy Markdown
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

The migration will reset existing tenant consumption on existing clusters, but shouldn't lead to crashes.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @andy-kimball, and @celiala)


pkg/migration/migrations/tenant_usage.go, line 61 at r1 (raw file):

Previously, ajwerner wrote…

A paranoid part of me might make me feel better if there were some way to compose this hasColumn check with some form of notHasColumn that ensures the other columns are gone. I'm not too worried about it. This framework is a bit brittle for my liking but alas, it's better than nothing.

Added a TODO


pkg/sql/catalog/systemschema/system.go, line 591 at r1 (raw file):

Previously, ajwerner wrote…

nit while you're here: align the indentation of this column and generally within this statement

Fixed. There was a mix of tabs and spaces, changed everything to spaces.


pkg/sql/catalog/systemschema/system.go, line 591 at r1 (raw file):

Previously, ajwerner wrote…

would we ever be in the game of protobuf-ing instance state too?

I don't think so..

Copy link
Copy Markdown
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The counter-argument to the pain of adding such columns is the relative ease of this change

The bigger problem is not the change itself but the versioning - we can't add new versions in patch releases (right?) and that would make it impossible to add an RU component to 21.2.x.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @andy-kimball, and @celiala)

@andy-kimball
Copy link
Copy Markdown
Contributor

Should we consider having a strongly-typed "total_ru_usage" column, but a weakly typed protobuf column for the component metrics? Having to crack a protobuf to see how many RUs have been used seems very painful.

@ajwerner
Copy link
Copy Markdown
Contributor

Fortunately the crdb_internal.pb_to_json function makes it not so bad. Should we add a virtual view that is the table but with the column converted to json on the system tenant?

@RaduBerinde
Copy link
Copy Markdown
Member Author

Great idea, added a commit:

sql: add crdb_internal.tenant_usage_details

This commit adds a virtual view which decodes the tenant consumption
details. Example:

root@:26257/defaultdb> select * from crdb_internal.tenant_usage_details;
  tenant_id |     total_ru      | total_read_bytes | total_read_requests | total_write_bytes | total_write_requests | total_sql_pod_seconds
------------+-------------------+------------------+---------------------+-------------------+----------------------+------------------------
        123 | 9618.311247130436 |           751504 |                4660 |             47485 |                 1877 |     4.454030256529996

Release note: None

Release justification: Necessary change for supporting changes to the
cost model for the upcoming Serverless MVP release. This functionality
is only enabled in multi-tenant scenarios and should have no impact on
our dedicated customers.

@RaduBerinde RaduBerinde requested a review from a team as a code owner September 28, 2021 02:30
@RaduBerinde RaduBerinde force-pushed the tenant-usage-proto branch 2 times, most recently from 045d3b5 to 76202f7 Compare September 28, 2021 15:55
@RaduBerinde RaduBerinde requested review from a team and dt and removed request for a team September 28, 2021 15:55
Copy link
Copy Markdown
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 7 files at r1, 1 of 2 files at r3, 2 of 5 files at r6, 9 of 11 files at r7, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @andy-kimball, @celiala, @dt, and @RaduBerinde)


pkg/ccl/backupccl/backup_planning_tenant.go, line 37 at r6 (raw file):

	  tenants.id = tenant_usage.tenant_id AND tenant_usage.instance_id = 0`

func tenantMetadataFromRow(row tree.Datums) (descpb.TenantInfoWithUsage, error) {

defensive programming thought: want to assert the number of columns here? I'm not really worried about it but an assertion failure would be better than a panic if it were to happen.


pkg/sql/crdb_internal.go, line 5326 at r7 (raw file):

Quoted 18 lines of code…
CREATE VIEW crdb_internal.tenant_usage_details AS
SELECT
  tenant_id,
  (j->>'rU')::FLOAT AS total_ru,
	(j->>'readBytes')::INT AS total_read_bytes,
	(j->>'readRequests')::INT AS total_read_requests,
	(j->>'writeBytes')::INT AS total_write_bytes,
	(j->>'writeRequests')::INT AS total_write_requests,
	(j->>'sqlPodsCpuSeconds')::FLOAT AS total_sql_pod_seconds
FROM (
  SELECT
	  tenant_id,
		crdb_internal.pb_to_json('cockroach.roachpb.TenantConsumption', total_consumption) AS j
	FROM
	  system.tenant_usage
	WHERE
	  instance_id = 0
)`,

nit: sqlfmt this again?

CREATE VIEW crdb_internal.tenant_usage_details AS
  SELECT
    tenant_id,
    (j->>'rU')::FLOAT8 AS total_ru,
    (j->>'readBytes')::INT8 AS total_read_bytes,
    (j->>'readRequests')::INT8 AS total_read_requests,
    (j->>'writeBytes')::INT8 AS total_write_bytes,
    (j->>'writeRequests')::INT8 AS total_write_requests,
    (j->>'sqlPodsCpuSeconds')::FLOAT8 AS total_sql_pod_seconds
  FROM
    (
      SELECT
        tenant_id,
        crdb_internal.pb_to_json(
          'cockroach.roachpb.TenantConsumption',
          total_consumption
        )
          AS j
      FROM
        system.tenant_usage
      WHERE
        instance_id = 0
    );

Copy link
Copy Markdown
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @andy-kimball, @celiala, and @dt)


pkg/ccl/backupccl/backup_planning_tenant.go, line 37 at r6 (raw file):

Previously, ajwerner wrote…

defensive programming thought: want to assert the number of columns here? I'm not really worried about it but an assertion failure would be better than a panic if it were to happen.

Done.


pkg/sql/crdb_internal.go, line 5326 at r7 (raw file):

Previously, ajwerner wrote…
CREATE VIEW crdb_internal.tenant_usage_details AS
SELECT
  tenant_id,
  (j->>'rU')::FLOAT AS total_ru,
	(j->>'readBytes')::INT AS total_read_bytes,
	(j->>'readRequests')::INT AS total_read_requests,
	(j->>'writeBytes')::INT AS total_write_bytes,
	(j->>'writeRequests')::INT AS total_write_requests,
	(j->>'sqlPodsCpuSeconds')::FLOAT AS total_sql_pod_seconds
FROM (
  SELECT
	  tenant_id,
		crdb_internal.pb_to_json('cockroach.roachpb.TenantConsumption', total_consumption) AS j
	FROM
	  system.tenant_usage
	WHERE
	  instance_id = 0
)`,

nit: sqlfmt this again?

CREATE VIEW crdb_internal.tenant_usage_details AS
  SELECT
    tenant_id,
    (j->>'rU')::FLOAT8 AS total_ru,
    (j->>'readBytes')::INT8 AS total_read_bytes,
    (j->>'readRequests')::INT8 AS total_read_requests,
    (j->>'writeBytes')::INT8 AS total_write_bytes,
    (j->>'writeRequests')::INT8 AS total_write_requests,
    (j->>'sqlPodsCpuSeconds')::FLOAT8 AS total_sql_pod_seconds
  FROM
    (
      SELECT
        tenant_id,
        crdb_internal.pb_to_json(
          'cockroach.roachpb.TenantConsumption',
          total_consumption
        )
          AS j
      FROM
        system.tenant_usage
      WHERE
        instance_id = 0
    );

Done. Again mixing tabs with spaces was the problem.

@RaduBerinde
Copy link
Copy Markdown
Member Author

Unrelated flake (filed #70839).

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Sep 28, 2021

Build failed (retrying...):

@yuzefovich
Copy link
Copy Markdown
Member

Looks like the generated code is not up to date.

bors r-

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Sep 28, 2021

Canceled.

@RaduBerinde
Copy link
Copy Markdown
Member Author

Looks like the generated code is not up to date.

How come? The failure looks like a flake in acceptance/cli/node-status.

@yuzefovich
Copy link
Copy Markdown
Member

I'm looking at the CI build on this PR in isolation: https://teamcity.cockroachdb.com/viewLog.html?buildId=3510938&buildTypeId=Cockroach_UnitTests

@RaduBerinde
Copy link
Copy Markdown
Member Author

I'm looking at the CI build on this PR in isolation: https://teamcity.cockroachdb.com/viewLog.html?buildId=3510938&buildTypeId=Cockroach_UnitTests

And? There is a flake (TestMultiRegionDataDriven/regional_by_table, tracked by #70846) and a timeout..

@yuzefovich
Copy link
Copy Markdown
Member

The generated code issue is above:
Screen Shot 2021-09-28 at 3 35 06 PM

@yuzefovich
Copy link
Copy Markdown
Member

@RaduBerinde
Copy link
Copy Markdown
Member Author

Ah, thanks, I missed that!

There are ongoing discussions about adding more components to the RU
cost. The prospect of having to change the columns of
`system.tenant_usage` each time is daunting.

This commit modifies the table to use a single `total_consumption`
column which encodes the TenantConsumption proto. We can add fields to
the proto in the future without changing the schema.

The values are still readable via SQL using `pb_to_json`:
```
crdb_internal.pb_to_json('cockroach.roachpb.TenantConsumption', total_consumption)
```

Informs cockroachdb#68479.

Release note: None

Release justification: Necessary change for supporting changes to the
cost model for the upcoming Serverless MVP release. This functionality
is only enabled in multi-tenant scenarios and should have no impact on
our dedicated customers.
This commit adds a virtual view which decodes the tenant consumption
details. Example:
```
root@:26257/defaultdb> select * from crdb_internal.tenant_usage_details;
  tenant_id |     total_ru      | total_read_bytes | total_read_requests | total_write_bytes | total_write_requests | total_sql_pod_seconds
------------+-------------------+------------------+---------------------+-------------------+----------------------+------------------------
        123 | 9618.311247130436 |           751504 |                4660 |             47485 |                 1877 |     4.454030256529996
```

Release note: None

Release justification: Necessary change for supporting changes to the
cost model for the upcoming Serverless MVP release. This functionality
is only enabled in multi-tenant scenarios and should have no impact on
our dedicated customers.
@RaduBerinde
Copy link
Copy Markdown
Member Author

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Sep 29, 2021

Build succeeded:

@craig craig bot merged commit ab0c462 into cockroachdb:master Sep 29, 2021
@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented Sep 29, 2021

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from dd7cbf2 to blathers/backport-release-21.2-70778: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 21.2.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

Copy link
Copy Markdown
Contributor

@jaylim-crl jaylim-crl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, tenant_usage_details is going to be available only to the KV cluster, right? I'd imagine we won't leak data if a regular tenant tries to query that internal table?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)

@RaduBerinde
Copy link
Copy Markdown
Member Author

Correct. The tenant has their own tenant_usage table which should just be empty (and modifying it would be inconsequential). Ideally, these wouldn't exist at all on the tenant but I think we're missing a bit of infrastructure to make that happen.

RaduBerinde added a commit to RaduBerinde/cockroach that referenced this pull request Nov 16, 2021
The part of the backup code that retrieves the tenant metadata was
updated in cockroachdb#70778. The new code depends on the `system.tenant_usage`
table which is not present in 21.1.

Consequently, this query fails when running 21.2 code on a cluster
that was not finalized to 21.2 version.

This change fixes this by adding an alternate variant of the query
when the tenant usage table is not present (according to the cluster
version).

Fixes cockroachdb#72839.

Release note (bug fix): Fixes "backup-lookup-tenants: descriptor not
found" when doing full cluster backup while upgrading from 21.1 to 21.2.
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this pull request Nov 16, 2021
The part of the backup code that retrieves the tenant metadata was
updated in cockroachdb#70778. The new code depends on the `system.tenant_usage`
table which is not present in 21.1.

Consequently, this query fails when running 21.2 code on a cluster
that was not finalized to 21.2 version.

This change fixes this by adding an alternate variant of the query
when the tenant usage table is not present (according to the cluster
version).

Fixes cockroachdb#72839.

Release note (bug fix): Fixes "backup-lookup-tenants: descriptor not
found" when doing full cluster backup while upgrading from 21.1 to 21.2.
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this pull request Nov 17, 2021
The part of the backup code that retrieves the tenant metadata was
updated in cockroachdb#70778. The new code depends on the `system.tenant_usage`
table which is not present in 21.1.

Consequently, this query fails when running 21.2 code on a cluster
that was not finalized to 21.2 version.

This change fixes this by adding an alternate variant of the query
when the tenant usage table is not present (according to the cluster
version).

Fixes cockroachdb#72839.

Release note (bug fix): Fixes "backup-lookup-tenants: descriptor not
found" when doing full cluster backup while upgrading from 21.1 to 21.2.
celiala pushed a commit that referenced this pull request Nov 18, 2021
The part of the backup code that retrieves the tenant metadata was
updated in #70778. The new code depends on the `system.tenant_usage`
table which is not present in 21.1.

Consequently, this query fails when running 21.2 code on a cluster
that was not finalized to 21.2 version.

This change fixes this by adding an alternate variant of the query
when the tenant usage table is not present (according to the cluster
version).

Fixes #72839.

Release note (bug fix): Fixes "backup-lookup-tenants: descriptor not
found" when doing full cluster backup while upgrading from 21.1 to 21.2.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants