
builtins: add pg_total_relation_size builtin #59604

Open
jordanlewis wants to merge 2 commits into cockroachdb:master from jordanlewis:pg_table_size

Conversation

@jordanlewis
Member

@jordanlewis jordanlewis commented Jan 29, 2021

Closes #20712.

Depends on #59865.

This builtin function returns the number of bytes in a table. For
CockroachDB, this means summing up the number of bytes in the ranges of
a table.

Notably, we do not multiply the number of bytes in the table's ranges by
the number of replicas - this is left up to the user to do if desired.

Release note (sql change): add the pg_total_relation_size builtin for estimating
the amount of data in a table.

@jordanlewis jordanlewis requested a review from a team January 29, 2021 23:35
@cockroach-teamcity
Member

This change is Reviewable

# crdb_internal.ranges in the logic test context.
# Make sure also that a non-existent table id returns null.
query II
SELECT pg_table_size('size_tester'::regclass), pg_table_size(387438)
Contributor


is there no way to test this works outside of logic tests?

Member Author


I added a test, thanks for the push!

@jordanlewis
Member Author

I rewrote the function to avoid depending on the virtual table, which has to fetch all range, table, and node descriptors, and to avoid fetching the range stats for every range in a separate RPC.

Now, it fetches just the range descriptors it needs, and does one big RPC to get all of their stats at once.

Also, I added a test.
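The batched approach described above can be sketched in Go. Every type and value here is a hypothetical stand-in, not CockroachDB's actual internals; the sketch only illustrates the shape of the change, one stats RPC covering many ranges instead of one RPC per range.

```go
package main

import "fmt"

// rangeDesc is a stand-in for a KV range descriptor; only the ID matters here.
type rangeDesc struct{ id int }

// batchedRangeStats simulates a single RPC that returns byte counts for
// many ranges at once, replacing one stats RPC per range.
func batchedRangeStats(ranges []rangeDesc) map[int]int64 {
	out := make(map[int]int64, len(ranges))
	for _, r := range ranges {
		out[r.id] = int64(r.id) * 100 // fabricated sizes for illustration
	}
	return out
}

// tableSizeBytes fetches only the descriptors for the table's span and
// sums the per-range byte counts from one batched stats call.
func tableSizeBytes(ranges []rangeDesc) int64 {
	var total int64
	for _, n := range batchedRangeStats(ranges) {
		total += n
	}
	return total
}

func main() {
	fmt.Println(tableSizeBytes([]rangeDesc{{1}, {2}, {3}})) // prints 600
}
```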

@jordanlewis
Member Author

Actually, this might be subtly wrong or confusing.

  • pg_indexes_size(regclass) → bigint: Computes the total disk space used by indexes attached to the specified table.
  • pg_table_size(regclass) → bigint: Computes the disk space used by the specified table, excluding indexes (but including its TOAST table if any, free space map, and visibility map).
  • pg_total_relation_size(regclass) → bigint: Computes the total disk space used by the specified table, including all indexes and TOAST data. The result is equivalent to pg_table_size + pg_indexes_size.

So I guess this function as written would be closer to pg_indexes_size. What would CockroachDB's pg_table_size be, since CockroachDB has no concept of heap? Maybe we could just do the table's primary index?

@otan
Contributor

otan commented Feb 1, 2021

So I guess this function as written would be closer to pg_indexes_size.

Am I missing something -- why is it not pg_total_relation_size?

What would CockroachDB's pg_table_size be, since CockroachDB has no concept of heap? Maybe we could just do the table's primary index?

Hmm, yeah sounds about right

@jordanlewis jordanlewis force-pushed the pg_table_size branch 5 times, most recently from c3ebc79 to 91a87c9 Compare February 2, 2021 04:22
@jordanlewis
Member Author

PTAL

@tbg I know you weighed in on #20712 with a comment:

The API call we have right now gives the approximate total replicated size of a table and it needs to fan out across the cluster and also scans from the meta ranges. Hooking this up to pg_table_size is possibly something we could regret.

but given we haven't had anything here in the last 3 years and there are no plans to improve it, are there any remaining objections? I now see that we have yet another way of calculating this via ComputeStatsForKeySpan; should we use that instead of this MVCC stats method?

Contributor

@otan otan left a comment


Reviewed 1 of 2 files at r1, 1 of 3 files at r3, 3 of 3 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @otan)


pkg/sql/crdb_internal.go, line 2444 at r4 (raw file):

	end_key,
	end_pretty,
  table_id,

nit: indentation misaligned

@tbg
Member

tbg commented Feb 3, 2021

@jordanlewis I can only warn, not make the call. Anything that fans out to the whole cluster is dangerous when called frequently. Is this going to be called frequently? I don't know. Probably not, but not my call to make. An individual call shouldn't be too bad, since this is only reading from the leaseholders.

Orthogonally, is this the right implementation? It looks like people want this to reflect disk usage, not the number of logical bytes in the cluster, which is what you're returning now. By implementing it as such, we're just going to switch users from asking for this to be implemented to asking why it returns seemingly bogus numbers, especially since the UI perhaps does not return the same thing (I haven't checked what "table size" means there). If there has been a discussion and it was resolved that it should do what this PR proposes, I haven't seen it either here or on the linked issues.

@rafiss
Collaborator

rafiss commented Feb 5, 2021

I don't have much context on this PR, but just inquiring: if the pg_table_size stuff is still under debate, should we cherry-pick off the first commit so #59601 is fixed?

@jordanlewis
Member Author

Yes, I think so. I'll take care of that today.

@jordanlewis jordanlewis changed the title builtins: add pg_table_size builtin builtins: add pg_total_relation_size builtin Feb 5, 2021
@jordanlewis
Member Author

@jordanlewis I can only warn, not make the call. Anything that fans out to the whole cluster is dangerous when called frequently. Is this going to be called frequently? I don't know. Probably not, but not my call to make. An individual call shouldn't be too bad, since this is only reading from the leaseholders.

This function tends to get called by metadata-scraping admin consoles, Prometheus exporters, and other tools that will presumably call it once per table in the whole database. So implementing it this way may not really be the best idea.

Orthogonally, is this the right implementation?

I think, based on the text you wrote below this question, the answer is no. I didn't realize that the MVCC stats didn't reflect the physical disk usage of the range. I'll figure out another way to do this, after I merge the first commit to fix #59601.

@jordanlewis jordanlewis requested a review from a team as a code owner February 5, 2021 22:21
@jordanlewis jordanlewis force-pushed the pg_table_size branch 6 times, most recently from 6e27449 to 602905e Compare February 5, 2021 23:02
@jordanlewis jordanlewis requested review from a team and dt and removed request for a team February 5, 2021 23:02
@jordanlewis jordanlewis force-pushed the pg_table_size branch 3 times, most recently from 1708a26 to e7a7ac6 Compare February 5, 2021 23:17
@jordanlewis
Member Author

I took a look at redoing this today. I ended up having the builtin function call into the admin server's TableStatsRequest API, the same as what the DB Console does. This will prevent any confusion where the numbers don't align - something that I agree would be deeply confusing for everyone.

For now, though, this builtin will still fan out. It's not ideal, but I think it's ultimately better to provide the information in both SQL and the DB Console, even if it's inefficient. It would be good to collect these statistics during, say, table statistics collection, and store them in a new table or in a new way in the existing table statistics system table, but that's going to be a pretty significant chunk of work.

Contributor

@knz knz left a comment


Jordan, I'm sorry but this is not going to fly. Our multi-tenant architecture is going to be the way moving forward, even for dedicated / on-prem users, and the SQL pods don't serve the admin UI / status server.

What you really need here is to access the (s *adminServer) statsForSpan() method from SQL. But this method has no reason to be on adminServer; it doesn't even use adminServer specifics. It could just as well be implemented on (*Server) or even (*sql.Server). If you were to move it to the latter, you could access it safely from a SQL builtin, and the changes to server/admin.go to get there would be minimal: from adminServer to sql.Server is just s.server.sqlServer.

Reviewed 2 of 11 files at r6.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt, @jordanlewis, and @otan)

@knz
Contributor

knz commented Feb 8, 2021

(Alternatively, you can move it to a method of kv/client which makes it even easier.)

@knz
Contributor

knz commented Feb 8, 2021

@tbg can you check whether my proposal to integrate statsForSpan in the KV interface above is sound?

@jordanlewis jordanlewis force-pushed the pg_table_size branch 3 times, most recently from 29eed06 to ff432d3 Compare February 12, 2021 22:51
@jordanlewis
Member Author

I embarked on an exciting plumbing adventure and moved the stats-for-span implementation into the SQL server. However, we still need to call into the status server, once per KV node that has a relevant leaseholder, to fetch the span stats underneath... is that acceptable? In the end, I think we actually could have called into a remote adminServer, right? Don't KV nodes in the multitenant world still have an admin and status server all happily ready to serve?

... and crdb_internal.ranges_no_leases

This commit adds schema_name to crdb_internal.ranges and
crdb_internal.ranges_no_leases to ensure that it's possible to
disambiguate between ranges that are contained by a table with the same
name in two different user-defined schemas.

In addition, it also adds the table_id column which allows unambiguous
lookups of ranges for a given table id. This will also enable making a
virtual index on the table_id column later, which should be a nice win
for some introspection commands.

Release note (sql change): add the schema_name and table_id columns to
the crdb_internal.ranges and crdb_internal.ranges_no_leases virtual
tables.
@knz
Contributor

knz commented Feb 15, 2021

Don't KV nodes in the multitenant world still have an admin and status server all happily ready to serve?

That is not the plan, no.

statusServer / adminServer are components that are dedicated to serving the admin UI / DB console exclusively. They are not meant to be available on SQL pods in multi-tenant, and they are certainly not meant to be used as dependency from SQL.

(Which also means that our current implementation of SQL session handling, which is currently routed through these things, is incorrect. See related issue #60584.)

However, we still need to call into the status server, once per KV node that has a relevant leaseholder, to fetch the span stats underneath...

  1. I don't think that's true. The entity that does this fetch should be the distsender, via the client.Txn interface. I don't see why it needs to hang off statusServer. It's not specific to the HTTP APIs / UI console.

  2. in multi-tenant, you'll need to talk to multiple KV nodes (once per node holding relevant leases). I'm not sure our multi-tenant architecture is able to do fan-outs to the entire KV cluster from a single SQL pod.

It seems to me that we need an API like this:

  • SQL code talks to internal KV client (via kv/client),
  • SQL client talks through RPC to just 1 KV node on the KV side,
  • that KV node then performs fan-out RPCs to other KV nodes as needed.
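The three-hop routing in the list above might look roughly like this in Go. Every name here (spanStats, storeStats, gatewayFanOut) is a hypothetical stand-in; the sketch only shows one gateway KV node fanning out to the other nodes holding relevant leases and summing the per-store results.

```go
package main

import (
	"fmt"
	"sync"
)

// spanStats is what each KV store reports for the requested key span.
type spanStats struct{ approxDiskBytes int64 }

// storeStats stands in for the per-node RPC handler; real CockroachDB
// internals differ, this only illustrates the routing shape.
func storeStats(nodeID int) spanStats {
	return spanStats{approxDiskBytes: int64(nodeID) * 1000} // fabricated
}

// gatewayFanOut runs on the single KV node the SQL pod talks to; it fans
// out to the other KV nodes concurrently and sums the results.
func gatewayFanOut(nodeIDs []int) spanStats {
	var (
		mu    sync.Mutex
		wg    sync.WaitGroup
		total spanStats
	)
	for _, id := range nodeIDs {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			s := storeStats(id)
			mu.Lock()
			total.approxDiskBytes += s.approxDiskBytes
			mu.Unlock()
		}(id)
	}
	wg.Wait()
	return total
}

func main() {
	fmt.Println(gatewayFanOut([]int{1, 2, 3}).approxDiskBytes) // prints 6000
}
```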

I think we actually could have called into a remote adminServer, right?

Again, adminServer is a DB-console-only concept. We need something else.

@knz
Contributor

knz commented Feb 16, 2021

This topic was discussed in this week's KV team meeting.

  • @tbg: this doesn’t fit well into the kv/client package, which is focused around executing requests on ranges. The span stats code is intentionally operating on a raw key range as it basically just passes through to the storage layer (note the “approx disk bytes” here). This is fundamentally a per-Store API but “broadcasting” is our only addressing scheme for that.
  • @tbg: note that I’m assuming here that we want to expose the physical disk usage. The logical usage would fit very well into the KV API, you can simply model it as a ranged read request. But I would want to double-check with the storage team whether calling engine.ApproximateDiskBytes(rangeStartKey, rangeEndKey) and summing up over the ranges like that is a) likely to give reasonable results (I think not) or b) expensive (might be).
  • @tbg from a POV of our multi-tenancy usage, I don’t see why tenants should be able to read off their approx physical disk usage. Their billing will likely not depend on it (imo it’s better to base that on logical bytes).
  • @knz the counterpoint to this is that we do not want to build features that provide a different SQL experience on CC dedicated and free/shared-tier clusters. Also, we eventually want to move CC dedicated clusters toward the multi-tenant architecture, so we need to build the SQL service in a way that works with multi-tenant.

So in summary, we have strong uncertainty and a lack of leadership/vision about how to expose KV-level (or storage-level) information through SQL in the multi-tenant deployment.

I am currently working with PM to figure out who owns the architectural vision here. Will report when we know more.

@jordanlewis
Member Author

Just a note of intent - I want to come back to this one now that we've gotten further in our MT architecture.

@brandonrjacobs

brandonrjacobs commented Apr 3, 2024

Any chance this can be revisited, @jordanlewis? We're finding a need for this and wanted to see if there are any updates.



Development

Successfully merging this pull request may close these issues.

sql: expose table on-disk size

7 participants