Skip to content

sql: add fullStatisticsID column to system.table_statistics#93751

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
faizaanmadhani:faizaan/partial-stat-key
Dec 17, 2022
Merged

sql: add fullStatisticsID column to system.table_statistics#93751
craig[bot] merged 1 commit intocockroachdb:masterfrom
faizaanmadhani:faizaan/partial-stat-key

Conversation

@faizaanmadhani
Copy link
Copy Markdown
Contributor

This commit adds a column to system.table_statistics used by partial statistics to store an id that references the full statistic that the partial statistics was derived from. For full statistics, the value in this column will be NULL.

This commit updates a migration/upgrade that just included the partialPredicate column.

Epic: CRDB-19449

Release note (sql change): system.table_statistics now contains a column called fullStatisticsID to store an id referencing the full table statistic the partial statistic was derived from.

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@faizaanmadhani faizaanmadhani marked this pull request as ready for review December 15, 2022 22:09
@faizaanmadhani faizaanmadhani requested a review from a team December 15, 2022 22:09
@faizaanmadhani faizaanmadhani requested a review from a team as a code owner December 15, 2022 22:09
@faizaanmadhani faizaanmadhani force-pushed the faizaan/partial-stat-key branch 9 times, most recently from 0c2cca0 to 8d6d4f3 Compare December 16, 2022 14:26
Copy link
Copy Markdown
Collaborator

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for staying up late to do this! I really appreciate that you worked hard to make this feature extra safe. This will definitely help me sleep better at night when I'm on call. 🙂

Reviewed 12 of 16 files at r1, 1 of 1 files at r2, 6 of 6 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @faizaanmadhani, @mgartner, and @rytaft)


pkg/sql/alter_table.go line 1352 at r3 (raw file):

					histogram,
					"partialPredicate",
          "fullStatisticsID"

nit: Indentation is off here.


pkg/sql/show_stats.go line 53 at r3 (raw file):

	{Name: "avg_size", Typ: types.Int},
	{Name: "partial_predicate", Typ: types.String},
	{Name: "full_statistics_id", Typ: types.Int},

bikeshedding: In SHOW STATISTICS output instead of "statisticID" we call it "histogram_id", so maybe this should be "full_histogram_id" or "parent_histogram_id" or something like that. And it might be nice to put it after histogram_id.


pkg/sql/show_stats.go line 148 at r3 (raw file):

				avgSizeIdx
				partialPredicateIdx
				fullStatisticsIdx

micro-nit: fullStatisticsIDIdx


pkg/sql/catalog/systemschema/system.go line 268 at r3 (raw file):

	"avgSize"            INT8       NOT NULL DEFAULT 0,
	"partialPredicate"   STRING,
	"fullStatisticsID"    INT8,

suggestion: If we're going to use PartialPredicate = "" and FullStatisticID = 0 for full statistics in various places, then it might be simpler to make these columns NOT NULL and use those sentinel values on disk, too.


pkg/sql/stats/new_stat.go line 98 at r3 (raw file):

	if !settings.Version.IsActive(ctx, clusterversion.V23_1AddPartialStatisticsColumns) {
		_, err := executor.Exec(

While we're here, could you add a check that returns an error if partialPredicate != "" (as described in this comment)?


pkg/sql/stats/new_stat.go line 141 at r3 (raw file):

					histogram,
					"partialPredicate",
          "fullStatisticsID"

nit: Indentation is off.


pkg/upgrade/upgrades/upgrades.go line 210 at r3 (raw file):

		descIDSequenceForSystemTenant,
	),
	upgrade.NewTenantUpgrade("add a partial predicate column to system.table_statistics",

nit: Would be nice to also change this message.


pkg/upgrade/upgrades/alter_table_statistics_partial_predicate_and_id.go line 26 at r3 (raw file):

ADD COLUMN IF NOT EXISTS "partialPredicate" STRING,
ADD COLUMN IF NOT EXISTS "fullStatisticsID" INT8
FAMILY "fam_0_tableID_statisticID_name_columnIDs_createdAt_rowCount_distinctCount_nullCount_histogram"`

nit: I don't think we need this FAMILY part. (This table only has one column family, which these columns will be added to automatically.)

Code quote:

fam_0_tableID_statisticID_name_columnIDs_createdAt_rowCount_distinctCount_nullCount_histogram

Copy link
Copy Markdown
Contributor

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 7 of 16 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @faizaanmadhani and @rytaft)


pkg/sql/stats/table_statistic.proto line 65 at r3 (raw file):

  // that it was created from. It is 0 for full statistics which will be
  // NULL when stored in system.table_statistics.
  uint64 full_statistics_id = 12 [(gogoproto.customname) = "FullStatisticsID", (gogoproto.nullable) = true];

So if this will be 0 for full statistics, should (gogoproto.nullable) be false?

Copy link
Copy Markdown
Collaborator

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @faizaanmadhani and @rytaft)


pkg/sql/catalog/systemschema/system.go line 268 at r3 (raw file):

Previously, michae2 (Michael Erickson) wrote…

suggestion: If we're going to use PartialPredicate = "" and FullStatisticID = 0 for full statistics in various places, then it might be simpler to make these columns NOT NULL and use those sentinel values on disk, too.

One more bikeshed / nit: the other column is singular ("statisticID" rather than "statisticsID") so could we make this new column singular, to match ("fullStatisticID")?

@faizaanmadhani faizaanmadhani force-pushed the faizaan/partial-stat-key branch from 8d6d4f3 to dc73ace Compare December 16, 2022 20:46
Copy link
Copy Markdown
Contributor Author

@faizaanmadhani faizaanmadhani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reviews! I've addressed all the comments.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @michae2, and @rytaft)


pkg/sql/alter_table.go line 1352 at r3 (raw file):

Previously, michae2 (Michael Erickson) wrote…

nit: Indentation is off here.

Fixed! Sorry this keeps happening -- i don't actually see the incorrect indentation in Goland so it's almost guesswork to try to align it correctly.


pkg/sql/show_stats.go line 53 at r3 (raw file):

Previously, michae2 (Michael Erickson) wrote…

bikeshedding: In SHOW STATISTICS output instead of "statisticID" we call it "histogram_id", so maybe this should be "full_histogram_id" or "parent_histogram_id" or something like that. And it might be nice to put it after histogram_id.

Done.


pkg/sql/show_stats.go line 148 at r3 (raw file):

Previously, michae2 (Michael Erickson) wrote…

micro-nit: fullStatisticsIDIdx

Done.


pkg/sql/catalog/systemschema/system.go line 268 at r3 (raw file):

Previously, michae2 (Michael Erickson) wrote…

One more bikeshed / nit: the other column is singular ("statisticID" rather than "statisticsID") so could we make this new column singular, to match ("fullStatisticID")?

I still think it would be better to keep the columns as NULL. It's not very complicated to translate between a tree.DNull datum and the value that we need and we do it consistently for other nullable fields (notably histograms -- I think histogram_id is NULL for multi-column statistics but an id for single column statistics, even though no histogram_id would ever be 0). Additionally, from the users perspective, it makes more sense that the column is NULL rather than 0 because 0 may indicate it references to a full statistic even though no statisticID can be 0.


pkg/sql/stats/new_stat.go line 98 at r3 (raw file):

Previously, michae2 (Michael Erickson) wrote…

While we're here, could you add a check that returns an error if partialPredicate != "" (as described in this comment)?

Done.


pkg/sql/stats/table_statistic.proto line 65 at r3 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

So if this will be 0 for full statistics, should (gogoproto.nullable) be false?

So, we can't make it true because doing so throws this error when running ./dev gen protobuf.

ERROR: field TableStatisticProto.FullStatisticsId is a native type and in proto3 syntax with nullable=false there exists conflicting implementations when encoding zero values--gogoroach_out: protoc-gen-gogoroach: Plugin failed with status code 1.
2022/12/16 15:38:53 error running protoc: exit status 1

However, I removed them and I think this is just true by default because it didn't change the resulting generated go file.

Copy link
Copy Markdown
Collaborator

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So close! This is looking great!

Reviewed 6 of 13 files at r4, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @faizaanmadhani, @mgartner, and @rytaft)


pkg/sql/alter_table.go line 1352 at r3 (raw file):

Previously, faizaanmadhani (Faizaan Madhani) wrote…

Fixed! Sorry this keeps happening -- i don't actually see the incorrect indentation in Goland so it's almost guesswork to try to align it correctly.

No big deal 🙂 (There might be a Goland setting to highlight tabs vs spaces?)


pkg/sql/alter_table.go line 1364 at r4 (raw file):

		histogram,
		predicateValue,
		s.FullStatisticsID,

Will this sometimes be 0? Do we need to do the same interface{} stuff we're doing above for the predicate to insert NULL?


pkg/sql/show_stats.go line 126 at r4 (raw file):

						  %s
    				  histogram,
              %s

If partialStatsVerActive is false, this will leave a trailing comma after histogram, which will throw a SQL error. So I think the query needs to be something like "avgSize", %s histogram %s FROM and then fullStatisticIDCol needs to be something like ,"fullStatisticID".

(Also indentation is off here 😛)


pkg/sql/catalog/systemschema/system.go line 268 at r3 (raw file):

Previously, faizaanmadhani (Faizaan Madhani) wrote…

I still think it would be better to keep the columns as NULL. It's not very complicated to translate between a tree.DNull datum and the value that we need and we do it consistently for other nullable fields (notably histograms -- I think histogram_id is NULL for multi-column statistics but an id for single column statistics, even though no histogram_id would ever be 0). Additionally, from the users perspective, it makes more sense that the column is NULL rather than 0 because 0 may indicate it references to a full statistic even though no statisticID can be 0.

👍 Good points! Never mind.


pkg/sql/execinfrapb/processors_table_stats.proto line 61 at r4 (raw file):

  // statistics. It is the statistic id of the full statistic that this partial
  // statistic was derived from.
  optional uint64 full_statistics_id = 8 [(gogoproto.customname) = "FullStatisticsID", (gogoproto.nullable) = false];

nit: I swear this is the last one, could we also make this one singular to match? :sorry_face:


pkg/sql/stats/stats_cache.go line 769 at r4 (raw file):

	"avgSize",
  %s
  histogram,

Same problem here: I think it needs to be "avgSize", %s histogram %s FROM without the comma after histogram, and the comma will need to be added to fullStatisticIDCol.


pkg/sql/stats/table_statistic.proto line 65 at r3 (raw file):

Previously, faizaanmadhani (Faizaan Madhani) wrote…

So, we can't make it true because doing so throws this error when running ./dev gen protobuf.

ERROR: field TableStatisticProto.FullStatisticsId is a native type and in proto3 syntax with nullable=false there exists conflicting implementations when encoding zero values--gogoroach_out: protoc-gen-gogoroach: Plugin failed with status code 1.
2022/12/16 15:38:53 error running protoc: exit status 1

However, I removed them and I think this is just true by default because it didn't change the resulting generated go file.

(You might hate me for this) nit: could we also make this one singular? (i.e. "full_statistic_id" and "FullStatisticID" instead of "full_statistics_id" and "FullStatisticsID"). :ducks_flying_shoe:

@faizaanmadhani faizaanmadhani force-pushed the faizaan/partial-stat-key branch 3 times, most recently from ba07f2f to f204793 Compare December 16, 2022 22:35
This commit adds a column to system.table_statistics used by
partial statistics to store an id that references the full
statistic that the partial statistics was derived from.
For full statistics, the value in this column will be NULL.

This commit updates a migration/upgrade that just included
the partialPredicate column.

Epic: CRDB-19449

Release note (sql change): system.table_statistics now
contains a column called fullStatisticsID to store an
id referencing the full table statistic the partial
statistic was derived from.
@faizaanmadhani faizaanmadhani force-pushed the faizaan/partial-stat-key branch from f204793 to aab61b6 Compare December 16, 2022 22:39
Copy link
Copy Markdown
Contributor Author

@faizaanmadhani faizaanmadhani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @michae2, and @rytaft)


pkg/sql/alter_table.go line 1364 at r4 (raw file):

Previously, michae2 (Michael Erickson) wrote…

Will this sometimes be 0? Do we need to do the same interface{} stuff we're doing above for the predicate to insert NULL?

Yup! Done.


pkg/sql/show_stats.go line 126 at r4 (raw file):

Previously, michae2 (Michael Erickson) wrote…

If partialStatsVerActive is false, this will leave a trailing comma after histogram, which will throw a SQL error. So I think the query needs to be something like "avgSize", %s histogram %s FROM and then fullStatisticIDCol needs to be something like ,"fullStatisticID".

(Also indentation is off here 😛)

Done.


pkg/sql/stats/stats_cache.go line 769 at r4 (raw file):

Previously, michae2 (Michael Erickson) wrote…

Same problem here: I think it needs to be "avgSize", %s histogram %s FROM without the comma after histogram, and the comma will need to be added to fullStatisticIDCol.

Done.


pkg/sql/stats/table_statistic.proto line 65 at r3 (raw file):

Previously, michae2 (Michael Erickson) wrote…

(You might hate me for this) nit: could we also make this one singular? (i.e. "full_statistic_id" and "FullStatisticID" instead of "full_statistics_id" and "FullStatisticsID"). :ducks_flying_shoe:

Done.

Copy link
Copy Markdown
Collaborator

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Nice! Thank you!

Reviewed 2 of 13 files at r4, 9 of 9 files at r5, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)

@faizaanmadhani
Copy link
Copy Markdown
Contributor Author

TFTRs!! 🎉

bors r=michae2

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Dec 16, 2022

Build failed (retrying...):

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Dec 17, 2022

Build failed (retrying...):

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Dec 17, 2022

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants