sql: Fix panic when transitioning from REGIONAL BY ROW#61889
sql: Fix panic when transitioning from REGIONAL BY ROW#61889craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
41649e7 to
06d7e89
Compare
|
Thank you for updating your pull request. My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
180d58b to
51c56e3
Compare
otan
left a comment
There was a problem hiding this comment.
Reviewed 5 of 5 files at r1.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @ajwerner, and @arulajmani)
pkg/ccl/multiregionccl/regional_by_row_test.go, line 646 at r1 (raw file):
res, err = sqlDB.Query(`WITH jobs AS ( SELECT status, crdb_internal.pb_to_json(
feels like this SQL query can be function-ised with parameter "status", or made a const
pkg/ccl/multiregionccl/regional_by_row_test.go, line 675 at r1 (raw file):
// Wait. time.Sleep(time.Second * 2)
this time.Sleep seems potentially racy, esp under testrace. two possible ways around this, either:
- use / add a test knob to the GC job to signal that it is complete using a channel, i.e.
done := make(struct{})
... SeverParams.Knob... = {
GCCompleteKnob: func() {
done <- struct{}
}
}
....
<-done
// Validate indexes are cleaned up.
// ....
- use testutils.SucceedsSoon` for the rest below.
pkg/sql/region_util.go, line 469 at r1 (raw file):
for _, indexID := range indexIDs { for _, region := range regionConfig.Regions { zoneConfig.DeleteSubzone(uint32(indexID), string(region.Name))
urgh, DeleteSubzone is O(n) so this is O(n*regions)
guess it's not important to fix now.
pkg/sql/region_util.go, line 591 at r1 (raw file):
newZoneConfigIsEmpty := newZoneConfig.Equal(zonepb.NewZoneConfig()) currentZoneConfigIsEmpty := currentZoneConfig.Equal(zonepb.NewZoneConfig()) reWriteZoneConfig := !newZoneConfigIsEmpty
nit: style rewrite as a single word, i.e.rewriteZoneConfig
|
also to catch this should we consider setting a lower gc ttl for MR alter_table_locality tests? |
51c56e3 to
cb05804
Compare
ajstorm
left a comment
There was a problem hiding this comment.
Yes, I think we need to assess gc ttl behavior more holistically for MR test cases. I've added it to the test plan.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @ajwerner, @arulajmani, and @otan)
pkg/ccl/multiregionccl/regional_by_row_test.go, line 646 at r1 (raw file):
Previously, otan (Oliver Tan) wrote…
feels like this SQL query can be function-ised with parameter "status", or made a const
Done.
pkg/ccl/multiregionccl/regional_by_row_test.go, line 675 at r1 (raw file):
Previously, otan (Oliver Tan) wrote…
this
time.Sleepseems potentially racy, esp under testrace. two possible ways around this, either:
- use / add a test knob to the GC job to signal that it is complete using a channel, i.e.
done := make(struct{}) ... SeverParams.Knob... = { GCCompleteKnob: func() { done <- struct{} } } .... <-done // Validate indexes are cleaned up. // ....
- use testutils.SucceedsSoon` for the rest below.
Thanks for the pointer to testutils.SucceedsSoon. Very helpful.
pkg/sql/region_util.go, line 591 at r1 (raw file):
Previously, otan (Oliver Tan) wrote…
nit: style rewrite as a single word, i.e.
rewriteZoneConfig
Fixed.
|
Hey @ajwerner. Any chance you could have a look at this sometime today? This issue is a release blocker. |
ajwerner
left a comment
There was a problem hiding this comment.
On it.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @arulajmani, and @otan)
cb05804 to
98b2165
Compare
ajwerner
left a comment
There was a problem hiding this comment.
Regarding the more general problem, it feels to me like the right thing to do is to always remove the index's zone configs in the same transaction that writes the index GC job. You already cannot update the zone configs for an index which has been dropped so that wouldn't be a regression. That alone should fix the bug, right? In short, why don't we just always remove the zone config. We'll need to leave the code in the gcjob for gcjobs created in the older version for one release.
Reviewed 1 of 2 files at r2.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @ajstorm, @ajwerner, @arulajmani, and @otan)
ajstorm
left a comment
There was a problem hiding this comment.
TFTR @ajwerner. I agree, and was thinking the same thing about the more generalized fix. Once I have the general repro, I'll open an issue.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @ajstorm, @ajwerner, @arulajmani, and @otan)
Before this commit, the transition from REGIONAL BY ROW to REGIONAL BY TABLE would drop the partitioning on the table not drop the zone configurations. As a result, when the index GC runs, it would try and cleanup the zone configurations. This hit an issue though, as the types are not properly hydrated in the index GC code path. The end result was that the index GC would panic and bring down the node. To work around this problem this commit adds code to remove the zone configs when dropping the table partitioning. Note: This commit does not solve the root cause of this issue, but instead, prevent the failure from occurring when transitioning from REGIONAL BY ROW tables to REGIONAL BY TABLE tables. A new issue will be opened for the general problem. Release note: None
98b2165 to
70198b3
Compare
|
Merging with priority, as this is the last remaining release blocker for Beta 1. bors r+ p=9999 |
|
Should've done single mode as well if you want priority heheh |
|
Build succeeded: |
62008: sql: Change to using a session variable instead of FORCE syntax for overriding zone configuration settings for MR databases and tables r=otan a=ajstorm sql: Block zone config updates to multi-region tables Block users from updating the zone configurations of multi-region tables, without first having them set the session variable `override_multi_region_zone_config` to true. We block updates to multi-region table/partition/index zone configurations because we don't want users to accidentally override the prescribed settings, leaving them open to sub-optimal performance. Note that only the multi-region fields of the zone configuration are blocked behind this variable. All other fields (gc.ttlseconds, range_min/max_bytes, etc) can be updated without overriding. Release note (sql change): Block users from updating the zone configurations of multi-region tables. sql: Use session variable instead of FORCE to override zone configs With #61499 we introduced new syntax (FORCE) to override setting the zone configurations on multi-region databases. Upon further reflection, it was decided that a session variable would be better suited to the task. This commit pulls out the FORCE syntax and replaces it with the use of override_multi_region_zone_config; Release note (sql change): Revert the release notes on #61499. We now use a session variable (override_multi_region_zone_config) to override the zone configuration on multi-region databases (and tables). Resolves: #57668. Note to reviewers: Please ignore the first commit, as it's being separately reviewed as part of #61889. That commit was pulle out into a separate PR as it's a release blocker, and there was a need to get it in urgently. Co-authored-by: Adam Storm <storm@cockroachlabs.com>
Before this commit, the transition from REGIONAL BY ROW to REGIONAL
BY TABLE would drop the partitioning on the table not drop the
zone configurations. As a result, when the index GC runs, it would
try and cleanup the zone configurations. This hit an issue though,
as the types are not properly hydrated in the index GC code path.
The end result was that the index GC would panic and bring down the
node. To work around this problem this commit adds code to remove
the zone configs when dropping the table partitioning.
Note: This commit does not solve the root cause of this issue, but
instead, prevent the failure from occurring when transitioning from
REGIONAL BY ROW tables to REGIONAL BY TABLE tables. A new issue
will be opened for the general problem.
Release note: None
Resolves #61751 .