Skip to content

serverutils: nil check system config cache#89419

Closed
ajwerner wants to merge 1 commit intocockroachdb:masterfrom
ajwerner:ajwerner/deflake-testcluster-purge-cache
Closed

serverutils: nil check system config cache#89419
ajwerner wants to merge 1 commit intocockroachdb:masterfrom
ajwerner:ajwerner/deflake-testcluster-purge-cache

Conversation

@ajwerner
Copy link
Copy Markdown
Contributor

@ajwerner ajwerner commented Oct 5, 2022

See here

Release note: None

@ajwerner ajwerner requested a review from msirek October 5, 2022 17:44
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Contributor

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @msirek)


pkg/testutils/serverutils/test_cluster_shim.go line 267 at r1 (raw file):

		sysconfigProvider := cluster.Server(i).SystemConfigProvider()
		if sysconfig := sysconfigProvider.GetSystemConfig(); sysconfig != nil {
			sysconfig.PurgeZoneConfigCache()

If sysconfig could be nil, would you mind also fixing this location? :

func (t *logicTest) purgeZoneConfig() {
for i := 0; i < t.cluster.NumServers(); i++ {
sysconfigProvider := t.cluster.Server(i).SystemConfigProvider()
sysconfig := sysconfigProvider.GetSystemConfig()
sysconfig.PurgeZoneConfigCache()
}
}

@ajwerner
Copy link
Copy Markdown
Contributor Author

ajwerner commented Oct 5, 2022

Looking more closely, I don't think I understand #87391. It clears a cache but the cache's source data should be immutable. Every time the system config values change, we construct a new SystemConfig object.

Copy link
Copy Markdown
Contributor

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about the details of how the zone config gets propagated, but I've seen the SystemConfig get created and be available, but at the time getZoneEntry is called for a database (as the referenced parent of a table), the zone config is not available yet, so calling thezoneConfigHook causes a nil ZoneConfig to be returned:

if errors.Is(err, errNoZoneConfigApplies) {
return nil, nil, true, nil
} else if err != nil {

We cache the nil ZoneConfig, so even if the config becomes available later, we won't load it because the nil is cached. Clearing out the cache causes future calls togetZoneEntry to look it up again and find it. I saw this comment in tests, not sure if it's related:
# Set the closed timestamp interval to be short to shorten the amount of time
# we need to wait for the system config to propagate.
statement ok
SET CLUSTER SETTING kv.closed_timestamp.side_transport_interval = '10ms';
statement ok
SET CLUSTER SETTING kv.closed_timestamp.target_duration = '10ms';

Is it problematic for zone configs to not be immediately available if a table which depends on one is used? I don't know if there's a bigger issue here.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @msirek)

@ajwerner
Copy link
Copy Markdown
Contributor Author

The part I don't get is that when the changes propagate, the object is replaced wholesale -- cache included. Do you know of a test which fails without the cache clearing change?

@msirek
Copy link
Copy Markdown
Contributor

msirek commented Oct 27, 2022

Do you know of a test which fails without the cache clearing change?

Yes, regional_by_row_query_behavior fails under --stress unless the cache is cleared. #87391

@msirek
Copy link
Copy Markdown
Contributor

msirek commented Oct 27, 2022

Fixing this through #90802, including the PurgeZoneConfigCache call in pkg/sql/logictest/logic.go.

@ajwerner ajwerner closed this Jan 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants