multitenant: replace EXPERIMENTAL_RELOCATE StoreID to NodeID lookup#92709
multitenant: replace EXPERIMENTAL_RELOCATE StoreID to NodeID lookup#92709craig[bot] merged 1 commit intocockroachdb:masterfrom ecwall:multitenant_experimental_relocate
Conversation
knz
left a comment
There was a problem hiding this comment.
I like it. But let's see what Arul thinks.
Reviewed 8 of 8 files at r1, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @rafiss)
nvb
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @ecwall, and @rafiss)
pkg/ccl/kvccl/kvtenantccl/connector.go line 435 at r1 (raw file):
} // StoreIDToNodeID implements the serverpb.TenantStatusServer interface.
Did we consider mirroring the strategy that we use to proxy gossip subscriptions for NodeDescriptors through to the storage node? See Connector.updateNodeAddress and Connector.GetNodeDescriptor. We could do the same for StoreDescriptors by subscribing (gossipSubsHandlers) to gossip.MakePrefixPattern(gossip.KeyStoreDescPrefix).
ecwall
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @knz, @nvanbenschoten, and @rafiss)
pkg/ccl/kvccl/kvtenantccl/connector.go line 435 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Did we consider mirroring the strategy that we use to proxy gossip subscriptions for
NodeDescriptors through to the storage node? SeeConnector.updateNodeAddressandConnector.GetNodeDescriptor. We could do the same forStoreDescriptors by subscribing (gossipSubsHandlers) togossip.MakePrefixPattern(gossip.KeyStoreDescPrefix).
I'm not familiar with this part of the code so I didn't think of that originally, but I updated it based on your suggestion which has simplified it a lot.
arulajmani
left a comment
There was a problem hiding this comment.
Reviewed 1 of 8 files at r1, 2 of 10 files at r2, 5 of 5 files at r3, 1 of 1 files at r4.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @ecwall, @knz, @nvanbenschoten, and @rafiss)
pkg/ccl/kvccl/kvtenantccl/connector.go line 150 at r3 (raw file):
c.mu.nodeDescs = make(map[roachpb.NodeID]*roachpb.NodeDescriptor) c.mu.nodeDescsByStoreID = make(map[roachpb.StoreID]*roachpb.NodeDescriptor)
Did you consider making this a bit more generalizable and storing the entire StoreDescriptor here instead?
pkg/ccl/kvccl/kvtenantccl/connector.go line 324 at r3 (raw file):
} // TODO(nvanbenschoten): this doesn't handle StoreDescriptor removal from the
I haven't looked deeply into what that comment is saying, but to confirm, have we validated the same issue applies to store descriptors?
pkg/ccl/kvccl/kvtenantccl/connector.go line 317 at r4 (raw file):
// updateStoreAddress handles updates to "store" gossip keys, performing the // corresponding update to the Connector's cached NodeDescriptor set. func (c *Connector) updateStoreAddress(ctx context.Context, key string, content roachpb.Value) {
Can we add a test to make sure this map is populated and we receive updates to StoreDescriptors?
pkg/kv/kvclient/kvcoord/node_store.go line 22 at r3 (raw file):
// It returns an error if the node is not known by the store. GetNodeDescriptor(roachpb.NodeID) (*roachpb.NodeDescriptor, error) // GetNodeDescriptorByStoreID looks up the descriptor of the node by store ID.
Similar to my comment elsewhere, should we instead make this more generalizable and have it return a StoreDescriptor instead?
pkg/sql/relocate_range.go line 58 at r3 (raw file):
} // todo(ewall): replace with NodeDescStore.GetNodeDescriptorByStoreID.
nit: "TODO(ewall)"
ecwall
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @knz, @nvanbenschoten, and @rafiss)
pkg/ccl/kvccl/kvtenantccl/connector.go line 150 at r3 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Did you consider making this a bit more generalizable and storing the entire
StoreDescriptorhere instead?
I think that is more complicated because the NodeDescriptor inside the StoreDescriptor needs to be kept in sync when a NodeDescriptor updates. This is also true for the impl I currently have so I am changing this to a map[roachpb.StoreID]roachpb.NodeID and doing a double lookup in order to use nodeDescs as a single source of truth.
If at some point in the future we need to cache both NodeDescriptors and StoreDescriptors, we might need to think of how to keep the caches in sync.
pkg/ccl/kvccl/kvtenantccl/connector.go line 324 at r3 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
I haven't looked deeply into what that comment is saying, but to confirm, have we validated the same issue applies to store descriptors?
Keys are only added to the map/cache, but never removed.
pkg/ccl/kvccl/kvtenantccl/connector.go line 317 at r4 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Can we add a test to make sure this map is populated and we receive updates to
StoreDescriptors?
I'm not sure how to test this independently (TestMultiTenantAdminFunction tests this along with every other layer). Is there an existing more specific test for updateNodeAddress?
pkg/sql/relocate_range.go line 58 at r3 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
nit: "TODO(ewall)"
Done.
arulajmani
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ecwall, @knz, @nvanbenschoten, and @rafiss)
pkg/ccl/kvccl/kvtenantccl/connector.go line 150 at r3 (raw file):
we might need to think of how to keep the caches in sync.
Is this possible? We Gossip store descriptors around here --
cockroach/pkg/kv/kvserver/store.go
Lines 2598 to 2605 in 830ab32
Tracing through that s.Descriptor call, it seems like this is set from the NodeDescriptor on the Node struct. My reading of this field is that it's immutable, so they should never be out of sync, by design.
There's benefit to storing the entire StoreDescriptor here. One that comes to mind is this special casing we do around zone config validation for secondary tenants. We'd no longer need to make a distinction here between the system tenant and secondary tenants if we had access to the entire StoreDescriptor.
cockroach/pkg/sql/set_zone_config.go
Lines 1003 to 1015 in 830ab32
pkg/ccl/kvccl/kvtenantccl/connector.go line 317 at r4 (raw file):
Previously, ecwall (Evan Wall) wrote…
I'm not sure how to test this independently (
TestMultiTenantAdminFunctiontests this along with every other layer). Is there an existing more specific test forupdateNodeAddress?
I was looking around and I came across NewStoreGossiper. Maybe we can make use of that to trigger gossip updates and make sure we're able to see them?
ecwall
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @knz, @nvanbenschoten, and @rafiss)
pkg/ccl/kvccl/kvtenantccl/connector.go line 317 at r4 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
I was looking around and I came across
NewStoreGossiper. Maybe we can make use of that to trigger gossip updates and make sure we're able to see them?
Added some assertions to TestConnectorGossipSubscription.
ecwall
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @knz, @nvanbenschoten, and @rafiss)
pkg/ccl/kvccl/kvtenantccl/connector.go line 150 at r3 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
we might need to think of how to keep the caches in sync.
Is this possible? We Gossip store descriptors around here --
cockroach/pkg/kv/kvserver/store.go
Lines 2598 to 2605 in 830ab32
Tracing through that
s.Descriptorcall, it seems like this is set from theNodeDescriptoron theNodestruct. My reading of this field is that it's immutable, so they should never be out of sync, by design.There's benefit to storing the entire
StoreDescriptorhere. One that comes to mind is this special casing we do around zone config validation for secondary tenants. We'd no longer need to make a distinction here between the system tenant and secondary tenants if we had access to the entireStoreDescriptor.cockroach/pkg/sql/set_zone_config.go
Lines 1003 to 1015 in 830ab32
Updated based on our meeting.
arulajmani
left a comment
There was a problem hiding this comment.
Thanks for making the change! I really like that this will help us add the same zone config validation for secondary tenants that only exists for the system tenant today.
This should be good to merge once we've reverted the rename of NodeDescStore to DescCache.
Reviewed 18 of 30 files at r9, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @ecwall, @knz, @nvanbenschoten, and @rafiss)
pkg/ccl/kvccl/kvtenantccl/connector.go line 317 at r4 (raw file):
Previously, ecwall (Evan Wall) wrote…
Added some assertions to
TestConnectorGossipSubscription.
Nice!
pkg/gossip/gossip.go line 897 at r9 (raw file):
// updateStoreAddress is a gossip callback which is used to update storeMap. func (g *Gossip) updateStoreAddress(key string, content roachpb.Value) {
I don't think we're updating any addresses here. Should we switch back to updateStoreMap instead?
pkg/kv/kvclient/kvcoord/node_store.go line 18 at r9 (raw file):
// // Implementations of the interface are expected to be threadsafe. type DescCache interface {
Can we revert this change? DescCache is too generalized given how overloaded the term is. I think it's fine if StoreDescriptors get lumped under NodeDescsStore.
pkg/kv/kvclient/kvtenant/connector.go line 50 at r9 (raw file):
// DescCache provides information on each of the KV nodes in the cluster // in the form of NodeDescriptors. This obviates the need for SQL-only
"in the form of NodeDescriptors and StoreDescriptors"
pkg/sql/exec_util.go line 1345 at r9 (raw file):
EventsExporter obs.EventsExporter // DescCache stores node descriptors in an in-memory cache.
"stores {Store,Node]Descriptors .. "
knz
left a comment
There was a problem hiding this comment.
Reviewed 1 of 5 files at r3, 28 of 30 files at r9, 2 of 2 files at r10, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @ecwall, @nvanbenschoten, and @rafiss)
pkg/sql/exec_util.go line 1345 at r9 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
"stores {Store,Node]Descriptors .. "
I think you meant a } here not ]
pkg/sql/relocate_range.go line 58 at r10 (raw file):
} // TODO(ewall): replace with DescCache.GetStoreDescriptor.
I think it is time to do so?
ecwall
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @knz, @nvanbenschoten, and @rafiss)
pkg/gossip/gossip.go line 897 at r9 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
I don't think we're updating any addresses here. Should we switch back to
updateStoreMapinstead?
There is no longer a storeMap field so I don't think there should be a updateStoreMap function.
The corresponding NodeDescriptor function is called updateNodeAddress, so I made it consistent with that.
pkg/kv/kvclient/kvcoord/node_store.go line 22 at r3 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Similar to my comment elsewhere, should we instead make this more generalizable and have it return a
StoreDescriptorinstead?
Replied in other thread.
pkg/kv/kvclient/kvcoord/node_store.go line 18 at r9 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Can we revert this change?
DescCacheis too generalized given how overloaded the term is. I think it's fine ifStoreDescriptorsget lumped underNodeDescsStore.
I think NodeDescsStore is confusing for 2 reasons
- It is now storing StoreDescriptors as much as it is storing NodeDescriptors. Why not call it StoreDescStore?
- "Store" is now overloaded to mean both StoreDescriptor and an in-memory data store.
pkg/kv/kvclient/kvtenant/connector.go line 50 at r9 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
"in the form of NodeDescriptors and StoreDescriptors"
Fixed.
pkg/sql/exec_util.go line 1345 at r9 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
"stores {Store,Node]Descriptors .. "
Fixed.
pkg/sql/relocate_range.go line 58 at r10 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
I think it is time to do so?
I have a separate PR for this.
arulajmani
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ecwall, @knz, @nvanbenschoten, and @rafiss)
pkg/gossip/gossip.go line 897 at r9 (raw file):
The corresponding NodeDescriptor function is called updateNodeAddress, so I made it consistent with that.
Well, that's because that function actually updates node addresses. There's no addresses to speak of when we're dealing with StoreDescriptors, unlike NodeDescriptors, so I'm not sure what being consistent with updateNodeAddresses gives us.
I buy that you're changing how the map is stored, but it's still a map at the end of the day.
pkg/kv/kvclient/kvcoord/node_store.go line 18 at r9 (raw file):
Previously, ecwall (Evan Wall) wrote…
I think
NodeDescsStoreis confusing for 2 reasons
- It is now storing StoreDescriptors as much as it is storing NodeDescriptors. Why not call it StoreDescStore?
- "Store" is now overloaded to mean both StoreDescriptor and an in-memory data store.
Sure, but it isn't obvious to me what a kvcoord.DescCache stores given how overloaded/generalizable those two words are. I'd wager other readers would have the same problem.
Can we call this something more specific, if you don't want to revert the change?
ecwall
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @knz, @nvanbenschoten, and @rafiss)
pkg/gossip/gossip.go line 897 at r9 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
The corresponding NodeDescriptor function is called updateNodeAddress, so I made it consistent with that.
Well, that's because that function actually updates node addresses. There's no addresses to speak of when we're dealing with
StoreDescriptors, unlikeNodeDescriptors, so I'm not sure what being consistent withupdateNodeAddressesgives us.I buy that you're changing how the map is stored, but it's still a map at the end of the day.
It looks like it updates the NodeDesc in general, not just the address. It just has some special logic around writing the address specifically.
pkg/kv/kvclient/kvcoord/node_store.go line 18 at r9 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Sure, but it isn't obvious to me what a
kvcoord.DescCachestores given how overloaded/generalizable those two words are. I'd wager other readers would have the same problem.Can we call this something more specific, if you don't want to revert the change?
I'm not sure what else to call it since I thought it would cache any Descriptors from KV so I'll just revert it.
arulajmani
left a comment
There was a problem hiding this comment.
Reviewed 30 of 30 files at r12, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @ecwall, @knz, @nvanbenschoten, and @rafiss)
pkg/kv/kvclient/kvcoord/node_store.go line 18 at r9 (raw file):
Previously, ecwall (Evan Wall) wrote…
I'm not sure what else to call it since I thought it would cache any Descriptors from KV so I'll just revert it.
Thanks for reverting. As an example, we would never expect this to cache things like RangeDescriptors, which is why the overloaded term brought about more questions than it answered.
Fixes #91628 Replace EXPERIMENTAL_RELOCATE's StoreID -> NodeID lookup's gossip impl with an in-memory cache. Release note: None
|
bors r=arulajmani |
|
Build succeeded: |
Fixes #91628
Replace EXPERIMENTAL_RELOCATE's StoreID -> NodeID lookup's gossip impl with an in-memory cache.
Release note: None