base,server,sql: merge node and SQL instance ID containers by knz · Pull Request #73156 · cockroachdb/cockroach

knz · 2021-11-25T13:39:24Z

Informs #58938.

Based off a comment by Radu here.

This commit ensures that the same data structure can store both SQL
instance IDs and Node IDs. The decision is made upon instantiation of
the container which of the two types of IDs the container stores.

This simplifies the code and paves the road to using a single
container in servers for the purpose of identifying the server
instance in traces and logs.

This also revisits the change in #72607 by switching the
tracing/logging prefix sqli to nsql for SQL instance servers.

Release note: None

cockroach-teamcity · 2021-11-25T13:39:30Z

This change is

73157: sidetransport: fix a test r=erikgrinaker a=knz Found via a lint failure in #73156. A `NodeIDContainer` is not an integer. Use `%s` to format it. Release note: None Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>

RaduBerinde

It would be useful to mark (with TODOs) the parts of the API that are meant to be temporary (and not part of the desired end-state).

CC @tbg too

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @rimadeodhar)

pkg/base/node_id.go, line 26 at r1 (raw file):

)

// NodeIDContainer is used to share a single roachpb.NodeID instance between

[nit] NodeID or SQLInstanceID

pkg/base/node_id.go, line 41 at r1 (raw file):

	str atomic.Value

	// sqlInstance is set to true when the node is a SQL instance.

[nit] I'd move this next to nodeID to maintain the size of the struct.

Btw, don't we want to be able to set both atomically? Maybe we should just switch to using negative values for one kind.

pkg/base/node_id.go, line 65 at r1 (raw file):

//
// Note that if the container is configured to contain SQL instance IDs, Get still
// works: this ensures that the components such as the RPC Context which

The desired end-state is to never convert one type of ID to the other implicitly, correct? If this is temporary, I think there should be a (preferred) GetNodeID() which panics if sqlInstance is true and ConvertToNodeDeprecated() version which doesn't. And there should be a GetSQLInstanceID() which panics if sqlInstance is false.

pkg/base/node_id.go, line 86 at r1 (raw file):

			log.Infof(ctx, "ID set to %d", val)
		}
	} else if n.sqlInstance != sqlInstance {

This is racy, no? If two goroutines do setInternal(ctx, val, true) in parallel with the same val, that should be allowed I think? One might see the nodeID value before the bool flag is set. Also, accessing the flag without atomics might upset the race detector.

pkg/base/node_id.go, line 193 at r1 (raw file):

// NewSQLIDContainerForNode sets up a SQLIDContainer which serves the underlying
// NodeID as the SQL instance ID.
func NewSQLIDContainerForNode(nodeID *NodeIDContainer) *SQLIDContainer {

I don't get this API. We are interpreting a NodeIDContainer as a sql ID, but we expressly disallow it from storing a sqlInstance. Feels like it should be the opposite. I (as a user of this API) shouldn't be able to ever legally hold a SQLIDContainer that has a node ID. And I shouldn't be able to alias it to a NodeIDContainer that I can later set to a node ID (and not a sql instance ID).

pkg/base/node_id.go, line 204 at r1 (raw file):

// container will report the SQL instance ID value as NodeID via
// its Get() method.
func (n *NodeIDContainer) SwitchToSQLIDContainer() *SQLIDContainer {

It looks like this is only used on zero values for the receiver, can't we just make it a constructor?

knz

mark (with TODOs) the parts of the API that are meant to be temporary (and not part of the desired end-state).

I don't see anything temporary here.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @rimadeodhar)

pkg/base/node_id.go, line 41 at r1 (raw file):

I'd move this next to nodeID to maintain the size of the struct.

Done.

don't we want to be able to set both atomically?

No. The determination of whether the container will be for a node ID or SQL instance ID is done just once, in a non-concurrent context.

The setting of the ID is done later, in a concurrent context.

pkg/base/node_id.go, line 65 at r1 (raw file):

The desired end-state is to never convert one type of ID to the other implicitly, correct

That's incorrect - this is already the case, not a "designed end-state". We never convert.

What is perhaps unclear (and I'm adding one more sentence to the docstring for it) is that the distinction between a node ID and a SQL instance ID does not matter for certain parts of the code (specifically - everything that has to do with server-to-server RPCs) and we are OK using the type "node ID" even when the value is a SQL instance ID. It would be silly to duplicate that code.

pkg/base/node_id.go, line 86 at r1 (raw file):
It's not racy because the value of n.sqlInstance is not written concurrently with the read accesses.

One might see the nodeID value before the bool flag is set

No. The determination of .sqlInstance is done before the IDs get populated.

(I could add an additional bool to the struct to prove this fact with an assertion, if that would assuage you? But don't you trust the race detector, which is not failing here?)

pkg/base/node_id.go, line 193 at r1 (raw file):

We are interpreting a NodeIDContainer as a sql ID, but we expressly disallow it from storing a sqlInstance

You're misreading the assertion: it's there to ensure we're not redundantly call NewSQLIDContainerForNode, then convert the resulting *SQLIDContainer to *NodeIDContainer, then call NewSQLIDContainerForNode again. This is an API sequencing assertion.

I (as a user of this API) shouldn't be able to ever legally hold a SQLIDContainer that has a node ID.

That sounds right, why do you think you would be able to?

I shouldn't be able to alias it to a NodeIDContainer that I can later set to a node ID (and not a sql instance ID).

That's already enforced, by the check n.sqlInstance == sqlInstance in Set().

It's perhaps hard to understand in the context of this PR, but it might be clarified in the next PR that uses it: #73159

pkg/base/node_id.go, line 204 at r1 (raw file):

It looks like this is only used on zero values for the receiver

correct.

can't we just make it a constructor?

No: the NodeIDContainer is already instantiated and already shared by reference in multiple places by the time this method is called.
(Believe me, I tried: #72912 - it's a bottomless pit of yak shaving).

Instead I found this API is simpler and used advantageously here: #73159.

I appreciate that you may subjectively thing "this is code stink, I wish it was different". I would agree. But I am looking for an incremental improvement here, not a multi-week project. Unless you can prove me this can be simplified by a demonstration that costs both you and I less than an hour of work, I suggest we don't discuss this at this time and perhaps file a followup issue instead.

RaduBerinde · 2021-11-29T13:05:59Z

pkg/base/node_id.go, line 65 at r1 (raw file):

Previously, knz (kena) wrote…

The desired end-state is to never convert one type of ID to the other implicitly, correct

That's incorrect - this is already the case, not a "designed end-state". We never convert.

What is perhaps unclear (and I'm adding one more sentence to the docstring for it) is that the distinction between a node ID and a SQL instance ID does not matter for certain parts of the code (specifically - everything that has to do with server-to-server RPCs) and we are OK using the type "node ID" even when the value is a SQL instance ID. It would be silly to duplicate that code.

Then this is the part I strongly disagree with. That allows anybody to easily add code that uses the NodeID as a.. node ID and not a "whatever" ID. We are converting between different go types here. There should be a separate type for the code that doesn't care (eg ServerID) from which it wouldn't be legal to get a node or sql instance ID. There wouldn't be duplication, though we'd need to change the type used in various parts of the code.

That all doesn't need to be done now, but we need to first agree on the "vision" for the desired end state, and then we need to mark code that isn't "legal" in that end state as temporary. My strong opinion is that the desired end state is where 1) there is a single piece of "setup" code (which isn't part of the infrastructure like id containers) which knows that for the system tenant the SQL instance ID is equal to the node ID and 2) it is never possible to write code (that doesn't involve explicit casts) which can interpret one type of ID as the other.

RaduBerinde · 2021-11-29T13:12:38Z

pkg/base/node_id.go, line 193 at r1 (raw file):

Previously, knz (kena) wrote…

We are interpreting a NodeIDContainer as a sql ID, but we expressly disallow it from storing a sqlInstance

You're misreading the assertion: it's there to ensure we're not redundantly call NewSQLIDContainerForNode, then convert the resulting *SQLIDContainer to *NodeIDContainer, then call NewSQLIDContainerForNode again. This is an API sequencing assertion.

I (as a user of this API) shouldn't be able to ever legally hold a SQLIDContainer that has a node ID.

That sounds right, why do you think you would be able to?

I shouldn't be able to alias it to a NodeIDContainer that I can later set to a node ID (and not a sql instance ID).

That's already enforced, by the check n.sqlInstance == sqlInstance in Set().

It's perhaps hard to understand in the context of this PR, but it might be clarified in the next PR that uses it: #73159

I could create a NodeIDContainer, populate it with a node ID, then call this method. Now I have a SQLIDContainer which holds a node ID (or am I missing something?). Maybe we could assert that nodeID contains the zero value?

RaduBerinde · 2021-11-29T13:46:24Z

pkg/base/node_id.go, line 86 at r1 (raw file):

Previously, knz (kena) wrote…

It's not racy because the value of n.sqlInstance is not written concurrently with the read accesses.

One might see the nodeID value before the bool flag is set

No. The determination of .sqlInstance is done before the IDs get populated.

(I could add an additional bool to the struct to prove this fact with an assertion, if that would assuage you? But don't you trust the race detector, which is not failing here?)

I see, I will look again. Do you think it would make things more clear to change the bool to a tri-state (uninitialized, node, sql instance) and only allow state transitions from uninitialized?

rimadeodhar

Reviewed 6 of 8 files at r1, 2 of 2 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @RaduBerinde)

pkg/base/node_id.go, line 65 at r1 (raw file):

Previously, RaduBerinde wrote…

Then this is the part I strongly disagree with. That allows anybody to easily add code that uses the NodeID as a.. node ID and not a "whatever" ID. We are converting between different go types here. There should be a separate type for the code that doesn't care (eg ServerID) from which it wouldn't be legal to get a node or sql instance ID. There wouldn't be duplication, though we'd need to change the type used in various parts of the code.

That all doesn't need to be done now, but we need to first agree on the "vision" for the desired end state, and then we need to mark code that isn't "legal" in that end state as temporary. My strong opinion is that the desired end state is where 1) there is a single piece of "setup" code (which isn't part of the infrastructure like id containers) which knows that for the system tenant the SQL instance ID is equal to the node ID and 2) it is never possible to write code (that doesn't involve explicit casts) which can interpret one type of ID as the other.

+1 to Radu's comment. I think it is important to keep the semantic differences between a node and a SQL instance ID very obvious with the unified API. It would be nice to have something along the lines of GetNodeID() to explicitly fetch node id which will throw if the container is a sql id container and vice versa. And for code that doesn't care either which way, we can have a GetID or something similar to make it very obvious that the caller for this method should be agnostic to the container type.

pkg/base/node_id.go, line 30 at r2 (raw file):

// and getting the value. Once a value is set, the value cannot
// change.
type NodeIDContainer struct {

Nit: It will be nice to rename it at some point to IDContainer instead of NodeIDContainer since we will be using it to store either SQL instance ID or node ID. Also, nodeID below can be renamed to just id.

knz

All right in light of the new RFC in #73309, let's accept that this PR is an intermediate state of affairs. I have added references to the RFC as comments in the code that needs to be changed.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @rimadeodhar)

pkg/base/node_id.go, line 65 at r1 (raw file):

Previously, rimadeodhar (Rima Deodhar) wrote…

+1 to Radu's comment. I think it is important to keep the semantic differences between a node and a SQL instance ID very obvious with the unified API. It would be nice to have something along the lines of GetNodeID() to explicitly fetch node id which will throw if the container is a sql id container and vice versa. And for code that doesn't care either which way, we can have a GetID or something similar to make it very obvious that the caller for this method should be agnostic to the container type.

Yes we agreed on this offline. Adding a reference to #73309 .

pkg/base/node_id.go, line 86 at r1 (raw file):

Previously, RaduBerinde wrote…

I see, I will look again. Do you think it would make things more clear to change the bool to a tri-state (uninitialized, node, sql instance) and only allow state transitions from uninitialized?

I could do this, but I'd rather use the encoding you proposed in #73309. Let's do so in a separate PR.

pkg/base/node_id.go, line 193 at r1 (raw file):

Previously, RaduBerinde wrote…

I could create a NodeIDContainer, populate it with a node ID, then call this method. Now I have a SQLIDContainer which holds a node ID (or am I missing something?). Maybe we could assert that nodeID contains the zero value?

Good idea. Done.

pkg/base/node_id.go, line 30 at r2 (raw file):

Previously, rimadeodhar (Rima Deodhar) wrote…

Nit: It will be nice to rename it at some point to IDContainer instead of NodeIDContainer since we will be using it to store either SQL instance ID or node ID. Also, nodeID below can be renamed to just id.

Added a reference to #73309 . We'll do this renaming in htat context.

knz

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @rimadeodhar)

pkg/base/node_id.go, line 193 at r1 (raw file):

Previously, knz (kena) wrote…

Good idea. Done.

Turns out we can't do that - we have multiple tests that manually inject a non-zero value before server initialization.

This commit ensures that the same data structure can store both SQL instance IDs and Node IDs. The decision is made upon instantiation of the container which of the two types of IDs the container stores. This simplifies the code and paves the road to using a single container in servers for the purpose of identifying the server instance in traces and logs. This also revisits the change in cockroachdb#72607 by switching the tracing/logging prefix `sqli` to `nsql` for SQL instance servers. Release note: None

knz requested review from RaduBerinde and rimadeodhar November 25, 2021 13:39

knz requested a review from a team as a code owner November 25, 2021 13:39

knz requested a review from a team November 25, 2021 13:39

knz force-pushed the 20211125-instance-node-id branch from ee9cb8a to d44fa97 Compare November 25, 2021 13:43

knz mentioned this pull request Nov 25, 2021

sidetransport: fix a test #73157

Merged

knz force-pushed the 20211125-instance-node-id branch from d44fa97 to 8dd1d82 Compare November 25, 2021 14:28

This was referenced Nov 25, 2021

catalog/lease: connect test ambient context to node ID #73158

Merged

server: use a shared ID container + guarantee log tag #73159

Merged

knz force-pushed the 20211125-instance-node-id branch from 8dd1d82 to d828b01 Compare November 25, 2021 14:48

postamar removed the request for review from a team November 25, 2021 14:50

knz force-pushed the 20211125-instance-node-id branch from d828b01 to daefceb Compare November 25, 2021 15:30

RaduBerinde reviewed Nov 27, 2021

View reviewed changes

knz force-pushed the 20211125-instance-node-id branch from daefceb to 1706db5 Compare November 29, 2021 11:13

knz commented Nov 29, 2021

View reviewed changes

rimadeodhar reviewed Nov 29, 2021

View reviewed changes

knz commented Dec 8, 2021

View reviewed changes

knz force-pushed the 20211125-instance-node-id branch from 1706db5 to fa43497 Compare December 8, 2021 12:52

knz commented Dec 8, 2021

View reviewed changes

knz force-pushed the 20211125-instance-node-id branch from fa43497 to 6d6a241 Compare December 8, 2021 13:12

craig bot merged commit 6d6a241 into cockroachdb:master Dec 8, 2021

knz deleted the 20211125-instance-node-id branch December 8, 2021 16:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

base,server,sql: merge node and SQL instance ID containers#73156

base,server,sql: merge node and SQL instance ID containers#73156
craig[bot] merged 1 commit intocockroachdb:masterfrom
knz:20211125-instance-node-id

knz commented Nov 25, 2021 •

edited

Loading

Uh oh!

cockroach-teamcity commented Nov 25, 2021

Uh oh!

RaduBerinde left a comment

Uh oh!

knz left a comment

Uh oh!

RaduBerinde commented Nov 29, 2021

Uh oh!

RaduBerinde commented Nov 29, 2021

Uh oh!

RaduBerinde commented Nov 29, 2021

Uh oh!

rimadeodhar left a comment

Uh oh!

knz left a comment

Uh oh!

knz left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

knz commented Nov 25, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cockroach-teamcity commented Nov 25, 2021

Uh oh!

RaduBerinde left a comment

Choose a reason for hiding this comment

Uh oh!

knz left a comment

Choose a reason for hiding this comment

Uh oh!

RaduBerinde commented Nov 29, 2021

Uh oh!

RaduBerinde commented Nov 29, 2021

Uh oh!

RaduBerinde commented Nov 29, 2021

Uh oh!

rimadeodhar left a comment

Choose a reason for hiding this comment

Uh oh!

knz left a comment

Choose a reason for hiding this comment

Uh oh!

knz left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

knz commented Nov 25, 2021 •

edited

Loading