Skip to content

base,server,sql: merge node and SQL instance ID containers#73156

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
knz:20211125-instance-node-id
Dec 8, 2021
Merged

base,server,sql: merge node and SQL instance ID containers#73156
craig[bot] merged 1 commit intocockroachdb:masterfrom
knz:20211125-instance-node-id

Conversation

@knz
Copy link
Copy Markdown
Contributor

@knz knz commented Nov 25, 2021

Informs #58938.

Based off a comment by Radu here.

This commit ensures that the same data structure can store both SQL
instance IDs and Node IDs. The decision is made upon instantiation of
the container which of the two types of IDs the container stores.

This simplifies the code and paves the road to using a single
container in servers for the purpose of identifying the server
instance in traces and logs.

This also revisits the change in #72607 by switching the
tracing/logging prefix sqli to nsql for SQL instance servers.

Release note: None

@knz knz requested a review from a team as a code owner November 25, 2021 13:39
@knz knz requested a review from a team November 25, 2021 13:39
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@knz knz force-pushed the 20211125-instance-node-id branch from ee9cb8a to d44fa97 Compare November 25, 2021 13:43
@knz knz mentioned this pull request Nov 25, 2021
@knz knz force-pushed the 20211125-instance-node-id branch from d44fa97 to 8dd1d82 Compare November 25, 2021 14:28
@knz knz force-pushed the 20211125-instance-node-id branch from 8dd1d82 to d828b01 Compare November 25, 2021 14:48
@postamar postamar removed the request for review from a team November 25, 2021 14:50
craig bot pushed a commit that referenced this pull request Nov 25, 2021
73157: sidetransport: fix a test r=erikgrinaker a=knz

Found via a lint failure in #73156.

A `NodeIDContainer` is not an integer. Use `%s` to format it.

Release note: None

Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
@knz knz force-pushed the 20211125-instance-node-id branch from d828b01 to daefceb Compare November 25, 2021 15:30
Copy link
Copy Markdown
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be useful to mark (with TODOs) the parts of the API that are meant to be temporary (and not part of the desired end-state).

CC @tbg too

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz and @rimadeodhar)


pkg/base/node_id.go, line 26 at r1 (raw file):

)

// NodeIDContainer is used to share a single roachpb.NodeID instance between

[nit] NodeID or SQLInstanceID


pkg/base/node_id.go, line 41 at r1 (raw file):

	str atomic.Value

	// sqlInstance is set to true when the node is a SQL instance.

[nit] I'd move this next to nodeID to maintain the size of the struct.

Btw, don't we want to be able to set both atomically? Maybe we should just switch to using negative values for one kind.


pkg/base/node_id.go, line 65 at r1 (raw file):

//
// Note that if the container is configured to contain SQL instance IDs, Get still
// works: this ensures that the components such as the RPC Context which

The desired end-state is to never convert one type of ID to the other implicitly, correct? If this is temporary, I think there should be a (preferred) GetNodeID() which panics if sqlInstance is true and ConvertToNodeDeprecated() version which doesn't. And there should be a GetSQLInstanceID() which panics if sqlInstance is false.


pkg/base/node_id.go, line 86 at r1 (raw file):

			log.Infof(ctx, "ID set to %d", val)
		}
	} else if n.sqlInstance != sqlInstance {

This is racy, no? If two goroutines do setInternal(ctx, val, true) in parallel with the same val, that should be allowed I think? One might see the nodeID value before the bool flag is set. Also, accessing the flag without atomics might upset the race detector.


pkg/base/node_id.go, line 193 at r1 (raw file):

// NewSQLIDContainerForNode sets up a SQLIDContainer which serves the underlying
// NodeID as the SQL instance ID.
func NewSQLIDContainerForNode(nodeID *NodeIDContainer) *SQLIDContainer {

I don't get this API. We are interpreting a NodeIDContainer as a sql ID, but we expressly disallow it from storing a sqlInstance. Feels like it should be the opposite. I (as a user of this API) shouldn't be able to ever legally hold a SQLIDContainer that has a node ID. And I shouldn't be able to alias it to a NodeIDContainer that I can later set to a node ID (and not a sql instance ID).


pkg/base/node_id.go, line 204 at r1 (raw file):

// container will report the SQL instance ID value as NodeID via
// its Get() method.
func (n *NodeIDContainer) SwitchToSQLIDContainer() *SQLIDContainer {

It looks like this is only used on zero values for the receiver, can't we just make it a constructor?

@knz knz force-pushed the 20211125-instance-node-id branch from daefceb to 1706db5 Compare November 29, 2021 11:13
Copy link
Copy Markdown
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mark (with TODOs) the parts of the API that are meant to be temporary (and not part of the desired end-state).

I don't see anything temporary here.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @rimadeodhar)


pkg/base/node_id.go, line 41 at r1 (raw file):

I'd move this next to nodeID to maintain the size of the struct.

Done.

don't we want to be able to set both atomically?

No. The determination of whether the container will be for a node ID or SQL instance ID is done just once, in a non-concurrent context.

The setting of the ID is done later, in a concurrent context.


pkg/base/node_id.go, line 65 at r1 (raw file):

The desired end-state is to never convert one type of ID to the other implicitly, correct

That's incorrect - this is already the case, not a "designed end-state". We never convert.

What is perhaps unclear (and I'm adding one more sentence to the docstring for it) is that the distinction between a node ID and a SQL instance ID does not matter for certain parts of the code (specifically - everything that has to do with server-to-server RPCs) and we are OK using the type "node ID" even when the value is a SQL instance ID. It would be silly to duplicate that code.


pkg/base/node_id.go, line 86 at r1 (raw file):
It's not racy because the value of n.sqlInstance is not written concurrently with the read accesses.

One might see the nodeID value before the bool flag is set

No. The determination of .sqlInstance is done before the IDs get populated.

(I could add an additional bool to the struct to prove this fact with an assertion, if that would assuage you? But don't you trust the race detector, which is not failing here?)


pkg/base/node_id.go, line 193 at r1 (raw file):

We are interpreting a NodeIDContainer as a sql ID, but we expressly disallow it from storing a sqlInstance

You're misreading the assertion: it's there to ensure we're not redundantly call NewSQLIDContainerForNode, then convert the resulting *SQLIDContainer to *NodeIDContainer, then call NewSQLIDContainerForNode again. This is an API sequencing assertion.

I (as a user of this API) shouldn't be able to ever legally hold a SQLIDContainer that has a node ID.

That sounds right, why do you think you would be able to?

I shouldn't be able to alias it to a NodeIDContainer that I can later set to a node ID (and not a sql instance ID).

That's already enforced, by the check n.sqlInstance == sqlInstance in Set().

It's perhaps hard to understand in the context of this PR, but it might be clarified in the next PR that uses it: #73159


pkg/base/node_id.go, line 204 at r1 (raw file):

It looks like this is only used on zero values for the receiver

correct.

can't we just make it a constructor?

No: the NodeIDContainer is already instantiated and already shared by reference in multiple places by the time this method is called.
(Believe me, I tried: #72912 - it's a bottomless pit of yak shaving).

Instead I found this API is simpler and used advantageously here: #73159.

I appreciate that you may subjectively thing "this is code stink, I wish it was different". I would agree. But I am looking for an incremental improvement here, not a multi-week project. Unless you can prove me this can be simplified by a demonstration that costs both you and I less than an hour of work, I suggest we don't discuss this at this time and perhaps file a followup issue instead.

@RaduBerinde
Copy link
Copy Markdown
Member


pkg/base/node_id.go, line 65 at r1 (raw file):

Previously, knz (kena) wrote…

The desired end-state is to never convert one type of ID to the other implicitly, correct

That's incorrect - this is already the case, not a "designed end-state". We never convert.

What is perhaps unclear (and I'm adding one more sentence to the docstring for it) is that the distinction between a node ID and a SQL instance ID does not matter for certain parts of the code (specifically - everything that has to do with server-to-server RPCs) and we are OK using the type "node ID" even when the value is a SQL instance ID. It would be silly to duplicate that code.

Then this is the part I strongly disagree with. That allows anybody to easily add code that uses the NodeID as a.. node ID and not a "whatever" ID. We are converting between different go types here. There should be a separate type for the code that doesn't care (eg ServerID) from which it wouldn't be legal to get a node or sql instance ID. There wouldn't be duplication, though we'd need to change the type used in various parts of the code.

That all doesn't need to be done now, but we need to first agree on the "vision" for the desired end state, and then we need to mark code that isn't "legal" in that end state as temporary. My strong opinion is that the desired end state is where 1) there is a single piece of "setup" code (which isn't part of the infrastructure like id containers) which knows that for the system tenant the SQL instance ID is equal to the node ID and 2) it is never possible to write code (that doesn't involve explicit casts) which can interpret one type of ID as the other.

@RaduBerinde
Copy link
Copy Markdown
Member


pkg/base/node_id.go, line 193 at r1 (raw file):

Previously, knz (kena) wrote…

We are interpreting a NodeIDContainer as a sql ID, but we expressly disallow it from storing a sqlInstance

You're misreading the assertion: it's there to ensure we're not redundantly call NewSQLIDContainerForNode, then convert the resulting *SQLIDContainer to *NodeIDContainer, then call NewSQLIDContainerForNode again. This is an API sequencing assertion.

I (as a user of this API) shouldn't be able to ever legally hold a SQLIDContainer that has a node ID.

That sounds right, why do you think you would be able to?

I shouldn't be able to alias it to a NodeIDContainer that I can later set to a node ID (and not a sql instance ID).

That's already enforced, by the check n.sqlInstance == sqlInstance in Set().

It's perhaps hard to understand in the context of this PR, but it might be clarified in the next PR that uses it: #73159

I could create a NodeIDContainer, populate it with a node ID, then call this method. Now I have a SQLIDContainer which holds a node ID (or am I missing something?). Maybe we could assert that nodeID contains the zero value?

@RaduBerinde
Copy link
Copy Markdown
Member


pkg/base/node_id.go, line 86 at r1 (raw file):

Previously, knz (kena) wrote…

It's not racy because the value of n.sqlInstance is not written concurrently with the read accesses.

One might see the nodeID value before the bool flag is set

No. The determination of .sqlInstance is done before the IDs get populated.

(I could add an additional bool to the struct to prove this fact with an assertion, if that would assuage you? But don't you trust the race detector, which is not failing here?)

I see, I will look again. Do you think it would make things more clear to change the bool to a tri-state (uninitialized, node, sql instance) and only allow state transitions from uninitialized?

Copy link
Copy Markdown
Collaborator

@rimadeodhar rimadeodhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 6 of 8 files at r1, 2 of 2 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz and @RaduBerinde)


pkg/base/node_id.go, line 65 at r1 (raw file):

Previously, RaduBerinde wrote…

Then this is the part I strongly disagree with. That allows anybody to easily add code that uses the NodeID as a.. node ID and not a "whatever" ID. We are converting between different go types here. There should be a separate type for the code that doesn't care (eg ServerID) from which it wouldn't be legal to get a node or sql instance ID. There wouldn't be duplication, though we'd need to change the type used in various parts of the code.

That all doesn't need to be done now, but we need to first agree on the "vision" for the desired end state, and then we need to mark code that isn't "legal" in that end state as temporary. My strong opinion is that the desired end state is where 1) there is a single piece of "setup" code (which isn't part of the infrastructure like id containers) which knows that for the system tenant the SQL instance ID is equal to the node ID and 2) it is never possible to write code (that doesn't involve explicit casts) which can interpret one type of ID as the other.

+1 to Radu's comment. I think it is important to keep the semantic differences between a node and a SQL instance ID very obvious with the unified API. It would be nice to have something along the lines of GetNodeID() to explicitly fetch node id which will throw if the container is a sql id container and vice versa. And for code that doesn't care either which way, we can have a GetID or something similar to make it very obvious that the caller for this method should be agnostic to the container type.


pkg/base/node_id.go, line 30 at r2 (raw file):

// and getting the value. Once a value is set, the value cannot
// change.
type NodeIDContainer struct {

Nit: It will be nice to rename it at some point to IDContainer instead of NodeIDContainer since we will be using it to store either SQL instance ID or node ID. Also, nodeID below can be renamed to just id.

Copy link
Copy Markdown
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All right in light of the new RFC in #73309, let's accept that this PR is an intermediate state of affairs. I have added references to the RFC as comments in the code that needs to be changed.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @rimadeodhar)


pkg/base/node_id.go, line 65 at r1 (raw file):

Previously, rimadeodhar (Rima Deodhar) wrote…

+1 to Radu's comment. I think it is important to keep the semantic differences between a node and a SQL instance ID very obvious with the unified API. It would be nice to have something along the lines of GetNodeID() to explicitly fetch node id which will throw if the container is a sql id container and vice versa. And for code that doesn't care either which way, we can have a GetID or something similar to make it very obvious that the caller for this method should be agnostic to the container type.

Yes we agreed on this offline. Adding a reference to #73309 .


pkg/base/node_id.go, line 86 at r1 (raw file):

Previously, RaduBerinde wrote…

I see, I will look again. Do you think it would make things more clear to change the bool to a tri-state (uninitialized, node, sql instance) and only allow state transitions from uninitialized?

I could do this, but I'd rather use the encoding you proposed in #73309. Let's do so in a separate PR.


pkg/base/node_id.go, line 193 at r1 (raw file):

Previously, RaduBerinde wrote…

I could create a NodeIDContainer, populate it with a node ID, then call this method. Now I have a SQLIDContainer which holds a node ID (or am I missing something?). Maybe we could assert that nodeID contains the zero value?

Good idea. Done.


pkg/base/node_id.go, line 30 at r2 (raw file):

Previously, rimadeodhar (Rima Deodhar) wrote…

Nit: It will be nice to rename it at some point to IDContainer instead of NodeIDContainer since we will be using it to store either SQL instance ID or node ID. Also, nodeID below can be renamed to just id.

Added a reference to #73309 . We'll do this renaming in htat context.

@knz knz force-pushed the 20211125-instance-node-id branch from 1706db5 to fa43497 Compare December 8, 2021 12:52
Copy link
Copy Markdown
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @rimadeodhar)


pkg/base/node_id.go, line 193 at r1 (raw file):

Previously, knz (kena) wrote…

Good idea. Done.

Turns out we can't do that - we have multiple tests that manually inject a non-zero value before server initialization.

This commit ensures that the same data structure can store both SQL
instance IDs and Node IDs. The decision is made upon instantiation of
the container which of the two types of IDs the container stores.

This simplifies the code and paves the road to using a single
container in servers for the purpose of identifying the server
instance in traces and logs.

This also revisits the change in  cockroachdb#72607 by switching the
tracing/logging prefix `sqli` to `nsql` for SQL instance servers.

Release note: None
@knz knz force-pushed the 20211125-instance-node-id branch from fa43497 to 6d6a241 Compare December 8, 2021 13:12
@craig craig bot merged commit 6d6a241 into cockroachdb:master Dec 8, 2021
@knz knz deleted the 20211125-instance-node-id branch December 8, 2021 16:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants