Skip to content

stop using protos as the expected value in CPut #38308

@danhhz

Description

@danhhz

#38147 and #38004 each introduced a regression that broke RangeDescriptor updates (replica changes, splits, merges) after a cluster was upgraded from a prior version because of a bad interaction between CPut and protos.

Both bugs where the same general idea: A non-nullable field was added to RangeDescriptor (or ReplicaDescriptor, which is stored on RangeDescriptor). If a RangeDescriptor was written by an old version of cockroach, it wouldn't have the new field. Whenever we update RangeDescriptor, we do it with CPut to catch races. So a new binary would read the old serialized descriptor from kv (which doesn't have the new field), parse it into a RangeDescriptor, compute the update, then try to CPut the new RangeDescriptor. Proto2 includes all fields in the serialization, including ones with the default value, so it would include the new field. This wouldn't match what was in kv and the CPut would return a ConditionFailedError.

We could work around these two bugs by adding the fields as nullable, but that has an allocation cost. Worse though, the proto spec is pretty explicit about not relying on serialization of a given message to always produce exactly the same bytes. From https://developers.google.com/protocol-buffers/docs/encoding#implications

    Do not assume the byte output of a serialized message is stable.

Instead, what we should do is fetch the raw roachpb.Value from kv for the old RangeDescriptor, parse it into a RangeDescriptor and use that to compute the update, but also pass along the actual roachpb.Value we got. Then the CPut will use the roachpb.Value, which is guaranteed to match. If we need to do any equality of protos, we should be doing it using the Equal method on the parsed versions.

Further, there are more places where we CPut using a parsed proto message as the expected value, which are also footguns waiting to happen. We'll fix all of them and prevent future occurrences by changing the signature of CPut to accept a *roachpb.Value instead of interface{} for the expected value.

Metadata

Metadata

Assignees

Labels

A-kv-clientRelating to the KV client and the KV interface.C-cleanupTech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions