Skip to content

For discussion: rework error plumbing when proto.Call is involved #1403

@tamird

Description

@tamird

We currently have an inconsistent error handling story where proto.Response is concerned (usually via proto.Call). Some grep of all the functions which handle these arguments (excluding tests) is below. Note that many of these have an explicit error return.

$ git grep -E 'proto\.(Call|Response)' | grep func | grep -v _test
client/batch.go:func (b *Batch) InternalAddCall(call proto.Call) {
client/db.go:func (db *DB) send(calls ...proto.Call) (err error) {
client/http_sender.go:func (s *httpSender) Send(_ context.Context, call proto.Call) {
client/http_sender.go:func (s *httpSender) post(call proto.Call) error {
client/rpc/rpc_sender.go:func (s *Sender) Send(_ context.Context, call proto.Call) {
client/sender.go:type SenderFunc func(context.Context, proto.Call)
client/sender.go:func (f SenderFunc) Send(ctx context.Context, c proto.Call) {
client/txn.go:func (ts *txnSender) Send(ctx context.Context, call proto.Call) {
client/txn.go:func (txn *Txn) send(calls ...proto.Call) error {
client/txn.go:func (txn *Txn) updateState(calls []proto.Call) {
kv/db.go:func createArgsAndReply(method string) (proto.Request, proto.Response) {
kv/db.go:func (s *rpcDBServer) executeCmd(args proto.Request, reply proto.Response) error {
kv/dist_sender.go:func (ds *DistSender) sendAttempt(desc *proto.RangeDescriptor, call proto.Call) (retry.Status, error) {
kv/dist_sender.go:func (ds *DistSender) getDescriptors(call proto.Call) (*proto.RangeDescriptor, *proto.RangeDescriptor, error) {
kv/dist_sender.go:func (ds *DistSender) Send(_ context.Context, call proto.Call) {
kv/local_sender.go:func (ls *LocalSender) Send(ctx context.Context, call proto.Call) {
kv/txn_coord_sender.go:func (tc *TxnCoordSender) Send(_ context.Context, call proto.Call) {
kv/txn_coord_sender.go:func (tc *TxnCoordSender) sendOne(call proto.Call) {
kv/txn_coord_sender.go:func (tc *TxnCoordSender) updateResponseTxn(argsHeader *proto.RequestHeader, replyHeader *proto.ResponseHeader) {
server/node.go:func (n *nodeServer) executeCmd(args proto.Request, reply proto.Response) error {
server/status/feed.go:func (nef NodeEventFeed) CallComplete(args proto.Request, reply proto.Response) {
storage/range.go:var TestingCommandFilter func(proto.Request, proto.Response) bool
storage/range.go:func (r *Range) AddCmd(ctx context.Context, call proto.Call, wait bool) error {
storage/range.go:func (r *Range) addAdminCmd(ctx context.Context, args proto.Request, reply proto.Response) error {
storage/range.go:func (r *Range) addReadOnlyCmd(ctx context.Context, args proto.Request, reply proto.Response) error {
storage/range.go:func (r *Range) addWriteCmd(ctx context.Context, args proto.Request, reply proto.Response, wait bool) error {
storage/range.go:func (r *Range) proposeRaftCommand(ctx context.Context, args proto.Request, reply proto.Response) (<-chan error, *pendingCmd) {
storage/range.go:func (r *Range) applyRaftCommand(ctx context.Context, index uint64, originNodeID proto.RaftNodeID, args proto.Request, reply proto.Response) error {
storage/range_command.go:func (r *Range) executeCmd(batch engine.Engine, ms *proto.MVCCStats, args proto.Request, reply proto.Response) ([]proto.Intent, error) {
storage/response_cache.go:func (rc *ResponseCache) GetResponse(cmdID proto.ClientCmdID, reply proto.Response) (bool, error) {
storage/response_cache.go:func (rc *ResponseCache) PutResponse(cmdID proto.ClientCmdID, reply proto.Response) error {
storage/response_cache.go:func (rc *ResponseCache) shouldCacheResponse(reply proto.Response) bool {
storage/store.go:func (s *Store) ExecuteCmd(ctx context.Context, call proto.Call) error {

Some methods return the error outright, some set it on the proto.Response directly, and some do a mix of the two. We should resolve this inconsistency because it leads to confusing code and a very high probability of subtle bugs. There are two approaches we could take:

standardize on ResponseHeader.SetGoError()

This requires that we remove error returns from all methods that operate on proto.Responses. In addition, it will mean that we will need to remember to check .GoError() each time we call such a method. This is fragile, and we will have no tool support.

standardize on error returns

This requires that we create a new structure to be obtained by unwrapping the proto.Response which will not be able to carry an error. All methods which currently operate on proto.Response (with the exception of those that live at process boundaries) will need to change to return explicit errors rather than attaching errors directly to their proto.Response argument.

This option is my preference. It provides for tooling support via errcheck and is more idiomatic. That being said, the floor is open. Please 🚲 🏡 !

Metadata

Metadata

Assignees

Labels

C-cleanupTech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions