-
Notifications
You must be signed in to change notification settings - Fork 4.1k
kvserver: use non-cancelable contexts in raft application #75656
Description
Describe the problem
As of d064059, raft application on the proposal happens under the proposer's context, which may well be canceled before or mid-way through the application. However, raft application should not care about the cancellation, and we just diagnosed a bug where it did.
Also, we had problems with cancellation on the raft goroutine in the past and check against it, but evidently not well enough yet.
Expected behavior
The raft apply loop should use a ctx that is not cancelable. It makes sense to derive certain parts from the proposer - such as a trace span - but not the cancellation policy. That is, in
cockroach/pkg/kv/kvserver/replica_application_decoder.go
Lines 139 to 171 in 852a80c
| // createTracingSpans creates and assigns a new tracing span for each decoded | |
| // command. If a command was proposed locally, it will be given a tracing span | |
| // that follows from its proposal's span. | |
| func (d *replicaDecoder) createTracingSpans(ctx context.Context) { | |
| const opName = "raft application" | |
| var it replicatedCmdBufSlice | |
| for it.init(&d.cmdBuf); it.Valid(); it.Next() { | |
| cmd := it.cur() | |
| if cmd.IsLocal() { | |
| cmd.ctx, cmd.sp = tracing.ChildSpan(cmd.proposal.ctx, opName) | |
| } else if cmd.raftCmd.TraceData != nil { | |
| // The proposal isn't local, and trace data is available. Extract | |
| // the remote span and start a server-side span that follows from it. | |
| spanMeta, err := d.r.AmbientContext.Tracer.ExtractMetaFrom(tracing.MapCarrier{ | |
| Map: cmd.raftCmd.TraceData, | |
| }) | |
| if err != nil { | |
| log.Errorf(ctx, "unable to extract trace data from raft command: %s", err) | |
| } else { | |
| cmd.ctx, cmd.sp = d.r.AmbientContext.Tracer.StartSpanCtx( | |
| ctx, | |
| opName, | |
| // NB: Nobody is collecting the recording of this span; we have no | |
| // mechanism for it. | |
| tracing.WithRemoteParent(spanMeta), | |
| tracing.WithFollowsFrom(), | |
| ) | |
| } | |
| } else { | |
| cmd.ctx, cmd.sp = tracing.ChildSpan(ctx, opName) | |
| } | |
| } | |
| } |
we should not use a child context of cmd.proposal.ctx but create one derived from ctx (the worker ctx) or context.Background().
Additional data / screenshots
Should also write a test that uses a TestingPostApplyHook to ensure that the ctx is not cancelable, or at least that the cancellation doesn't carry over to the child. (Or more simply, that a value stashed in the proposer context is not visible at apply time).
@erikgrinaker didn't we put in some invariant check about this sometime recently?
Jira issue: CRDB-12769