bulkio: error from ingestKvs causes deadlock #49989
Description
Running with #49979, I'm still seeing the stall that the change was hoping to fix.
Here's a dump of the goroutines: https://gist.github.com/nvanbenschoten/3cffff71d9847e09e366bfaeed0f15b4.
Notably, we still have a collection of importWorkers all stuck in (*DatumRowConverter).SendBatch, but nothing is pulling from this channel. After reading through this code a bit, I believe this channel is supposed to be consumed by this goroutine:
cockroach/pkg/ccl/importccl/read_import_base.go
Lines 90 to 104 in 0dd70ca
```go
group.GoCtx(func(ctx context.Context) error {
	summary, err = ingestKvs(ctx, flowCtx, spec, progCh, kvCh)
	if err != nil {
		return err
	}
	var prog execinfrapb.RemoteProducerMetadata_BulkProcessorProgress
	prog.ResumePos = make(map[int32]int64)
	prog.CompletedFraction = make(map[int32]float32)
	for i := range spec.Uri {
		prog.CompletedFraction[i] = 1.0
		prog.ResumePos[i] = math.MaxInt64
	}
	progCh <- prog
	return nil
})
```
That goroutine does not appear in the stack trace. runImport looks suspicious: it links together two ctxgroup instances here:
cockroach/pkg/ccl/importccl/read_import_base.go
Lines 80 to 85 in 0dd70ca
```go
// This group links together the producers (via producerGroup) and the KV ingester.
group := ctxgroup.WithContext(ctx)
group.Go(func() error {
	defer close(kvCh)
	return producerGroup.Wait()
})
```
However, this only handles the case where producerGroup hits an error before group (the consumer side) does, not the other way around. If ingestKvs returns an error, the consumer goroutine exits without ever receiving from kvCh again, and I don't see anything that would stop the producers blocked on sends to that channel from deadlocking.
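To make the failure mode concrete, here is a minimal, self-contained sketch of the same shape (not the actual importccl code): a producer blocked on an unbuffered channel send, and a consumer that fails and exits without draining. The `drainOnError` branch sketches one conventional remedy (draining the channel after the consumer errors); whether that is the right fix for runImport is not something this repro decides.

```go
package main

import (
	"fmt"
	"time"
)

// run simulates the runImport shape: a producer sending on an
// unbuffered channel (like the import workers stuck in SendBatch)
// and a consumer that fails immediately (like ingestKvs returning
// an error). If drainOnError is true, the failing consumer drains
// the channel before exiting, which unblocks the producer.
func run(drainOnError bool) (producerFinished bool) {
	kvCh := make(chan int) // unbuffered, like the import's KV channel

	producerDone := make(chan struct{})
	go func() { // producer
		defer close(producerDone)
		kvCh <- 1    // blocks until someone receives
		close(kvCh) // producer side closes the channel when done
	}()

	go func() { // consumer
		// Consumer hits an error immediately. Without draining,
		// it simply exits, and nothing ever receives from kvCh.
		if drainOnError {
			for range kvCh { // unblock any pending producers
			}
		}
	}()

	select {
	case <-producerDone:
		return true
	case <-time.After(200 * time.Millisecond):
		return false // producer is still blocked: deadlock
	}
}

func main() {
	fmt.Println("producer finished without drain:", run(false))
	fmt.Println("producer finished with drain:", run(true))
}
```

Without the drain, the producer's send never completes and `run` times out, which matches the stalled importWorkers in the goroutine dump above.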