server,storage: apparent deadlock in storage._Cfunc_DBDumpThreadStacks #64079

@knz

As part of #59863, we want to retrieve data from multiple nodes concurrently.

The main unit test in that PR runs multiple nodes in-memory, and then issues the RPCs concurrently across the multiple nodes.

Currently, the test fails when retrieving the C++ thread stacks via the Stacks() status RPC.

The failure appears as a timeout in the DBDumpThreadStacks function. It occurs on the CI agents when the test runs under stress, at the beginning of the CI run.

If I let the test ignore the timeout, the test still fails, this time with leaked goroutines:

zip_test.go:193: Leaked goroutine: goroutine 1667 [semacquire]:
sync.runtime_Semacquire(0xc0033f33c8)
  /usr/local/go/src/runtime/sema.go:56 +0x45
sync.(*WaitGroup).Wait(0xc0033f33c0)
  /usr/local/go/src/sync/waitgroup.go:130 +0x65
google.golang.org/grpc.(*Server).serveStreams(0xc0002388c0, 0x55b27c0, 0xc002c32180)
  /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:853 +0x10c
google.golang.org/grpc.(*Server).handleRawConn.func1(0xc0002388c0, 0x55b27c0, 0xc002c32180)
  /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:786 +0x3f
created by google.golang.org/grpc.(*Server).handleRawConn
  /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:785 +0x52b
Leaked goroutine: goroutine 5807 [syscall]:
github.com/cockroachdb/cockroach/pkg/storage._Cfunc_DBDumpThreadStacks(0x0, 0x0)
  _cgo_gotypes.go:68 +0x49
github.com/cockroachdb/cockroach/pkg/storage.ThreadStacks(0xc0015069a0, 0x74bd170)
  /go/src/github.com/cockroachdb/cockroach/pkg/storage/stacks.go:44 +0x25
github.com/cockroachdb/cockroach/pkg/server.(*statusServer).Stacks(0xc0015069a0, 0x554c2a0, 0xc004634e40, 0xc005618840, 0xc0015069a0, 0x0, 0x0)
  /go/src/github.com/cockroachdb/cockroach/pkg/server/status.go:1169 +0x215
github.com/cockroachdb/cockroach/pkg/server/serverpb._Status_Stacks_Handler.func1(0x554c2a0, 0xc004634db0, 0x4626280, 0xc005618840, 0x0, 0x0, 0x1, 0xc0013e0dd0)
  /go/src/github.com/cockroachdb/cockroach/pkg/server/serverpb/status.pb.go:5311 +0x8b
github.com/cockroachdb/cockroach/pkg/util/tracing.ServerInterceptor.func1(0x554c2a0, 0xc004634db0, 0x4626280, 0xc005618840, 0xc005618860, 0xc005618880, 0x0, 0x0, 0x0, 0x0)
  /go/src/github.com/cockroachdb/cockroach/pkg/util/tracing/grpc_interceptor.go:126 +0x4a9
google.golang.org/grpc.getChainUnaryHandler.func1(0x554c2a0, 0xc004634db0, 0x4626280, 0xc005618840, 0xc00043bac8, 0xb7e308, 0x44ded60, 0xc001a41f40)
  /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:1019 +0xe7
github.com/cockroachdb/cockroach/pkg/rpc.NewServer.func1(0x554c2a0, 0xc004634db0, 0x4626280, 0xc005618840, 0xc005618860, 0xc001a41f40, 0xc001a41f40, 0x20, 0x42569e0, 0x1)
  /go/src/github.com/cockroachdb/cockroach/pkg/rpc/context.go:173 +0xa8
google.golang.org/grpc.chainUnaryServerInterceptors.func1(0x554c2a0, 0xc004634db0, 0x4626280, 0xc005618840, 0xc005618860, 0xc005618880, 0xc00043bba0, 0x5403a6, 0x451efa0, 0xc004634db0)
  /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:1005 +0xd0
github.com/cockroachdb/cockroach/pkg/server/serverpb._Status_Stacks_Handler(0x47a27e0, 0xc0015069a0, 0x554c2a0, 0xc004634db0, 0xc0045b9440, 0xc001563d00, 0x554c2a0, 0xc004634db0, 0xc005dd5800, 0x5)
  /go/src/github.com/cockroachdb/cockroach/pkg/server/serverpb/status.pb.go:5313 +0x150
google.golang.org/grpc.(*Server).processUnaryRPC(0xc0002388c0, 0x55b27c0, 0xc002c32180, 0xc002204c00, 0xc00068fb90, 0x7445038, 0x0, 0x0, 0x0)
  /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:1180 +0x522
google.golang.org/grpc.(*Server).handleStream(0xc0002388c0, 0x55b27c0, 0xc002c32180, 0xc002204c00, 0x0)
  /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:1503 +0xd05
google.golang.org/grpc.(*Server).serveStreams.func1.2(0xc0033f33c0, 0xc0002388c0, 0x55b27c0, 0xc002c32180, 0xc002204c00)
  /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:843 +0xa5
created by google.golang.org/grpc.(*Server).serveStreams.func1
  /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:841 +0x1fd

FWIW, I am unable to reproduce this failure with make stress on my local machine.

Labels

A-storage: Relating to our storage engine (Pebble) on-disk storage.
C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
