-
Notifications
You must be signed in to change notification settings - Fork 1.5k
spanner: Deadlock if iterator.Stop() is called after client.Close() #6590
Description
Client
Cloud Spanner
Environment
MacOS 11.6.8
Go Environment
$ go version
go version go1.18.1 darwin/amd64
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/jonathanrhodes/Library/Caches/go-build"
GOENV="/Users/jonathanrhodes/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/jonathanrhodes/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/jonathanrhodes/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/Cellar/go/1.18.1/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.18.1/libexec/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="go1.18.1"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/8p/tkgykgfd3s32wzhkb84q4zww0000gp/T/go-build4267546782=/tmp/go-build -gno-record-gcc-switches -fno-common"
Code
e.g.
package main
import (
"context"
"fmt"
"cloud.google.com/go/spanner"
)
func main() {
ctx := context.Background()
client, err := spanner.NewClient(ctx, project)
if err != nil {
panic(err)
}
iter := client.Single().Query(ctx, spanner.NewStatement("SELECT * FROM Table"))
client.Close()
_, err = iter.Next()
fmt.Println("iterator error:", err) // grpc: the client connection is closing
iter.Stop() // <-- deadlock
}Expected behavior
Execution completes.
Actual behavior
Deadlocks on the iter.Stop() call.
Additional context
This worked fine in the v1.34.1 release, but triggers in the v1.35.0 release, probably from this commit?
In testing, I see the following stack:
goroutine 137 [semacquire, 9 minutes]:
sync.runtime_SemacquireMutex(0x268e1c0?, 0xc0?, 0x0?)
/usr/local/go/src/runtime/sema.go:71 +0x25
sync.(*Mutex).lockSlow(0xc000193a00)
/usr/local/go/src/sync/mutex.go:162 +0x165
sync.(*Mutex).Lock(...)
/usr/local/go/src/sync/mutex.go:81
cloud.google.com/go/spanner.(*sessionPool).remove(0xc000193a00, 0x268e1c0?, 0x0?)
.../cloud.google.com/go/spanner/session.go:1112 +0x74
cloud.google.com/go/spanner.(*session).destroyWithContext(0xc0011280b0, {0x19bed98, 0xc0004f1ec0}, 0x34?)
.../cloud.google.com/go/spanner/session.go:341 +0x37
cloud.google.com/go/spanner.(*session).destroy(0xc000193a00?, 0xb0?)
.../cloud.google.com/go/spanner/session.go:336 +0x76
cloud.google.com/go/spanner.(*session).recycle(0xc0011280b0)
.../cloud.google.com/go/spanner/session.go:326 +0xa9
cloud.google.com/go/spanner.(*sessionHandle).recycle(0x1998e20?)
.../cloud.google.com/go/spanner/session.go:81 +0x10f
cloud.google.com/go/spanner.(*ReadOnlyTransaction).release(0xc0008ba8c0, {0x1998e20, 0xc000d33940})
.../cloud.google.com/go/spanner/transaction.go:787 +0xc6
cloud.google.com/go/spanner.(*RowIterator).Stop(0xc00112c080)
If I'm reading this correctly, it appears that the pool mutex is being acquired in session.recycle(), then the code attempts (but fails) to reacquire the mutex in sessionPool.remove().
Note: I'm not actually sure it is valid to call iterator.Stop() after client.Close(). We were just following the defer iter.Stop() pattern from the documentation, but maybe we need to take care not to close the client first?
RowIterator also follows the standard pattern for the Google Cloud Client Libraries:
defer iter.Stop()
for {
row, err := iter.Next()
if err == iterator.Done {
break
}
if err != nil {
// TODO: Handle error.
}
// TODO: use row
}
Always call Stop when you finish using an iterator this way, whether or not you iterate to the end. (Failing to call Stop could lead you to exhaust the database's session quota.)