Skip to content

spanner: Deadlock if iterator.Stop() is called after client.Close() #6590

@jonathanrhodes

Description

@jonathanrhodes

Client

Cloud Spanner

Environment

MacOS 11.6.8

Go Environment

$ go version
go version go1.18.1 darwin/amd64
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/jonathanrhodes/Library/Caches/go-build"
GOENV="/Users/jonathanrhodes/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/jonathanrhodes/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/jonathanrhodes/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/Cellar/go/1.18.1/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.18.1/libexec/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="go1.18.1"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/8p/tkgykgfd3s32wzhkb84q4zww0000gp/T/go-build4267546782=/tmp/go-build -gno-record-gcc-switches -fno-common"

Code

e.g.

package main

import (
	"context"
	"fmt"

	"cloud.google.com/go/spanner"
)

func main() {
	ctx := context.Background()

	client, err := spanner.NewClient(ctx, project)
	if err != nil {
		panic(err)
	}

	iter := client.Single().Query(ctx, spanner.NewStatement("SELECT * FROM Table"))
	client.Close()

	_, err = iter.Next()
	fmt.Println("iterator error:", err) // grpc: the client connection is closing

	iter.Stop() // <-- deadlock
}

Expected behavior

Execution completes.

Actual behavior

Deadlocks on the iter.Stop() call.

Additional context

This worked fine in the v1.34.1 release, but triggers in the v1.35.0 release, probably from this commit?

In testing, I see the following stack:

goroutine 137 [semacquire, 9 minutes]:
sync.runtime_SemacquireMutex(0x268e1c0?, 0xc0?, 0x0?)
	/usr/local/go/src/runtime/sema.go:71 +0x25
sync.(*Mutex).lockSlow(0xc000193a00)
	/usr/local/go/src/sync/mutex.go:162 +0x165
sync.(*Mutex).Lock(...)
	/usr/local/go/src/sync/mutex.go:81
cloud.google.com/go/spanner.(*sessionPool).remove(0xc000193a00, 0x268e1c0?, 0x0?)
	.../cloud.google.com/go/spanner/session.go:1112 +0x74
cloud.google.com/go/spanner.(*session).destroyWithContext(0xc0011280b0, {0x19bed98, 0xc0004f1ec0}, 0x34?)
	.../cloud.google.com/go/spanner/session.go:341 +0x37
cloud.google.com/go/spanner.(*session).destroy(0xc000193a00?, 0xb0?)
	.../cloud.google.com/go/spanner/session.go:336 +0x76
cloud.google.com/go/spanner.(*session).recycle(0xc0011280b0)
	.../cloud.google.com/go/spanner/session.go:326 +0xa9
cloud.google.com/go/spanner.(*sessionHandle).recycle(0x1998e20?)
	.../cloud.google.com/go/spanner/session.go:81 +0x10f
cloud.google.com/go/spanner.(*ReadOnlyTransaction).release(0xc0008ba8c0, {0x1998e20, 0xc000d33940})
	.../cloud.google.com/go/spanner/transaction.go:787 +0xc6
cloud.google.com/go/spanner.(*RowIterator).Stop(0xc00112c080)

If I'm reading this correctly, it appears that the pool mutex is being acquired in session.recycle(), then the code attempts (but fails) to reacquire the mutex in sessionPool.remove().

Note: I'm not actually sure it is valid to call iterator.Stop() after client.Close(). We were just following the defer iter.Stop() pattern from the documentation, but maybe we need to take care not to close the client first?

RowIterator also follows the standard pattern for the Google Cloud Client Libraries:

defer iter.Stop()
for {
    row, err := iter.Next()
    if err == iterator.Done {
        break
    }
    if err != nil {
        // TODO: Handle error.
    }
    // TODO: use row
}

Always call Stop when you finish using an iterator this way, whether or not you iterate to the end. (Failing to call Stop could lead you to exhaust the database's session quota.)

Metadata

Metadata

Assignees

Labels

api: spannerIssues related to the Spanner API.priority: p2Moderately-important priority. Fix may not be included in next release.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions