Skip to content

spanner: nil pointer dereference in ReadWriteTransaction.rollback when session handle is nil #14025

@will-arxed

Description

@will-arxed

Client

Spanner

Environment

Linux (Debian-based, amd64) on GCE
go version go1.25.0
cloud.google.com/go/spanner v1.88.0
github.com/googleapis/go-sql-spanner v1.24.0

Code and Dependencies

This panic seems to occur when a transaction commit is aborted by Spanner (due to a conflict, for example) and the retry fails because the context passed into the transaction has been cancelled.

Expected behavior

When a transaction commit is aborted and the retry fails, an error should be returned.

Actual behavior

cloud.google.com/go/spanner panics as such:

  panic: runtime error: invalid memory address or nil pointer dereference
  [signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x103318a]

  goroutine 3024 [running]:
  sync/atomic.(*Int32).Add(...)
      /usr/local/go/src/sync/atomic/type.go:94
  sync.(*RWMutex).RLock(...)
      /usr/local/go/src/sync/rwmutex.go:72
  cloud.google.com/go/spanner.(*sessionHandle).getID(0x10468d5?)
      .../cloud.google.com/go/spanner@v1.88.0/session.go:108 +0x2a
  cloud.google.com/go/spanner.(*ReadWriteTransaction).rollback(0x1c04b179f340, {0x1f61a58, 0x3351180})
      .../cloud.google.com/go/spanner@v1.88.0/transaction.go:1940 +0xbf
  cloud.google.com/go/spanner.(*ReadWriteStmtBasedTransaction).Rollback(0x1c04b179f340, {0x1f61a58?, 0x3351180?})
      .../cloud.google.com/go/spanner@v1.88.0/transaction.go:2156 +0x25
  github.com/googleapis/go-sql-spanner.(*readWriteTransaction).rollback(0x1c04cd640380)
      .../github.com/googleapis/go-sql-spanner@v1.24.0/transaction.go:609 +0x32
  github.com/googleapis/go-sql-spanner.(*readWriteTransaction).Rollback(0x1c04cd640380)
      .../github.com/googleapis/go-sql-spanner@v1.24.0/transaction.go:602 +0xc5
  github.com/googleapis/go-sql-spanner.(*delegatingTransaction).Rollback(0x0?)
      .../github.com/googleapis/go-sql-spanner@v1.24.0/transaction.go:159 +0x22
  github.com/googleapis/go-sql-spanner.(*conn).Rollback(0x0?, {0x0?, 0x0?})
      .../github.com/googleapis/go-sql-spanner@v1.24.0/conn.go:1757 +0x1c
  github.com/googleapis/go-sql-spanner.(*conn).Close(0x1c04cd604680)
      .../github.com/googleapis/go-sql-spanner@v1.24.0/conn.go:1368 +0x2d
  database/sql.(*driverConn).finalClose.func2()
      /usr/local/go/src/database/sql/sql.go:695 +0x32
  database/sql.withLock({0x1f584b0, 0x1c04cd640280}, 0x1c04cc3cb950)
      /usr/local/go/src/database/sql/sql.go:3572 +0x71
  database/sql.(*driverConn).finalClose(0x1c04cd640280)
      /usr/local/go/src/database/sql/sql.go:693 +0xe5
  database/sql.(*driverConn).Close(0x1c04cd640280)
      /usr/local/go/src/database/sql/sql.go:674 +0x138
  database/sql.(*DB).putConn(0x1c04a1fa21a0, 0x1c04cd640280, {0x0, 0x0}, 0x1)
      /usr/local/go/src/database/sql/sql.go:1518 +0x333
  database/sql.(*driverConn).releaseConn(...)
      /usr/local/go/src/database/sql/sql.go:578
  database/sql.(*Conn).close(0x1c04cc3cbaa8?, {0x0, 0x0})
      /usr/local/go/src/database/sql/sql.go:2141 +0xc5
  database/sql.(*Conn).Close(...)
      /usr/local/go/src/database/sql/sql.go:2153
  github.com/googleapis/go-sql-spanner.runTransactionWithOptions.func1()
      .../github.com/googleapis/go-sql-spanner@v1.24.0/driver.go:1122 +0x1c
  github.com/googleapis/go-sql-spanner.runTransactionWithOptions(...)
      .../github.com/googleapis/go-sql-spanner@v1.24.0/driver.go:1194 +0x42c

Additional context

It seems like the issue is that t.sh can be nil in the following check:

sid, client := t.sh.getID(), t.sh.getClient()

The following is an AI-generated description of the issue and a proposed fix, if it's helpful:

  Root cause

  ReadWriteTransaction.rollback at https://github.com/googleapis/google-cloud-go/blob/main/spanner/transaction.go#L1940 dereferences t.sh without a nil check:

  func (t *ReadWriteTransaction) rollback(ctx context.Context) {
        t.mu.Lock()
        t.state = txClosed
        if t.tx == nil {
                t.mu.Unlock()
                return
        }
        t.mu.Unlock()
        // In case that sessionHandle was destroyed but transaction body fails to
        // report it.
        sid, client := t.sh.getID(), t.sh.getClient() // panics when t.sh is nil
        // ...
  }

  The comment acknowledges the session handle can be destroyed, but the code only checks the values returned by t.sh.getID() and t.sh.getClient() — it doesn't
  check whether t.sh itself is nil.

  How t.sh becomes nil

  ReadWriteStmtBasedTransaction.CommitWithReturnResp unconditionally recycles and nils the session handle, including for Aborted errors:

  // transaction.go:2148-2153
  if t.sh != nil {
        t.sh.recycle()
        t.sh = nil
  }
  return resp, err

  When the go-sql-spanner driver retries an aborted transaction via resetTransactionForRetry, it calls ReadWriteStmtBasedTransaction.ResetForRetry, which passes
  the now-nil t.sh to newReadWriteStmtBasedTransactionWithSessionHandle. That function tries to acquire a new session, but if the context has been cancelled (e.g.
  by errgroup), session acquisition fails and ResetForRetry returns an error. The old ReadWriteStmtBasedTransaction (with sh == nil) remains referenced.

  During cleanup, the deferred conn.Close() in runTransactionWithOptions sees the connection is still in a transaction (because resetTransactionForRetry restored
  c.tx before the retry failed) and calls Rollback, which reaches ReadWriteTransaction.rollback with a nil t.sh.

  Suggested fix

  Add a nil check for t.sh in ReadWriteTransaction.rollback:

  func (t *ReadWriteTransaction) rollback(ctx context.Context) {
        t.mu.Lock()
        t.state = txClosed
        if t.tx == nil {
                t.mu.Unlock()
                return
        }
        t.mu.Unlock()
  +     if t.sh == nil {
  +             return
  +     }
        sid, client := t.sh.getID(), t.sh.getClient()
        if sid == "" || client == nil {
                return
        }
        // ...
  }

Metadata

Metadata

Assignees

Labels

api: spannerIssues related to the Spanner API.priority: p1Important issue which blocks shipping the next release. Will be fixed prior to next release.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions