backupccl: nil pointer crash in storage.(*Writer).open during backup in 22.2.9 #103597

@renatolabs

Description

During a roachtest (#103228), two nodes crashed while a backup was being taken. Both crashes were caused by a panic inside the GCS client library (cloud.google.com/go/storage):

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x58 pc=0x4127782]

goroutine 309871 [running]:
panic({0x4bb3320, 0x8fb3dd0})
        GOROOT/src/runtime/panic.go:987 +0x3ba fp=0xc014aeddf8 sp=0xc014aedd38 pc=0x48d77a
runtime.panicmem(...)
        GOROOT/src/runtime/panic.go:260
runtime.sigpanic()
        GOROOT/src/runtime/signal_unix.go:835 +0x2f6 fp=0xc014aede48 sp=0xc014aeddf8 pc=0x4a4636
cloud.google.com/go/storage.(*Writer).open.func1()
        cloud.google.com/go/storage/external/com_google_cloud_go_storage/writer.go:162 +0x1a2 fp=0xc014aedfe0 sp=0xc014aede48 pc=0x4127782
runtime.goexit()
        GOROOT/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc014aedfe8 sp=0xc014aedfe0 pc=0x4c2401
created by cloud.google.com/go/storage.(*Writer).open
        cloud.google.com/go/storage/external/com_google_cloud_go_storage/writer.go:152 +0x495

Every node in the 4-node cluster was running 22.2.9.

Stack traces for the two nodes that crashed (n3 and n4) are attached below. Note that a very similar crash was reported before [1] and was deemed fixed by [2]. However, the issue does not appear to be fully resolved.

Reproduction

Running the backup-restore/mixed-version roachtest in #103228 with seed -4303022106448172299 seems to reproduce this with high probability (~1h30m after test start).

n3_stacks.txt
n4_stacks.txt

Roachtest artifacts: https://console.cloud.google.com/storage/browser/cockroach-tmp/103597/roachtest_artifacts;tab=objects?project=cockroach-shared&prefix=&forceOnObjectsSortingFiltering=false&pageState=(%22StorageObjectListTable%22:(%22f%22:%22%255B%255D%22))

[1] googleapis/google-cloud-go#4167
[2] #65660

Jira issue: CRDB-28094

Metadata

Labels

A-disaster-recovery, C-bug (Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.), T-disaster-recovery
