backupccl: nil pointer crash in storage.(*Writer).open during backup in 22.2.9 #103597
Closed
Labels
A-disaster-recovery, C-bug (code not up to spec/doc, specs & docs deemed correct; solution expected to change code/behavior), T-disaster-recovery
Description
During a roachtest (#103228), two nodes crashed while a backup was taken, both due to a panic within the GCS library:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x58 pc=0x4127782]
goroutine 309871 [running]:
panic({0x4bb3320, 0x8fb3dd0})
GOROOT/src/runtime/panic.go:987 +0x3ba fp=0xc014aeddf8 sp=0xc014aedd38 pc=0x48d77a
runtime.panicmem(...)
GOROOT/src/runtime/panic.go:260
runtime.sigpanic()
GOROOT/src/runtime/signal_unix.go:835 +0x2f6 fp=0xc014aede48 sp=0xc014aeddf8 pc=0x4a4636
cloud.google.com/go/storage.(*Writer).open.func1()
cloud.google.com/go/storage/external/com_google_cloud_go_storage/writer.go:162 +0x1a2 fp=0xc014aedfe0 sp=0xc014aede48 pc=0x4127782
runtime.goexit()
GOROOT/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc014aedfe8 sp=0xc014aedfe0 pc=0x4c2401
created by cloud.google.com/go/storage.(*Writer).open
cloud.google.com/go/storage/external/com_google_cloud_go_storage/writer.go:152 +0x495
Every node in the 4-node cluster was running 22.2.9.
Stack traces for the two nodes that crashed (n3 and n4) are attached below. A very similar crash was reported before [1] and deemed fixed by [2], but the fix does not appear to be complete.
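For context on why this takes down the whole node: the trace shows the panic happening in a goroutine created by storage.(*Writer).open (the open.func1 frame), and in Go a panic can only be recovered from within the goroutine that panicked, so no recover in the calling code can intercept it. The following is a minimal sketch of that mechanism only. fakeWriter and openAsync are hypothetical stand-ins, not the GCS library's code, and the padding merely places a field at offset 0x58 to mirror the addr=0x58 in the fault, which is consistent with a field read through a nil struct pointer.

```go
package main

import "fmt"

// fakeWriter is a hypothetical stand-in for illustration; it is NOT the
// real cloud.google.com/go/storage.Writer.
type fakeWriter struct {
	pad  [0x58]byte // padding so that name sits at offset 0x58
	name string
}

// openAsync mimics the shape seen in the trace: it spawns a goroutine that
// reads a field through a pointer that may be nil. The deferred recover
// lives inside the spawned goroutine, which is the only place where the
// panic can be intercepted.
func openAsync(w *fakeWriter) <-chan error {
	done := make(chan error, 1)
	go func() {
		defer func() {
			if r := recover(); r != nil {
				done <- fmt.Errorf("recovered in worker: %v", r)
			}
		}()
		n := len(w.name) // nil pointer dereference when w == nil
		_ = n
		done <- nil
	}()
	return done
}

func main() {
	// With a nil writer the worker panics, recovers, and reports an error
	// instead of crashing the process.
	fmt.Println(<-openAsync(nil))
	// With a valid writer the worker completes normally.
	fmt.Println(<-openAsync(&fakeWriter{name: "backup"}))
}
```

Without the deferred recover inside the goroutine, the same nil dereference would terminate the process regardless of any recover in the caller, which matches the node crashes described here.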
Reproduction
Running the backup-restore/mixed-version roachtest in #103228 with seed -4303022106448172299 appears to reproduce the crash with high probability, roughly 1h30m after the test starts.
[1] googleapis/google-cloud-go#4167
[2] #65660
Jira issue: CRDB-28094