bulk: import jobs often fail with server sent GOAWAY and closed the connection #65926

@adityamaru

Time and again we have seen our roachtests fail with:
gs://cockroach-fixtures/tpce-csv/customers=2000000/746/NewsItem.txt?AUTH=implicit: http2: server sent GOAWAY and closed the connection; LastStreamID=1, ErrCode=NO_ERROR, debug="server_shutting_down"

This is an infra flake, and the only solution is to retry the import, so we should probably retry internally rather than fail the job. The retry could live at the job resumer level, or the error could be marked as retriable in our external storage resuming reader implementations. Either way, the focus of this issue should be to find which error type is bubbled up in such scenarios so that we can intercept it and treat it as retriable.

Epic: CRDB-2556

Labels

A-disaster-recovery, C-bug (Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.), T-disaster-recovery