bulk: import jobs often fail with server sent GOAWAY and closed the connection #65926
Closed
Labels
A-disaster-recovery, C-bug (Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.), T-disaster-recovery
Description
Our roachtests repeatedly fail with:
gs://cockroach-fixtures/tpce-csv/customers=2000000/746/NewsItem.txt?AUTH=implicit: http2: server sent GOAWAY and closed the connection; LastStreamID=1, ErrCode=NO_ERROR, debug="server_shutting_down"
Although this is an infrastructure flake and the only remedy is to retry the import, we should consider retrying internally so that the job does not fail. The retry could live at the job resumer level, or the error could be marked as retriable in our external storage resuming reader implementations. Either way, the focus of this issue should be to identify which error type is bubbled up in this scenario so that we can intercept it and treat it as retriable.
Epic: CRDB-2556