-
Notifications
You must be signed in to change notification settings - Fork 4.1k
roachprod: audit and fix incorrect usages of c.Parallel wrt error handling #104312
Copy link
Copy link
Closed
Labels
A-testingTesting tools and infrastructureTesting tools and infrastructureC-bugCode not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.C-enhancementSolution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)T-testengTestEng TeamTestEng Team
Description
After implementing SSH retries and refactoring c.Parallel, the new implementation expects a user-defined function to return no error unless there was an internal (roachprod) error; instead, if the executed function resulted in an error, it's expected to be in RunResultDetails.Err.
The above (tacit) contract is violated by several existing commands, e.g., Wipe, Stop. When either of these functions errors out, the actual details of the error are not written to the logs. E.g., an attempt to reuse a cluster failed to execute either Wipe or Stop, but the actual details are unknown; all we get is the following message,
13:47:29 test_runner.go:605: [w4] Unable to reuse cluster: srosenberg-1685591519-18-n4cpu32 due to: COMMAND_PROBLEM: exit status 1. Will attempt to create a fresh one
The ask is as follows,
- audit all the existing commands which use
c.Paralleland fix contract violations - ensure that even internal errors are handled correctly by logging their (nested) context
- update the documentation to make it clear that
errorshould benilunless there was an internal error- Note, the majority of the cases appear to be
return res, nil; perhaps they could use a different API, one which has a single return of type*RunResultDetails?
- Note, the majority of the cases appear to be
[1] #90451
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
A-testingTesting tools and infrastructureTesting tools and infrastructureC-bugCode not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.C-enhancementSolution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)T-testengTestEng TeamTestEng Team