workload/schemachange: add mechanism to detect op gen failures #57949
Description
Describe the problem
Currently, whenever SQL statement generation fails in the schemachange workload, the workload ignores the error and retries. This is because the error is usually a pgx.ErrNoRows, which occurs as a consequence of how the workload generates random statements. For example, calling randTable() may return this error if it tries to fetch a table from the database when no tables have been created yet.
It is still possible for spurious errors to occur due to bugs in random operation generation. Right now, these get ignored. We should have a way to detect these so that bugs in the workload get fixed.
Describe the Solution
It would be ideal if the workload had some logic to terminate on statement generation failure. It could parse the error that occurs and ignore it if it is a pgx.ErrNoRows. Alternatively, functions like randTable() could just return new, non-existing tables if there are none in the db, and the workload could terminate on every generation error.
Also, it would be nice if GitHub issues created by roachtest failures were separated by cause: operation generation bugs versus database bugs. This distinction would make it easier for us to focus on and prioritize fixing database bugs. To accomplish this, one would likely need to add a flag that terminates the workload with an error on operation generation failure, and create a roachtest that runs with the flag on.
Jira issue: CRDB-3455