workload/schemachange: add mechanism to detect op gen failures #57949

@jayshrivastava

Describe the problem

Currently, whenever a SQL statement generation failure occurs in the schemachange workload, the workload ignores the error and retries. This is because the error is usually a pgx.ErrNoRows, which occurs as a consequence of how the workload generates random statements. For example, calling randTable() may return this error if it tries to fetch a table from the database when no tables have been created yet.

However, errors can also occur because of bugs in random operation generation. Right now, these are ignored along with the expected ones. We should have a way to detect them so that bugs in the workload get fixed.

Describe the Solution

It would be ideal if the workload had some logic to terminate on statement generation failure. It could parse the error that occurs and ignore it if it is a pgx.ErrNoRows. Alternatively, functions like randTable() could just return new, non-existing tables if there are none in the db, and the workload could terminate on every generation error.

Also, it would be nice if the GitHub issues created by roachtest failures were separated into those caused by operation generation bugs and those caused by database bugs. This distinction would make it easier for us to focus on and prioritize fixing database bugs. To accomplish this, one would likely need to add a flag that terminates the workload with an error on operation generation failure, and create a roachtest that runs with the flag on.

Jira issue: CRDB-3455
