How to handle tests that "should work" but don't?

In https://github.com/coiled/coiled-runtime/pull/243#issuecomment-1217395868, I'm adding a workload that should work—and in fact does work on a fresh cluster—but always fails in CI right now due to some combination of
* https://github.com/dask/distributed/issues/4987#issuecomment-877473670
* https://github.com/dask/distributed/issues/5223

In general, how to we want to handle these sorts of tests?

#### 1. Don't run them (skip / don't add at all)

This is simple since then we don't have to change anything right now.

If they're skipped, when we think the issue might be fixed, we could hopefully remember to go un-skip them, see if that passes, and if so, merge the un-skip.

However, that manual process (remember to un-skip this) won't scale well to many tests, is easy to forget, and also doesn't exactly confirm that the thing that you think fixed the problem actually did, since (without even more manual work) you don't have a record of the test failing before the suspected fix was merged.

#### 2. Allow some tests to fail (xfail?), and have a way of displaying that on the dashboard

This would be a little more work, since we'd need to figure out a way to represent failure on the dashboard. But that would also tell a nice story when a lot of Xs turned into a normal line.

It's also nice to know if something unintentionally fixes the test (though this is probably pretty rare). Plus, I think it's generally good to be upfront about things that don't work.

A downside here is increased runtime (and cost) for the tests, running something all the time that we know is going to fail (and the failure mode is probably that it times out after 10min of runtime or something).

Another thing @fjetter mentioned is that we don't want to have a bunch of xfailed tests hanging around forever. **If a test is going to be xfailed, it must have an issue opened for it, and the expectation that we'd actually work on that issue in the near term.** Otherwise, if something doesn't work and we don't expect it will for a long time, it should probably be skipped or not added at all.

cc @hayesgb @jrbourbeau @ian-r-rose @ncclementi

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to handle tests that "should work" but don't? #252

1. Don't run them (skip / don't add at all)

2. Allow some tests to fail (xfail?), and have a way of displaying that on the dashboard

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

How to handle tests that "should work" but don't? #252

Description

1. Don't run them (skip / don't add at all)

2. Allow some tests to fail (xfail?), and have a way of displaying that on the dashboard

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions