Automatically retry WPT tasks with few failures?

docker-worker and generic-worker both support a `retry` configuration in task definitions:

* https://github.com/taskcluster/docker-worker/blob/e2eb847d404d7fa776a394c5845c5e98040cc13b/schemas/v1/payload.json#L190-L197
* https://github.com/taskcluster/generic-worker/blob/e31923d3322d3d31af563ae6a1f3e328b9223d5f/multiuser_windows.yml#L222-L237

It takes a list of exit codes. When the task’s command exits with one of these codes, the worker resolves the task with the queue as “exception” rather than “failed”. The queue then automatically reschedules the task, up to a configurable number of retries (5 by default).
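
A minimal Python sketch of how the two settings fit together. It assumes the worker payload field `onExitStatus.retry` from the docker-worker schema linked above and the queue’s task-level `retries` field. `RETRY_EXIT_CODE`, `wpt_task_definition` and `simulate_runs` are hypothetical names for illustration. The value 75 is an arbitrary choice, and `simulate_runs` is a simplified model of the queue, not its actual implementation.

```python
# Hypothetical exit code reserved for "please retry". Any value works as long
# as the test harness and the task definition agree on it.
RETRY_EXIT_CODE = 75


def wpt_task_definition(command: list[str], retries: int = 2) -> dict:
    """Build a partial Taskcluster task definition (a sketch, not Servo's real one)."""
    return {
        # Queue-level setting: how many times the queue may re-run the task
        # after it is resolved as "exception" (the queue default is 5).
        "retries": retries,
        "payload": {
            "command": command,
            # Worker-level setting: exit codes that mean "exception", not "failed".
            "onExitStatus": {"retry": [RETRY_EXIT_CODE]},
        },
    }


def simulate_runs(exit_codes: list[int], task: dict) -> tuple[str, int]:
    """Simplified model of the queue: return (final resolution, number of runs used)."""
    retry_codes = task["payload"]["onExitStatus"]["retry"]
    max_runs = task["retries"] + 1  # the first run plus up to `retries` re-runs
    runs = 0
    for code in exit_codes[:max_runs]:
        runs += 1
        if code == 0:
            return "completed", runs
        if code not in retry_codes:
            return "failed", runs
    # Every run asked for a retry and the retry budget is spent.
    return "exception", runs
```

With `retries=2`, a task that keeps exiting with the retry code runs at most 3 times before the queue gives up and leaves it as “exception”.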


-----

In this repository, PRs often fail to land because of some intermittent WPT failure. We mark some test filenames as known intermittents whose failures we ignore, but I suspect that at least some of the non-determinism is only weakly correlated with the filename, or not at all: PRs still regularly fail at first and then land after one or more retries.

Retrying an entire PR through homu is costly in terms of overall cycle time. https://github.com/servo/servo/pull/23383 can help, but not when another PR was merged in the meantime. That is common when homu’s queue is non-empty, since after a failure homu starts on the next PR faster than a human can type the `retry` command.

The Taskcluster queue’s retry mechanism is more fine-grained: it works at the task level instead of the PR level. Because the second try runs fewer tests (for example, one of the 6 WPT chunks), it is less likely to hit another random/intermittent failure.
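
A back-of-the-envelope model of why task-level retries help. It assumes each of the 6 WPT chunks independently hits an intermittent failure with some probability `p`. That is a hypothetical number; real intermittents are neither independent nor uniform. A PR-level retry needs every chunk to pass in the same attempt, while a task-level retry only needs each chunk to pass in one of its own attempts. The function names are illustrative.

```python
def pr_level_pass_probability(p: float, chunks: int, attempts: int) -> float:
    """Chance the PR lands when every attempt re-runs all chunks together."""
    one_attempt = (1 - p) ** chunks  # all chunks pass in a single attempt
    return 1 - (1 - one_attempt) ** attempts


def task_level_pass_probability(p: float, chunks: int, attempts: int) -> float:
    """Chance the PR lands when each chunk is retried on its own, up to `attempts` runs."""
    one_chunk = 1 - p ** attempts  # a chunk fails only if every one of its runs fails
    return one_chunk ** chunks
```

For example, with `p = 0.05`, 6 chunks and 2 attempts, this model gives roughly 0.93 for PR-level retries versus roughly 0.985 for task-level retries, and the task-level retry only re-runs the chunk that failed.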

~However, the downside is multiplying the time it takes to report deterministic test failures when a PR breaks something. This could be limited by setting a low retry count, like 2 or 3, for WPT tasks.~ (https://github.com/servo/servo/pull/24768 greatly reduced this time loss.)
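
The title’s “few failures” idea could also bound the cost of deterministic failures. The harness would request a retry only when a chunk has a small number of unexpected results and would fail immediately otherwise. This is a hypothetical sketch. It assumes the wrapper around the WPT run already counts unexpected results. `exit_code_for`, `MAX_FAILURES_TO_RETRY` and the exit code 75 are illustrative, not existing Servo code, and the retry code must match the task’s `onExitStatus.retry` list.

```python
# Hypothetical values: a tuning threshold and a retry exit code that must match
# the task's onExitStatus.retry list.
RETRY_EXIT_CODE = 75
MAX_FAILURES_TO_RETRY = 3


def exit_code_for(unexpected_failures: int) -> int:
    """Pick the process exit code from the number of unexpected WPT results."""
    if unexpected_failures == 0:
        return 0  # success
    if unexpected_failures <= MAX_FAILURES_TO_RETRY:
        return RETRY_EXIT_CODE  # few failures: plausibly intermittent, let the queue retry
    return 1  # many failures: likely a real regression, fail without retrying
```

A PR that breaks many tests then fails on the first run, so only chunks with a handful of unexpected results pay the retry cost.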

@jdm, what do you think?

Automatically retry WPT tasks with few failures? #23655

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Automatically retry WPT tasks with few failures? #23655

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions