test(test-runner): test round-robin sharding with stable-test-runner #33032

muhqu · 2024-10-09T18:24:56Z

This PR contains the changes of #30962 applied as patch to the stable-test-runner.

Here's a comparison of the run times for the tests in CI…

The times above are from these test runs…

github-actions · 2024-10-09T18:54:28Z

Test results for "tests 1"

3 flaky

⚠️ [installation tests] › playwright-electron-should-work.spec.ts:21:5 › electron should work @package-installations-macos-latest
⚠️ [chromium] › components/splitView.spec.tsx:35:5 › should render sidebar first @web-components-web
⚠️ [webkit-library] › library/browsercontext-viewport-mobile.spec.ts:157:5 › mobile viewport › mouse should work with mobile viewports and cross process navigations @webkit-ubuntu-22.04-node18

35869 passed, 620 skipped
✔️✔️✔️

Merge workflow run.

pavelfeldman · 2024-10-09T23:41:36Z

Do I understand it right that if the tests had beforeAll / worker fixture setups (as in real e2e tests), we would observe an opposite effect?

pavelfeldman · 2024-10-09T23:45:19Z

I'm referring to this part:

+ *        [  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12]
+ * Shard 1:  ^---------^                                      : [  1, 2, 3 ]
+ * Shard 2:              ^---------^                          : [  4, 5, 6 ]
+ * Shard 3:                          ^---------^              : [  7, 8, 9 ]
+ * Shard 4:                                      ^---------^  : [ 10,11,12 ]
+ * ```
+ * Shards tests by round-robin.
+ *
+ * ```
+ *          [  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12]
+ * Shard 1:    ^               ^               ^              : [  1, 5, 9 ]
+ * Shard 2:        ^               ^               ^          : [  2, 6,10 ]
+ * Shard 3:            ^               ^               ^      : [  3, 7,11 ]
+ * Shard 4:                ^               ^               ^  : [  4, 8,12 ]
+ * ```

By default, we recommend using fullyParallel. This means that for the tests with non-trivial hooks we would do much more work in case of round-robin as we'll be running those hooks 3x more times. Making sure we are on the same page wrt the idea behind your change.
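To make the trade-off concrete, here is a minimal sketch of the two strategies from the diagram above. The helper names (`shardContiguous`, `shardRoundRobin`, `countFileSetups`) and the sample data are hypothetical and are not Playwright's implementation. The sketch also assumes, as a simplification, that a file-level hook runs once per file per shard; in practice hooks run per worker, so real numbers will differ.

```typescript
// Illustrative sketch only; these helpers are hypothetical, not Playwright's code.
interface TestEntry {
  id: number;
  file: string;
}

// Contiguous sharding: each shard receives one consecutive slice of the list.
function shardContiguous<T>(items: T[], total: number): T[][] {
  const shards: T[][] = [];
  for (let i = 0; i < total; i++) {
    const start = Math.floor((i * items.length) / total);
    const end = Math.floor(((i + 1) * items.length) / total);
    shards.push(items.slice(start, end));
  }
  return shards;
}

// Round-robin sharding: the test at position k goes to shard k % total.
function shardRoundRobin<T>(items: T[], total: number): T[][] {
  const shards: T[][] = Array.from({ length: total }, () => [] as T[]);
  items.forEach((item, index) => shards[index % total].push(item));
  return shards;
}

// Assumption: a file-level setup (e.g. beforeAll) runs once per file per shard.
function countFileSetups(shards: TestEntry[][]): number {
  return shards.reduce(
    (sum, shard) => sum + new Set(shard.map(t => t.file)).size,
    0,
  );
}

// Sample data: 12 tests in 4 files of 3 tests each (tests 1-3 in a.spec.ts, ...).
const sampleFiles = ['a.spec.ts', 'b.spec.ts', 'c.spec.ts', 'd.spec.ts'];
const sampleTests: TestEntry[] = Array.from({ length: 12 }, (_, i) => ({
  id: i + 1,
  file: sampleFiles[Math.floor(i / 3)],
}));

const contiguousShards = shardContiguous(sampleTests, 4);
const roundRobinShards = shardRoundRobin(sampleTests, 4);

console.log(countFileSetups(contiguousShards)); // 4: one file per shard
console.log(countFileSetups(roundRobinShards)); // 12: three files per shard
```

With this sample layout, round-robin touches three files per shard instead of one, which is where the 3x figure for hook executions comes from.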

muhqu · 2024-10-10T08:52:58Z

I think beforeAll kinda breaks fully-parallel as it causes tests in a spec file to be executed in the same test group, but in general you are right, round-robin can cause worker scoped hooks and fixtures to be executed more times than the current sharding algorithm. And setup / teardown times of worker scoped fixtures will probably not be included in test run times, so the duration-round-robin algorithm can not account for those yet…

In our use-case, we are not making use of beforeAll, but instead have a heavy global setup. With global setup there is no difference between the sharding modes, as it's executed simply once per shard (not worker) regardless of the tests which will be run per shard.

There clearly is no silver bullet. However, I think that this solution gives the playwright user more flexibility to try-out and choose whatever works best for them.

pavelfeldman · 2024-10-10T15:34:27Z

> There clearly is no silver bullet. However, I think that this solution gives the playwright user more flexibility to try-out and choose whatever works best for them.

Going back to my original sentiment, that's exactly why we are hesitant to land the PR. We are looking for ways to cover more use cases (with hooks) with a smaller API surface / maintenance cost. For example, allow test lists for shards so that users could use third party solutions to tune their exact configuration. Or allow a callback that would take over scheduling tests. Developing those to be easy to maintain requires consideration and time that we can't currently allocate to the problem. But we are very open to keeping this communication in case a nice proposal comes up.

muhqu · 2024-10-10T16:09:01Z

Would you be open to an approach that allows users to implement their own test filter? …basically similar to grep, but instead of RegEx use a function that takes a list of test infos and returns a filtered list...

That would actually allow users to implement their own sharding logic.
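As a rough sketch of that idea, sharding can be written as a filter over the full test list. Everything here is hypothetical: `TestInfoLike`, `TestFilter`, and `roundRobinShardFilter` are illustrative names, not an existing Playwright API, and the shard index is zero-based for simplicity.

```typescript
// Hypothetical shape of a test descriptor; not a Playwright type.
interface TestInfoLike {
  id: string;
  file: string;
  title: string;
}

// Hypothetical filter signature: takes all tests, returns the ones to run.
type TestFilter = (tests: TestInfoLike[]) => TestInfoLike[];

// Builds a filter that keeps every shardTotal-th test, starting at shardIndex.
// Sorting by id first keeps the assignment stable across machines, so every
// shard computes the same partition regardless of discovery order.
function roundRobinShardFilter(shardIndex: number, shardTotal: number): TestFilter {
  return tests => {
    const ordered = [...tests].sort((a, b) =>
      a.id < b.id ? -1 : a.id > b.id ? 1 : 0,
    );
    return ordered.filter((_, position) => position % shardTotal === shardIndex);
  };
}

// Usage: five tests in arbitrary discovery order, split across two shards.
const discoveredTests: TestInfoLike[] = ['t3', 't1', 't5', 't2', 't4'].map(id => ({
  id,
  file: 'example.spec.ts',
  title: `test ${id}`,
}));

const firstShard = roundRobinShardFilter(0, 2)(discoveredTests);
const secondShard = roundRobinShardFilter(1, 2)(discoveredTests);
console.log(firstShard.map(t => t.id), secondShard.map(t => t.id));
```

Because each shard applies the same deterministic filter to the same list, the shards are disjoint and together cover every test without any coordination between machines.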

pavelfeldman · 2024-10-10T17:32:26Z

> Would you be open to an approach that allows users to implement their own test filter? …basically similar to grep, but instead of RegEx use a function that takes a list of test infos and returns a filtered list...

I think it is worth exploring. The other option is a test list file. We know that people want test lists, but we are struggling with committing to a persistent test id that would be used in those (it becomes a part of our contract). Lists are also sub-optimal for those interested in sharding as shards have different lists. But many more customers are interested in custom failure retries where test lists are very useful, so it might be that committing is worth it.

The callback approach is potentially simpler as we can use the reporter data types there, but it might hit some rocks as we would allow users to specify the order of the tests and that might result in suboptimal worker process alignment. Just thinking out loud here.

In either case, I like having lower-level primitives that allow for greater flexibility for power users to tune Playwright to their definition of perfection. Much more than having a handful of suboptimal presets that will only work for a couple of customers.

muhqu · 2024-10-10T18:29:23Z

Okay, sounds good. I just looked into filtering and implemented a first version: #33049

test(test-runner): test round-robin sharding with stable-test-runner

7f79e1f

muhqu mentioned this pull request Oct 9, 2024

feat(test runner): improve sharding algorithm to better spread similar tests among shards #30962

Closed

muhqu closed this Oct 10, 2024

muhqu mentioned this pull request Oct 18, 2024

[Feature] Split shards via test timing data #17969

Open

gpaciga mentioned this pull request Jun 9, 2025

[Feature]: Option to distribute projects evenly among shards #36253

Closed
