Add `./mach timings` by SimonSapin · Pull Request #9 · servo/taskcluster-config

SimonSapin · 2019-11-17T10:47:35Z

Summarizes the time taken by tasks in the task groups with the given IDs.

Example:

```
$ ./mach timings DL4zftVfSqW3WOTC3IoFcg DBt9ki9gTdWmwAk-VDorzw HKX2tko_RgGd6--031IfOA
https://community-tc.services.mozilla.com/tasks/groups/DL4zftVfSqW3WOTC3IoFcg
count 1, total 0:00:29, max: 0:00:29	docker	0:00:29
count 2, total 1:21:41, max: 0:47:55	macos-disabled-mac8	0:47:55 0:33:45
count 6, total 2:47:04, max: 0:37:12	macos-disabled-mac8 WPT	0:23:30 0:29:24 0:28:13 0:37:12 0:20:35 0:28:09
https://community-tc.services.mozilla.com/tasks/groups/DBt9ki9gTdWmwAk-VDorzw
count 1, total 0:00:32, max: 0:00:32	docker	0:00:32
count 1, total 0:59:14, max: 0:59:14	macos-disabled-mac1	0:59:14
count 6, total 4:12:16, max: 1:01:14	macos-disabled-mac1 WPT	0:40:29 0:18:55 0:46:50 0:44:38 1:01:14 0:40:10
count 1, total 0:55:19, max: 0:55:19	macos-disabled-mac9	0:55:19
count 6, total 4:25:09, max: 1:01:40	macos-disabled-mac9 WPT	0:37:58 0:37:24 0:27:18 1:01:40 0:46:17 0:54:31
https://community-tc.services.mozilla.com/tasks/groups/HKX2tko_RgGd6--031IfOA
count 1, total 0:00:49, max: 0:00:49	docker	0:00:49
count 7, total 2:49:55, max: 0:32:28	macos-disabled-mac8 WPT	0:32:28 0:29:55 0:08:55 0:19:44 0:27:36 0:31:41 0:19:35
count 7, total 3:30:39, max: 0:45:07	macos-i3 WPT	0:23:23 0:37:31 0:31:03 0:37:58 0:45:07 0:08:17 0:27:20
count 13, total 4:39:15, max: 0:38:49	macos-i7 WPT	0:37:12 0:20:01 0:18:59 0:17:29 0:38:49 0:17:00 0:20:02 0:26:49 0:20:03 0:09:17 0:20:02 0:19:55 0:13:39
```
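For readers curious how summary lines like these can be produced, here is a minimal Python sketch of the summarizing step. It is not the code from this PR. It assumes task data shaped like a response from the Taskcluster queue's `listTaskGroup` endpoint (a `tasks` list whose entries carry `task.metadata.name` and `status.runs` with `started`/`resolved` timestamps), and it leaves out fetching that data, including following `continuationToken` pages. The helper names and the default grouping by task name are hypothetical; the real tool groups tasks under labels such as `macos-disabled-mac8 WPT`, so its grouping key may well differ.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def parse_time(stamp):
    # ISO 8601 timestamps such as "2019-11-17T10:47:35.123Z".
    # Replacing "Z" keeps this working on Python versions before 3.11.
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def fmt(delta):
    # Round to whole seconds so str() prints H:MM:SS, e.g. "0:00:29"
    return str(timedelta(seconds=round(delta.total_seconds())))


def task_name(entry):
    # Hypothetical default grouping key: the task's metadata name
    return entry["task"]["metadata"]["name"]


def group_durations(response, key=task_name):
    """Collect the duration of every finished run, grouped by key(entry)."""
    groups = defaultdict(list)
    for entry in response["tasks"]:
        for run in entry["status"].get("runs", []):
            if "started" in run and "resolved" in run:  # skip unfinished runs
                duration = parse_time(run["resolved"]) - parse_time(run["started"])
                groups[key(entry)].append(duration)
    return groups


def summarize(response, key=task_name):
    """Return one 'count, total, max' line per group, sorted by label."""
    lines = []
    for label, durations in sorted(group_durations(response, key).items()):
        # Sum the unrounded durations; only displayed values are rounded
        total = sum(durations, timedelta())
        lines.append("count {}, total {}, max: {}\t{}\t{}".format(
            len(durations), fmt(total), fmt(max(durations)), label,
            " ".join(fmt(d) for d in durations)))
    return lines


if __name__ == "__main__":
    sample = {"tasks": [{
        "task": {"metadata": {"name": "docker"}},
        "status": {"runs": [{"started": "2019-11-17T10:00:00.000Z",
                             "resolved": "2019-11-17T10:00:29.000Z"}]},
    }]}
    for line in summarize(sample):
        print(line)
```

Summing before rounding may also explain why the `1:21:41` total in the first group above is one second more than the sum of its two displayed durations.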


Split WPT macOS testing into many more chunks (servo/servo#24768)

## Before this

Before this PR, we had roughly as many chunks as available workers. Because the number of test files is a poor estimate of the time needed to run them, completion times vary significantly between chunks when testing a given PR. servo/taskcluster-config#9 adds a tool to collect this data. Here are two full runs of `test_wpt` before this PR:

https://community-tc.services.mozilla.com/tasks/groups/DBt9ki9gTdWmwAk-VDorzw

```
count 1, total 0:00:32, max: 0:00:32	docker	0:00:32
count 1, total 0:59:14, max: 0:59:14	macos-disabled-mac1	0:59:14
count 6, total 4:12:16, max: 1:01:14	macos-disabled-mac1 WPT	0:40:29 0:18:55 0:46:50 0:44:38 1:01:14 0:40:10
count 1, total 0:55:19, max: 0:55:19	macos-disabled-mac9	0:55:19
count 6, total 4:25:09, max: 1:01:40	macos-disabled-mac9 WPT	0:37:58 0:37:24 0:27:18 1:01:40 0:46:17 0:54:31
```

Times for a given chunk vary between 19 minutes and 61 minutes. Assuming no `try` testing, Homu’s serial scheduling of `r+` testing means that the worker that ran the 19-minute chunk then sits idle for 42 minutes, and our limited CPU resources are under-utilized.

When there *are* `try` PRs being tested, however, they compete with each other and with any `r+` PR for the same workers. If we get unlucky, a 61-minute task could only *start* after some other tasks have finished, increasing the overall time-to-merge a lot.

## This

This PR makes the number of chunks significantly larger than the number of available workers. When a worker finishes one chunk, it can pick up another instead of sitting idle. Now the ratio of the number of tasks to the number of workers doesn’t matter: the differences in run time between tasks become somewhat of an advantage, and the distribution of work across workers evens out on average.

The chosen number of chunks, 30, is a bit arbitrary. A higher number reduces resource under-utilization but increases the effect of per-task overhead. The git cache added in servo#24753 reduced that overhead, though.
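The scheduling argument can be checked with a small simulation. This Python sketch is not part of the PR: it models workers that each pull the next chunk from a shared queue as soon as they are free (greedy list scheduling), ignores competition from `try` PRs, and assumes six workers, matching the six chunks of the `macos-disabled-mac1 WPT` run above. Splitting each old chunk into five equal pieces is a hypothetical stand-in for the real 30-chunk split, and `overhead` is a hypothetical fixed per-chunk cost for exploring the trade-off described above.

```python
import heapq


def hms_to_seconds(hms):
    # "1:01:14" -> 3674
    h, m, s = map(int, hms.split(":"))
    return h * 3600 + m * 60 + s


def simulate(durations, workers, overhead=0):
    """Greedy list scheduling: each chunk, in order, goes to whichever worker
    becomes free first. `overhead` is a hypothetical fixed cost per chunk.

    Returns (time when the last chunk finishes, idle worker-time before then).
    """
    free_at = [0.0] * workers  # a list of equal values is already a valid heap
    for d in durations:
        start = heapq.heappop(free_at)  # earliest-free worker takes the chunk
        heapq.heappush(free_at, start + overhead + d)
    end = max(free_at)
    busy = sum(durations) + overhead * len(durations)
    return end, workers * end - busy


# The six macos-disabled-mac1 WPT chunks from the run above, one per worker
old = [hms_to_seconds(t) for t in
       "0:40:29 0:18:55 0:46:50 0:44:38 1:01:14 0:40:10".split()]
print(simulate(old, workers=6))  # (3674.0, 6908.0)

# Hypothetically cut each chunk into five equal pieces: same work, 30 chunks
new = [d / 5 for d in old for _ in range(5)]
print(simulate(new, workers=6))
```

With one chunk per worker, the run ends with the longest chunk (about 61 minutes) while the other workers accumulate about 115 idle worker-minutes. With the 30 smaller chunks, greedy scheduling guarantees a finish time of at most the average load per worker plus five sixths of the largest chunk, about 52 minutes here. Raising `overhead` shows how per-task costs eat into that gain.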
Another worry I had was whether this would worsen the similar problem of unequal scheduling between processes within a task, where some CPU cores sit idle while the remaining processes finish their assigned work. This turned out not to be enough of a problem to negatively affect the total machine time:

https://community-tc.services.mozilla.com/tasks/groups/VnDac92HQU6QmrpzWPCR2w

```
count 1, total 0:00:48, max: 0:00:48	docker	0:00:48
count 1, total 0:39:04, max: 0:39:04	macos-disabled-mac9	0:39:04
count 31, total 4:03:29, max: 0:15:29	macos-disabled-mac9 WPT	0:07:26 0:08:39 0:04:21 0:07:13 0:12:47 0:10:11 0:04:01 0:03:36 0:10:43 0:12:57 0:04:47 0:04:06 0:10:09 0:12:00 0:12:42 0:04:40 0:04:24 0:12:20 0:12:15 0:03:03 0:07:35 0:11:35 0:07:01 0:04:16 0:09:40 0:05:08 0:05:01 0:06:29 0:15:29 0:02:28 0:06:27
```

(The 4h03min total is even lower than the totals above, but this seems within normal variation.)

## After this

servo#23655 proposes automatically restarting failed WPT tasks in case the failure is intermittent. With the test suite split into more chunks, each chunk contains fewer tests and therefore has a lower probability of failing. Restarting a failed chunk also repeats less work.
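The restart argument can be made concrete with a simple probability model. This Python sketch assumes, hypothetically, that every test fails intermittently and independently with the same probability `p` and that a restart reruns the whole chunk once; the total number of tests and the value of `p` are made-up illustration values, not figures from Servo’s test suite.

```python
def chunk_failure_probability(tests_per_chunk, p):
    """Chance that at least one test in a chunk fails, assuming every test
    fails intermittently and independently with probability p."""
    return 1 - (1 - p) ** tests_per_chunk


def expected_rerun_tests(total_tests, chunks, p):
    """Expected number of tests re-executed when each failed chunk is
    restarted once, assuming a restart reruns the whole chunk."""
    per_chunk = total_tests / chunks
    return chunks * chunk_failure_probability(per_chunk, p) * per_chunk


# Hypothetical numbers, for illustration only
TOTAL_TESTS, P = 30_000, 1e-5
for chunks in (6, 30):
    print(chunks,
          round(chunk_failure_probability(TOTAL_TESTS / chunks, P), 4),
          round(expected_rerun_tests(TOTAL_TESTS, chunks, P)))
```

Under this model the chance that *some* chunk fails during a full run, 1 - (1 - p)^N for N tests, does not depend on the number of chunks. What smaller chunks change is the failure probability of each chunk and the amount of work each restart repeats: for small `p`, the expected number of rerun tests shrinks roughly in proportion to the chunk size.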

SimonSapin mentioned this pull request Nov 18, 2019

Split WPT macOS testing into many more chunks servo/servo#24768

Merged

jdm approved these changes Nov 19, 2019


SimonSapin merged commit 30f2413 into master Nov 19, 2019

SimonSapin deleted the timings branch November 19, 2019 20:35
