Add parallel execution mode for integration tests#3522
Conversation
ccd8798 to
4413730
Compare
Motivation: Integration tests regularly take 21m in CI; they're the long pole. Modifications: * Allocation count tests are now run in parallel if `NIO_ALLOC_COUNTER_TESTS_PARALLEL` is `true`, tests in a given file are still in serial. * Test output is catted to different files and recovered and printed in the order in which the tests were started. Result: Integration test execution goes from 21m to 10m
4413730 to
eb22daf
Compare
| echo "- ${files[$i]}" | ||
| cat "$out_dir/$module" |
There was a problem hiding this comment.
Why is this going to stdout but the other branch going to stderr?
There was a problem hiding this comment.
This is the intended output of the script which I believe always went to stdout, the other path represents a failure case so it goes to stderr so as not mess with any parsing of the script output
There was a problem hiding this comment.
Logging should go to stderr. Output that's consumed by another process should go to stdout. It's not clear why the new stuff is going to stderr IMO. I'd expect a downstream tool to see all the modules that ran and whether they passed or failed, not just the ones that passed.
| @@ -289,8 +289,33 @@ build_package \ | |||
| set -eu | |||
There was a problem hiding this comment.
This used to exit if a swift run command failed.
Are we sure it does now since the wait is in an if
There was a problem hiding this comment.
I don't think it will now, we won't early exit, we'll keep going and try every file. I'm not sure this is worse, but it is different, I can look at changing it if you'd like.
There was a problem hiding this comment.
I'm less concerned with whether we exit early, but IIUC this now changes whether we exit non-zero or not if one of the runs fails. We used to do that implicitly with the -e, on the first fail, because they were being run in serial. Now they're being run in parallel it's harder to reason about. I can see it sets all_ok=false if there's a failure, but is something checking this?
Can we just confirm that this fails as expected when there's a failure?
There was a problem hiding this comment.
I have tweaked this (because the other solution didn't get on well on my Mac's version of bash) and confirmed by running a script with this code snippet that failing and succeeding commands are handled as expected.
Add parallel execution mode for integration tests
Motivation:
Integration tests regularly take 21m in CI; they're the long pole.
Modifications:
NIO_ALLOC_COUNTER_TESTS_PARALLEListrue, tests in a given fileare still in serial.
the order in which the tests were started.
Result:
Integration test execution (and therefore a PR's CI checks) goes from 21m to 10m