Skip to content

CI: Test scripts w. no external dependencies#2143

Closed
Lestropie wants to merge 1 commit intodevfrom
CI_test_scripts
Closed

CI: Test scripts w. no external dependencies#2143
Lestropie wants to merge 1 commit intodevfrom
CI_test_scripts

Conversation

@Lestropie
Copy link
Copy Markdown
Member

Adds "scripts_nodep" input to run_tests script, which executes only tests for those Python scripts absent from the blacklist identifying those scripts with external software dependencies.

Related to #2134, but seeking to integrate more regular testing of as many scripts as possible; testing of those scripts with external dependencies could potentially be done only for merges to master. Will find out from the checks on this PR how much additional runtime is required.

Adds "scripts_nodep" input to run_tests script, which executes only tests for those Python scripts absent from the blacklist identifying those scripts with external software dependencies.
@Lestropie
Copy link
Copy Markdown
Member Author

CI timing changes:
Linux: 35m -> 2h25m
Mac: 40m -> 1h30m
MSYS2: 45m -> 3h15m

Main culprits (at least when run locally) are dwi2response, dwigradcheck and population_template. dwigradcheck could be sped up trivially, and I could have only the Mac CI build run the script tests; reckon that would be satisfactory?

@Lestropie
Copy link
Copy Markdown
Member Author

A potential alternative solution would be to define a separate text file that contains only those script tests intended to be run as part of CI. So for instance only one variant of each dwi2response algorithm would be invoked, only one population_template test would run, etc.. It may not catch all potential interactions between C++ code modifications and Python scripts, and it would require additional maintenance for the contents of that test list file, but it would enable a reasonably quick test of the scripts to be integrated into CI, and it would allow for those tests to be run on all three OS's within CI.

@Lestropie Lestropie marked this pull request as draft August 25, 2020 00:39
@bjeurissen
Copy link
Copy Markdown
Member

It definitely makes sense to add scripts to CI. I feel only adding it for Mac could offer a good balance between test coverage and runtime.

What is the reason for the large discrepancies between the platforms? Is that because they have different amounts of resources allocated?

@bjeurissen
Copy link
Copy Markdown
Member

What is the reason for the large discrepancies between the platforms? Is that because they have different amounts of resources allocated?

I had a look and the resources for each platform are the same. I believe those massive Windows Python runtimes are indicative of using "msys/python3", which uses POSIX emulation, rather than "mingw64/mingw-w64-x86_64-python3", which is native Windows python. I think we should be using the latter?

@maxpietsch
Copy link
Copy Markdown
Member

population_template test runtime can be reduced by a factor 2 by deactivating the multi-contrast tests.

Instead of additional tests files, how about deactivating non-essential standalone script tests stochastically for CI testing? This speeds up testing but we'd still eventually catch test failures.

Something along the lines of ([ ! -z "$CI" ] || (($RANDOM % 2))) && <test>? Could also include some git-fu to check if a file of interest was changed which would trigger to run all tests:
run_all=$(git --no-pager diff --name-only $(git branch --show-current) $(git merge-base $(git branch --show-current) master) | grep 'mrtransform\|mrregister\|registration\|population_template')

@jdtournier
Copy link
Copy Markdown
Member

What is the reason for the large discrepancies between the platforms? Is that because they have different amounts of resources allocated?

I had a look and the resources for each platform are the same. I believe those massive Windows Python runtimes are indicative of using "msys/python3", which uses POSIX emulation, rather than "mingw64/mingw-w64-x86_64-python3", which is native Windows python. I think we should be using the latter?

Just to follow up on that: I'm not sure the python runtime should make all that much difference, the bulk of the work should be performed within the binaries. But more to the point, I had a go at checking the difference in performance on my Windows 10 laptop for some of these scripts between MSYS2 and WSL2 (which I've recently installed). This was done using this branch, and just timing the tests for dwi2response (timing the second run to ensure all data are synchronised and testing commands built first):

using MSYS2:

$ time ./run_tests dwi2response
logging to "testing_dwi2response.log"
checking for requisite test data... Synchronized OK
building testing commands... OK
Erasing old script test output directory... OK
running "testing/scripts/tests/dwi2response"... 43 of 43 passed

real    11m44.009s
user    0m2.616s
sys     0m6.493s

using WSL2:

$ time ./run_tests dwi2response
logging to "testing_dwi2response.log"
checking for requisite test data... Synchronized OK
building testing commands... OK
Erasing old script test output directory... OK
running "testing/scripts/tests/dwi2response"... 43 of 43 passed

real    51m44.493s
user    13m57.978s
sys     10m55.893s

😮 not what I was expecting (are at least hoping for)! I was hoping WSL2 would bring performance improvements, not a 4x drop on the exact same hardware... I might run it again just to make sure, this is really weird.

@bjeurissen
Copy link
Copy Markdown
Member

Wow... What happens if you benchmark dwi2fod between the two?

@bjeurissen
Copy link
Copy Markdown
Member

@jdtournier
Copy link
Copy Markdown
Member

OK, it looks like my custom WSL2 setup was probably to blame... I'd set it up to use the same home directory as the regular Windows one, using these instructions. If I revert to a vanilla WSL2 setup, the numbers are actually far more impressive:

$ time ./run_tests dwi2response
logging to "testing_dwi2response.log"
checking for requisite test data... Synchronized OK
building testing commands... OK
Erasing old script test output directory... OK
running "testing/scripts/tests/dwi2response"... 43 of 43 passed

real    5m56.313s
user    10m31.748s
sys     0m29.153s

So it looks like WSL2 is a substantial improvement after all - as long as you don't mess with the setup...

@jdtournier
Copy link
Copy Markdown
Member

On the topic of WSL, a nice bonus is that mrpeek works out of the box - though you need to use the wsltty terminal to get it to work:

mrpeek

@Lestropie
Copy link
Copy Markdown
Member Author

Closing in facvor of #2154.

@Lestropie Lestropie closed this Feb 7, 2021
@Lestropie Lestropie deleted the CI_test_scripts branch February 7, 2021 12:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants