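We use Python's multiprocessing.Pool (see get_multiprocessing_pool in tools/shared.py) to compile system libraries and similar batches of work in parallel, so that all cores are used. This may be slower on Windows, where multiprocessing can only use the "spawn" start method, which launches a fresh Python interpreter for each worker. Perhaps there is a faster way to run a batch of shell commands in parallel?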
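One possible direction, offered as an untested sketch rather than a measured fix: when each task is just a shell command, the Python side only waits on a child process, so a thread pool from concurrent.futures can drive the subprocesses without spawning an extra Python interpreter per worker. The helper run_commands_in_parallel below is hypothetical and is not part of tools/shared.py; whether it is actually faster than multiprocessing.Pool on Windows would need to be benchmarked.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_commands_in_parallel(commands, max_workers=None):
    """Hypothetical helper: run each command (a list of args), max_workers at a time.

    Threads suffice because each worker just blocks in subprocess.run while the
    child process does the real work, so no extra Python interpreter is started
    per worker (unlike multiprocessing.Pool with the spawn start method).
    """
    if max_workers is None:
        max_workers = os.cpu_count() or 1

    def run_one(cmd):
        # Capture output so that parallel jobs don't interleave on the console.
        return subprocess.run(cmd, capture_output=True, text=True)

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map returns results in the same order as the input commands.
        return list(executor.map(run_one, commands))


if __name__ == "__main__":
    # Stand-in commands; a real caller would pass compiler invocations instead.
    cmds = [[sys.executable, "-c", f"print({i} * {i})"] for i in range(8)]
    for result in run_commands_in_parallel(cmds):
        print(result.returncode, result.stdout.strip())
```

The thread pool keeps the existing "one task per command" structure, so swapping it in would mainly change how workers are started, not how the build steps are described.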