joblib.Parallel is not efficient at scheduling small tasks because of interprocess communication overhead. Currently this can be addressed by writing a wrapper function that works on a group of tasks and grouping the tasks manually before calling joblib.Parallel.
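For reference, here is a minimal sketch of that manual workaround. The helper names `chunked`, `run_group` and `grouped_parallel` are hypothetical and not part of joblib, and `my_function` is a stand-in for the real per-item work. Only `Parallel` and `delayed` come from the library.

```python
from itertools import islice

from joblib import Parallel, delayed


def my_function(x):
    # Stand-in for a cheap per-item task (assumption for illustration).
    return x * x


def chunked(iterable, size):
    # Yield successive lists of at most `size` items from `iterable`.
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def run_group(func, group):
    # Wrapper that processes a whole group of inputs inside one task.
    return [func(item) for item in group]


def grouped_parallel(func, inputs, n_jobs=2, group_size=10):
    # Dispatch one joblib task per group rather than one per input,
    # then flatten the per-group results back into a single list.
    nested = Parallel(n_jobs=n_jobs)(
        delayed(run_group)(func, group)
        for group in chunked(inputs, group_size)
    )
    return [result for group in nested for result in group]
```

Calling `grouped_parallel(my_function, many_many_inputs, n_jobs=42, group_size=10)` should return the same results, in the same order, as dispatching each input individually, with far fewer tasks sent to the workers.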
It would be more convenient to make joblib.Parallel do grouped dispatch internally by just passing the size of the group:
output = joblib.Parallel(n_jobs=42, group_size=10)(
    delayed(my_function)(one_input) for one_input in many_many_inputs)