Performance: joblib.Parallel is significantly slower than ProcessPoolExecutor for tasks with large objects #1733
Problem description
When running parallel tasks on a large, complex object (in this case, a pyvista.PolyData mesh), joblib.Parallel is substantially outperformed by the standard concurrent.futures.ProcessPoolExecutor.
Even when a manual batch_size is specified to reduce dispatching overhead, joblib remains roughly 5x slower (and with automatic batching, more than 50x). The gap suggests that joblib may be incurring significant overhead from repeatedly serializing the large input object, whereas ProcessPoolExecutor.map appears to amortize this cost across each chunk.
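One plausible mechanism for the gap (a sketch, not a confirmed diagnosis): when ProcessPoolExecutor.map is given a chunksize, each chunk of argument tuples is pickled as a single payload, and pickle's memo table serializes the repeated mesh reference only once per chunk. The stdlib-only snippet below illustrates this with a plain bytes payload standing in for the mesh:

```python
import pickle

# Hypothetical stand-in for the large mesh: a ~10 MB bytes payload.
big = bytes(10_000_000)

# Pickling the object once:
single = pickle.dumps(big)

# Pickling a "chunk" of 64 tasks that all reference the same object,
# as ProcessPoolExecutor.map does when fed itertools.repeat(mesh):
chunk = pickle.dumps([(big, i) for i in range(64)])

# The chunk payload is barely larger than a single copy, because
# pickle's memo table emits the repeated reference only once.
print(len(single), len(chunk))
```

If joblib's dispatch path re-pickles the object per batch (or per task, with small automatic batches), the cost would scale with the number of batches rather than being paid once per chunk, which would be consistent with the numbers below.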
Minimal Reproducible Example
This script sets up a simple parallel task accessing data from a large pyvista.PolyData object and benchmarks joblib against concurrent.futures.
```python
# /// script
# requires-python = ">=3.12"
# dependencies = [
#     "joblib",
#     "pyvista",
# ]
# ///
import concurrent.futures
import itertools
import time

import joblib
import pyvista as pv


def process(mesh: pv.PolyData, idx: int) -> float:
    """A simple function that accesses data from the mesh."""
    return mesh.points[idx, 0]


def main() -> None:
    # Create a large mesh object to demonstrate the issue.
    # A smaller `level` will reduce the mesh size and the performance gap.
    mesh: pv.PolyData = pv.Box(level=200)
    N_JOBS: int = 1000

    print("Mesh details:")
    print(mesh)
    print("-" * 20)

    with joblib.parallel_config(n_jobs=8, verbose=1, prefer="processes"):
        # Benchmark joblib with automatic batching
        time_start: float = time.perf_counter()
        parallel = joblib.Parallel(batch_size="auto")
        _ = parallel(joblib.delayed(process)(mesh, i) for i in range(N_JOBS))
        time_end: float = time.perf_counter()
        rate: float = N_JOBS / (time_end - time_start)
        print(f"joblib (auto batch size): {rate:.2f} it/sec")

        # Benchmark joblib with manual batching
        time_start = time.perf_counter()
        parallel = joblib.Parallel(batch_size=N_JOBS // 16)
        _ = parallel(joblib.delayed(process)(mesh, i) for i in range(N_JOBS))
        time_end = time.perf_counter()
        rate = N_JOBS / (time_end - time_start)
        print(f"joblib (manual batch size): {rate:.2f} it/sec")

    # Benchmark concurrent.futures.ProcessPoolExecutor
    time_start = time.perf_counter()
    with concurrent.futures.ProcessPoolExecutor(max_workers=8) as executor:
        _ = executor.map(
            process,
            itertools.repeat(mesh),
            range(N_JOBS),
            chunksize=N_JOBS // 16,
        )
    time_end = time.perf_counter()
    rate = N_JOBS / (time_end - time_start)
    print(f"concurrent.futures: {rate:.2f} it/sec")


if __name__ == "__main__":
    main()
```
Observed Behavior
| Implementation | Throughput (it/sec) |
|---|---|
| joblib (auto batch size) | 32.26 |
| joblib (manual batch size) | 322.58 |
| concurrent.futures | 1829.17 |
The output clearly shows that concurrent.futures is significantly faster.
```
Mesh details:
PolyData (0x7f3e8c1e9f00)
  N Cells:    242406
  N Points:   242408
  X Bounds:   -1.000e+00, 1.000e+00
  Y Bounds:   -1.000e+00, 1.000e+00
  Z Bounds:   -1.000e+00, 1.000e+00
  N Arrays:   0
--------------------
[Parallel(n_jobs=8)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=8)]: Done  34 tasks      | elapsed:    1.4s
[Parallel(n_jobs=8)]: Done 184 tasks      | elapsed:    6.1s
[Parallel(n_jobs=8)]: Done 434 tasks      | elapsed:   13.7s
[Parallel(n_jobs=8)]: Done 784 tasks      | elapsed:   24.4s
[Parallel(n_jobs=8)]: Done 1000 out of 1000 | elapsed:   31.0s finished
joblib (auto batch size): 32.26 it/sec
[Parallel(n_jobs=8)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=8)]: Done 668 tasks      | elapsed:    1.4s
[Parallel(n_jobs=8)]: Done 1000 out of 1000 | elapsed:    3.1s finished
joblib (manual batch size): 322.58 it/sec
concurrent.futures: 1829.17 it/sec
```
Expected Behavior
joblib's performance was expected to be comparable to ProcessPoolExecutor's, especially since joblib is a library specialized for exactly this kind of parallel workload. Some overhead is understandable, but a >5x difference even with manual batching seems excessive.
Environment
- Python: 3.12.10
- joblib: 1.5.1
- pyvista: 0.45.3
- OS: Linux