Limiting the memory used by a dask-worker using the --memory-limit option seems to have no effect.
Using Python 2.7.5 on a Scientific Linux 7.3 host, with dask 0.14.1 and distributed 1.16.1, I set up a scheduler and a single worker as follows:
dask-scheduler --scheduler-file /tmp/schedfile &
sleep 5
dask-worker --no-nanny --no-bokeh --nprocs 1 --nthreads 1 --memory-limit=250e6 \
--scheduler-file /tmp/schedfile &
I ran the following client code to test:
import dask, dask.distributed, dask.array
cli = dask.distributed.Client(scheduler_file='/tmp/schedfile')
x = dask.array.random.random((2000, 2000), chunks=(10, 10))
y = x.T * x
print(cli.gather(cli.compute(y.sum())))
and watched the memory usage via ps -e -o rss,command | grep python | grep -v grep.
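The same check can also be done from inside Python with the standard-library resource module (a sketch, not part of the original report; note that ru_maxrss is in kilobytes on Linux but in bytes on macOS, so the unit handling here assumes Linux, matching the host above):

```python
import os
import resource

# Peak resident set size of the current process, comparable to the
# RSS column reported by ps on Linux (units: kilobytes on Linux).
peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print("pid %d peak RSS: %d KB (~%.0f MB)" % (os.getpid(), peak_kb, peak_kb / 1024.0))
```

Running this periodically in the worker process (or pointing psutil at the worker's pid from outside) gives the same numbers the ps pipeline shows.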
I expected the RSS of the dask-worker process to stay at or only slightly above the 250 MB limit, but ps reported roughly 450 MB for most of the calculation. With larger array sizes, the RSS rises even further.
Is this a bug? If not, what is the best way to guarantee that a dask-worker never exceeds a hard memory limit?