Open
Description
To avoid generating too much NFS traffic, I use a semaphore to limit how many workers copy data from NFS to local file storage at the same time, so the data can be read locally later:
from dask_jobqueue import SLURMCluster
from distributed import Client, Semaphore
from distutils.dir_util import copy_tree  # assumed origin of copy_tree (it accepts update=True)

cluster = SLURMCluster(cores=16, memory="64 GB",  # processes=1,
                       local_directory="/var/scratch/me/dask_scheduler_spill",
                       interface='ib0', walltime='24:00:00')
# Create a client to submit to.
client = Client(cluster)
# Allocate 10 nodes in the cluster
cluster.scale(10)
print("Waiting for workers")
# Wait until they are ready
client.wait_for_workers(10)
print("Workers are ready!")

# Allow at most 4 concurrent copies across the cluster
sem = Semaphore(max_leases=4, name='data_copy')

def copy_data(location, target, sem):
    # Hold a lease for the duration of the copy
    with sem:
        copy_tree(location, target, update=True)

# Run the copy once on every worker
client.run(copy_data, root_dataset, local_dataset, sem)
sem.close()  # Clean up the semaphore at the scheduler side
When I run it, I get the following error:
site-packages/distributed/semaphore.py", line 472, in release
raise RuntimeError("Released too often")
RuntimeError: Released too often
While debugging, I found that setting processes=1 in the SLURMCluster constructor does **not** fix this, so it looks like a threading issue to me. #4057 should have fixed that, but apparently it did not. Creating the semaphore with sem = Semaphore(max_leases=4, name='data_copy', client=client, register=True) also did not work.
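For readers unfamiliar with this class of error: as far as I can tell, "Released too often" means that release was called on a semaphore object that holds no outstanding lease. The sketch below is only an analogy using the standard library's threading.BoundedSemaphore, not distributed's implementation; the helper name over_release is made up for illustration.

```python
import threading


def over_release(max_leases=4):
    """Acquire once, release twice, and return the resulting error message.

    Stdlib analogy only: this does not use distributed.Semaphore.
    """
    sem = threading.BoundedSemaphore(max_leases)
    sem.acquire()
    sem.release()  # balanced: matches the acquire above
    try:
        sem.release()  # unbalanced: there is nothing left to release
    except ValueError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(over_release())
```

If something similar happens in my case, one release per worker would have to run without a matching acquire on the same semaphore object, but I have not confirmed that this is what is going on.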