Disable locking in to_zarr (needed for using to_zarr in a distributed context)#3607

Merged
mrocklin merged 2 commits into dask:master from jakirkham:no_lock_in_to_zarr
Jun 13, 2018

Conversation

@jakirkham (Member) commented on Jun 13, 2018

It appears that to_zarr was defaulting to store's behavior of using a threading.Lock object for each chunk written. As threading.Lock objects are not picklable, this fails when using Distributed. Since we already guarantee that the chunks are aligned, we can simply disable locking in to_zarr. Includes a test using a small Distributed cluster that fails without this change and passes with it.

  • Tests added / passed
  • Passes flake8 dask
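The root cause can be reproduced with the standard library alone: Distributed pickles task graphs before shipping them to workers, and a `threading.Lock` cannot be pickled. A minimal sketch of the failure (stdlib only; dask and zarr are not required):

```python
# Distributed serializes task graphs with pickle before sending them to
# workers. A threading.Lock embedded in the graph breaks that step.
import pickle
import threading

lock = threading.Lock()
try:
    pickle.dumps(lock)
    picklable = True
except TypeError:
    # CPython raises TypeError: cannot pickle a lock object.
    picklable = False

print(picklable)  # False
```

Because the lock itself is what poisons the graph, removing it (rather than swapping in a different lock type) is sufficient when writes never overlap.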

It appears that locking was enabled in `to_zarr` using the default
`threading.Lock` object. This doesn't work well in a Distributed context.
Given we already ensure the Dask Array will have proper chunking and
Zarr will match it, there is no need for any lock. So simply disable
locking completely.
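The rationale above, that aligned chunks write to disjoint regions and therefore cannot conflict, can be sketched with a stdlib-only toy. The buffer and writer names here are illustrative, not dask API:

```python
import threading

# Each "chunk" writes to its own disjoint slice of the output buffer,
# loosely mirroring how aligned dask chunks map one-to-one onto zarr
# chunks when to_zarr sets up the store.
out = bytearray(16)
chunks = [(0, b"aaaa"), (4, b"bbbb"), (8, b"cccc"), (12, b"dddd")]

def write(offset, data):
    # Disjoint regions: no two writers touch the same bytes, so no lock
    # is required for correctness.
    out[offset:offset + len(data)] = data

threads = [threading.Thread(target=write, args=c) for c in chunks]
for t in threads:
    t.start()
for t in threads:
    t.join()

result = out.decode()
print(result)  # aaaabbbbccccdddd
```

If chunk boundaries did not line up, two writers could touch the same zarr chunk and a lock (or rechunking) would be needed; the guarantee of alignment is what makes `lock=False` safe here.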
@jakirkham mentioned this pull request on Jun 13, 2018
@martindurant (Member)

Good catch.

@jakirkham (Member Author)

Thanks. If you have time to review, that would be appreciated.

@martindurant (Member)

+1

a.to_zarr(d)
a2 = da.from_zarr(d)
assert_eq(a, a2)
assert a2.chunks == a.chunks
Member

This should probably live in tests/test_distributed.py

Also, you should copy the models already in that file, or else read through http://distributed.readthedocs.io/en/latest/develop.html#writing-tests

When you start creating thousands of dask clusters and tornado IOLoops in a process weird things happen if you aren't careful. The testing infrastructure in that doc page includes some well-trusted mechanisms.

Member Author

FWIW I found this usage pattern in that file. I'm not attached to the usage pattern or to which file within Dask this lives in. Just noting that the usage pattern already exists in the code base.

Member

To be clear, I'm referring to creating a bare Client. Is this also what you're referring to? If so can you point me to where this is happening? I'm not seeing this happen within the test_distributed.py file. We always use the cluster context manager and the loop pytest fixture.

Member Author

Sorry, was incorrect about the file. Found it here.

ref: https://github.com/dask/dask/blob/0.17.5/dask/bytes/tests/test_hdfs.py#L300-L312

Member Author

In any event, I have moved this to the Distributed tests and used the `with cluster()` syntax. Please let me know your thoughts on it.

Member

Ah, ok, thanks for hunting that down.

Anyway, I strongly suggest that we stick to the pattern in the dask.distributed documentation, or in the test_distributed.py file.

Member Author

Sure, I agree. I only used it because it was the first thing I found, and I did not know what I was doing. :)

Member Author

Added a PR (#3609) to try to refresh that test.

Make sure that it is possible to store to an on-disk Zarr Array in a
Distributed context.
@jakirkham force-pushed the no_lock_in_to_zarr branch from 540acb4 to e3b2df5 on June 13, 2018 17:35
a.to_zarr(d)
a2 = da.from_zarr(d)
assert_eq(a, a2)
assert a2.chunks == a.chunks
Member

This looks good to me. Thanks @jakirkham

Member Author

Thanks for checking :)

@jakirkham mentioned this pull request on Jun 13, 2018
@mrocklin (Member)

+1 from @martindurant. Merging.

@mrocklin merged commit 411f3c5 into dask:master on Jun 13, 2018
@jakirkham deleted the no_lock_in_to_zarr branch on June 14, 2018
@jakirkham (Member Author)

Thanks all :)
