Delay creating metadata in dask.to_cloudvolume #309

Merged
william-silversmith merged 2 commits into seung-lab:master from chrisroat:delay_metadata
Jan 20, 2020

@chrisroat
Contributor

I was seeing a large slowdown when opening many volumes for writing on a cloud filesystem. All the metadata files were being written from my machine instead of being distributed to workers.

In addition, when compute=False, the caller may never intend to write out data. This PR delays the writing of info and provenance until compute time.

For these reasons, I think we can delay writing the metadata. It's an open question to me if we should always wrap in dask.delayed, regardless of the value of compute. I believe this may be better, and am trying to open a discussion with the dask maintainers on a similar PR in their repo - dask/dask#5797

@chrisroat changed the title from "Delay creating metadata in to_cloudvolume with compute=False" to "Delay creating metadata in dask.to_cloudvolume" on Jan 17, 2020
@chrisroat
Contributor Author

I've updated this PR to always "delay" the initial creation of info and provenance metadata. In reality, all this does is create a dependency such that the metadata is written prior to the data. It simply moves metadata creation from dask graph-construction time to compute time.

For immediate computation, this shouldn't show an effect. For delayed computation, this will move the creation to a worker.
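To illustrate the dependency described above, here is a minimal sketch using `dask.delayed`. It is not the code in this PR: `write_metadata`, `write_chunk`, `build_write_graph`, and the path `example/layer` are hypothetical stand-ins that only record the order in which steps run, and the synchronous scheduler is chosen just to keep the example deterministic.

```python
import dask

# Records the order in which the stand-in steps actually execute.
log = []


def write_metadata(path):
    # Hypothetical stand-in for committing info and provenance.
    log.append(("metadata", path))
    return path


def write_chunk(path, chunk_id):
    # Hypothetical stand-in for writing one block of data.
    # `path` is the result of write_metadata, so this task cannot
    # start until the metadata task has finished.
    log.append(("chunk", chunk_id))
    return chunk_id


def build_write_graph(path, n_chunks):
    # Building the graph runs nothing; it only records dependencies.
    meta = dask.delayed(write_metadata)(path)
    # Every chunk task takes the same delayed metadata object as input,
    # so the metadata is written once, before any chunk.
    return [dask.delayed(write_chunk)(meta, i) for i in range(n_chunks)]


tasks = build_write_graph("example/layer", n_chunks=3)
steps_before_compute = list(log)  # still empty: nothing has executed yet

# Metadata and data are both written only at compute time.
results = dask.compute(*tasks, scheduler="synchronous")
```

With a distributed scheduler, the same graph would run the metadata step on a worker rather than on the machine that built the graph.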

I think it is ready to go. The travis failure seems unrelated.

@william-silversmith
Contributor

lgtm, tests pass when running with the appropriate credentials

@william-silversmith merged commit df2ea10 into seung-lab:master on Jan 20, 2020

Labels

performance Significantly affects time or space.

2 participants