Add MemoryFileSystem by martindurant · Pull Request #2741 · dask/dask

martindurant · 2017-10-04T01:41:27Z

Tests added / passed
Passes flake8 dask
Fully documented, including docs/source/changelog.rst for all changes
and one of the docs/source/*-api.rst files for new API

See dask/fastparquet#215
This is a single global store. It is meant for use only with the threaded scheduler; not sure how useful it is.
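A minimal sketch of what "a single global store" can look like. It assumes a plain dict keyed by path, held as a class attribute so that every filesystem instance in the same process sees the same files. `MemoryFS`, `_MemoryWriter`, `ls` and `rm` are hypothetical names for illustration, not the code in this PR.

```python
import io


class MemoryFS:
    """Hypothetical in-memory filesystem backed by one process-wide store.

    `store` is a class attribute, so all instances in this process share
    the same files; nothing is shared with other processes.
    """

    store = {}  # path -> bytes, shared by every instance in this process

    def open(self, path, mode="rb"):
        if mode == "rb":
            if path not in self.store:
                raise FileNotFoundError(path)
            return io.BytesIO(self.store[path])
        if mode == "wb":
            return _MemoryWriter(self.store, path)
        raise ValueError("unsupported mode: %r" % mode)

    def ls(self, prefix=""):
        return sorted(p for p in self.store if p.startswith(prefix))

    def rm(self, path):
        del self.store[path]


class _MemoryWriter(io.BytesIO):
    """BytesIO that copies its contents into the store when it is closed."""

    def __init__(self, store, path):
        super().__init__()
        self._store = store
        self._path = path

    def close(self):
        if not self.closed:
            self._store[self._path] = self.getvalue()
        super().close()


fs = MemoryFS()
with fs.open("data/a.bin", "wb") as f:
    f.write(b"abc")
# A second instance sees the same bytes, because the store is shared.
print(MemoryFS().open("data/a.bin").read())  # b'abc'
```

Because the store lives in the interpreter's memory, anything that runs in the same process (including threads) can read what was written, which is why the threaded scheduler is the natural fit.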

martindurant · 2017-10-26T14:23:21Z

What would be the right way of determining the size of the data held in a BytesIO on py2? Is it something that needs to be saved via tell() when we are done writing instead?

mrocklin · 2017-10-26T14:48:26Z

I don't know personally, but this seems like the kind of thing that might get an answer on StackOverflow relatively quickly.

martindurant · 2017-10-26T14:51:08Z

Actually, thinking about it a moment, i.seek(0, 2) should work and come with very little cost.
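A sketch of the seek-to-end approach; `bytesio_size` is a hypothetical helper name. It reads the size with tell() after seeking rather than relying on the return value of seek(), which is not consistent across file-like objects on Python 2, and it restores the original position so it can be called part-way through reading or writing without copying any data.

```python
import io


def bytesio_size(f):
    """Return the number of bytes held in a seekable file-like object.

    Seeks to the end to find the size, then restores the caller's position.
    """
    pos = f.tell()
    f.seek(0, io.SEEK_END)  # equivalent to f.seek(0, 2)
    size = f.tell()
    f.seek(pos)
    return size


buf = io.BytesIO()
buf.write(b"hello world")
buf.seek(0)
print(bytesio_size(buf))  # 11
print(buf.tell())         # 0, position restored
```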

mrocklin · 2017-10-30T16:59:21Z

@martindurant is there anything that remains to be done here? What's here seems fine to me.

My only comment is that there seems to be a fair amount of copy-pasting between filesystem test suites. It might make sense at some point to construct an inheritable test class that others can use for tests. This might be something that we hand to the Arrow folks for use with their HDFS implementation.
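One way to sketch such an inheritable test class with unittest, assuming every back-end exposes `open(path, mode)`, `ls(prefix)` and `rm(path)`. `FileSystemTests`, `make_fs`, `DictFS` and `TestDictFS` are hypothetical names; `DictFS` is a toy back-end that exists only to exercise the shared tests.

```python
import io
import unittest


class FileSystemTests:
    """Hypothetical reusable tests for filesystem back-ends.

    Not a TestCase itself, so it is not collected on its own; a back-end
    subclasses it together with unittest.TestCase and implements make_fs().
    """

    def make_fs(self):
        raise NotImplementedError

    def test_write_then_read(self):
        fs = self.make_fs()
        with fs.open("test/a.bin", "wb") as f:
            f.write(b"payload")
        with fs.open("test/a.bin", "rb") as f:
            self.assertEqual(f.read(), b"payload")

    def test_ls_and_rm(self):
        fs = self.make_fs()
        with fs.open("test/b.bin", "wb") as f:
            f.write(b"x")
        self.assertIn("test/b.bin", fs.ls("test/"))
        fs.rm("test/b.bin")
        self.assertNotIn("test/b.bin", fs.ls("test/"))


class DictFS:
    """Toy dict-backed back-end, used only to exercise the shared tests."""

    def __init__(self):
        self.store = {}

    def open(self, path, mode="rb"):
        if mode == "rb":
            return io.BytesIO(self.store[path])
        store = self.store

        class Writer(io.BytesIO):
            # Commit the written bytes to the store on close.
            def close(self):
                if not self.closed:
                    store[path] = self.getvalue()
                super().close()

        return Writer()

    def ls(self, prefix=""):
        return sorted(p for p in self.store if p.startswith(prefix))

    def rm(self, path):
        del self.store[path]


class TestDictFS(FileSystemTests, unittest.TestCase):
    def make_fs(self):
        return DictFS()
```

Another back-end (an HDFS wrapper, say) would pick up the same tests by adding its own `FileSystemTests` + `TestCase` subclass whose `make_fs` returns its filesystem; run them with `python -m unittest` or pytest as usual.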

martindurant · 2017-10-30T17:01:59Z

I think this is complete enough to be useful.
Agree about the test duplication, although I don't expect this code to change frequently.

mrocklin · 2017-10-30T17:38:08Z

I think that this needs to be added to the imports in dask/bytes/__init__.py.

It would also be nice to see a roundtrip test with dd.to_csv and dd.read_csv.

mrocklin · 2017-11-30T12:58:15Z

@martindurant ok to merge?

martindurant · 2017-11-30T13:42:14Z

Yes, I think so. This does not appear explicitly in the docs, but it is a fairly niche use.

martindurant · 2018-02-21T17:52:10Z

Updated here with the simplifications that went into bytes. Can be merged after #3160, if that is good to go.

martindurant · 2018-05-28T00:13:06Z

@alimanfoo, if this would be useful to you for making in-memory zarr files, please try it out and see how well it works.

alimanfoo · 2018-05-29T09:16:16Z

Cool, thank you, I'll take a look.

jakirkham · 2018-06-05T18:53:34Z

Have a few questions. What contexts does this work in (e.g. single threaded, multithreaded, multiprocessing, distributed, etc.)? Also how does this work when someone wants to access this stored data?

martindurant · 2018-06-05T19:02:46Z

There are a couple of examples of round-tripping in the tests, so the following should work:

import dask.array as da
arr = da.ones((100, 100), chunks=(10, 10))
arr.to_zarr('memory://path/arr.zarr')
arr2 = da.from_zarr('memory://path/arr.zarr')

so long as we are within one process (sync or thread scheduler, or distributed in-process).

If you are not in one process, you would still successfully make the file-like objects of binary data, but would not know which piece was where. That is like persisting a set of keys (binary data in memory) without the global map of which key is where - i.e., not too useful.
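To make the single-process requirement concrete, here is a stdlib-only sketch in which `STORE`, `write_chunk` and the key names are hypothetical stand-ins for the memory filesystem and for tasks writing chunks. Tasks on a thread pool all write into the same dict, so the caller can find every piece afterwards. Had the same tasks run in a process pool, each worker would have filled its own copy of `STORE` and the parent's dict would stay empty, which is the situation described above.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical module-level store standing in for the in-memory filesystem.
STORE = {}


def write_chunk(i):
    # Each task writes one chunk of bytes under its own key, as a task
    # run by the threaded scheduler might.
    STORE["path/arr/chunk-%d" % i] = bytes([i]) * 4
    return i


with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(write_chunk, range(8)))

# All threads share this process's memory, so every chunk is visible here.
print(len(STORE))  # 8
```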

martindurant · 2018-06-06T14:18:02Z

With those changes, a simple zarr roundtrip does work.

jrbourbeau · 2019-06-17T17:53:00Z

Closing based on https://github.com/martindurant/filesystem_spec/pull/11#issue-209228566. @martindurant feel free to re-open if needed

Martin Durant added 2 commits October 3, 2017 21:37

Add MemoryFileSystem

5822927

flake

bf0f283

Martin Durant added 2 commits October 26, 2017 11:45

size for py2

2d27b42

py2 fix

e4c6e93

Import memory in bytes; add csv roundtrip test

a7c244b

Martin Durant added 16 commits February 12, 2018 11:45

Add HTTPFileSystem

3f01ca8

First test

609b11b

Plus auto-import the back-end

Simplify logic; parse query

d3d83ee

Add tests

4518f04

flake

10ac499

More tests, fixes, and working for Range-free servers

d7c5810

Revert change in bytes.core

a31995c

fix test

ce259ab

flake on tests

d81dc11

Add checks for non-behaving HTTP servers

2930cbb

one more flake

e8fca45

Merge branch 'master' into memory_fs

b0588e6

remove trim

45936e8

Fix for comments

f14dc8f

Use UUID for ukey; file may have changed at any time. Use temp directory for test server

Merge branch 'httpfs' into memory_fs

f53fc3a

fixes

ce50a5f

Martin Durant added 5 commits February 21, 2018 12:58

flaking

dafab62

Merge branch 'master' into httpfs

0999d77

Add httpfs docs to remote services, update changelog

b86ef09

Merge branch 'httpfs' into memory_fs

c750645

Merge branch 'master' into memory_fs

91626db

martindurant mentioned this pull request Apr 20, 2018

[ENH] Add read support for Google Cloud Storage pandas-dev/pandas#20729

Merged

4 tasks

martindurant mentioned this pull request May 24, 2018

add to/read_zarr #3460

Merged

2 tasks

martindurant mentioned this pull request Jun 5, 2018

Support Zarr Arrays in to_zarr/from_zarr #3561

Merged

2 tasks

Martin Durant added 2 commits June 6, 2018 10:11

Merge branch 'master' into memory_fs

916dc13

Add memory FS methods

776a904

Enough to get to_zarr/from_zarr working

typo

4dacaa0

Note: this stuff, if found useful, still needs extensive testing

martindurant mentioned this pull request Aug 17, 2018

Would it be worth making this method accept both filepath and BytesIO/StringIO object? dask/fastparquet#360

Closed

martindurant pushed a commit to fsspec/filesystem_spec that referenced this pull request Aug 17, 2018

Add example memory filesystem implementation

80efb4d

From dask/dask#2741 (which can be closed)

martindurant mentioned this pull request Aug 17, 2018

Add example memory filesystem implementation fsspec/filesystem_spec#11

Merged

jrbourbeau closed this Jun 17, 2019

martindurant deleted the memory_fs branch February 9, 2021 19:13

5 participants