ENH: added lzma support to data file handler #3667
Conversation
|
Sounds okay to me. |
This test now seems to be leaving open a file handle somewhere--some of the build configurations test that all tests close any files they open. It's not obvious to me yet where that's happening, but you can try it yourself locally by passing the --open-files flag to ./setup.py test.
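The kind of leak `--open-files` catches can be illustrated with a minimal hypothetical sketch (the function names below are made up for illustration and are not astropy's actual code): a raw handle opened to sniff the file signature is abandoned when the file is reopened by name for decompression.

```python
import bz2


def open_maybe_compressed(path):
    """Leaky sketch (hypothetical): the sniffing handle is abandoned."""
    raw = open(path, "rb")
    signature = raw.read(3)
    if signature == b"BZh":
        # bz2.BZ2File reopens the file by name; `raw` is now orphaned
        # and stays open until garbage collection -- the kind of leak
        # that --open-files flags.
        return bz2.BZ2File(path)
    raw.seek(0)
    return raw


def open_maybe_compressed_fixed(path):
    """Close the sniffing handle before reopening by name."""
    with open(path, "rb") as raw:
        signature = raw.read(3)
    if signature == b"BZh":
        return bz2.BZ2File(path)
    return open(path, "rb")
```

Both variants return a readable handle; the difference only shows up when the test harness counts the process's open file descriptors after the test finishes.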
I was able to confirm this locally, so I don't think it's just a transient failure. And it is indeed happening when 'local.dat.xz' is passed to the filename fixture.
Just to give one more data point though, it turns out my Python (3.4) is not compiled with lzma support:
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import lzma
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/internal/1/root/usr/local/lib/python3.4/lzma.py", line 26, in <module>
    from _lzma import *
ImportError: No module named '_lzma'
>>>
Despite this, the test is still doing something (I'm not sure what yet) and leaving an open file. That might be what's happening on Travis too.
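A guarded import like the following is the usual pattern for degrading gracefully when the interpreter was built without the compiled `_lzma` extension (this is a generic sketch of the pattern, not necessarily the exact code in this PR):

```python
# Prefer the stdlib lzma (Python 3.3+); fall back to the backports.lzma
# package, and record availability so callers can check before use.
try:
    import lzma
    HAS_LZMA = True
except ImportError:
    try:
        from backports import lzma  # third-party backport for Python 2
        HAS_LZMA = True
    except ImportError:
        lzma = None
        HAS_LZMA = False
```

Code that follows can then branch on `HAS_LZMA` (or raise a clear error) instead of failing with a bare `ImportError` deep inside a read call.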
|
Couldn't run the tests with --open-files here, but I'm trying to install psutil now. I suspect it's the original file handle being abandoned by |
|
Right, that was quite unexpected, but I think the underlying problem is that |
|
About the test philosophy, I did not like the original way in and replaced it with the |
|
OK, I think the problem is in the reading code, which makes no distinction between binary files that could not be decoded at all and text files with a not-yet-understood structure. |
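The distinction being asked for might be sketched like this (a hypothetical helper, not astropy's actual code): catch the decode failure separately, so binary garbage and merely unrecognized text surface as different errors.

```python
def classify_failure(raw_bytes):
    """Hypothetical sketch: tell undecodable binary data apart from
    text whose structure simply isn't understood yet."""
    try:
        text = raw_bytes.decode("utf-8")
    except UnicodeDecodeError:
        raise ValueError("not decodable as text -- likely a binary file")
    # Decoded fine: any further failure is a *structure* problem, not a
    # decoding problem, and should be reported differently to the user.
    return text
```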
|
Well that seems to work better; left it at the |
|
Oh, had missed the original |
astropy/utils/data.py
There are a number of situations where Python doesn't have bzip2 support (i.e. it's in the stdlib but not compiled in). Maybe say "any of which can be compressed in gzip, bzip2 or xz if the appropriate compression libraries are available."
|
Thanks for taking this on. As I'm sure you're discovering, file handling in Python is full of all kinds of fun inconsistencies and gotchas -- particularly when trying to support as many different versions of Python as we are. Don't take the amount of feedback as saying anything negative about your work... |
|
You're welcome; I did get some exposure to this when doing a little work on the numpy io routines, sometimes similarly frustrating. It's not so easy to get your hands on an installation without bzip2 compiled in, or Python 3.3 without lzma. |
|
@taldcroft : I just saw this issue and thought it's worth pointing out to you, too. The way |
|
This seems to have gotten mixed up with some unrelated commits (possibly due to a botched merge or something). @dhomeier, do you think you'll have a chance any time soon to finish work on this? If not, I can try to see what I can do to tie up the loose ends. |
|
Sorry I've left this and my other pending PR open for this long; I was busy changing to a new job and moving... I will have a look at the merge conflicts here; apart from that there were no intrinsic open issues with this one afaiac, just the general question of how many exceptions should be caught and silenced by |
|
No problem, thanks! |
|
Later than I hoped; the actual conflicts are pretty much trivial and easily fixed – I don't know why they don't merge automatically in the first place. I'm not sure what exactly the correct procedure is now: do I merge the current HEAD of master and push to this branch? How can I convince git this is resolved, or just take the bloody version that's in the current HEAD when there's no file to edit? I tried to |
|
OK, no idea what's going on now; I have managed to auto-merge master with this branch using the in |
|
You're messing yourself up by trying to merge the master branch into your branch. Try this instead:
1. First go back to the last commit you made on your branch (before the merging mess).
2. Then follow the instructions for performing a rebase: http://docs.astropy.org/en/stable/development/workflow/development_workflow.html#rebase-but-only-if-asked
3. If at any point you get a merge conflict, resolve it and finish the rebase, fixing any issues along the way.
4. Push the updated branch.
You should end up with a pull request that has your commits and only your commits applied as diffs to the current upstream/master branch. |
added capability to detect and read files with LZMA compression (.xz) to get_readable_fileobj(), using either the builtin lzma module of Python 3.3+, or the backports.lzma package, when installed
changed xz file opening with backports.lzma to use NamedTemporaryFile; added xz cases to test_compressed and check for bzip2 support to test_read; ascii.read() raises ValueError on unreadable files
switched the remaining checks for bz2/lzma support to pytest.mark.xfail
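That last commit's approach can be sketched as follows (hypothetical test name and data; the real tests live in astropy's suite). `xfail` differs from a skip in that the test still runs, and an unexpected pass gets reported:

```python
import pytest

try:
    import lzma
    HAS_LZMA = True
except ImportError:
    HAS_LZMA = False


# Expected failure rather than skip when the interpreter was built
# without the _lzma extension.
@pytest.mark.xfail(not HAS_LZMA, reason="Python built without lzma support")
def test_read_xz():
    compressed = lzma.compress(b"a b\n1 2\n")
    assert lzma.decompress(compressed) == b"a b\n1 2\n"
```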
|
Thanks, the |
|
Great, thanks @dhomeier that seems to have fixed it up correctly. If you'll just add a changelog entry for this in v1.1.0 under New Features/ |
|
Which appeared in |
|
@dhomeier Yes, in |
|
@dhomeier Or, if it's important to you to call out, it's fine to add a changelog entry to both sections. Under New Features/io.ascii say that ASCII tables can be read from LZMA compressed files. Then also note under New Features/utils that |
|
Sensible, |
|
Yeah, I need to refactor the FITS stuff to use the code from |
|
In some cases that has already been done, such as the use of |
|
Done |
|
Great, thanks! Sorry for the rebase hassles. |
ENH: added lzma support to data file handler
|
Thanks, just my learning curve with pull requests, I guess! |
This PR adds the capability to transparently detect and read files with LZMA compression (.xz) to get_readable_fileobj(), using either the builtin lzma module of Python 3.3+, or the backports.lzma package, when installed; the latter should thus only be relevant for Python 2.6/2.7 among the still-supported versions.

The only potential issue I could think of is that backports.lzma needs to open the file by its name, but since a new file handle is returned in place of the original one anyway, I don't really see a problem with that.

Added a line about the decompression capabilities to the unified file read/write interface doc, with a link to get_readable_fileobj(); maybe debatable if the latter is getting in too deep.
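The open-by-name workaround might look roughly like this (a hypothetical sketch using the stdlib `lzma`; the helper name `readable_xz_fileobj` is made up for illustration and is not the PR's actual code):

```python
import io
import lzma  # stdlib on Python 3.3+; 'from backports import lzma' otherwise
import tempfile

XZ_MAGIC = b"\xfd7zXZ\x00"  # signature opening every .xz stream


def readable_xz_fileobj(fileobj):
    """Return a decompressing handle for xz input, else the original.

    Since backports.lzma could only open files by *name*, the file-like
    object is spooled to a NamedTemporaryFile whose name is handed to
    LZMAFile; the original handle is closed and replaced.
    """
    signature = fileobj.read(len(XZ_MAGIC))
    fileobj.seek(0)
    if signature != XZ_MAGIC:
        return fileobj  # not xz-compressed; hand back unchanged
    tmp = tempfile.NamedTemporaryFile(suffix=".xz")
    tmp.write(fileobj.read())
    tmp.flush()
    fileobj.close()  # a new handle replaces the original one
    out = lzma.LZMAFile(tmp.name, "rb")
    out._tmpfile = tmp  # keep the temporary file alive with the handle
    return out
```

The extra spooling step only matters for the backport; the stdlib `lzma.LZMAFile` also accepts file objects directly.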