Skip to content

Remove compatibility code which forced loading all HDUs on close#6082

Merged
bsipocz merged 6 commits intoastropy:masterfrom
saimn:fits-close-file
Jun 12, 2017
Merged

Remove compatibility code which forced loading all HDUs on close#6082
bsipocz merged 6 commits intoastropy:masterfrom
saimn:fits-close-file

Conversation

@saimn
Copy link
Contributor

@saimn saimn commented May 16, 2017

Closes #5582, following the discussion there, i.e. Option 2: Remove the compatibility behavior for which almost everybody agreed. To get the old behavior, users can use lazy_load_hdus=False which is equivalent by forcing the reading of all HDUs. So basically this reverts 5c66c2d. And the global option for lazy_load_hdus exists already, so I just added a test to make sure it works.

ping @embray

@saimn saimn added the io.fits label May 16, 2017
@saimn saimn added this to the v2.0.0 milestone May 16, 2017
@pllim
Copy link
Member

pllim commented May 16, 2017

Yay. Still needs a change log.

@saimn
Copy link
Contributor Author

saimn commented May 16, 2017

@pllim - yep, I should play the "guess PR number" game ;)

assert phdr['FILENAME'] == 'labq01i3q_rawtag.fits'

@raises(ValueError)
@raises(IndexError)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only "unexpected" change is this: with the new behavior the index error when accessing the HDU happens before the ValueError that was raised when trying to access the data. So I think it's fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Someone who currently catches that as ValueError will need to change their code, so this needs to be mentioned in your change log.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we use the contextmanager instead? Doesn't have to be for all tests but since we're changing this one it might be a perfect opportunity.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@saimn , do you plan to address the context manager comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@embray - Thanks for reviewing this! You're right about the exception message that should be more helpful, I will implement this. But I'm also confused by your last sentence, is it possible to have _read_all = True if the file is not completely read ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@embray - self._read_all is set to True when the file is closed (https://github.com/astropy/astropy/blob/master/astropy/io/fits/hdu/hdulist.py#L1146). So it would be necessary to keep the part of the code deleted in this PR which computes self._total_read ?
There must be a way to know if the file was fully read (with the EOFError block I guess), but currently it is not possible, so the commit I just pushed modifies the exception message unconditionally for now.

Maybe it's good enough ? I mean is it useful to add more complexity to something already quite complex, just to get a more correct exception message for a very specific case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, maybe this is actually a bit tricky to determine, but my thinking was that you know the size of the file, and when you read a header you know, at least for a well-formed file, how many bytes should be left in the file (for the data in that HDU). And if there are more bytes beyond the end of the data then you haven't read the whole file yet. There could still be more HDUs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I agree it could be done, I'm just not sure of the best way. And self._read_all should already give this information, so is it better to add a new way to know if the file was fully read ? Instead why not simply removing the line that set self._read_all to True when the file is closed ? I guess this line is there to avoid running again the beginnning of _read_next_hdu, but if the file is closed it's possible anyway to return earlier.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@embray - I implemented my last suggestion (7381a70), this way _read_all cannot be set to True after the file is closed. Does this work for you ?

@pllim
Copy link
Member

pllim commented May 16, 2017

Guess the PR number games have been successful for me. 😉

@saimn saimn force-pushed the fits-close-file branch from 450b078 to 7b9c718 Compare May 16, 2017 20:26
CHANGES.rst Outdated
[#5929]

- Remove compatibility code which forced loading all HDUs on close. The old
behavior can be used with `lazy_load_hdus=False`. Because of this change,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

always use double backticks in the changelog, we never do intersphinx linking from here. Fixing this and the next line will get rid of the docs build error on travis, too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops sorry, and yet I know it ...

Copy link
Member

@pllim pllim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside of minor comments, LGTM. Also need squashing (and maybe rebase while you are at it)?


def test_open_3(self):
# Test that HDUs cannot be accessed after the file was closed
r = fits.open(self.data('test0.fits'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also test the context manager (since that is how we recommend open be done anyway, or at least I do) or does it not matter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not really matter here as the file is closed just after.

CHANGES.rst Outdated
- Remove compatibility code which forced loading all HDUs on close. The old
behavior can be used with ``lazy_load_hdus=False``. Because of this change,
trying to access the ``.data`` attribute from an HDU which is not loaded now
raises a IndexError instead of a ValueError. [#6082]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: Also add double-backticks to IndexError and ValueError.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I found the two versions in the CHANGES file, so I was not sure which one is preferred.

@saimn saimn force-pushed the fits-close-file branch from 865a697 to a6e4132 Compare May 17, 2017 18:57
@saimn
Copy link
Contributor Author

saimn commented May 17, 2017

@pllim - Rebased and squashed.

@pllim pllim requested a review from MSeifert04 May 17, 2017 19:38
Copy link
Member

@pllim pllim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

# the deprecation period has passed.
if self._file and self._file.closed and self._total_read is not None:
if index > self._total_read - 1:
warnings.warn(textwrap.dedent(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What a shame, we lose the most verbose warning message I've ever seen... 😄

@MSeifert04
Copy link
Contributor

It's a good sign that the tests are passing and the changes look alright. :)

@embray
Copy link
Member

embray commented May 22, 2017

I have one small-ish comment, but otherwise LGTM thanks @saimn !

@bsipocz
Copy link
Member

bsipocz commented May 23, 2017

Thank you @embray for reviewing this!

@saimn saimn force-pushed the fits-close-file branch from 7381a70 to 35c7432 Compare May 30, 2017 17:08
@saimn
Copy link
Contributor Author

saimn commented May 30, 2017

Rebased to get rid off a changelog conflict. @embray are you happy with the _read_all solution from 7381a70 ?

@saimn saimn force-pushed the fits-close-file branch from 35c7432 to 37550b3 Compare June 6, 2017 22:23
@saimn
Copy link
Contributor Author

saimn commented Jun 6, 2017

Not sure if the tests failures are real or related to some pytest change (errors in pytest_runtest_teardown about unclosed files: https://travis-ci.org/astropy/astropy/jobs/237603430#L2494). So rebase to check again with Travis. Also I fixed another ResourceWarning error that was causing an other test to fail.

@saimn
Copy link
Contributor Author

saimn commented Jun 7, 2017

Ok, so the test failures are indeed related to 88315b3 and the --open-files option/plugin. But really I don't understand why, as this commit does not change the behavior [1]. Many fits tests does not close FITS file, before 88315b3 these are not detected by --open-files and now they are.

Example before 88315b3, where --open-files does not fail the test, but there are two ResourceWarning which show that files are not closed:

astropy/io/fits/tests/test_image.py::test_comphdu_bscale PASSED

====================================== warnings summary =======================================
None
  [pytest] section in setup.cfg files is deprecated, use [tool:pytest] instead.

lib.linux-x86_64-3.6/astropy/io/fits/tests/test_image.py::test_comphdu_bscale
  /tmp/simon/astropy-test-j_7zc2ao/lib.linux-x86_64-3.6/astropy/io/fits/tests/test_image.py:1771: ResourceWarning: unclosed file <_io.FileIO name='/tmp/simon/pytest-of-simon/pytest-12/test_comphdu_bscale0/3hdus.fits' mode='rb' closefd=True>
    hdus = fits.open(filename2)
  /home/simon/miniconda3/lib/python3.6/site-packages/_pytest/python.py:142: ResourceWarning: unclosed file <_io.FileIO name='/tmp/simon/pytest-of-simon/pytest-12/test_comphdu_bscale0/3hdus_comp.fits' mode='rb' closefd=True>
    testfunction(**testargs)

-- Docs: http://doc.pytest.org/en/latest/warnings.html
===================================== 69 tests deselected =====================================
===================== 1 passed, 69 deselected, 3 warnings in 1.18 seconds =====================

After 88315b3, --open-files raise an error for these files.
Maybe something because in the first case (before 88315b3) it tries to read past the end of the file (see [1] above, this is the only change I think). It may also depends on how psutil determine the open files.

@pllim
Copy link
Member

pllim commented Jun 7, 2017

What PR was that commit tied to? My GitHub-fu is insufficient to sniff that out.

@bsipocz
Copy link
Member

bsipocz commented Jun 7, 2017

@pllim - To this one, that is the 3rd commit here.

@pllim
Copy link
Member

pllim commented Jun 7, 2017

Oh... 😅 😅 😅

@saimn
Copy link
Contributor Author

saimn commented Jun 7, 2017

Hehe ! The goal of this commit was to have different messages for the IndexError, if the file was read to the end or not (see #6082 (comment)).

@DougBurke
Copy link
Contributor

I'm not sure how related the following is, but I thought I'd mention it here before opening a separate issue. I also apologize for not having time to check against the current master branch, but I only have AstroPy 1.3.3 easily at hand.

In Sherpa we do "interesting" things for some of our file I/O, including trying to read in a file as FITS when it may not be. This ends up creating ResourceWarning warnings in out tests. At least some of these appear to be AstroPy related (i.e. not closing the file up on error) as shown in sherpa/sherpa#365 - which I will repeat the example code here:

% cat dummy.dat 
# A B
0.5 1.65
% cat dummy.py
import warnings
from astropy.io import fits

warnings.simplefilter(action='error', category=ResourceWarning)

try:
    fits.open('dummy.dat')
except OSError as e:
    print("--> {}".format(e))
% 

Running this script (AstroPy 1.3.3 and Python 3.5) produces the warning:

% python dummy.py 
--> Header missing END card.
Exception ignored in: <_io.FileIO name='dummy.dat' mode='rb' closefd=True>
ResourceWarning: unclosed file <_io.FileIO name='dummy.dat' mode='rb' closefd=True>

@bsipocz
Copy link
Member

bsipocz commented Jun 8, 2017

@DougBurke - I see the same warning on master.

@saimn
Copy link
Contributor Author

saimn commented Jun 8, 2017

@DougBurke - This warning is perfectly normal, as the file is not closed in your example (you should add a finally block to close the file, or use the with statement).

@pllim
Copy link
Member

pllim commented Jun 8, 2017

Just curious, what if you set memmap=False?

@DougBurke
Copy link
Contributor

@pllim (assuming you're talkiing to me; threading is not ideal here!) - I get the same warning if I use the following program:

import warnings
from astropy.io import fits

warnings.simplefilter(action='error', category=ResourceWarning)

try:
    fits.open('dummy.dat', memmap=False)
except OSError as e:
    print("--> {}".format(e))

@saimn
Copy link
Contributor Author

saimn commented Jun 8, 2017

Ooh okay, understood, in this case because of the exception the HDUList object is never returned, and the underlying file pointer cannot be closed. Probably not easy to fix ...

@DougBurke
Copy link
Contributor

Is it worth me opening a new ticket for this issue then @saimn ?

@pllim
Copy link
Member

pllim commented Jun 8, 2017

@DougBurke , yes.

@DougBurke
Copy link
Contributor

As GitHub has already noted, I've created #6168 to track this - thanks for the input and sorry for highjacking this PR.

@saimn
Copy link
Contributor Author

saimn commented Jun 9, 2017

So, back to my open files issue (#6082 (comment)): I found why most of the tests do not raise an error with --open-files when they do not close files: https://github.com/astropy/astropy/blob/master/astropy/io/fits/tests/__init__.py#L27
This applies for all tests in classes that inherit from FitsTestCase, and explains why the new failures I see here are with tests in standalone functions. But I still don't understand why the behavior changed for these failing tests ...

saimn added 5 commits June 10, 2017 00:12
Closes astropy#5582. astropy#5065 implemented lazy loading of HDUs in fits files. But
it did it in a way that maintains backwards compatibility at the expense
of performance, loading the remaining HDUs to allow read the headers
after the file was closed. Basically this reverts
5c66c2d
@saimn saimn force-pushed the fits-close-file branch from b67916e to 60bcb1f Compare June 9, 2017 22:14
@saimn
Copy link
Contributor Author

saimn commented Jun 9, 2017

I think I found the culprit, the function I added to handle the IndexError was keeping a reference to the HDUList object, preventing the file pointer to be closed by the gc ...

@DougBurke
Copy link
Contributor

@saimn does this change mean that my proposed change at #6168 (comment) isn't needed, or are they separate issues?

@saimn
Copy link
Contributor Author

saimn commented Jun 9, 2017

@DougBurke - It's not related, but it makes me think about a workaround for your issue (to avoid the ResourceWarning). Opening the file before passing it to fits.open allows to close it after:

try:
    with open('dummy.dat', 'rb') as f:
        fits.open(f)
except OSError as e:
    print("--> {}".format(e))

@saimn
Copy link
Contributor Author

saimn commented Jun 9, 2017

And tests pass now 🕺 🎉

@saimn
Copy link
Contributor Author

saimn commented Jun 12, 2017

Ok I think it's good to go now !

@bsipocz
Copy link
Member

bsipocz commented Jun 12, 2017

There was agreement on this above, so merging this now. Thanks @saimn!

@bsipocz bsipocz merged commit 694c1c7 into astropy:master Jun 12, 2017
@saimn saimn deleted the fits-close-file branch June 12, 2017 08:10
@DougBurke
Copy link
Contributor

@saimn - thanks for the suggestion in #6082 (comment) - unfortunately our code isn't written in a way that lets us easily use this approach, but I'll keep it in mind as we refactor.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants