-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Changed QualityIO so that it uses iter explicitly on the handle, so t… #3031
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #3031 +/- ##
==========================================
+ Coverage 85.37% 86.30% +0.93%
==========================================
Files 282 282
Lines 59385 59467 +82
==========================================
+ Hits 50697 51323 +626
+ Misses 8688 8144 -544 ☔ View full report in Codecov by Sentry. |
|
Have you looked at the Python issue tracker to see if this is a known issue with Can you add a test using a |
|
There is some discussion on the Python mailing list from 2016, but nothing seemed to come of it: https://mail.python.org/pipermail/python-list/2016-July/862590.html The git blame on tempfile.NamedTemporaryFile says it hasn't been touched in 5+ years. So it's at least a very old bug. There are no mentions of this specific quirk on the Python bugs list that I can find. And sure i'll work on a unittest. |
|
@xzy3 Can you provide a self-contained example showing the problem? |
|
@mdehoon here you go from tempfile import NamedTemporaryFile
from Bio import SeqIO
test_seq = """\
@HWI-EAS209_0006_FC706VJ:5:58:5894:21141#ATCACG/1
TTAATTGGTAAATAAATCTCCTAATAGCTTAGATNTTACCTTNNNNNNNNNNTAGTTTCTTGAGATTTGTTGGGGGAGACATTTTTGTGATTGCCTTGAT
+HWI-EAS209_0006_FC706VJ:5:58:5894:21141#ATCACG/1
efcfffffcfeefffcffffffddf`feed]`]_Ba_^__[YBBBBBBBBBBRTT\]][]dddd`ddd^dddadd^BBBBBBBBBBBBBBBBBBBBBBBB
"""
with NamedTemporaryFile(mode='w+t') as fd:
fd.write(test_seq)
fd.seek(0)
next(SeqIO.parse(fd, 'fastq')) |
|
MafIO turned out to have the exact same problem. I did the same fix to that parser also. |
|
@xzy3 Thank you. It seems that this is a bug in Python. For example, we get the same problem here: >>> fd = open("mytestfile", "w+t")
>>> fd.write('something')
9
>>> fd.seek(0)
0
>>> next(fd)
'something'
>>> fd.close()
>>> from tempfile import NamedTemporaryFile
>>> fd = NamedTemporaryFile(mode='w+t')
>>> fd.write('something')
9
>>> fd.seek(0)
0
>>> next(fd)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: '_TemporaryFileWrapper' object is not an iterator
>>> if this is indeed a bug in Python, I don't think we should fix this in Biopython, in particular since there is a trivial workaround. |
|
@mdehoon This exact thing was brought up on the Python mailing list in 2016 and it was informally dropped as "not a bug". The argument against was that |
|
I agree, it does seem to be a bug in Python itself, and would support reporting a bug to Python directly. However, even if they do fix it, it won't be widely available for a while (and may never be backported as far at Python 3.6), so being pragmatic I am OK with adding the workaround (with tests) to Biopython. |
|
@peterjc Ok I'll report it to the Python bugtracker and reference this discussion. |
|
@peterjc Even then I don't think this needs to be fixed in Biopython. As @xzy3 wrote, one workaround is to use the |
|
@mdehoon as best I can tell there are only two places in SeqIO that use next directly on a file. I've fixed them both in this patch. |
|
@peterjc ok tests added, fixed a spot in MafIO that had the same problem. I made an issue over at the Python bug tracker and submitted a pull request python/cpython#21434 |
|
Looks fairly positive so far on your Python pull request 👍 |
|
Looks like the Python pull request is awaiting review by the core team. Even if it is merged, I am inclined to merge this fix (although we have some git conflicts to resolve first). [fixed typo] @mdehoon would you prefer to wait and see what the Python team decides? |
|
@peterjc Yes we should wait and see the Python team decides. |
|
Well I committed a brain-o and got the first pull request in a bad state. I had to restart it so it's now here python/cpython#22766 |
|
Let's hope the CPython team accept that - thank you. |
d496faf to
50f2e60
Compare
Bio/SeqIO/FastaIO.py
Outdated
| super().__init__(source, fmt="Fasta") | ||
| try: | ||
| line = next(self.stream) | ||
| line = next(iter(self.stream)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to use line = self.stream.readline()?
Bio/SeqIO/PirIO.py
Outdated
| raise StopIteration | ||
| stream = self.stream | ||
| # NamedTemporaryFiles don't support __next__ so create an iter as a workaround | ||
| stream = iter(self.stream) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you actually need this here? We are not calling next on stream, and for line in stream implicitly calls iter on stream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
line 155 is next(stream).step() which breaks in the unit tests when run with NamedTempFiles.
Bio/SeqIO/QualityIO.py
Outdated
| # handle as a work around. | ||
| handle_iter = iter(handle) | ||
| try: | ||
| line = next(handle) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use readline instead?
Bio/SeqIO/QualityIO.py
Outdated
| if line is None: | ||
| try: | ||
| line = next(self.stream) | ||
| line = next(stream) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here. The current fix required calling iter each time we enter __next__.
50f2e60 to
735c2e2
Compare
|
switched to |
Bio/SeqIO/FastaIO.py
Outdated
| line = next(self.stream) | ||
| except StopIteration: | ||
| line = self.stream.readline() | ||
| if line == "": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if not line: is considered better stylistically.
735c2e2 to
32d5038
Compare
|
|
||
| identifier = line[4:].strip() | ||
| description = next(stream).strip() | ||
| description = self.stream.readline() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need to call .strip() here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's called 3 lines down after checking if the empty string was returned to indicate EOF.
Bio/SeqIO/QualityIO.py
Outdated
| self.line = line | ||
| if line is None: | ||
| line = self.stream.readline() | ||
| if line is None or line == "": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think line cannot be None here.
Bio/AlignIO/MafIO.py
Outdated
| # allows parsing of the last bundle without duplicating code | ||
| try: | ||
| line = next(handle) | ||
| line = next(handle_iter) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use .readline() here instead of creating handle_iter?
tempfile.NamedTemporaryFiles do not support __next__.
32d5038 to
1d18ede
Compare
|
Do I need to do anything else? |
|
Thank you both. |
…hat tempfile.NamedTemporaryFile can be used easily.
3.7.0 (default, Aug 15 2018, 09:03:42)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)]
CPython
Linux-3.10.0-1127.e17.x86_64-x86_64-with-centos-7.8.2003-Core
Bio.version 1.77
This line:
biopython/Bio/SeqIO/QualityIO.py
Line 923 in 380932d
has broken reading from
tempfile.NamedTemporaryFileunless you do thisSeqIO.parse(tmp_fd.file, 'fastq'). NamedTemporaryFile returns https://github.com/python/cpython/blob/1bf9cc509326bc42cd8cb1650eb9bf64550d817e/Lib/tempfile.py#L457which is for some reason seems to not be its own iterator. I don't see any recent changes to the
tempfilemodule to cause this.I hereby agree to dual licence this and any previous contributions under both
the Biopython License Agreement AND the BSD 3-Clause License.
I have read the
CONTRIBUTING.rstfile, have runflake8locally, andunderstand that AppVeyor and TravisCI will be used to confirm the Biopython unit
tests and style checks pass with these changes.
I have added my name to the alphabetical contributors listings in the files
NEWS.rstandCONTRIB.rstas part of this pull request, am listedalready, or do not wish to be listed. (This acknowledgement is optional.)