reverse iteration for eachline#42225
Conversation
|
It would be easy to restrict this to seekable Update: I've changed the PR to drop non-seekable support, since that seemed controversial. (I saved the non-seekable branch at https://github.com/stevengj/julia/tree/sgj/reverse-eachline-nonseekable.) |
|
Unrelated CI error on freebsd64: |
52f8cb4 to
b72fdf3
Compare
b72fdf3 to
e26b3d2
Compare
|
Apparently unrelated error on freebsd64 ( This PR should be ready for review? |
e26b3d2 to
905ba00
Compare
|
Updated to handle CRLF line endings, with a test. |
|
(CI failures on freebsd64 and win32 are unrelated: |
|
Looks like JuliaLang/LinearAlgebra.jl#873? |
c812e9e to
c93259e
Compare
|
Rebased. Hopefully can be merged if green? |
This PR implements
Iterators.Reverseiteration for theEachLineiterator, so that now e.g.last(eachline(io))will work. (This seems to have been requested multiple times on discourse etcetera, and is nontrivial for users implement efficiently themselves. It also seems to be a pretty common question in Python, where the most common answers involve reading the whole file to get the last few lines., or executing an externaltailprogram.)For seekable streams, it works by reading the data from the end in 4kiB chunks. Performance seems comparable to forwards iteration.
For non-seekable streams,
it falls back to reading the whole stream into memory. However, forit throws an exception whenlast(eachline(...))andtake(Reverse(eachline(...)), n)it uses an algorithm (similar to how the Unixtailfunction works) that requires only bounded memory — it reads the file in a sequence of 4kiB chunks, but discards earlier chunks if they are not needed for the requested number of lines.seekendorseekis called.I noticed that
first(r::Reverse)was callinglast(r.itr), which fails for non-indexable iterators becauselast(itr)defaults toitr[end], so I changed it to use the default implementation (in abstractarrays.jl) which iterates once. Otherwise, you can't implementlast(itr::MyType)byfirst(Iterators.Reverse(itr)). This is analogous to howlast(itr, n)is already implemented. (It might be good to change the defaultlast(itr)method, but that's a topic for another PR.)