SI-9623 Avoid unnecessary hasNext calls in JoinIterator & ConcatIterator#4937
SI-9623 Avoid unnecessary hasNext calls in JoinIterator & ConcatIterator#4937SethTisue merged 1 commit intoscala:2.11.xfrom
Conversation
These iterator implementations are used to concatenate two (JoinIterator) or more (ConcatIterator) other iterators with `++`. They used to perform many unnecessary calls to the child iterators’ `hasNext` methods. This improved state machine-based implementation reduces that number to the bare minimum, i.e. iterating over concatenated iterators with `foreach` calls the children's `hasNext` methods a total of (number of children) + (number of elements) times, the same as when iterating over all children separately.
|
review by @Ichoran plz |
|
Well, this is an improvement but the algorithms, especially for ConcatIterator, are still not very efficient. The primary culprit is the incredibly expensive :+ and head/tail decomposition on a Vector. This just isn't a good way to switch things rapidly (though at least it scales well). Furthermore, it checks when adding a long LHS to a short RHS, but I'm not sure what the goal here is. Better? Yes. Looks correct? Yes. Achieved ideal performance? No. It does seem to fix the "too many hasNext" problem, which is great! (But on the other hand one could also argue that iterators with an expensive |
There should be a better way. I'll do some benchmarking to find out.
Right, ConcatIterator originally came to pass to prevent consuming this iterator from SOEing: But we have the same situation on the RHS. This still causes a SOE:
That was my first thought as well, but repeatedly calling |
|
Here's a much faster version: 07db071 It's not binary compatible though. I suggest we use the fix here for 2.11.8 and consider the new implementation for 2.12. |
|
LGTM Agreed--use this for 2.11, and the even better version for 2.12. (The other one also looks good to me, save for one place where it looks like |
SI-9623 Avoid unnecessary hasNext calls in JoinIterator & ConcatIterator
These iterator implementations are used to concatenate two (JoinIterator)
or more (ConcatIterator) other iterators with
++. They used to performmany unnecessary calls to the child iterators’
hasNextmethods.This improved state machine-based implementation reduces that number to
the bare minimum, i.e. iterating over concatenated iterators with
foreachcalls the children'shasNextmethods a total of (number ofchildren) + (number of elements) times, the same as when iterating over
all children separately.