Skip to content

SI-9623 Avoid unnecessary hasNext calls in JoinIterator & ConcatIterator#4937

Merged
SethTisue merged 1 commit intoscala:2.11.xfrom
szeiger:issue/9623-2.11
Feb 8, 2016
Merged

SI-9623 Avoid unnecessary hasNext calls in JoinIterator & ConcatIterator#4937
SethTisue merged 1 commit intoscala:2.11.xfrom
szeiger:issue/9623-2.11

Conversation

@szeiger
Copy link
Contributor

@szeiger szeiger commented Feb 1, 2016

These iterator implementations are used to concatenate two (JoinIterator)
or more (ConcatIterator) other iterators with ++. They used to perform
many unnecessary calls to the child iterators’ hasNext methods.

This improved state machine-based implementation reduces that number to
the bare minimum, i.e. iterating over concatenated iterators with
foreach calls the children's hasNext methods a total of (number of
children) + (number of elements) times, the same as when iterating over
all children separately.

These iterator implementations are used to concatenate two (JoinIterator)
or more (ConcatIterator) other iterators with `++`. They used to perform
many unnecessary calls to the child iterators’ `hasNext` methods.

This improved state machine-based implementation reduces that number to
the bare minimum, i.e. iterating over concatenated iterators with
`foreach` calls the children's `hasNext` methods a total of (number of
children) + (number of elements) times, the same as when iterating over
all children separately.
@SethTisue
Copy link
Member

review by @Ichoran plz

@Ichoran
Copy link
Contributor

Ichoran commented Feb 4, 2016

Well, this is an improvement but the algorithms, especially for ConcatIterator, are still not very efficient. The primary culprit is the incredibly expensive :+ and head/tail decomposition on a Vector. This just isn't a good way to switch things rapidly (though at least it scales well). Furthermore, it checks when adding a long LHS to a short RHS, but ++ doesn't check the RHS to see if maybe that is a ConcatIterator too, and it should really merge the lists.

I'm not sure what the goal here is. Better? Yes. Looks correct? Yes. Achieved ideal performance? No.

It does seem to fix the "too many hasNext" problem, which is great!

(But on the other hand one could also argue that iterators with an expensive hasNext that don't cache the result are poorly-written iterators. Why is it the consumer's job to maintain extra state just in case?)

@szeiger
Copy link
Contributor Author

szeiger commented Feb 4, 2016

The primary culprit is the incredibly expensive :+ and head/tail decomposition on a Vector. This just isn't a good way to switch things rapidly (though at least it scales well).

There should be a better way. I'll do some benchmarking to find out.

Furthermore, it checks when adding a long LHS to a short RHS, but ++ doesn't check the RHS to see if maybe that is a ConcatIterator too, and it should really merge the lists.

Right, ConcatIterator originally came to pass to prevent consuming this iterator from SOEing:

(1 to n).foldLeft(start)((res, _) => res ++ Iterator(1))

But we have the same situation on the RHS. This still causes a SOE:

(1 to n).foldLeft(start)((res, _) => Iterator(1) ++ res)

(But on the other hand one could also argue that iterators with an expensive hasNext that don't cache the result are poorly-written iterators. Why is it the consumer's job to maintain extra state just in case?)

That was my first thought as well, but repeatedly calling hasNext on an already exhausted Iterator looks like a rather unexpected call pattern to me. I wouldn't necessarily expect every implementation to be optimized for this case.

@szeiger
Copy link
Contributor Author

szeiger commented Feb 4, 2016

Here's a much faster version: 07db071

It's not binary compatible though. I suggest we use the fix here for 2.11.8 and consider the new implementation for 2.12.

@Ichoran
Copy link
Contributor

Ichoran commented Feb 6, 2016

LGTM

Agreed--use this for 2.11, and the even better version for 2.12. (The other one also looks good to me, save for one place where it looks like last doesn't release its contents.)

SethTisue added a commit that referenced this pull request Feb 8, 2016
SI-9623 Avoid unnecessary hasNext calls in JoinIterator & ConcatIterator
@SethTisue SethTisue merged commit ef4ed49 into scala:2.11.x Feb 8, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants