SI-9581 Fix overflow on Vector take and drop methods#4876
SI-9581 Fix overflow on Vector take and drop methods#4876SethTisue merged 1 commit intoscala:2.11.xfrom
Conversation
Fixes the index/length comparison in `Vector#take` and `Vector#drop` so that they handle all possible integer values. Given the collection's invariants `startIndex >= endIndex` and `0 >= startIndex, endIndex`, it is sufficient to change the arithmetic in the comparison as done in this commit to avoid overflows. As cases when `n <= 0` are handled beforehand, `endIndex - n` cannot overflow, contrary to `startIndex + n`. If without the danger of overflows the condition yields true, on the other hand, `startIndex + n` cannot overflow as it is smaller than `endIndex` (as the previous formulation of the condition shows).
|
👍 for the prompt fix proposal and for spotting the same problem in My only two concerns here:
But I'm far from the maintainer — I guess @Ichoran should review this, also because of https://twitter.com/travisbrown/status/673504795945123840. |
|
LGTM For what it's worth, the collections are absolutely riddled with bounds checks that may fail around Int.MaxValue. I don't believe it is worth dropping comments on every dangerous line that the subtraction is done this way to avoid bounds checks--it adds too much clutter and doesn't necessarily help when new code is written. Instead, there needs to be a culture among reviewers of thinking about corner cases like that, and possibly a top-level document specifying best practices (e.g. |
|
I agree with @Ichoran regarding the practicality and usefulness of comments in this cases. However, I'm more fond of unit testing thoroughly these corner cases rather than leaving that "responsibility" to the reviewers. @Ichoran, that would be a fine use case for your scala-collections-laws project, wouldn't it? I must also note that this bug was particularly tricky to find, as it doesn't reveal itself in collections created directly, e.g. |
|
@ruippeixotog - It's a little hard to know what to test, but yes, taking |
That's one of the special reasons why code review can be more effective than testing — once you know to look for it. There's also general experience showing it's more effective than testing, if done at the appropriate speed — and reviewer time is sorely lacking. One advantage of Maybe next we have a volunteer at poking holes in {Hash,}Map and {Hash,}Set? Luckily, they don't reimplement To the credit of the implementors, Maybe those clones can be consolidated? |
Yes!!
I must disagree. Anyone making serious use of scalacheck would find it rather quickly. I think the reason it took so long to find is that few people used both scalacheck and This seems like an argument to aim to one day have property based testing in the standard library. |
While we're speculating, I'd like you to provide evidence. and in fact, it was born while Travis Brown was debugging some code of his (IIUC his story). If you don't know you need I might agree that experts in This is still unrealistic, but harder to check: or, more realistically: For extra fun, you need a big enough integer to trigger the fault, so finding any fault requires a clever probability distribution for the involved sizes. I've not tested those examples, and I'll be positively impressed if some reasonable variant actually works. |
|
@Blaisorblade All I know is that it seemed to surface pretty easily here:
|
|
As I argued above, in collections whose nature allows two of their instances to have different internal representations while still being considered equal in Scala, finding bugs can be tricky and specialized generators need to be built in order to ensure proper property coverage. It may not be an easy or quick task and it may require knowledge of the collection's internals, but I still think it's an effective way to prevent corner case bugs. Not infallible, but efficient. I followed that approach in (btw, regarding the need for a big enough integer, ScalaCheck is actually sensitive to bounds and special cases; when you ask for arbitrary integers, the lower and upper bounds are always tested, as well as 0, 1 and -1 [source]) |
|
To clarify: I'm also pro-unit-testing and pro-conscious use of ScalaCheck, I was just agreeing with you that finding such a bug would not be trivial. Kudos to putting the effort in in your case! (After seeing this bug, I wonder whether your |
|
Yeah, it could be done in the generator. In line 58 it is verified that the application of any sequence of delete/insert operations always results in a valid red-black tree, which should guarantee the integrity of the other methods. But it would be safer if such a construction was used in the generator too. I also have the idea that although there are multiple possible internal representations for a red-black tree with a given set of elements, all representations can be constructed by a given sequence of insertions. Equivalently, there are no internal representation that can only be achieved by deleting nodes. |
Note that ScalaCheck is a dependency of scala/scala, and adding ScalaCheck-based tests to the repo is encouraged. |
SI-9581 Fix overflow on Vector take and drop methods
|
thank you @ruippeixotog! the fix is very much appreciated |
Was unaware of that. Glad to hear! |
Fixes the index/length comparison in
Vector#takeandVector#dropso that they handle all possible integer values.Given the collection's invariants
startIndex >= endIndexand0 >= startIndex, endIndex, it is sufficient to change the arithmetic in the comparison as done in this commit to avoid overflows. As cases whenn <= 0are handled beforehand,endIndex - ncannot overflow, contrary tostartIndex + n. If without the danger of overflows the condition yields true, on the other hand,startIndex + ncannot overflow as it is smaller thanendIndex(as the previous formulation of the condition shows).