SI-9581 Fix overflow on Vector take and drop methods#4876

Merged
SethTisue merged 1 commit into scala:2.11.x from ruippeixotog:issue/9581
Dec 15, 2015

Conversation

@ruippeixotog
Contributor


Fixes the index/length comparison in `Vector#take` and `Vector#drop` so that they handle all possible integer values.

Given the collection's invariants `startIndex <= endIndex` and `startIndex, endIndex >= 0`, it is sufficient to change the arithmetic in the comparison as done in this commit to avoid overflows. Since the cases where `n <= 0` are handled beforehand, `endIndex - n` cannot overflow, unlike `startIndex + n`. Conversely, whenever the condition would hold in exact arithmetic, `startIndex + n` is smaller than `endIndex` (as the previous formulation of the condition shows), so it cannot overflow either.
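A minimal sketch of the arithmetic (illustrative values only, not `Vector`'s actual fields) shows why moving the addition to the other side of the comparison avoids the wraparound:

```scala
// Illustrative indices; the invariants are 0 <= startIndex <= endIndex,
// and n <= 0 is handled before this comparison runs.
val startIndex = 32
val endIndex   = 128
val n          = Int.MaxValue

// Overflow-prone form: startIndex + n wraps to a large negative number,
// so the bound check wrongly passes.
val unsafe = startIndex + n < endIndex   // true, incorrectly

// Rewritten form: with n > 0 and endIndex >= 0, endIndex - n cannot
// underflow, so the comparison is exact.
val safe = startIndex < endIndex - n     // false, correctly

assert(unsafe && !safe)
```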
@scala-jenkins scala-jenkins added this to the 2.11.8 milestone Dec 6, 2015
@Blaisorblade
Contributor

👍 for the prompt fix proposal and for spotting the same problem in drop.

My only two concerns here:

  1. If I were the maintainer, I'd require an inline comment in both places to warn anybody modifying this source.
  2. Your reasoning looks correct to me, and avoiding unsigned comparison might help performance a bit (not sure it's ever noticeable, but...). I went for unsigned comparison because I wonder whether we need such fixes on other collections, and because it is a bit easier to apply. Not sure what the policy is here.

But I'm far from the maintainer — I guess @Ichoran should review this, also because of https://twitter.com/travisbrown/status/673504795945123840.

@Ichoran
Contributor

Ichoran commented Dec 7, 2015

LGTM

For what it's worth, the collections are absolutely riddled with bounds checks that may fail around Int.MaxValue. I don't believe it is worth putting a comment on every dangerous line explaining that the subtraction is arranged this way to avoid overflow--it adds too much clutter and doesn't necessarily help when new code is written. Instead, there needs to be a culture among reviewers of thinking about corner cases like that, and possibly a top-level document specifying best practices (e.g. `end - n` instead of `start + n`).

@ruippeixotog
Contributor Author

I agree with @Ichoran regarding the practicality and usefulness of comments in these cases. However, I'm more fond of thoroughly unit testing these corner cases than of leaving that "responsibility" to the reviewers. @Ichoran, that would be a fine use case for your scala-collections-laws project, wouldn't it?

I must also note that this bug was particularly tricky to find, as it doesn't reveal itself in collections created directly, e.g. `Vector(0 to 64: _*)`, but it does happen on vectors constructed as in the unit test. With some collections, several instantiation processes must be tested, as collections deemed equal may have different internal representations.

@Ichoran
Contributor

Ichoran commented Dec 7, 2015

@ruippeixotog - It's a little hard to know what to test, but yes, taking `Int.MaxValue` elements should produce a collection identical to the original, and I can add that as a law. (Likewise, dropping that many should always result in an empty collection.) But that's one of many possible bounds bugs. Every measure that can help (improved tests, improved culture, etc.) is probably needed to improve things as much as they should be.
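The two proposed laws can be sketched as plain assertions (hedged: this is not scala-collections-laws syntax, just the properties; the concatenation is what reportedly yields a vector with a shifted internal start index, the shape that triggered SI-9581):

```scala
// Concatenation produces the internal layout the original bug needed;
// on a fixed Scala these two "laws" hold.
val v = Vector.fill(1000)(1) ++ Vector.fill(1000)(2)

assert(v.take(Int.MaxValue) == v)      // taking "everything" is the identity
assert(v.drop(Int.MaxValue).isEmpty)   // dropping "everything" leaves nothing
```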

@Blaisorblade
Contributor

I must also note that this bug was particularly tricky to find, as it doesn't reveal itself in collections created directly, e.g. Vector(0 to 64: _*), but it happens on vectors constructed like in the unit test.

That's one of the special reasons why code review can be more effective than testing — once you know to look for it. There's also general experience showing it's more effective than testing, if done at the appropriate speed — and reviewer time is sorely lacking.

One advantage of unsignedCompare is that it makes this problem more obvious to the readers.

Maybe next we'll have a volunteer to poke holes in `{Hash,}Map` and `{Hash,}Set`? Luckily, they don't reimplement `take` and `drop`, but inherit them from `IterableLike`, which looks safe.

To the credit of the implementors, unsignedCompare already shows up in the library, even in two clones, in the two places where I'd most have feared this:
https://github.com/scala/scala/blob/2.12.x/src/library/scala/collection/immutable/HashSet.scala#L1022-L1024
https://github.com/scala/scala/blob/2.12.x/src/library/scala/collection/generic/BitOperations.scala
https://github.com/scala/scala/search?utf8=%E2%9C%93&q=unsignedCompare

Maybe those clones can be consolidated?
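For reference, the bit trick behind `unsignedCompare` is tiny; a sketch of the version that appears in `BitOperations`:

```scala
// i is unsigned-less-than j iff the signed comparison result is flipped
// whenever exactly one operand has its sign bit set.
def unsignedCompare(i: Int, j: Int): Boolean =
  (i < j) ^ (i < 0) ^ (j < 0)

assert(unsignedCompare(1, 2))                        // ordinary case
assert(!unsignedCompare(-1, 1))                      // -1 is 0xFFFFFFFF unsigned
assert(unsignedCompare(Int.MaxValue, Int.MinValue))  // MinValue is huge unsigned
```

Written this way, an out-of-range negative result of an overflowing addition compares as a very large unsigned value, which is exactly what makes the overflow obvious at the call site.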

@jedesah
Contributor

jedesah commented Dec 13, 2015

However, I'm more fond of unit testing thoroughly these corner cases rather than leaving that "responsibility" to the reviewers.

Yes!!

I must also note that this bug was particularly tricky to find

I must disagree. Anyone making serious use of ScalaCheck would find it rather quickly. I think the reason it took so long to find is that few people used both ScalaCheck and `Vector` (could be wrong).

This seems like an argument to aim to one day have property based testing in the standard library.

@Blaisorblade
Contributor

I must also note that this bug was particularly tricky to find

Anyone making serious use of scalacheck would find it rather quickly. I think the reason it took so long to find is that few people used both scalacheck and Vector (could be wrong).

While we're speculating, I'd like you to provide evidence: an actually obvious test where ScalaCheck really finds the bug. The original one doesn't look so natural to me:

org.scalacheck.Prop.forAll { (x: Vector[Int], y: Vector[Int]) =>
  (x ++ y).take(Int.MaxValue); true
}.check(_.withMaxSize(999))

and in fact, it was born while Travis Brown was debugging some code of his (IIUC his story). If you don't know you need `++` to trigger the bug, you have to hope your `Arbitrary[Vector]` instance covers that space. (ScalaCheck's generators don't seem to be written to test the standard library.)

I might agree that experts in Vector's implementation using ScalaCheck seriously enough should realize the need for such a case and test their generator to ensure coverage, but writing good generators is not "quick" — see the original QuickCheck paper for extended discussion of the pitfalls.

This is still unrealistic, but harder to check:

org.scalacheck.Prop.forAll { (x: Vector[Int], y: Int) =>
  x.take(y); true
}

or, more realistically:

org.scalacheck.Prop.forAll { (l: Vector[Int], n: Int) =>
  l.take(n) ++ l.drop(n) == l
}

For extra fun, you need a big enough integer to trigger the fault, so finding any fault requires a clever probability distribution for the involved sizes.

I've not tested those examples, and I'll be positively impressed if some reasonable variant actually works.
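The "clever probability distribution" point can be sketched without ScalaCheck: a generator has to mix ordinary sizes with the extremes that expose overflow, because a uniform distribution almost never hits them. A hypothetical sampler (illustrative names, not library code):

```scala
import scala.util.Random

// Deliberately over-weight the boundary values that trigger overflow bugs.
val extremes = Seq(0, 1, -1, Int.MaxValue, Int.MinValue)

def biasedSize(rnd: Random): Int =
  if (rnd.nextInt(5) == 0) extremes(rnd.nextInt(extremes.size))
  else rnd.nextInt(201) - 100   // uniform in [-100, 100]

// Every sample is either an ordinary small size or a deliberate extreme.
val rnd = new Random(42)
assert(Seq.fill(1000)(biasedSize(rnd))
  .forall(n => extremes.contains(n) || (n >= -100 && n <= 100)))
```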

@jedesah
Contributor

jedesah commented Dec 13, 2015

@Blaisorblade All I know is that it seemed to surface pretty easily here:

https://github.com/quasar-analytics/quasar/blob/feature/translate_web_api_for_free/web/src/test/scala/quasar/api/services/DataServiceSpec.scala#L139

`filesystem.contents` is the `Vector` in that code. I am fairly certain that the `Arbitrary` for `SingleFileMemState` uses the default `Arbitrary` instance for `Vector`.

@ruippeixotog
Contributor Author

As I argued above, for collections whose nature allows two instances to have different internal representations while still being considered equal in Scala, finding bugs can be tricky, and specialized generators need to be built in order to ensure proper property coverage. That may not be an easy or quick task, and it may require knowledge of the collection's internals, but I still think it's an effective way to prevent corner-case bugs. Not infallible, but efficient.

I followed that approach in `mutable.TreeMap` and built a set of ScalaCheck properties not only for the collection itself, but also for its underlying `RedBlackTree` structure. I can't know for sure whether there are still bugs in the code not covered by the properties, but I can say that, as I was developing the data structure, they helped me find some corner cases I wasn't counting on.

(btw, regarding the need for a big enough integer, ScalaCheck is actually sensitive to bounds and special cases; when you ask for arbitrary integers, the lower and upper bounds are always tested, as well as 0, 1 and -1 [source])

@Blaisorblade
Contributor

To clarify: I'm also pro-unit-testing and pro-conscious use of ScalaCheck; I was just agreeing with you that finding such a bug would not be trivial. Kudos for putting in the effort in your case! (After seeing this bug, I wonder whether your `genRedBlackTree` could increase assurance by also trying deletions; I've seen your delete property, but that won't test e.g. `min` or `max` after a delete. OTOH, I'm not sure this extra testing would be worth the time to write it.)

@ruippeixotog
Contributor Author

Yeah, it could be done in the generator. Line 58 verifies that applying any sequence of delete/insert operations always results in a valid red-black tree, which should guarantee the integrity of the other methods. But it would be safer if such a construction were used in the generator too.

I also have the idea that, although there are multiple possible internal representations for a red-black tree with a given set of elements, all representations can be constructed by some sequence of insertions. Equivalently, there is no internal representation that can only be reached by deleting nodes. I can't give you a proof, however, as I'm not 100% sure of that, just intuition :) All valid red-black trees (and binary search trees) can, I believe, be constructed by inserting their elements in breadth-first order :)
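The binary-search-tree half of that intuition is easy to demonstrate: rebuilding a tree by inserting its elements in breadth-first order reproduces its exact shape (for plain BSTs; whether red-black rebalancing preserves this is the open part of the claim). A self-contained sketch:

```scala
// Minimal BST, not the library's RedBlackTree.
sealed trait Tree
case object Leaf extends Tree
case class Node(l: Tree, v: Int, r: Tree) extends Tree

// Standard unbalanced BST insertion.
def insert(t: Tree, x: Int): Tree = t match {
  case Leaf => Node(Leaf, x, Leaf)
  case Node(l, v, r) =>
    if (x < v) Node(insert(l, x), v, r) else Node(l, v, insert(r, x))
}

// Breadth-first traversal via an explicit queue.
def bfs(t: Tree): List[Int] = {
  def go(q: List[Tree]): List[Int] = q match {
    case Nil                   => Nil
    case Leaf :: rest          => go(rest)
    case Node(l, v, r) :: rest => v :: go(rest ::: List(l, r))
  }
  go(List(t))
}

val target  = Node(Node(Leaf, 1, Node(Leaf, 2, Leaf)), 3, Node(Leaf, 5, Leaf))
val rebuilt = bfs(target).foldLeft(Leaf: Tree)(insert)
assert(rebuilt == target)   // insertion in BFS order reproduces the shape
```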

@SethTisue
Member

This seems like an argument to aim to one day have property based testing in the standard library

Note that ScalaCheck is a dependency of scala/scala, and adding ScalaCheck-based tests to the repo is encouraged.

SethTisue added a commit that referenced this pull request Dec 15, 2015
SI-9581 Fix overflow on Vector take and drop methods
@SethTisue SethTisue merged commit 7559aed into scala:2.11.x Dec 15, 2015
@SethTisue
Member

thank you @ruippeixotog! the fix is very much appreciated

@jedesah
Contributor

jedesah commented Dec 15, 2015

Note that ScalaCheck is a dependency of scala/scala, and adding ScalaCheck-based tests to the repo is encouraged.

Was unaware of that. Glad to hear!
