SI-9581 Fix overflow on Vector take and drop methods#4876

Merged
SethTisue merged 1 commit into scala:2.11.x from ruippeixotog:issue/9581
Dec 15, 2015

Conversation

@ruippeixotog
Contributor


Fixes the index/length comparison in `Vector#take` and `Vector#drop` so that they handle all possible integer values.

Given the collection's invariants `startIndex <= endIndex` and `startIndex, endIndex >= 0`, it is sufficient to change the arithmetic in the comparison as done in this commit to avoid overflows. Since the cases where `n <= 0` are handled beforehand, `endIndex - n` cannot overflow, unlike `startIndex + n`. Conversely, whenever the condition would hold in exact arithmetic, `startIndex + n` is smaller than `endIndex` (as the previous formulation of the condition shows), so it cannot overflow either.
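A minimal sketch of the arithmetic (illustrative values only, not `Vector`'s actual fields) shows why moving the addition to the other side of the comparison avoids the wraparound:

```scala
// Illustrative indices; the invariants are 0 <= startIndex <= endIndex,
// and n <= 0 is handled before this comparison runs.
val startIndex = 32
val endIndex   = 128
val n          = Int.MaxValue

// Overflow-prone form: startIndex + n wraps to a large negative number,
// so the bound check wrongly passes.
val unsafe = startIndex + n < endIndex   // true, incorrectly

// Rewritten form: with n > 0 and endIndex >= 0, endIndex - n cannot
// underflow, so the comparison is exact.
val safe = startIndex < endIndex - n     // false, correctly

assert(unsafe && !safe)
```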
@scala-jenkins scala-jenkins added this to the 2.11.8 milestone Dec 6, 2015
@Blaisorblade
Contributor

👍 for the prompt fix proposal and for spotting the same problem in drop.

My only two concerns here:

  1. If I were the maintainer, I'd require an inline comment in both places to warn anybody modifying this source.
  2. Your reasoning looks correct to me, and avoiding unsigned comparison might help performance a bit (not sure it's ever noticeable, but...). I went for unsigned comparison because I wonder whether we need such fixes on other collections, and because it is a bit easier to apply. Not sure what the policy is here.

But I'm far from the maintainer — I guess @Ichoran should review this, also because of https://twitter.com/travisbrown/status/673504795945123840.

@Ichoran
Contributor

Ichoran commented Dec 7, 2015

LGTM

For what it's worth, the collections are absolutely riddled with bounds checks that may fail around Int.MaxValue. I don't believe it is worth putting a comment on every dangerous line explaining that the subtraction is arranged this way to avoid overflow--it adds too much clutter and doesn't necessarily help when new code is written. Instead, there needs to be a culture among reviewers of thinking about corner cases like that, and possibly a top-level document specifying best practices (e.g. `end - n` instead of `start + n`).

@ruippeixotog
Contributor Author

I agree with @Ichoran regarding the practicality and usefulness of comments in these cases. However, I'm more fond of thoroughly unit testing these corner cases than of leaving that "responsibility" to the reviewers. @Ichoran, that would be a fine use case for your scala-collections-laws project, wouldn't it?

I must also note that this bug was particularly tricky to find, as it doesn't reveal itself in collections created directly, e.g. `Vector(0 to 64: _*)`, but it does happen on vectors constructed as in the unit test. With some collections, several instantiation processes must be tested, as collections deemed equal may have different internal representations.

@Ichoran
Contributor

Ichoran commented Dec 7, 2015

@ruippeixotog - It's a little hard to know what to test, but yes, taking `Int.MaxValue` elements should produce a collection identical to the original, and I can add that as a law. (Likewise, dropping that many should always result in an empty collection.) But that's one of many possible bounds bugs. Every measure that can help (improved tests, improved culture, etc.) is probably needed to improve things as much as they should be.
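The two proposed laws can be sketched as plain assertions (hedged: this is not scala-collections-laws syntax, just the properties; the concatenation is what reportedly yields a vector with a shifted internal start index, the shape that triggered SI-9581):

```scala
// Concatenation produces the internal layout the original bug needed;
// on a fixed Scala these two "laws" hold.
val v = Vector.fill(1000)(1) ++ Vector.fill(1000)(2)

assert(v.take(Int.MaxValue) == v)      // taking "everything" is the identity
assert(v.drop(Int.MaxValue).isEmpty)   // dropping "everything" leaves nothing
```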

@Blaisorblade
Contributor

I must also note that this bug was particularly tricky to find, as it doesn't reveal itself in collections created directly, e.g. Vector(0 to 64: _*), but it happens on vectors constructed like in the unit test.

That's one of the special reasons why code review can be more effective than testing — once you know to look for it. There's also general experience showing it's more effective than testing, if done at the appropriate speed — and reviewer time is sorely lacking.

One advantage of unsignedCompare is that it makes this problem more obvious to the readers.

Maybe next we'll have a volunteer to poke holes in `{Hash,}Map` and `{Hash,}Set`? Luckily, they don't reimplement `take` and `drop`, but inherit them from `IterableLike`, which looks safe.

To the credit of the implementors, unsignedCompare already shows up in the library, even in two clones, in the two places where I'd most have feared this:
https://github.com/scala/scala/blob/2.12.x/src/library/scala/collection/immutable/HashSet.scala#L1022-L1024
https://github.com/scala/scala/blob/2.12.x/src/library/scala/collection/generic/BitOperations.scala
https://github.com/scala/scala/search?utf8=%E2%9C%93&q=unsignedCompare

Maybe those clones can be consolidated?
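For reference, the bit trick behind `unsignedCompare` is tiny; a sketch of the version that appears in `BitOperations`:

```scala
// i is unsigned-less-than j iff the signed comparison result is flipped
// whenever exactly one operand has its sign bit set.
def unsignedCompare(i: Int, j: Int): Boolean =
  (i < j) ^ (i < 0) ^ (j < 0)

assert(unsignedCompare(1, 2))                        // ordinary case
assert(!unsignedCompare(-1, 1))                      // -1 is 0xFFFFFFFF unsigned
assert(unsignedCompare(Int.MaxValue, Int.MinValue))  // MinValue is huge unsigned
```

Written this way, an out-of-range negative result of an overflowing addition compares as a very large unsigned value, which is exactly what makes the overflow obvious at the call site.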

@jedesah
Contributor

jedesah commented Dec 13, 2015

However, I'm more fond of unit testing thoroughly these corner cases rather than leaving that "responsibility" to the reviewers.

Yes!!

I must also note that this bug was particularly tricky to find

I must disagree. Anyone making serious use of ScalaCheck would find it rather quickly. I think the reason it took so long to find is that few people used both ScalaCheck and `Vector` (could be wrong).

This seems like an argument to aim to one day have property based testing in the standard library.

@Blaisorblade
Contributor

I must also note that this bug was particularly tricky to find

Anyone making serious use of scalacheck would find it rather quickly. I think the reason it took so long to find is that few people used both scalacheck and Vector (could be wrong).

While we're speculating, I'd like you to provide evidence: an actually obvious test where ScalaCheck really finds the bug. The original one doesn't look so natural to me:

org.scalacheck.Prop.forAll { (x: Vector[Int], y: Vector[Int]) =>
  (x ++ y).take(Int.MaxValue); true
}.check(_.withMaxSize(999))

and in fact, it was born while Travis Brown was debugging some code of his (IIUC his story). If you don't know you need `++` to trigger the bug, you have to hope your `Arbitrary[Vector]` instance covers that space. (ScalaCheck's generators don't seem to be written to test the standard library.)

I might agree that experts in Vector's implementation using ScalaCheck seriously enough should realize the need for such a case and test their generator to ensure coverage, but writing good generators is not "quick" — see the original QuickCheck paper for extended discussion of the pitfalls.

This is still unrealistic, but harder to check:

org.scalacheck.Prop.forAll { (x: Vector[Int], y: Int) =>
  x.take(y); true
}

or, more realistically:

org.scalacheck.Prop.forAll { (l: Vector[Int], n: Int) =>
  l.take(n) ++ l.drop(n) == l
}

For extra fun, you need a big enough integer to trigger the fault, so finding any fault requires a clever probability distribution for the involved sizes.

I've not tested those examples, and I'll be positively impressed if some reasonable variant actually works.
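The "clever probability distribution" point can be sketched without ScalaCheck: a generator has to mix ordinary sizes with the extremes that expose overflow, because a uniform distribution almost never hits them. A hypothetical sampler (illustrative names, not library code):

```scala
import scala.util.Random

// Deliberately over-weight the boundary values that trigger overflow bugs.
val extremes = Seq(0, 1, -1, Int.MaxValue, Int.MinValue)

def biasedSize(rnd: Random): Int =
  if (rnd.nextInt(5) == 0) extremes(rnd.nextInt(extremes.size))
  else rnd.nextInt(201) - 100   // uniform in [-100, 100]

// Every sample is either an ordinary small size or a deliberate extreme.
val rnd = new Random(42)
assert(Seq.fill(1000)(biasedSize(rnd))
  .forall(n => extremes.contains(n) || (n >= -100 && n <= 100)))
```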

@jedesah
Contributor

jedesah commented Dec 13, 2015

@Blaisorblade All I know is that it seemed to surface pretty easily here:

https://github.com/quasar-analytics/quasar/blob/feature/translate_web_api_for_free/web/src/test/scala/quasar/api/services/DataServiceSpec.scala#L139

`filesystem.contents` is the `Vector` in that code. I am fairly certain that the `Arbitrary` for `SingleFileMemState` uses the default `Arbitrary` instance for `Vector`.

@ruippeixotog
Contributor Author

As I argued above, for collections whose nature allows two instances to have different internal representations while still being considered equal in Scala, finding bugs can be tricky, and specialized generators need to be built in order to ensure proper property coverage. That may not be an easy or quick task, and it may require knowledge of the collection's internals, but I still think it's an effective way to prevent corner-case bugs. Not infallible, but efficient.

I followed that approach in `mutable.TreeMap` and built a set of ScalaCheck properties not only for the collection itself, but also for its underlying `RedBlackTree` structure. I can't know for sure whether there are still bugs in the code not covered by the properties, but I can say that, as I was developing the data structure, they helped me find some corner cases I wasn't counting on.

(btw, regarding the need for a big enough integer, ScalaCheck is actually sensitive to bounds and special cases; when you ask for arbitrary integers, the lower and upper bounds are always tested, as well as 0, 1 and -1 [source])

@Blaisorblade
Contributor

To clarify: I'm also pro-unit-testing and pro-conscious use of ScalaCheck; I was just agreeing with you that finding such a bug would not be trivial. Kudos for putting in the effort in your case! (After seeing this bug, I wonder whether your `genRedBlackTree` could increase assurance by also trying deletions; I've seen your delete property, but that won't test e.g. `min` or `max` after a delete. OTOH, I'm not sure this extra testing would be worth the time to write it.)

@ruippeixotog
Contributor Author

Yeah, it could be done in the generator. Line 58 verifies that applying any sequence of delete/insert operations always results in a valid red-black tree, which should guarantee the integrity of the other methods. But it would be safer if such a construction were used in the generator too.

I also have the idea that, although there are multiple possible internal representations for a red-black tree with a given set of elements, all representations can be constructed by some sequence of insertions. Equivalently, there is no internal representation that can only be reached by deleting nodes. I can't give you a proof, however, as I'm not 100% sure of that, just intuition :) All valid red-black trees (and binary search trees) can, I believe, be constructed by inserting their elements in breadth-first order :)
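The binary-search-tree half of that intuition is easy to demonstrate: rebuilding a tree by inserting its elements in breadth-first order reproduces its exact shape (for plain BSTs; whether red-black rebalancing preserves this is the open part of the claim). A self-contained sketch:

```scala
// Minimal BST, not the library's RedBlackTree.
sealed trait Tree
case object Leaf extends Tree
case class Node(l: Tree, v: Int, r: Tree) extends Tree

// Standard unbalanced BST insertion.
def insert(t: Tree, x: Int): Tree = t match {
  case Leaf => Node(Leaf, x, Leaf)
  case Node(l, v, r) =>
    if (x < v) Node(insert(l, x), v, r) else Node(l, v, insert(r, x))
}

// Breadth-first traversal via an explicit queue.
def bfs(t: Tree): List[Int] = {
  def go(q: List[Tree]): List[Int] = q match {
    case Nil                   => Nil
    case Leaf :: rest          => go(rest)
    case Node(l, v, r) :: rest => v :: go(rest ::: List(l, r))
  }
  go(List(t))
}

val target  = Node(Node(Leaf, 1, Node(Leaf, 2, Leaf)), 3, Node(Leaf, 5, Leaf))
val rebuilt = bfs(target).foldLeft(Leaf: Tree)(insert)
assert(rebuilt == target)   // insertion in BFS order reproduces the shape
```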

@SethTisue
Member

This seems like an argument to aim to one day have property based testing in the standard library

Note that ScalaCheck is a dependency of scala/scala, and adding ScalaCheck-based tests to the repo is encouraged.

SethTisue added a commit that referenced this pull request Dec 15, 2015
SI-9581 Fix overflow on Vector take and drop methods
@SethTisue SethTisue merged commit 7559aed into scala:2.11.x Dec 15, 2015
@SethTisue
Member

thank you @ruippeixotog! the fix is very much appreciated

@jedesah
Contributor

jedesah commented Dec 15, 2015

Note that ScalaCheck is a dependency of scala/scala, and adding ScalaCheck-based tests to the repo is encouraged.

Was unaware of that. Glad to hear!
