Skip to content

Conversation

@zdenko
Copy link
Collaborator

@zdenko zdenko commented Feb 8, 2018

Fixes #4889.
It seems that #4853, besides preventing an infinite loop, also caused a breaking change in the existing code.
The condition which checks the upper and lower range bounds, in order to prevent the infinite loop, always allows at least one iteration of the for loop.
Example:

a = 2
b = 1
s = 1
r = (x for x in [a..b] by s)

The value of r, prior to #4853 was [], but now it's [2].

To me [] is a bug as I would expect the result to be [2].
Given that r = (x for x in [1..2] by 1000) is [1], r = (x for x in [1..2] by -1000) should give the same result.

Anyway, to correct the breaking change caused by #4853, I prepared this PR which reverts the output.
And, I guess whether the result of for x in [1..2] by -1 should be []or [1] belongs to a separate discussion and PR.

@vendethiel
Copy link
Collaborator

Ok, that code looks much better. Nicely done.

@GeoffreyBooth
Copy link
Collaborator

Let’s add a test that matches #4889. Instead of logging the values, just push them into an array and then compare the array with what you’d expect it to be.

@zdenko
Copy link
Collaborator Author

zdenko commented Feb 9, 2018

Although the intention of this PR is (was) to revert the output of for..range by loop prior to #4853, I have second thoughts about it.

I think this is the minimal case which shows the difference before and after #4853.

r = (x for x in [2..1] by 1)

As said earlier, the result of r before #4853 was [], and after [2].
Compared to other examples:

r = (x for x in [2..1] by -1000)  # r = [2]
r = (x for x in [2..2] by 1000)   # r = [2]

The reason for x in [2..1] by 1 returns [] lies in the compiled condition which checks the upper bound value based on the value of by, and doesn't take in account value of the lower bound.

# CS 2.1.1 and below

for x in [2..1] by 100
# for (x = i = 2; i <= 1; x = i += 100)
# []

for x in [2..1] by -100
# for (x = i = 2; i >= 1; x = i += -100)
# [2]

for x in [2..2] by 100
# for (x = i = 2; i <= 2; x = i += 100)
# [2]

for x in [2..2] by -100
# for (x = i = 2; i >= 2; x = i += -100)
# [2]

The condition part i <= or i >= depends on the value of the by.
I think this is wrong, and the result from the first example should also be [2].
The range used in for loop (it's not compiled into the array as in a = [1..2]) determines the lower and upper bounds and the condition should always check both limits.
So, for x in [a..b] by s in pseudocode should be

for (x = a; x bewteen a and b; x += s)

@GeoffreyBooth
Copy link
Collaborator

Going back to the example in #4889, modified slightly so that it could become a test:

results = []
n = 1

for i in [0..n]
  results.push 'i = ' + i
  for j in [(i+1)..n]
    results.push 'j = ' + j

This produces [ 'i = 0', 'j = 1', 'i = 1', 'j = 2', 'j = 1' ] in this branch, 2.1.1, 2.0.3, and 1.12.7. All well and good.

Now take the same as above, but add by 1 to the second for:

results = []
n = 1

for i in [0..n]
  results.push 'i = ' + i
  for j in [(i+1)..n] by 1
    results.push 'j = ' + j

Then we get:

  • [ 'i = 0', 'j = 1', 'i = 1', 'j = 2', 'j = 1' ] in this branch
  • [ 'i = 0', 'j = 1', 'i = 1', 'j = 2' ] in 2.1.1
  • [ 'i = 0', 'j = 1', 'i = 1' ] in 2.0.3 and 1.12.7

The issue raised in #4889 was that by 1 shouldn’t make a difference. And that certainly seems reasonable, it shouldn’t. So assuming the output of the first example is correct, this PR fixes the bug that by 1 should be a noop.

It’s a separate question, which I think @zdenko is alluding at in his last comment, as to whether the output from the first example is correct. But I think we should treat that as a separate bug. Could we get one PR that makes the existence or nonexistence of by 1 irrelevant, and a separate issue/PR to discuss what the “correct” output should be in cases like the first example?

@zdenko
Copy link
Collaborator Author

zdenko commented Feb 13, 2018

@GeoffreyBooth my comment about the correctness of the output was regarding the second example, e.g. for - range - by.

Besides fixing infinite loop, #4853 also caused a breaking change of the for - range - by loop compilation, but I don't think this change is a bug.
And, I think that the result ([ 'i = 0', 'j = 1', 'i = 1' ]) user from #4889 expects is wrong.

Options:
a) we merge this PR which will revert the output of for - range - by loop, and then create another PR to discuss changes in the output.
b) discard this PR and keep changes from the #4853 (with minor improvements in a separate PR)

I vote for b).

Why the different result?

The results from for x in [a..b] and for x in [a..b] by 1] are different because for - range and for - range - by loops compiles differently.

  1. for x in [a..b]

No (significant) changes before and after #4853:

for (x = a; a <= b ? x <= b : x >= b; a <= b ? ++x; --x)

The value of increment/decrement step is defined by the relation of a and b.

  1. for x in [a..b] by s

Before #4853:

for (x = a; s > 0 ? x <= b : x >= b; x += s)

The loop condition basically says if the step is larger then 0 it also means that the end index ('b') of the range is larger then the start index (a).

After #4853:

for (x = a; a <= b ? a <= x <= b : a >= x >= b; x += s)

The loop condition is changed: x must be between a and b

The original intention of this change was to prevent possible infinite loops. Back then I wasn't paying additional attention to the output (although I should), as all existing tests have passed.
But, using the value of 'by' to define loop condition seems wrong to me. The value of 'by` is an explicit increment/decrement step set by the user.

Here is the table that shows the difference between the for - range - by from the second example:

i j range condition
before
condition
after
result
before
result
current
0 1 [1..1] by 1 j <= 1 1 <= j <= 1 [i = 0, j = 1]  [i = 0, j = 1]
1 2 [2..1] by 1 j <= 1 2 >= j >= 1 [i = 0, j = 1, i = 1]  [i = 0, j = 1, i = 1, j = 2]
1 3 [2..1] by 1 2 >= j >= 1  [i = 0, j = 1, i = 1, j = 2]

@GeoffreyBooth
Copy link
Collaborator

I don’t have time right this minute to dig into this, but do you mind opening a PR that’s limited to making by 1 and the lack of by equivalent? I think we can all agree that that is a bug that should be fixed, and we should get that merged in.

As for changing/correcting the interpretation of by, I think I’ll probably agree with you, but I need to work through some examples first. This is complicated enough that I think it would benefit from its own PR for focus/discussion. That could maybe be this PR, if you want to split out the by stuff to another one.

@zdenko
Copy link
Collaborator Author

zdenko commented Feb 13, 2018

Agree. I'll prepare a couple of tests for this one and create another PR for changing the interpretation of by.

@GeoffreyBooth
Copy link
Collaborator

@zdenko If you don’t mind, let’s wrap up this work (both fixing the by 1 inconsistency and any change to how ranges are handled, as a separate PR) so that I can get another bugfix patch release out. Once that’s out, we can merge in some of the new features/changed output PRs, which should trigger a new minor release (2.3.0).

@zdenko
Copy link
Collaborator Author

zdenko commented Feb 18, 2018

@GeoffreyBooth I added a simple test, a code from the example.
Will you merge this into master, and I'll start another PR to deal with range by inconsistency.

result = []
for i in [0..n]
result.push i
for j in [(i+1)..n] by 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let’s either have both for loops say by 1 or have neither do so.

If it makes a difference, in the other PR maybe we can copy this test with the other case (like one test with no by, and the other test with two by 1s).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without by 1 the result will be [ 0, 1, 1, 2, 1 ]. I used the same code as the user in #4889 to show the final (expected) result. Adding by 1 to the first loop will make no difference, but it's not actually the point of this PR (or #4889).
The second loop range is [1..1] in the first iteration and [2..1] in the second, and this is the cause of different results if by 1 is used or not.
The purpose of this PR was to revert the compilation of for - range - by so the result would be the same as before #4853.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t think we need a PR that produces the result the user expects, if we think that that result is wrong. I thought the plan was to have two PRs:

  • one so that by 1 and a lack of by are always equivalent
  • one that fixes the range output to be what you think it should be

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, sorry, my bad.
In that case, if you agree, I suggest just one PR which will cover both issues, as changed range output is prerequisite for correct by 1 and w/o by output.

@GeoffreyBooth
Copy link
Collaborator

Thinking out loud, starting with my example above:

results = []
n = 1

for i in [0..n]
  results.push 'i = ' + i
  for j in [(i+1)..n]
    results.push 'j = ' + j

This could be rewritten as:

results = []
for i in [0..1]
  results.push 'i = ' + i
  for j in [(i+1)..1]
    results.push 'j = ' + j

Now is where things get tricky. So the first range is 0..1, and so the first for loops twice, once where i is 0 and once where i is 1 (remember that .. is inclusive).

So there are two second for loops created, once for each i. Those two ranges are 1..1 and 2..1. There’s no by here, so what are we to do for 2..1? Nothing at all, since a lack of by implies by 1 but we can’t count up here? Or do we see that the start is greater than the end, and imply by -1 instead?

The docs actually have an answer to this question. The example for ranges is this:

countdown = (num for num in [10..1])

Which returns [10,9,8,7,6,5,4,3,2,1], both now and in 1.12.7. So clearly the expectation is that we check whether we should count up or down, and choose by 1 or by -1 as appropriate.

So to unpack the loops:

results = []
i = 0
results.push 'i = ' + i
for j in [1..1]
  results.push 'j = ' + j
i = 1
results.push 'i = ' + i
for j in [2..1]
  results.push 'j = ' + j

And finally:

results = []
i = 0
results.push 'i = ' + i
j = 1
results.push 'j = ' + j
i = 1
results.push 'i = ' + i
j = 2
results.push 'j = ' + j
j = 1
results.push 'j = ' + j

And so we get [ 'i = 0', 'j = 1', 'i = 1', 'j = 2', 'j = 1' ], which my “by hand” calculations determines to be what I would assume is the “correct” answer.

So I guess what I was assuming in earlier posts was wrong: a lack of by doesn’t always mean that there’s an implied by 1, but rather that there’s an implied by 1 or by -1, depending on if the start of the range is greater or less than the end of the range. (If the start and end are equal, it doesn’t matter what the by value is, because the range returns just that one value.)

But what are we supposed to do with a range of [2..1] and by 1? You can’t count up from 2 to 1. In 1.12.7, [10..1] by 1 is an empty array. That feels correct to me. In 2.2.1, it’s [10]—as in, it does one loop first, then increments up and then checks that it’s still within range. I can see the logic in this interpretation too.

Since both interpretations of [10..1] by 1 are reasonable, I think we should stay with the 1.x handling unless there’s a persuasive case to be made for why it’s wrong. Otherwise the new output is just an arbitrary breaking change. Surely it must be possible to avoid infinite loops while preserving the empty-array output of [10..1] by 1 that we got in 1.x.

@zdenko
Copy link
Collaborator Author

zdenko commented Feb 18, 2018

Surely it must be possible to avoid infinite loops while preserving the empty-array output of [10..1] by 1 that we got in 1.x.

@GeoffreyBooth this PR does exactly that. All tests from #4853 are included, I just corrected the empty array result in cases like [2..1] by 1.

Since both interpretations of [10..1] by 1 are reasonable, I think we should stay with the 1.x handling

I think the opposite. Here is simplified version of how for x in [a..b] by s is compiled in nodes:

for (x = a; s > 0 ? x <= b : x >= b; x += s) 

The part s > 0 feels wrong to me. Step s in explicit increment/decrement set by the user and shouldn't be used in the condition part.
Let's see what happens if we use numbers:

for x in [2..1] by 1
# for (x = 2; 1 > 0 ? x <= 1 : x >= 1; x += 1) 

The result is an empty array because x <= 1 (e.g. 2 <= 1) fails. It seems logic at first as you can't go up from 2to 1 by 1.


for x in [2..2] by 1
# for (x = 2; 1 > 0 ? x <= 2 : x >= 2; x += 1) 

The result in this case will be [2]. So, here we say go up from 2 to 2 by 1 and we'll get [2]


for x in [2..1] by -100
# for (x = 2; -100 > 0 ? x <= 1 : x >= 1; x += -100) 

This reads as go down from 2 to 1 by -100. The result is [2].


It's probably just me, but I read for x in [a..b] as "go from a to b" and for x in [a..b] by s as "go from a to b with step of s".
So, the only difference is the step. The former will be implicitly set by CS, the latter by the user.
If by is used as the condition, then it actually means "if s is positive go up to b, otherwise go down to b". The a is disregarded in this case, and IMHO it shouldn't be.
The loop starts with a and proceeds to b by given step:

for x in [2..1] by 1    # = [2]
for x in [2..1] by -100 # = [2]
for x in [2..2] by 1    # = [2]
for x in [2..1] by 0    # = [2]

I know this is a breaking change, but I'm totally OK if the current output stays.

@GeoffreyBooth
Copy link
Collaborator

I don't mean that we should keep the output of 1.x, of comparing the step against zero. I just mean that [10..1] by 1 should still return an empty array, by however method makes sense.

@zdenko
Copy link
Collaborator Author

zdenko commented Feb 18, 2018

Yes, I get that. I wanted to explain why the result shouldn't be an empty array.
But, let's wrap this up. This PR provides the correct result (e.g. empty array) and prevents an infinite loop.
I say we merge this.

@GeoffreyBooth
Copy link
Collaborator

@zdenko I’ve added a few more tests. Do they look correct to you?

Assuming they’re good, I’m okay with merging this in. Are there still additional changes you want to make to how ranges/by operate, that you’ll be opening a new PR for?

@zdenko
Copy link
Collaborator Author

zdenko commented Feb 20, 2018

Tests are correct. I'll wait with the PR for range-by for now. It would be a breaking change to the existing code. So, maybe in CS3.

@GeoffreyBooth
Copy link
Collaborator

Thanks. Maybe open a new issue explaining it, with a little less focus on what the JavaScript output would be and a little more emphasis on what the result (empty array, array with certain values, etc.) would be. If you can explain why the current output is wrong, and others agree, it’s something we can change in 2.x; if the current output is reasonable but not what a more thoughtful approach would be, it would be good to explain what the better result would be and why, and why we should change to it.

@GeoffreyBooth GeoffreyBooth merged commit 72ab6fe into jashkenas:master Feb 20, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants