Reduce use of unchecked indexing in parser code by anka-213 · Pull Request #10381 · nushell/nushell

anka-213 · 2023-09-15T11:29:06Z

This draft is currently just an example of the kind of changes that would be done to the rest of the parser.

Description

By replacing spans[idx] with spans.get(idx) calls, we can guarantee that we won't get any index-out-of-bounds panics, since we get back an Option<&Span> that we have to check before we can use the span.

Some of the ones I replaced are very obviously checked, but that means that I can replace the check with a pattern match. The goal is to remove all unchecked indexing in the parser.
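As a minimal sketch of the difference described above (the `Span` struct and the helper names here are stand-ins invented for illustration, not the parser's actual definitions):

```rust
/// Stand-in for the parser's span type (an assumption for this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    start: usize,
    end: usize,
}

/// Unchecked access: panics if `idx` is out of bounds.
fn span_at_unchecked(spans: &[Span], idx: usize) -> Span {
    spans[idx]
}

/// Checked access: the caller has to handle the `None` case explicitly.
fn span_at(spans: &[Span], idx: usize) -> Option<Span> {
    spans.get(idx).copied()
}

/// A bounds check followed by indexing collapses into one pattern match.
fn describe(spans: &[Span], idx: usize) -> String {
    match spans.get(idx) {
        Some(span) => format!("span {}..{}", span.start, span.end),
        None => String::from("missing argument"),
    }
}
```

With `spans.get(idx)`, forgetting the out-of-bounds case becomes a compile error rather than a runtime panic.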

Fixes #9072 (Syntax-hightlight panics with index out of bounds due to custom function with many arguments) and adds a test for it.
Should prevent more errors like #10380 (Parser crashes when variable name in let expression is missing) from appearing.

Further ideas:

Make a new type for non-empty arrays of spans that guarantees that they are non-empty on construction.
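One possible shape for such a type, purely as a sketch: `NonEmptySpans` and its methods are hypothetical names, and `Span` is again a stand-in rather than the real parser type.

```rust
/// Stand-in for the parser's span type (an assumption for this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    start: usize,
    end: usize,
}

/// Hypothetical wrapper: a slice of spans known to hold at least one element.
#[derive(Debug, Clone, Copy)]
struct NonEmptySpans<'a> {
    first: &'a Span,
    rest: &'a [Span],
}

impl<'a> NonEmptySpans<'a> {
    /// The only constructor; returns `None` for an empty slice.
    fn new(spans: &'a [Span]) -> Option<Self> {
        let (first, rest) = spans.split_first()?;
        Some(Self { first, rest })
    }

    /// Always succeeds, because construction proved there is a first span.
    fn first(&self) -> Span {
        *self.first
    }

    /// Falls back to the first span when there is only one element.
    fn last(&self) -> Span {
        *self.rest.last().unwrap_or(self.first)
    }

    fn len(&self) -> usize {
        1 + self.rest.len()
    }
}
```

Because emptiness is ruled out once in `new`, callers of `first` and `last` need neither an `Option` nor an index check.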

User-Facing Changes

None. Just refactoring and preventing crashes.

Tests + Formatting

cargo fmt --all -- --check to check standard code formatting (cargo fmt --all applies these changes)
cargo clippy --workspace -- -D warnings -D clippy::unwrap_used to check that you're using the standard code style
cargo test --workspace to check that all tests pass (on Windows make sure to enable developer mode)
cargo run -- -c "use std testing; testing run-tests --path crates/nu-std" to run the tests for the standard library

After Submitting

No user-facing changes

More info

The approach is inspired by the "parse, don't validate" idea. Instead of first having a validation step where we check that the data is valid, and then later constructing data with implicit assumptions about that check, we combine the two into a single step: we try to construct the data, and if the input was invalid, we get a None back.
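To illustrate the contrast on a toy example (the `LetDecl` struct and both functions are hypothetical and operate on plain string tokens, not on nushell's real spans or AST):

```rust
/// Hypothetical parsed form of `let <name> = <value...>`.
#[derive(Debug, PartialEq)]
struct LetDecl<'a> {
    name: &'a str,
    value: &'a [&'a str],
}

/// Validate-then-use style: returns only a bool, so later code has to
/// index `tokens` again and trust that this check was actually run.
fn is_valid_let(tokens: &[&str]) -> bool {
    tokens.len() >= 4 && tokens[0] == "let" && tokens[2] == "="
}

/// Parse style: the check and the construction are one step, and the
/// result carries the pieces that were proven to exist.
fn parse_let<'a>(tokens: &'a [&'a str]) -> Option<LetDecl<'a>> {
    match tokens {
        ["let", name, "=", value @ ..] if !value.is_empty() => {
            Some(LetDecl { name: *name, value })
        }
        _ => None,
    }
}
```

In the second version there is no index arithmetic left to get wrong: the slice pattern either binds `name` and `value` or the match falls through to `None`.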

sholderbach · 2023-09-15T13:13:52Z

Thanks for diving into this! Changes like this sound good. I think we need to get some of the resident parser experts at the table, as the long-term goal should be that the parser state is correct, on top of the invalid accesses getting handled gracefully (which remains a must for stability).

(I smiled at the initial PR description :) )

anka-213 · 2023-09-15T17:13:28Z

I don't even know exactly what I changed, but when I refactored the function that crashed before to use my new safer primitives, it resolved the issue in #9072. I did however notice that the new error message refers to the wrong index, so that's probably related to the cause of the original issue.

As you can see now the first test says "missing f", which is true. The second test says "missing e", which is clearly a lie, since "e" is the fourth argument.

running 1 test
stdout: 
stderr: Error: nu::parser::missing_positional

  × Missing required positional argument.
   ╭─[]
 1 │ def a [b: bool, c: bool, d: float, e: float, f: float] {}; a true true 1 1
   ·                                                                           ▲
   ·                                                                           ╰── missing f
   ╰────
  help: Usage: a <b> <c> <d> <e> <f>


stdout: 
stderr: Error: nu::parser::missing_positional

  × Missing required positional argument.
   ╭─[]
 1 │ def a [b: bool, c: bool, d: float, e: float, f: float, g: float] {}; a true true 1 1
   ·                                                                                     ▲
   ·                                                                                     ╰── missing e
   ╰────
  help: Usage: a <b> <c> <d> <e> <f> <g>


test tests::test_parser::too_few_arguments ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 528 filtered out; finished in 3.08s

Well, at least it doesn't crash anymore, even though I haven't found the source of the bug.

anka-213 · 2023-09-16T00:10:51Z

Oh, I see. I hadn't rerun all the other tests. Apparently, fixing the new test caused 95 other tests to fail.

Edit: Found the off-by-one error. I still have 7 test failures remaining and I don't actually know exactly which part of the change fixed the bug. I just made the bug impossible, I didn't find the cause of it.

anka-213 · 2023-09-16T02:48:17Z

Oh, I figured out part of the original bug, this line

nushell/crates/nu-parser/src/parser.rs

Line 968 in 3a04bd9

if spans[..end].is_empty() || spans_idx == end {

only checks for == end, but in this case we were for some reason looking at two steps from the end. After reading ["true", "true", "1"] at index 2 to get the first float, we try to read ["true", "true"] at index 3 to get the second float, which doesn't make sense.

My change ensures that it actually fails that test and skips to the next step, after which we don't get quite a correct error message, but at least a more sensible one. My guess is that there is a bug in calculate_end_span, maybe it thinks that the bools are keywords requiring a different number of arguments or something.

sophiajt · 2023-09-16T06:51:48Z

For that area of the code, the indexing that we're using should be correct, and if there are panics, we want to see them so we can fix them. You can't get a span idx beyond the end of the array, so for the cases where you can, that shows an error in the code rather than a problem with not being defensive in how we handle the input.

anka-213 · 2023-09-16T08:41:22Z

The actual bug was here

nushell/crates/nu-parser/src/parser.rs

Line 590 in 3a04bd9

&& spans.len() > (signature.required_positional.len() - positional_idx)

It should be

 && spans.len() - spans_idx > (signature.required_positional.len() - positional_idx)

This checks that there are fewer remaining positional args than remaining args, rather than fewer remaining positional args than total args.
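The two conditions can be compared in isolation. The functions below are a sketch that lifts only the boolean expressions out of the quoted lines; the parameter names mirror the snippet, but the surrounding parser logic is not reproduced, and the numbers in the note afterwards are illustrative rather than traced from the real parser.

```rust
/// Old condition: compares the *total* number of spans with the number of
/// required positionals that are still unfilled.
fn old_check(spans_len: usize, required: usize, positional_idx: usize) -> bool {
    positional_idx < required && spans_len > required - positional_idx
}

/// Fixed condition: compares the number of spans that are still *unread*.
/// Plain `-` mirrors the quoted line and assumes `spans_idx <= spans_len`.
fn new_check(
    spans_len: usize,
    spans_idx: usize,
    required: usize,
    positional_idx: usize,
) -> bool {
    positional_idx < required
        && spans_len - spans_idx > required - positional_idx
}
```

For example, with 6 required positionals, 4 spans in total, and 3 of each already consumed, the old condition evaluates `4 > 3` and holds, while the fixed one evaluates `1 > 3` and does not.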

@jntrnr I see your point, but it would still be preferable for the parser to not crash from a user perspective, so maybe printing a warning in unexpected cases or something like that instead? I'd prefer the confusing message "missing e", despite it being "f" that's missing, over just a full crash.

The idea with my changes is not primarily to be more defensive in the code, but more to encode more invariants in the types and so reducing the need for defensive programming down the line. In general, if we can move as much of the finicky index-manipulation out of the main parser code and into generic primitives, we can reduce the risk for tiny bugs like these popping up.
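A rough sketch of what such a generic primitive could look like follows. `SpanCursor` is a hypothetical name made up for this example; it is not the `PointedSpanArray` type from this PR, whose API is not shown here, and `Span` is a stand-in type.

```rust
/// Stand-in for the parser's span type (an assumption for this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    start: usize,
    end: usize,
}

/// Hypothetical cursor over spans. Invariant: `idx < spans.len()`.
struct SpanCursor<'a> {
    spans: &'a [Span],
    idx: usize,
}

impl<'a> SpanCursor<'a> {
    /// The invariant is checked once, on construction.
    fn new(spans: &'a [Span], idx: usize) -> Option<Self> {
        if idx < spans.len() {
            Some(Self { spans, idx })
        } else {
            None
        }
    }

    /// The only place that indexes; in bounds as long as the invariant holds.
    fn current(&self) -> Span {
        self.spans[self.idx]
    }

    /// Moves forward only if a next span exists; returns whether it moved.
    fn try_advance(&mut self) -> bool {
        if self.idx + 1 < self.spans.len() {
            self.idx += 1;
            true
        } else {
            false
        }
    }
}
```

The index arithmetic lives in one small type that can be tested on its own, and code using the cursor never touches `idx` directly.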

Edit: I've added an assert with the old check in the error case now. This is also an improvement because the panic is closer to the source of the error rather than later down the line where invariants were unsatisfied.

anka-213 · 2023-09-16T09:50:12Z

Now all the tests pass again. I had a mistake in my translation, where I had accidentally changed span.0 into 0.

anka-213 · 2023-09-16T10:39:23Z

crates/nu-parser/src/parser.rs

                spans.len()
            } else if positional_idx < signature.required_positional.len()
-                && spans.len() > (signature.required_positional.len() - positional_idx)
+                && spans.len() - spans_idx > (signature.required_positional.len() - positional_idx)


This is the fix for #9072

anka-213 · 2023-09-16T11:10:33Z

crates/nu-parser/src/parser.rs

            Expression {
                expr: Expr::VarDecl(id),
-                span: span(&spans[*spans_idx..*spans_idx + 1]),
+                span: spans.current(),


Why were we recreating a single span in a complicated way here?

span(&spans[*spans_idx..*spans_idx + 1]),

is equivalent to

spans[*spans_idx]

as far as I can tell
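The claimed equivalence can be checked on a small model. The `span` helper below is an assumed implementation (cover everything from the start of the first span to the end of the last); the real helper is not shown in this thread, so treat this as a sketch under that assumption.

```rust
/// Stand-in for the parser's span type (an assumption for this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    start: usize,
    end: usize,
}

/// Assumed behaviour of a `span` helper: merge a run of spans into one
/// span covering all of them. An empty slice yields an empty span here.
fn span(spans: &[Span]) -> Span {
    match (spans.first(), spans.last()) {
        (Some(first), Some(last)) => Span {
            start: first.start,
            end: last.end,
        },
        _ => Span { start: 0, end: 0 },
    }
}
```

Under that assumption, merging a one-element sub-slice returns that element unchanged, so `span(&spans[i..i + 1])` and `spans[i]` agree for every in-bounds `i`.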

anka-213 · 2023-09-16T14:24:06Z

This is ready for review now. I have some plans to keep refactoring more of the parser in the same style, but I might as well do that in a new PR, since this is self-contained as is.

anka-213 · 2023-09-16T14:43:20Z

Oops, I forgot that subtraction is not a safe operator with unsigned integers. It should work now
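For context: with `usize`, `a - b` panics in debug builds when `b > a` and wraps around in default release builds. The standard library offers explicit alternatives, sketched below with hypothetical helper names. This is not necessarily what the PR ended up doing.

```rust
/// Number of spans not yet consumed, or `None` if `spans_idx` is past the end.
fn remaining(spans_len: usize, spans_idx: usize) -> Option<usize> {
    spans_len.checked_sub(spans_idx)
}

/// Variant that treats "past the end" as zero remaining.
fn remaining_or_zero(spans_len: usize, spans_idx: usize) -> usize {
    spans_len.saturating_sub(spans_idx)
}

/// `len - idx > needed` rewritten as an addition, so nothing is subtracted.
/// Returns false when `spans_idx` is past the end.
fn has_more_than(spans_len: usize, spans_idx: usize, needed: usize) -> bool {
    spans_len > spans_idx + needed
}
```

Which variant fits depends on whether an index past the end should be treated as a bug (`checked_sub` plus an explicit panic or error) or as "nothing left" (`saturating_sub`).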

sophiajt · 2023-09-16T17:11:37Z

@anka-213 - if you found where the bugfix should be, that'd be a good one to create a PR for.

We're still pre-1.0. Users who are using Nushell now are helping us test and get it ready for 1.0 (that's why we put the disclaimer in the README). I get that it's annoying as a user to hit issues, but we want folks to file issues. If it's just a warning, I think some people may not even see it as it might be part of a script they're running.

Not saying I want the parser to crash. I just think we need to be hammering it into shape. Finding issues like the one you found, fixing them, and repeating until we can't find any anymore.

anka-213 · 2023-09-16T19:40:48Z

@jntrnr I've extracted the bug fix into #10395 now.

Yeah, I see your point and I have changed the code to as much as possible be a refactor rather than permitting more programs than before (which I accidentally did before). The main difference should be in moving crashes closer to the source, by enforcing invariants on construction, rather than something unrelated crashing later down the line because of the broken invariants.

I would view it as much more of an extension of the "don't use unwrap" guideline than a case of defensive programming. Using .unwrap() quietly assumes that some invariant guarantees validity, and so does a[i] indexing. The goal is to make the invariants explicit and present in the types rather than implicit and, relatedly, to tie the validity check to the construction of values, which avoids "boolean blindness". The refactor simplifies the code in the sense that it reduces the amount of implicit invariant tracking, and sometimes also by removing the need for some code.

I was already from the start trying to avoid changing any behaviour (just writing the exact same checks as before using new primitives), but now I see that I shouldn't view crashes as a thing to avoid as long as they only happen if there is a bug. Which also makes the refactor simpler, since now any cases that were missing from the old checks can be replaced by a call to panic, instead of trying to figure out what could cause it to happen and how to handle it.

anka-213 · 2023-09-18T12:11:55Z

Here's a rather silly crash I found by looking at what assumptions indexing operations have and working backwards from that:

alias "export alias" = foo
export alias

As soon as you type the "s" in the final "alias", it crashes with

thread 'main' panicked at 'index out of bounds: the len is 2 but the index is 2', crates/nu-parser/src/parser.rs:241:42

Now, this is probably not something a user should ever do, so this is more of an illustration of broken assumptions than a serious bug.

My goal is to make the code less fragile, so it's more difficult to have incorrect assumptions without noticing, and hopefully also to remove a bunch of the defensive programming that is already present in the code.

# Description

Old code was comparing remaining positional arguments with total number of arguments, where it should've compared remaining positional with remaining arguments of any kind. This means that if a function was given too few arguments, `calculate_end_span` would believe that it actually had too many arguments, since after parsing the first few arguments, the number of remaining arguments needed were fewer than the *total* number of arguments, of which we had used several.

Fixes #9072
Fixes: #13930
Fixes: #12069
Fixes: #8385

Extracted from #10381

## Bonus

It also improves the error handling on missing positional arguments before keywords (no longer crashing since #9851). Instead of just giving the keyword to the parser for the missing positional, we give an explicit error about a missing positional argument. I would like better descriptions than "missing var_name" though, but I'm not sure if that's available without

Old error

```
Error: nu::parser::parse_mismatch

  × Parse mismatch during operation.
   ╭─[entry #1:1:1]
 1 │ let = if foo
   · ┬
   · ╰── expected valid variable name
   ╰────
```

New error

```
Error: nu::parser::missing_positional

  × Missing required positional argument.
   ╭─[entry #18:1:1]
 1 │ let = foo
   · ┬
   · ╰── missing var_name
   ╰────
  help: Usage: let <var_name> = <initial_value>
```

# User-Facing Changes

The program `alias = = =` is no longer accepted by the parser

anka-213 changed the title ~~WIP: Reduce use of partial functions in parser code~~ WIP: Reduce use of unchecked indexing in parser code Sep 15, 2023

Fix bug in new code

348e4eb

anka-213 mentioned this pull request Sep 15, 2023

Parser crashes when variable name in let expression is missing #10380

Closed

anka-213 added 3 commits September 15, 2023 16:11

Add (failing) test for nushell#9072

f970ab7

WIP: Add module for safer arrays

60c36dc

Use the safer parser primitives

8d7dcbd

anka-213 added 6 commits September 15, 2023 19:29

Simplify the code somewhat

14274a1

By not using nested references. It does however introduce new complexity at one place, but I'll try to remove that later

Revert "Simplify the code somewhat"

537f3bc

This reverts commit 14274a1.

Minor cleanup

247d809

Allow PointedSpanArray to reference each other

b7b0b55

Needed in order to call functions with sub-spans, but keeping shared idx

Add trivial variation on test case

0f1954b

Fix clippy warnings

53beed3

Fix many test failures

7d4d822

If you start a loop by increasing an index, you're going to skip the first element

Merge remote-tracking branch 'upstream/main' into safer-parser

92021bc

anka-213 added 3 commits September 16, 2023 10:44

Fix logic error in calculate_end_span

57763ec

Old code was comparing remaining positional arguments with total number of arguments, where it should've compared remaining positional with remaining arguments of any kind. Fixes nushell#9072

Add debug assert for the old less defensive check

4cfad96

Fix mistake in conversion to new primitives

f7a3ab4

anka-213 marked this pull request as ready for review September 16, 2023 09:48

anka-213 marked this pull request as draft September 16, 2023 10:31

Clean up junk and clarify code

7a9534a

anka-213 added 3 commits September 16, 2023 14:30

Remove error-prone function

2706d1b

Allow any sub-spans to be used

b0cdfd0

Remove needless generics

de3017c

anka-213 marked this pull request as ready for review September 16, 2023 13:52

Fix numerical overflow introduced by refactoring

d847fcf

Revert refactor to prevent numerical overflow

81a486a

anka-213 changed the title ~~WIP: Reduce use of unchecked indexing in parser code~~ Reduce use of unchecked indexing in parser code Sep 16, 2023

Fix typo

b69fc00

anka-213 mentioned this pull request Sep 16, 2023

Fix panic on too few arguments for custom function #10395

Merged

5 tasks

anka-213 closed this Jun 29, 2024
