Skip to content

fix(parser): empty span of unbalanced delimiter#16292

Merged
sholderbach merged 3 commits intonushell:mainfrom
blindFS:fix_lexer
Sep 8, 2025
Merged

fix(parser): empty span of unbalanced delimiter#16292
sholderbach merged 3 commits intonushell:mainfrom
blindFS:fix_lexer

Conversation

@blindFS
Copy link
Copy Markdown
Contributor

@blindFS blindFS commented Jul 29, 2025

Fixes #16284

Description

#15246 happens to rely on the lexer's bug to work.
Changed to a workaround solution in this PR, not sure that's good enough in terms of perf and accuracy.
@ysthakur Please take your time to review the changes of parse_record with care.

User-Facing Changes

Tests + Formatting

+1

After Submitting

@github-actions github-actions bot added the A:parser Issues related to parsing label Jul 29, 2025
@blindFS
Copy link
Copy Markdown
Contributor Author

blindFS commented Jul 29, 2025

To illustrate what happens for {a:b}/ before this fix:

The check of

if lex_state.input[0] == b'}' {
extra_tokens = true;
unclosed = false;
break;
}

depends on the off-by-1 span.end of an unbalanced } token to pass. lex_state.input is overwritten in

// If this lex_internal call reached the end of the input, there may now be fewer tokens
// in the output than before.
let tokens_n_diff = (state.output.len() as isize) - (n_tokens as isize);
let next_offset = state.output.last().map(|token| token.span.end);
if let Some(next_offset) = next_offset {
state.input = &state.input[next_offset - state.span_offset..];
state.span_offset = next_offset;
}

Which is the consequence of lexing a:b}/ where the span start is increased by 1 in line 6197, while the span end is untouched, deviating from the meaning of inner_span

let mut start = span.start;
let mut end = span.end;
if bytes.starts_with(b"{") {
start += 1;
} else {
working_set.error(ParseError::Expected("{", Span::new(start, start + 1)));
return garbage(working_set, span);
}
let mut unclosed = false;
let mut extra_tokens = false;
if bytes.ends_with(b"}") {
end -= 1;
} else {
unclosed = true;
}
let inner_span = Span::new(start, end);

I suppose the ideal way to solve this issue is to fix the span argument passed in from parse_brace_expr so that trailing / is not included in the first place.

unclosed = false;
break;
if let Some(ParseError::Unbalanced(left, right, _)) = lex_state.error.as_ref() {
if left == "{" && right == "}" {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do see another place in the parser that checks both left and right before determining if the error is for unbalanced curly braces, but there's currently no situation where left would be "{" but right would be "]" or something. Feels kinda weird to use two separate strings rather than an enum or something, but I guess it doesn't matter that much, and it's out of scope for this PR anyway.

Copy link
Copy Markdown
Member

@ysthakur ysthakur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, thanks for being brave enough to plunge into the parser.

I suppose the ideal way to solve this issue is to fix the span argument passed in from parse_brace_expr so that trailing / is not included in the first place.

That would be nice, but probably a harder change to make. I like how simple this PR is, much easier to review.

Copy link
Copy Markdown
Member

@sholderbach sholderbach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you merge in main with #16231 so we can run the parse_with_keywords fuzz target on that for a bit?

@blindFS
Copy link
Copy Markdown
Contributor Author

blindFS commented Aug 4, 2025

Can you merge in main with #16231 so we can run the parse_with_keywords fuzz target on that for a bit?

Sure, actually I've noticed several unexpected consequences of this fix, running fuzz tests would be a good practice.

@sholderbach
Copy link
Copy Markdown
Member

If you want to give it a go you can find it at

https://github.com/nushell/nushell/tree/main/crates/nu-parser/fuzz

The basic setup would be

# cargo install cargo-fuzz
cd crates/nu-parser/fuzz
mkdir out
./gather_seeds.nu
cargo fuzz run parse_with_keywords out seeds

I typically run it with some additional flags to the fuzzer:

cargo fuzz run parse_with_keywords out seeds -- -max_len=64 -rss_limit_mb=8000
  • -only_ascii=1 can be used to find interesting stuff without binary mess but for a lexer change I would also make sure that UTF-8 handling remains sane
  • -max_len=64 not sure what it does for the overall performance of exploring the space, but the reproducers are typically easier to reduce than large seed corpus input with a small cause for the bug
  • -rss_limit_mb=8000 increase of the available memory as there were leaks or excessive memory use that stop before finding interesting stuff (see e.g. fix(parser): repeated ( / parenthesis / opened sub-expressions causes memory leak #16204)

@sholderbach
Copy link
Copy Markdown
Member

#16355 has brought another fix for things found by the fuzzer onto main, so worth rebasing onto that to run it with arbitrary bytes and not just ASCII

@sholderbach
Copy link
Copy Markdown
Member

sholderbach commented Aug 5, 2025

Dang found another one of the parse_var_with_opt_type panics (not yet reduced):

..(mut  XEz: , e>!(*s  ({$

Reduced to:

mut : , a
at crates/nu-parser/src/parser.rs:3515:65
  ╰─▶ index out of bounds: the len is 0 but the index is 0
         0: 0x588f919cd30d - rust_begin_unwind
                      at /rustc/05f9846f893b09a1be1fc8560e33fc3c815cfecb/library/std/src/panicking.rs:695
         1: 0x588f919fc360 - core::panicking::panic_fmt::h5764ee7030b7a73d
                      at /rustc/05f9846f893b09a1be1fc8560e33fc3c815cfecb/library/core/src/panicking.rs:75
         2: 0x588f919fc542 - core::panicking::panic_bounds_check::h0328ca7e7f0749c4
                      at /rustc/05f9846f893b09a1be1fc8560e33fc3c815cfecb/library/core/src/panicking.rs:273
         3: 0x588f90287547 - <usize as core::slice::index::SliceIndex<[T]>>::index::hbb5971d2ec5675f5
                      at /home/stefan/.rustup/toolchains/1.86.0-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/slice/index.rs:274
         4: 0x588f902848f1 - core::slice::index::<impl core::ops::index::Index<I> for [T]>::index::h3fec42bd6862c540
                      at /home/stefan/.rustup/toolchains/1.86.0-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/slice/index.rs:16
                       - <alloc::vec::Vec<T,A> as core::ops::index::Index<I>>::index::ha3d52af603e6d002
                      at /home/stefan/.rustup/toolchains/1.86.0-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/vec/mod.rs:3358
         5: 0x588f90249ea6 - nu_parser::parser::parse_var_with_opt_type::h6ab8ee4cc14deda7
                      at /home/stefan/nushell/crates/nu-parser/src/parser.rs:3515
         6: 0x588f9022d64f - nu_parser::parser::parse_multispan_value::hed97fbc8540e3aaa
                      at /home/stefan/nushell/crates/nu-parser/src/parser.rs:850
         7: 0x588f902312d6 - nu_parser::parser::parse_internal_call::h972f5a2c304629a6
                      at /home/stefan/nushell/crates/nu-parser/src/parser.rs:1234
         8: 0x588f9020172a - nu_parser::parse_keywords::parse_mut::hdcccdbab9967ba9d
                      at /home/stefan/nushell/crates/nu-parser/src/parse_keywords.rs:3685
         9: 0x588f90267db0 - nu_parser::parser::parse_builtin_commands::h5ce809c687581bd7
                      at /home/stefan/nushell/crates/nu-parser/src/parser.rs:6081
        10: 0x588f9026c0c4 - nu_parser::parser::parse_pipeline::h5fb0fba33640a7fa
                      at /home/stefan/nushell/crates/nu-parser/src/parser.rs:6511
        11: 0x588f9026c739 - nu_parser::parser::parse_block::h319f8ce0c9dfcc9c
                      at /home/stefan/nushell/crates/nu-parser/src/parser.rs:6545
        12: 0x588f902720c6 - nu_parser::parser::parse::hc26cef30bf13895d
                      at /home/stefan/nushell/crates/nu-parser/src/parser.rs:7101
        13: 0x588f8fd1d6ce - nu_cli::syntax_highlight::highlight_syntax::hef2c0d6fa81f53dc
                      at /home/stefan/nushell/crates/nu-cli/src/syntax_highlight.rs:46
        14: 0x588f8fd1d37c - <nu_cli::syntax_highlight::NuHighlighter as reedline::highlighter::Highlighter>::highlight::h103e95f58b7f6845
                      at /home/stefan/nushell/crates/nu-cli/src/syntax_highlight.rs:21
        15: 0x588f8ffca7ba - reedline::engine::Reedline::buffer_paint::hc872e9f884391fd7
                      at /home/stefan/.cargo/git/checkouts/reedline-68c4267336505bba/faee143/src/engine.rs:1773
        16: 0x588f8ffc78db - reedline::engine::Reedline::repaint::h26b5e97e1f7937c3
                      at /home/stefan/.cargo/git/checkouts/reedline-68c4267336505bba/faee143/src/engine.rs:1538
        17: 0x588f8ffc0609 - reedline::engine::Reedline::read_line_helper::h6cd66d9826daa00b
                      at /home/stefan/.cargo/git/checkouts/reedline-68c4267336505bba/faee143/src/engine.rs:829
        18: 0x588f8ffbf8fd - reedline::engine::Reedline::read_line::hcbecdeaeb783826c
                      at /home/stefan/.cargo/git/checkouts/reedline-68c4267336505bba/faee143/src/engine.rs:660
        19: 0x588f8fd12809 - nu_cli::repl::loop_iteration::h6629932c47c21b01
                      at /home/stefan/nushell/crates/nu-cli/src/repl.rs:487
        20: 0x588f8fd8b502 - nu_cli::repl::evaluate_repl::{{closure}}::h0ccb55285a97b64d
                      at /home/stefan/nushell/crates/nu-cli/src/repl.rs:189
        21: 0x588f8fe20ecf - <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::h4de3698e4a7a92ea
                      at /home/stefan/.rustup/toolchains/1.86.0-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/panic/unwind_safe.rs:272
        22: 0x588f8fe20f38 - std::panicking::try::do_call::h81e68cfb2d9a9582
                      at /home/stefan/.rustup/toolchains/1.86.0-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:587
        23: 0x588f8fd6386b - __rust_try
        24: 0x588f8fd637ae - std::panicking::try::haa4df45239cc9ebe
                      at /home/stefan/.rustup/toolchains/1.86.0-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:550
                       - std::panic::catch_unwind::hc3631095f97356ef
                      at /home/stefan/.rustup/toolchains/1.86.0-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:358
        25: 0x588f8fd0c1f7 - nu_cli::repl::evaluate_repl::h34b726177417d944
                      at /home/stefan/nushell/crates/nu-cli/src/repl.rs:188
        26: 0x588f8dfd33e9 - nu::run::run_repl::he3344d0dba50d4b0
                      at /home/stefan/nushell/src/run.rs:207
        27: 0x588f8e012de4 - nu::main::hd226463d40432cfc
                      at /home/stefan/nushell/src/main.rs:525
        28: 0x588f8e0024bb - core::ops::function::FnOnce::call_once::h34ee1d5d51ac487c
                      at /home/stefan/.rustup/toolchains/1.86.0-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:250

@blindFS
Copy link
Copy Markdown
Contributor Author

blindFS commented Aug 5, 2025

Dang found another one of the parse_var_with_opt_type panics (not yet reduced):

..(mut  XEz: , e>!(*s  ({$

Reduced to:

mut : , a

Should be fixed now.

#16355 has brought another fix for things found by the fuzzer onto main, so worth rebasing onto that to run it with arbitrary bytes and not just ASCII

Rebased, but I have trouble running cargo-fuzz:

error: failed to run `rustc` to learn about target-specific information

Caused by:
  process didn't exit successfully: `rustc - --crate-name ___ --print=file-names -Cpasses=sancov-module -Cllvm-args=-sanitizer-coverage-level=4 -Cllvm-args=-sanitizer-coverage-inline-8bit-counters -Cllvm-args=-sanitizer-coverage-pc-table -Cllvm-args=-sanitizer-coverage-trace-compares --cfg fuzzing -Cllvm-args=-simplifycfg-branch-fold-threshold=0 -Zsanitizer=address -Cdebug-assertions -Ccodegen-units=1 --target x86_64-apple-darwin --crate-type bin --crate-type rlib --crate-type dylib --crate-type cdylib --crate-type staticlib --crate-type proc-macro --print=sysroot --print=split-debuginfo --print=crate-name --print=cfg -Wwarnings` (exit status: 1)
  --- stderr
  error: the option `Z` is only accepted on the nightly compiler

  help: consider switching to a nightly toolchain: `rustup default nightly`

  note: selecting a toolchain with `+toolchain` arguments require a rustup proxy; see <https://rust-lang.github.io/rustup/concepts/index.html>

  note: for more information about Rust's stability policy, see <https://doc.rust-lang.org/book/appendix-07-nightly-rust.html#unstable-features>

  error: 1 nightly option were parsed

Probably because my rust toolchain is not managed by rustup

@blindFS blindFS mentioned this pull request Sep 5, 2025
2 tasks
@blindFS blindFS marked this pull request as ready for review September 5, 2025 14:00
@blindFS
Copy link
Copy Markdown
Contributor Author

blindFS commented Sep 5, 2025

@sholderbach Found a new one #16586 near

#15547010       REDUCE cov: 17498 ft: 98191 corp: 16516/561Kb lim: 64 exec/s: 653 rss: 6521Mb L: 43/64 MS: 1 EraseBytes-

That doesn't necessarily mean this PR won't cause new issues because the random seed of the fuzzer is not fixed?

Copy link
Copy Markdown
Member

@sholderbach sholderbach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for not getting back to this. Let's merge this one I defer to your debugging on that and @ysthakur reviewing it with regards to the previous fix.

I had some crash logs from fuzzing I never got around to reducing but they should boil down to the def def you fixed with #16591

@sholderbach sholderbach merged commit 2e9d77c into nushell:main Sep 8, 2025
16 checks passed
@github-actions github-actions bot added this to the v0.108.0 milestone Sep 8, 2025
@blindFS blindFS deleted the fix_lexer branch September 9, 2025 16:06
@Bahex Bahex added the notes:ready The "Release notes summary" section of this PR is ready to be included in our release notes. label Oct 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A:parser Issues related to parsing notes:ready The "Release notes summary" section of this PR is ready to be included in our release notes.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Panic in parse_{let|mut|const} with colon and opposed parens/braces

4 participants