Skip to content

Fix unterminated loop in parse_record#15246

Merged
sholderbach merged 1 commit intonushell:mainfrom
ysthakur:fix-record-parse
Mar 5, 2025
Merged

Fix unterminated loop in parse_record#15246
sholderbach merged 1 commit intonushell:mainfrom
ysthakur:fix-record-parse

Conversation

@ysthakur
Copy link
Copy Markdown
Member

@ysthakur ysthakur commented Mar 5, 2025

Fixes #15243

Description

As noted in #15243, a record with more characters after it (e.g., {a:b}/) will cause an OOM due to an infinite loop, introduced by #15023. This happens because the entire string {a:b}/ is lexed as one token and passed to parse_record, where it repeatedly lexes until it hits the closing }. This PR detects such extra characters and reports an error.

User-Facing Changes

{a:b}/ and other such constructions will no longer cause infinite loops. Before #15023, you would've seen an "Unclosed delimiter" error message, but this PR changes that to "Invalid characters."

Error: nu::parser::extra_token_after_closing_delimiter

  × Invalid characters after closing delimiter
   ╭─[entry #5:1:7]
 1 │  {a:b}/
   ·       ┬
   ·       ╰── invalid characters
   ╰────
  help: Try removing them.

Tests + Formatting

After Submitting

@ysthakur ysthakur added A:parser Issues related to parsing notes:fixes Include the release notes summary in the "Bug fixes" section labels Mar 5, 2025
@ysthakur ysthakur self-assigned this Mar 5, 2025
@sholderbach
Copy link
Copy Markdown
Member

Thanks for the quick turnaround on this!

Not fully up to speed: are there any legal situations where you can have another character directly following the record, (which is not already part of the larger lex set that was used before, (stuff like ; or } would not have lead to the panic)

Just to make sure we are not locking out legal grammar with an invalid character error.

@ysthakur
Copy link
Copy Markdown
Member Author

ysthakur commented Mar 5, 2025

@sholderbach I'm not super knowledgeable about the parser either, but previously, if parse_record received a span containing other characters after }, it would've triggered this check anyway:

if bytes.ends_with(b"}") {
end -= 1;
} else {
working_set.error(ParseError::Unclosed("}".into(), Span::new(end, end)));
}

So I'm pretty sure this PR should just change what error you get ("unclosed delimiter" -> "extra tokens") without changing the rules.

There's the edge case of {a:b}} and [{a:b}}, where you have junk after the record but it still ends in a }. But such a span won't actually be passed to parse_record - it's flagged as an unbalanced delimiter before that.

@sholderbach
Copy link
Copy Markdown
Member

Yeah sounds like any new errors should be pretty localized with the remaining being caught further up the stack.

Let's try this and see if we have any other edge-cases if they come up. (I have got to admit that I didn't grok the previous state but the more explicit state variables seem more debuggable to me :D)

@sholderbach sholderbach merged commit b1e591f into nushell:main Mar 5, 2025
15 checks passed
@github-actions github-actions bot added this to the v0.103.0 milestone Mar 5, 2025
@ysthakur ysthakur deleted the fix-record-parse branch March 8, 2025 00:09
sholderbach pushed a commit that referenced this pull request Sep 8, 2025
Fixes #16284 

# Description

#15246 happens to rely on the lexer's bug to work.
Changed to a workaround solution in this PR, not sure that's good enough
in terms of perf and accuracy.
@ysthakur Please take your time to review the changes of `parse_record`
with care.

# User-Facing Changes

# Tests + Formatting

+1

# After Submitting
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A:parser Issues related to parsing notes:fixes Include the release notes summary in the "Bug fixes" section

Projects

None yet

Development

Successfully merging this pull request may close these issues.

OOM for record followed by other character.

2 participants