Skip to content

Fix spread operator lexing in records#15023

Merged
132ikl merged 3 commits intonushell:mainfrom
ysthakur:fix-spread-lex
Feb 11, 2025
Merged

Fix spread operator lexing in records#15023
132ikl merged 3 commits intonushell:mainfrom
ysthakur:fix-spread-lex

Conversation

@ysthakur
Copy link
Copy Markdown
Member

@ysthakur ysthakur commented Feb 6, 2025

Description

Zyphys found that when parsing {...{}, ...{}, a: 1}, the a: would be considered one token, leading to a parse error (Discord message). This PR fixes that.

What would happen is that while getting tokens, the following would happen in a loop:

  1. Get the next two tokens while treating : as a special character (so we get the next field key and a colon token)
  2. Get the next token while not treating : as a special character (so we get the next value)

I didn't update this when I added the spread operator. With {...{}, ...{}, a: 1}, the first two tokens would be ...{} and ...{}, and the next token would be a:. This PR changes this loop to first get a single token, check if it's spreading a record, and move on if so.

Alternatives considered:

  • Treat : as a special character when getting the value too. This would simplify the loop greatly, but would mean you can't use colons in values.
  • Merge the loop for getting tokens and the loop for parsing those tokens. I tried this, but it complicates things if you run into a syntax error and want to create a garbage span going to the end of the record.

User-Facing Changes

Nothing new

When parsing {...{}, ...{}, a: 1}, the "a:" would be considered one
token
@ysthakur ysthakur added A:syntax Changes to the grammar or syntax beyond parser bugfixes notes:fixes Include the release notes summary in the "Bug fixes" section labels Feb 6, 2025
@132ikl
Copy link
Copy Markdown
Member

132ikl commented Feb 11, 2025

lgtm, thanks!

@132ikl 132ikl merged commit 1128fa1 into nushell:main Feb 11, 2025
15 checks passed
@github-actions github-actions bot added this to the v0.103.0 milestone Feb 11, 2025
@ysthakur ysthakur deleted the fix-spread-lex branch February 11, 2025 15:28
@sholderbach
Copy link
Copy Markdown
Member

This seems to introduce a memory leak #15243

e.g. {a:b}/ will spin to death

@ysthakur
Copy link
Copy Markdown
Member Author

ysthakur commented Mar 5, 2025

Whoops, good catch. #15246 should fix it.

sholderbach pushed a commit that referenced this pull request Mar 5, 2025
Fixes #15243

# Description

As noted in #15243, a record with more characters after it (e.g.,
`{a:b}/`) will cause an OOM due to an infinite loop, introduced by
#15023. This happens because the entire string `{a:b}/` is lexed as one
token and passed to `parse_record`, where it repeatedly lexes until it
hits the closing `}`. This PR detects such extra characters and reports
an error.

# User-Facing Changes

`{a:b}/` and other such constructions will no longer cause infinite
loops. Before #15023, you would've seen an "Unclosed delimiter" error
message, but this PR changes that to "Invalid characters."

```
Error: nu::parser::extra_token_after_closing_delimiter

  × Invalid characters after closing delimiter
   ╭─[entry #5:1:7]
 1 │  {a:b}/
   ·       ┬
   ·       ╰── invalid characters
   ╰────
  help: Try removing them.
```

# Tests + Formatting

# After Submitting
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A:syntax Changes to the grammar or syntax beyond parser bugfixes notes:fixes Include the release notes summary in the "Bug fixes" section

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants