Skip to content

Fix parsing for octal strings#16678

Closed
Sheape wants to merge 2 commits intonushell:mainfrom
Sheape:parse-octal-string
Closed

Fix parsing for octal strings#16678
Sheape wants to merge 2 commits intonushell:mainfrom
Sheape:parse-octal-string

Conversation

@Sheape
Copy link
Copy Markdown
Contributor

@Sheape Sheape commented Sep 13, 2025

Fixes #9566.
Binary and hexadecimal strings have no issues in parsing because they are parsed at a chunk of 8 characters (max of 11111111 or 255 in decimal) and 2 characters (max of FF or 255 in decimal). However, for octals this is not the case, previously it was chunked by 3 characters (max of 777 which is 511 in decimal hence it exceeds u8::MAX).

I solved this by not chunking at all and instead parse the entire octal string as u128. Chunking will never work for octals since there isn't an exact number of chars that will fit exactly up to 255.

To see if it works, you can try this and it should parse & return true:

0o[71777342767] == 0b[111 001 111 111 111 011 100 010 111 110 111]
# => true

Release notes summary - What our users need to know

Fixed parsing for octal strings. Previously, only octal strings up to 0o[377] was allowed. Now, any octal strings can be parsed properly.

Tasks after submitting

N/A

@github-actions github-actions bot added the A:parser Issues related to parsing label Sep 13, 2025
@sholderbach
Copy link
Copy Markdown
Member

sholderbach commented Sep 14, 2025

I think while your reasoning here makes sense for the octal int literal 0o77777 etc. I am not sure if we should use this carry semantics for the binary/byte string literals:

In a C string you would for example write:

char str[] = "\377\377\377\0";

with a readable Nushell equivalent:

let str = 0o[377 377 377 000] # Note that you currently need to pass multiples of three for the octal coded byte string which makes the chunking consistent

If you want to express the carrying single int semantics you have the 0o1234567 syntax available and can convert into binary.

@Sheape
Copy link
Copy Markdown
Contributor Author

Sheape commented Sep 14, 2025

Oh yeah, mb I thought 0o[777] was acceptable and valid based on the discussion from the issue.
In a C string, the compiler just warns about out of range sequences. It goes something like this:
warning: octal escape sequence out of range

Maybe a more reasonable thing to do is to throw an error (with better error message) if a chunk of 3 is out-of-range? 0o[400] to 0o[777] is the range where octal strings are out of bounds.

@sholderbach
Copy link
Copy Markdown
Member

The error message can definitely be improved, I think it shouldn't fall back to immediately treating it as a command if you make a simple syntax error in such a unique syntax construct.

@sholderbach
Copy link
Copy Markdown
Member

For reference the motivating example in the top description can be achieved by:

(0o71777342767 | into binary --endian big --compact) == 0b[111 001 111 111 111 011 100 010 111 110 111]

@Sheape
Copy link
Copy Markdown
Contributor Author

Sheape commented Sep 14, 2025

Alright, I'll close this PR and open up a new PR for improved error message. Should I also include an error message for non-multiples of 3 for octals and other invalid strings such as 0b[1102] and 0x[zf32]?

@Sheape Sheape closed this Sep 14, 2025
@Sheape Sheape deleted the parse-octal-string branch September 14, 2025 23:22
@sholderbach
Copy link
Copy Markdown
Member

Alright, I'll close this PR and open up a new PR for improved error message. Should I also include an error message for non-multiples of 3 for octals and other invalid strings such as 0b[1102] and 0x[zf32]?

That sounds wonderful!

I think you don't have to fully distingquish between all the different variants in the error message and only report the expectations in one error message if that isn't too bad.

sholderbach pushed a commit that referenced this pull request Sep 17, 2025
Fixes #9566. Based on the decision from this PR #16678, instead of
dealing with out-of-range octal strings, throwing a specific error is
better.

Let me know if these error messages are too detailed and prefer just
having a single "out-of-range error".

## Release notes summary - What our users need to know

Previously, a generic shell error is thrown for invalid binary strings.
Now, an improve error message is thrown for invalid binary, hexadecimal,
and octal strings.
```nushell
> 0b[121]
# => Error: nu::parser::invalid_binary_string
# => 
# => × Invalid binary string.
# =>   ╭─[entry #1:1:1]
# => 1 │ 0b[121]
# =>   · ───┬───
# =>   ·    ╰── invalid binary string
# =>   ╰────
# =>  help: binary strings may contain only 0 or 1.
```
## Tasks after submitting

- N/A
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A:parser Issues related to parsing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

0o[777] is not working

2 participants