Feature: Allow underscores as digit separators in numeric literals#5338
Feature: Allow underscores as digit separators in numeric literals#5338Maxxen wants to merge 5 commits intoduckdb:masterfrom
Conversation
|
Ok, @Tishj pointed out that underscores probably will work for exponents in integers, but it won't work for floats... I'll add tests for it later and maybe update the float case as well to be consistent. |
Mytherin
left a comment
There was a problem hiding this comment.
Thanks for the PR! I have been playing around with this for a bit and found the following issues:
- Double-casting does not seem to work correctly in all cases:
D select 0____0_____0_____________1_____0__________0000_00_0__0.0 as a;
┌────────┐
│ a │
│ double │
├────────┤
│ 0.0 │
└────────┘
D select '1_______________________________________0.0'::double;
┌───────────────────────────────────────────────────────────────┐
│ CAST('1_______________________________________0.0' AS DOUBLE) │
│ double │
├───────────────────────────────────────────────────────────────┤
│ 1.0 │
└───────────────────────────────────────────────────────────────┘- Selected decimal width appears to be affected by underscores:
D select 1_0.0;
┌──────────────┐
│ 10.0 │
│ decimal(4,1) │
├──────────────┤
│ 10.0 │
└──────────────┘
D select 1_______0.0;
┌───────────────┐
│ 10.0 │
│ decimal(10,1) │
├───────────────┤
│ 10.0 │
└───────────────┘| @@ -0,0 +1,175 @@ | |||
| # name: test/sql/parser/digit_separators.test | |||
| # group: [parser] | |||
|
|
|||
There was a problem hiding this comment.
Could we enable verification for this test?
Mytherin
left a comment
There was a problem hiding this comment.
Thanks for the PR! Nice work hacking around all of the numeric casting code. Some comments below:
| # group: [parser] | ||
|
|
||
| query I | ||
| SELECT 1__2 |
There was a problem hiding this comment.
Python only allows individual underscores, not multiple underscores following each other, e.g. this would throw an error in Python:
>>> 1__2
File "<stdin>", line 1
1__2
^
SyntaxError: invalid decimal literalI'm not too opposed to keeping support for multiple subsequent underscores but it does allow for constructing very unreadable numbers.
| if (pos == start_digit) { | ||
| return false; | ||
| } | ||
| while (buf[pos] == '_') { |
There was a problem hiding this comment.
Should this not check the length as well to ensure we don't read out-of-bounds?
| if (pos == start_pos) { | ||
| return false; | ||
| } | ||
| while (buf[pos] == '_') { |
There was a problem hiding this comment.
Same here - should we not check the length here?
|
Closing this for now |
Closes #5112
This adds underscores as optional digit separators in numeric literals, both for integers and reals. They can be used in both the integer and decimal part, however, they are not valid in exponents.
A literal component can not start or end with an underscore either.
Examples, see test/sql/parser/digit_separators.test
I had to modify the
fast_float.hlibrary to also skip over underscores. I don't know if this is too invasive of a change but the only other option is to do another allocaiton/filter step in the cast function before invoking the parsing, which seems wasteful.