[refurb] Parse more exotic decimal strings in `verbose-decimal-constructor (FURB157)` by dylwil3 · Pull Request #14098 · astral-sh/ruff

dylwil3 · 2024-11-04T23:37:12Z

FURB157 suggests replacing expressions like Decimal("123") with Decimal(123). This PR extends the rule to cover cases where the input string to Decimal can be easily transformed into an integer literal.

For example:

Decimal("1__000")   # fix: `Decimal(1000)`

Note: we do not implement the full decimal parsing logic from CPython on the grounds that certain acceptable string inputs to the Decimal constructor may be presumed purposeful on the part of the developer. For example, as in the linked issue, Decimal("١٢٣") is valid and equal to Decimal(123), but we do not suggest a replacement in this case.

Closes #13807

dylwil3 · 2024-11-04T23:40:01Z

Happy to hear any suggestions around the strange mix of mutability and shadowing in the implementation. I was attempting to avoid allocating a String if we didn't have to, and to not change too much from the implementation that already existed...

Also happy to delete some comments if they are contributing more to clutter than understanding.

github-actions · 2024-11-04T23:51:35Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

MichaReiser · 2024-11-05T06:55:10Z

crates/ruff_linter/src/rules/refurb/rules/verbose_decimal_constructor.rs

+            // _after_ trimming whitespace from the string and removing all occurrences of "_".
+            let mut trimmed = Cow::from(str_literal.to_str().trim_whitespace());
+            if memchr::memchr(b'_', trimmed.as_bytes()).is_some() {
+                trimmed = Cow::from(trimmed.replace('_', ""));


Nit: I don't think memchr gives us much here, considering that most literals are very short. We also don't have to use replace and allcoate a Cow::Owned. Instead, I would use Cow::from(trimmed.trim_start_matches('_')) for better readability (with the added benefit that it doesn't allocate)

I'm fine not using memchr, but I do think we need to use replace because underscores may appear anywhere in the string. For example- Decimal("1_____00") is valid.

Edit: whoops, didn't see that was already addressed below

MichaReiser · 2024-11-05T07:05:03Z

crates/ruff_linter/src/rules/refurb/rules/verbose_decimal_constructor.rs

+                // NB: We need not convert, e.g., `2e3` to `2000`.
+                // However, `2e+3` is a syntax error, so we remove
+                // the sign on the exponent.
+                rest = Cow::from(rest.replace('+', ""));


replace is technically O(n). Not that it matters much here but I think I would find it slightly more readable if we explicitly tested for the + sign and the expected position.

if rest[index + 1] == b'+' { rest.into_owned().remove(index + 1) }

The alternative is to use remove_matches which captures the semantic better than repalce with a empty string

I like remove_matches but isn't it nightly only? https://doc.rust-lang.org/std/string/struct.String.html#method.remove_matches

Ah dang. I guess replace is still the best option

MichaReiser · 2024-11-05T07:06:09Z

crates/ruff_linter/src/rules/refurb/rules/verbose_decimal_constructor.rs

+                // 'e' was the last character in `rest`, and then the right
+                // hand side will be `""`.
+                let exponent = rest[index + 1..]
+                    .strip_prefix('+')


It seems that the exponential syntax also supports negative numbers:

ruff/crates/ruff_python_parser/src/lexer.rs

Lines 1088 to 1100 in 51a9983

[b'e' | b'E', b'0'..=b'9', ..] | [b'e' | b'E', b'-' | b'+', b'0'..=b'9', ..] => {

// 'e' | 'E'

number.push(self.cursor.bump().unwrap());

if let Some(sign) = self.cursor.eat_if(|c| matches!(c, '+' | '-')) {

number.push(sign);

}

self.radix_run(&mut number, Radix::Decimal);

true

}

_ => is_float,

Yes, you can have negative exponents. But this lint should not suggest replacing Decimal("1e-1") with Decimal(1e-1). They are not the same! Decimal offers a way to represent decimal numbers in an exact way, something that floating point representations do not support.

>>>> Decimal("1e-1") == Decimal(1e-1) # 1e-1 = 0.1 can not be represented exactly as a float False >>>> Decimal("5e-1") == Decimal(5e-1) # Okay, 5e-1 = 1/2 can be represented exactly True

I think we need to be similarly careful with large numbers, even if they are integers:

>>>> Decimal("1e22") == Decimal(1e22) True >>>> Decimal("1e23") == Decimal(1e23) False

Just as @sharkdp points out, the omission of negative exponents was intentional. I didn't think about large exponents though, good point!

It's hard to figure out when this conversion is actually safe - parsing scientific notation is weird, and I can't seem to find a reference for the range where the representation is faithful. I think issues start to occur when numbers are larger than 1e16 but I'm having trouble finding a justification for that in code/documentation...

The smallest example I found so far was:

>>> Decimal(1801439850948199e1) == Decimal('1801439850948199e1') False

FURB157 should not rewrite strings to float literals. Literals with exponents are float literals, so they should not be passed to the Decimal constructor, or they will trigger decimal-from-float-literal (RUF032).

$ cat ruf032.py from decimal import Decimal Decimal(1e16) $ ruff check --isolated --preview --select RUF032 ruf032.py ruf032.py:2:9: RUF032 `Decimal()` called with float literal argument | 1 | from decimal import Decimal 2 | Decimal(1e16) | ^^^^ RUF032 | = help: Use a string literal instead Found 1 error. No fixes available (1 hidden fix can be enabled with the `--unsafe-fixes` option).

Well that makes life easier! Thank you!

MichaReiser · 2024-11-05T07:14:59Z

crates/ruff_linter/src/rules/refurb/rules/verbose_decimal_constructor.rs

+            // NB: We have already checked that there is at most one 'e'.
+            if !rest
+                .bytes()
+                .all(|c| c.is_ascii_digit() || c == b'e' || c == b'E')


Does this support 1_2_3_4? or 1.1_2_3_4?

I think the idea above was to replace all _s, not just leading ones.

Good catch. I don't think replacing all of them is correct because _ are not permitted in the exponent part.

I think they are:

https://github.com/python/cpython/blob/ac556a2ad1213b8bb81372fe6fb762f5fcb076de/Lib/_pydecimal.py#L463

>>> Decimal("2e1_0") == Decimal(2e1_0) True

Understanding your own code is always the hardest. So what's not allowed is Decimal(2._1)

ruff/crates/ruff_python_parser/src/lexer.rs

Lines 1072 to 1078 in 4323512

if self.cursor.eat_char('_') {

return self.push_error(LexicalError::new(

LexicalErrorType::OtherError("Invalid Syntax".to_string().into_boxed_str()),

TextRange::new(self.offset() - TextSize::new(1), self.offset()),

));

}

Yes. Leading/trailing underscores are also not supported in "plain" Python syntax: _10. But they are supported in Decimals syntax.

But I don't think there is a problem in this PR? All underscores are replaced. So if someone really has Decimal("_1_0_0__") in their codebase, we would suggest replacing that with Decimal(100), which is fine.

sharkdp · 2024-11-05T07:41:59Z

crates/ruff_linter/src/rules/refurb/rules/verbose_decimal_constructor.rs

+                // NB: We need not convert, e.g., `2e3` to `2000`.
+                // However, `2e+3` is a syntax error, so we remove
+                // the sign on the exponent.


I don't think the positive sign in the exponent is a syntax error?

>>> 2e+3 2000.0 >>> Decimal(2e+3) Decimal('2000')

You're absolutely right! Dunno why I thought it was...

UnknownPlatypus · 2024-11-05T07:59:20Z

...linter/src/rules/refurb/snapshots/ruff_linter__rules__refurb__tests__FURB157_FURB157.py.snap

+24 | Decimal("__1____000") 
+25 | Decimal("2e4")
+   |
+   = help: Replace with `1000`


Maybe it would be nice to preserve the thousand separator in that case and suggest 1_000 instead of 1000. (If it's not too annoying to do)

That's a good suggestion, and I think it would be a nice followup! I think one would want to trim leading/trailing underscores, and then replace every internal, contiguous sequence of underscores with a single underscore. At that point it may be worth a little refactoring to just walk through the string once and abort early as appropriate...

sbrugman · 2024-11-05T12:48:59Z

crates/ruff_linter/resources/test/fixtures/refurb/FURB157.py

 Decimal(0)
 Decimal("Infinity")
 decimal.Decimal(0)
+


Could we please also include cases for Decimal("-inf") and Decimal("inf") (both should error).

dylwil3 · 2024-11-05T14:56:06Z

Sorry for sending everyone down an exponential rabbit hole, and thanks @dscorbett for digging me out of it!

Exponent handling has been removed since it takes us out of the world of integer literals.

MichaReiser · 2024-11-05T18:07:47Z

@dylwil3 feel free to merge at your own discretion

dylwil3 added 3 commits November 4, 2024 16:49

update fixture

14c4842

implement decimal parsing

fe01159

update snapshot

550f09b

MichaReiser approved these changes Nov 5, 2024

View reviewed changes

MichaReiser reviewed Nov 5, 2024

View reviewed changes

sharkdp reviewed Nov 5, 2024

View reviewed changes

UnknownPlatypus reviewed Nov 5, 2024

View reviewed changes

sbrugman reviewed Nov 5, 2024

View reviewed changes

dylwil3 added 3 commits November 5, 2024 08:53

update fixture

d003550

remove exponent handling since we are not dealing with floats

82d0a51

update snapshot

d4f6a81

dylwil3 merged commit 2b76fa8 into astral-sh:main Nov 5, 2024

dylwil3 deleted the decimal-parsing branch November 5, 2024 19:33

dhruvmanila added the bug Something isn't working label Nov 8, 2024

BrewTestBot mentioned this pull request Nov 8, 2024

ruff 0.7.3 Homebrew/homebrew-core#197081

Merged

ntBre mentioned this pull request Jun 9, 2025

PLW0177 has false negatives for strings that represent NaN #18596

Closed

	[b'e' \| b'E', b'0'..=b'9', ..] \| [b'e' \| b'E', b'-' \| b'+', b'0'..=b'9', ..] => {
	// 'e' \| 'E'
	number.push(self.cursor.bump().unwrap());

	if let Some(sign) = self.cursor.eat_if(\|c\| matches!(c, '+' \| '-')) {
	number.push(sign);
	}

	self.radix_run(&mut number, Radix::Decimal);

	true
	}
	_ => is_float,


	if self.cursor.eat_char('_') {
	return self.push_error(LexicalError::new(
	LexicalErrorType::OtherError("Invalid Syntax".to_string().into_boxed_str()),
	TextRange::new(self.offset() - TextSize::new(1), self.offset()),
	));
	}

Conversation

dylwil3 commented Nov 4, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dylwil3 commented Nov 4, 2024

Uh oh!

github-actions bot commented Nov 4, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

ruff-ecosystem results

Linter (stable)

Linter (preview)

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dylwil3 Nov 5, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sharkdp Nov 5, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

MichaReiser Nov 5, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

MichaReiser Nov 5, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dylwil3 commented Nov 5, 2024

Uh oh!

MichaReiser commented Nov 5, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

dylwil3 commented Nov 4, 2024 •

edited

Loading

github-actions bot commented Nov 4, 2024 •

edited

Loading

`ruff-ecosystem` results

dylwil3 Nov 5, 2024 •

edited

Loading

sharkdp Nov 5, 2024 •

edited

Loading

MichaReiser Nov 5, 2024 •

edited

Loading

MichaReiser Nov 5, 2024 •

edited

Loading