Consider unterminated f-strings in `FStringRanges` by dhruvmanila · Pull Request #8154 · astral-sh/ruff

dhruvmanila · 2023-10-24T05:27:01Z

Summary

This PR removes the debug assertion in the Indexer so that unterminated f-strings are allowed. This mainly fixes the development build, which now matches the release build.

The fix is simple: remove the debug assertion. As a result, there can be FStringStart and possibly FStringMiddle tokens without a corresponding f-string range in the Indexer, so code requesting an f-string range needs to account for the None case, which makes the code safer.

This PR also updates the code that queries FStringRanges to account for the None case. That case occurs when the FStringStart / FStringMiddle tokens are present but the FStringEnd token isn't, so the Indexer won't contain a range for that f-string.
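To make the behaviour described above concrete, here is a minimal sketch of a range builder that silently drops unterminated f-strings and a lookup that returns an `Option`. It is not Ruff's actual implementation: the `Token` enum, the use of plain `usize` offsets and `(start, end)` tuples instead of `TextSize` / `TextRange`, and the method bodies are all simplifying assumptions made for illustration.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for the lexer tokens (hypothetical, not Ruff's `Tok`).
#[derive(Debug, Clone, Copy)]
enum Token {
    FStringStart,
    FStringMiddle,
    FStringEnd,
}

/// Maps the start offset of each *terminated* f-string to its half-open
/// `(start, end)` range.
#[derive(Debug, Default)]
struct FStringRanges {
    raw: BTreeMap<usize, (usize, usize)>,
}

impl FStringRanges {
    /// Returns the innermost terminated f-string containing `offset`, if any.
    /// Walking the keys in descending order visits inner f-strings first.
    fn innermost(&self, offset: usize) -> Option<(usize, usize)> {
        self.raw
            .range(..=offset)
            .rev()
            .find(|(_, (_, end))| offset < *end)
            .map(|(_, range)| *range)
    }
}

#[derive(Debug, Default)]
struct FStringRangesBuilder {
    /// Start offsets of f-strings that have not seen their end token yet.
    start_locations: Vec<usize>,
    raw: BTreeMap<usize, (usize, usize)>,
}

impl FStringRangesBuilder {
    fn visit_token(&mut self, token: Token, start: usize, end: usize) {
        match token {
            Token::FStringStart => self.start_locations.push(start),
            Token::FStringEnd => {
                if let Some(fstring_start) = self.start_locations.pop() {
                    self.raw.insert(fstring_start, (fstring_start, end));
                }
            }
            Token::FStringMiddle => {}
        }
    }

    /// Anything left in `start_locations` is an unterminated f-string. It is
    /// dropped here rather than tripping a debug assertion.
    fn finish(self) -> FStringRanges {
        FStringRanges { raw: self.raw }
    }
}
```

With this shape, a token stream containing only an `FStringStart` produces an empty map, and every caller of `innermost` has to decide what to do with `None`.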

Test Plan

cargo test

Taking the following code as an example:

f"{123}

This only emits a FStringStart token, but no FStringMiddle or FStringEnd tokens.

And,

f"\.png${

This emits a FStringStart and FStringMiddle token, but no FStringEnd token.

fixes: #8065

dhruvmanila · 2023-10-24T05:27:13Z

Current dependencies on/for this PR:

main
- PR Consider unterminated f-strings in FStringRanges #8154 👈
  - PR Correct quick fix message for W605 #8156

This comment was auto-generated by Graphite.

dhruvmanila · 2023-10-24T05:33:04Z

crates/ruff_python_index/src/fstring_ranges.rs

+        if !self.start_locations.is_empty() {
+            debug!(
+                "Unterminated f-strings detected at: {:?}",
+                self.start_locations
+            );
+        }


I guess this isn't really required because we'd be providing diagnostics for the same.

I wonder if it would be useful to store the range with a missing end position instead of omitting it entirely.

To extend on this: It might even be more correct to simply assume that everything after the f-string start token is inside the f-string if the end token is missing. However, what the correct interpretation is might depend on the specific usage. E.g. adding noqa comments might be a bad idea for an unclosed f-string. Rules testing if the range is part of a multiline string could get away with assuming the string goes to the end of the file.

I thought of the same in one of my proposed solutions.

As for rules, the ranges are only used to check W605 (invalid escape sequence) and ISC (implicit string concatenation), which will be fine. For the former, the fix will be created (see below), while for the latter the fix won't be created.

-f"\.png{
+rf"\.png{

The NoQA logic will need to be looked into though. We might have to somehow mark these ranges as "incomplete from source code point of view".

I'm also fine with adding the entire range, but just omitting the range of an unterminated string seems wrong to me (I'd rather overestimate than underestimate the f-string range).

We should probably avoid adding NoQA directives then; otherwise the directive won't be lexed as a comment and --add-noqa will keep on adding them.

f"\.png # noqa: W605 # noqa: W605 # ...

I'm not sure how to store this info. One solution could be to update the signature from BTreeMap<TextSize, TextRange> to BTreeMap<TextSize, FStringRange> where:

struct FStringRange {
    range: TextRange,
    // Is this f-string complete i.e., does it have both the start and end tokens?
    complete: bool,
}

Or, while computing the NoQA mapping we can ignore these ranges by somehow checking if they're for a complete f-string or not.
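As a rough illustration of the first option, the sketch below stores every f-string, extends an unterminated one to the end of the source, and marks it as incomplete so that consumers such as the NoQA mapping can skip it. The `Range` type, the `close_unterminated` and `noqa_candidates` helpers, and the use of `usize` offsets are hypothetical names and simplifications introduced here; none of them exist in Ruff.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for `TextRange`: a half-open `[start, end)` range.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Range {
    start: usize,
    end: usize,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct FStringRange {
    range: Range,
    /// Whether both the start and end tokens were seen for this f-string.
    complete: bool,
}

/// Closes every still-open f-string at the end of the source and marks it
/// as incomplete (overestimating rather than omitting the range).
fn close_unterminated(
    ranges: &mut BTreeMap<usize, FStringRange>,
    open_starts: &[usize],
    source_len: usize,
) {
    for &start in open_starts {
        ranges.insert(
            start,
            FStringRange {
                range: Range { start, end: source_len },
                complete: false,
            },
        );
    }
}

/// Only complete f-strings are candidates for `noqa` line mapping.
fn noqa_candidates(ranges: &BTreeMap<usize, FStringRange>) -> Vec<Range> {
    ranges
        .values()
        .filter(|fstring| fstring.complete)
        .map(|fstring| fstring.range)
        .collect()
}
```

Rules that only need to know whether an offset lies inside an f-string could read `range` and ignore the flag, while the NoQA logic would filter on `complete`.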

@charliermarsh Any thoughts here?

If we did change the type to BTreeMap<TextSize, FStringRange>, where would we then use that information to avoid adding NOQA?

Earlier I thought that ignoring such ranges here would work:

ruff/crates/ruff_linter/src/directives.rs

Lines 154 to 171 in 29fb86e

// For nested f-strings, we expect `noqa` directives on the last line of the
// outermost f-string. The last f-string range will be used to skip over
// the inner f-strings.
let mut last_fstring_range: TextRange = TextRange::default();
for fstring_range in indexer.fstring_ranges().values() {
    if !locator.contains_line_break(*fstring_range) {
        continue;
    }
    if last_fstring_range.contains_range(*fstring_range) {
        continue;
    }
    let new_range = TextRange::new(
        locator.line_start(fstring_range.start()),
        fstring_range.end(),
    );
    fstring_mappings.push(new_range);
    last_fstring_range = new_range;
}

But I don't think that works, because Ruff will then, by default, add the directive at the end of the line.

I think we can update this function to return an Option<TextSize> instead:

ruff/crates/ruff_linter/src/noqa.rs

Lines 746 to 762 in 29fb86e

pub(crate) fn resolve(&self, offset: TextSize) -> TextSize {
    let index = self.ranges.binary_search_by(|range| {
        if range.end() < offset {
            std::cmp::Ordering::Less
        } else if range.contains(offset) {
            std::cmp::Ordering::Equal
        } else {
            std::cmp::Ordering::Greater
        }
    });

    if let Ok(index) = index {
        self.ranges[index].end()
    } else {
        offset
    }
}

which will be used here and continue on None:

ruff/crates/ruff_linter/src/noqa.rs

Line 527 in 29fb86e

let noqa_offset = noqa_line_for.resolve(diagnostic.start());
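One way the `Option`-returning `resolve` and the "continue on None" call site could fit together is sketched below. The thread does not say how `resolve` would recognise an unterminated f-string, so the `complete` flag on `MappedRange` is an assumption, as are the `MappedRange` type, the `noqa_offsets` helper, the `usize` offsets, and the strictly half-open range comparison.

```rust
use std::cmp::Ordering;

/// Hypothetical mapped range, half-open `[start, end)`.
#[derive(Debug, Clone, Copy)]
struct MappedRange {
    start: usize,
    end: usize,
    /// Assumed flag: `false` for a range that comes from an unterminated f-string.
    complete: bool,
}

struct NoqaMapping {
    /// Sorted, non-overlapping ranges.
    ranges: Vec<MappedRange>,
}

impl NoqaMapping {
    /// Returns where a `noqa` directive for `offset` belongs, or `None` when
    /// `offset` falls inside an incomplete range and no directive should be added.
    fn resolve(&self, offset: usize) -> Option<usize> {
        let index = self.ranges.binary_search_by(|range| {
            if range.end <= offset {
                Ordering::Less
            } else if range.start <= offset {
                Ordering::Equal
            } else {
                Ordering::Greater
            }
        });

        match index {
            Ok(index) if self.ranges[index].complete => Some(self.ranges[index].end),
            Ok(_) => None,
            Err(_) => Some(offset),
        }
    }
}

/// Call-site sketch: diagnostics that resolve to `None` are skipped.
fn noqa_offsets(mapping: &NoqaMapping, diagnostic_starts: &[usize]) -> Vec<usize> {
    let mut offsets = Vec::new();
    for &start in diagnostic_starts {
        let Some(noqa_offset) = mapping.resolve(start) else {
            continue;
        };
        offsets.push(noqa_offset);
    }
    offsets
}
```

The `let ... else` form keeps the skip explicit at the call site, which is the point of changing the return type instead of silently falling back to `offset`.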

Note that even if we somehow support this, it'll be impossible to ignore any violation on that line manually. For example,

# Unterminated f-string
f"\.png{x} # noqa: W605

Here, the noqa comment is part of the f-string and is not a comment.

The plan you described here makes sense to me. Thanks for walking me through it.

I've a local branch with a possible implementation for this but there are a few complexities involved. I'll proceed with merging this and open a new PR with that change instead.

github-actions · 2023-10-24T05:49:53Z

PR Check Results

Ecosystem

✅ ecosystem check detected no changes.

konstin · 2023-10-24T07:40:16Z

Shouldn't unterminated f-strings be caught in the parser?

dhruvmanila · 2023-10-24T09:18:00Z

> Shouldn't unterminated f-strings be caught in the parser?

Yes, they're being caught. The issue which was highlighted in the linked issue is a debug assertion present in constructing the f-string ranges in the indexer. And, the indexer uses the flattened tokens for that purpose which ignores the error tokens.

For the following code snippet,

f"{123}

We would've panicked but only in the debug build which is what this PR fixes.

Actually, the release build could panic as well with:

f"\.png${

And, running with:

$ pipx run "ruff==0.1.1" check --isolated --no-cache --select=W605,E999 ~/playground/ruff/fstring.py --fix
error: Panicked while linting /Users/dhruv/playground/ruff/fstring.py: This indicates a bug in Ruff. If you could open an issue at:

    https://github.com/astral-sh/ruff/issues/new?title=%5BLinter%20panic%5D

...with the relevant file contents, the `pyproject.toml` settings, and the following stack trace, we'd be very appreciative!

panicked at 'called `Option::unwrap()` on a `None` value', crates/ruff_linter/src/rules/pycodestyle/rules/invalid_escape_sequence.rs:175:18
Backtrace:    0: _rust_eh_personality
   1: _main
   2: _rust_eh_personality
   3: _rust_eh_personality
   4: _rust_eh_personality
   5: _rust_eh_personality
   6: __rjem_je_witnesses_cleanup
   7: __rjem_je_witnesses_cleanup
   8: _main
   9: _main
  10: _main
  11: _main
  12: _main
  13: _main
  14: _main
  15: _main
  16: start
  17: start
  18: _main
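The remark above about flattened tokens ignoring error tokens can be illustrated with plain Rust iterators: calling `flatten` on an iterator of `Result` values yields only the `Ok` payloads. The `LexResult` alias, the `flatten_tokens` helper, and the string token names are illustrative assumptions, not Ruff's lexer types.

```rust
/// Hypothetical miniature of a lexer result: `Ok(token name)` or `Err(message)`.
type LexResult = Result<&'static str, &'static str>;

/// `Iterator::flatten` treats each `Result` as an iterator of zero or one
/// items, so `Err` entries disappear from the stream without any signal.
fn flatten_tokens(tokens: Vec<LexResult>) -> Vec<&'static str> {
    tokens.into_iter().flatten().collect()
}
```

A consumer of the flattened stream therefore sees an f-string start with no matching end and no error in between, which is the situation the indexer has to tolerate.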

dhruvmanila · 2023-10-24T10:03:54Z

crates/ruff_linter/src/rules/flake8_implicit_str_concat/rules/implicit.rs

-            (Tok::String { .. }, Tok::FStringStart) => (
-                *a_range,
-                indexer.fstring_ranges().innermost(b_range.start()).unwrap(),
-            ),
-            (Tok::FStringEnd, Tok::String { .. }) => (
-                indexer.fstring_ranges().innermost(a_range.start()).unwrap(),
-                *b_range,
-            ),
-            (Tok::FStringEnd, Tok::FStringStart) => (
-                indexer.fstring_ranges().innermost(a_range.start()).unwrap(),
-                indexer.fstring_ranges().innermost(b_range.start()).unwrap(),
-            ),


These unwraps aren't as safe as I thought earlier. Ruff can panic on certain edge cases. It's better to account for the None case.
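A minimal sketch of accounting for the None case, assuming a simplified lookup over `(start, end)` tuples: the `?` operator propagates a missing range to the caller instead of panicking. Both `innermost` and `concatenated_ranges` below are hypothetical helpers written for this example and do not mirror the signatures in `implicit.rs`.

```rust
/// Hypothetical lookup: returns the smallest indexed f-string range that
/// contains `offset`, or `None` when no indexed range does (for example
/// because the f-string was never terminated).
fn innermost(ranges: &[(usize, usize)], offset: usize) -> Option<(usize, usize)> {
    ranges
        .iter()
        .copied()
        .filter(|(start, end)| *start <= offset && offset < *end)
        .min_by_key(|(start, end)| *end - *start)
}

/// Resolves the ranges of two adjacent f-string tokens. Returns `None`
/// instead of panicking if either f-string has no indexed range, so the
/// caller can simply skip the pair.
fn concatenated_ranges(
    ranges: &[(usize, usize)],
    a_start: usize,
    b_start: usize,
) -> Option<((usize, usize), (usize, usize))> {
    Some((innermost(ranges, a_start)?, innermost(ranges, b_start)?))
}
```

In a loop over token pairs, the caller would typically `continue` when this returns `None`, which trades a possible missed diagnostic on invalid source for never panicking.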

konstin · 2023-10-24T12:03:11Z

> Yes, they're being caught. The issue which was highlighted in the linked issue is a debug assertion present in constructing the f-string ranges in the indexer. And, the indexer uses the flattened tokens for that purpose which ignores the error tokens.

Sorry, I missed that the index is built during parsing, not after.

charliermarsh · 2023-10-24T12:47:58Z

I believe the Indexer is built before parsing, but not as part of parsing itself. We need to build it regardless of whether the source code is valid, because we do support linting invalid programs (at least for the token-based rules).

MichaReiser

I would prefer that we express the fact that unterminated f-strings are not stored in the return type of innermost rather than in just a note, but I'm fine moving forward with this solution.

dhruvmanila force-pushed the dhruv/unterminated-fstring branch from 7880903 to c755335 Compare October 24, 2023 05:31

dhruvmanila added the bug Something isn't working label Oct 24, 2023

dhruvmanila requested review from MichaReiser and konstin October 24, 2023 05:34

dhruvmanila mentioned this pull request Oct 24, 2023

Correct quick fix message for W605 #8156

Merged

konstin approved these changes Oct 24, 2023

dhruvmanila force-pushed the dhruv/unterminated-fstring branch 2 times, most recently from 1e980fd to 29fb86e Compare October 25, 2023 13:43

MichaReiser approved these changes Oct 26, 2023

Allow unterminated f-strings in the indexer

5398c07

dhruvmanila force-pushed the dhruv/unterminated-fstring branch from a7c0258 to 5398c07 Compare October 27, 2023 06:13

dhruvmanila linked an issue Oct 27, 2023 that may be closed by this pull request

[Linter panic] Trouble handling a syntax error #8262

Closed

dhruvmanila closed this Oct 27, 2023

dhruvmanila reopened this Oct 27, 2023

dhruvmanila changed the title ~~Allow unterminated f-strings in the indexer~~ Consider unterminated f-strings in FStringRanges Oct 27, 2023

dhruvmanila enabled auto-merge (squash) October 27, 2023 11:08

dhruvmanila merged commit 097e703 into main Oct 27, 2023

dhruvmanila deleted the dhruv/unterminated-fstring branch October 27, 2023 11:11

dhruvmanila mentioned this pull request Oct 27, 2023

WIP: Add unterminated f-string range in Indexer #8281

Closed

miccal mentioned this pull request Nov 3, 2023

ruff 0.1.4 Homebrew/homebrew-core#153286

Merged

dhruvmanila mentioned this pull request Jan 17, 2024

[Linter panic] panicked at 'called Option::unwrap() in pycodestyle invalid_escape_sequence.rs #9542

Closed
