[red-knot] Slice expression types & subscript expressions with slices by sharkdp · Pull Request #13917 · astral-sh/ruff

sharkdp · 2024-10-24T18:10:00Z

Summary

- Add a new Type::SliceLiteral variant.
- Infer SliceLiteral types for slice expressions, such as <int-literal>:<int-literal>:<int-literal>.
- Infer "sliced" literal types for subscript expressions using slices, such as <string-literal>[<slice-literal>].
- Infer types for expressions involving slices of tuples: <tuple>[<slice-literal>].

Eye candy

t = (1, (), True, "a", None, b"b")

reveal_type(t[-2::-2])  # revealed: tuple[None, Literal[True], Literal[1]]

Test Plan

- Unit tests for indexing/slicing utility functions
- Markdown-based tests for:
  - Subscript expressions tuple[slice]
  - Subscript expressions string_literal[slice]
  - Subscript expressions bytes_literal[slice]

github-actions · 2024-10-24T18:23:37Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

sharkdp · 2024-10-28T11:45:32Z

crates/red_knot_python_semantic/src/types.rs

            Type::StringLiteral(str) => Truthiness::from(!str.value(db).is_empty()),
            Type::LiteralString => Truthiness::Ambiguous,
            Type::BytesLiteral(bytes) => Truthiness::from(!bytes.value(db).is_empty()),
+            Type::SliceLiteral(_) => Truthiness::AlwaysTrue,


slice appears to have no custom __bool__ logic; even empty slices and things like slice(None, None) are True.

sharkdp · 2024-10-28T11:52:57Z

crates/red_knot_python_semantic/src/types.rs

+#[salsa::interned]
+pub struct SliceLiteralType<'db> {
+    start: Option<i64>,
+    stop: Option<i64>,
+    step: Option<i64>,
+}


Another option I considered was to inline this within Type, but that would increase the size of Type, unless we think it's okay to restrict indices for literal slice types to i32, which might be fine. It seems unlikely that someone has a >2 GiB literal string inside their Python source code and still cares about inferring a static type for something like str_literal[4_294_967_297:].

I think using an i32 is probably okay. Ruff/Red Knot also only supports files with len(source) <= u32::MAX.

Switched to i32 now, as this also solves some problems on 32-bit platforms (see other comment). It's still salsa::interned for now, as the current size (2 × Option<i32> + 1 × Option<NonZero<i32>> = 20 bytes) is still larger than Type at the moment (16 bytes). If we absolutely want to avoid having this interned, we could certainly also go lower with the index range or do some more questionable hacks like using i32::MIN as a sentinel value instead of using Option, …
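To make the size arithmetic above concrete, here is a small sketch that measures the layouts with `std::mem::size_of`. The struct names are hypothetical stand-ins (not the real red-knot types), and the sentinel variant only illustrates the `i32::MIN` idea mentioned as a possible hack.

```rust
use std::mem::size_of;
use std::num::NonZeroI32;

/// Hypothetical stand-in for the interned slice-literal data (not the real red-knot type).
#[allow(dead_code)]
pub struct SliceLiteralData {
    start: Option<i32>,       // 4-byte payload plus a padded discriminant: 8 bytes
    stop: Option<i32>,        // 8 bytes
    step: Option<NonZeroI32>, // `None` is encoded in the forbidden zero value: 4 bytes
}

/// Hypothetical sentinel encoding: `i32::MIN` would mean "not given".
#[allow(dead_code)]
pub struct SliceLiteralSentinel {
    start: i32,
    stop: i32,
    step: i32,
}

/// Returns the sizes (in bytes) of the pieces discussed above.
pub fn layout_sizes() -> (usize, usize, usize, usize) {
    (
        size_of::<Option<i32>>(),
        size_of::<Option<NonZeroI32>>(),
        size_of::<SliceLiteralData>(),
        size_of::<SliceLiteralSentinel>(),
    )
}
```

On typical 32- and 64-bit targets this yields `(8, 4, 20, 12)`: the niche in `NonZeroI32` is what makes the step field cheap, while the two plain `Option<i32>` fields each pay for a discriminant.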

sharkdp · 2024-10-28T13:08:59Z

crates/red_knot_python_semantic/src/types/infer.rs

+    pub(super) fn slice_step_size_zero_diagnostic(&mut self, node: AnyNodeRef) {
+        self.add_diagnostic(
+            node,
+            "slice-step-zero",


Do we have a policy for choosing these names?

Not yet. But it's probably going to follow Ruff's naming schema (internal document)

Ok, I tried something else for now.

sharkdp · 2024-10-28T13:14:34Z

crates/red_knot_python_semantic/src/types/infer.rs

+            (Type::Tuple(tuple_ty), Type::IntLiteral(int)) if i32::try_from(int).is_ok() => {
                let elements = tuple_ty.elements(self.db);
                elements
                    .iter()
-                    .python_subscript(int)
+                    .py_index(i32::try_from(int).unwrap())
                    .copied()
-                    .unwrap_or_else(|| {
+                    .unwrap_or_else(|_| {


This looks like a regression in terms of (1) code quality and (2) functionality, as we previously supported indices up to i64 size. However, this had some drawbacks on 32-bit platforms: if the (absolute value of the) i64 index did not fit into usize, we would show an "index out of bounds" error, even though we hadn't even checked the size of the input. It's also highly unlikely that someone would construct a tuple/literal-string/literal-bytes object > 2 GiB in size and still care about subscript type inference on that expression. Restricting indices to 32 bits makes the error handling much easier, as the index-to-usize conversion cannot fail.
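A minimal sketch of why an `i32` index simplifies things, assuming a hypothetical free function `py_index` over a slice (the real helper in red-knot is an extension method whose exact signature is not shown here). Both the non-negative and the negative branch convert to `usize` infallibly on targets where `usize` has at least 32 bits, so the only remaining error is a genuine out-of-bounds access.

```rust
/// Error returned when a Python-style index does not address an element.
#[derive(Debug, PartialEq, Eq)]
pub struct OutOfBoundsError;

/// Hypothetical sketch of Python-style indexing (`items[index]`) with an `i32` index.
pub fn py_index<T>(items: &[T], index: i32) -> Result<&T, OutOfBoundsError> {
    if index >= 0 {
        // A non-negative i32 always fits into usize when usize has at least 32 bits.
        let i = usize::try_from(index).expect("non-negative i32 fits into usize");
        items.get(i).ok_or(OutOfBoundsError)
    } else {
        // `unsigned_abs` avoids the overflow that `-index` would cause for `i32::MIN`.
        let from_end = usize::try_from(index.unsigned_abs()).expect("u32 fits into usize");
        // `checked_sub` turns "counted past the front" into an out-of-bounds error.
        items
            .len()
            .checked_sub(from_end)
            .map(|i| &items[i])
            .ok_or(OutOfBoundsError)
    }
}
```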

sharkdp · 2024-10-28T13:15:23Z

crates/red_knot_python_semantic/src/types/infer.rs

+                if let Ok(new_elements) = elements.iter().py_slice(start, stop, step) {
+                    let new_elements: Vec<_> = new_elements.copied().collect();
+                    Type::Tuple(TupleType::new(self.db, new_elements.into_boxed_slice()))
+                } else {


I would have preferred to explicitly match on Err(StepSizeZero) here, but clippy forces me to use if let … else ….

hehe... that's one of the pedantic lint rules that I'm very keen on disabling if I can find the necessary support.

sharkdp · 2024-10-28T14:05:39Z

crates/red_knot_python_semantic/src/types/infer.rs

+                let start = slice_ty.start(self.db);
+                let stop = slice_ty.stop(self.db);
+                let step = slice_ty.step(self.db);
+                let chars: Vec<_> = literal_value.chars().collect();


This additional allocation is unfortunate, but .chars() is not an ExactSizeIterator (Unicode…), and I currently don't see how we can implement the most general form of slicing (e.g. something like string_literal[3:-5]) on DoubleEndedIterator alone. I could avoid the allocation by computing the length upfront (O(n)), but I'm not sure if it's worth investing more time into this.

I think this is fine, considering that slices into strings should be rare. We can optimize if this shows up in profiles.
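For reference, the allocation-free alternative mentioned above (counting the chars up front) could look roughly like this. It is a hypothetical sketch that only handles positive step sizes; negative steps would additionally need `chars().rev()` and the different clamping rules for reversed slices. The resulting `String` still allocates, but the intermediate `Vec<char>` is avoided at the cost of one extra O(n) pass.

```rust
/// Hypothetical helper: normalizes an optional Python index against `len` using the
/// rules for a *positive* step (negative values count from the end, then clamp to 0..=len).
fn clamp_index(index: Option<i32>, default: usize, len: usize) -> usize {
    match index {
        None => default,
        Some(i) if i >= 0 => usize::try_from(i)
            .expect("non-negative i32 fits into usize")
            .min(len),
        Some(i) => len.saturating_sub(
            usize::try_from(i.unsigned_abs()).expect("u32 fits into usize"),
        ),
    }
}

/// Hypothetical sketch of `s[start:stop:step]` for a positive `step`, without
/// collecting the chars into a `Vec` first.
pub fn slice_str_forward(s: &str, start: Option<i32>, stop: Option<i32>, step: usize) -> String {
    assert!(step > 0, "this sketch only handles positive step sizes");
    // The extra O(n) pass: `Chars` is not an `ExactSizeIterator`.
    let len = s.chars().count();
    let start = clamp_index(start, 0, len);
    let stop = clamp_index(stop, len, len);
    s.chars()
        .skip(start)
        .take(stop.saturating_sub(start))
        .step_by(step)
        .collect()
}
```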

sharkdp · 2024-10-28T14:33:20Z

crates/red_knot_python_semantic/resources/mdtest/subscript/bytes.md

+reveal_type(b[0:2])  # revealed: Literal[b"\x00a"]
+reveal_type(b[-3:])  # revealed: Literal[b"bc\xff"]


I didn't bother repeating all tests here, since that seems sufficiently covered by the tuple/string cases + the slicing unit tests.

sharkdp · 2024-10-28T14:34:17Z

crates/red_knot_python_semantic/resources/mdtest/subscript/string.md

 reveal_type(b)  # revealed: Unknown
 ```

+## Slices


These tests are not exhaustive in terms of checking slicing functionality, as we do that in the unit tests already. Let me know if you think otherwise.

No, that seems fine!


MichaReiser · 2024-10-28T14:47:36Z

crates/red_knot_python_semantic/src/types/infer.rs

+    pub(super) fn slice_step_size_zero_diagnostic(&mut self, node: AnyNodeRef) {
+        self.add_diagnostic(
+            node,
+            "slice-step-zero",


Not yet. But it's probably going to follow Ruff's naming schema (internal document)


MichaReiser · 2024-10-28T14:49:43Z

crates/red_knot_python_semantic/src/types/infer.rs

+                let start = slice_ty.start(self.db);
+                let stop = slice_ty.stop(self.db);
+                let step = slice_ty.step(self.db);
+                let chars: Vec<_> = literal_value.chars().collect();


I think this is fine, considering that slices into strings should be rare. We can optimize if this shows up in profiles.


MichaReiser · 2024-10-28T14:52:48Z

crates/red_knot_python_semantic/src/util/subscript.rs

+        Self::Item: 'a;
+}
+
+impl<I, T> PySlice for T


I haven't looked into the usages of the PySlice type but would it make sense to implement it for &[T] instead of implementing it on iterators? Or are there cases where we only have an iterator?

Yes, thanks. We only ever use it on slices, as long as we are okay with collecting the chars in the LiteralString case.

MichaReiser · 2024-10-28T14:53:59Z

crates/red_knot_python_semantic/src/util/subscript.rs

+                Ordering::Equal | Ordering::Greater => (start, 0, step),
+            };
+
+            Ok(Box::new(self.skip(skip).take(take).step_by(step)))


I think you could use itertools::Either to return one or the other iterator type without allocating.

Oh, cool — I didn't know about itertools::Either! Needed to adapt the zero-case a bit to also fit into the Either::Left side, but that's okay.
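A sketch of the idea: the forward and backward cases produce different iterator types, and wrapping them in an `Either` gives the function a single concrete return type without `Box<dyn Iterator>`. To keep the example dependency-free, it defines a minimal local `Either` instead of using `itertools::Either`; the bound normalization follows CPython's slice rules, but the function name and structure are hypothetical, not the PR's actual `PySlice` implementation.

```rust
/// Error for `step == 0`, which raises `ValueError` at runtime.
#[derive(Debug, PartialEq, Eq)]
pub struct StepSizeZeroError;

/// Minimal local stand-in for `itertools::Either`.
pub enum Either<L, R> {
    Left(L),
    Right(R),
}

impl<L, R> Iterator for Either<L, R>
where
    L: Iterator,
    R: Iterator<Item = L::Item>,
{
    type Item = L::Item;

    fn next(&mut self) -> Option<Self::Item> {
        match self {
            Either::Left(left) => left.next(),
            Either::Right(right) => right.next(),
        }
    }
}

/// Normalizes one bound like CPython: negative values count from the end, then the
/// result is clamped to `0..=len` (positive step) or `-1..=len-1` (negative step).
fn adjust(index: Option<i32>, default: i64, len: i64, positive_step: bool) -> i64 {
    let Some(index) = index else { return default };
    let index = i64::from(index);
    let index = if index < 0 { index + len } else { index };
    if positive_step {
        index.clamp(0, len)
    } else {
        index.clamp(-1, len - 1)
    }
}

fn to_usize(value: i64) -> usize {
    usize::try_from(value).expect("value is non-negative")
}

/// Hypothetical sketch of `items[start:stop:step]`, returning a lazy iterator.
pub fn py_slice<T>(
    items: &[T],
    start: Option<i32>,
    stop: Option<i32>,
    step: Option<i32>,
) -> Result<impl Iterator<Item = &T>, StepSizeZeroError> {
    // i64 arithmetic avoids overflow for extreme i32 inputs such as `-i32::MIN`.
    let step = i64::from(step.unwrap_or(1));
    if step == 0 {
        return Err(StepSizeZeroError);
    }
    let len = i64::try_from(items.len()).expect("length fits into i64");
    if step > 0 {
        let start = adjust(start, 0, len, true);
        let stop = adjust(stop, len, len, true);
        let take = (stop - start).max(0);
        Ok(Either::Left(
            items.iter().skip(to_usize(start)).take(to_usize(take)).step_by(to_usize(step)),
        ))
    } else {
        // Walk backwards from `start` (inclusive) down to `stop` (exclusive).
        let start = adjust(start, len - 1, len, false);
        let stop = adjust(stop, -1, len, false);
        let take = (start - stop).max(0);
        let skip_from_end = len - 1 - start;
        Ok(Either::Right(
            items
                .iter()
                .rev()
                .skip(to_usize(skip_from_end))
                .take(to_usize(take))
                .step_by(to_usize(-step)),
        ))
    }
}
```

With `items = [0, 1, 2, 3, 4, 5]`, `py_slice(items, Some(-2), None, Some(-2))` yields `4, 2, 0`, mirroring the `t[-2::-2]` example from the PR description.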

sharkdp · 2024-10-28T18:24:47Z

crates/red_knot_python_semantic/resources/mdtest/subscript/bytes.md

+b[0:4:0]  # error: [zero-stepsize-in-slice]
+b[:4:0]  # error: [zero-stepsize-in-slice]
+b[0::0]  # error: [zero-stepsize-in-slice]
+b[::0]  # error: [zero-stepsize-in-slice]


This is a weird edge case that pyright and mypy do not detect (mypy crashes). I'm not sure if it's worth having a separate diagnostic for, but it was easy to implement. We could also ignore it and simply infer a non-literal type (slice) for something like ::0.

Seems useful to detect it when we can, but I agree with your comment below that we should mirror the runtime in deciding when it's an error and when it isn't (you can define a custom type that takes a zero-step slice and doesn't crash). I think that means (considering "define a custom type" also includes subclasses of commonly sliceable builtin types) we would only actually issue an error when slicing a literal type. That is the case here, so these assertions look good.

sharkdp · 2024-10-28T20:35:37Z

crates/red_knot_python_semantic/src/types/infer.rs

+                        self.add_diagnostic(
+                            AnyNodeRef::ExprSlice(slice), // TODO
+                            "zero-stepsize-in-slice",
+                            format_args!("Slice step size can not be zero"),
+                        );


In a previous version of this PR, I deferred reporting this error to the usage site where we try to slice a string/bytes-literal or a tuple. Raising the diagnostic here is too early. The slice might be used on a type with a custom __getitem__ impl that makes use of 0 as a step size somehow.

I'll fix this.
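A sketch of the deferred check described above, using a hypothetical, heavily simplified view of the subscripted value (the real inference code works on `Type` and reports via `add_diagnostic`). The point is that the decision depends on *what* is being sliced, not on the slice expression alone.

```rust
/// Hypothetical, simplified view of the value being subscripted.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SubscriptedValue {
    StringLiteral,
    BytesLiteral,
    Tuple,
    /// Any other type, e.g. a user class with a custom `__getitem__`.
    Other,
}

/// Returns `true` if slicing `value` with `step` should be reported as an error.
pub fn reports_zero_step(value: SubscriptedValue, step: Option<i32>) -> bool {
    // Only for these builtins do we know that the runtime raises on a zero step.
    let known_to_raise = matches!(
        value,
        SubscriptedValue::StringLiteral | SubscriptedValue::BytesLiteral | SubscriptedValue::Tuple
    );
    known_to_raise && step == Some(0)
}
```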

carljm

Looks good to me, modulo the already-commented step-zero error change!

carljm · 2024-10-29T02:36:18Z

crates/red_knot_python_semantic/resources/mdtest/subscript/bytes.md

+b[0:4:0]  # error: [zero-stepsize-in-slice]
+b[:4:0]  # error: [zero-stepsize-in-slice]
+b[0::0]  # error: [zero-stepsize-in-slice]
+b[::0]  # error: [zero-stepsize-in-slice]


Seems useful to detect it when we can, but I agree with your comment below that we should mirror the runtime in deciding when it's an error and when it isn't (you can define a custom type that takes a zero-step slice and doesn't crash). I think that means (considering "define a custom type" also includes subclasses of commonly sliceable builtin types) we would only actually issue an error when slicing a literal type. That is the case here, so these assertions look good.

carljm · 2024-10-29T02:38:07Z

crates/red_knot_python_semantic/resources/mdtest/subscript/string.md

 reveal_type(b)  # revealed: Unknown
 ```

+## Slices


No, that seems fine!

AlexWaygood

Nice, this is great! Some post-merge review nitpicks below, but nothing major.

One other more significant thing to note is that it looks like we don't catch this error currently, and ideally at some point we will:

>>> "foo"["bar":"baz"]
Traceback (most recent call last):
  File "<python-input-0>", line 1, in <module>
    "foo"["bar":"baz"]
    ~~~~~^^^^^^^^^^^^^
TypeError: slice indices must be integers or None or have an __index__ method

It seems we just infer "foo"["bar":"baz"] as being @Todo: this is because we infer "bar":"baz" (accurately) as being a builtins.slice instance, which means we fall back to generic "call str.__getitem__ with slice passed in" logic, and currently we get @Todo as the result whenever we try to call str.__getitem__ because it's overloaded and we don't understand overloads yet. The problem here is that even when we do understand overloads, we won't catch this error with the logic as we currently have it; we'll just start inferring str as the result.

For comparison, mypy does catch this error, but it also incorrectly flags code like this, which should pass without error:

>>> class Spam:
...     def __getitem__(self, item: slice | int) -> int:
...         return 42
...         
>>> Spam()["foo":"bar"]
42

I think this problem will naturally solve itself once we:

- Add better annotations to str.__getitem__ in typeshed (it should probably accept `slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None]` now that `slice` is generic, rather than just `slice`)
- Support generic types in red-knot.

But it might be worth adding a TODO comment somewhere mentioning that this is something we need to address.
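For illustration only, the runtime rule quoted in the traceback (slice indices must be integers, None, or have an `__index__` method) could be mirrored per slice component roughly as below. The enums are hypothetical simplifications, not red-knot types, and, as the `Spam` example shows, such a check would only be sound when the subscripted value is a builtin like `str`, `bytes`, or `tuple`.

```rust
/// Hypothetical, simplified inferred type of one slice component (`start`, `stop`, or `step`).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ComponentType {
    IntLiteral(i64),
    BooleanLiteral(bool),
    NoneLiteral,
    StringLiteral(String),
    /// Anything we cannot reason about precisely.
    Unknown,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IndexValidity {
    Valid,
    Invalid,
    Undetermined,
}

/// Mirrors the runtime rule for builtin sequences. `bool` is valid because it
/// subclasses `int`; `str` has no `__index__`, so `"foo"["bar":"baz"]` raises `TypeError`.
pub fn slice_component_validity(ty: &ComponentType) -> IndexValidity {
    match ty {
        ComponentType::IntLiteral(_)
        | ComponentType::BooleanLiteral(_)
        | ComponentType::NoneLiteral => IndexValidity::Valid,
        ComponentType::StringLiteral(_) => IndexValidity::Invalid,
        ComponentType::Unknown => IndexValidity::Undetermined,
    }
}
```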

AlexWaygood · 2024-10-29T16:10:32Z

crates/red_knot_python_semantic/src/types/infer.rs

+            Some(Type::IntLiteral(n)) if i32::try_from(n).is_ok() => {
+                SliceArg::Arg(Some(i32::try_from(n).expect("checked in branch arm")))
+            }


I guess you could avoid the .expect() call here by doing something like:

Suggested change

-            Some(Type::IntLiteral(n)) if i32::try_from(n).is_ok() => {
-                SliceArg::Arg(Some(i32::try_from(n).expect("checked in branch arm")))
-            }
+            Some(Type::IntLiteral(n)) => match i32::try_from(n) {
+                Ok(i) => SliceArg::Arg(Some(i)),
+                Err(_) => SliceArg::Unsupported,
+            }

And probably similarly elsewhere. But it's probably not very important; the way you've done it is obviously safe.

I guess you could avoid the .expect() call here by doing something like:

Yes. I added the other three uses of this pattern earlier, and there I really do need/want it as a pattern guard: if i32::try_from fails, I want control flow to fall through to the next arm, because I don't want to repeat the whole logic for the "else" clause.

But right here, your solution is definitely better. Changed in #13982
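The two styles discussed above, side by side, using hypothetical simplified stand-ins for `Type` and `SliceArg` (not the real definitions). Matching on the conversion result needs no `.expect()`; the guard style is useful when an out-of-range literal should fall through to a shared fallback arm instead of duplicating it.

```rust
/// Hypothetical, simplified stand-ins for the red-knot types in the snippet above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ty {
    IntLiteral(i64),
    NoneType,
    Other,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SliceArg {
    Arg(Option<i32>),
    Unsupported,
}

/// Reviewer's style: match on the conversion result, so no `.expect()` is needed.
pub fn slice_arg(ty: Option<Ty>) -> SliceArg {
    match ty {
        Some(Ty::IntLiteral(n)) => match i32::try_from(n) {
            Ok(i) => SliceArg::Arg(Some(i)),
            Err(_) => SliceArg::Unsupported,
        },
        Some(Ty::NoneType) | None => SliceArg::Arg(None),
        Some(Ty::Other) => SliceArg::Unsupported,
    }
}

/// Guard style: literals outside the `i32` range fall through to the generic arm,
/// so the fallback logic is written only once.
pub fn describe_index(index: Ty) -> String {
    match index {
        Ty::IntLiteral(n) if i32::try_from(n).is_ok() => {
            let i = i32::try_from(n).expect("checked in match guard");
            format!("literal index {i}")
        }
        // Reached for non-literal indices *and* for out-of-range literals.
        _ => String::from("generic fallback"),
    }
}
```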

AlexWaygood · 2024-10-29T16:26:09Z

crates/red_knot_python_semantic/src/util/subscript.rs

+    debug_assert!(index >= 0);
+
+    // SAFETY: `index` is non-negative, and `usize` is at least 32 bits.
+    usize::try_from(index).unwrap()


nit: I probably would have written this as

Suggested change

-    debug_assert!(index >= 0);
-
-    // SAFETY: `index` is non-negative, and `usize` is at least 32 bits.
-    usize::try_from(index).unwrap()
+    usize::try_from(index)
+        .expect("Should only ever pass a positive integer to `from_nonnegative_i32`")

Changed in #13982

AlexWaygood · 2024-10-29T16:31:54Z

crates/red_knot_python_semantic/src/types/infer.rs

+                let stop = slice_ty.stop(self.db);
+                let step = slice_ty.step(self.db);
+
+                if let Ok(new_elements) = elements.as_ref().py_slice(start, stop, step) {


nit: we could probably avoid the .as_ref() call here and elsewhere if we implemented PySlice for &Box<[T]> as well as &[T]

Maybe you have a nice solution for this, but I quickly tried and everything I got required changes to the trait structure or other larger-scale code changes. I left those two .as_ref()s for now.

AlexWaygood · 2024-10-29T16:40:16Z

crates/red_knot_python_semantic/src/types/infer.rs

+                let start = slice_ty.start(self.db);
+                let stop = slice_ty.stop(self.db);
+                let step = slice_ty.step(self.db);


We seem to do this several times -- is it worth adding an as_tuple method to SliceLiteralType?

impl<'db> SliceLiteralType<'db> {
    fn as_tuple(&self, db: &dyn Db) -> (Option<i32>, Option<i32>, Option<i32>) {
        (self.start(db), self.stop(db), self.step(db))
    }
}

And then this could be just

Suggested change

-                let start = slice_ty.start(self.db);
-                let stop = slice_ty.stop(self.db);
-                let step = slice_ty.step(self.db);
+                let (start, stop, step) = slice_ty.as_tuple(self.db);

I like it, thanks — changed in #13982

sharkdp · 2024-10-29T19:33:48Z

It seems we just infer "foo"["bar":"baz"] as being @Todo

Yes

this is because we infer "bar":"baz" (accurately) as being a builtins.slice instance, which means we fall back to generic "call str.__getitem__ with slice passed in" logic, and currently we get @Todo as the result whenever we try to call str.__getitem__ because it's overloaded and we don't understand overloads yet.

Exactly.

The problem here is that even when we do understand overloads we won't catch this error with the logic as we currently have it; we'll just start inferring str as the result.

👍 Added a TODO comment in #13982.

@AlexWaygood

Summary

Minor follow-up to #13917. Thanks @AlexWaygood for the post-merge review.

- Add SliceLiteralType::as_tuple
- Use .expect() instead of SAFETY comment
- Match on ::try_from result
- Add TODO comment regarding raising a diagnostic for `"foo"["bar":"baz"]`

sharkdp added the ty Multi-file analysis & type inference label Oct 24, 2024

sharkdp force-pushed the david/infer-slice-expressions branch from a13e32d to 0959c75 October 27, 2024 22:12

Initial version of slice-subscript

4b85207

sharkdp force-pushed the david/infer-slice-expressions branch from 0959c75 to 4b85207 October 27, 2024 22:14

sharkdp added 3 commits October 28, 2024 12:10

More mdtest cases

ab159ea

Fix display of slice types

9a39a09

Add tests for display functionality

119a8f8

sharkdp commented Oct 28, 2024

sharkdp added 3 commits October 28, 2024 13:53

Switch to i32-based indices

fabb4ce

Show step size error even for empty slices

f75b32d

Fix all clippy suggestions

3c01f62

sharkdp commented Oct 28, 2024

sharkdp added 3 commits October 28, 2024 14:24

Add slicing for bytes literals

f7b77bf

Minor updates

0f13d90

Minor improvements

c411735

sharkdp commented Oct 28, 2024

sharkdp added 3 commits October 28, 2024 15:14

More tiny improvements

2d82d3d

Add tests for large step sizes

417ce3c

Add more tests for strings

4b740e8

sharkdp commented Oct 28, 2024

MichaReiser reviewed Oct 28, 2024

sharkdp added 3 commits October 28, 2024 17:08

Fix one more edge case

391f34b

Replace .unwrap() with .expect()

1b16897

Use nth_back()

57432f6

sharkdp added 2 commits October 28, 2024 19:14

Avoid allocations in Display impl

547a59f

Rename rule

caaa465

sharkdp commented Oct 28, 2024

sharkdp added 6 commits October 28, 2024 19:47

Implement PySlice on &[T]

5d21710

Use itertools::Either to avoid boxing the iterator

b99f388

Fix clippy suggestions

5853242

Use NonZeroI32 instead of NonZero<i32>

c167bae

Move Nth

0ee0f8a

New formulation using 'Position'; fix another edge case

fe59fb7

sharkdp marked this pull request as ready for review October 28, 2024 19:55

sharkdp requested review from AlexWaygood and carljm as code owners October 28, 2024 19:55

sharkdp added 2 commits October 28, 2024 21:09

Add TODOs in tests

06bf401

Fix MSRV build

56e8487

sharkdp commented Oct 28, 2024

carljm approved these changes Oct 29, 2024

Only raise StepSizeZero errors on literals/tuples

08b6c7b

sharkdp merged commit 56c796a into main Oct 29, 2024

sharkdp deleted the david/infer-slice-expressions branch October 29, 2024 09:17

sharkdp mentioned this pull request Oct 29, 2024

[red-knot] Infer subscript expression types #13689

Closed

AlexWaygood reviewed Oct 29, 2024

sharkdp mentioned this pull request Oct 29, 2024

[red-knot] Minor follow-up on slice expression inference #13982

Merged
