[red-knot] Slice expression types & subscript expressions with slices#13917
[red-knot] Slice expression types & subscript expressions with slices#13917
Conversation
|
a13e32d to
0959c75
Compare
0959c75 to
4b85207
Compare
| Type::StringLiteral(str) => Truthiness::from(!str.value(db).is_empty()), | ||
| Type::LiteralString => Truthiness::Ambiguous, | ||
| Type::BytesLiteral(bytes) => Truthiness::from(!bytes.value(db).is_empty()), | ||
| Type::SliceLiteral(_) => Truthiness::AlwaysTrue, |
There was a problem hiding this comment.
slice appears to have no custom __bool__ logic; even empty slices and things like slice(None, None) are True.
| #[salsa::interned] | ||
| pub struct SliceLiteralType<'db> { | ||
| start: Option<i64>, | ||
| stop: Option<i64>, | ||
| step: Option<i64>, | ||
| } |
There was a problem hiding this comment.
Another option I considered was to inline this within Type. This would increase the size of Type. Unless we think it's okay to restrict indices for literal slice types to i32. Which might be fine. It seems unlikely that someone has a >2 GiB literal string inside their Python source code and still cares about inferring a static type for something like str_literal[4_294_967_297:].
There was a problem hiding this comment.
I think using an i32 is probably okay. Ruff/Red Knot also only supports files len(source) = u32::MAX
There was a problem hiding this comment.
Switched to i32 now as this also solves some problems on 32 bit platforms (see other comment). It's still salsa::interned for now, as the current size (2 × Option<i32> + 1 × Option<NonZero<i32>> = 20 byte) is still larger than Type at the moment (16 byte). If we absolutely want to avoid having this interned, we could certainly also go lower with the index range or do some more questionable hacks like using i32::MIN as a sentinel value instead of using Option, …
| pub(super) fn slice_step_size_zero_diagnostic(&mut self, node: AnyNodeRef) { | ||
| self.add_diagnostic( | ||
| node, | ||
| "slice-step-zero", |
There was a problem hiding this comment.
Do we have a policy for choosing these names?
There was a problem hiding this comment.
Not yet. But it's probably going to follow Ruff's naming schema (internal document)
There was a problem hiding this comment.
Ok, I tried something else for now.
| (Type::Tuple(tuple_ty), Type::IntLiteral(int)) if i32::try_from(int).is_ok() => { | ||
| let elements = tuple_ty.elements(self.db); | ||
| elements | ||
| .iter() | ||
| .python_subscript(int) | ||
| .py_index(i32::try_from(int).unwrap()) | ||
| .copied() | ||
| .unwrap_or_else(|| { | ||
| .unwrap_or_else(|_| { |
There was a problem hiding this comment.
This looks like a regression in terms of (1) code quality and (2) functionality, as we previously supported indices up to i64 size. However, this had some drawbacks on 32 bit platforms. If the (absolute value of the) i64 index did not fit into usize, we would show an "index out of bounds error", even though we hadn't even checked the size of the input. It's also highly unlikely that someone would construct a tuple/literal-string/literal-bytes object > 2GiB in size and still care about subscript type inference on that expression. Restricting indices to 32bit makes the error handling much easier, as the index->usize conversion can not fail.
| if let Ok(new_elements) = elements.iter().py_slice(start, stop, step) { | ||
| let new_elements: Vec<_> = new_elements.copied().collect(); | ||
| Type::Tuple(TupleType::new(self.db, new_elements.into_boxed_slice())) | ||
| } else { |
There was a problem hiding this comment.
I would have preferred to explicitly match on Err(StepSizeZero) here, but clippy forces me to use if let … else ….
There was a problem hiding this comment.
hehe... that's one of the pedantic lint rules that I'm very keen on disabling if I can find the necessary support.
| let start = slice_ty.start(self.db); | ||
| let stop = slice_ty.stop(self.db); | ||
| let step = slice_ty.step(self.db); | ||
| let chars: Vec<_> = literal_value.chars().collect(); |
There was a problem hiding this comment.
This additional allocation is unfortunate, but .chars() is not an ExactSizeIterator (Unicode…), and I currently don't see how we can implement the most general form of slicing (e.g. something like string_literal[3:-5]) on DoubleEndedIterator alone. I could avoid the allocation by computing the length upfront (O(n)), but I'm not sure if it's worth investing more time into this.
There was a problem hiding this comment.
I think this is fine, considering that slices into strings should be rare. We can optimize if this shows up in profiles.
| reveal_type(b[0:2]) # revealed: Literal[b"\x00a"] | ||
| reveal_type(b[-3:]) # revealed: Literal[b"bc\xff"] |
There was a problem hiding this comment.
I didn't bother repeating all tests here, since that seems sufficiently covered by the tuple/string cases + the slicing unit tests.
| reveal_type(b) # revealed: Unknown | ||
| ``` | ||
|
|
||
| ## Slices |
There was a problem hiding this comment.
These tests are not exhaustive in terms of checking slicing functionality, as we do that in the unit tests already. Let me know if you think otherwise.
| pub(super) fn slice_step_size_zero_diagnostic(&mut self, node: AnyNodeRef) { | ||
| self.add_diagnostic( | ||
| node, | ||
| "slice-step-zero", |
There was a problem hiding this comment.
Not yet. But it's probably going to follow Ruff's naming schema (internal document)
| let start = slice_ty.start(self.db); | ||
| let stop = slice_ty.stop(self.db); | ||
| let step = slice_ty.step(self.db); | ||
| let chars: Vec<_> = literal_value.chars().collect(); |
There was a problem hiding this comment.
I think this is fine, considering that slices into strings should be rare. We can optimize if this shows up in profiles.
| Self::Item: 'a; | ||
| } | ||
|
|
||
| impl<I, T> PySlice for T |
There was a problem hiding this comment.
I haven't looked into the usages of the PySlice type but would it make sense to implement it for &[T] instead of implementing it on iterators? Or are there cases where we only have an iterator?
There was a problem hiding this comment.
Yes, thanks. We only use it on slices if we are okay with collecting the chars in the LiteralString case.
| Ordering::Equal | Ordering::Greater => (start, 0, step), | ||
| }; | ||
|
|
||
| Ok(Box::new(self.skip(skip).take(take).step_by(step))) |
There was a problem hiding this comment.
I think you could use Itertools::iter::Either to return one or the other iterator type without allocating
There was a problem hiding this comment.
Oh, cool — I didn't know about itertools::Either! Needed to adapt the zero-case a bit to also fit into the Either::Left side, but that's okay.
| b[0:4:0] # error: [zero-stepsize-in-slice] | ||
| b[:4:0] # error: [zero-stepsize-in-slice] | ||
| b[0::0] # error: [zero-stepsize-in-slice] | ||
| b[::0] # error: [zero-stepsize-in-slice] |
There was a problem hiding this comment.
This is a weird edge case that pyright and mypy do not detect (mypy crashes). I'm not sure if it's worth having a separate diagnostic for, but it was easily to implement. We could also ignore it and simplify infer a non-literal type (slice) for something like ::0.
There was a problem hiding this comment.
Seems useful to detect it when we can, but I agree with your comment below that we should mirror the runtime in when its an error, and when it isn't (you can define a custom type that takes a zero-step slice and doesn't crash). Which I think means (considering "define a custom type" also includes subclasses of commonly sliceable builtin types) we would only actually issue an error when slicing a literal type. Which is the case here, so these assertions look good.
| self.add_diagnostic( | ||
| AnyNodeRef::ExprSlice(slice), // TODO | ||
| "zero-stepsize-in-slice", | ||
| format_args!("Slice step size can not be zero"), | ||
| ); |
There was a problem hiding this comment.
In a previous version of this PR, I deferred reporting this error to the usage site where we try to slice a string/bytes-literal or a tuple. Raising the diagnostic here is too early. The slice might be used on a type with a custom __getitem__ impl that makes use of 0 as a step size somehow.
I'll fix this.
carljm
left a comment
There was a problem hiding this comment.
Looks good to me, modulo the already-commented step-zero error change!
| b[0:4:0] # error: [zero-stepsize-in-slice] | ||
| b[:4:0] # error: [zero-stepsize-in-slice] | ||
| b[0::0] # error: [zero-stepsize-in-slice] | ||
| b[::0] # error: [zero-stepsize-in-slice] |
There was a problem hiding this comment.
Seems useful to detect it when we can, but I agree with your comment below that we should mirror the runtime in when its an error, and when it isn't (you can define a custom type that takes a zero-step slice and doesn't crash). Which I think means (considering "define a custom type" also includes subclasses of commonly sliceable builtin types) we would only actually issue an error when slicing a literal type. Which is the case here, so these assertions look good.
| reveal_type(b) # revealed: Unknown | ||
| ``` | ||
|
|
||
| ## Slices |
AlexWaygood
left a comment
There was a problem hiding this comment.
Nice, this is great! Some post-merge review nitpicks below, but nothing major.
One other more significant thing to note is that it looks like we don't catch this error currently, and ideally at some point we will:
>>> "foo"["bar":"baz"]
Traceback (most recent call last):
File "<python-input-0>", line 1, in <module>
"foo"["bar":"baz"]
~~~~~^^^^^^^^^^^^^
TypeError: slice indices must be integers or None or have an __index__ methodIt seems we just infer "foo["bar":"baz"] as being @Todo: this is because we infer "bar":"baz" (accurately) as being a builtins.slice instance, which means we fallback to generic "call str.__getitem__ with slice passed in" logic, and currently we get @Todo as the result whenever we try to call str.__getitem__ because it's overloaded and we don't understand overloads yet. The problem here is that even when we do understand overloads we won't catch this error with the logic as we currently have it; we'll just start inferring str as the result.
For comparison, mypy does catch this error, but it also incorrectly flags code like this, which should pass without error:
>>> class Spam:
... def __getitem__(self, item: slice | int) -> int:
... return 42
...
>>> Spam()["foo":"bar"]
42I think this problem will naturally solve itself once we:
- Add better annotations to
str.__getitem__in typeshed (it should probably accept slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None]now thatsliceis generic, rather than justslice`) - Support generic types in red-knot.
But it might be worth adding a TODO comment somewhere mentioning that this is something we need to address
| Some(Type::IntLiteral(n)) if i32::try_from(n).is_ok() => { | ||
| SliceArg::Arg(Some(i32::try_from(n).expect("checked in branch arm"))) | ||
| } |
There was a problem hiding this comment.
I guess you could avoid the .expect() call here by doing something like:
| Some(Type::IntLiteral(n)) if i32::try_from(n).is_ok() => { | |
| SliceArg::Arg(Some(i32::try_from(n).expect("checked in branch arm"))) | |
| } | |
| Some(Type::IntLiteral(n)) => match i32::try_from(n) { | |
| Ok(i) => SliceArg::Arg(Some(i)), | |
| Err(_) => SliceArg::Unsupported | |
| } |
And probably similar elsewhere. But it's probably not very important; the way you've done it is obviously safe
There was a problem hiding this comment.
I guess you could avoid the
.expect()call here by doing something like:
Yes. I added the other three uses of this before. And there I really need/want it as a pattern guard, because I want control flow to fall through if i32::try_from fails. Because I don't want to repeat the whole logic for the "else" clause.
But right here, your solution is definitely better. Changed in #13982
| debug_assert!(index >= 0); | ||
|
|
||
| // SAFETY: `index` is non-negative, and `usize` is at least 32 bits. | ||
| usize::try_from(index).unwrap() |
There was a problem hiding this comment.
nit: I probably would have written this as
| debug_assert!(index >= 0); | |
| // SAFETY: `index` is non-negative, and `usize` is at least 32 bits. | |
| usize::try_from(index).unwrap() | |
| usize::try_from(index) | |
| .expect("Should only ever pass a positive integer to `from_nonnegative_i32`") |
| let stop = slice_ty.stop(self.db); | ||
| let step = slice_ty.step(self.db); | ||
|
|
||
| if let Ok(new_elements) = elements.as_ref().py_slice(start, stop, step) { |
There was a problem hiding this comment.
nit: we could probably avoid the .as_ref() call here and elsewhere if we implemented PySlice for &Box<[T]> as well as &[T]
There was a problem hiding this comment.
Maybe you have a nice solution for this, but I quickly tried and everything I got required changes to the trait structure or other larger-scale code changes. I left those two .as_ref()s for now.
| let start = slice_ty.start(self.db); | ||
| let stop = slice_ty.stop(self.db); | ||
| let step = slice_ty.step(self.db); |
There was a problem hiding this comment.
We seem to do this several times -- is it worth adding an as_tuple method to SliceLiteralType?
impl<'db> SliceLiteralType<'db> {
fn as_tuple(&self, db: &dyn Db) -> (Option<i32>, Option<i32>, Option<i32>) {
(self.start(db), self.stop(db), self.step(db))
}
}And then this could be just
| let start = slice_ty.start(self.db); | |
| let stop = slice_ty.stop(self.db); | |
| let step = slice_ty.step(self.db); | |
| let (start, stop, step) = slice_ty.as_tuple(self.db) |
Yes
Exactly.
👍 Added a TODO comment in #13982. |
## Summary Minor follow-up to #13917 — thanks @AlexWaygood for the post-merge review. - Add SliceLiteralType::as_tuple - Use .expect() instead of SAFETY comment - Match on ::try_from result - Add TODO comment regarding raising a diagnostic for `"foo"["bar":"baz"]`
Summary
Type::SliceLiteralvariantSliceLiteraltypes for slice expressions, such as<int-literal>:<int-literal>:<int-literal>.<string-literal>[<slice-literal>].<tuple>[<slice-literal>].closes #13853
Eye candy
Test Plan
tuple[slice]string_literal[slice]bytes_literal[slice]