Skip to content

[red-knot] Slice expression types & subscript expressions with slices#13917

Merged
sharkdp merged 27 commits intomainfrom
david/infer-slice-expressions
Oct 29, 2024
Merged

[red-knot] Slice expression types & subscript expressions with slices#13917
sharkdp merged 27 commits intomainfrom
david/infer-slice-expressions

Conversation

@sharkdp
Copy link
Contributor

@sharkdp sharkdp commented Oct 24, 2024

Summary

  • Add a new Type::SliceLiteral variant
  • Infer SliceLiteral types for slice expressions, such as <int-literal>:<int-literal>:<int-literal>.
  • Infer "sliced" literal types for subscript expressions using slices, such as <string-literal>[<slice-literal>].
  • Infer types for expressions involving slices of tuples: <tuple>[<slice-literal>].

closes #13853

Eye candy

t = (1, (), True, "a", None, b"b")

reveal_type(t[-2::-2])  # revealed: tuple[None, Literal[True], Literal[1]]

Test Plan

  • Unit tests for indexing/slicing utility functions
  • Markdown-based tests for
    • Subscript expressions tuple[slice]
    • Subscript expressions string_literal[slice]
    • Subscript expressions bytes_literal[slice]

@sharkdp sharkdp added the ty Multi-file analysis & type inference label Oct 24, 2024
@github-actions
Copy link
Contributor

github-actions bot commented Oct 24, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

@sharkdp sharkdp force-pushed the david/infer-slice-expressions branch from a13e32d to 0959c75 Compare October 27, 2024 22:12
@sharkdp sharkdp force-pushed the david/infer-slice-expressions branch from 0959c75 to 4b85207 Compare October 27, 2024 22:14
Type::StringLiteral(str) => Truthiness::from(!str.value(db).is_empty()),
Type::LiteralString => Truthiness::Ambiguous,
Type::BytesLiteral(bytes) => Truthiness::from(!bytes.value(db).is_empty()),
Type::SliceLiteral(_) => Truthiness::AlwaysTrue,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

slice appears to have no custom __bool__ logic; even empty slices and things like slice(None, None) are True.

Comment on lines +1825 to +1830
#[salsa::interned]
pub struct SliceLiteralType<'db> {
start: Option<i64>,
stop: Option<i64>,
step: Option<i64>,
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option I considered was to inline this within Type. This would increase the size of Type. Unless we think it's okay to restrict indices for literal slice types to i32. Which might be fine. It seems unlikely that someone has a >2 GiB literal string inside their Python source code and still cares about inferring a static type for something like str_literal[4_294_967_297:].

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using an i32 is probably okay. Ruff/Red Knot also only supports files len(source) = u32::MAX

Copy link
Contributor Author

@sharkdp sharkdp Oct 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to i32 now as this also solves some problems on 32 bit platforms (see other comment). It's still salsa::interned for now, as the current size (2 × Option<i32> + 1 × Option<NonZero<i32>> = 20 byte) is still larger than Type at the moment (16 byte). If we absolutely want to avoid having this interned, we could certainly also go lower with the index range or do some more questionable hacks like using i32::MIN as a sentinel value instead of using Option, …

pub(super) fn slice_step_size_zero_diagnostic(&mut self, node: AnyNodeRef) {
self.add_diagnostic(
node,
"slice-step-zero",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a policy for choosing these names?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not yet. But it's probably going to follow Ruff's naming schema (internal document)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I tried something else for now.

Comment on lines +3209 to +3215
(Type::Tuple(tuple_ty), Type::IntLiteral(int)) if i32::try_from(int).is_ok() => {
let elements = tuple_ty.elements(self.db);
elements
.iter()
.python_subscript(int)
.py_index(i32::try_from(int).unwrap())
.copied()
.unwrap_or_else(|| {
.unwrap_or_else(|_| {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a regression in terms of (1) code quality and (2) functionality, as we previously supported indices up to i64 size. However, this had some drawbacks on 32 bit platforms. If the (absolute value of the) i64 index did not fit into usize, we would show an "index out of bounds error", even though we hadn't even checked the size of the input. It's also highly unlikely that someone would construct a tuple/literal-string/literal-bytes object > 2GiB in size and still care about subscript type inference on that expression. Restricting indices to 32bit makes the error handling much easier, as the index->usize conversion can not fail.

if let Ok(new_elements) = elements.iter().py_slice(start, stop, step) {
let new_elements: Vec<_> = new_elements.copied().collect();
Type::Tuple(TupleType::new(self.db, new_elements.into_boxed_slice()))
} else {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have preferred to explicitly match on Err(StepSizeZero) here, but clippy forces me to use if let … else ….

Copy link
Member

@MichaReiser MichaReiser Oct 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hehe... that's one of the pedantic lint rules that I'm very keen on disabling if I can find the necessary support.

let start = slice_ty.start(self.db);
let stop = slice_ty.stop(self.db);
let step = slice_ty.step(self.db);
let chars: Vec<_> = literal_value.chars().collect();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This additional allocation is unfortunate, but .chars() is not an ExactSizeIterator (Unicode…), and I currently don't see how we can implement the most general form of slicing (e.g. something like string_literal[3:-5]) on DoubleEndedIterator alone. I could avoid the allocation by computing the length upfront (O(n)), but I'm not sure if it's worth investing more time into this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine, considering that slices into strings should be rare. We can optimize if this shows up in profiles.

Comment on lines +31 to +32
reveal_type(b[0:2]) # revealed: Literal[b"\x00a"]
reveal_type(b[-3:]) # revealed: Literal[b"bc\xff"]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't bother repeating all tests here, since that seems sufficiently covered by the tuple/string cases + the slicing unit tests.

reveal_type(b) # revealed: Unknown
```

## Slices
Copy link
Contributor Author

@sharkdp sharkdp Oct 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests are not exhaustive in terms of checking slicing functionality, as we do that in the unit tests already. Let me know if you think otherwise.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that seems fine!

pub(super) fn slice_step_size_zero_diagnostic(&mut self, node: AnyNodeRef) {
self.add_diagnostic(
node,
"slice-step-zero",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not yet. But it's probably going to follow Ruff's naming schema (internal document)

let start = slice_ty.start(self.db);
let stop = slice_ty.stop(self.db);
let step = slice_ty.step(self.db);
let chars: Vec<_> = literal_value.chars().collect();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine, considering that slices into strings should be rare. We can optimize if this shows up in profiles.

Self::Item: 'a;
}

impl<I, T> PySlice for T
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked into the usages of the PySlice type but would it make sense to implement it for &[T] instead of implementing it on iterators? Or are there cases where we only have an iterator?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thanks. We only use it on slices if we are okay with collecting the chars in the LiteralString case.

Ordering::Equal | Ordering::Greater => (start, 0, step),
};

Ok(Box::new(self.skip(skip).take(take).step_by(step)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could use Itertools::iter::Either to return one or the other iterator type without allocating

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, cool — I didn't know about itertools::Either! Needed to adapt the zero-case a bit to also fit into the Either::Left side, but that's okay.

Comment on lines +34 to +37
b[0:4:0] # error: [zero-stepsize-in-slice]
b[:4:0] # error: [zero-stepsize-in-slice]
b[0::0] # error: [zero-stepsize-in-slice]
b[::0] # error: [zero-stepsize-in-slice]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a weird edge case that pyright and mypy do not detect (mypy crashes). I'm not sure if it's worth having a separate diagnostic for, but it was easily to implement. We could also ignore it and simplify infer a non-literal type (slice) for something like ::0.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems useful to detect it when we can, but I agree with your comment below that we should mirror the runtime in when its an error, and when it isn't (you can define a custom type that takes a zero-step slice and doesn't crash). Which I think means (considering "define a custom type" also includes subclasses of commonly sliceable builtin types) we would only actually issue an error when slicing a literal type. Which is the case here, so these assertions look good.

@sharkdp sharkdp marked this pull request as ready for review October 28, 2024 19:55
Comment on lines +3429 to +3433
self.add_diagnostic(
AnyNodeRef::ExprSlice(slice), // TODO
"zero-stepsize-in-slice",
format_args!("Slice step size can not be zero"),
);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a previous version of this PR, I deferred reporting this error to the usage site where we try to slice a string/bytes-literal or a tuple. Raising the diagnostic here is too early. The slice might be used on a type with a custom __getitem__ impl that makes use of 0 as a step size somehow.

I'll fix this.

Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, modulo the already-commented step-zero error change!

Comment on lines +34 to +37
b[0:4:0] # error: [zero-stepsize-in-slice]
b[:4:0] # error: [zero-stepsize-in-slice]
b[0::0] # error: [zero-stepsize-in-slice]
b[::0] # error: [zero-stepsize-in-slice]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems useful to detect it when we can, but I agree with your comment below that we should mirror the runtime in when its an error, and when it isn't (you can define a custom type that takes a zero-step slice and doesn't crash). Which I think means (considering "define a custom type" also includes subclasses of commonly sliceable builtin types) we would only actually issue an error when slicing a literal type. Which is the case here, so these assertions look good.

reveal_type(b) # revealed: Unknown
```

## Slices
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that seems fine!

@sharkdp sharkdp merged commit 56c796a into main Oct 29, 2024
@sharkdp sharkdp deleted the david/infer-slice-expressions branch October 29, 2024 09:17
Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this is great! Some post-merge review nitpicks below, but nothing major.

One other more significant thing to note is that it looks like we don't catch this error currently, and ideally at some point we will:

>>> "foo"["bar":"baz"]
Traceback (most recent call last):
  File "<python-input-0>", line 1, in <module>
    "foo"["bar":"baz"]
    ~~~~~^^^^^^^^^^^^^
TypeError: slice indices must be integers or None or have an __index__ method

It seems we just infer "foo["bar":"baz"] as being @Todo: this is because we infer "bar":"baz" (accurately) as being a builtins.slice instance, which means we fallback to generic "call str.__getitem__ with slice passed in" logic, and currently we get @Todo as the result whenever we try to call str.__getitem__ because it's overloaded and we don't understand overloads yet. The problem here is that even when we do understand overloads we won't catch this error with the logic as we currently have it; we'll just start inferring str as the result.

For comparison, mypy does catch this error, but it also incorrectly flags code like this, which should pass without error:

>>> class Spam:
...     def __getitem__(self, item: slice | int) -> int:
...         return 42
...         
>>> Spam()["foo":"bar"]
42

I think this problem will naturally solve itself once we:

  1. Add better annotations to str.__getitem__ in typeshed (it should probably accept slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None]now thatsliceis generic, rather than justslice`)
  2. Support generic types in red-knot.

But it might be worth adding a TODO comment somewhere mentioning that this is something we need to address

Comment on lines +3421 to +3423
Some(Type::IntLiteral(n)) if i32::try_from(n).is_ok() => {
SliceArg::Arg(Some(i32::try_from(n).expect("checked in branch arm")))
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you could avoid the .expect() call here by doing something like:

Suggested change
Some(Type::IntLiteral(n)) if i32::try_from(n).is_ok() => {
SliceArg::Arg(Some(i32::try_from(n).expect("checked in branch arm")))
}
Some(Type::IntLiteral(n)) => match i32::try_from(n) {
Ok(i) => SliceArg::Arg(Some(i)),
Err(_) => SliceArg::Unsupported
}

And probably similar elsewhere. But it's probably not very important; the way you've done it is obviously safe

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you could avoid the .expect() call here by doing something like:

Yes. I added the other three uses of this before. And there I really need/want it as a pattern guard, because I want control flow to fall through if i32::try_from fails. Because I don't want to repeat the whole logic for the "else" clause.

But right here, your solution is definitely better. Changed in #13982

Comment on lines +18 to +21
debug_assert!(index >= 0);

// SAFETY: `index` is non-negative, and `usize` is at least 32 bits.
usize::try_from(index).unwrap()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I probably would have written this as

Suggested change
debug_assert!(index >= 0);
// SAFETY: `index` is non-negative, and `usize` is at least 32 bits.
usize::try_from(index).unwrap()
usize::try_from(index)
.expect("Should only ever pass a positive integer to `from_nonnegative_i32`")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed in #13982

let stop = slice_ty.stop(self.db);
let step = slice_ty.step(self.db);

if let Ok(new_elements) = elements.as_ref().py_slice(start, stop, step) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we could probably avoid the .as_ref() call here and elsewhere if we implemented PySlice for &Box<[T]> as well as &[T]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you have a nice solution for this, but I quickly tried and everything I got required changes to the trait structure or other larger-scale code changes. I left those two .as_ref()s for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#13983 :-)

Comment on lines +3269 to +3271
let start = slice_ty.start(self.db);
let stop = slice_ty.stop(self.db);
let step = slice_ty.step(self.db);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We seem to do this several times -- is it worth adding an as_tuple method to SliceLiteralType?

impl<'db> SliceLiteralType<'db> {
    fn as_tuple(&self, db: &dyn Db) -> (Option<i32>, Option<i32>, Option<i32>) {
        (self.start(db), self.stop(db), self.step(db))
    }
}

And then this could be just

Suggested change
let start = slice_ty.start(self.db);
let stop = slice_ty.stop(self.db);
let step = slice_ty.step(self.db);
let (start, stop, step) = slice_ty.as_tuple(self.db)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it, thanks — changed in #13982

@sharkdp
Copy link
Contributor Author

sharkdp commented Oct 29, 2024

It seems we just infer "foo["bar":"baz"] as being @Todo

Yes

this is because we infer "bar":"baz" (accurately) as being a builtins.slice instance, which means we fallback to generic "call str.__getitem__ with slice passed in" logic, and currently we get @Todo as the result whenever we try to call str.__getitem__ because it's overloaded and we don't understand overloads yet.

Exactly.

The problem here is that even when we do understand overloads we won't catch this error with the logic as we currently have it; we'll just start inferring str as the result.

👍 Added a TODO comment in #13982.

sharkdp added a commit that referenced this pull request Oct 29, 2024
## Summary

Minor follow-up to #13917 — thanks @AlexWaygood for the post-merge
review.

- Add
SliceLiteralType::as_tuple
- Use .expect() instead of SAFETY
comment
- Match on ::try_from
result
- Add TODO comment regarding raising a diagnostic for `"foo"["bar":"baz"]`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ty Multi-file analysis & type inference

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[red-knot] Infer slice expression types

4 participants