[ty] Handle type narrowing of discriminated unions#23116
Conversation
|
relevant ticket: astral-sh/ty#1479 |
Typing conformance resultsNo changes detected ✅ |
|
|
| Lint rule | Added | Removed | Changed |
|---|---|---|---|
unsupported-operator |
84 | 1 | 50 |
not-subscriptable |
96 | 1 | 0 |
possibly-missing-attribute |
26 | 16 | 35 |
invalid-argument-type |
8 | 5 | 10 |
no-matching-overload |
0 | 20 | 0 |
invalid-return-type |
0 | 1 | 2 |
not-iterable |
0 | 0 | 2 |
invalid-assignment |
0 | 0 | 1 |
| Total | 214 | 44 | 100 |
This commit adds type narrowing to the discriminated unions in the `narrow` crate, so that the type of the discriminated union is narrowed based on the value of the discriminator field.
cbeb545 to
45af5dd
Compare
|
My first thought is that this kind of narrowing isn't entirely sound. For example, what do we think about this program, which gives no type errors on this branch but crashes with from typing import Literal
class A:
kind: Literal["a"]
a_attr: int
class B:
kind: Literal["b"]
b_attr: str
class C(B):
# This is a "Liskov violation".
kind: Literal["oops"]
def __init__(self):
self.kind = "oops"
self.b_attr = "42"
def f(x: A | B):
if x.kind == "b":
x.b_attr
else:
x.a_attr
f(C()) |
I think the answer there is that this is why we need to improve our Liskov checks so that they flag incompatible attribute overrides as well as incompatible method overrides :-) I don't think that's something that this PR needs to concern itself with. In general, there are many places where ty's behaviour (and the behaviour of other type checkers!) is only sound if all classes in Python obey the Liskov Substitution Principle at all times. They don't, of course, but it's impossible to usefully reason about Python code unless you make that assumption, and the soundness hole is plugged by having Liskov checks enforced elsewhere. |
| def _(x: WithNone | WithInt): | ||
| if x.value is None: | ||
| reveal_type(x) # revealed: WithNone | ||
| else: | ||
| reveal_type(x) # revealed: WithInt |
There was a problem hiding this comment.
Let's add a test for is not also.
| // For equality constraints, all matching attributes must have literal types to safely | ||
| // narrow. Non-literal types (like `str` or `int`) could have subclasses that override | ||
| // `__eq__` in unexpected ways. For inequality constraints, we can narrow even with | ||
| // non-literal attribute types. | ||
| if constrain_with_equality | ||
| && !all_matching_attribute_types_are_literals(self.db, union, attr_name) | ||
| { | ||
| return None; | ||
| } |
There was a problem hiding this comment.
Let's add a test case that fails if this check is deleted. (Though see also my comment below about whether we might want to instead do this member-by-member in the filter loop.)
| let place = self.expect_place(&value_place_expr); | ||
| Some(( | ||
| place, | ||
| NarrowingConstraint::replacement(UnionType::from_elements(self.db, filtered)), |
There was a problem hiding this comment.
In TypedDict narrowing, we only create one synthetic TypedDict type and intersect with (the negated version of) it to narrow large unions. That was necessary to avoid big hits to sympy in our CodSpeed benchmarks. But here, we might be taking a union of e.g. 100 elements and returning a narrowed union of e.g. 99 elements. Is that going to have O(n^2) runtime in these big cases? I don't see any complaints from CodSpeed, but I don't know whether there are any large unions of tagged classes in the repos we test?
Also, as long as we're doing it this way, I'm not sure we need the all_matching_attribute_types_are_literals check above, which short-circuits the entire narrowing process if any element of the union has a non-literal tag attribute. That was necessary in the TypedDict approach, because the negated type could eliminate too much. (It's only correct to the extent that disjointness implies !=, which isn't necessarily the case when you could have perverse subclasses of int/str/etc). But here, since we're going through the union member-by-member and specifying exactly which ones to keep, we could check each for the literal-ness of its tag at the same time we're checking is_disjoint_from? I acknowledge that this is a pretty contrived corner case (is anyone really going to make one kind attribute Literal[0] and another one str?), but concretely, this comment above is arguably false: "For equality constraints, all matching attributes must have literal types to safely narrow." (Only true of the TypedDict approach, not this approach.) Given how much logic is getting copied around as we expand this file, I want to be careful about letting incorrect assumptions sneak in. And this whole distinction might be worth a comment of its own.
| // class B: | ||
| // kind: Literal["b"] | ||
| // def _(x: A | B): | ||
| // if x.kind is "a": |
There was a problem hiding this comment.
"a" is not a singleton. (I'm guessing this is a copy-paste-o, but it's interesting to try to dodge CPython's string interning and actually demonstrate this. One way to do it is "b".replace("b", "a").)
There was a problem hiding this comment.
It's true that "a" is not a singleton, but I'm not sure that's relevant here? It's definitely an antipattern to use is or is not with a string literal, because -- as you say -- CPython's string interning is an implementation detail that doesn't always hold true, and where the details might change between Python versions or if you use a different implementation of Python altogether. None of that means, however, that it's unsafe for us to apply type narrowing after an if a.b is "a" condition. If the expression a.b is "a" resolves to True at runtime, a.b must indeed be an object with type "exactly str" that is equal to "a" -- which means that it is safe for us to narrow to Literal["a"] in that branch. (The reverse is not true: because of the fact you raise, that "a" is not a singleton, we cannot narrow the type to ~Literal["a"] in an else branch after the if a.b is "a" condition).
|
Closing this in favor of #24916 |
Summary
This commit adds type narrowing to the discriminated unions in the
narrowcrate, so that the type of the discriminated union is narrowed based on the value of the discriminator field. The motivating use case was type checking for the pydantic-ai library, which heavily depends on these unions, for example in the form:Test Plan
Created a new test.