…ter (#23715)
I discovered this while looking into some unexpected ecosystem results
on #22578. It took a couple of
days of wrestling with pi to discover the issue and reduce it to the
following example:
```py
from typing import Any

def f(l: list[tuple[Any | str, Any | str]]) -> None:
    # was: dict[str | Any, str | Any]
    # should be: dict[Any | str, Any | str]
    reveal_type(dict(l))
```
We are trying to infer a specialization of `dict` when we pass in the
`list[tuple[...]]` type as an argument. This matches against the
constructor overload that takes in an `Iterator[tuple[_KT, _VT]]`. All
good so far.
As part of doing that, we create the constraint set `(Any | str ≤ _KT) ∧
(Any | str ≤ _VT)`, which has the following structure:
```
<0> (str ≤ _VT@dict) 4/4
┡━₁ <1> (Any ≤ _VT@dict) 3/3
│ ┡━₁ <2> (str ≤ _KT@dict) 2/2
│ │ ┡━₁ <3> (Any ≤ _KT@dict) 1/1
│ │ │ ┡━₁ always
│ │ │ └─₀ never
│ │ └─₀ never
│ └─₀ never
└─₀ never
```
A few interesting things have happened:
- Both typevars correctly have lower bound (contravariant) constraints:
function parameters are contravariant; `Iterator` is covariant; contra ×
co = contra
- The constraint set lines up with "source order": the type explicitly
mentioned in the code is `Any | str`, and `_KT` appears before `_VT` in
the definition of `dict`.
- We break apart lower bound unions: e.g. `Any | str ≤ _KT` becomes
`(Any ≤ _KT) ∧ (str ≤ _KT)`
- That gives us four "atomic" constraints, which we correctly create in
source order: `Any ≤ _KT`, `str ≤ _KT`, `Any ≤ _VT`, `str ≤ _VT`. (The
`1/1` etc in the graph rendering is the `source_order` that we have
assigned to each constraint.)
- Our BDDs use the _reverse_ of the constraint ID ordering, so the
constraint with the smallest ID (`Any ≤ _KT`) appears closest to the
leaf nodes. (This is intentional, for efficiency reasons.)
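To make the decomposition and numbering concrete, here is a small Python sketch. The `Constraint` class and `decompose` helper are illustrative stand-ins, not ty's actual data structures:

```python
# Hypothetical model of atomic lower-bound constraints; not ty's real types.
from dataclasses import dataclass

@dataclass(frozen=True)
class Constraint:
    lower: str          # lower bound, e.g. "Any" or "str"
    typevar: str        # e.g. "_KT@dict"
    source_order: int   # position relative to the underlying source

def decompose(union_members: list[str], typevar: str, start: int) -> list[Constraint]:
    """Break `A | B ≤ T` into `(A ≤ T) ∧ (B ≤ T)`, numbering in source order."""
    return [
        Constraint(member, typevar, start + i)
        for i, member in enumerate(union_members)
    ]

# `Any | str ≤ _KT` then `Any | str ≤ _VT`, numbered 1..4 in source order:
constraints = decompose(["Any", "str"], "_KT@dict", 1) + decompose(
    ["Any", "str"], "_VT@dict", 3
)

# The BDD orders variables in *reverse* of constraint ID, so the smallest
# ID (`Any ≤ _KT`) sits closest to the leaves:
bdd_order = sorted(constraints, key=lambda c: c.source_order, reverse=True)
print([f"{c.lower} ≤ {c.typevar} ({c.source_order})" for c in bdd_order])
```

Running this reproduces the top-to-bottom variable order shown in the graph rendering above.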
Everything described so far is correct and expected behavior.
Next we have to find a solution for this constraint set. To do that, we
find every path from the root to the `always` terminal, remembering all
of the constraints that we encounter along that path. In this case,
there is only one path:
```
str ≤ _VT (4)
Any ≤ _VT (3)
str ≤ _KT (2)
Any ≤ _KT (1)
```
Those constraints are still in reverse order! But we're tracking a
`source_order` for each constraint, and we [sort by
`source_order`](https://github.com/astral-sh/ruff/blob/149c5786ca6ae135431b86d1cd8ab74763b9a95a/crates/ty_python_semantic/src/types/constraints.rs#L3042-L3051)
before turning this list of constraints into a solution. That _should_
mean that we build up the unions as `Any | str`.
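As a sketch of that step (using made-up `(type, source_order)` tuples rather than ty's actual representation), sorting by `source_order` should restore the order written in the source:

```python
# Hypothetical (type, source_order) pairs for the lower bounds on _VT,
# in the reverse order that the BDD path traversal yields them:
vt_path = [("str", 4), ("Any", 3)]

# Sorting by source_order restores the order written in the source...
vt_path.sort(key=lambda constraint: constraint[1])

# ...so joining the lower bounds builds the union as written in the code:
union = " | ".join(ty for ty, _ in vt_path)
print(union)  # Any | str
```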
So why were we seeing `str | Any` instead? If you print out the
`source_order`s that we actually see when we get to the sort, we have:
```
str ≤ _VT (4)
Any ≤ _VT (4)
str ≤ _KT (2)
Any ≤ _KT (2)
```
The `Any` constraints have copied their `source_order`s from the
corresponding `str` constraints!
What has happened is that some purposeful behavior has engaged too
aggressively. When we look at a path that represents a solution, we want
to know _all_ of the constraints that are true for that path — not just
the ones that are explicitly mentioned in the BDD. We have a
`SequentMap` type that records the relationships between constraints.
Among other things, it records _implications_: "when X is true, Y is
true as well". As we build up potential paths, each time we add one of
the constraints from the BDD, we immediately check the sequent map to
see if we now know any other _derived_ facts to be true. If so, we add
them to the path. And importantly, we want those derived facts to appear
"near" their "origin" constraint — so we give the derived constraint the
same `source_order` as the BDD constraint we just added.
In this case, the first constraint we encounter in the BDD is `str ≤ _VT
(4)`. The sequent map tells us that `str ≤ _VT` implies `Any ≤ _VT`, so
we add `Any ≤ _VT` as a _derived_ constraint, which inherits
`source_order=4` from its origin constraint. Next we encounter `Any ≤
_VT` as an _origin_ constraint (since it appears in the BDD). But since
it's already in the path, we don't add it as a duplicate or update its
`source_order`.
The fix is straightforward: origin `source_order`s should take
precedence over inherited derived `source_order`s.
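A minimal Python sketch of both the bug and the fix (the function, the `origin_wins` flag, and the dictionary representation are all illustrative; the real logic lives in `constraints.rs`):

```python
# Hypothetical path builder. `implications` maps a constraint to the
# constraints it entails (e.g. "str ≤ _VT" implies "Any ≤ _VT").
def build_path(bdd_constraints, implications, origin_wins):
    source_orders: dict[str, int] = {}
    for constraint, order in bdd_constraints:
        if constraint not in source_orders or origin_wins:
            # Origin constraints carry their own source_order. With the
            # fix (origin_wins=True), an origin constraint overwrites any
            # source_order that a derived copy inherited earlier.
            source_orders[constraint] = order
        # Derived facts inherit the source_order of the origin constraint
        # that introduced them, but never overwrite an existing entry.
        for derived in implications.get(constraint, []):
            source_orders.setdefault(derived, order)
    return source_orders

implications = {"str ≤ _VT": ["Any ≤ _VT"], "str ≤ _KT": ["Any ≤ _KT"]}
bdd = [("str ≤ _VT", 4), ("Any ≤ _VT", 3), ("str ≤ _KT", 2), ("Any ≤ _KT", 1)]

print(build_path(bdd, implications, origin_wins=False))
# buggy: the Any constraints keep the inherited orders 4 and 2
print(build_path(bdd, implications, origin_wins=True))
# fixed: the Any constraints recover their own orders 3 and 1
```

With `origin_wins=False` the `Any` constraints end up with `source_order`s 4 and 2, reproducing the buggy output above; with `origin_wins=True` they recover 3 and 1, so the sort rebuilds the unions as `Any | str`.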
As of #23538, we no longer use salsa to intern the nodes of a constraint set BDD. They are still interned, but manually, within a specific `ConstraintSetBuilder`. Those builders are constructed per type comparison. That means we no longer need to store a separate `source_order` in our BDD nodes. That field was an attempt to have something more robust than salsa IDs, ordering constraints based on their relative position in the underlying Python source. That ordering is exactly what `ConstraintSetBuilder` now gives us.

This also lets us revert our short-lived experiment with quasi-reduced BDDs. We introduced quasi-reduction in #21744 in an attempt to remove some non-determinism in our tests. That non-determinism came about because a constraint set BDD might contain different sets of individual constraints depending on the (arbitrary) BDD variable ordering in force for a particular run of ty. Quasi-reduction ensured that every constraint explicitly mentioned when constructing a constraint set actually appeared in the BDD structure, giving us a deterministic set of constraints to iterate over.