Expand aliases if necessary during immediacy computation #10017
stedolan wants to merge 1 commit into ocaml:trunk
Conversation
---
Hm. In general I'm not too excited about the idea that we don't compute correct signatures for our types and modules, and that we instead compute their static properties at the use-site rather than fixing the problem. In theory I would prefer fixing the problem (and having a correct eager computation). In practice I understand that it may be difficult to do so (our recent stint with …).

Now, let's consider how such an on-demand check would be best implemented. I must say that I am not a priori convinced by your approach. What your PR does is to first perform the current "eager" computation (using modularly-computed signatures), admitting that it is incomplete, and then at the use-site perform a "lazy" computation if the eager result is not good enough for the use-site's needs. I must say that I find this a bit complex/redundant. Here is an alternative proposal: …

I believe that the end result would be the same as with your system. More precisely: …
---
Regarding eager vs. lazy: I think static properties are often computed lazily in the typechecker, and this mechanism of expanding abbreviations to avert an error is fairly widespread. For instance, consider checking well-formedness of the type ….

In fact, I suspect that in the cases where we do precompute static properties, there are likely to be bugs such as this one lurking. For instance, I just noticed this issue with variance computation (I'm not sure whether this is a known issue): …. Here, I suspect that correctly computing static properties eagerly would require some sort of cache-invalidation mechanism for use during substitutions and the like, which sounds complicated.

Regarding the alternative proposal: I find this one more complicated, not less! In my version, type constructor immediacies are sound over-approximations of the actual immediacy, used to avoid extra expansions. In yours, you're additionally caching negative information (…).

Also, I don't think this proposal works well with more than two possibilities. There are currently three, so this case goes wrong: …. Here, your algorithm says that nothing is ….

In my algorithm, the immediacy is only ever an over-approximation, and substitutions / refinements / etc. can only ever lower immediacies, in the order: …. So, my algorithm checks ….
---
(Thinking out loud.) Well, obviously ….
---
Right, that would definitely work. We could annotate types with an interval …. When we need to check an immediacy constraint ….

My proposal here can be viewed as a special case of this algorithm, where the lower bound …. The lack of case (2) means that if …. However, this extra expansion can only occur right when an immediacy error is about to be produced, so I'm not sure what we win by avoiding it.

Another special case of the general algorithm (which I think is closer to what you have in mind) is that we could restrict intervals to always be of the form …. This would be less complicated than the full interval approach, but would still avoid some expansions-before-errors. It's still more complicated than the upper-bounds approach, though.
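The interval idea being discussed here could be sketched roughly as follows. This is a hypothetical illustration, not the PR's API: the `interval` type, `verdict` type, and function names are invented for exposition; only the three immediacy constructors follow the compiler's `Type_immediacy`.

```ocaml
(* Hypothetical sketch of the interval idea, not the PR's API.
   Following the thread: substitutions can only move a type downward in the
   poset, so Always (immediate) is the bottom:
   Always < Always_on_64bits < Unknown. *)

type immediacy = Unknown | Always_on_64bits | Always

let rank = function Always -> 0 | Always_on_64bits -> 1 | Unknown -> 2
let imm_le a b = rank a <= rank b

(* An interval [lo, hi]: the type's true immediacy, after any future
   substitutions, is known to lie between lo and hi. *)
type interval = { lo : immediacy; hi : immediacy }

type verdict = Holds | Fails | Must_expand

(* Checking a constraint [c] against an interval:
   - if even the least precise possibility (hi) satisfies c, it holds;
   - if even the most precise possibility (lo) fails c, it fails;
   - otherwise only expanding the alias can decide. *)
let check_against c { lo; hi } =
  if imm_le hi c then Holds
  else if not (imm_le lo c) then Fails
  else Must_expand
```

With this representation, a check short-circuits in both directions: a wide interval like `[Always, Unknown]` forces expansion, while a tight one decides immediately.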
---
The algorithm I had in mind has no clever "check" mode (checking is just inferring the immediacy and verifying that it is a subset of the expected immediacy), and "inference" works like this:
---
@gasche I find your suggested approach much harder to think about than Stephen's, and it seems to result in doing more expansions too. Could you expand on why you think it is better?
---
There are two things that I don't like with this approach: ….

My proposal comes from studying Stephen's approach to understand when the current scenario falls short. It is not perfect (it could be made more precise by using different, yet more precise forms of immediacy), but it avoids the criticisms above and I think it would work as well as Stephen's approach in practice. How it avoids the criticisms above: ….

Note: ….

I think that ….

Currently the two proposals are incomparable: having …. I think that those expansions have very little cost in practice; if there was a consensus on the idea that the approach is simpler, we could accept the extra expansions as a cost of simplicity.

We could also consider adding a checking mode to my proposal, to cut early in exactly this scenario. This would then be mostly equivalent to Stephen's implementation, with the difference of my criticism (2): we have a clear specification of when we should expand, and the checking mode can sometimes optimize by skipping expansions if the result is precise enough (instead of having the expand-or-not decision be specified by failures, and only part of the checking specification).

Finally, there could be other points in the design space to consider. For example: if we consider that the immediacy information of type constructors is fundamentally incorrect, maybe we should get rid of it? (Only keep it on abstract types that have no definition.) Then we would be back to a single check with no approximation (so my criticism (1) is solved), and we would expand all the time (so (2) is solved). We would have to discuss useless recomputations and possibly consider caching within type expressions (possibly only the Tconstr ones), but maybe those wouldn't be issues in practice and/or the solutions would lead to simple systems as well.
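The inference-with-caching scheme under discussion could be sketched roughly like this. Everything here is an illustrative stand-in, not compiler code: the `cached` record, its field names, and `infer` are invented to show the shape of the idea of tracking whether a precomputed immediacy is "stable" under future substitutions.

```ocaml
(* Hypothetical sketch: inference that caches a precomputed immediacy
   together with a stability flag.  All names are illustrative. *)

type immediacy = Unknown | Always_on_64bits | Always

type cached = {
  imm : immediacy;  (* the precomputed immediacy *)
  stable : bool;    (* true if no substitution can refine it further *)
}

(* Consulting the cache: a stable result is trusted as-is; an unstable one
   forces re-inference by expanding the alias and recomputing. *)
let infer ~(expand_and_infer : unit -> immediacy) (c : cached) : immediacy =
  if c.stable then c.imm else expand_and_infer ()
```

The cost trade-off debated in this thread shows up directly: every unstable entry pays for an expansion on every consultation, even when the caller would have been satisfied with a less precise answer.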
I don't think this is an accurate comparison at all. There is only a single way to compute immediacy, but you can stop computation (and crucially expansion) early when you have enough information to make the decision you are trying to make. As far as I can tell, your proposal is just "always assume that the check you are performing requires the most precise possible information". This means that you never take into account why you are checking immediacy, which seems to be the property you are aiming for. I can't see why you would want it as a property though, and it obviously requires doing more computation/expansion.
---
The PR documents the following new interface:

```ocaml
val approx_immediacy : Env.t -> type_expr -> Type_immediacy.t
(* Compute an over-approximation of the immediacy of a type *)

val check_immediacy : Env.t -> as_:Type_immediacy.t -> type_expr ->
  (unit, Type_immediacy.Violation.t) result
(* Verify that a type has a specified immediacy *)
```

I ask: what is the immediacy of a type? How does it relate to the proposed implementation? What I like about my solution is that the code answers this question clearly, whereas the current proposal (currently) does not, as far as I can tell. (The best definition of immediacy I can give from this API is to consider calls to ….)

In order to have a simple answer, I propose … (a better description of immediacies, including a ….)
---
Thanks, that makes things clearer. Judging from that API, I think that the current PR isn't quite doing what I would expect (or, if it is, then the API should change). I would expect to have an ….

The point where I disagree with you is on the removal of ….

I also much prefer the view of ranges with an upper and lower bound, rather than thinking in terms of sets. In particular, I think that view will be more natural when this same mechanism is used for layouts in unboxed types.
I'm not opposed to having a clever ….

This all depends on how we represent the static information on concrete types. If it is given as a partially ordered set, then the natural notion of its abstractions is convex subsets (if …).
---
@gasche I think perhaps I have not gotten across what's going on in this algorithm. In particular, this paragraph of yours describing a possible design:

> …

reads to me like a reasonable description of what's already in this PR, so something's definitely been misunderstood. I'll have a go at explaining it properly.

**The problem**

Types can be classified by "immediacy", which is an element of the following poset: …. These correspond to annotations ….

There is a monotonicity / subtyping property here: a type of immediacy i is also a type of immediacy j, whenever i <= j.

The problem that the compiler must solve is to verify immediacy annotations on types. There is currently an eager / inference-based algorithm, which verifies annotations by (a) computing the least immediacy for each type constructor and (b) comparing this with an annotation, if present. This algorithm is incorrect: the immediacy of a type alias can be refined by substitutions and/or may depend on type parameters (for instance, an abstract un-annotated type with …).

Note that substitutions can only move a type downward in the immediacy poset: an abstract ….

**Checking-based algorithm**

What I'm proposing here is to switch from an inference-based to a checking-based algorithm. The fundamental operation is to check that a type has a given immediacy, rather than to infer the immediacy of a type from scratch.

The simple version of this algorithm verifies an immediacy constraint on a type by first expanding the type fully (that is, until the head of the type is not an alias), and then checking the immediacy against the expanded type (or against its annotation, if abstract).

An optimisation is possible here, and is implemented in the present PR. Sometimes, a type alias will always expand to an immediate type, regardless of parameters or substitutions. In such cases, we can speed up checking by avoiding some expansions. So, this patch associates with each type constructor an upper bound of its possible immediacies, allowing the check to be short-circuited sometimes. There is no backtracking here, just a short-circuit optimisation for some easy cases not requiring expansion.

The optimisation is the same one as is currently present in unification (and elsewhere): the semantics of unification are as though type aliases are always expanded, but in some easy and common cases (e.g. unifying a type alias with no parameters with itself) it is possible to determine the result of unification without needing to expand.

(I think there is some confusion here because I introduced the checking-based algorithm and the optimisation simultaneously, perhaps making it seem like the algorithm depended on both checking and inference, rather than checking plus an optimisation to make checking faster in common cases. If it helps, I can prepare a version of this patch without the optimisation, which will only have `Ctype.check_immediacy`.)

**Corrected inference-based algorithm**

The algorithm that you're proposing instead is to stick with inference, but fix the inference-based algorithm by tracking in its result whether or not the result is "stable" (that is, whether it can be refined by future environment substitutions). If it is not stable, then future inference computations will expand aliases rather than trusting the possibly-stale precomputed result.

I don't like this approach because I think it is both more complicated and slower than the above. The complexity is because it requires keeping track of some notion of immediacy which is more detailed than what users can write in source code, involving a stability flag, intervals, or convex subsets. The slowness is because of unnecessary expansion. For instance, consider an environment which contains a chain of type aliases ending in an abstract type:

```ocaml
type a
type b = a
type c = b
...
```

This signature contains no immediacy annotations, but IIUC your proposal will spend O(n^2) time computing immediacies here, because each alias has ….
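The quadratic-cost claim about the alias chain can be made concrete with a small cost model. This is a hypothetical illustration written for this discussion, not compiler code: it just counts expansion steps under the two schemes being compared.

```ocaml
(* Illustrative cost model for a signature containing an alias chain
   a <- b <- c <- ... of length n, ending in an abstract type.
   If inference eagerly expands every alias to its head, alias i costs i
   expansion steps, so the whole signature costs 0 + 1 + ... + (n-1). *)

let eager_cost n =
  let total = ref 0 in
  for i = 0 to n - 1 do
    total := !total + i          (* alias i expands through i links *)
  done;
  !total

(* A checking-based scheme does no expansion at all for this signature,
   since there are no immediacy annotations to verify. *)
let checking_cost _n = 0
```

So for a chain of length n the eager scheme pays n(n-1)/2 expansion steps, versus none for the checking-based one, matching the O(n^2)-vs-zero comparison made above.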
Well, if you insist on keeping exactly the current language of immediacies (I think it was a mistake in the original implementation to not include …).
Yes indeed, but (1) is it a problem in practice? and (2) similar examples exist with your approach as well, they only look even more theoretical (take this signature, add the equality …).

I think that we may now have spent enough time discussing the philosophy of immediacy together. It's a lot of fun and I shall be happy to do it again (variance, separability, representational kinds, or what have you), but in this instance we may be running out of productive comments that I can make to you, the Person Doing the Actual Work. I gave a more concise description of my API-level criticism in this comment; it might suggest improvements to your proposed API. Whether it does or not, if you prefer to keep the current approach, I think that the productive next step for me would be to review the code.
Right. Checking immediacy is worst-case O(n), so without fancy caching I imagine that checking n immediacy annotations is worst-case O(n^2). The thing I am (mildly!) concerned about is spending O(n^2) time computing immediacy in a program with 0 immediacy annotations.
Maybe hold off on detailed review for a bit? I've been thinking about an approach that doesn't approximate or cache anything, and I'll try to have a PR ready next week in that style.

I think that the contentious part here has centered on the …. I proposed dealing with this by only ever treating the …. I now think that the better approach is not to have a ….
---
See #10041 |
---
This is subsumed by #11841, closing. |
The following uses of `[@@immediate]` currently fail: …

In each of these three cases, the type in question is in fact equal to `int`, so I think this error is wrong. The issue is the same as in #10005: sometimes, we need to expand aliases to see that a type is immediate.

The fix here is to switch immediacy checking to the algorithm proposed in the unboxed types prototype (see RFC #10). The `Ctype.immediacy` function is split in two:

- `Ctype.approx_immediacy` computes a sound over-approximation of the immediacy of a type. (This is only an approximation because it's possible that environment substitutions or type parameters or expansions may cause a type's immediacy to be refined.)
- `Ctype.check_immediacy` verifies that a type has a given immediacy.

`Ctype.approx_immediacy` is exactly the same as the current `Ctype.immediacy`. `Ctype.check_immediacy` differs in the `Tconstr` case: if a path's `type_immediate` field doesn't match the given immediacy, then rather than triggering an error immediately, the type is expanded.

In other words, for type aliases the `type_immediate` field becomes merely an upper bound on the immediacy, which is consulted first during immediacy checking as an optimisation. If this optimised check fails, then the full check is done by expanding the alias.

Note that this does not increase the amount of expansion done in programs that currently work: the extra expansion occurs only in cases where the current behaviour would fail.

TODO: there's a case I haven't figured out yet about `Typedecl_unboxed.Only_on_64_bits`.
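The three failing examples are not reproduced above. An illustrative case of the same general shape (following the substitution pattern of #10005 — the module and type names here are hypothetical, and the exact pre-fix behaviour should be checked against the real examples in the PR) would be:

```ocaml
module type S = sig type t end

(* After the [with] constraint, M.t is equal to int, but the immediacy of
   [t] was precomputed while it was abstract, so the eager check does not
   see that M.t is immediate without expanding the alias. *)
module M : S with type t = int = struct type t = int end

type u = M.t [@@immediate]  (* rejected by the eager check, though u = int *)
```

The fix described above handles this by treating the precomputed immediacy as an upper bound and expanding `M.t` before reporting an error.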