New implementation of Env.make_copy_of_types (#8774)
Conversation
|
Could we discuss the secret goal a bit more? It's not completely clear to me why typeful ppxes (ppt?) would require getting rid of AST traversals. Is the idea that the extension points in question would be translated into a "temporary" parsetree that should type-check in a reasonable way (this is what e.g. ReLit does), but you don't want those to be traversed by extra computations with various expectations? (I'm surprised by the removal of |
I had to look at that myself in the past, and since it isn't exactly obvious, nor is it documented, let me try to give a rough explanation of what is happening. In the body of clauses which match on a GADT, we might introduce ambivalence, but we want to restrict the scope of such ambivalent types to the branch introducing the equation:

```ocaml
type (_, _) eq = Refl : ('a, 'a) eq

let cast : type a b. (a, b) eq -> a -> b =
  fun eq x ->
    match eq with Refl -> x
```

When we arrive in the body of the branch, we know that […]. However, and this is actually an implementation detail, what the typechecker actually knows in the body of the branch is that […]. The copy of the type of […] is not needed in principal mode, because in that mode we're much more careful about the amount of sharing that happens between types, which things are known principally and which aren't, etc. As for the actual implementation, I'll review it once there is a high-level agreement (with @gasche, and possibly others) about the acceptance of the change. |
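To make the scoping concrete, here is a small self-contained variant of the example above (the `to_int` function is mine, not from the thread): inside the `Refl` branch the equation between `a` and `int` is available, so `x` can be used at type `int`; outside the branch it cannot.

```ocaml
(* The equation [a = int] learned by matching on [Refl] is only in scope
   inside the branch, so [x + 1] type-checks there. *)
type (_, _) eq = Refl : ('a, 'a) eq

let to_int : type a. (a, int) eq -> a -> int =
  fun eq x ->
    match eq with
    | Refl -> x + 1  (* here [x : a] is known to be an [int] *)

let () = assert (to_int Refl 41 = 42)
```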
|
The secret goal could be to allow hooking a custom […]
Ast_iterator is more powerful. |
|
I suggest to keep the discussion focused on this PR, which, I think, has merits independently of the "secret goal". |
|
I'm a bit uncomfortable with the general idea of removing all traversals on the parsetree. If I understand the idea correctly, it is: the parsetree cannot be trusted because it may contain arbitrary nodes that do not respect a traditional parsetree structure (such as extension nodes, but you are leaving the door open to repurposing other constructs), and the only thing we can validly do on the parsetree is to type-check it and use the result. (Then we could in fact erase the types again and get the "real parsetree" of the program (or at least the real untyped AST), and it would be valid to manipulate that, but this is not really the plan.)

I find it very demanding of the compiler implementors (and other parsetree consumers) to basically say: this entire intermediate representation is now forbidden, you cannot make any assumptions on it before you do X. That's a very global constraint, and it feels limiting (and in contradiction with many things people may want to do, for example ppx authors). Would it be possible to consider alternative designs that do not make such invasive changes to the assumptions we are allowed to make in the compiler implementation?

For example, could there exist a design with "not-so-wrong AST proposals" that ensures that the parsetree obtained before type-checking is not quite right, but also robust enough for most operations to work correctly (for example: listing free variables)? (Again, this is what Relit does, if I remember correctly.) One option may be to look at the set of operations we perform on the parsetree today, and ask for AST approximations that support at least those. This already restricts what parsetree consumers can assume (to a limited set of observations), but it's still more modular than "you can assume nothing at all until you typecheck". I have the impression that having one specific use-case (or several specific use-cases) for the feature would help evaluate these potential choices.
If what we want to do is one of, for example: (1) inserting subtyping coercions more implicitly, (2) doing more type-directed disambiguation (of free variables for example), (3) generating runtime representations of static types, (4) parsing domain-specific syntaxes for literals, or (5) embedding code from another language with anti-quotations in the middle, then I don't see any deep problem with the "almost-parsetree" approach I mentioned above.

Remark: one thing that gives me pause is that the code in this PR (which may have merits independently of the overall feature) is in fact quite tricky and requires a subtle change to a delicate data structure; it does increase the complexity of the overall implementation. |
Again, I suggest to focus on what this PR actually does. The current version relies on computing an upper approximation of the free variables in a sub-expression. I find it intrinsically unsatisfying that the type-checker needs to rely on such an approximation. This excludes future extensions of the language where such approximations cannot be done purely syntactically -- for instance: (1) some kind of implicits, where expressions are synthesized from expected types (including unexpected references to variables in scope); (2) a crazy extension where you could write […]. To be clear: I'm not asking to set in stone that the type-checker doesn't do extra traversals of the Parsetree, just that we refrain from doing so for non-essential reasons. I think that after this PR and the much simpler #8775, there is currently no other such traversal (except the main type-checking pattern matching on the Parsetree), at least for expressions. |
|
@gasche I haven't reviewed the patch yet, but I would tend to side with @alainfrisch on this one, and I think you are overreacting quite a bit. From a higher-level perspective, having typechecking do only one conceptual walk of the syntax tree is a very nice property to have, and it helps with decoupling the syntax from the typing. Think in terms of typechecking by constraints: once we have extracted the information we need from the syntax, we only work on constraints, and do not need the syntax anymore. Of course, the typechecker doesn't work like that in practice (yet... we can dream), but ensuring the current implementation respects similar restrictions doesn't seem like a bad idea to me. Also, it doesn't preclude at all the idea of emitting a parsetree and typing it on the fly (which was used by MetaOCaml for quite a while). |
Quite the contrary actually; it's much easier to reason about emitting Parsetree fragments on the fly if they don't have to respect some implicit invariants (such as not introducing new free variables). |
So, if I understood everything correctly, a high level view of what's happening is this:
- before this PR, in situations where one of the branches is matching on a GADT constructor, we eagerly copy the type of each identifier that appears in the body of any of the branches.
- after this PR, in situations where one of the branches is matching on a GADT constructor, we lazily copy the type of all the identifiers in the environment.
Which sounds reasonable indeed.
As for the implementation: it all seems correct, but I'm not exactly convinced by `IdTbl`'s API. In this PR `IdTbl` gains a new `map : ('a -> 'a) -> 'a t -> 'a t` function. However, it is mapping neither eagerly nor lazily. Instead, the supplied function is applied at the end of every successful lookup, just before returning the value (the new `make_copy_of_types` implementation works around that by doing its own memoization).
That feels wrong to me.
Also, while I don't expect this PR to have a noticeable impact on typing speed (perhaps it will bring a small improvement), I'd be curious to see if there is any impact on memory usage.
Have you looked at that at all?
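For readers unfamiliar with the shape of the API under discussion, here is a hedged, self-contained sketch (the `Lazy_tbl` module is hypothetical, standing in for `IdTbl`) of a `map` that, as described above, neither rewrites the table eagerly nor memoizes: it merely composes the supplied function onto every future successful lookup.

```ocaml
(* Hypothetical sketch: [map f] does not touch stored values; [f] is
   applied at the end of each successful [find], just before returning. *)
module Lazy_tbl : sig
  type 'a t
  val empty : 'a t
  val add : string -> 'a -> 'a t -> 'a t
  val map : ('a -> 'a) -> 'a t -> 'a t
  val find : string -> 'a t -> 'a
end = struct
  module M = Map.Make (String)
  type 'a t = { tbl : 'a M.t; post : 'a -> 'a }
  let empty = { tbl = M.empty; post = (fun v -> v) }
  let add k v t = { t with tbl = M.add k v t.tbl }
  (* Compose [f] under the existing post-processing; nothing is copied now. *)
  let map f t = { t with post = (fun v -> t.post (f v)) }
  (* The mapping functions are (re-)applied on every lookup: no memoization. *)
  let find k t = t.post (M.find k t.tbl)
end

let t = Lazy_tbl.(empty |> add "x" 1 |> map (( + ) 10))
let () = assert (Lazy_tbl.find "x" t = 11)
```

Note that with this shape, two lookups of the same key apply the mapping function twice, which is why the `make_copy_of_types` caller would need its own memoization.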
|
In general it is a bit risky to delay operations on types to a later point in time because most of them depend on global state of some sort -- especially the current level. I think it would be good to include a comment in |
I thought about caching the result of the map function in the data structure itself, but since there are different lookup functions (`find_same`, `find_name`, `find_all`), and even a `remove` function, this is not completely straightforward, and it would be more complex than the current proposal.
I tried to exhibit an impact on performance with specially crafted synthetic code, but could only see a tiny gain, almost in the noise. Note that one could relatively easily avoid repeated copying of the types in case of nested pattern matching (changing the meaning of […]). I haven't investigated memory usage, but I don't see how this could degrade things significantly. What is your concern? |
It depends on unifications that could have happened (on non-generalized part of the types), though. |
|
That's a good point -- I am not sure that this patch is correct then. The values that get copied will depend on when the lazy is forced, which is almost certainly going to screw up principality. |
Well, I think I could imagine some code that would result in:
Which is clearly a worst case in terms of memory for the implementation you propose. And I was curious to see just how bad it can be, compared to the current implementation. So let's just forget about that. :) |
All this only happens in non-principal mode, so while this might change some unspecified behavior (I haven't been able to find such a case, though), I don't think this affects correctness. |
|
@trefis pointed out to me that this function is only used in non-principal mode, so it's possible we can get away with it. The only issue is probably if we might affect the ambiguity escaping stuff, but I think that might be fine since we mostly only care about ensuring that types from outside the scope of the equation get copied when used inside the scope of the equations -- doing additional copies of things probably doesn't matter. At this point I would appreciate @garrigue's thoughts on the subject. |
|
I've pushed a commit that prevents repeated copying: after a lookup, only the outermost copying is applied. With this change, the code produced by the script below compiles much faster (tested with ocamlc.opt): 1.35s against 6.6s on trunk (type-checking only: 0.6s against 5.8s).

```ocaml
let n1 = 2500
let n2 = 300
let n3 = 5

let () =
  let f () =
    for j = 1 to n2 do
      Printf.printf "method f%i = ()\n" j
    done
  in
  print_endline "type t = A : t";
  for i = 1 to n3 do
    Printf.printf "let a%i = object\n" i;
    f ();
    Printf.printf "end\n"
  done;
  print_endline "let f x =\n";
  for i = 1 to n1 do
    print_endline "match x with A -> ";
  done;
  for i = 1 to n3 do
    Printf.printf "a%i," i;
  done;
  print_endline "()"
```

Running the compiler with OCAMLRUNPARAM=v=1024: trunk: […] PR: […] |
|
From an API point of view, the last commit only makes things worse. Now we have a `map` function for which the following equation holds: […] Or am I misunderstanding something? |
|
Right, but now the function is properly documented. Perhaps it should be renamed |
|
(Note that "API" is a strong word: the function is not exposed outside of |
It's inside a module
Well, at this point, if you're going to rename it, I think you should rename it to […]. If you're going to assume that the only "mapper" that is ever used is |
This would mean turning […]. I've renamed Map to Copy, and removed the |
|
IIUC you're saying that you can't specialize […]. That is a fair point, but IMHO it just means that you shouldn't be baking your optimization (which makes very specific assumptions) into that generic
I'm not convinced. |
I guess having a […]. However, to put things in perspective: this optimization will only trigger when nesting matches on GADTs, which already isn't that common, and probably never goes further than 3 levels. As much as I like ad hoc micro benchmarks (really, I do), your example uses object types with 300 methods, which I guess could be realistic for things like Qt or Gtk, but it nests 2.5k matches. |
|
Significant speedups can be observed when nesting only 2 GADT matchings, and with smaller types, as below (type-checking goes from 2.1s on trunk, to 1.7s without the latest commits, to 1.5s with the latest version); of course, one needs many such nested matchings to see a difference. So I'm definitely ok to drop the latest commits.

```ocaml
let n2 = 20
let n4 = 10000

let () =
  let f () =
    for j = 1 to n2 do
      Printf.printf "method f%i = ()\n" j
    done
  in
  print_endline "type t = A : t";
  Printf.printf "let a = object\n";
  f ();
  Printf.printf "end\n";
  for j = 1 to n4 do
    print_endline "let f x = match x with A -> match x with A -> a\n";
  done
```
|
|
Sorry for the long delay. I have just seen this thread, and will look at the PR. |
|
Thanks @garrigue . As said above, I'm perfectly ok removing the last two commits, which have a low benefits/complexity ratio. |
|
I did not do a full review yet, but the idea behind this PR seems correct to me. Concerning the difference due to laziness, it can only allow more programs to be typed (i.e. assume more precise types), so this is backward compatible and still correct for the non-principal mode. The extra change I'm suggesting might theoretically break some code, but I would be surprised if it happened in practice.

As for the extra optimizations (copying only once, and the one I'm suggesting), I think they may really matter, because in order to allow disambiguation of GADT constructors, this duplication now happens inside all pattern-matchings that contain constructors (GADT or not). So as long as the optimization is local enough, I would say it is worth doing to avoid undue inefficiency.

(*) Here is an example of how the restriction to unqualified identifiers has an impact:

```ocaml
type _ t = Int : int t

module A = struct
  let r = ref []
  module M = struct let f () = List.hd !r end
  open M
  let () = r := [1]
  let f1 : type a. a t -> a = function Int -> if true then (1:a) else f()
  let f2 : type a. a t -> a = function Int -> if true then (1:a) else M.f()
end
```

Note how this (contrived) code causes an error on |
|
By the way, the behavior is not very stable: in the above, if you write |
|
Actually, this difference is already related to laziness: since the component table is only built when one accesses components (either directly or through |
That's also what I thought, but it doesn't seem to be the case (anymore), since the duplication is done only if one case contains a GADT constructor, as decided after type-checking patterns (on "half-typed cases").
Just to clarify:
|
|
I forgot that refinement... So it may not matter that much anymore, at least in code not using too many GADTs. As for the example, I wrote it to document the current behavior. If the cost does not matter, a more stable solution (one not affected by various caches) would be
If the cost still matters, it would be both more efficient and uniform to handle identifiers in opened modules as qualified ones, even if it's not 100% backward compatible (we're really talking about strange cases). My experience would tend to say that cost eventually matters. |
|
@garrigue : so what would be your recommendation concerning this PR? What about merging without the last two commits for now? |
|
I'm fine with this proposal. It should be fully backward compatible (only accepts slightly more programs). |
|
Ok, I've removed the optimizations for now. @garrigue : can you formally approve the PR (and possibly merge it)? |
garrigue left a comment
I believe this is backward compatible, and cleaner than copying types according to the syntax.
|
I merged it. But we should still think about how to make the behavior more regular in the future. |
For reasons I cannot claim I fully understand, the type-checker copies the type of value declarations corresponding to possibly free variables in pattern-matching branches. This happens only in non-principal mode and for cases whose pattern possibly contains GADT constructors.
The current implementation computes an upper bound of the free variables by traversing the parsetree. This is one of the few ad hoc traversals of the parsetree (in addition to the main type-checking traversal), and I'd like to remove them as much as possible (secret goal: being able to register type-checking hooks to allow e.g. expanding extension nodes knowing the actual typing environment).
So, this PR reimplements the copying of types in a different way. See the code for details. The type of an identifier is copied when that identifier is first accessed by the type-checker (instead of being copied before the branches are type-checked). I don't think that it should matter, but I'm not 100% sure.
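The copy-on-first-access scheme described above can be illustrated with a toy environment (all names here, `ty`, `binding`, `enter_gadt_branch`, are hypothetical; the real change lives in `Env`/`IdTbl`): each binding's type is wrapped in a lazy thunk when entering the branch, and the copy happens only if the identifier is actually looked up.

```ocaml
(* Toy model: a type, a deep-copy function, and an environment of
   bindings whose types are copied lazily on first access. *)
type ty = Tvar of string | Arrow of ty * ty

let rec copy_ty = function
  | Tvar v -> Tvar v
  | Arrow (a, b) -> Arrow (copy_ty a, copy_ty b)

type binding = { name : string; ty : ty Lazy.t }

let copies = ref 0  (* count how many types were actually copied *)

(* Entering a GADT branch: wrap every binding in a lazy copy instead of
   copying eagerly. *)
let enter_gadt_branch (env : binding list) : binding list =
  List.map
    (fun b -> { b with ty = lazy (incr copies; copy_ty (Lazy.force b.ty)) })
    env

let env' =
  enter_gadt_branch
    [ { name = "x"; ty = lazy (Tvar "a") };
      { name = "y"; ty = lazy (Arrow (Tvar "a", Tvar "b")) } ]

let () =
  assert (!copies = 0);  (* nothing copied at branch entry *)
  ignore (Lazy.force (List.find (fun b -> b.name = "x") env').ty);
  assert (!copies = 1)   (* only the accessed identifier's type is copied *)
```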