New implementation of Env.make_copy_of_types (#8774)
Conversation
|
Could we discuss the secret goal a bit more? It's not completely clear to me why typeful ppxes (ppt?) would require getting rid of AST traversals. Is the idea that the extension points in question would be translated into a "temporary" parsetree that should type-check in a reasonable way (this is what e.g. ReLit does), but you don't want those to be traversed by extra computations with various expectations? (I'm surprised by the removal of |
I had to look at that myself in the past, and since it isn't exactly obvious, nor is it documented, let me try to give a rough explanation of what is happening. In the body of clauses which match on a GADT, we might introduce ambivalence, but we want to restrict the scope of such ambivalent types to the branch introducing the equation:

```ocaml
type (_, _) eq = Refl : ('a, 'a) eq

let cast : type a b. (a, b) eq -> a -> b =
  fun eq x ->
    match eq with Refl -> x
```

When we arrive in the body of the branch, we know that […]. However, and this is actually an implementation detail, what the typechecker actually knows in the body of the branch is that […]. The copy of the type of […] is not needed in principal mode, because in that mode we're much more careful about the amount of sharing that happens between types, which things are known principally and which aren't, etc. As for the actual implementation, I'll review it once there is a high-level agreement (with @gasche, and possibly others) about the acceptance of the change. |
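To make the scoping concrete, here is a small self-contained variant of the example above (the `to_int` function is mine, not from the thread): inside the `Refl` branch the equation between `a` and `int` is available, so `x` can be used at type `int`; outside the branch it cannot.

```ocaml
(* The equation [a = int] learned by matching on [Refl] is only in scope
   inside the branch, so [x + 1] type-checks there. *)
type (_, _) eq = Refl : ('a, 'a) eq

let to_int : type a. (a, int) eq -> a -> int =
  fun eq x ->
    match eq with
    | Refl -> x + 1  (* here [x : a] is known to be an [int] *)

let () = assert (to_int Refl 41 = 42)
```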
|
The secret goal could be to allow hooking a custom […]
Ast_iterator is more powerful. |
|
I suggest to keep the discussion focused on this PR, which, I think, has merits independently of the "secret goal". |
|
I'm a bit uncomfortable with the general idea of removing all traversals on the parsetree. If I understand the idea correctly, it is: the parsetree cannot be trusted because it may contain arbitrary nodes that do not respect a traditional parsetree structure (such as extension nodes, but you are leaving the door open to repurposing other constructs), and the only thing we can validly do on the parsetree is to type-check it and use the result. (Then we could in fact erase the types again and get the "real parsetree" of the program (or at least the real untyped AST), and it would be valid to manipulate that, but this is not really the plan.)

I find it very demanding of the compiler implementors (and other parsetree consumers) to basically say: this entire intermediate representation is now forbidden, you cannot make any assumptions on it before you do X. That's a very global constraint, and it feels limiting (and in contradiction with many things people may want to do, for example ppx authors). Would it be possible to consider alternative designs that do not make such invasive changes to the assumptions we are allowed to make in the compiler implementation?

For example, could there exist a design with "not-so-wrong AST proposals" that ensures that the parsetree obtained before type-checking is not quite right, but also robust enough for most operations to work correctly (for example: listing free variables)? (Again, this is what Relit does, if I remember correctly.) One option may be to look at the set of operations we perform on the parsetree today, and ask for AST approximations that support at least those. This already restricts what parsetree consumers can assume (to a limited set of observations), but it's still more modular than "you can assume nothing at all until you typecheck". I have the impression that having one specific use-case (or several specific use-cases) for the feature would help evaluate these potential choices.
If what we want to do is one of, for example: (1) inserting subtyping coercions more implicitly, (2) doing more type-directed disambiguation (of free variables for example), (3) generating runtime representations of static types, (4) parsing domain-specific syntaxes for literals, or (5) embedding code from another language with anti-quotations in the middle, then I don't see any deep problem with the "almost-parsetree" approach I mentioned above.

Remark: one thing that gives me pause is that the code in this PR (which may have merits independently of the overall feature) is in fact quite tricky and requires a subtle change to a delicate data structure; it does increase the complexity of the overall implementation. |
Again, I suggest to focus on what this PR actually does. The current version relies on computing an upper approximation of the free variables in a sub-expression. I find it intrinsically unsatisfying that the type-checker needs to rely on such an approximation. This excludes future extensions of the language where such approximations cannot be done purely syntactically -- for instance: (1) some kind of implicits, where expressions are synthesized from expected types (including unexpected references to variables in scope); (2) a crazy extension where you could write […]. To be clear: I'm not asking to set in stone that the type-checker doesn't do extra traversals of the Parsetree, just that we refrain from doing so for non-essential reasons. I think that after this PR and the much simpler #8775, there is currently no other such traversal (except the main type-checking pattern matching on the Parsetree), at least for expressions. |
|
@gasche I haven't reviewed the patch yet, but I would tend to side with @alainfrisch on this one, and I think you are overreacting quite a bit. From a higher-level perspective, having typechecking do only one conceptual walk of the syntax tree is a very nice property to have, and it helps with decoupling the syntax from the typing. Think in terms of typechecking by constraints: once we have extracted the information we need from the syntax, we only work on constraints, and do not need the syntax anymore. Of course, the typechecker doesn't work like that in practice (yet... we can dream), but ensuring the current implementation respects similar restrictions doesn't seem like a bad idea to me. Also, it doesn't preclude at all the idea of emitting a parsetree and typing it on the fly (which was used by MetaOCaml for quite a while). |
Quite the contrary actually; it's much easier to reason about emitting Parsetree fragments on the fly if they don't have to respect some implicit invariants (such as not introducing new free variables). |
So, if I understood everything correctly, a high level view of what's happening is this:
- before this PR, in situations where one of the branches is matching on a GADT constructor, we eagerly copy the type of each identifier that appears in the body of any of the branches.
- after this PR, in situations where one of the branches is matching on a GADT constructor, we lazily copy the type of all the identifiers in the environment.
Which sounds reasonable indeed.
As for the implementation: it all seems correct, but I'm not exactly convinced by `IdTbl`'s API. In this PR `IdTbl` gains a new `map : ('a -> 'a) -> 'a t -> 'a t` function. However, it is mapping neither eagerly nor lazily. Instead, the supplied function is applied at the end of every successful lookup, just before returning the value (the new `make_copy_of_types` implementation works around that by doing its own memoization).
That feels wrong to me.
Also, while I don't expect this PR to have a noticeable impact on typing speed (perhaps it will bring a small improvement), I'd be curious to see if there is any impact on memory usage.
Have you looked at that at all?
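For readers unfamiliar with the shape of the API under discussion, here is a hedged, self-contained sketch (the `Lazy_tbl` module is hypothetical, standing in for `IdTbl`) of a `map` that, as described above, neither rewrites the table eagerly nor memoizes: it merely composes the supplied function onto every future successful lookup.

```ocaml
(* Hypothetical sketch: [map f] does not touch stored values; [f] is
   applied at the end of each successful [find], just before returning. *)
module Lazy_tbl : sig
  type 'a t
  val empty : 'a t
  val add : string -> 'a -> 'a t -> 'a t
  val map : ('a -> 'a) -> 'a t -> 'a t
  val find : string -> 'a t -> 'a
end = struct
  module M = Map.Make (String)
  type 'a t = { tbl : 'a M.t; post : 'a -> 'a }
  let empty = { tbl = M.empty; post = (fun v -> v) }
  let add k v t = { t with tbl = M.add k v t.tbl }
  (* Compose [f] under the existing post-processing; nothing is copied now. *)
  let map f t = { t with post = (fun v -> t.post (f v)) }
  (* The mapping functions are (re-)applied on every lookup: no memoization. *)
  let find k t = t.post (M.find k t.tbl)
end

let t = Lazy_tbl.(empty |> add "x" 1 |> map (( + ) 10))
let () = assert (Lazy_tbl.find "x" t = 11)
```

Note that with this shape, two lookups of the same key apply the mapping function twice, which is why the `make_copy_of_types` caller would need its own memoization.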
|
In general it is a bit risky to delay operations on types to a later point in time because most of them depend on global state of some sort -- especially the current level. I think it would be good to include a comment in |
I thought about caching the result of the map function in the data structure itself, but since there are different lookup functions (`find_same`, `find_name`, `find_all`), and even a `remove` function, this is not completely straightforward, and it would be more complex than the current proposal.
I tried to exhibit an impact on performance with specially crafted synthetic code, but could only see a tiny gain, almost in the noise. Note that one could relatively easily avoid repeated copying of the types in case of nested pattern matching (changing the meaning of […]). I haven't investigated memory usage, but I don't see how this could degrade things significantly. What is your concern? |
It depends on unifications that could have happened (on non-generalized part of the types), though. |
|
That's a good point -- I am not sure that this patch is correct then. The values that get copied will depend on when the lazy is forced, which is almost certainly going to screw up principality. |
Well, I think I could imagine some code that would result in:
Which is clearly a worst case in terms of memory for the implementation you propose. And I was curious to see just how bad it can be, compared to the current implementation. So let's just forget about that. :) |
All this only happens in non-principal mode, so while this might change some unspecified behavior (I haven't been able to find such a case, though), I don't think this affects correctness. |
|
@trefis pointed out to me that this function is only used in non-principal mode, so it's possible we can get away with it. The only issue is probably if we might affect the ambiguity escaping stuff, but I think that might be fine since we mostly only care about ensuring that types from outside the scope of the equation get copied when used inside the scope of the equations -- doing additional copies of things probably doesn't matter. At this point I would appreciate @garrigue's thoughts on the subject. |
|
I've pushed a commit that prevents repeated copying: after a lookup, only the outermost copying is applied. With this change, the code produced by the script below compiles much faster (tested with ocamlc.opt): 1.35s against 6.6s on trunk (type-checking only: 0.6s against 5.8s).

```ocaml
let n1 = 2500
let n2 = 300
let n3 = 5

let () =
  let f () =
    for j = 1 to n2 do
      Printf.printf "method f%i = ()\n" j
    done
  in
  print_endline "type t = A : t";
  for i = 1 to n3 do
    Printf.printf "let a%i = object\n" i;
    f ();
    Printf.printf "end\n"
  done;
  print_endline "let f x =\n";
  for i = 1 to n1 do
    print_endline "match x with A -> ";
  done;
  for i = 1 to n3 do
    Printf.printf "a%i," i;
  done;
  print_endline "()"
```

Running the compiler with OCAMLRUNPARAM=v=1024: trunk: […] PR: […] |
|
From an API point of view, the last commit only makes things worse. Now we have a `map` function for which the following equation holds: […] Or am I misunderstanding something? |
|
Right, but now the function is properly documented. Perhaps it should be renamed |
|
(Note that "API" is a strong word: the function is not exposed outside of |
It's inside a module
Well, at this point, if you're going to rename it, I think you should rename it to […]. If you're going to assume that the only "mapper" that is ever used is |
This would mean turning […]. I've renamed Map to Copy, and removed the |
|
IIUC you're saying that you can't specialize […]. That is a fair point, but IMHO it just means that you shouldn't be baking your optimization (which makes very specific assumptions) into that generic
I'm not convinced. |
I guess having a […]. However, to put things in perspective: this optimization will only trigger when nesting matches on GADTs, which already isn't that common, and probably never goes further than 3 levels. As much as I like ad hoc micro benchmarks (really, I do), your example uses object types with 300 methods, which I guess could be realistic for things like Qt or Gtk, but it nests 2.5k matches. |
|
Significant speedups can be observed when nesting only 2 GADT matchings, and with smaller types, as below (type-checking goes from 2.1s on trunk, to 1.7s without the latest commits, to 1.5s with the latest version); of course, one needs many such nested matchings to see a difference. So I'm definitely ok to drop the latest commits.

```ocaml
let n2 = 20
let n4 = 10000

let () =
  let f () =
    for j = 1 to n2 do
      Printf.printf "method f%i = ()\n" j
    done
  in
  print_endline "type t = A : t";
  Printf.printf "let a = object\n";
  f ();
  Printf.printf "end\n";
  for j = 1 to n4 do
    print_endline "let f x = match x with A -> match x with A -> a\n";
  done
```
|
|
Sorry for the long delay. I have just seen this thread, and will look at the PR. |
|
Thanks @garrigue . As said above, I'm perfectly ok removing the last two commits, which have a low benefits/complexity ratio. |
|
I did not do a full review yet, but the idea behind this PR seems correct to me. Concerning the difference due to laziness, it can only allow more programs to be typed (i.e. assume more precise types), so this is backward compatible and still correct for the non-principal mode. The extra change I'm suggesting might theoretically break some code, but I would be surprised if it happened in practice.

As for the extra optimizations (copying only once, and the one I'm suggesting), I think they may really matter, because in order to allow disambiguation of GADT constructors, this duplication now happens inside all pattern-matchings that contain constructors (GADT or not). So as long as the optimization is local enough, I would say it is worth doing to avoid undue inefficiency.

(*) Here is an example of how the restriction to unqualified identifiers has an impact:

```ocaml
type _ t = Int : int t

module A = struct
  let r = ref []
  module M = struct let f () = List.hd !r end
  open M
  let () = r := [1]
  let f1 : type a. a t -> a = function Int -> if true then (1:a) else f()
  let f2 : type a. a t -> a = function Int -> if true then (1:a) else M.f()
end
```

Note how this (contrived) code causes an error on |
|
By the way, the behavior is not very stable: in the above, if you write |
|
Actually, this difference is already related to laziness: since the component table is only built when one accesses components (either directly or through |
That's also what I thought, but it doesn't seem to be the case (anymore), since the duplication is done only if one case contains a GADT constructor, as decided after type-checking patterns (on "half-typed cases").
Just to clarify:
|
|
I forgot that refinement... So it may not matter that much anymore, at least in code not using too many GADTs. As for the example, I wrote it to document the current behavior. If the cost does not matter, a more stable solution (one not affected by various caches) would be
If the cost still matters, it would be both more efficient and uniform to handle identifiers in opened modules as qualified ones, even if it's not 100% backward compatible (we're really talking about strange cases). My experience would tend to say that cost eventually matters. |
|
@garrigue : so what would be your recommendation concerning this PR? What about merging without the last two commits for now? |
|
I'm fine with this proposal. It should be fully backward compatible (only accepts slightly more programs). |
|
Ok, I've removed the optimizations for now. @garrigue : can you formally approve the PR (and possibly merge it)? |
garrigue left a comment
I believe this is backward compatible, and cleaner than copying types according to the syntax.
|
I merged it. But we should still think about how to make the behavior more regular in the future. |
For reasons I cannot claim I fully understand, the type-checker copies the type of value declarations corresponding to possibly free variables in pattern-matching branches. This happens only in non-principal mode and for cases whose pattern possibly contains GADT constructors.
The current implementation computes an upper bound of the free variables by traversing the parsetree. This is one of the few ad hoc traversals of the parsetree (in addition to the main type-checking traversal), and I'd like to remove them as much as possible (secret goal: being able to register type-checking hooks to allow e.g. expanding extension nodes knowing the actual typing environment).
So, this PR reimplements the copying of types in a different way. See the code for details. The type of an identifier is copied when that identifier is first accessed by the type-checker (instead of being copied before the branches are type-checked). I don't think that it should matter, but I'm not 100% sure.
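The copy-on-first-access scheme described above can be illustrated with a toy environment (all names here, `ty`, `binding`, `enter_gadt_branch`, are hypothetical; the real change lives in `Env`/`IdTbl`): each binding's type is wrapped in a lazy thunk when entering the branch, and the copy happens only if the identifier is actually looked up.

```ocaml
(* Toy model: a type, a deep-copy function, and an environment of
   bindings whose types are copied lazily on first access. *)
type ty = Tvar of string | Arrow of ty * ty

let rec copy_ty = function
  | Tvar v -> Tvar v
  | Arrow (a, b) -> Arrow (copy_ty a, copy_ty b)

type binding = { name : string; ty : ty Lazy.t }

let copies = ref 0  (* count how many types were actually copied *)

(* Entering a GADT branch: wrap every binding in a lazy copy instead of
   copying eagerly. *)
let enter_gadt_branch (env : binding list) : binding list =
  List.map
    (fun b -> { b with ty = lazy (incr copies; copy_ty (Lazy.force b.ty)) })
    env

let env' =
  enter_gadt_branch
    [ { name = "x"; ty = lazy (Tvar "a") };
      { name = "y"; ty = lazy (Arrow (Tvar "a", Tvar "b")) } ]

let () =
  assert (!copies = 0);  (* nothing copied at branch entry *)
  ignore (Lazy.force (List.find (fun b -> b.name = "x") env').ty);
  assert (!copies = 1)   (* only the accessed identifier's type is copied *)
```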