
Automatic generalize#12946

Merged
garrigue merged 24 commits into ocaml:trunk from COCTI:automatic_generalize
Jun 21, 2024

Conversation

@garrigue
Contributor

@garrigue garrigue commented Jan 29, 2024

This PR makes the generalization automatic when leaving a scope.

Namely, we introduce a notion of type node pool in Btype.

  • Every new type node is added to the pool corresponding to its level using Btype.add_to_pool.
  • Ctype.with_local_level_generalize[_structure] create a new pool.
    ** When leaving its scope, all the nodes in the pool whose level is higher than the previous level are generalized.
    ** In the _structure case, type variables instead have their level lowered.
    ** Nodes with a level lower than or equal to the previous level are moved to the pool corresponding to their actual level.
    ** Nodes that are already at generic_level, as well as Tlink nodes, are ignored.
    ** Calls to lower_contravariant must be added to the before_generalize argument.
  • For compatibility, if there is no pool for a given level, then the nodes are added to the last pool.
  • Since nodes at generic_level are not tracked, we need to add them back to a pool if their level is lowered (in update_level).
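The pool discipline above can be sketched as follows. This is a hedged illustration with hypothetical record shapes, not the actual Btype/Ctype code; only the name `add_to_pool` and the value of `generic_level` come from the compiler sources.

```ocaml
(* Minimal sketch of the pool mechanism; all type and function
   definitions here are illustrative, not the PR's actual code. *)

type ty = { mutable level : int }

let generic_level = 100000000  (* as in Btype *)

(* One pool per local scope; every fresh node is registered in it. *)
type pool = { prev_level : int; mutable nodes : ty list }

let add_to_pool p t = p.nodes <- t :: p.nodes

(* Leaving a scope: nodes above the previous level are generalized;
   lower-level nodes are handed back to the enclosing pool. *)
let exit_scope p enclosing =
  List.iter
    (fun t ->
       if t.level = generic_level then ()            (* already generic *)
       else if t.level > p.prev_level then t.level <- generic_level
       else add_to_pool enclosing t)
    p.nodes;
  p.nodes <- []
```

In this sketch, `exit_scope` plays the role of the generalization step performed when `with_local_level_generalize` returns.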

We could get rid of all calls to generalize and generalize_structure, but limited_generalize and generalize_spine are still needed for classes.
We checked that there is no negative impact on compiler performance when compiling ocamldoc.

Written with @t6s

@gasche
Member

gasche commented Feb 13, 2024

Naive question: to perform generalization, we only need to traverse the type variables that occur in the result type. Here, if I understand correctly, we generalize all type variables at the level we are exiting, including those that were only used internally for type-checking and do not occur in the result. This changes the algorithmic complexity of inference, right?

In the past I proposed that the with_local_level_* wrappers could pass a function argument that "registers" a type expression as being part of the result, which marks it for generalization later. This would be less cumbersome than the current code in trunk, still more work than the present proposal in the PR, higher risk of introducing bugs by forgetting to register certain variables, but it would have the right algorithmic complexity. Maybe it would be possible to have a debug mode that tracks all type variables in a pool, but invalidates all those that have not been "recorded" (registered as roots) in this way, and would help catch this family of bugs (forgetting to register some types).

For example, in trunk we have

      let cty, ty, force, cty', ty', force' =
        with_local_level_iter ~post:generalize_structure begin fun () ->
          let (cty, ty, force) =
            Typetexp.transl_simple_type_delayed env sty
          and (cty', ty', force') =
            Typetexp.transl_simple_type_delayed env sty'
          in
          ((cty, ty, force, cty', ty', force'),
           [ ty; ty' ])
        end

with your PR we have the nicer

      let cty, ty, force, cty', ty', force' =
        with_local_level_generalize_structure begin fun () ->
          let (cty, ty, force) =
            Typetexp.transl_simple_type_delayed env sty
          and (cty', ty', force') =
            Typetexp.transl_simple_type_delayed env sty'
          in
          (cty, ty, force, cty', ty', force')
        end

but a middle-road would be

      let cty, ty, force, cty', ty', force' =
        with_local_level_iter begin fun gen ->
          let (cty, ty, force) =
            Typetexp.transl_simple_type_delayed env sty
          and (cty', ty', force') =
            Typetexp.transl_simple_type_delayed env sty'
          in
          (cty, gen.structure ty, force,
           cty', gen.structure ty', force')
        end

@garrigue
Contributor Author

Indeed, it looks like we do useless work when we generalize nodes that will just be garbage collected.
However, our measurements show that there is no impact on performance, so this is not a big deal.
Moreover, the meaning of "being part of the result" is not so clear.
In trunk, we only generalize nodes which are still relevant to type inference. But this means that we break level-decreasing invariants on the types inside the typed tree, so that we have to take fresh copies when we use them in the middle-end and back-end (hence the comment about using Ctype.correct_levels). With automatic generalize, it should be possible to dispense with that, as we always generalize all the nodes that can be. (Being sure that the invariant is always enforced may require a more thorough review.)

@garrigue garrigue force-pushed the automatic_generalize branch from ff85d78 to a360fa1 Compare February 19, 2024 03:58
@garrigue
Contributor Author

garrigue commented Feb 20, 2024

Performance measurements on ocamldoc, using ocamlc.opt and ocamlopt.opt.
There seems to be a 2% slowdown with ocamlc.opt, but with ocamlopt.opt it is not even clear which is faster.
For reference, we also did some measurements on versions that generalize a bit less (corresponding to log messages in git).

ocamlc.opt
trunk
1.979u 0.913s 0:03.02 95.3%	0+0k 0+0io 18159pf+0w
1.989u 0.903s 0:03.00 96.0%	0+0k 0+0io 17930pf+0w
1.984u 0.905s 0:03.01 95.6%	0+0k 0+0io 17930pf+0w

automatic_generalize
2.011u 0.938s 0:03.06 96.0%	0+0k 0+0io 17691pf+0w
2.021u 0.921s 0:03.05 96.3%	0+0k 0+0io 17691pf+0w
2.023u 0.932s 0:03.07 96.0%	0+0k 0+0io 17691pf+0w

register pools for all levels
2.011u 0.925s 0:03.05 96.0%	0+0k 0+0io 17839pf+0w
2.010u 0.932s 0:03.05 96.3%	0+0k 0+0io 17839pf+0w
2.013u 0.933s 0:03.06 96.0%	0+0k 0+0io 17839pf+0w

automatic generalization in typedecl
2.002u 0.917s 0:03.03 96.0%	0+0k 0+0io 17928pf+0w
1.999u 0.911s 0:03.03 95.7%	0+0k 0+0io 17928pf+0w
2.000u 0.911s 0:03.02 96.3%	0+0k 0+0io 17928pf+0w

ocamlopt.opt
trunk
4.510u 1.438s 0:06.52 91.1%	0+0k 0+0io 30239pf+0w
4.507u 1.414s 0:06.46 91.4%	0+0k 0+0io 30007pf+0w
4.510u 1.405s 0:06.46 91.4%	0+0k 0+0io 30007pf+0w
4.506u 1.413s 0:06.45 91.6%	0+0k 0+0io 30007pf+0w

automatic_generalize
4.495u 1.381s 0:06.39 91.8%	0+0k 0+0io 29747pf+0w
4.512u 1.415s 0:06.47 91.4%	0+0k 0+0io 29747pf+0w
4.518u 1.412s 0:06.47 91.4%	0+0k 0+0io 29747pf+0w
4.491u 1.413s 0:06.48 91.0%	0+0k 0+0io 29747pf+0w

Note that to test with ocamlopt.opt one has to add the following line to the Makefile:

ocamldoc/%: CAMLOPT = $(BEST_OCAMLOPT) $(STDLIBFLAGS)

@garrigue garrigue marked this pull request as ready for review February 20, 2024 03:07
@garrigue
Contributor Author

The PR is now ready for review.
As the measurements show, there seems to be a small impact on performance, but we believe the simplicity of use and the improved invariant are worth it. There seems to be no complexity problem, as nodes that do not escape a scope are only generalized once, and at low cost.

@garrigue garrigue force-pushed the automatic_generalize branch from edd8f0b to ff52647 Compare February 29, 2024 01:49
@lpw25
Contributor

lpw25 commented Mar 4, 2024

I have a couple of concerns about this PR.

The first is around safety: generalizing type nodes is not the conservative option. When leaving a scope for an expansive expression, this PR means that a failure to call lower_contravariant on a type will cause that type to be unsoundly generalized. I think that for these scopes it would be better if only types that had lower_contravariant called on them were generalized, and any other types in the scope just had their level lowered.

The second is around performance. The benchmarks here are pretty small and already show some cost. I'd prefer to see something a little bigger and more varied tested if possible. Have you tried profiling to see if any of the new code is particularly hot? Have you tried using a hashtable instead of a map in a ref? Or even just a dynamic array?

@garrigue
Contributor Author

garrigue commented Mar 5, 2024

@lpw25
Concerning the first point (generalization of expansive expressions), the relaxed value restriction says that it is safe to generalize variables that have no negative occurrence in the type of the expression. Variables that do not appear in that type have no negative occurrence, so it is clearly safe to generalize them. Your concern is about forgetting to call lower_contravariant. My only answer is that it is easy not to forget it, since it only has to be called on the type of the expression. And there is clearly no difference here with what was done before. (What you suggest is possible, by generalizing variables manually and calling with_local_level_generalize_structure, but the benefit seems limited.)
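For readers unfamiliar with the relaxed value restriction invoked here, standard OCaml (independent of this PR) already behaves as follows: for an expansive expression, type variables occurring only in covariant positions are generalized, while the others remain weak.

```ocaml
(* Relaxed value restriction in action. [List.append [] []] is an
   application, hence expansive, but its type variable occurs only
   covariantly in ['a list], so it is generalized anyway. *)
let covariant = List.append [] []  (* : 'a list, generalized *)

(* [ref []] is also expansive, but ['a] occurs under [ref], which is
   invariant, so the variable stays weak ('_weak1). *)
let weak = ref []                  (* : '_weak1 list ref, not generalized *)
```

This is why lower_contravariant only needs to be called on the type of the expression itself: variables not reachable from that type have no negative occurrence.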

For performance, we could do more benchmarks, but I'm no specialist. ocamldoc is already middle-sized, and has the particularity of exercising objects. I tried below with the compilation of ocamlopt (using ocamlc.opt) for a more standard style of code, with a slowdown of about 0.6%. It could be nice to have an example with a lot of functors too, but I would not expect major differences.

# trunk
time make ocamlopt CAMLC=../trunk/ocamlc.opt > /dev/null
9.626u 3.378s 0:13.23 98.1%	0+0k 0+0io 97125pf+0w
9.647u 3.404s 0:13.27 98.2%	0+0k 0+0io 96894pf+0w
9.645u 3.371s 0:13.23 98.3%	0+0k 0+0io 96894pf+0w
# automatic generalize
time make ocamlopt CAMLC=./ocamlc.ag.opt > /dev/null
9.687u 3.392s 0:13.32 98.1%	0+0k 0+0io 96705pf+0w
9.714u 3.391s 0:13.32 98.3%	0+0k 0+0io 96705pf+0w
9.712u 3.394s 0:13.34 98.2%	0+0k 0+0io 96705pf+0w

I did not try profiling. It is clear that the hot spots are registering fresh nodes, and traversing them for generalization. The cost of handling generic nodes in update_level is less clear; I would expect inlining to get rid of the cost of the set_level local function.

It might be worth trying a hash table; however, the current code exploits the persistence of maps, so switching would require some changes to the code in order to undo modifications explicitly.

@garrigue
Contributor Author

garrigue commented Mar 5, 2024

I did try using a hash table (branch automatic_generalize_hashtbl).

The results are not bad:

9.612u 3.409s 0:13.26 98.1%	0+0k 0+0io 96191pf+0w
9.688u 3.525s 0:13.44 98.2%	0+0k 0+0io 96191pf+0w
9.644u 3.429s 0:13.27 98.4%	0+0k 0+0io 96191pf+0w

That is, this is very close to trunk. We could switch to this if the difference matters.

I also tried a hash table specialized to integers, but there is no improvement in performance.

@goldfirere
Contributor

I will plan to review this next week.

@garrigue garrigue force-pushed the automatic_generalize branch from e5e3156 to 4bc04ab Compare April 18, 2024 13:37
@garrigue
Contributor Author

Removed Ctype.correct_levels. This didn't take long, and the test suite goes through. You might still want to check on a bigger code base.

@garrigue
Contributor Author

I also checked that removing Ctype.correct_levels from Typeopt.scrape_ty indeed fails in trunk (otherwise, the test suite would be dubious...).
There were surprisingly few failures, but the following tests do fail (in trunk):

List of failed tests:
    tests/effects/issue479.ml
    tests/typing-misc/constraints.ml
    tests/typing-objects/Exemples.ml
    tests/typing-poly/poly.ml

Those are the ones exercising the type checker.
There were no errors in typing-gadts.
By the way, automatic_generalize fails on tests/lib-threads/torture.ml, but IIRC this one is stochastic.

Contributor

@goldfirere goldfirere left a comment


Read through this at last. I think I would want to understand this better before approving. Ideally comments would lead the way, but it may also be helpful to schedule a call to review this together.

@goldfirere
Contributor

I've reviewed @garrigue's two latest commits about updating btype and Pexp_apply.

@garrigue
Contributor Author

@goldfirere Thanks for your review. I am still updating the code, as you helped discover a number of possible improvements, and I hope to finish by next week. Then we can schedule a call.

Contributor Author

@garrigue garrigue left a comment


@goldfirere I think I answered all your questions.
I have pushed the updated code.
If you have questions we can have a discussion.
As usual, 9pm in Japan would be OK, though we already have a meeting scheduled on the 20th.

@goldfirere
Contributor

Just one last little comment request, and then we can merge! :)

@garrigue
Contributor Author

Thank you for your approval. However, rebasing causes an error in the testsuite, and we are still investigating the cause. (There is an easy workaround, but I would really like to understand what is happening.)

t6s and others added 6 commits June 20, 2024 15:30
use with_local_level_generalize in typecore

allow lowering generic_level nodes

do more automatic generalization

fix add_to_pool

clear simple_abbrevs in with_local_level_generalize

- rename [?post] to [?before_generalize]
- use [with_local_level_generalize*] for easy cases in typecore.ml

use with_local_level_generalize_* for easy parts of typeclass.ml

fix copy_spine

use before_generalize

use automatic generalize for univars in Typecore.type_label_exp

make almost all generalization automatic in typecore

missing generalization in type_let

no manual generalize in typecore

Typeclass.type_classes

automatic generalization in typedecl

remove remaining uses of generalize

add Changes
@garrigue garrigue force-pushed the automatic_generalize branch from ff8cfe9 to b7aba09 Compare June 20, 2024 06:39
@garrigue
Contributor Author

garrigue commented Jun 20, 2024

@goldfirere Ironically, it seems that the cause of the failure is #13088, your PR about making list_labels pure, which I reviewed myself.
What is happening is not completely clear, but this shows that it is very difficult to produce a correct map from levels to pools if we work level by level.
So we tried another approach, using instead a stack of pools, each associated with the level at which it will be generalized. One then just adds nodes to the topmost pool whose level is sufficiently low. Search in this structure is linear, but since insertion is amortized, this ends up being more efficient than lookup in a map. And the correctness of lookup in this structure is trivial.
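The stack-of-pools lookup can be sketched as follows. The names echo the excerpt reviewed later in the thread, but the record shape and function bodies are our illustrative reconstruction, not the PR's actual code.

```ocaml
(* Hedged sketch of a stack of pools ordered by decreasing level. *)

type ty = { mutable level : int }

type pool_node = {
  p_level : int;               (* level at which this pool generalizes *)
  mutable pool : ty list;      (* nodes registered in this pool *)
  next : pool_node;            (* enclosing pool, lower in the stack *)
}

(* Self-referencing sentinel at the bottom of the stack. *)
let rec dummy = { p_level = max_int; pool = []; next = dummy }

(* Linear walk downwards to the topmost pool whose level is low
   enough; almost always the topmost pool already matches, so the
   amortized cost stays low. *)
let rec pool_of_level level p =
  if p.p_level <= level || p.next == p then p
  else pool_of_level level p.next

let add_to_pool top t =
  let p = pool_of_level t.level top in
  p.pool <- t :: p.pool
```

Correctness of the lookup is immediate: walking down a stack sorted by level necessarily stops at the first (topmost) pool whose level does not exceed the node's.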

It would be nice if you could review the last commit, which is the only new one. It also uses with_level in a few places, to avoid direct assignments to current_level.

We again benchmarked compiling ocamldoc and ocamlopt using ocamlc.opt, and, to our pleasant surprise, we are now faster than trunk for ocamlopt.

ocamlopt
trunk
time make ocamlopt CAMLC=../trunk/ocamlc.opt > /dev/null
9.924u 3.684s 0:13.86 98.1%	0+0k 0+0io 99022pf+0w
10.001u 3.868s 0:14.19 97.6%	0+0k 0+0io 98814pf+0w
9.943u 3.717s 0:13.87 98.4%	0+0k 0+0io 98582pf+0w

with pool stack
time make ocamlopt CAMLC=./ocamlc.agps.opt > /dev/null
9.856u 3.728s 0:13.80 98.3%	0+0k 0+0io 98786pf+0w
9.882u 3.688s 0:13.77 98.4%	0+0k 0+0io 98553pf+0w
9.946u 3.908s 0:14.16 97.7%	0+0k 0+0io 98553pf+0w

ocamldoc
trunk
1.732u 0.715s 0:02.50 97.6%	0+0k 0+0io 17914pf+0w
1.739u 0.717s 0:02.51 97.2%	0+0k 0+0io 17913pf+0w
1.742u 0.716s 0:02.51 97.6%	0+0k 0+0io 17913pf+0w

with pool stack (ocamlc.agps.opt)
1.738u 0.719s 0:02.51 97.2%	0+0k 0+0io 17913pf+0w
1.744u 0.720s 0:02.52 97.6%	0+0k 0+0io 17913pf+0w
1.753u 0.724s 0:02.55 96.8%	0+0k 0+0io 17913pf+0w

let rec dummy = {level = max_int; pool = []; next = dummy}
let pool_stack = s_table (fun () -> {level = 0; pool = []; next = dummy}) ()

let rec pool_of_level level pool =
Contributor


It would be nice to highlight that this algorithm is linear, but that you believe amortization still makes it efficient. I further think this might be more efficient than it appears because, almost all the time, the topmost pool is the right one (which was also found quickly in the old code)... and higher pools are more likely to be the answer at every step of the way.

If we wanted to keep this structure but found the linear-time access to be troublesome, we could easily switch to using binary search. This would require storing the pools in an array, but preallocating, say, 100 pools is probably fine. I don't think this change is warranted, but it might be nice to leave a comment saying that we could switch to a more efficient implementation if called for.
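A hedged sketch of that binary-search alternative, assuming pool levels are kept sorted in increasing order in a (possibly preallocated) array; `find_pool` is a hypothetical name, not part of the PR.

```ocaml
(* Index of the highest pool whose level is <= target, by binary
   search over the first [n] entries of a sorted [levels] array.
   Assumes levels.(0) <= target, mirroring the bottom pool at level 0. *)
let find_pool (levels : int array) (n : int) (target : int) : int =
  let lo = ref 0 and hi = ref (n - 1) in
  while !lo < !hi do
    let mid = (!lo + !hi + 1) / 2 in           (* round up to make progress *)
    if levels.(mid) <= target then lo := mid
    else hi := mid - 1
  done;
  !lo
```

This turns the linear walk into an O(log n) lookup, at the cost of losing the very cheap common case where the topmost pool matches immediately.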

