
Automatic generalize#12946

Merged
garrigue merged 24 commits into ocaml:trunk from COCTI:automatic_generalize
Jun 21, 2024

Conversation

@garrigue
Contributor

@garrigue garrigue commented Jan 29, 2024

This PR makes the generalization automatic when leaving a scope.

Namely, we introduce a notion of type node pool in Btype.

  • Every new type node is added to the pool corresponding to its level using Btype.add_to_pool.
  • Ctype.with_local_level_generalize[_structure] create a new pool.
    ** When leaving its scope, all the nodes in the pool whose level is higher than the previous level are generalized.
    ** In the _structure case, type variables instead have their level lowered.
    ** Nodes with a level lower than or equal to the previous level are moved to the pool corresponding to their actual level.
    ** Nodes that are already at generic_level, as well as Tlink nodes, are ignored.
    ** Calls to lower_contravariant must be added to the before_generalize argument.
  • For compatibility, if there is no pool for a given level, then the nodes are added to the last pool.
  • Since nodes at generic_level are not tracked, we need to add them back to a pool if their level is lowered (in update_level).
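The pool discipline above can be sketched as follows. This is a hedged illustration with hypothetical record shapes, not the actual Btype/Ctype code; only the name `add_to_pool` and the value of `generic_level` come from the compiler sources.

```ocaml
(* Minimal sketch of the pool mechanism; all type and function
   definitions here are illustrative, not the PR's actual code. *)

type ty = { mutable level : int }

let generic_level = 100000000  (* as in Btype *)

(* One pool per local scope; every fresh node is registered in it. *)
type pool = { prev_level : int; mutable nodes : ty list }

let add_to_pool p t = p.nodes <- t :: p.nodes

(* Leaving a scope: nodes above the previous level are generalized;
   lower-level nodes are handed back to the enclosing pool. *)
let exit_scope p enclosing =
  List.iter
    (fun t ->
       if t.level = generic_level then ()            (* already generic *)
       else if t.level > p.prev_level then t.level <- generic_level
       else add_to_pool enclosing t)
    p.nodes;
  p.nodes <- []
```

In this sketch, `exit_scope` plays the role of the generalization step performed when `with_local_level_generalize` returns.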

We could get rid of all calls to generalize and generalize_structure, but limited_generalize and generalize_spine are still needed for classes.
We checked that there is no negative impact on compiler performance when compiling ocamldoc.

Written with @t6s

@gasche
Member

gasche commented Feb 13, 2024

Naive question: to perform generalization, we only need to traverse the type variables that occur in the result type. Here, if I understand correctly, we generalize all type variables at the level we are exiting, including those that were only used internally for type-checking and do not occur in the result. This changes the algorithmic complexity of inference, right?

In the past I proposed that the with_local_level_* wrappers could pass a function argument that "registers" a type expression as being part of the result, which marks it for generalization later. This would be less cumbersome than the current code in trunk, still more work than the present proposal in the PR, higher risk of introducing bugs by forgetting to register certain variables, but it would have the right algorithmic complexity. Maybe it would be possible to have a debug mode that tracks all type variables in a pool, but invalidates all those that have not been "recorded" (registered as roots) in this way, and would help catch this family of bugs (forgetting to register some types).

For example, in trunk we have

      let cty, ty, force, cty', ty', force' =
        with_local_level_iter ~post:generalize_structure begin fun () ->
          let (cty, ty, force) =
            Typetexp.transl_simple_type_delayed env sty
          and (cty', ty', force') =
            Typetexp.transl_simple_type_delayed env sty'
          in
          ((cty, ty, force, cty', ty', force'),
           [ ty; ty' ])
        end

with your PR we have the nicer

      let cty, ty, force, cty', ty', force' =
        with_local_level_generalize_structure begin fun () ->
          let (cty, ty, force) =
            Typetexp.transl_simple_type_delayed env sty
          and (cty', ty', force') =
            Typetexp.transl_simple_type_delayed env sty'
          in
          (cty, ty, force, cty', ty', force')
        end

but a middle-road would be

      let cty, ty, force, cty', ty', force' =
        with_local_level_iter begin fun gen ->
          let (cty, ty, force) =
            Typetexp.transl_simple_type_delayed env sty
          and (cty', ty', force') =
            Typetexp.transl_simple_type_delayed env sty'
          in
          (cty, gen.structure ty, force,
           cty', gen.structure ty', force')
        end

@garrigue
Contributor Author

Indeed, it looks like we do useless work when we generalize nodes that will just be garbage collected.
However, our measurements show that there is no impact on performance, so this is not a big deal.
Moreover, the meaning of "being part of the result" is not so clear.
In trunk, we only generalize nodes which are still relevant to type inference. But this means that we break level-decreasing invariants on the types inside the typed tree, so that we have to take fresh copies when we use them in the middle-end and back-end (hence the comment about using Ctype.correct_levels). With automatic generalize, it should be possible to dispense with that, as we always generalize all the nodes that can be. (Being sure that the invariant is always enforced may require a more thorough review.)

@garrigue garrigue force-pushed the automatic_generalize branch from ff85d78 to a360fa1 Compare February 19, 2024 03:58
@garrigue
Contributor Author

garrigue commented Feb 20, 2024

Performance measurements on ocamldoc, using ocamlc.opt and ocamlopt.opt.
There seems to be a 2% slowdown with ocamlc.opt, but with ocamlopt.opt it is not even clear which is faster.
For reference, we also did some measurements on versions that generalize a bit less (corresponding to log messages in git).

ocamlc.opt
trunk
1.979u 0.913s 0:03.02 95.3%	0+0k 0+0io 18159pf+0w
1.989u 0.903s 0:03.00 96.0%	0+0k 0+0io 17930pf+0w
1.984u 0.905s 0:03.01 95.6%	0+0k 0+0io 17930pf+0w

automatic_generalize
2.011u 0.938s 0:03.06 96.0%	0+0k 0+0io 17691pf+0w
2.021u 0.921s 0:03.05 96.3%	0+0k 0+0io 17691pf+0w
2.023u 0.932s 0:03.07 96.0%	0+0k 0+0io 17691pf+0w

register pools for all levels
2.011u 0.925s 0:03.05 96.0%	0+0k 0+0io 17839pf+0w
2.010u 0.932s 0:03.05 96.3%	0+0k 0+0io 17839pf+0w
2.013u 0.933s 0:03.06 96.0%	0+0k 0+0io 17839pf+0w

automatic generalization in typedecl
2.002u 0.917s 0:03.03 96.0%	0+0k 0+0io 17928pf+0w
1.999u 0.911s 0:03.03 95.7%	0+0k 0+0io 17928pf+0w
2.000u 0.911s 0:03.02 96.3%	0+0k 0+0io 17928pf+0w

ocamlopt.opt
trunk
4.510u 1.438s 0:06.52 91.1%	0+0k 0+0io 30239pf+0w
4.507u 1.414s 0:06.46 91.4%	0+0k 0+0io 30007pf+0w
4.510u 1.405s 0:06.46 91.4%	0+0k 0+0io 30007pf+0w
4.506u 1.413s 0:06.45 91.6%	0+0k 0+0io 30007pf+0w

automatic_generalize
4.495u 1.381s 0:06.39 91.8%	0+0k 0+0io 29747pf+0w
4.512u 1.415s 0:06.47 91.4%	0+0k 0+0io 29747pf+0w
4.518u 1.412s 0:06.47 91.4%	0+0k 0+0io 29747pf+0w
4.491u 1.413s 0:06.48 91.0%	0+0k 0+0io 29747pf+0w

Note that to test with ocamlopt.opt one has to add the following line to the Makefile:

ocamldoc/%: CAMLOPT = $(BEST_OCAMLOPT) $(STDLIBFLAGS)

@garrigue garrigue marked this pull request as ready for review February 20, 2024 03:07
@garrigue
Contributor Author

The PR is now ready for review.
As the measurements show, there seems to be a small impact on performance, but we believe the simplicity of use and the improved invariant are worth it. There seems to be no complexity problem, as nodes that do not escape a scope are only generalized once, and at low cost.

@garrigue garrigue force-pushed the automatic_generalize branch from edd8f0b to ff52647 Compare February 29, 2024 01:49
@lpw25
Contributor

lpw25 commented Mar 4, 2024

I have a couple of concerns about this PR.

The first is around safety: generalizing type nodes is not the conservative option. When leaving a scope for an expansive expression, this PR means that a failure to call lower_contravariant on a type will cause that type to be unsoundly generalized. I think that for these scopes it would be better if only types that had lower_contravariant called on them were generalized, and any other types in the scope just had their level lowered.

The second is around performance. The benchmarks here are pretty small and already show some cost. I'd prefer to see something a little bigger and more varied tested if possible. Have you tried profiling to see if any of the new code is particularly hot? Have you tried using a hashtable instead of a map in a ref? Or even just a dynamic array?

@garrigue
Contributor Author

garrigue commented Mar 5, 2024

@lpw25
Concerning the first point (generalization of expansive expressions), the relaxed value restriction says that it is safe to generalize variables that have no negative occurrence in the type of the expression. Variables that do not appear in that type have no negative occurrence, so it is clearly safe to generalize them. Your concern is about forgetting to call lower_contravariant. My only answer is that it is easy not to forget it, since it only has to be called on the type of the expression. And there is clearly no difference here with what was done before. (What you suggest is possible, by generalizing variables manually and calling with_local_level_generalize_structure, but the benefit seems limited.)
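For readers unfamiliar with the relaxed value restriction invoked here, standard OCaml (independent of this PR) already behaves as follows: for an expansive expression, type variables occurring only in covariant positions are generalized, while the others remain weak.

```ocaml
(* Relaxed value restriction in action. [List.append [] []] is an
   application, hence expansive, but its type variable occurs only
   covariantly in ['a list], so it is generalized anyway. *)
let covariant = List.append [] []  (* : 'a list, generalized *)

(* [ref []] is also expansive, but ['a] occurs under [ref], which is
   invariant, so the variable stays weak ('_weak1). *)
let weak = ref []                  (* : '_weak1 list ref, not generalized *)
```

This is why lower_contravariant only needs to be called on the type of the expression itself: variables not reachable from that type have no negative occurrence.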

For performance, we could do more benchmarks, but I'm no specialist. ocamldoc is already middle-sized, and has the particularity of exercising objects. I tried below with the compilation of ocamlopt (using ocamlc.opt) for a more standard style of code, with a slowdown of about 0.6%. It could be nice to have an example with a lot of functors too, but I would not expect major differences.

# trunk
time make ocamlopt CAMLC=../trunk/ocamlc.opt > /dev/null
9.626u 3.378s 0:13.23 98.1%	0+0k 0+0io 97125pf+0w
9.647u 3.404s 0:13.27 98.2%	0+0k 0+0io 96894pf+0w
9.645u 3.371s 0:13.23 98.3%	0+0k 0+0io 96894pf+0w
# automatic generalize
time make ocamlopt CAMLC=./ocamlc.ag.opt > /dev/null
9.687u 3.392s 0:13.32 98.1%	0+0k 0+0io 96705pf+0w
9.714u 3.391s 0:13.32 98.3%	0+0k 0+0io 96705pf+0w
9.712u 3.394s 0:13.34 98.2%	0+0k 0+0io 96705pf+0w

I did not try profiling. It is clear that the hot spots are registering fresh nodes, and traversing them for generalization. The cost of handling generic nodes in update_level is less clear; I would expect inlining to get rid of the cost of the set_level local function.

It might be worth trying a hash table; however, the current code exploits the persistence of maps, so switching would require some changes to the code in order to undo modifications explicitly.

@garrigue
Contributor Author

garrigue commented Mar 5, 2024

I did try using a hash table (branch automatic_generalize_hashtbl).

The results are not bad:

9.612u 3.409s 0:13.26 98.1%	0+0k 0+0io 96191pf+0w
9.688u 3.525s 0:13.44 98.2%	0+0k 0+0io 96191pf+0w
9.644u 3.429s 0:13.27 98.4%	0+0k 0+0io 96191pf+0w

That is, this is very close to trunk. We could switch to this if the difference matters.

I also tried a hash table specialized to integers, but there is no improvement in performance.

@goldfirere
Contributor

I will plan to review this next week.

@garrigue garrigue force-pushed the automatic_generalize branch from e5e3156 to 4bc04ab Compare April 18, 2024 13:37
@garrigue
Contributor Author

Removed Ctype.correct_levels. This didn't take long, and the test suite goes through. You might still want to check on a bigger code base.

@garrigue
Contributor Author

I also checked that removing Ctype.correct_levels from Typeopt.scrape_ty indeed fails in trunk (otherwise, the test suite would be dubious...).
There were surprisingly few failures, but the following tests do fail (in trunk):

List of failed tests:
    tests/effects/issue479.ml
    tests/typing-misc/constraints.ml
    tests/typing-objects/Exemples.ml
    tests/typing-poly/poly.ml

Those are the ones exercising the type checker.
There were no errors in typing-gadts.
By the way, automatic_generalize fails on tests/lib-threads/torture.ml, but IIRC this one is stochastic.

Contributor

@goldfirere goldfirere left a comment


Read through this at last. I think I would want to understand this better before approving. Ideally comments would lead the way, but it may also be helpful to schedule a call to review this together.

@goldfirere
Contributor

I've reviewed @garrigue's two latest commits about updating btype and Pexp_apply.

@garrigue
Contributor Author

@goldfirere Thanks for your review. I am still updating the code, as you helped discover a number of possible improvements, and I hope to finish by next week. Then we can schedule a call.

Contributor Author

@garrigue garrigue left a comment


@goldfirere I think I answered all your questions.
I have pushed the updated code.
If you have questions we can have a discussion.
As usual, 9pm in Japan would be OK, though we already have a meeting scheduled on the 20th.

@goldfirere
Contributor

Just one last little comment request, and then we can merge! :)

@garrigue
Contributor Author

Thank you for your approval. However, rebasing causes an error in the testsuite, and we are still investigating the cause. (There is an easy workaround, but I would really like to understand what is happening.)

t6s and others added 6 commits June 20, 2024 15:30
use with_local_level_generalize in typecore

allow lowering generic_level nodes

do more automatic generalization

fix add_to_pool

clear simple_abbrevs in with_local_level_generalize

- rename [?post] to [?before_generalize]
- use [with_local_level_generalize*] for easy cases in typecore.ml

use with_local_level_generalize_* for easy parts of typeclass.ml

fix copy_spine

use before_generalize

use automatic generalize for univars in Typecore.type_label_exp

make almost all generalization automatic in typecore

missing generalization in type_let

no manual generalize in typecore

Typeclass.type_classes

automatic generalization in typedecl

remove remaining uses of generalize

add Changes
@garrigue garrigue force-pushed the automatic_generalize branch from ff8cfe9 to b7aba09 Compare June 20, 2024 06:39
@garrigue
Contributor Author

garrigue commented Jun 20, 2024

@goldfirere Ironically, it seems that the cause of the failure is #13088, your PR about making list_labels pure, which I reviewed myself.
What is happening is not completely clear, but this shows that it is very difficult to produce a correct map from levels to pools if we work level by level.
So we tried another approach, using instead a stack of pools, each associated with the level at which it will be generalized. One then just adds nodes to the topmost pool whose level is sufficiently low. Search in this structure is linear, but since insertion is amortized, this ends up being more efficient than lookup in a map. And the correctness of lookup in this structure is trivial.
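The stack-of-pools lookup can be sketched as follows. The names echo the excerpt reviewed later in the thread, but the record shape and function bodies are our illustrative reconstruction, not the PR's actual code.

```ocaml
(* Hedged sketch of a stack of pools ordered by decreasing level. *)

type ty = { mutable level : int }

type pool_node = {
  p_level : int;               (* level at which this pool generalizes *)
  mutable pool : ty list;      (* nodes registered in this pool *)
  next : pool_node;            (* enclosing pool, lower in the stack *)
}

(* Self-referencing sentinel at the bottom of the stack. *)
let rec dummy = { p_level = max_int; pool = []; next = dummy }

(* Linear walk downwards to the topmost pool whose level is low
   enough; almost always the topmost pool already matches, so the
   amortized cost stays low. *)
let rec pool_of_level level p =
  if p.p_level <= level || p.next == p then p
  else pool_of_level level p.next

let add_to_pool top t =
  let p = pool_of_level t.level top in
  p.pool <- t :: p.pool
```

Correctness of the lookup is immediate: walking down a stack sorted by level necessarily stops at the first (topmost) pool whose level does not exceed the node's.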

It would be nice if you could review the last commit, which is the only new one. It also uses with_level in a few places, to avoid direct assignments to current_level.

We again benchmarked compiling ocamldoc and ocamlopt using ocamlc.opt, and, to our pleasant surprise, we are now faster than trunk for ocamlopt.

ocamlopt
trunk
time make ocamlopt CAMLC=../trunk/ocamlc.opt > /dev/null
9.924u 3.684s 0:13.86 98.1%	0+0k 0+0io 99022pf+0w
10.001u 3.868s 0:14.19 97.6%	0+0k 0+0io 98814pf+0w
9.943u 3.717s 0:13.87 98.4%	0+0k 0+0io 98582pf+0w

with pool stack
time make ocamlopt CAMLC=./ocamlc.agps.opt > /dev/null
9.856u 3.728s 0:13.80 98.3%	0+0k 0+0io 98786pf+0w
9.882u 3.688s 0:13.77 98.4%	0+0k 0+0io 98553pf+0w
9.946u 3.908s 0:14.16 97.7%	0+0k 0+0io 98553pf+0w

ocamldoc
trunk
1.732u 0.715s 0:02.50 97.6%	0+0k 0+0io 17914pf+0w
1.739u 0.717s 0:02.51 97.2%	0+0k 0+0io 17913pf+0w
1.742u 0.716s 0:02.51 97.6%	0+0k 0+0io 17913pf+0w

with pool stack (ocamlc.agps.opt)
1.738u 0.719s 0:02.51 97.2%	0+0k 0+0io 17913pf+0w
1.744u 0.720s 0:02.52 97.6%	0+0k 0+0io 17913pf+0w
1.753u 0.724s 0:02.55 96.8%	0+0k 0+0io 17913pf+0w

let rec dummy = {level = max_int; pool = []; next = dummy}
let pool_stack = s_table (fun () -> {level = 0; pool = []; next = dummy}) ()

let rec pool_of_level level pool =
Contributor


It would be nice to highlight that this algorithm is linear, but that you believe amortization still makes it efficient. I further think this might be more efficient than it appears because, almost all the time, the topmost pool is the right one (which was also found quickly in the old code)... and higher pools are more likely to be the answer at every step of the way.

If we wanted to keep this structure but found the linear-time access to be troublesome, we could easily switch to using binary search. This would require storing the pools in an array, but preallocating, say, 100 pools is probably fine. I don't think this change is warranted, but it might be nice to leave a comment saying that we could switch to a more efficient implementation if called for.
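A hedged sketch of that binary-search alternative, assuming pool levels are kept sorted in increasing order in a (possibly preallocated) array; `find_pool` is a hypothetical name, not part of the PR.

```ocaml
(* Index of the highest pool whose level is <= target, by binary
   search over the first [n] entries of a sorted [levels] array.
   Assumes levels.(0) <= target, mirroring the bottom pool at level 0. *)
let find_pool (levels : int array) (n : int) (target : int) : int =
  let lo = ref 0 and hi = ref (n - 1) in
  while !lo < !hi do
    let mid = (!lo + !hi + 1) / 2 in           (* round up to make progress *)
    if levels.(mid) <= target then lo := mid
    else hi := mid - 1
  done;
  !lo
```

This turns the linear walk into an O(log n) lookup, at the cost of losing the very cheap common case where the topmost pool matches immediately.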

