Skip to content

Delay type duplication in type_cases (follow up to #1648)#1735

Merged
trefis merged 8 commits intoocaml:trunkfrom
trefis:delay-type-duplication
Apr 25, 2018
Merged

Delay type duplication in type_cases (follow up to #1648)#1735
trefis merged 8 commits intoocaml:trunkfrom
trefis:delay-type-duplication

Conversation

@trefis
Copy link
Copy Markdown
Contributor

@trefis trefis commented Apr 23, 2018

This GPR implements what @garrigue suggested on #1648 (comment), namely:

Note that it also means that you could delay building the copied environment to after typing the patterns (since their typing does not depend on values in the environment), and only copy identifiers used in GADT branches.

Which, if I understood correctly, meant: do a single copy of the type of identifiers that get used in branches containing GADTs, and update all the branch environments to refer to that copy.
Note: to be able to do this, it was also necessary to delay the introduction of pattern variables in the environment; which is an harmless change. More precisely: the values in the environment need to be exactly the same when duplicating the types and when the environment gets updated.

Using the same highly elaborate benchmark as last time (i.e. running "make clean && make world.opt" 10 times), I get a slight performance improvement (< 1%, which isn't surprising considering the slow down was also < 1% last time) on average.

I'm not sure this was worth the time I spent on it, but it gave me an excuse to do some minor cleanups, so I'd be in favor of merging.

@garrigue
Copy link
Copy Markdown
Contributor

This looks much more complicated than I expected, as I was really thinking of doing exactly the same process as before, just after typing the patterns.
But I understand now why: I was forgetting that GADT pattern-matching itself was updating the environment (because it's only about the types), and that type_pattern was adding new values to it (because those are different values).
Your solution seems to be working around these 2 problems, by updating directly the value part of the environment, and delaying the addition of patterns.
Yet, it looks like an overkill.

I can think of a simpler approach (avoiding extending env.ml), by working independently in each branch, just memoizing the copy. I.e., in each branch which added GADT equations, just after typing the pattern (before or after check_scope_escape):

  • remember Ident.current_time when entering type_cases, as old_ident_time
  • extract the list of identifiers used in the branch, and copy them if their id in ext_env is older than old_ident_time, adding the copy to ext_env
  • share the individual copies through memoizing between the branches (to avoid copying twice the type of the same identifier)

I would expect this to be no more costly, and avoid all the extra code for the workarounds.

@trefis
Copy link
Copy Markdown
Contributor Author

trefis commented Apr 24, 2018

I think I disagree with you here:

  • your suggestion also requires updating env.ml: the comparison of the time of the ident in ext_env against old_current_time can only be done from inside Env (and requires changing IdTbl.update)
  • your proposition doesn't seem simpler than the one implemented in this PR. I'd say it's the contrary: the only "difficulty" here was delaying the addition of pattern variables, which poses no difficulty in practice. While yours require a slightly smarter lookup in the environment and implementing a cache.
  • your proposition seems a priori less efficient: we'd have to do lookups in all the ext_env while the current implementation only goes through one environment
  • the extra code here is mostly code move; with your suggestion, there would be some actual new code

Furthermore the modification done to add_pattern_variables (and the changes it implies) seems to me like a good clean up, independently of this PR. And which I intend to do at some point point anyway, as it also enables a change I already pointed out in #1675 ( be0eea5 ): instead of generalizing the type of every subpattern, we only need to generalize the type of pattern variables.

@garrigue
Copy link
Copy Markdown
Contributor

OK, it seems that there were already some changes that I didn't follow: Env.copy_types is already there since 4.06, and I was not aware of that. My suggestion had the previous code in mind, which didn't rely on IdTbl.update, and would have been easier to modify in that direction (just add one line to Env.update_value).
On the other hand, you are clearly adding a fair amount of code, even if this code arguably does nothing (just splits some processing into two, or update Env.t more cleverly).

Could you justify better this so-called "clean-up", which is actually some restructuring of the code?

@garrigue
Copy link
Copy Markdown
Contributor

Sorry, I hadn't read the last paragraph of your answer.

let ty_res, duplicated_ident_types =
if does_contain_gadt && not !Clflags.principal then
correct_levels ty_res, duplicate_ident_types half_typed_cases env
else ty_res, duplicate_ident_types [] env
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the meaning of this call to duplicate_ident_types [] ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a well-typed noop: notice that duplicate_ident_types returns an Env.copy_of_types, which is abstract.

typing/env.ml Outdated
{to_copy = l; initial_values = env.values; new_values = values}

let do_copy_types { to_copy = l; initial_values; new_values = values } env =
if initial_values != env.values then invalid_arg "Env.do_copy_types";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The standard style seems to be to use fatal_error rather than invalid_arg in the compiler.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll change it.

@garrigue
Copy link
Copy Markdown
Contributor

OK, I have a read the code with your explanations in background, and now I think I understand it fully. This does indeed what you announced: implement what I suggested, working around the problem of environment merging. The only real logical code move is that the additions of variables is now in type_cases rather than in type_pattern, but the support for Unused_var is more natural here (by symmetry with type_let), so this seems reasonable.
get_copy_of_types / do_copy_types looks a bit ad hoc. On the other hand it gives optimal performance, so why not.
Concerning the style: systematically doing pattern matching on records is not optimal in size of code, do you have some reason to prefer it to the dot-notation?

@mshinwell
Copy link
Copy Markdown
Contributor

The advantage of matching on record patterns rather than using record projections is that you get exhaustivity checking...

@trefis
Copy link
Copy Markdown
Contributor Author

trefis commented Apr 24, 2018

Concerning the style: systematically doing pattern matching on records is not optimal in size of code, do you have some reason to prefer it to the dot-notation?

Do you mean source code size here?
If yes then I'm not sure I agree with you, and I'm also not sure it matters... I don't mind changing the code to use projection instead of destructing if you'd like that better.

@garrigue
Copy link
Copy Markdown
Contributor

It was just out of curiosity, since you use it even for very small functions, where all labels are used only once, and you explicitly disable exhaustiveness checking...
I have no special opinion on this, as long as the code is readable.

@trefis
Copy link
Copy Markdown
Contributor Author

trefis commented Apr 24, 2018

Yes, there might have also been an impulse to reduce the diff :)

Anyway, do you approve of this PR in the end?

Copy link
Copy Markdown
Contributor

@garrigue garrigue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only stylistic comments left 😀

typing/env.mli Outdated
val copy_types: string list -> t -> t
(* Used only in Typecore.duplicate_ident_types. *)
type copy_of_types

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpicking: since those 3 definitions are a set, no need to have empty lines between them.

typing/env.mli Outdated
(* Used only in Typecore.duplicate_ident_types. *)
type copy_of_types

val get_copy_of_types: string list -> t -> copy_of_types
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the name rather be make_copy_of_types ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair. I'll change that.

@trefis
Copy link
Copy Markdown
Contributor Author

trefis commented Apr 24, 2018

I'll rebase (squashing the style commits) and merge in trunk in a few hours unless new comments appear.

typing/env.mli Outdated
type copy_of_types
val make_copy_of_types: string list -> t -> copy_of_types
val do_copy_types: copy_of_types -> t -> t
(** [do_copy_types copy env] will raise [Invalid_argument] if the values in
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current version will raise a fatal error rather than an Invalid_argument, isn'it?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well spotted, updating now.

@trefis trefis force-pushed the delay-type-duplication branch 3 times, most recently from 6c64f6e to 969252b Compare April 24, 2018 14:45

let all_idents_cases half_typed_cases =
let idents = Hashtbl.create 8 in
let f = function
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to have a longer name than just f (extract?) ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Boarf", also this particular code predates this GPR (it was just moved around).

@trefis trefis force-pushed the delay-type-duplication branch from 969252b to 1f3f943 Compare April 24, 2018 15:43
@trefis trefis force-pushed the delay-type-duplication branch from 1f3f943 to ac1ced7 Compare April 25, 2018 09:10
@trefis trefis merged commit 6fe29ae into ocaml:trunk Apr 25, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants