Skip to content

Remove positions from paths#1610

Merged
lpw25 merged 5 commits intoocaml:trunkfrom
lpw25:no-positions-in-paths
Nov 21, 2018
Merged

Remove positions from paths#1610
lpw25 merged 5 commits intoocaml:trunkfrom
lpw25:no-positions-in-paths

Conversation

@lpw25
Copy link
Copy Markdown
Contributor

@lpw25 lpw25 commented Feb 14, 2018

Currrently, paths contain the integer position of a module field within the runtime representation of the module:

type t =
    Pident of Ident.t
  | Pdot of t * string * int
  | Papply of t * t

Since Path.ts appear within types, module types etc. they must remain valid through the various manipulations of the type system. Unfortunately, the positions do not remain valid under module substitution -- which leads to various soundness bugs:

        OCaml version 4.06.0

# module type S = sig
  module M : sig
    module A : sig end
    module B : sig end
  end
  module N = M.A
end

module Foo = struct
  module B = struct let x = 0 end
  module A = struct let x = "hello" end
end

module Bar : S with module M := Foo = struct module N = Foo.A end

let () = print_string Bar.N.x;;
                              Segmentation fault
        OCaml version 4.06.0

# module M : sig
  module N : sig
    module A : sig val x : string end
    module B : sig val x : int end
  end
  module F (X : sig module A = N.A end) : sig val s : string end
end = struct
  module N = struct
    module B = struct let x = 0 end
    module A = struct let x = "hello" end
  end
  module F (X : sig module A : sig val x : string end end) = struct
    let s = X.A.x
  end
end

module N = M.F(struct module A = M.N.A end)

let () = print_string N.s;;
                                    Segmentation fault

Rather than trying to change module substitution to take account of path indices, which appears quite difficult and will become even more difficult as the module system evolves, this PR removes the positions from paths. Instead we add an address type to the Env module:

type address =
  | Aident of Ident.t
  | Adot of address * int

along with functions for finding the "address" of a given path:

val find_value_address: Location.t -> Path.t -> t -> address
val find_module_address: Location.t -> Path.t -> t -> address
val find_class_address: Location.t -> Path.t -> t -> address
val find_constructor_address: Location.t -> Path.t -> t -> address

These functions are used in bytecomp to translate paths to lambda code. Since we previously had to call normalize_path at that point anyway, this is a pretty simple change to make.

@gasche
Copy link
Copy Markdown
Member

gasche commented Feb 14, 2018

If you read PRs through email and you were, like me, confused by the absence of an int argument in the Path.t type given in Leo's message, this was a typo which is now fixed. Before the PR, Path.t is defined as

type t =
    Pident of Ident.t
  | Pdot of t * string * int
  | Papply of t * t

while the PR changes it to just (only Pdot changes, dropping the int argument):

type t =
    Pident of Ident.t
  | Pdot of t * string
  | Papply of t * t

@shindere
Copy link
Copy Markdown
Contributor

shindere commented Feb 14, 2018 via email

@lpw25
Copy link
Copy Markdown
Contributor Author

lpw25 commented Feb 14, 2018

Yeah, sorry about that. I should remember to make a comment when correcting particularly confusing typos.

@shindere
Copy link
Copy Markdown
Contributor

shindere commented Feb 14, 2018 via email

@let-def
Copy link
Copy Markdown
Contributor

let-def commented Feb 14, 2018

.depend seems out of date.

@Drup
Copy link
Copy Markdown
Contributor

Drup commented Feb 14, 2018

Exposing the position in the path itself always seemed particularly dubious to me, so I'm happy to see that go away.

Before I really dive into the code, can you give some indications on what this module_presence that is now sparkled everywhere is for ? Is it still only for module aliases ?

@lpw25
Copy link
Copy Markdown
Contributor Author

lpw25 commented Feb 14, 2018

To make the addresses a little easier to calculate I moved the presence information for module aliases from the module type (i.e. Mty_alias(Mta_present, path)) to the signature (i.e. (Sig_module(id, Mta_present, Mty_alias(path))). This also enabled me to fully remove absent aliases from the runtime representation -- previously they would get a slot containing () that was never used for anything.

@let-def
Copy link
Copy Markdown
Contributor

let-def commented Feb 14, 2018

From a first review, to me it looks like a clear improvement (I remember doing sometime wrong things with path positions when playing with modular implicits) and the patch seems correct but I don't understand all the parts.

@gasche
Copy link
Copy Markdown
Member

gasche commented Feb 14, 2018

Maybe you want to point at the parts you don't understand, and they could benefit from a comment :-)

Copy link
Copy Markdown
Contributor

@Drup Drup left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main commit is quite difficult to review (mostly trivial or loosely related changes spread out over 50+ files, and the occasional subtle thing). I would have preferred to have the the purely trivial parts split out from the subtle ones, and then squash before merge. That would have made it much easier to review.

But apart from that, I think this is a clear improvement. The handling of address is much more obvious than before, and this removes some weird invariants from the typedtree. The code is also a tiny bit cleaner.

I mostly looked at the typechecking-related parts, as I'm not an expert in bytecomp/, but it all looks pretty good to me.

| Record_unboxed of bool (* Unboxed single-field record, inlined or not *)
| Record_inlined of int (* Inlined record *)
| Record_extension (* Inlined record under extension *)
| Record_extension of Path.t (* Inlined record under extension *)
Copy link
Copy Markdown
Contributor

@Drup Drup Apr 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not very familiar with compilation of inlined record. Why is the path now needed here?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My guess would be: to avoid doing things like this.
That is: unless I missunderstood, it is a cleanup unrelated to the rest of this PR.

Copy link
Copy Markdown
Contributor

@Drup Drup Apr 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I missed that piece of code. That's a very valid reason, although if it is, it really should be in a different commit.

let name = Ident.name param in
let exp =
{ exp with exp_loc = loc; exp_desc =
{ exp with exp_loc = loc; exp_env = env; exp_desc =
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On various occasion where a typechecking function synthesized a new desc, you now add it to the environment before returning it, where previously it was just returned. I suppose it is to make sure it has an address, but I'm not really sure why it is ok to add new things to the env that were not added previously.

Also, in that specific case, it's added only to the expression part, not the pattern one.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On various occasion where a typechecking function synthesized a new desc, you now add it to the environment before returning it, where previously it was just returned. I suppose it is to make sure it has an address, but I'm not really sure why it is ok to add new things to the env that were not added previously.

Was "typechecking" a typo here?
Also, regardless of why it is now added in the environment (though I think I agree with the reason you propose), I think it's more correct (in this particular instance, i.e. in push_defaults) to add it to the environment than not.

Also, in that specific case, it's added only to the expression part, not the pattern one.

This is similar to what happens in the typechecker, if you look at Typecore.type_cases you'll see that the pattern variables (i.e. the bindings produced by the pattern) only get added to the environment of the body, not of the pattern.

typing/env.ml Outdated
if Ident.global id && not (Ident.same id (Path.head path'))
then add_required_global id;
path'
end
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When normalizing, you consider modules that are aliases but have an address. I failed to find where you introduce aliases that have an address different than None. Where does that happen?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if a module is present (i.e. it's presence is Mta_present in a module type) then it will be given an address (see store_module and components_of_module_maker). This can happen for aliases than are produced via strengthening (see strengthen_decl which can produce aliases, and strengthen_sig which then attaches these aliases to a presence that can be Mta_present).

match Env.find_class_address Location.none path env with
| addr -> eval_address addr
| exception Not_found ->
fatal_error ("Cannot find address for: " ^ (Path.name path))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe factorize that code a little bit? :)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@lpw25
Copy link
Copy Markdown
Contributor Author

lpw25 commented Aug 6, 2018

Sorry for the slow response. I've finally found time to look at this again. I've rebased the code and addressed @Drup's comments.

@Drup
Copy link
Copy Markdown
Contributor

Drup commented Aug 6, 2018

My small concerns have been addressed. I think we need a review from someone familiar with bytecomp now.

@damiendoligez
Copy link
Copy Markdown
Member

Since this fixes a soundness bug, I would like it to see it move forward.
@trefis would you do a formal review?

@lpw25
Copy link
Copy Markdown
Contributor Author

lpw25 commented Oct 26, 2018

Could we hold off for a week or so? I've got an improved version implemented as part of my transparent ascription patch, and I'm planning to extract it into this PR.

@lpw25 lpw25 force-pushed the no-positions-in-paths branch from 5a74793 to cd31809 Compare November 9, 2018 18:22
@lpw25
Copy link
Copy Markdown
Contributor Author

lpw25 commented Nov 9, 2018

Ok I've pushed the improved version and rebased it up to trunk. Should now be ready for review.

@trefis
Copy link
Copy Markdown
Contributor

trefis commented Nov 13, 2018

I'm having a look.

Copy link
Copy Markdown
Contributor

@trefis trefis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This overall looks good (and correct) modulo the few comments I left here and there.


let enter_module ~scope ?arg s mty env =
let enter_value ?check name desc env =
let id = Ident.create_local name in
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear:

scope (create_scoped ~scope:0 "foo") == 0
scope (create_local "foo")           == highest_scope (== 100_000_000)

So that's an actual change, somewhat unrelated to this PR.
But I believe it is fine: we only check the scope of type constructors, we should never check the scope of value paths.

typing/types.mli Outdated
and alias_presence =
and module_presence =
| Mta_present
| Mta_absent
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we rename these to Mp_present | Mp_absent?

pc1,
(fun pc ->
match pc with
| _, (Tcoerce_primitive _ | Tcoerce_alias _) -> pc
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this deserves a comment.
Apparently p1 in these cases would be -1, because it's not a runtime component (cf. includemod.ml), so we can't really go have a look in v2.

cont)

in
let aliases = List.fold_right store_alias aliases lambda_unit in
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could drop the cont parameter of store_alias, have it return the Lprim and simplify this last line to: make_sequence store_alias aliases.


let partial_function loc () =
let slot =
transl_extension_path Location.none
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to use Location.none over loc here?

typing/env.ml Outdated
(fun id ->
is_local (find_same id tbl2) &&
let descr, _ = find_same id tbl2 in
is_local descr &&
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That feels wrong: it "works", but it's weird.
Here is the type of the function after this change:

val diff_keys: ('a -> bool) -> 'b t -> ('a * 'c) t -> Ident.t list

Before that it was:

val diff_keys: ('a -> bool) -> 'b t -> 'a t -> Ident.t list

which seems more natural.

I think it's the caller of this function that should be changed, while this function should remain untouched. (And the function appears to be called only once, so it shouldn't be too tedious to update)

let (desc, _) = NameMap.find name components in
(Pdot (root, name), desc) :: find_all name next
with Not_found ->
find_all name next
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like for TycompTbl.diff_keys the changes to this function are weird, and inconsistent with the other changes in this module: this function is making the assumption that the 'a in current : Ident.tbl (* in [t] *) and in components: 'a list NameMap.t (* in [opened] *) is a tuple:

val find_all: string -> ('a * 'b) t -> (Path.t * 'a) list

But there's no reason to. And, for example, I believe you wouldn't be able to call find_all on the modtypes field of the environment with this implementation.
It's the callers that should change.

@lpw25
Copy link
Copy Markdown
Contributor Author

lpw25 commented Nov 16, 2018

Ok, I've addressed the review comments and fixed the failing tests.

@trefis
Copy link
Copy Markdown
Contributor

trefis commented Nov 19, 2018

I forgot to say it during my review: but you updated toploop,topdirs,etc. But not the opt versions. And you'll need to :)

@lpw25
Copy link
Copy Markdown
Contributor Author

lpw25 commented Nov 19, 2018

I've updated ocamlnat and addressed @let-def's comments.

@lpw25 lpw25 force-pushed the no-positions-in-paths branch from 604c4a4 to e47cb6a Compare November 20, 2018 14:38
@lpw25
Copy link
Copy Markdown
Contributor Author

lpw25 commented Nov 20, 2018

Rebased. Merging once CI passes.

@lpw25 lpw25 force-pushed the no-positions-in-paths branch from e47cb6a to 68f2b7c Compare November 20, 2018 16:37
@lpw25 lpw25 merged commit 6bdce9e into ocaml:trunk Nov 21, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants