Skip to content

Merge signature and structure (parsing-wise)#312

Closed
Drup wants to merge 13 commits intoocaml:trunkfrom
Drup:merge_sig_str
Closed

Merge signature and structure (parsing-wise)#312
Drup wants to merge 13 commits intoocaml:trunkfrom
Drup:merge_sig_str

Conversation

@Drup
Copy link
Copy Markdown
Contributor

@Drup Drup commented Nov 26, 2015

Beginning of the discussion and original motivation are in mantis ticket 6681.

The main original issue was that it is not possible to write the following code inside a .mli:

[%%foo 
  val x : int
  module M : S 
]

The payload of an extension must be a structure. The recent introduction of val into structure doesn't solve this, I want to be able to have any signature items in my payload.

After discussion, @alainfrisch proposed (and implemented) to merge structures and signatures in the parser, so that the payload is allowed more freedom. Wrongly formated parsetrees are rejected with a new error: "This kind of item is not valid in structures" (resp. "signatures") (Improvements on the wording are very welcome). This also means that the syntax error messages are oh-so-slightly better. :)

This leads to a conflict in the grammar, which is solved by being more strict in what the #use directive parses: a ";;" must always be introduced between a structure and a directive. See the test case for an example. This was already the case for a significant part of the structure items (in particular, the let). I lack data to see how much breakage this would cause. If it's considered too much, we may be able to refine that restriction.

The new form of the parsetree doesn't break ppxs as much as you could think (at least ppx_deriving, lwt, and jsoo are left unharmed, I didn't tried janestreet's ppxs since there are independent breakages due to 4.03, but I doubt it will be terrible).

test passes, it bootstraps and I tried all the large camlp4-independent codebases I could find. Generated documentation also seems correct on lwt. I advise reading commit by commit.

alainfrisch and others added 13 commits November 25, 2015 17:54
… and structures. The type-checker rejects rebinding in signatures.

git-svn-id: http://caml.inria.fr/svn/ocaml/branches/merge_sig_str@15625 f963ae5c-01c2-4b8c-9fe0-0dff7051ff02
…es conflicts in the grammar, I don't understamd why at this point.)

git-svn-id: http://caml.inria.fr/svn/ocaml/branches/merge_sig_str@15628 f963ae5c-01c2-4b8c-9fe0-0dff7051ff02
In particular, the following code is rejected.

   let module M : T in ...
With this patch, ";;" is always needed between a structure and a
directive (and nowhere else). ";;" can still be added at will any
number of times between structure items and between directives.
@Drup
Copy link
Copy Markdown
Contributor Author

Drup commented Nov 26, 2015

(The CI fails is camlp4, for obvious reasons)

@garrigue
Copy link
Copy Markdown
Contributor

I'm a bit concerned by the impact on exhaustiveness when doing typechecking or dependency analysis.
Also, you will get a conflict when I merge ocamldep-modalias.

@alainfrisch
Copy link
Copy Markdown
Contributor

It should be noted that this proposal would enable more code sharing (Typedtree, untypeast, and even possibly the code to type-check signatures and structures). It would also quite easily enable to represent module aliases in signatures more directly and avoid Pmty_alias.

I'm a bit concerned by the impact on exhaustiveness when doing typechecking or dependency analysis.

What do you mean? The code to typecheck signatures, for instance, lists explicitly which kinds of items are not supported in this context.

@garrigue
Copy link
Copy Markdown
Contributor

It should be noted that this proposal would enable more code sharing
(Typedtree, untypeast, and even possibly the code to type-check signatures
and structures). It would also quite easily enable to represent module
aliases in signatures more directly and avoid Pmty_alias.

Right... and wrong. You end up moving syntactic checks to the type checking
phase. Since with the ppx we cannot assume anything about the syntax tree
other than its type, this creates a mix of concerns. There are some
advantages too, such as better error messages.

Also, code sharing means that you must read the code twice, in different
modes, and this after every change. Avoiding that would require a redesign
of the meta-theory, with the sharing at the theoretical level (which could
be a good thing).

I'm a bit concerned by the impact on exhaustiveness when doing
typechecking or dependency analysis.

What do you mean? The code to typecheck signatures, for instance, lists
explicitly which kinds of items are not supported in this context.

Exactly this: you cannot just say, let us handle all the cases in the type,
you must think about what is legit where.

@alainfrisch
Copy link
Copy Markdown
Contributor

Re Mpty_alias: Removing Pmty_alias prevents ppx from generating parsetrees that does not have a concrete syntax, or could even confuse the type-checker (does it currently ensure that Pmty_alias only occur in admissible contexts?). Also, it simplifies the explanation of the Parsetree. All this helps everyone working on the Parsetree (fewer different constructions and invariants to maintain).

As for sharing the actual type-checking code and Typedtree, I agree this probably needs some more thinking. Contrary to Pmty_alias which could be done soon and easily, this is of course a much bigger decision.

Exactly this: you cannot just say, let us handle all the cases in the type, you must think about what is legit where.

Indeed, but this needs to be done only once in the type-checker, and this is not particularly difficult. I don't see how this could possibly go wrong. My point is that with ppx and other tools, there are many "clients" of the Parsetree representation in addition to the typechecker (and ocamldep), and so it becomes worth putting more efforts in the type-checker to simplify the definition and implicit invariants of the Parsetree representation.

@mshinwell
Copy link
Copy Markdown
Contributor

I would argue this is too late for 4.03.

@Drup
Copy link
Copy Markdown
Contributor Author

Drup commented Nov 26, 2015

Exactly this: you cannot just say, let us handle all the cases in the type, you must think about what is legit where.

I also fail to see the issue. As you can see in the patch, you just need to reject the irrelevant cases in typemod/typedecl, early in typechecking, and that is all. Would you prefer a pre-filter pass before calling the typechecker, in order to reject invalid forms ?

I would argue this is too late for 4.03.

I think the risk is not that high (because we have opam to check) and I was hoping we would have a clean complete break for ppxs on 4.03, but yes, it's a bit late.

@trefis
Copy link
Copy Markdown
Contributor

trefis commented Nov 26, 2015

Exactly this: you cannot just say, let us handle all the cases in the type, you must think about what is legit where.

I also fail to see the issue. As you can see in the patch, you just need to reject the irrelevant cases in typemod/typedecl, early in typechecking, and that is all. Would you prefer a pre-filter pass before calling the typechecker, in order to reject invalid forms ?

A prefilter pass doesn't mean you don't have to think about what is legit or not, it just means that what is not won't appear there (because it was reject before), but you still have to explicitly "handle" such cases (by repeating "they cannot appear here", aka assert false).

@ghost
Copy link
Copy Markdown

ghost commented Nov 26, 2015

Please don't merge the structure and signature type entirely. If it make sense to have any signature item in a structure, then we should have something like this:

type structure_item_desc = Pstr_eval of ... | ... | Psig_item of signature_item_desc

and structure items that are exactly the same as signature items (such as Pstr_type) should be dropped in favor of signature items ones.
This way we can at least have signature be proper signature just by definition. This will have the same advantages for error messages and will make the code clearer as it's explicit what is not a valid signature item.

Having the parsetree being close to the real language is extremely valuable. The Camlp4 AST was really loose, and while this indeed increased sharing in a few places, it was often a real nightmare for the code of Camlp4 syntax extensions as you always had to think in terms of what the real OCaml construct was and not in terms of what Camlp4 was giving you. In the end people often ended-up parsing the Camlp4 AST into something closer to the OCaml language...

Switching our Camlp4 codebase to ppx was a breeze for this particular reason: we were able to remove a lot of code that was here just to make sense of the Camlp4 AST. It'd be a shame if we had to go back to this when we need to process a signature.

@alainfrisch
Copy link
Copy Markdown
Contributor

@diml The proposal is not just to use a single type to represent both signature and structure items, but also to be more permissive and allow most signature (resp. structure) items in structures (resp. signature) at the syntactic level. This cannot be fully achieved, in particular because of include statements that share some concrete syntax but with different payload; otherwise, we are quite close.

Are you against this more permissive parsing (whose original motivation is to allow more forms of payloads in attributes/extension nodes), or just against using a single type to represent both structure and signature items even if they are not parsed strictly in the same way?

@ghost
Copy link
Copy Markdown

ghost commented Nov 26, 2015

@alainfrisch, I'm only against having type signature = structure in the AST that is exposed to ppx rewriters. The new parsing is fine I think. AFAIU all we need for the original request is to have signature included in structure, which seems OKish as we can probably make sense of it (like with forward declarations). However the opposite inclusion will cause more bad than good. I think the types used for the parsetree should reflect this one-way inclusion.

@Drup
Copy link
Copy Markdown
Contributor Author

Drup commented Nov 26, 2015

So you would like to restrict the proposal so that signature ⊂ structure but not the other way around. I see. It indeeds still solve my original issue, but I don't see the point. structure items in signatures are quite clearly identified. My experience with ppx is that you identify some construct you are interested in and just copy the rest without caring. What are other people's opinion on that ?

[..] which seems OKish as we can probably make sense of it.

I just want to point out that this proposal is not to "make sense of it" at all. It only allows to parse and have a representation in the parsetree (so that ppx can manipulate it). As far as the compiler goes, those construction don't have a semantic (and are hence rejected).

@ghost
Copy link
Copy Markdown

ghost commented Nov 26, 2015

It has several advantages:

  • a value of type signature is a real signature, not something you need to filter manually
  • things like val x : int are represented by Psig (Psig_value _) instead of Pstr_primitive, which is much cleaner

There are many changes you could do to the parsetree that would be useful to ppx rewriters. The less structured the AST is and the more freedom you get; that's one thing we had with the Camlp4 AST and it was horrible every time you needed to make sense of it.

Nobody requested being able to write expressions in signature, so let's keep them clean.

@Drup
Copy link
Copy Markdown
Contributor Author

Drup commented Nov 26, 2015

Nobody requested being able to write expressions in signature, so let's keep them clean.

Why not. I don't care either way, if other people agree, I will make that change.

@trefis
Copy link
Copy Markdown
Contributor

trefis commented Nov 26, 2015

Just dropping in to say that:

  • I agree with basically everything @diml said
  • I would really like to see the mapping val x : int ==> Pstr_primitive disappear before the release. Either by using @diml's solution or by introducing a dedicated constructor (although that seems a bit silly if we merge this PR for 4.04 and remove the constructor then)

@Drup
Copy link
Copy Markdown
Contributor Author

Drup commented Nov 26, 2015

I would really like to see the mapping val x : int ==> Pstr_primitive disappear before the release.

This would deserve another discussion, since it was not introduced by this patch. Fwiw, I agree.

@alainfrisch
Copy link
Copy Markdown
Contributor

If we go the Pstr_sig (Psig_...) way, much more ppx code will be broken. One could argue that the fix is not very difficult, and that most ppx will be broken anyway for other reasons.

We would still have the rather unpleasant property that Psig_include will be an "impossible" case in a structure in the Parsetree (unless generated by a pp/ppx).

The cleanest approach would be to introduce a notion or sig_or_str_item (syntactically), and have one Pstr_ and on Psig_ constructor pointing to it. This sig_or_str_item would include all signature items except Psig_include I think.

@ghost
Copy link
Copy Markdown

ghost commented Nov 26, 2015

That or we come up with a syntax for including a module type in a structure. Actually without a syntax for it how do you write an include in a payload that is supposed to contain a signature?

@alainfrisch
Copy link
Copy Markdown
Contributor

Using a new syntax is what was originally suggested on Mantis, but it seems unfortunate to add more syntax just for ppx, and it's likely that authors of ppx will end up being forced to support both Pstr_type and Pstr_sig (Psig_type).

@Drup
Copy link
Copy Markdown
Contributor Author

Drup commented Nov 26, 2015

That or we come up with a syntax for including a module type in a structure. Actually without a syntax for it how do you write an include in a payload that is supposed to contain a signature?

I was planning to solve that. A similar (simpler) problem arise with "let module ..." and it's solvable. I wanted feedback before dedicating time to it.

@alainfrisch
Copy link
Copy Markdown
Contributor

I also want to insist that the goal is note to remove structure as Camlp4 did (e.g. allowing multiple "equations" on types); but to be more liberal in the parser to support more kinds of attribute/extension payload, to provide better error message, to reduce code duplication in the parser/printast/untypeast/..., and to ensure that identical syntax is parsed in the same way in signatures and structures (for example, "module X = Y") so as to simplify the understanding of the parsetree.

@alainfrisch
Copy link
Copy Markdown
Contributor

Actually without a syntax for it how do you write an include in a payload that is supposed to contain a signature?

I would actually argue to use a different syntax for include in structure and signatures (for instance, include type S or include sig S in signatures, and keep include in structures), precisely to make the parsing less "contextual". We would of course keep include S in signatures for backward compatibility, but this would at least give an explicit syntax for structures for the only remaining form of sig item that cannot be parsed in structures.

@ghost
Copy link
Copy Markdown

ghost commented Nov 26, 2015

it's likely that authors of ppx will end up being forced to support both Pstr_type and Pstr_sig (Psig_type)

If we have Pstr_sig (Psig_type _) we don't need to have Pstr_type

I also want to insist that the goal is note to remove structure as Camlp4 did [...]

Well if you merge everything you do end up with an AST where lot of things don't make sense. You can achieve the parser improvements without changing the Parsetree.

I would actually argue to use a different syntax for include in structure and signatures

I agree with that

I'm also surprised that nobody suggested just adding a new character for signature payloads, this is so much simpler: [%foo$ <sig>]

@Drup
Copy link
Copy Markdown
Contributor Author

Drup commented Nov 26, 2015

I'm also surprised that nobody suggested just adding a new character for signature payloads, this is so much simpler: [%foo$ ]

In the mantis ticket, that was actually my first idea.

@alainfrisch
Copy link
Copy Markdown
Contributor

If we have Pstr_sig (Psig_type _) we don't need to have Pstr_type

I thought you were proposing to add some syntax to inject arbitrary signature items into structures, syntactically. Something like: "sig ...". Then you could write type t = int and sig type t = int would be two valid structures (syntactically). Would you suggest to represent them in the same way in the Parsetree (as Pstr_type)? If not (to avoid duplication), then the argument Pstr_sig cannot be arbitrary an signature item (otherwise there is no syntax for Pstr_sig (Psig_type)), which means we need to introduce another type.

Well if you merge everything you do end up with an AST where lot of things don't make sense.

Not a lot. Only Pstr_eval, Pstr_include and Psig_include, I think. All other AST corresponds to concrete syntax.

[%foo$ ]

If this is a new form of payload, users of ppx will likely want to be able to write [%foo type t = int] in addition to [%foo$ type t = int], which would put pressure on ppx authors to support both forms, which complexify their code.

@ghost
Copy link
Copy Markdown

ghost commented Nov 26, 2015

I thought you were proposing to add some syntax to inject arbitrary signature items into structures

Ah, no, I was just proposing to do the same as you do in this patch, but with (approximately) these types:

type signature_item_desc = <same as trunk>

type structure_item_desc =
  | Pstr_eval of expression * attributes
        (* E *)
  | Pstr_value of rec_flag * value_binding list
        (* let P1 = E1 and ... and Pn = EN       (flag = Nonrecursive)
           let rec P1 = E1 and ... and Pn = EN   (flag = Recursive)
         *)
  | Pstr_module of module_binding
        (* module X = ME *)
  | Pstr_recmodule of module_binding list
        (* module rec X1 = ME1 and ... and Xn = MEn *)
  | Pstr_modtype of module_type_declaration
        (* module type S = MT *)
  | Pstr_include of include_declaration
        (* include ME *)
  | Pstr_sig of signature_item_desc

and in the parser you would do:

signature_item:
    structure_item { match $1 with Pstr_sig x -> x | _ -> fail ... }

Whenever something is common to structures and signatures, you would use the constructor from signature_item_desc.

If this is a new form of payload, users of ppx will likely want to be able to write [%foo type t = int] in addition to [%foo$ type t = int], which would put pressure on ppx authors to support both forms, which complexify their code.

I'm not sure. For instance with ppx_sexp_conv we can do [%sexp_of: int] but nobody asked for [%sexp_of int]. As long as you get a clear error message from the rewriter it's fine.

@Drup
Copy link
Copy Markdown
Contributor Author

Drup commented Nov 26, 2015

@diml Your proposal would break a lot more ppxs.

@ghost
Copy link
Copy Markdown

ghost commented Nov 26, 2015

Well, 4.03 already break a lot of ppxs so I wouldn't be to worried about that. If you use the compiler internals you have to be ready to follow its evolution.

At least it my proposal has the clear advantage of having stricter types, which will be valuable in the long run.

@Drup
Copy link
Copy Markdown
Contributor Author

Drup commented Nov 26, 2015

@diml The types are not much stricter. In the current solution, the signature type contains 3 constructors that are not signature items. It's very very far from the "flexibility" of camlp4's ast.

Or maybe we can just add a new payload syntax....

@alainfrisch
Copy link
Copy Markdown
Contributor

It's not just about breaking ppx, it's also that the required fix only adds noise to them: {pstr_desc=Pstr_sig(Psig_type ...)}. At this point, it's just better to duplicate the constructors, I'd say (one loose some code sharing, but this is rather minor).

But I still don't see the practical drawbacks of the full merge between signature and structure items in the Parsetree. In which sense would this complicate the writing of ppx or make their code more fragile?

@trefis
Copy link
Copy Markdown
Contributor

trefis commented Nov 27, 2015

it's also that the required fix only adds noise to them

This shouldn't be too bad for people using ppx_metaquot :)

At this point, it's just better to duplicate the constructors

meh.

@alainfrisch
Copy link
Copy Markdown
Contributor

This shouldn't be too bad for people using ppx_metaquot :)

I don't see how it can be used to do anything useful with, say, type declarations (where you typically want to extract the list of type declarations in a variable). A concrete-syntax pattern would work fine for extraction only a set of type declarations with known names, arities (and choice of concrete name for variables) and definitions.

Anyway.

I found [%foo sig .. end] very verbose, but I'm ok with [%%foo$ ...]. I'm not a big fan of adding more cryptic characters (at the risk of having source code looking more and more like comic-style insults), but this might be the easiest move.

Would anyone be opposed to that?

@mshinwell
Copy link
Copy Markdown
Contributor

[%foo sig .. end] seems far clearer to me than [%%foo$ ...].

@alainfrisch
Copy link
Copy Markdown
Contributor

[%foo sig .. end] seems far clearer to me than [%%foo$ ...]

But it's not coherent with [%foo: ...] or [%foo? ...].

@gasche
Copy link
Copy Markdown
Member

gasche commented Nov 30, 2015

I still like [%foo sig <sig> end] better¹, but maybe [%foo:: <sig>], by analogy with [%foo: <type>]? The problem with $ is that it doesn't mean anything signature-related -- while there is a clear intuition for : on types, and arguably for ? on patterns.

¹: I don't think "not coherent with other short syntaxes" is necessarily an issue. There can only be a very finite number of short syntaxes, so if we want to add more it is inevitable that others would be more verbose -- verbose is good when it means "explicit" and "understandable".

@Drup
Copy link
Copy Markdown
Contributor Author

Drup commented Nov 30, 2015

I fully agree with @gasche.

Also, ultimately, the terseness of the syntax is useless, because I also want to be able to write type%foo ... or val%foo .. and in this case, you don't need the discriminating piece of syntax.

I will do a PR with this alternative proposition.

@Drup
Copy link
Copy Markdown
Contributor Author

Drup commented Dec 9, 2015

@alainfrisch @gasche Should we close this one ? I don't have much interest in the feature. I think it may be worthwhile to revisit when working on simplifying the parser.

@alainfrisch
Copy link
Copy Markdown
Contributor

This proposal would still be useful to (i) share some more code in the compiler and (ii) provide better error messages. But this probably not enough to justify the change. Let's close it.

@alainfrisch alainfrisch closed this Dec 9, 2015
@Drup Drup deleted the merge_sig_str branch October 19, 2017 14:38
@Drup Drup restored the merge_sig_str branch October 19, 2017 14:38
@Drup Drup deleted the merge_sig_str branch April 6, 2018 19:07
chambart pushed a commit to chambart/ocaml-1 that referenced this pull request Feb 1, 2022
* Cfg instruction that can raise terminates a block.

* Check that instructions in the middle of the block cannot raise

Co-authored-by: Xavier Clerc <xclerc@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants