Merge signature and structure (parsing-wise)#312
Conversation
git-svn-id: http://caml.inria.fr/svn/ocaml/branches/merge_sig_str@15623 f963ae5c-01c2-4b8c-9fe0-0dff7051ff02
git-svn-id: http://caml.inria.fr/svn/ocaml/branches/merge_sig_str@15624 f963ae5c-01c2-4b8c-9fe0-0dff7051ff02
… and structures. The type-checker rejects rebinding in signatures. git-svn-id: http://caml.inria.fr/svn/ocaml/branches/merge_sig_str@15625 f963ae5c-01c2-4b8c-9fe0-0dff7051ff02
git-svn-id: http://caml.inria.fr/svn/ocaml/branches/merge_sig_str@15627 f963ae5c-01c2-4b8c-9fe0-0dff7051ff02
…es conflicts in the grammar, I don't understamd why at this point.) git-svn-id: http://caml.inria.fr/svn/ocaml/branches/merge_sig_str@15628 f963ae5c-01c2-4b8c-9fe0-0dff7051ff02
git-svn-id: http://caml.inria.fr/svn/ocaml/branches/merge_sig_str@15629 f963ae5c-01c2-4b8c-9fe0-0dff7051ff02
In particular, the following code is rejected. let module M : T in ...
With this patch, ";;" is always needed between a structure and a directive (and nowhere else). ";;" can still be added at will any number of times between structure items and between directives.
|
(The CI fails is camlp4, for obvious reasons) |
|
I'm a bit concerned by the impact on exhaustiveness when doing typechecking or dependency analysis. |
|
It should be noted that this proposal would enable more code sharing (Typedtree, untypeast, and even possibly the code to type-check signatures and structures). It would also quite easily enable to represent module aliases in signatures more directly and avoid Pmty_alias.
What do you mean? The code to typecheck signatures, for instance, lists explicitly which kinds of items are not supported in this context. |
Right... and wrong. You end up moving syntactic checks to the type checking Also, code sharing means that you must read the code twice, in different
Exactly this: you cannot just say, let us handle all the cases in the type, |
|
Re Mpty_alias: Removing Pmty_alias prevents ppx from generating parsetrees that does not have a concrete syntax, or could even confuse the type-checker (does it currently ensure that Pmty_alias only occur in admissible contexts?). Also, it simplifies the explanation of the Parsetree. All this helps everyone working on the Parsetree (fewer different constructions and invariants to maintain). As for sharing the actual type-checking code and Typedtree, I agree this probably needs some more thinking. Contrary to Pmty_alias which could be done soon and easily, this is of course a much bigger decision.
Indeed, but this needs to be done only once in the type-checker, and this is not particularly difficult. I don't see how this could possibly go wrong. My point is that with ppx and other tools, there are many "clients" of the Parsetree representation in addition to the typechecker (and ocamldep), and so it becomes worth putting more efforts in the type-checker to simplify the definition and implicit invariants of the Parsetree representation. |
|
I would argue this is too late for 4.03. |
I also fail to see the issue. As you can see in the patch, you just need to reject the irrelevant cases in typemod/typedecl, early in typechecking, and that is all. Would you prefer a pre-filter pass before calling the typechecker, in order to reject invalid forms ?
I think the risk is not that high (because we have opam to check) and I was hoping we would have a clean complete break for ppxs on 4.03, but yes, it's a bit late. |
A prefilter pass doesn't mean you don't have to think about what is legit or not, it just means that what is not won't appear there (because it was reject before), but you still have to explicitly "handle" such cases (by repeating "they cannot appear here", aka |
|
Please don't merge the type structure_item_desc = Pstr_eval of ... | ... | Psig_item of signature_item_descand structure items that are exactly the same as signature items (such as Having the parsetree being close to the real language is extremely valuable. The Camlp4 AST was really loose, and while this indeed increased sharing in a few places, it was often a real nightmare for the code of Camlp4 syntax extensions as you always had to think in terms of what the real OCaml construct was and not in terms of what Camlp4 was giving you. In the end people often ended-up parsing the Camlp4 AST into something closer to the OCaml language... Switching our Camlp4 codebase to ppx was a breeze for this particular reason: we were able to remove a lot of code that was here just to make sense of the Camlp4 AST. It'd be a shame if we had to go back to this when we need to process a signature. |
|
@diml The proposal is not just to use a single type to represent both signature and structure items, but also to be more permissive and allow most signature (resp. structure) items in structures (resp. signature) at the syntactic level. This cannot be fully achieved, in particular because of include statements that share some concrete syntax but with different payload; otherwise, we are quite close. Are you against this more permissive parsing (whose original motivation is to allow more forms of payloads in attributes/extension nodes), or just against using a single type to represent both structure and signature items even if they are not parsed strictly in the same way? |
|
@alainfrisch, I'm only against having |
|
So you would like to restrict the proposal so that
I just want to point out that this proposal is not to "make sense of it" at all. It only allows to parse and have a representation in the parsetree (so that ppx can manipulate it). As far as the compiler goes, those construction don't have a semantic (and are hence rejected). |
|
It has several advantages:
There are many changes you could do to the parsetree that would be useful to ppx rewriters. The less structured the AST is and the more freedom you get; that's one thing we had with the Camlp4 AST and it was horrible every time you needed to make sense of it. Nobody requested being able to write expressions in signature, so let's keep them clean. |
Why not. I don't care either way, if other people agree, I will make that change. |
|
Just dropping in to say that:
|
This would deserve another discussion, since it was not introduced by this patch. Fwiw, I agree. |
|
If we go the We would still have the rather unpleasant property that Psig_include will be an "impossible" case in a structure in the Parsetree (unless generated by a pp/ppx). The cleanest approach would be to introduce a notion or sig_or_str_item (syntactically), and have one Pstr_ and on Psig_ constructor pointing to it. This sig_or_str_item would include all signature items except Psig_include I think. |
|
That or we come up with a syntax for including a module type in a structure. Actually without a syntax for it how do you write an include in a payload that is supposed to contain a signature? |
|
Using a new syntax is what was originally suggested on Mantis, but it seems unfortunate to add more syntax just for ppx, and it's likely that authors of ppx will end up being forced to support both |
I was planning to solve that. A similar (simpler) problem arise with "let module ..." and it's solvable. I wanted feedback before dedicating time to it. |
|
I also want to insist that the goal is note to remove structure as Camlp4 did (e.g. allowing multiple "equations" on types); but to be more liberal in the parser to support more kinds of attribute/extension payload, to provide better error message, to reduce code duplication in the parser/printast/untypeast/..., and to ensure that identical syntax is parsed in the same way in signatures and structures (for example, "module X = Y") so as to simplify the understanding of the parsetree. |
I would actually argue to use a different syntax for |
If we have
Well if you merge everything you do end up with an AST where lot of things don't make sense. You can achieve the parser improvements without changing the Parsetree.
I agree with that I'm also surprised that nobody suggested just adding a new character for signature payloads, this is so much simpler: |
In the mantis ticket, that was actually my first idea. |
I thought you were proposing to add some syntax to inject arbitrary signature items into structures, syntactically. Something like: "sig ...". Then you could write
Not a lot. Only Pstr_eval, Pstr_include and Psig_include, I think. All other AST corresponds to concrete syntax.
If this is a new form of payload, users of ppx will likely want to be able to write |
Ah, no, I was just proposing to do the same as you do in this patch, but with (approximately) these types: type signature_item_desc = <same as trunk>
type structure_item_desc =
| Pstr_eval of expression * attributes
(* E *)
| Pstr_value of rec_flag * value_binding list
(* let P1 = E1 and ... and Pn = EN (flag = Nonrecursive)
let rec P1 = E1 and ... and Pn = EN (flag = Recursive)
*)
| Pstr_module of module_binding
(* module X = ME *)
| Pstr_recmodule of module_binding list
(* module rec X1 = ME1 and ... and Xn = MEn *)
| Pstr_modtype of module_type_declaration
(* module type S = MT *)
| Pstr_include of include_declaration
(* include ME *)
| Pstr_sig of signature_item_descand in the parser you would do: signature_item:
structure_item { match $1 with Pstr_sig x -> x | _ -> fail ... }Whenever something is common to structures and signatures, you would use the constructor from
I'm not sure. For instance with |
|
@diml Your proposal would break a lot more ppxs. |
|
Well, 4.03 already break a lot of ppxs so I wouldn't be to worried about that. If you use the compiler internals you have to be ready to follow its evolution. At least it my proposal has the clear advantage of having stricter types, which will be valuable in the long run. |
|
@diml The types are not much stricter. In the current solution, the Or maybe we can just add a new payload syntax.... |
|
It's not just about breaking ppx, it's also that the required fix only adds noise to them: {pstr_desc=Pstr_sig(Psig_type ...)}. At this point, it's just better to duplicate the constructors, I'd say (one loose some code sharing, but this is rather minor). But I still don't see the practical drawbacks of the full merge between signature and structure items in the Parsetree. In which sense would this complicate the writing of ppx or make their code more fragile? |
This shouldn't be too bad for people using ppx_metaquot :)
meh. |
I don't see how it can be used to do anything useful with, say, type declarations (where you typically want to extract the list of type declarations in a variable). A concrete-syntax pattern would work fine for extraction only a set of type declarations with known names, arities (and choice of concrete name for variables) and definitions. Anyway. I found Would anyone be opposed to that? |
|
[%foo sig .. end] seems far clearer to me than [%%foo$ ...]. |
But it's not coherent with |
|
I still like ¹: I don't think "not coherent with other short syntaxes" is necessarily an issue. There can only be a very finite number of short syntaxes, so if we want to add more it is inevitable that others would be more verbose -- verbose is good when it means "explicit" and "understandable". |
|
I fully agree with @gasche. Also, ultimately, the terseness of the syntax is useless, because I also want to be able to write I will do a PR with this alternative proposition. |
|
@alainfrisch @gasche Should we close this one ? I don't have much interest in the feature. I think it may be worthwhile to revisit when working on simplifying the parser. |
|
This proposal would still be useful to (i) share some more code in the compiler and (ii) provide better error messages. But this probably not enough to justify the change. Let's close it. |
* Cfg instruction that can raise terminates a block. * Check that instructions in the middle of the block cannot raise Co-authored-by: Xavier Clerc <xclerc@users.noreply.github.com>
Beginning of the discussion and original motivation are in mantis ticket 6681.
The main original issue was that it is not possible to write the following code inside a
.mli:The payload of an extension must be a structure. The recent introduction of val into structure doesn't solve this, I want to be able to have any signature items in my payload.
After discussion, @alainfrisch proposed (and implemented) to merge structures and signatures in the parser, so that the payload is allowed more freedom. Wrongly formated parsetrees are rejected with a new error: "This kind of item is not valid in structures" (resp. "signatures") (Improvements on the wording are very welcome). This also means that the syntax error messages are oh-so-slightly better. :)
This leads to a conflict in the grammar, which is solved by being more strict in what the
#usedirective parses: a ";;" must always be introduced between a structure and a directive. See the test case for an example. This was already the case for a significant part of the structure items (in particular, thelet). I lack data to see how much breakage this would cause. If it's considered too much, we may be able to refine that restriction.The new form of the parsetree doesn't break ppxs as much as you could think (at least ppx_deriving, lwt, and jsoo are left unharmed, I didn't tried janestreet's ppxs since there are independent breakages due to 4.03, but I doubt it will be terrible).
test passes, it bootstraps and I tried all the large camlp4-independent codebases I could find. Generated documentation also seems correct on lwt. I advise reading commit by commit.