Attach documentation comments to Parsetree#149
Conversation
|
Note that this patch does not add the Documentation comments in bad positions cause warning 50, which is off by default. |
|
Thanks for preparing this new patch and for explaining the rules. Is it a fair description of this new approach to say that it adds an intermediate layer between the lexer and the parser? This layer is a kind of state machine that monitors new lines and doc comments in the stream of tokens produced by the lexer, removes them from the stream seen by the parser and keeps relevant information "on the side" for later use by parsing actions. Inserting comments in the Parsetree is done more locally than with the previous approach, but otherwise, the idea is similar: doc comments are collected by the lexer but not directly considered as part of the yacc grammar (but inserted in the Parsetree by actions -- for the new version -- or by a postprocessing pass -- in the previous patch).
Can you clarify what it means exactly? For a record field, can the comment appear after the ";"? after the } for the final field? For a constructor, can the comment for a constructor appear after the "|" that follows its declaration? How is the ambiguity resolved for the case of a doc comment after the last constructor in a sum type? Is this case treated by:
How is the following interpreted: Formally, the first doc comments follows immediately the constructor but also the type definition, so according to the rules, it should be attached to both (and thus ignored?). |
It does not add the layer -- the layer was already there for storing comments in cmt files. It extends the layer to store information about the relationship between newlines and comments. Otherwise that is a fair description of what is going on.
It is useful to think of the comment information as being passed to the parser in the same way that locations are given to the parser. Unfortunately, ocamlyacc does not let you change the information which is passed in this way so we have to use a side-channel instead.
Actually, they are quite different. The old patch did not collect comments in the lexer, it collected them by re-lexing the source using a specialized second lexer.
The
The rules can be considered as given in order of precedence. First we try to attach comments to fields and constructors, then we try to attach comments to items, then we try to treat them as floating comments. |
Just to be sure I understand type t = A | B
(** The type for bla *) will attach the comment to type t = A | B (** *)
(** The type for bla *)Which is fine by me, it seems to me that APIs exposing regular variants are not that widspread and the "error" can be corrected on a case by case basis. |
|
(actually IIRC this behaviour the one implemented by ocamldoc). |
Yes. Another option is: |
parsing/comments.ml
Outdated
There was a problem hiding this comment.
Please be a bit more careful with the copyright headers. I know it's annoying to have to pay attention to them, but it's not that hard.
There was a problem hiding this comment.
I forgot about those. I've fixed the headers of comments.ml and comments.mli in a new commit.
The layer existed only to collect comments and store them in the .cmt file; there was no "logic" in it, and it did not impact the Parsetree (correct me if I'm wrong). Now it is a non-trivial component which defines part of the mapping from the source code to the Parsetree. I'm fine with the approach, but let's not hide the fact that it increases significantly the complexity of this mapping and the ways it could be documented (even semi-formally). I still believe the future of the ecosystem would be brighter if we could converge -- after a migration period to a more rigid system -- (different syntaxes, stricter placement rules) that could be recognized by minor additions to the existing yacc grammar (as would be achieved by providing a lighter syntax for But I've to recognize that I seem to be the only one concerned by the extra complexity added by your proposal... And that being said, I find the current version much more acceptable than the previous one. |
I think that is a reasonable position to take, but I think that it would still be very useful to be able to improve the tooling for existing code bases using the existing syntax -- hence the need for this patch. |
Then what about introducing the more rigid system now? We could support both modes in the compiler (and mark the legacy mode as deprecated at some point), and also provide a tool to automate the transition (the tool could recognize doc comments using the same code as the current ocamldoc, and patch the source just enough to remove those comments and re-insert them at the proper place and with the proper syntax, and perhaps adapt whitespaces/newlines with some heuristics). |
I only said that I thought your position was reasonable -- not that I agreed with it. I am not convinced that it is worth the trouble of adding a new syntax just to avoid using comments for documentation. Why not just use Besides, agreeing on a new syntax takes time, and I would very much like to include this patch in 4.02.2 so that codoc can be used as soon as possible. |
parsing/lexer.mll
Outdated
There was a problem hiding this comment.
Could we factorize the very similar logic in the three cases above?
|
Forgetting the debate about the observable behavior for a while, I had a closer look at the implementation. I'm not fond of the way the parser semantic actions are changed by this patch. I find the changes to the lexer extremely reasonable, but the parser change look somewhat invasive and a bit fragile (it's easy to forget to do the right thing when changing the grammar, and thus I would be worried that it could complicate grammar maintenance). I must say that on this specific aspect I liked the approach of #51 better: traverse the constructed AST after-the-fact to implement the node-specific attachment rules. In some sense it is more declarative, while the current implementation is side-effect-based (and the effects are interleaved with the parsing actions). I don't know whether the move from declarative to imperative attachment is related to your change in attachment rules: did the new approach become easier to implement? I see that the new patch also has better ways to warn the user when the comments are not at the right place; are those easier with the more imperative style? |
Firstly, it is important to note that this patch makes only cosmetic changes the parser grammar (I changed some left-recursion to right-recursion to improve the consistency of the code). All important parser changes are to the parser actions. These actions appear imperative, but are in essence declarative. They take the exact same approach as that used to attach locations to the AST. Unfortunately, you can't pass arbitrary information through ocamlyacc as location information so instead it goes via a side-channel (a table mapping locations to the comment information).
The new approach is -- in my opinion -- vastly superior to the old one because it is implemented in terms of the concrete syntax rather than the abstract syntax. This makes it much easier to understand and maintain. |
|
The kind of issues I'm thinking of is for example in this code, you have to remember to use Similarly, I wonder whether we could abstract away on this pattern where you have either just
That is a good point, thanks. |
That one is a bit unfortunate. There is essentially one remaining corner case with the new rules -- comments after the last constructor of a variant definition. This code handles that case, so I would expect it to be a little annoying. Are there other examples which look hard to maintain? I suspect there are, but I reckon that most of them can be simplified with some thought (unlike the above example which seems inherently awkward). |
|
I propose to evolve "docstrings" into more rigid placement rules, which would integrate more smoothly with the language grammar. A first proposal has been posted there: http://caml.inria.fr/mantis/view.php?id=6811 The new mode would be optional, disabled by default for some OCaml releases, then enabled by default, and finally made mandatory at some point. In the Mantis ticket, I suggest two important migration paths:
|
|
@alainfrisch As I told in mantis I'm not sure this is a very good idea. When I'm playing with code/designing I move things a lot around, comment out some parts, uncomment part of it back, etc. and when I do that I don't think I want to be bothered about rigid documentation placing rules that would make my parsing fail. |
|
@dbuenzli: see Mantis for a discussion on this topic of error recovery to turn errors caused by badly located doc comments into warnings instead of fatal errors, even in strict-doc mode. That said, I'm not sure how much code refactoring would actually suffer in practice from fatal errors caused by badly located doc comments (we are not talking about arbitrary comments, just doc comments, which typically don't occur within function bodies). It should be quite easy to fix such an error, and it's better done during refactoring than after the fact. At worst, users can always turn off strict-doc during an intensive refactoring if they are annoyed by error. But I agree that if we could get only warnings and not a fatal error on the first badly located doc comment, it's nice, even though this might not justify a much increased complexity in the system. |
I don't know either, but I think this should be really experimented with to prove that it won't be annoying. For example I guess this would fail: type t = [ A (** doc a *) (* | B *) (** doc b *) | C (** doc c *) ]I'm not sure I want to have to bother here that I need to comment the B case with its own docstring.
I tend to compile with the compiler's default warning levels (except for a few exceptional sources tweaking warning levels is annoying, you want to use the compiler's defaults) and any triggered warning always gets all my attention (non exhaustive pattern matches are sadly a warning) so this would still be noise to me. Also IIRC the compiler doesn't report errors before warnings so when you use things like emacs' compilation mode you have to jump over all the warnings before getting to the error which is annoying. In general I'm suspicious of the whole warning idea and not really in favour of adding more of them. |
|
|
In response to feedback that the patch was too heavy on changes to parser.mly, I have updated the patch. This update changes the failure behaviour of ambiguous comments like: type t Previously this would raise a warning and ignore the comment. Now this raises a warning and attaches the comment to both items. This changes allows us to remove all context-dependent aspects of the attachment rules, significantly reducing the complexity of the code in parser.mly. I think that the code in parser.mly should now be simple to read and maintain for anyone familiar with how the existing parser handles locations and attributes. This updated patch has only had cursory testing, so it probably still needs a bit of work. |
|
@lpw25, the updated patch seems to have a merge conflict against the 4.02 branch |
Fixed (I think) |
That's precisely the problem, I can't forget about it since I will have to put that preference it in each of my makefiles. Another annoyance is that warning preferences changes the experience of the language from one person to the other and lead to discussions like this. |
|
@dbuenzli For the discussion of my "alternative" proposal, I suggest to continue on Mantis (the discussion about warnings also concern Leo's proposal). (I use quotes around "alternative", because supporting both proposals would provide a smooth migration strategy.)
Yes, clearly, this needs some experimentation.
This would work "by chance", with two doc comments attached to "A".
Doesn't it seem a good idea to get some feedback from the compiler for badly located doc comments? This can be annoying during refactoring, but especially if we go for stricter rules, it makes sense to have the compiler enforce them (at least as warnings).
It should not be very difficult to collect warnings and report them after type-checking (in case of success or error). |
git-svn-id: http://caml.inria.fr/svn/ocaml/trunk@15968 f963ae5c-01c2-4b8c-9fe0-0dff7051ff02
|
I ran some quick tests of the performance of the patch by essentially doing These are obviously not very accurate, but they indicate that the patch does not cause any major regressions in terms of compilation speed. |
git-svn-id: http://caml.inria.fr/svn/ocaml/trunk@15968 f963ae5c-01c2-4b8c-9fe0-0dff7051ff02
git-svn-id: http://caml.inria.fr/svn/ocaml/trunk@15968 f963ae5c-01c2-4b8c-9fe0-0dff7051ff02
|
Merged in 4.02 branch |
|
Mmh I don't see it in the commits yet. Is there any problems ? |
|
I'm not on the list of known commiters for the svn-to-git transformation. Perhaps this is breaking it. @avsm might know how to fix it. |
|
@lpw25 wasn't in the authors.txt; fixed in ocaml/ocaml.org-scripts@76854df |
What's the plan for merging this in trunk? |
|
As for all changes: by default I will merge it after the 4.02.2 release, but if you need it sooner ask @lpw25 to merge it now (and make my life easier in the process...) |
|
If this can make your life easier, it seems fair to ask @lpw25 to do the merge sooner than later, since he insisted to push the change to 4.02.2. Merging in trunk will also reduce risks of future conflicts and give larger visibility to the change (for instance for people who track trunk, as LexiFi does). |
|
I'll try and find time to push it to trunk this week |
|
Ping? |
This slipped down my list of things to do. I hope to find time in the next week or so. |
|
@lpw25 I'll wait for that before I merge 4.02.2 into trunk. |
|
Committed to trunk (revision 16189). |
The old domain termination logic was roughly:
- turn off incoming interrupts
- empty minor heap / mark stack
- quit
This had a subtle race condition: after incoming interrupts are
turned off, this domain no longer participates in STW heap cycling.
So, should a major GC cycle end just after we turn off incoming
interrupts, the GC colours can flip as we're emptying the mark
stack, causing segfaults / corruption.
The fix is to have domain termination logic like this:
loop {
- empty minor heap / mark stack
- handle incoming interrupts
} until no more incoming interrupts
- turn off incoming interrupts
- quit
This patch is a new version of #51. There are a number of differences between this patch and the old one:
The rules for attaching comments to nodes are as follows:
[@@@ ...]attributes).The main differences between these rules and the existing ocamldoc rules are:
This patch has only been tested very briefly, and needs more testing before merging, but I thought that I would make a pull request to get early feedback (and maybe some help testing).
Note that the change in approach means that this patch will not work for people using camlp4, there would need to be a corresponding patch for camlp4. Alternatively, the old version of the patch could be used to create a tool that would generate .cmt(i) files containing documentation comments from sources using camlp4.