Lex identifiers in comments#1901
Conversation
damiendoligez
left a comment
There was a problem hiding this comment.
Wait a second:
- I'm pretty sure this is not complete.
- @xavierleroy will probably have objections too.
|
I guess @damiendoligez summons me so that I can express my usual concern about this topic. Comments are intended first and foremost for writing prose in natural language. Commenting code out comes second. "Lexing" the contents of comments using the same lexical rules as for OCaml code might improve the "commenting out" behavior, but should not complicate the writing of natural language. Here, I think @damiendoligez has counterexamples showing that lexing identifiers is not enough, and I would like to be reassured that normal uses of single quotes in text is not impacted. |
|
I agree that commenting out code is not what comments are intended for, but comments are used for documentation too, and thus can contain code. In any case, the current situation is that the content of comments is checked for strings and character literals, which maybe already complicates things more than it should, especially if it doesn't even work. I think this simple fix makes it work in all cases without complicating things further, but maybe I'm missing something. The only use of single quotes in natural language that is affected, is when the sequence I stumbled upon it while trying to let Atom highlight comments correctly, which turns out to be rather difficult, and I don't think any editor does it right (but I'm getting there). |
Emacs (caml-font.el) did, at some point before the lexing of number literals was simplified. It should be easier now. |
|
Give me a few days to think about completeness and check this against OPAM packages. |
|
Side note: I almost wish that there was a distinct syntax for commenting out blocks of code, because I sometimes get myself in trouble when I include double quotes and other such things in comments. The fact that comments aren't pure uninterpreted blocks of text is a double edged sword. It makes it much easier to comment out code, but it makes the behavior of comments as documentation a bit more unexpected. |
Note especially lightweight, but you can use quoted strings: (*{foo|
blabl"abla
|foo}*) |
I suppose you can, though you need caution if commenting out the last part of a function. More to the point, though, this doesn't fix the fact that a comment like: is unexpectedly hard to write. |
|
I think Alain's point is that you just write: I.e. this is a "special syntax" for free-form comments, not for commented code. |
What do you mean?
(*{| |
This doesn't work for docstrings. Something like "quoted comments" (e.g. |
I mean that if one is naively writing comments and is used to other languages, one believes that the contents of a comment are free text, but they aren't. You can't just write whatever you feel like inside them. If there was a distinct way to comment out code blocks, this would not have to be the case. But that's not going to happen so it's just me musing. |
I only tried tuareg mode, which does it wrong. |
|
@314eter Possibly a good reason to file a report with the Tuareg team, pointing them at the code that works correctly. |
|
The same problem occurs in ocamllex, where something like gives an error, and it can be solved the same way in Can I include that in this PR, or is it better to open a new one? |
|
I think it's better to open a new PR for ocamllex. This is a language change, while for ocamllex it's a bug fix. |
damiendoligez
left a comment
There was a problem hiding this comment.
I do believe this is complete, and a good change. It doesn't break any OPAM package (that can be compiled with trunk).
However, we'll need to update caml-font.el accordingly. I can help with that.
|
I have no experience with emacs and/or lisp, so I'll probably mess something up if I try to update |
66d32c5 to
496ce53
Compare
|
Fixing |
Would similar fixes to the tuareg code be straightforward? |
1bb1924 to
b1901cb
Compare
|
@damiendoligez Do you have time to take a look at this (and maybe #1932) again? |
|
I removed |
damiendoligez
left a comment
There was a problem hiding this comment.
Good to merge as soon as Changes is rebased. And apologies for the delay.
d0bca98 to
4a72f59
Compare
|
I rebased. |
|
Merged, thanks! |
|
In ocaml/opam-repository#16114 (comment) the current maintainer of For example, comparing the same code with OCaml 4.09.0 and 4.10.0, we get a different result: vs. Is this change intended? |
|
That change is intended, since comments are used to comment out OCaml code or to write OCaml code in documentation. In OCaml, Unfortunately, this breaks in other languages like PHP. I don't know why it didn't fail to build when @damiendoligez tested this change on opam (#1901 (review)). |
The latest version of |
I don't know whether it's documented anywhere, but it seems comments, strings and character literals in comments are lexed such that all code can be commented. But there is one exception.
(* f' '"' *)is lexed as the character literal
' 'followed by the unclosed string"' *).This can be solved by lexing identifiers in comments too.