Lex identifiers in comments by 314eter · Pull Request #1901 · ocaml/ocaml

314eter · 2018-07-12T14:24:51Z

I don't know whether it's documented anywhere, but it seems that comments, strings and character literals inside comments are lexed so that any code can be commented out. But there is one exception.

(* f' '"' *)

is lexed as the character literal ' ' followed by the unclosed string "' *).

This can be solved by lexing identifiers in comments too.
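To make the exception concrete, here is a toy model of the comment scanner. It is a hypothetical sketch, not the compiler's actual lexer: the names scan and lex_idents are invented. The model handles only nested comments, string literals, one-character literals and, when lex_idents is true, identifiers (which in OCaml may contain apostrophes). It ignores escaped character literals, quoted strings and everything else the real lexer does.

```ocaml
(* Toy model of how a comment body is scanned. Hypothetical sketch: it
   handles nested comments, string literals, one-character literals and,
   when [lex_idents] is true, identifiers. Escapes in character literals
   and quoted strings are not modelled. *)

let is_ident_start c =
  (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c = '_'

(* Identifier characters also include digits and the apostrophe. *)
let is_ident_char c =
  is_ident_start c || (c >= '0' && c <= '9') || c = '\''

(* [scan ~lex_idents s] scans [s], the text following a comment opener.
   It returns [Some i], with [i] just past the matching closer, or [None]
   if the comment or a string inside it never ends. *)
let scan ~lex_idents s =
  let n = String.length s in
  (* Skip a string body; [i] is just past the opening double quote. *)
  let rec skip_string i =
    if i >= n then None
    else
      match s.[i] with
      | '"' -> Some (i + 1)
      | '\\' -> skip_string (i + 2)
      | _ -> skip_string (i + 1)
  in
  let rec comment depth i =
    if i >= n then None
    else if i + 1 < n && s.[i] = '(' && s.[i + 1] = '*' then
      comment (depth + 1) (i + 2)
    else if i + 1 < n && s.[i] = '*' && s.[i + 1] = ')' then
      (if depth = 0 then Some (i + 2) else comment (depth - 1) (i + 2))
    else
      match s.[i] with
      | '"' -> (
          match skip_string (i + 1) with
          | None -> None
          | Some j -> comment depth j)
      | '\'' when i + 2 < n && s.[i + 2] = '\'' && s.[i + 1] <> '\\' ->
          (* A one-character literal: skip all three characters. *)
          comment depth (i + 3)
      | c when lex_idents && is_ident_start c ->
          (* Consume the whole identifier, trailing apostrophes included. *)
          let j = ref (i + 1) in
          while !j < n && is_ident_char s.[!j] do incr j done;
          comment depth !j
      | _ -> comment depth (i + 1)
  in
  comment 0 0

let () =
  (* Text after the opener in the example from the PR description. *)
  let body = " f' '\"' *)" in
  Printf.printf "without identifiers: %b\n" (scan ~lex_idents:false body <> None);
  Printf.printf "with identifiers:    %b\n" (scan ~lex_idents:true body <> None)
```

Run as is, it prints false and then true: without identifier lexing, the apostrophe of f' opens the character literal ' ' and the following double quote starts a string that never closes; with identifier lexing, f' is consumed as one token and '"' is read as a character literal. The same model also shows the trade-off: a comment body such as ' " ' or b'"' flips the other way, because b' now becomes an identifier followed by an unterminated string.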

nojb

LGTM.

damiendoligez

Wait a second:

I'm pretty sure this is not complete.
@xavierleroy will probably have objections too.

xavierleroy · 2018-07-12T17:25:08Z

I guess @damiendoligez summons me so that I can express my usual concern about this topic.

Comments are intended first and foremost for writing prose in natural language. Commenting code out comes second. "Lexing" the contents of comments using the same lexical rules as for OCaml code might improve the "commenting out" behavior, but should not complicate the writing of natural language.

Here, I think @damiendoligez has counterexamples showing that lexing identifiers is not enough, and I would like to be reassured that normal uses of single quotes in text are not impacted.

314eter · 2018-07-12T18:08:52Z

I agree that commenting out code is not what comments are intended for, but comments are used for documentation too, and thus can contain code. In any case, the current situation is that the content of comments is checked for strings and character literals, which maybe already complicates things more than it should, especially if it doesn't even work. I think this simple fix makes it work in all cases without complicating things further, but maybe I'm missing something.

The only use of single quotes in natural language that is affected is when the sequence '"' (single-double-single) occurs, and even then it's only a problem in certain combinations like a'"' or a' '"'.

I stumbled upon it while trying to let Atom highlight comments correctly, which turns out to be rather difficult, and I don't think any editor does it right (but I'm getting there).

damiendoligez · 2018-07-13T11:34:29Z

I don't think any editor does it right

Emacs (caml-font.el) did, at some point before the lexing of number literals was simplified. It should be easier now.

damiendoligez · 2018-07-13T11:40:59Z

Give me a few days to think about completeness and check this against OPAM packages.

pmetzger · 2018-07-13T11:42:25Z

Side note: I almost wish that there was a distinct syntax for commenting out blocks of code, because I sometimes get myself in trouble when I include double quotes and other such things in comments. The fact that comments aren't pure uninterpreted blocks of text is a double-edged sword. It makes it much easier to comment out code, but it makes the behavior of comments as documentation a bit more unexpected.

alainfrisch · 2018-07-13T11:51:07Z

I almost wish that there was a distinct syntax

Not especially lightweight, but you can use quoted strings:

(*{foo|
      blabl"abla
   |foo}*)

pmetzger · 2018-07-13T12:11:13Z

Not especially lightweight, but you can use quoted strings:

I suppose you can, though you need caution if commenting out the last part of a function. More to the point, though, this doesn't fix the fact that a comment like:

(* recognizes: ' " @ + *)

is unexpectedly hard to write.

damiendoligez · 2018-07-13T12:25:51Z

I think Alain's point is that you just write:

(*{| recognizes: ' " @ + |}*)

I.e. this is a "special syntax" for free-form comments, not for commented code.
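A small program illustrating this idea (a sketch; the name recognized is arbitrary). It should compile because the comment lexer skips a quoted string {| ... |} as a unit, so the lone double quote inside it never starts a string literal. The explanatory comments deliberately avoid quotes and apostrophes so that they lex cleanly on any OCaml version.

```ocaml
(* The next comment contains a lone double quote. It is safe only because
   the quote sits inside a quoted string literal, which the comment lexer
   skips as a unit. *)
(*{| recognizes: ' " @ + |}*)

(* The same quoted-string syntax works in ordinary expressions. *)
let recognized = {| ' " @ + |}

let () = print_endline recognized
```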

alainfrisch · 2018-07-13T12:30:09Z (edited by damiendoligez)

I suppose you can, though you need caution if commenting out the last part of a function.

What do you mean?

More to the point, though, this doesn't fix the fact that a comment like:

(*{|
recognizes: ' " @ +
|}*)

314eter · 2018-07-13T12:30:53Z

Not especially lightweight, but you can use quoted strings

This doesn't work for docstrings. Something like "quoted comments" (e.g. (*foo| recognizes: ' " @ + |foo*)) would work, but it's too late for that.

pmetzger · 2018-07-13T12:33:50Z

What do you mean?

I mean that if one is naively writing comments and is used to other languages, one believes that the contents of a comment are free text, but they aren't. You can't just write whatever you feel like inside them. If there was a distinct way to comment out code blocks, this would not have to be the case. But that's not going to happen so it's just me musing.

314eter · 2018-07-13T14:02:27Z

Emacs (caml-font.el) did, at some point before the lexing of number literals was simplified.

I only tried tuareg mode, which does it wrong. caml-font.el handles comments correctly, but is worse than tuareg for other things.

pmetzger · 2018-07-13T15:13:57Z

@314eter Possibly a good reason to file a report with the Tuareg team, pointing them at the code that works correctly.

314eter · 2018-07-14T01:11:14Z

The same problem occurs in ocamllex, where something like

rule token = parse _ { f' '"' }

gives an error, and it can be solved the same way in lex/lexer.mll.

Can I include that in this PR, or is it better to open a new one?

damiendoligez · 2018-07-16T12:36:58Z

I think it's better to open a new PR for ocamllex. This is a language change, while for ocamllex it's a bug fix.

damiendoligez

I do believe this is complete, and a good change. It doesn't break any OPAM package (that can be compiled with trunk).

However, we'll need to update caml-font.el accordingly. I can help with that.

314eter · 2018-07-16T15:38:08Z

I have no experience with emacs and/or lisp, so I'll probably mess something up if I try to update caml-font.el.

314eter · 2018-07-18T13:39:14Z

Fixing caml-font.el was not as difficult as I thought.

pmetzger · 2018-07-18T15:32:53Z

Fixing caml-font.el was not as difficult as I thought.

Would similar fixes to the tuareg code be straightforward?

314eter · 2018-12-03T22:46:29Z

@damiendoligez Do you have time to take a look at this (and maybe #1932) again?

314eter · 2019-02-26T15:44:04Z

I removed caml-font.el and opened ocaml/caml-mode#4 instead.

damiendoligez

Good to merge as soon as Changes is rebased. And apologies for the delay.

314eter · 2019-04-29T10:55:18Z

I rebased.

damiendoligez · 2019-04-30T14:10:43Z

Merged, thanks!

kit-ty-kate · 2020-03-31T21:55:25Z

In ocaml/opam-repository#16114 (comment) the current maintainer of pfff (cc @aryx) hit an issue that seems to have been introduced by this change in the lexer.

For example, the same input gives a different result in OCaml 4.09.0 and 4.10.0:

        OCaml version 4.09.0

# type t = Test of unit (* '"' or b'"' *) * unit list * unit (* '"' *);;
type t = Test of unit * unit list * unit

vs.

        OCaml version 4.10.0

# type t = Test of unit (* '"' or b'"' *) * unit list * unit (* '"' *);;
type t = Test of unit

Is this change intended?

314eter · 2020-03-31T22:29:06Z

That change is intended, since comments are used to comment out OCaml code or to write OCaml code in documentation. In OCaml, b' would be an identifier, followed by the string "' *)...(* '".
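The lexical rule relied on here can be checked in ordinary OCaml code. A minimal sketch, where the function name b' is hypothetical and chosen only to mirror the comment text: since an apostrophe may continue an identifier, b' followed immediately by a string literal is a function application, not the start of a character literal.

```ocaml
(* An identifier may end with apostrophes, so b followed by an apostrophe
   is a single identifier here. *)
let b' s = String.length s

(* Lexed as the identifier b' followed by a string literal, that is, the
   application of b' to that string, not as a character literal. *)
let n = b'"abc"

let () = Printf.printf "%d\n" n
```

It prints 3. Inside a comment, the 4.10 lexer now applies the same rule, which is why the b' in the pfff comment swallows the following double quote as the start of a string.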

Unfortunately, this breaks comments that quote code from other languages, such as PHP. I don't know why it didn't fail to build when @damiendoligez tested this change on opam (#1901 (review)).

kit-ty-kate · 2020-03-31T22:49:50Z

I don't know why it didn't fail to build when @damiendoligez tested this change on opam (#1901 (review)).

The latest version of pfff that was available on opam did not support OCaml >= 4.09, so this particular one wasn't tested.

nojb approved these changes Jul 12, 2018

damiendoligez requested changes Jul 12, 2018

alainfrisch closed this Jul 13, 2018

alainfrisch reopened this Jul 13, 2018

damiendoligez requested changes Jul 16, 2018

Chris00 mentioned this pull request Jul 16, 2018

Comments with string embedded ocaml/tuareg#95

Open

314eter mentioned this pull request Jul 17, 2018

Quoted strings and octal character literals in ocamllex actions #1912

Merged

314eter force-pushed the apostrophe-in-comment branch from 66d32c5 to 496ce53 on July 18, 2018 13:37

314eter mentioned this pull request Jul 26, 2018

Octal character literals and apostrophes in ocamlyacc actions #1932

Merged

314eter force-pushed the apostrophe-in-comment branch from 1bb1924 to b1901cb on December 3, 2018 20:36

314eter mentioned this pull request Feb 26, 2019

Octal character literals and identifiers in comments ocaml/caml-mode#4

Merged

damiendoligez approved these changes Apr 26, 2019

314eter added 4 commits April 26, 2019 15:19

Lex identifiers in comments

4e09ec8

Add changes entry

77b2ad5

Octal character literal in comments

af103fd

Update changes entry

4a72f59

314eter force-pushed the apostrophe-in-comment branch from d0bca98 to 4a72f59 on April 26, 2019 13:23

damiendoligez merged commit 9f904f9 into ocaml:trunk Apr 30, 2019

7 participants
