Skip to content

[lexer] keyword protected quotation token for arbitrary text#9733

Merged
ppedrot merged 5 commits intorocq-prover:masterfrom
gares:quotations
Mar 31, 2019
Merged

[lexer] keyword protected quotation token for arbitrary text#9733
ppedrot merged 5 commits intorocq-prover:masterfrom
gares:quotations

Conversation

@gares
Copy link
Copy Markdown
Member

@gares gares commented Mar 10, 2019

This PR implements a feature I could use in elpi. Today the only token that can carry arbitrary text is STRING that suffers from escaping shortcomings. This PR implements a class of tokens that can carry arbitrary text, so that 3rd party languages can be easily embedded in .v files.

From the last commit:
One can now register a quotation using a grammar rule with
QUOTATION "name:". In that case name: becomes a keyword and the token is
generated for name: followed by a an identifier or a parenthesized
text. Eg

  constr:x
  string:[....]
  ltac:(....)
  ltac:{....}

The delimiter is made of 1 or more occurrences of the same parenthesis,
eg ((.....)) or [[[[....]]]]. The idea being that if the text happens to
contain the closing delimiter, one can make the delimiter longer and avoid
confusion (no escaping). Eg

string:[[ .. ']' .. ]]

Nesting the delimiter is allowed, eg ((..((...))..)) is OK.

The text inside the quotation is returned as a string (including the
parentheses), so that a third party parser can take care of it.

Keywords don't need to end in :.

@gares gares changed the title Quotations [lexer] keywrod protected quotation token for arbitrary text Mar 10, 2019
@JasonGross JasonGross changed the title [lexer] keywrod protected quotation token for arbitrary text [lexer] keyword protected quotation token for arbitrary text Mar 10, 2019
@ppedrot
Copy link
Copy Markdown
Member

ppedrot commented Mar 11, 2019

Note that this is probably going to conflict with #8764, it might be worthwhile to factor out the common parts.

@gares
Copy link
Copy Markdown
Member Author

gares commented Mar 11, 2019

I'm fine taking the conflicts, but #8764 seems to have stalled, I'll ping.

@gares
Copy link
Copy Markdown
Member Author

gares commented Mar 11, 2019

Wrt the changes, they are quite orthogonal as far as I can tell.

@proux01
Copy link
Copy Markdown
Contributor

proux01 commented Mar 11, 2019

Wrt the changes, they are quite orthogonal as far as I can tell.

Indeed, but Pierre Marie is right in that some conflicts may arise (c.f. 4bc83c8).

@gares
Copy link
Copy Markdown
Member Author

gares commented Mar 11, 2019

Sure. I was just offering to give you precedence. My change is smaller and it is hence easier to rebase. Of course this works for me only if you push your PR ahead, hence my ping.

@gares gares marked this pull request as ready for review March 11, 2019 16:47
@gares gares requested review from a team, ejgallego and mattam82 as code owners March 11, 2019 16:47
@gares gares added the needs: documentation Documentation was not added or updated. label Mar 11, 2019
@gares gares added this to the 8.10+beta1 milestone Mar 11, 2019
@gares
Copy link
Copy Markdown
Member Author

gares commented Mar 11, 2019

OK, I've tested it in elpi and it works for me. I've added the needs:documentation because I don't know where to put it (the commit message/PR header contains quite some doc already).

@gares
Copy link
Copy Markdown
Member Author

gares commented Mar 11, 2019

@ppedrot BTW some of this code and/or the delimiting convention could be used in coqpp too. I had to write ... -> { ... '\x7b' ... } somewhere in elpi in a grammar extend, since '}' would confuse coqpp by terminating the ML block too early. The convention proposed here would have let me just write ... -> {{ ... '}' ... }}.

@gares
Copy link
Copy Markdown
Member Author

gares commented Mar 15, 2019

ping @coq/parsing-maintainers (and @coq/doc-maintainers to tell me where I should put the doc of the QUOTATION token, especially the non-escaping rules).

@Zimmi48
Copy link
Copy Markdown
Member

Zimmi48 commented Mar 15, 2019

This should probably go in the “Syntax extension” chapter (https://coq.inria.fr/refman/user-extensions/syntax-extensions.html) if it impacts the users directly. If it only concerns the plugin writers, then it should go in the non-existing plugin writing documentation chapter 😈. Until then, you can just create a new Markdown file in dev/doc.

@Zimmi48
Copy link
Copy Markdown
Member

Zimmi48 commented Mar 15, 2019

The index in https://github.com/coq/coq/tree/master/dev#miscellaneous-information-about-the-code-devdoc should be updated if you create a new file in dev/doc.

@gares
Copy link
Copy Markdown
Member Author

gares commented Mar 18, 2019

It does not concern users, unless they use an extension language that uses the feature.
So I'll write the comment in CLexer or Tok for now.

@gares gares removed the needs: documentation Documentation was not added or updated. label Mar 18, 2019
@gares
Copy link
Copy Markdown
Member Author

gares commented Mar 18, 2019

ping @coq/parsing-maintainers for the review

@gares gares added kind: enhancement Enhancement to an existing user-facing feature, tactic, etc. needs: review labels Mar 20, 2019
Comment thread gramlib/plexing.mli
Comment thread parsing/cLexer.mli

(** This should be functional but it is not due to the interface *)
val add_keyword : string -> unit
val add_keyword : ?quotation:starts_quotation -> string -> unit
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would be clearer to have two functions ? add_keyword and add_quotation ?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that, but in the current model a quotation is always initiated by a keyword. So the API would look like

val add_keyword : string -> unit
val add_keyword_for_quotation : string -> unit

that is pretty much what we have in the patch...

Copy link
Copy Markdown
Contributor

@ejgallego ejgallego left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not expert enough in this code, I let @ppedrot or @herbelin handle this.

I saw nothing wrong.

Copy link
Copy Markdown
Member

@Zimmi48 Zimmi48 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation is good, I only have a minor comment.

Comment thread parsing/cLexer.mli Outdated
@gares
Copy link
Copy Markdown
Member Author

gares commented Mar 21, 2019

This till lacks an assignee. @ppedrot can you take it?

@gares
Copy link
Copy Markdown
Member Author

gares commented Mar 26, 2019

ping

@gares
Copy link
Copy Markdown
Member Author

gares commented Mar 28, 2019

ping @ppedrot (time for 8.10 is running out)

@ppedrot
Copy link
Copy Markdown
Member

ppedrot commented Mar 30, 2019

@gares Can you rebase? I'll merge it after.

gares added 5 commits March 31, 2019 14:33
Tokens were having a double role:
- the output of the lexer
- the items of grammar entries, especially terminals

Now tokens are the output of the lexer, and this paves the way for
using a richer data type, eg including Loc.t

Patterns, as in Plexing.pattern, only represent patterns (for tokens)
and now have a bit more structure (eg the wildcard is represented
as None, not as "", while a regular pattern for "x" as Some "x")
One can now register a quotation using a grammar rule with
QUOTATION("name:"). "name:" becomes a keyword and the token is
generated for name: followed by a an identifier or a parenthesized
text. Eg

  constr:x
  string:[....]
  ltac:(....)
  ltac:{....}

The delimiter is made of 1 or more occurrences of the same parenthesis,
eg ((.....)) or [[[[....]]]]. The idea being that if the text happens to
contain the closing delimiter, one can make the delimiter longer and avoid
confusion (no escaping). Eg

  string:[[ .. ']' .. ]]


Nesting the delimiter is allowed, eg ((..((...))..)) is OK.

The text inside the quotation is returned as a string (including the
parentheses), so that a third party parser can take care of it.

Keywords don't need to end in ':'.
@gares
Copy link
Copy Markdown
Member Author

gares commented Mar 31, 2019

I did rebase the overlay, the error seems from another pr

@Zimmi48
Copy link
Copy Markdown
Member

Zimmi48 commented Mar 31, 2019

The error seems very related to this PR to me:

File "./src/Rewriter.v", line 1347, characters -3739--3739:
Warning: Not interpreting "*)" as the end of current non-terminated comment
because it occurs in a non-terminated string of the comment.
[comment-terminator-in-string,parsing]
File "./src/Rewriter.v", line 3585, characters -1686-0:
Error: Syntax Error: Lexer: Unterminated string

@gares
Copy link
Copy Markdown
Member Author

gares commented Mar 31, 2019

Ah ok, the Ltac2 one is fixed now.
The one of Fiat seems indeed related, I look into it tomorrow. It is still weird that it passed CI a week ago...

@proux01
Copy link
Copy Markdown
Contributor

proux01 commented Mar 31, 2019

Even weirder, I just rebased #9815 over this PR and CI passes (with just a simple overlay for ltac2).

@gares
Copy link
Copy Markdown
Member Author

gares commented Mar 31, 2019

Restarted, and now it is green... I guess they pushed a bad commit

@ppedrot ppedrot merged commit f832476 into rocq-prover:master Mar 31, 2019
ppedrot added a commit that referenced this pull request Mar 31, 2019
…ry text

Ack-by: Zimmi48
Ack-by: ejgallego
Ack-by: gares
Reviewed-by: ppedrot
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind: enhancement Enhancement to an existing user-facing feature, tactic, etc.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants