Conversation
|
We need to have some syntax to distinguish the barred delimiters |
|
My vote would be for just having the "barred" versions. I think that the simplicity and the consistency with the existing quotation syntax is worth the cost of an extra couple of unobtrusive characters. |
|
In the above example,
Does the quoted string work when we have the balanced use of Is the syntax below supported? That feature would enable ppxes to support a large number of dsl. I think it best if we leave the anti-quotation mechanism to ppx authors to exploit and implement. |
|
To provide more context to my example above, here is the jsx syntax I would like to eventually support/implement in ocaml. https://github.com/bikallem/jsx-ssr/blob/master/examples/JsxTester.re Note that the example is my current experiments with the ReasonML JSX syntax. I would like to implement the same syntax with the addition of |
Would replacing let div1 = {%M.foo <hello> <hi> }
let div2 = {%%M.foo| <hello>
<hi> |}
let div3 = {%M.foo| <hello> <hi> |}
|
|
Some clarifications:
I strongly think we should optimize for the most common form, i.e, the short one. For quoted strings, the long form |
|
Some remarks:
|
|
Alright, I still would prefer the shortest version, but consistency is more important. I'll change the PR to only add barred versions. Any opinion on the position question? |
|
Your plan wrt. position sounds very reasonable to me. |
|
Done and done. All the locations should be decent. The patch to add locations to string constants is big, but boring. |
|
The implementation is fine, but there hasn't been enough discussion among language maintainers to create a consensus on the new language feature. We may discuss it on the occasion of an upcoming development meeting, or in an online discussion round. |
|
@gasche ping. Any movement on this perhaps? I think this new ppx syntax along with the upcoming ppxlib work (https://discuss.ocaml.org/t/the-future-of-ppx/3766) will make the ppx system in ocaml much more ergonomic and useful for use cases other than UI(jsx/html) such as sql, configs, json and so on. Would really love to see this merged soonish. 🥇 |
|
I just rebased. The change entry is still in the 4.10 section, but I suspect I'll have to move it before merging. |
aba02be to
890b644
Compare
|
Rebased, Short manual added, All tests pass. :) |
|
We had a developer meeting today where the syntax as currently proposed (only the barred form, with a space between the extension and quote name) was approved in principle. We still need someone to do a thorough review of the implementation. I did a review of an earlier version of the patch (without the location changes), this would need to be reviewed again. |
parsing/lexer.mll
Outdated
| let kwdopchar = | ||
| ['$' '&' '*' '+' '-' '/' '<' '=' '>' '@' '^' '|'] | ||
|
|
||
| let extattrident = identchar+ ('.' identchar+)* |
There was a problem hiding this comment.
This would require a comment to explain why the definition is as such. If I understand correctly, you are trying to accept the same as the parser's attr_id category, modulo whitespace differences.
The parser definition requires components (between the dots) to be valid OCaml identifiers or keywords. Your definition accepts components that start with a number. Why not require a (lowercase | uppercase) at the head of components?
parsing/lexer.mll
Outdated
| let store_escaped_uchar lexbuf u = | ||
| if in_comment () then store_lexeme lexbuf else store_string_utf_8_uchar u | ||
|
|
||
| let with_string f lexbuf = |
There was a problem hiding this comment.
Could you use a clearer name for f? consume_string?
There was a problem hiding this comment.
Actually even with_string is a confusing name; usually we call with_foo functions of the form init -> (foo -> 'a) -> 'a. What about build_string_token or just string_token?
There was a problem hiding this comment.
It's called with_string because it's exactly the same as with_comment which is defined just underneath. Should I also change that function ?
There was a problem hiding this comment.
You have a semi-valid argument for leaving the code exactly as ugly as it currently is. I think you should do as you think is best.
There was a problem hiding this comment.
Renamed to wrap_(string|comment)_lexer.
parsing/parser.mly
Outdated
| %token STAR | ||
| %token <string * string option> STRING | ||
| %token <string * Location.t * string option> STRING | ||
| %token <string * string * Location.t * string option> BRACEPERCENTBRACE |
There was a problem hiding this comment.
I dislike the token name. To me a token named BRACEPERCENTBRACE looks like {%{ in program text. You should call it QUOTED_STRING_EXTENSION or something.
There was a problem hiding this comment.
All the other tokens are named after an approximation of the syntax they represent in term of characters, not in term of semantics, but I don't care enough to argue either way.
There was a problem hiding this comment.
Exactly: a token called BRACEPERCENTBRACE should have the text representation {%{, not the whole string. We call the string token STRING, not DOUBLEQUOTE.
parsing/parser.mly
Outdated
| let wrap_mksig_ext ~loc (item, ext) = | ||
| wrap_sig_ext ~loc (mksig ~loc item) ext | ||
|
|
||
| let mk_quotedext ~loc ~shift (id, content, strloc, delim) = |
There was a problem hiding this comment.
It's late and my head hurts looking at this code. Why is it so complex if you did all the work in the lexer?
There was a problem hiding this comment.
Because life is not all pink and sometimes you need to get your hand dirty with locations. :)
The complex part of this code computes the position of foo.bar in {%foo.bar-baz|...baz|}. It's only complicated because the representation of position is mildly akward.
Moving this code to the lexer is feasible, but requires differentiating code path between strings and quoted extensions. I do not see how that improves anything.
There was a problem hiding this comment.
One thing that I find suspicious for example, is that the values passed to the ~shift parameter depend on the lexical representation of the token. This sounds like lexer-level logic rather than parser-level logic.
At least it would be nice there were comments on what various parts of the code do. (What is the shift argument for? What is the difference between loc and strloc?)
There was a problem hiding this comment.
The last commit moves it to the lexer.
Co-Authored-By: Alain Frisch <alain@frisch.fr>
gasche
left a comment
There was a problem hiding this comment.
I find the new implementation approach easier to read, thanks. (Having a name for the auxiliary concept of "get the location of the identifier" makes it easier to follow.)
I think (again) that this is generally good to go. Please see the last question about the use of ghost locations in the grammar rule.
|
Quoted extensions should probably be handled in comments too, such that |
|
I would expect |
|
We would like the following extension of the testsuite to work properly. It currently does not: diff --git a/testsuite/tests/parsing/quotedextensions.ml b/testsuite/tests/parsing/quotedextensions.ml
index 6fcb9e2654..3962cef32d 100644
--- a/testsuite/tests/parsing/quotedextensions.ml
+++ b/testsuite/tests/parsing/quotedextensions.ml
@@ -30,3 +30,15 @@ let {%M.foo bar| <hello>{|x|} |bar}
{x}
</hello>
|}
+
+(* Double quotes inside quoted strings inside comments: *)
+(* {|"|}, and *)
+(* [%foo {|"|}], and *)
+(* {%foo|"|}
+ should be valid inside comments *)
+
+(* Double quotes inside quoted strings inside comments: *)
+(* {|*)|}, and *)
+(* [%foo {|*)|}], and *)
+(* {%foo|*)|}
+ should be valid inside comments *) |
This is the follow up of discussion started in #8802. cc @lpw25 @gasche @bikallem
This introduces a new shortcut for ppx extensions. Quoted extensions of the form
{%foo|....|}are desugared into
[%foo{|...|}]....can be arbitrary content, similar to quoted strings.The goal of such quoted extensions is to allow easier insertion of sublanguages. For instance, tyxml's ppx currently uses the form:
With quoted extensions, we can get rid of string syntax, which makes it lighter
This would be useful for many things, such as better support for PPX that implement jsx, sql, regex, etc.
Currently this syntax does not implement antiquotations. The plan is to first implement antiquotations outside the compiler (using the ocaml parser directly) as a library, experiment with it, before eventually upstreaming a concrete syntax. @lpw25 also proposed to add string interpolation to the language, which would interact well.
Examples are available in the testfile. Here are some examples.
Implementation Discussion
The implementation is fairly straighforward, but almost completely in the lexer. This means the mechanism to build the extension id is different from other extensions and does not have the same sensitivity to whitespaces (
[%M. bar. baz "hello" ]is allowed in extensions, but not in the new quoted extensions). I would tend to say such things should be forbidden.One more thorny discussion is where to place the locations for the internal string. Currently, the locations are mostly wrong. One proposition would be to change the type of
Pconst_stringto add a location which only spans the inside of the string. Having to do index computations to figure this out inside a ppx is fairly common, and rather annoying. This would solve the question once and for all.