Stylistic changes in the parser. by fpottier · Pull Request #2029 · ocaml/ocaml

fpottier · 2018-09-07T15:12:13Z

This PR introduces a number of generic definitions (mostly lists) in the parser and uses them to simplify some existing definitions. There should be no change in functionality. (Only one commit introduces a change by fixing a mistake in an error-handling production. The change is unlikely to be observed by anyone.)

There are probably more stylistic changes to come, but I am exposing this already.

gasche · 2018-09-07T17:15:48Z

@fpottier: Thanks! I believe that the changes go in the right direction of making the grammar more readable. I have looked at the PR and believe that it is generally correct, but I have not done a thorough review yet. I have some comments and remarks (some not specifically about this PR and for you, but about parser changes and for all of us). In no particular order (and cc @trefis, @let-def, @nojb):

The parser grammar uses a top-down rather than bottom-up style, trying to globally maintain an ordering from the most high-level syntactic categories to the more low-level grammar components. (In fact it is a tree, with local low-level rules right after their users.) In retrospect I think that it was probably a mistake to have the "generic rules" section before the rest of the grammar; it should probably be at the bottom to respect that reading order. The "macros" section could be moved as well, but (1) it corresponds to auxiliary definitions in the header file right above and (2) it is short and I hope it remains bounded in length. I'm worried about "generic rules" because they will keep growing, and may prevent novice users from easily finding where the "real grammar" is.

(This change can be done in an independent PR.)

Is there a way in Menhir to mention the prototype of a rule, without giving a definition? This could be moderately useful to document high-level generic rules before the grammar, with their implementation at the bottom.

Long-term maybe we want to use Menhir grammar-inclusion support and migrate generic definitions in a parser_aux.mly file. (We could even think of going further and separate, say, the type grammar from the expression grammar from the module grammar.)
I think we need a way to test AST-preserving parser changes to get assurance that silly mistakes (sometimes hard for the human eye to spot) will be caught. I don't think I can resurrect the test technique that I used during the Menhir transition, because it relies on linking two different parser modules; that would mean asking users to play patchwork with an old and new grammar, and nobody wants that.

I have another idea which is to add a Makefile rule: %.ml.ast: %.ml, which would dump the -dparsetree output of the in-trunk compiler into the .ast file. (This is where I'm glad I kept the -stop-after parsing option in the Menhir-generated parser #292 patchset). The testing workflow is: (1) Before parser changes, produce .ast files for all .ml files in the compiler build, and commit them all in a separate git commit (to be removed later). (2) Hack on the parser. (3) At any point, reproduce .ast files and use git diff for debugging. (4) When you are done, remove the .ast commit.

(Edit: done in makefile rule to test parser changes by comparing ASTs stored in .ml.ast files #2030)
We still need to do the global $loc->$sloc migration in the grammar before more conflicts creep in. I'll try to do that shortly.

(Edit: done in parser.mly: consistently use $sloc over $loc #2031)
As someone who is not that knowledgeable or interested in the details of LR parsing, I would rather not think of having both "left lists" and "(right) lists", and have only one kind of lists, except maybe in a handful of very special conflict-avoidance-dance-done-by-experts cases.

I can tell that you just reused whichever style each production used, but would it be possible to experiment with just moving everything to one style or another (whichever you experts think is best), and having the "other" style just for special occasions? (If I want to do this myself, do I just need to drop the l in llist and check for conflicts, and no-conflicts means it is fine?).

I would also be interested in doing the same thing for the reversed_ stuff: it should be out of my view most of the time, and used only in exceptional cases.
I wonder if we could find nice mnemonics to handle the various delimiter-position possibilities that occur in the grammar, in order to have generic rules with a shorter name than preceded_separated_and_terminated_llist(BAR, row_field) -- I think shorter names may encourage us to use the rules inline in productions, instead of systematically giving auxiliary names, which may in fact result in more readable grammars.

gasche · 2018-09-07T17:24:41Z

Re. mnemonics, here would be an idea: have separated-list operators be of the form list<N>_<sepmode>(elem, sep), where <N> is the minimum number of elements of the list (list0, list1, list2) and <sepmode> specifies allowed separator placements, and is one of:

between: separator between elements: elem sep elem ... sep elem
before: separator before each element, first one optional: sep? elem sep elem ...
after: separator after each element, last one optional: ... elem sep elem sep?
around: separator between each element and around the list (one before, one after)
(I'm not sure we actually have any need for this one?)
before_strict, after_strict: like before, after, but the initial/final separator is not optional

Examples:

list1_before(clause, BAR)
list1_after(expr, SEMI)
list2_between(simple_type, STAR)

I think having the separation mode right before the arguments in that order vocalizes reasonably well: "before clauses, bars"; "after expressions, semicolons"; "between types, stars". Of course mixfix syntax for parametrized rules would be nicer :-'
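To make the proposed naming scheme concrete, here is a minimal sketch in plain OCaml (not Menhir grammar code) that models what each separator mode would accept. Everything in it is an assumption made for illustration: the `token` and `sepmode` types, the function names, the choice to treat an empty input as the empty list in every mode, and the choice to reject a lone separator. The `around` mode is left out, since the comment above is unsure it is needed.

```ocaml
(* Illustrative model only: plain OCaml, not Menhir grammar code.
   All names below (token, sepmode, list_n, ...) are hypothetical. *)
type token = Elem of string | Sep

type sepmode = Between | Before | After | Before_strict | After_strict

(* Strict "elem sep elem ... sep elem" with at least one element. *)
let rec between = function
  | [Elem x] -> Some [x]
  | Elem x :: Sep :: rest -> Option.map (fun xs -> x :: xs) (between rest)
  | _ -> None

(* Remove one trailing separator, if there is one. *)
let strip_trailing toks =
  match List.rev toks with
  | Sep :: rest -> Some (List.rev rest)
  | _ -> None

(* Elements of the list if the separators are placed as [mode] allows.
   Assumption: an empty input is the empty list in every mode, and a lone
   separator is never accepted. *)
let elements mode toks =
  match mode, toks with
  | _, [] -> Some []
  | Between, _ -> between toks
  | (Before | Before_strict), Sep :: rest -> between rest
  | Before, _ -> between toks
  | Before_strict, _ -> None
  | (After | After_strict), _ ->
    (match strip_trailing toks with
     | Some rest -> between rest
     | None -> if mode = After then between toks else None)

(* list<N>_<sepmode>: additionally require at least [min_len] elements. *)
let list_n ~min_len mode toks =
  match elements mode toks with
  | Some xs when List.length xs >= min_len -> Some xs
  | _ -> None
```

With this model, `list_n ~min_len:1 Before [Sep; Elem "A"; Sep; Elem "B"]` plays the role of list1_before(clause, BAR) and accepts the input with or without the leading separator, while `list_n ~min_len:2 Between [Elem "t"]` plays the role of list2_between(simple_type, STAR) and rejects a single element.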

xavierleroy · 2018-09-07T17:26:29Z

I believe that the changes go in the right direction of making the grammar less readable

Revealing slip of the pen? :-)

gasche · 2018-09-07T17:32:19Z

@xavierleroy: Argh. Edited. (I know it was a joke but let's counter-joke by replying seriously.)

There is a delicate balance to find between conciseness through abstraction and the complexity budget for human comprehension, so factorization through higher-order rules must be evaluated frankly.
But in that case I really meant the converse: I think the changes improve the grammar, through in particular the removal of List.rev in semantic actions (easy to forget) and a more systematic treatment of optional-or-not terminator-or-separator (for expr_comma_list for example).

(The current diff of the PR, starting with a not-short addition of subtly different generic rules with long-and-systematic names and long-and-comprehensive comments, does give a bit of an impression of "Our robot overlords have taken control of the parser today", hence the suggestion to move these parts further down in the grammar.)

xavierleroy · 2018-09-07T17:49:16Z

It was half joke, half Freudian observation... But, yes, I support the removal of List.rev from grammar actions. A missing reverse caused weird behavior in the otherwise very nice Menhir-Coq parser of CompCert: AbsInt/CompCert@7f6149a
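The List.rev hazard can be illustrated with a small sketch in plain OCaml. It does not reproduce any rule from parser.mly; it only simulates, under the assumption that each reduction conses the new element onto an accumulator, the order in which the semantic actions of a left-recursive and a right-recursive list rule see their elements. The function names are hypothetical.

```ocaml
(* Simulation only: plain OCaml standing in for grammar semantic actions.

   Left-recursive rule, in pseudo-grammar form:
     xs ::= (empty)  { [] }
          | xs x     { x :: xs }
   Reductions happen left to right as elements are read, so consing each
   new element onto the accumulator produces the list in reverse order. *)
let left_recursive_actions items =
  List.fold_left (fun acc x -> x :: acc) [] items

(* The reversal that is easy to forget when it must be written by hand in
   each semantic action. A generic rule can perform it in a single place. *)
let left_recursive_with_rev items =
  List.rev (left_recursive_actions items)

(* Right-recursive rule, in pseudo-grammar form:
     xs ::= (empty)  { [] }
          | x xs     { x :: xs }
   Reductions happen once the whole list has been read, from the last
   element back to the first, so no reversal is needed. *)
let right_recursive_actions items =
  List.fold_right (fun x acc -> x :: acc) items []
```

On the input `[1; 2; 3]`, `left_recursive_actions` yields `[3; 2; 1]`, whereas both `left_recursive_with_rev` and `right_recursive_actions` yield `[1; 2; 3]`.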

fpottier · 2018-09-07T19:07:02Z

@gasche: thanks for your comments.

No, there is currently no way in Menhir of giving the type of a non-terminal symbol (although that would make sense).

Using multiple .mly files would make a lot of sense, I think.

Checking that parser changes preserve the ASTs sounds like a very good idea. Which code base do you intend to use to do this?

Regarding left-recursive vs. right-recursive lists, this can be important in order to avoid conflicts, and can also influence error reporting; I am not sure yet which of the two styles is preferable.

Regarding using rules inline, instead of systematically giving auxiliary names, I can see the temptation, but auxiliary symbols are actually quite useful, as they give a single point where a change can be applied (either a change that actually affects the language, or an internal change that affects the grammar but not the language).

Regarding your proposed mnemonics for lists, I agree that shorter names would be nice. Using 0-1-2 sounds good. I am less happy about before_strict and after_strict, which seem clunky. Of course the current naming scheme is much clunkier still!

gasche · 2018-09-07T19:27:06Z

My first proposal had before and obefore instead of before_strict and before. obefore is even clunkier! But I think we could maybe wait a bit, further factorize the grammar, and see what we really need.

I'll send a separate PR for ASTs testing.

gasche

Using #2030 I was able to check that this PR does not modify the parsed ASTs for the compiler distribution codebase.

gasche · 2018-09-07T21:49:00Z

check-typo is complaining because you went overlength in two places:

Checking ebef4e0415d39e8bad628883e852856e1df47b17: parsing/parser.mly
./parsing/parser.mly:770.81: [long-line] line is over 80 columns
./parsing/parser.mly:2000.81: [long-line] line is over 80 columns

These need to be fixed, and otherwise I think the PR can be merged. (It doesn't have a Changes entry, but I think I will write a common entry for all base menhir-grammar-related PRs later.)

fpottier · 2018-09-08T11:12:27Z

Fixed the long lines.

gasche · 2018-09-08T11:13:48Z

There is a conflict in boot/menhir which prevents the continuous integration (CI) tests from running again on this branch. Maybe rebase on trunk?

gasche · 2018-09-08T22:16:58Z

(I went ahead and fixed the conflict.)

This was referenced Sep 7, 2018

makefile rule to test parser changes by comparing ASTs stored in .ml.ast files #2030

Merged

parser.mly: consistently use $sloc over $loc #2031

Merged

gasche approved these changes Sep 7, 2018

gasche force-pushed the style branch from a57ef65 to 9914bbd September 8, 2018 22:16

fpottier added 15 commits September 9, 2018 00:24

Use [llist] instead of [list] in the names of some nonterminal symbols, so as to indicate that they are left-recursive. (We reserve [list] for right-recursive lists, as in the Menhir standard library.)

53f5841

Define [nonempty_llist]. Use it to eliminate [simple_labeled_expr_list].

fc6e822

Define [lident_list] as [mkrhs(LIDENT)+].

1b3139e

Replace [expr_comma_list] with [expr] in an unclosed-brace error production. The use of [expr_comma_list] seemed to be a mistake.

4d6280a

Define [separated_nonempty_llist].

4622d89

Replace the definition of [expr_comma_list] with an equivalent instance of a generic definition.

94835f5

Introduce [separated_or_terminated_nonempty_list(delimiter, X)].

1644e41

Simplified definition of [lbl_expr_list].

3354ac7

An equivalent (shorter) definition of [separated_or_terminated_nonempty_list].

055c523

Simplify the definition of [field_expr_list]. Saves one state.

5e8981d

Simplified definition of [expr_semi_list].

21a24b5

[expr_semi_list] was always followed with [opt_semi]. Update the definition of [expr_semi_list] so that this is no longer necessary. This is prettier and saves 4 states.

c1b9a90

Simplified definition of [pattern_semi_list].

bc63741

[pattern_semi_list] was always followed with [opt_semi]; simplified. This saves two states.

534b0aa

Simplified definition of [optional_type_parameter_list].

d5fafa2

fpottier added 11 commits September 9, 2018 00:24

Typo.

f3cc928

Simplified definition of [type_parameter_list].

13ca153

Simplified definition of [typevar_list].

5d4b9f2

Simplified definitions of [poly_type] and [poly_type_no_attr].

5f8c4e8

Simplified definition of [row_field_list].

a296147

Simplified definition of [amper_type_list].

f687cc0

Definition of [preceded_or_separated_nonempty_llist], currently unused.

076fc7c

Simplified definition of [name_tag_list].

fe32d3f

Simplified definition of [core_type_list].

dac0cc9

Run [make promote-menhir] (so the previous parser changes now take effect).

7e8ad75

Fix two long lines.

9914bbd

gasche merged commit a24ac53 into ocaml:trunk Sep 9, 2018

fpottier deleted the style branch September 10, 2018 07:11

gasche merged 26 commits into ocaml:trunk from fpottier:style
