Skip to content

Stylistic changes in the parser.#2029

Merged
gasche merged 26 commits intoocaml:trunkfrom
fpottier:style
Sep 9, 2018
Merged

Stylistic changes in the parser.#2029
gasche merged 26 commits intoocaml:trunkfrom
fpottier:style

Conversation

@fpottier
Copy link
Copy Markdown
Contributor

@fpottier fpottier commented Sep 7, 2018

This PR introduces a number of generic definitions (mostly lists) in the parser and uses them to simplify some existing definitions. There should be no change in functionality. (Only one commit introduces a change by fixing a mistake in an error-handling production. The change is unlikely to be observed by anyone.)

There are probably more stylistic changes to come, but I am exposing this already.

@gasche
Copy link
Copy Markdown
Member

gasche commented Sep 7, 2018

@fpottier: Thanks! I believe that the changes go in the right direction of making the grammar more readable. I have looked at the PR and believe that it is generally correct, but not done a thorough review yet. I have some comments and remarks (some not specifically about this PR and for you, but about parser changes and for all of us). In no particular order (and cc @trefis, @let-def, @nojb):

  1. The parser grammar uses a top-down rather than bottom-up style, trying to globally maintain an ordering from the most high-level syntactic categories to the more low-level grammar components. (In fact it is a tree, with local low-level rule right after their users.) In retrospect I think that it was probably a mistake to have the "generic rules" section before the rest of the grammar, it should probably be at the bottom to respect that reading order. The "macros" section could be moved as well, but (1) it responds to auxiliary definitions in the header file right above and (2) it is short and I hope it remains bounded in length. I'm worried about "generic rules" because they will keep growing, and may prevent novice users from easily finding where the "real grammar" is.

    (This change can be done in an independent PR.)

    Is there a way in Menhir to mention the propototype of a rule, without giving a definition? This could be moderately useful to document high-level generic rules before the grammar, with their implementation at the bottom.

    Long-term maybe we want to use Menhir grammar-inclusion support and migrate generic definitions in a parser_aux.mly file. (We could even think of going further and separate, say, the type grammar from the expression grammar from the module grammar.)

  2. I think we need a way to test AST-preserving parser changes to get assurance that silly mistakes (sometime hard for the human eye to spot) will be caught. I don't think I can resurrect the test technique that I used during the Menhir transition, because it relies on linking two different parser modules; that would mean asking users to play patchwork with an old and new grammar, and nobody wants that.

    I have another idea which is to add a Makefile rule: %.ml.ast: %.ml, which would dump the -dparsetree output of the in-trunk compiler into the .ast file. (This is where I'm glad I kept the -stop-after parsing option in the Menhir-generated parser  #292 patchset). The testing workflow is: (1) Before parser changes, produce .ast files for all .ml files in the compiler build, and commit them all in a separate git commit (to be removed later). (2) Hack on the parser. (3) At any point, reproduce .ast files and use git diff for debugging. (4) When you are done, remove the .ast commit.

    (Edit: done in makefile rule to test parser changes by comparing ASTs stored in .ml.ast files #2030)

  3. We still need to do the global $loc->$sloc migration in the grammar before more conflicts creep in. I'll try to do that shortly.

    (Edit: done in parser.mly: consistently use $sloc over $loc #2031)

  4. As someone who is not that knowledgeable or interested in the details of LR parsing, I would rather not think of having both "left lists" and "(right) lists", and have only one kind of lists, except maybe in a handful of very special conflict-avoidance-dance-done-by-experts cases.

    I can tell that you just reused whichever style each production used, but would it be possible to experiment with just moving everything to one style or another (whichever you experts think is best), and having the "other" style just for special occasions? (If I want to do this myself, do I just need to drop the l in llist and check for conflicts, and no-conflicts means it is fine?).

    I would also be interested in doing the same thing for the reversed_ stuff: it should be out of my view most of the time, and used only in exceptional cases.

  5. I wonder if we could find nice mnemonics to handle the various delimiter-position possibilities that occur in the grammar, in order to have generic rules with a shorter name that preceded_separated_and_terminated_llist(BAR, row_field) -- I think shorter names may encourage us to use the rules inline in productions, instead of systematically giving auxiliary names, which may in fact result in more readable grammars.

@gasche
Copy link
Copy Markdown
Member

gasche commented Sep 7, 2018

Re. mnemonics, here would be an idea: have separated-list operators be of the form list<N>_<sepmode>(elem, sep), where <N> is the minimum number of elements of the list (list0, list1, list2) and <sepmode> specifies allowed separator placements, and is one of:

  • between: separator between each elements elem sep elem .. sep elem
  • before: separator before each element, first one optional: sep? elem sep elem ...
  • after: separator after each element, last one optional: ... elem sep elem sep?
  • around: separator between each element and around the list (one before, one after)
    (I'm not sure we actually have any need for this one?)
  • before_strict, after_strict: like before, after, but the initial/final separator is not optional

Examples:

list1_before(clause, BAR)
list1_after(expr, SEMI)
list2_between(simple_type, STAR)

I think having the separation mode right before the arguments in that order vocalizes reasonably well: "before clauses, bars"; "after expressions, semicolons"; "between types, stars". Of course mixfix syntax for parametrized rules would be nicer :-'

@xavierleroy
Copy link
Copy Markdown
Contributor

I believe that the changes go in the right direction of making the grammar less readable

Revealing slip of the pen? :-)

@gasche
Copy link
Copy Markdown
Member

gasche commented Sep 7, 2018

@xavierleroy: Argh. Edited. (I know it was a joke but let's counter-joke by replying seriously.)

There is a delicate balance to find between conciseness through abstraction and the complexity budget for human comprehension, so factorization through higher-order rules must be evaluated frankly.
But in that case I really meant the converse: I think the changes improve the grammar, through in particular the removal of List.rev in semantic actions (easy to forget) and a more systematic treatment of optional-or-not terminator-or-separator (for expr_comma_list for example).

(The current diff of the PR, starting with a not-short addition of subtly different generic rules with long-and-systematic names and long-and-comprehensive comments, does give a bit of an impression of "Our robot overlords have taken control of the parser today", hence the suggestion to move these parts further down in the grammar.)

@xavierleroy
Copy link
Copy Markdown
Contributor

It was half joke, half Freudian observation... But, yes, I support the removal of List.rev from grammar actions. A missing reverse caused weird behavior in the otherwise very nice Menhir-Coq parser of CompCert: AbsInt/CompCert@7f6149a

@fpottier
Copy link
Copy Markdown
Contributor Author

fpottier commented Sep 7, 2018

@gasche: thanks for your comments.

No, there is no currently way in Menhir of giving the type of a non-terminal symbol (although that would make sense).

Using multiple .mly files would make a lot of sense, I think.

Checking that parser changes preserve the ASTs sounds like a very good idea. Which code base to you intend to use to do this?

Regarding left-recursive vs. right-recursive lists, this can be important in order to avoid conflicts, and can also influence error reporting; I am not sure yet which of the two styles is preferable.

Regarding using rules inline, instead of systematically giving auxiliary names, I can see the temptation, but auxiliary symbols are actually quite useful, as they give a single point where a change can be applied (either a change that actually affects the language, or an internal change that affects the grammar but not the language).

Regarding your proposed mnemonics for lists, I agree that shorter names would be nice. Using 0-1-2 sounds good. I am less happy about before_strict and after_strict, which seem clunky. Of course the current naming scheme is much clunkier still!

@gasche
Copy link
Copy Markdown
Member

gasche commented Sep 7, 2018

My first proposal had before and obefore instead of before_strict and before. obefore is even clunkier! But I think we could maybe wait a bit, further factorize the grammar, and see what we really need.

I'll send a separate PR for ASTs testing.

Copy link
Copy Markdown
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using #2030 I was able to check that this PR does not modify the parsed ASTs for the compiler distribution codebase.

@gasche
Copy link
Copy Markdown
Member

gasche commented Sep 7, 2018

check-typo is complaining because you went overlength in two places:

Checking ebef4e0415d39e8bad628883e852856e1df47b17: parsing/parser.mly
./parsing/parser.mly:770.81: [long-line] line is over 80 columns
./parsing/parser.mly:2000.81: [long-line] line is over 80 columns

These need to be fixed, and otherwise I think the PR can be merged. (It doesn't have a Changes entry,
but I think I will write a common entry for all base menhir-grammar-related PRs later.)

@fpottier
Copy link
Copy Markdown
Contributor Author

fpottier commented Sep 8, 2018

Fixed the long lines.

@gasche
Copy link
Copy Markdown
Member

gasche commented Sep 8, 2018

There is a conflict in boot/menhir which prevents the continuous integration (CI) tests from running again on this branch. Maybe rebase on trunk?

@gasche
Copy link
Copy Markdown
Member

gasche commented Sep 8, 2018

(I went ahead and fixed the conflict.)

@gasche gasche merged commit a24ac53 into ocaml:trunk Sep 9, 2018
@fpottier fpottier deleted the style branch September 10, 2018 07:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants