Feature Request: Allowing grammar to filter out constants #76

@davaya

Arpeggio returns a parse tree containing every matched token of the input text, whether it is meaningful or not. Other grammar languages (e.g., Lark, https://lark-parser.readthedocs.io/en/latest/tree_construction/#shaping-the-tree) filter out constants in the grammar by default because they don't convey useful information. This is analogous to regular expressions, which return match groups but not the characters outside the groups (https://docs.python.org/3/howto/regex.html#grouping).
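
The regular-expression analogy can be illustrated with Python's standard re module. The pattern and the sample string below are illustrative assumptions, not taken from the Arpeggio example grammar; this is only a minimal sketch of how groups leave the surrounding punctuation out of the result.

```python
import re

# Hypothetical pattern for a three-parameter list; only the names are grouped.
PARAMS = re.compile(r"\(\s*(\w+)\s*,\s*(\w+)\s*,\s*(\w+)\s*\)")

match = PARAMS.search("function fak(n, m, x)")

# group(0) is the whole match, punctuation included.
whole = match.group(0)

# groups() returns only the captured parameters, without the punctuation.
params = match.groups()

print(whole)
print(params)
```

The parentheses and commas are needed to recognize the input, but they never appear in the captured groups, which is the behavior requested here for grammar constants.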

Using https://github.com/textX/Arpeggio/tree/master/examples/simple as an example, if program.simple is modified to have a parameterlist with three symbols (function fak(n, m, x)), the resulting parse tree is:

simpleLanguage [1]: function fak
. parameterlist [13]: ( n , m , x )
. block [23]: {
. . statement [29]:
. . . ifstatement [29]: if (
. . . . expression [33]:
. . . . . operation [33]: n == 0 )
. . . . block [39]: {
. . . . . statement [79]:
. . . . . . returnstatement [79]: return
. . . . . . . expression [86]: 0 ; } else
. . . . block [100]: {
. . . . . statement [161]:
. . . . . . returnstatement [161]: return
. . . . . . . expression [168]:
. . . . . . . . operation [168]: n *
. . . . . . . . . functioncall [172]: fak (
. . . . . . . . . . expressionlist [176]:
. . . . . . . . . . . expression [176]:
. . . . . . . . . . . . operation [176]: n - 1 ) ; } ; }

parameterlist contains seven tokens: "(", "n", ",", "m", ",", "x", and ")". The only information of interest is the three parameters n, m, and x; the punctuation is noise that bloats the tree. Lark discards punctuation by default but lets a rule preserve it by prefixing the rule with an exclamation mark, noting:

Using the ! prefix is usually a "code smell", and may point to a flaw in your grammar design.

It is possible to use visitors to transform trees into a more useful format, but it shouldn't be necessary in common cases that can be controlled by the grammar.
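
As a rough illustration of that post-processing step, the sketch below prunes punctuation from a generic tree. The Node class, the PUNCTUATION set, the "symbol" rule name, and strip_constants are hypothetical names made up for this illustration and are not part of Arpeggio's API; a real workaround would operate on Arpeggio's own parse tree nodes or use its visitor support.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical stand-in for a parse tree node; not Arpeggio's node classes.
@dataclass
class Node:
    rule: str                      # rule name, or "" for a literal token
    text: str = ""                 # matched text for leaf nodes
    children: List["Node"] = field(default_factory=list)


# Assumed set of grammar constants that carry no information.
PUNCTUATION = {"(", ")", ",", ";", "{", "}"}


def strip_constants(node: Node) -> Node:
    """Return a copy of the tree without punctuation leaves."""
    kept = [
        strip_constants(child)
        for child in node.children
        # Keep every non-leaf node, and every leaf that is not punctuation.
        if child.children or child.text not in PUNCTUATION
    ]
    return Node(node.rule, node.text, kept)


# The seven-token parameterlist from the example above.
parameterlist = Node("parameterlist", children=[
    Node("", "("), Node("symbol", "n"), Node("", ","),
    Node("symbol", "m"), Node("", ","), Node("symbol", "x"), Node("", ")"),
])

filtered = strip_constants(parameterlist)
names = [child.text for child in filtered.children]
print(names)
```

Every consumer of the tree would need a pass like this, which is why handling it once in the grammar seems preferable.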

Feature Request: Add the ability for grammar rules to filter unneeded tokens from the parse tree.
