Syntax Guide
This is a guide to the basic syntax of Pegasus. For more advanced topics, see the "How Do I... ?" article.
A Pegasus grammar consists of a text file with two sections, in order:
- The "Settings" section.
- The "Rules" section.
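As a rough illustration of this two-part layout, a minimal grammar might look like the sketch below. The namespace, class name, and rule names are made up for the example; the individual settings and expression forms are covered in the rest of this guide.

```peg
// Settings section (hypothetical names)
@namespace MyProject
@classname GreetingParser

// Rules section: the first rule is the default starting rule
greeting = "hello" " "+ name

name = [a-zA-Z]+
```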
Settings are specified in one of three ways:
- `@setting value`: For simple values, just write the setting value out. This is parsed as a type name.
- `@setting { value }`: For more complex values, wrap the setting value in curly braces. This is parsed as a code section.
- `@setting "value"`: An alternative to using curly braces is to use a string.
The following settings are supported:
- `@namespace`: Specifies the namespace in which the parser class will be placed.
- `@accessibility`: Specifies the accessibility of the generated class.
- `@classname`: Specifies the name of the generated class.
- `@ignorecase`: Specifies the default behavior of the parser with regard to case sensitivity.
- `@resources`: Specifies the resources class to be used for resource-based strings.
- `@start`: Specifies the starting rule. Defaults to the first rule in the grammar.
- `@trace`: Enables or disables tracing. Defaults to false.
- `@using`: Adds a using directive to the generated class file. (Multiple allowed.)
- `@members`: Allows for the definition of additional class members.
@namespace PegExamples.Foo
@accessibility internal
@classname MyParser
@ignorecase true
@resources MyProject.Properties.Resources
@start startingRule
@trace true
@using System.Linq
@using { Foo = System.String }
@members
{
    private static bool HelperFunction()
    {
        // Placeholder body so that the example compiles; replace with real logic.
        return true;
    }
}
The basic syntax of a rule is:
name = expression
By default, rules infer their return type. For sequence expressions this is string, but this can be modified by specifying a type for the rule, like so:
name <type> = expression { ... }
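A small sketch of a typed rule follows. The rule and label names are hypothetical, and it assumes the parenthesized sequence is inferred as a string, as described above for sequence expressions.

```peg
// Without a type, this sequence would return the matched text as a string.
// With <int> and a code expression, the rule returns a parsed integer instead.
integer <int>
    = digits:("-"? [0-9]+) { int.Parse(digits) }
```

The code expression at the end supplies the value, so its C# type should agree with the type declared for the rule.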
Rule flags are Boolean settings that are enabled on a per-rule basis. Flags come after the rule type, if there is one:
rule -flag = expression
rule <type> -flag = expression
- `-memoize`: Enables memoization for the rule.
- `-lexical`: Specifies that the rule should be included in the `lexicalElements` collection whenever it is successfully parsed.
- `-export`: Specifies that this rule will be included in this grammar's exported rules. Use this to make the rule available to other parsers in a convenient format. This is primarily used for `#parse{}` expressions.
- `-public`: Specifies that a public entry point will be made for this rule. Use this if it makes sense to parse an entire string using this rule. This could be used to provide user-input validation for primitive values supported by your parser.
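For illustration, flags might be combined with rule types as in the sketch below. The arithmetic rules are hypothetical and only show where the flags are placed.

```peg
// -memoize caches the results of this rule.
sum <int> -memoize
    = left:number "+" right:sum { left + right }
    / number

// -public requests a public entry point for parsing a whole string with this rule.
number <int> -public
    = text:("-"? [0-9]+) { int.Parse(text) }
```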
- String `'foo'` or `"bar"`: String expressions match a string literally.
- Character Class `[a-z]` or `[a-z.,0-9]` or `[\x1f-\xfe\u0100-\u1fff]`: Matches a single character that is within the character class.
- Negative Character Class `[^a-z]` or `[^a-z.,0-9]` or `[^\x1f-\xfe\u0100-\u1fff]`: Matches a single character that is not within the character class.
- Wildcard `.`: Wildcard expressions match any single character.
Strings and character classes can be marked as case-insensitive by suffixing the string or class with the letter `i`, for example `"foo"i`, `'bar'i`, or `[baz]i`. Or, they can be marked as case-sensitive by suffixing the string or class with the letter `s`.
Strings can be read from resources by suffixing the string with the letter r. The string to be parsed is then read from the grammar's resources, specified via the @resources setting described above.
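A brief sketch of the case suffixes in use is shown below. The rule names are invented for the example.

```peg
// Matches "select", "SELECT", "Select", and so on.
selectKeyword = "select"i

// Matches a single hexadecimal digit in either case.
hexDigit = [0-9a-f]i

// Matches only lowercase "true", even when @ignorecase true is set.
trueLiteral = "true"s
```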
- Name `a`: Name expressions refer to a rule by name.
- Labeled `foo:a`: Labeled expressions store a parse result for use in code assertions and expressions.
- Sequence `a b c`: Sequence expressions match each component consecutively.
- Choice `a / b / c`: Choice expressions provide options for parsing. They are tried in order, and the first option that matches is used.
- Assertions `!a` `&b`: Assertion expressions act as look-aheads. They peek at the parsing subject, and do not logically advance the cursor (although internally they do use a cursor).
- Code Assertions `!{foo}` `&{bar}`: Code assertions are similar to regular assertions, except they represent C# code that returns a Boolean value, rather than performing a look-ahead.
- Repetition `a?` `b+` `c*` `d<3>` `e<2,>` `f<1,5>`: Repetition expressions allow another expression to be repeated.
  - `expr<3>` matches an expression exactly three times.
  - `expr<2,>` matches an expression two or more times. Greedy.
  - `expr<1,5>` matches an expression one to five times. Greedy.
  - `expr?` matches an expression zero or one time. Equivalent to `expr<0,1>`.
  - `expr+` matches an expression one or more times. Equivalent to `expr<1,>`.
  - `expr*` matches an expression zero or more times. Equivalent to `expr<0,>`.
- Delimited Repetition `a<0,,",">`: Repetition expressions also support a delimiter that will match (and consume) in between each repeated match.
- Parentheses `( ... )`: Parentheses are used to group expressions.
- Type `(<type> ... )`: Type expressions allow part of a rule to have a certain return type. This has the same meaning as having a type for a rule, except it is constrained to the expression wrapped by the parentheses.
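The fragment below sketches how several of these expression forms might be combined. All rule names are hypothetical, and it assumes `IList<string>` is available in the generated code without an extra `@using` setting.

```peg
// Delimited repetition: a bracketed, comma-separated list such as [1,22,333]
list <IList<string>>
    = "[" items:number<0,,","> "]" { items }

// Sequence: one digit followed by zero or more digits, returned as a string.
number = [0-9] [0-9]*

// Negative look-ahead: a run of letters that is not exactly a keyword.
identifier = !(keyword !letter) letter+

// Choice: options are tried from left to right.
keyword = "if" / "else"

letter = [a-z]
```

Because assertions do not advance the cursor, `identifier` still consumes its letters from the original position after the look-ahead succeeds.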
- Code `{ code }`: Code expressions contain C# code that specifies the result of an expression. Code expressions must come at the end of a sequence.
- Error `#error{ code }`: Error-type code expressions throw a `System.FormatException` with the error message specified by the code section. The exception that is thrown will also have the `Data["cursor"]` property set, so that the location of the error can be determined.
- State `#{ code; }`: State-type code expressions allow for stateful parsing. The code in a state-type code expression is allowed to modify the `state` object in a way that supports backtracking and memoization. State expressions may appear anywhere in a rule definition.
- Parse `#parse{ code }`: Parse-type code expressions not only allow mutation of the cursor like state expressions, but also return a `ParseResult<T>`, allowing the integration of more complex parsing logic. The canonical example of this would be using an exported rule from another Pegasus parser.
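A hedged sketch of the first three forms is shown below. The rule names, the error message, and the `"depth"` key are invented, and indexing `state` by a string key is an assumption about how the state object is accessed.

```peg
// Code expression: comes at the end of the sequence and supplies the result.
// Error expression: reached only if the closing quote is missing.
quoted <string>
    = '"' chars:[^"]* ( '"' / #error{ "Unterminated string" } ) { string.Concat(chars) }

// State expression: stores a value in the state object before matching "begin".
begin = #{ state["depth"] = 0; } "begin"
```

Placing the error expression as the last option of a choice means it only runs after the expected input has failed to match.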
Two comment styles are supported:
- `/* ... */`: Multi-line comment
- `// ...`: Single-line comment