doc: add macro guide, light reorganizing #791
Merged
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,384 @@ | ||
|
|
||
| # Macro Overview | ||
|
|
||
| The official paper describing the mechanics behind Lean 4's macro system can be | ||
| found in [Beyond Notations: Hygienic Macro Expansion for Theorem Proving | ||
| Languages](https://arxiv.org/abs/2001.10490) by Sebastian Ullrich and Leonardo | ||
| de Moura, and the accompanying repo with example code can be found in the | ||
| paper's code [supplement](https://github.com/Kha/macro-supplement). The | ||
| supplement also includes a working implementation of the macro expander, so it's | ||
| a good case study for people interested in the details. | ||
|
|
||
| ## What is a macro in Lean? | ||
|
|
||
| A macro is a function that takes in a syntax tree and produces a new syntax | ||
| tree. Macros are useful for many reasons, but two of the big ones are a) | ||
| allowing users to extend the language with new syntactic constructs without | ||
| having to actually expand the core language, and b) allowing users to automate | ||
| tasks that would otherwise be extremely repetitive, time-consuming, and/or | ||
| error-prone. | ||
|
|
||
| A motivating example is set builder notation. We would like to be able to write | ||
| the set of natural numbers 0, 1, and 2 as just `{0, 1, 2}`. However, Lean does | ||
| not natively support this syntax, and the actual definition of a set in Mathlib | ||
| does not let us just declare sets in this manner; naively using the set API | ||
| would force us to write `Set.insert 0 (Set.insert 1 (Set.singleton 2))`. | ||
| Instead, we can teach Lean's macro system to recognize `{0, 1, 2}` as a | ||
| shorthand for a composition of existing methods and let it do the repetitive | ||
| work of creating the `Set.insert...` invocation for us. In this way, we can have | ||
| our more readable and more convenient syntax without having to extend Lean | ||
| itself, and while retaining the simple insert/singleton API. | ||
|
|
||
| ## How macros are handled | ||
|
|
||
| The general procedure is as follows: | ||
|
|
||
| 1. Lean parses a command, creating a Lean syntax tree which contains any | ||
| unexpanded macros. | ||
|
|
||
| 2. Lean repeats the cycle (elaboration ~> (macro hygiene and expansion) ~> | ||
| elaboration...) | ||
|
|
||
| The cycle in step 2 repeats until there are no more macros which need to be | ||
| expanded, and elaboration can finish normally. This repetition is required since | ||
| macros can expand to other macros, and may expand to code that needs information | ||
| from the elaborator. As you can see, the process of macro parsing and expansion | ||
| is interleaved with the parsing and elaboration of non-macro code. | ||
|
|
||
| By default, macros in Lean are hygienic, which means the system avoids | ||
| accidental name capture when reusing the same name inside and outside the macro. | ||
| Users may occasionally want to disable hygiene, which can be accomplished with | ||
| the command `set_option hygiene false`. More in-depth information about hygiene | ||
| and how it's implemented can be found in the official paper and supplement | ||
| linked at the top of this guide. | ||
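|
|
||
| A minimal sketch of what hygiene buys you. The macro names `hyg_const` and | ||
| `raw_const` and the global `x` are hypothetical, chosen only for illustration; | ||
| the expected results in the comments assume the behaviour described above. | ||
| ``` | ||
| def x : Nat := 42 | ||
| -- Hygienic (default): the binder `x` introduced by the macro is kept apart | ||
| -- from any `x` written by the caller. | ||
| macro "hyg_const " e:term : term => `(fun (x : Nat) => $e) | ||
| -- The caller's `x` should still refer to the global definition, giving 42. | ||
| #eval (hyg_const x) 0 | ||
| -- Hygiene switched off for this one declaration only. | ||
| set_option hygiene false in | ||
| macro "raw_const " e:term : term => `(fun (x : Nat) => $e) | ||
| -- Here the caller's `x` should be captured by the macro's binder, giving 0. | ||
| #eval (raw_const x) 0 | ||
| ``` | ||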
|
|
||
| ## Elements of "a" macro (important types) | ||
|
|
||
|
|
||
| In the big picture, a macro has two components that must be implemented by the | ||
| user, parsers and syntax transformers, where the latter is a function that says | ||
| what the input syntax should expand to. There is a third component, syntax | ||
| categories, such as `term`, `tactic`, and `command`, but declaring a new syntax | ||
| category is not always necessary. When we say "parser" in the context of a | ||
| macro, we refer to the core type `Lean.ParserDescr`, which parses elements of | ||
| type `Lean.Syntax`, where `Lean.Syntax` represents elements of a Lean syntax | ||
| tree. Syntax transformers are functions of type `Syntax -> MacroM Syntax`. Lean | ||
| has a synonym for this type, which is simply `Macro`. `MacroM` is a monad that | ||
| carries state needed for macro expansion to work nicely, including the info | ||
| needed to implement hygiene. | ||
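|
|
||
| As a quick sanity check of the types named above, the following sketch asks | ||
| Lean to confirm that `Macro` unfolds to `Syntax -> MacroM Syntax`. It assumes a | ||
| toolchain in which `Macro` is still declared as an abbreviation of that type. | ||
| ``` | ||
| import Lean | ||
| open Lean | ||
| -- If `Macro` is an abbreviation for the function type, `rfl` closes the goal. | ||
| example : Macro = (Syntax → MacroM Syntax) := rfl | ||
| -- Prints the declaration so the definition can be inspected directly. | ||
| #print Macro | ||
| ``` | ||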
|
|
||
| As an example, we again refer to Mathlib's set builder notation: | ||
| ``` | ||
| /- Declares a parser -/ | ||
| syntax (priority := high) "{" term,+ "}" : term | ||
|
|
||
| /- Declares two expansions/syntax transformers -/ | ||
| macro_rules | ||
| | `({$x}) => `(Set.singleton $x) | ||
| | `({$x, $xs:term,*}) => `(Set.insert $x {$xs,*}) | ||
|
|
||
| /- Provided `Set` has been imported (from Mathlib4), these are all we need for `{1, 2, 3}` to be valid notation to create a literal set -/ | ||
|
|
||
| ``` | ||
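|
|
||
| For readers without Mathlib at hand, the same recursive pattern can be sketched | ||
| against the core `List` type. The keyword `mylist` is a hypothetical name used | ||
| only for this illustration; it is not the list literal macro from the prelude, | ||
| which is implemented differently. | ||
| ``` | ||
| /- Parser: the keyword `mylist`, then one or more comma-separated terms in brackets -/ | ||
| syntax "mylist" "[" term,+ "]" : term | ||
| /- Transformers: a base case for one element and a recursive case for the rest -/ | ||
| macro_rules | ||
|   | `(mylist [$x]) => `(List.cons $x List.nil) | ||
|   | `(mylist [$x, $xs:term,*]) => `(List.cons $x (mylist [$xs,*])) | ||
| /- The expansion should agree with the built-in list literal -/ | ||
| example : (mylist [1, 2, 3]) = [1, 2, 3] := rfl | ||
| ``` | ||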
|
|
||
| This example should also make clear the reason why macros (and pretty much all | ||
| of Lean 4's metaprogramming facilities) are functions that take an argument of | ||
| type `Syntax` e.g. `Syntax -> MacroM Syntax`; the leading syntax element is the | ||
| thing that actually triggers the macro expansion by matching with the declared | ||
| parser, and as a user, you will almost always be interested in inspecting and | ||
| transforming that initial syntax element (though there are cases in which it can | ||
| just be ignored, as in the parameter-less exfalso tactic). | ||
|
|
||
| Returning briefly to the API provided by Lean, `Lean.Syntax` is pretty much | ||
| what you would expect a basic syntax tree type to look like. Below is a slightly | ||
| simplified representation which omits details in the `atom` and `ident` | ||
| constructors; users can create atoms and idents which comport with this | ||
| simplified representation using the `mkAtom` and `mkIdent` methods provided in | ||
| the `Lean` namespace. | ||
| ``` | ||
| inductive Syntax where | ||
| | missing : Syntax | ||
| | node (kind : SyntaxNodeKind) (args : Array Syntax) : Syntax | ||
| | atom : String -> Syntax | ||
| | ident : Name -> Syntax | ||
| ``` | ||
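|
|
||
| To see how the simplified picture relates to the real type, here is a small | ||
| sketch that pattern matches on `Syntax`. The function `describe` is hypothetical, | ||
| and the extra constructor fields (source info, raw substring, preresolved names) | ||
| reflect recent Lean 4 versions as an assumption; they may differ in yours. | ||
| ``` | ||
| import Lean | ||
| open Lean | ||
| -- Hypothetical helper: summarize the outermost constructor of a syntax tree. | ||
| def describe : Syntax → String | ||
|   | .missing => "missing" | ||
|   | .node _ kind args => s!"node {kind} with {args.size} argument(s)" | ||
|   | .atom _ val => s!"atom {val}" | ||
|   | .ident _ _ val _ => s!"ident {val}" | ||
| -- `mkAtom`, `mkIdent`, and `mkNode` fill in the omitted details for us. | ||
| #eval describe (mkAtom "apply") | ||
| #eval describe (mkIdent `False.elim) | ||
| #eval describe (mkNode `myKind #[mkAtom "apply", mkIdent `False.elim]) | ||
| ``` | ||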
|
|
||
|
|
||
|
|
||
| For those interested, `MacroM` is a `ReaderT`: | ||
| ``` | ||
| abbrev MacroM := ReaderT Macro.Context (EStateM Macro.Exception Macro.State) | ||
| ``` | ||
|
|
||
| The other relevant components are defined as follows: | ||
| ``` | ||
| structure Context where | ||
| methods : MethodsRef | ||
| mainModule : Name | ||
| currMacroScope : MacroScope | ||
| currRecDepth : Nat := 0 | ||
| maxRecDepth : Nat := defaultMaxRecDepth | ||
| ref : Syntax | ||
|
|
||
| inductive Exception where | ||
| | error : Syntax → String → Exception | ||
| | unsupportedSyntax : Exception | ||
|
|
||
| structure State where | ||
| macroScope : MacroScope | ||
| traceMsgs : List (Prod Name String) := List.nil | ||
| deriving Inhabited | ||
| ``` | ||
|
|
||
| As a review/checklist, the three (sometimes only two depending on whether you | ||
| need a new syntax category) components users need to be concerned with are: | ||
|
|
||
| 0. You may or may not need to declare a new syntax category using | ||
| `declare_syntax_cat` | ||
| 1. Declare a parser with either `syntax` or `macro` | ||
| 2. Declare an expansion/syntax transformer with either `macro_rules` or `macro` | ||
|
|
||
| Parsers and syntax transformers can be declared manually, but use of the pattern | ||
| language and `syntax`, `macro_rules`, and `macro` is recommended. | ||
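|
|
||
| The checklist can be sketched end to end as follows. The category `color`, the | ||
| keywords `crimson` and `teal`, and the `color_code` notation are all hypothetical | ||
| names invented for this example. | ||
| ``` | ||
| -- Step 0: declare a new syntax category | ||
| declare_syntax_cat color | ||
| -- Step 1: declare parsers, two in the new category and one in `term` | ||
| syntax "crimson" : color | ||
| syntax "teal" : color | ||
| syntax "color_code " color : term | ||
| -- Step 2: declare the expansions | ||
| macro_rules | ||
|   | `(color_code crimson) => `(0) | ||
|   | `(color_code teal) => `(1) | ||
| example : (color_code teal) = 1 := rfl | ||
| ``` | ||
| Note that each string literal becomes a token for the rest of the file, so | ||
| `teal` can no longer be used as an ordinary identifier after these declarations. | ||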
|
|
||
| ## Syntax categories with `declare_syntax_cat` | ||
|
|
||
| `declare_syntax_cat` declares a new syntax category, like `command`, `tactic`, | ||
| or mathlib4's `binderterm`. These are the different categories of things that | ||
| can be referred to in a quote/antiquote. `declare_syntax_cat` results in a call | ||
| to `registerParserCategory` and produces a new parser descriptor: | ||
|
|
||
| ``` | ||
| set_option trace.Elab.definition true in | ||
| declare_syntax_cat binderterm | ||
|
|
||
| /- | ||
| Output: | ||
|
|
||
| [Elab.definition.body] binderterm.quot : Lean.ParserDescr := | ||
| Lean.ParserDescr.node `Lean.Parser.Term.quot 1024 | ||
| (Lean.ParserDescr.binary `andthen (Lean.ParserDescr.symbol "`(binderterm|") | ||
| (Lean.ParserDescr.binary `andthen (Lean.ParserDescr.unary `incQuotDepth (Lean.ParserDescr.cat `binderterm 0)) | ||
| (Lean.ParserDescr.symbol ")"))) | ||
| -/ | ||
| ``` | ||
|
|
||
| Declaring a new syntax category like this one automatically declares a quotation | ||
| operator `` `(binderterm| ...)``. These pipe prefixes `<thing>|` are used in | ||
| syntax quotations to say what category a given quotation is expected to be an | ||
| element of. The pipe prefixes are *not* used for elements in the `term` and | ||
| `command` categories (since they're considered the default), but need to be used | ||
| for everything else. | ||
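|
|
||
| A small sketch contrasting the two cases. The names `my_true` and `my_trivial` | ||
| are hypothetical; the first lives in `term` and needs no prefix, while the | ||
| second lives in `tactic` and needs `tactic|` on both sides of the rule. | ||
| ``` | ||
| -- `term` is a default category: no pipe prefix in the quotations | ||
| syntax "my_true" : term | ||
| macro_rules | ||
|   | `(my_true) => `(True.intro) | ||
| example : True := my_true | ||
| -- `tactic` is not a default category: the `tactic|` prefix is required | ||
| syntax "my_trivial" : tactic | ||
| macro_rules | ||
|   | `(tactic| my_trivial) => `(tactic| exact True.intro) | ||
| example : True := by my_trivial | ||
| ``` | ||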
|
|
||
| ## Parsers and the `syntax` keyword | ||
|
|
||
| Internally, elements of type `Lean.ParserDescr` are implemented as parser | ||
| combinators. However, Lean offers the ability to write parsers using the | ||
| macro/pattern language by way of the `syntax` keyword. This is the recommended | ||
| means of writing parsers. As an example, the parser for the `rwa` (rewrite, then | ||
| use assumption) tactic is: | ||
|
|
||
| ``` | ||
| set_option trace.Elab.definition true in | ||
| syntax "rwa " rwRuleSeq (location)? : tactic | ||
|
|
||
| /- | ||
| which expands to: | ||
| [Elab.definition.body] tacticRwa__ : Lean.ParserDescr := | ||
| Lean.ParserDescr.node `tacticRwa__ 1022 | ||
| (Lean.ParserDescr.binary `andthen | ||
| (Lean.ParserDescr.binary `andthen (Lean.ParserDescr.nonReservedSymbol "rwa " false) Lean.Parser.Tactic.rwRuleSeq) | ||
| (Lean.ParserDescr.unary `optional Lean.Parser.Tactic.location)) | ||
|
|
||
| -/ | ||
|
|
||
| ``` | ||
|
|
||
| Literals are written as double-quoted strings (`"rwa "` expects the literal | ||
| sequence of characters `rwa`, while the trailing space provides a hint to the | ||
| formatter that it should add a space after `rwa` when pretty printing this | ||
| syntax); `rwRuleSeq` and `location` are themselves `ParserDescr`s, and we finish | ||
| with `: tactic` specifying that the preceding parser is for an element in the | ||
| `tactic` syntax category. The parentheses around `(location)?` are necessary | ||
| (rather than `location?`) because Lean 4 allows question marks to be used in | ||
| identifiers, so `location?` is one single identifier that ends with a question | ||
| mark, which is not what we want. | ||
|
|
||
| The name `tacticRwa__` is automatically generated. You can name parser | ||
| descriptors declared with the `syntax` keyword like so: | ||
|
|
||
| ``` | ||
| set_option trace.Elab.definition true in | ||
| syntax (name := introv) "introv " (colGt ident)* : tactic | ||
|
|
||
| /- | ||
| Output: | ||
| [Elab.definition.body] introv : Lean.ParserDescr := | ||
| Lean.ParserDescr.node `introv 1022 | ||
| (Lean.ParserDescr.binary `andthen (Lean.ParserDescr.nonReservedSymbol "introv " false) | ||
| (Lean.ParserDescr.unary `many | ||
| (Lean.ParserDescr.binary `andthen (Lean.ParserDescr.const `colGt) (Lean.ParserDescr.const `ident)))) | ||
| -/ | ||
| ``` | ||
|
|
||
| ## The pattern language | ||
|
|
||
| Available quantifiers are `?` (one or zero occurrences, see note below), `*` | ||
| (zero or more occurrences), and `+` (one or more occurrences). | ||
|
|
||
| Keep in mind that Lean makes `?` available for use in identifiers, so if we want | ||
| a parser to look for an optional `location`, we would need to write | ||
| `(location)?` with parentheses acting as a separator, since `location?` would | ||
| look for something under the identifier `location?` (where the `?` is part of | ||
| the identifier). | ||
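|
|
||
| The optional quantifier can be sketched with a hypothetical `greet` notation | ||
| (not part of Lean) whose argument may be omitted. One rule covers the case where | ||
| the string is present and another the case where it is absent. | ||
| ``` | ||
| -- `(str)?` needs the parentheses; `str?` would be read as a single identifier | ||
| syntax "greet " (str)? : term | ||
| macro_rules | ||
|   | `(greet $s:str) => `("hello, " ++ $s) | ||
|   | `(greet) => `("hello, world") | ||
| -- With an argument, the first rule should apply | ||
| #eval greet "Lean" | ||
| -- Without one, the second rule should apply | ||
| #eval greet | ||
| ``` | ||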
|
|
||
| Parentheses can be used as delimiters. | ||
|
|
||
| Separated lists can be constructed like so: `$ts,*` for a comma separated list. | ||
|
|
||
| "Extended splices" can be constructed as `$[..]`. See the official paper (p. 12) | ||
| for more details. | ||
|
|
||
| Literals are written as double-quoted strings. A literal may use trailing | ||
| whitespace (see e.g. the `rwa` or `introv` tactics) to tell the pretty-printer | ||
| how it should be displayed, but such whitespace will not prevent a literal with | ||
| no trailing whitespace from matching. The spaces are relevant, but not | ||
| interpreted literally. When the ParserDescr is turned into a Parser, the actual | ||
| token matcher [uses the .trim of the provided | ||
| string](https://github.com/leanprover/lean4/blob/53ec43ff9b8f55989b12c271e368287b7b997b54/src/Lean/Parser/Basic.lean#L1193), | ||
| but the generated formatter [uses the spaces as | ||
| specified](https://github.com/leanprover/lean4/blob/8d370f151f7c88a687152a5b161dcb484c446ce2/src/Lean/PrettyPrinter/Formatter.lean#L328), | ||
| that is, turning the atom `"rwa"` in the syntax into the string `"rwa "` (with | ||
| the trailing space) as part of the pretty printed output. | ||
|
|
||
| ## Syntax expansions with `macro_rules`, and how it desugars. | ||
|
|
||
| `macro_rules` lets you declare expansions for a given `Syntax` element using a | ||
| syntax similar to a `match` statement. The left-hand side of a match arm is a | ||
| quotation (with a leading `<cat>|` for categories other than `term` and | ||
| `command`) in which users can specify the pattern they'd like to write an | ||
| expansion for. The right-hand side returns a syntax quotation which is the | ||
| output the user wants to expand to. | ||
|
|
||
| A feature of Lean's macro system is that if there are multiple expansions for a | ||
| particular match, Lean will try the most recently declared expansion first, and | ||
| will retry with other matching expansions if the previous attempt failed. This | ||
| is particularly useful for extending existing tactics. | ||
|
|
||
| The following example shows both the retry behavior, and the fact that macros | ||
| declared using the shorthand `macro` syntax can still have additional expansions | ||
| declared with `macro_rules`. This `transitivity` tactic is implemented such that | ||
| it will work for either `Nat.le` or `Nat.lt`. The `Nat.lt` version was declared | ||
| "most recently", so it will be tried first, but if it fails (for example, if the | ||
| actual term in question is `Nat.le`) the next potential expansion will be tried: | ||
| ``` | ||
| macro "transitivity" e:(colGt term) : tactic => `(tactic| apply Nat.le_trans (m := $e)) | ||
| macro_rules | ||
| | `(tactic| transitivity $e) => `(tactic| apply Nat.lt_trans (m := $e)) | ||
|
|
||
| example (a b c : Nat) (h0 : a < b) (h1 : b < c) : a < c := by | ||
| transitivity b <;> | ||
| assumption | ||
|
|
||
| example (a b c : Nat) (h0 : a <= b) (h1 : b <= c) : a <= c := by | ||
| transitivity b <;> | ||
| assumption | ||
|
|
||
| /- This will fail, but is interesting in that it exposes the "most-recent first" behavior, since the | ||
| error message complains about being unable to unify mvar1 <= mvar2, rather than mvar1 < mvar2. -/ | ||
| example (a b c : Nat) (h0 : a <= b) (h1 : b <= c) : False := by | ||
| transitivity b <;> | ||
| assumption | ||
| ``` | ||
|
|
||
| To see the desugared definition of the actual expansion, we can again use | ||
| `set_option trace.Elab.definition true in` and observe the output of the humble | ||
| `exfalso` tactic defined in Mathlib4: | ||
| ``` | ||
|
|
||
| set_option trace.Elab.definition true in | ||
| macro "exfalso" : tactic => `(tactic| apply False.elim) | ||
|
|
||
| /- | ||
| Results in the expansion: | ||
|
|
||
| [Elab.definition.body] _aux___macroRules_tacticExfalso_1 : Lean.Macro := | ||
| fun x => | ||
| let discr := x; | ||
| /- This is where Lean tries to actually identify that it's an invocation of the exfalso tactic -/ | ||
| if Lean.Syntax.isOfKind discr `tacticExfalso = true then | ||
| let discr := Lean.Syntax.getArg discr 0; | ||
| let x := discr; | ||
| do | ||
| /- Lean getting scope/meta info from the macro monad -/ | ||
| let info ← Lean.MonadRef.mkInfoFromRefPos | ||
| let scp ← Lean.getCurrMacroScope | ||
| let mainModule ← Lean.getMainModule | ||
| pure | ||
| (Lean.Syntax.node Lean.SourceInfo.none `Lean.Parser.Tactic.seq1 | ||
| #[Lean.Syntax.node Lean.SourceInfo.none `null | ||
| #[Lean.Syntax.node Lean.SourceInfo.none `Lean.Parser.Tactic.apply | ||
| #[Lean.Syntax.atom info "apply", | ||
| Lean.Syntax.ident info (String.toSubstring "False.elim") | ||
| (Lean.addMacroScope mainModule `False.elim scp) [(`False.elim, [])]]]]) | ||
| else | ||
| /- If this wasn't actually an invocation of the exfalso tactic, throw the "unsupportedSyntax" error -/ | ||
| let discr := x; | ||
| throw Lean.Macro.Exception.unsupportedSyntax | ||
| -/ | ||
| ``` | ||
|
|
||
| We can also create the syntax transformer declaration ourselves instead of using | ||
| `macro_rules`. We'll need to name our parser and use the attribute | ||
| `@[macro myExfalsoParser]` to associate our declaration with the parser: | ||
| ``` | ||
| syntax (name := myExfalsoParser) "myExfalso" : tactic | ||
|
|
||
| -- remember that `Macro` is a synonym for `Syntax -> MacroM Syntax` | ||
| @[macro myExfalsoParser] def implMyExfalso : Macro := | ||
| fun stx => `(tactic| apply False.elim) | ||
|
|
||
| example (p : Prop) (h : p) (f : p -> False) : 3 = 2 := by | ||
| myExfalso | ||
| exact f h | ||
| ``` | ||
|
|
||
| In the above example, we're still using the sugar Lean provides for creating | ||
| quotations, as it feels more intuitive and saves us some work. It is possible to | ||
| forego the sugar altogether: | ||
| ``` | ||
| syntax (name := myExfalsoParser) "myExfalso" : tactic | ||
|
|
||
| @[macro myExfalsoParser] def implMyExfalso : Lean.Macro := | ||
| -- without the quotation sugar, the result must be lifted into `MacroM` by hand | ||
| fun stx => pure (Lean.mkNode `Lean.Parser.Tactic.apply | ||
| #[Lean.mkAtomFrom stx "apply", Lean.mkCIdentFrom stx ``False.elim]) | ||
|
|
||
| example (p : Prop) (h : p) (f : p -> False) : 3 = 2 := by | ||
| myExfalso | ||
| exact f h | ||
| ``` | ||
|
|
||
| ## The `macro` keyword | ||
|
|
||
| `macro` is a shortcut which allows users to declare both a parser and an | ||
| expansion at the same time as a matter of convenience. Additional expansions for | ||
| the parser generated by the `macro` invocation can be added with a separate | ||
| `macro_rules` block (see the example in the `macro_rules` section). | ||
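|
|
||
| A sketch of the correspondence, using the hypothetical notations `double` and | ||
| `double2`. The first declaration uses `macro`; the second spells out roughly | ||
| the same thing with `syntax` plus `macro_rules`. | ||
| ``` | ||
| -- Parser and expansion declared together | ||
| macro "double " x:term : term => `($x + $x) | ||
| -- Roughly the same thing, declared in two steps | ||
| syntax "double2 " term : term | ||
| macro_rules | ||
|   | `(double2 $x) => `($x + $x) | ||
| -- Parentheses keep the macro argument from swallowing `= 6` | ||
| example : (double 3) = 6 := rfl | ||
| example : (double2 3) = 6 := rfl | ||
| ``` | ||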
|
|
||
| ## Unexpanders | ||
|
|
||
| TODO; for now, see the unexpander in Mathlib.Set for an example. | ||
|
|
||
| ## More illustrative examples: | ||
|
|
||
| The | ||
| [Tactic.Basic](https://github.com/leanprover-community/mathlib4/blob/master/Mathlib/Tactic/Basic.lean) | ||
| file in Mathlib4 contains many good examples to learn from. | ||
|
|
||
| ## Practical tips: | ||
|
|
||
| You can observe the output of commands and functions that use the macro system | ||
| by enabling this trace option: `set_option trace.Elab.definition true`. | ||
|
|
||
| Lean also lets you limit the region in which an option is set, using the | ||
| syntax `set_option ... in`. | ||
|
|
||
| Hygiene can be disabled with the command `set_option hygiene false`. | ||
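|
|
||
| As a sketch of the scoped form, the trace option below applies only to the | ||
| declaration that immediately follows `in`. The macros `my_zero` and `my_one` | ||
| are hypothetical names used for illustration. | ||
| ``` | ||
| -- Tracing is enabled for this single declaration | ||
| set_option trace.Elab.definition true in | ||
| macro "my_zero" : term => `(0) | ||
| -- The option has reverted here, so this declaration should not be traced | ||
| macro "my_one" : term => `(1) | ||
| ``` | ||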
Sorry for being very late to the conversation, but this wording seems awkward now that this documentation is part of the Lean core rather than Mathlib.
I did think about replacing this with the `List` literal macro in the prelude, but it's not as simple as the `Set` one. If you can find a better example from the prelude that demonstrates a similar idea you're welcome to change it; I think this isn't terrible since the macro is recreated in full below the text.
Sorry, my point was more the mention of "the actual definition of a set in Mathlib" -- it is somewhat awkward to mention a third party library in the Lean core's documentation. Admittedly, the `Set` part is also probably a little confusing to CS people, as they would probably think of the `RBSet`/`HashSet` from `Std` in the Lean core first, which is a very different kind of set than the `Set` from mathlib.

I am a little curious, though: why do you think the `Set` macro is simpler than the `List` one? They both can expand similarly: `{0,1,2}` to `Set.insert 0 (Set.insert 1 (Set.insert 2 ∅))` and `[0,1,2]` to `List.cons 0 (List.cons 1 (List.cons 2 ∅))`.
They expand similarly, but the definition of the list one is more complex than the set one:
lean4/src/Init/Notation.lean
Line 190 in 653ff18
True, the actual implementation is more complex (for optimization reasons), but for demonstration purposes they could be written essentially the same way.
I agree with the overall point you're making; if you want to swap the `Set` thing out to talk about/demo a simplified version of the list macro I'm all for it.