Problem
The way we specify keywords right now has many issues:
- Subtle bugs in earlier versions, where keywords are wrongly distinguished (or not distinguished) from identifiers, because Identifier / YulIdentifier / NotAnIdentifierInAnyVersion / NotAnIdentifierInSomeVersions are not accurate or complete.
- The parser has to scan an identifier, then scan the same text again against the union of all keywords to make sure it doesn't match, which is wasteful.
Suggestion
We can create a new Keyword production kind, where each keyword can specify the list of terminals it can match, and the list of identifiers it must not match. For example:
- name: "FixedBytesKeyword"
  kind: "Keyword"
  unversioned:
    terminals:
      - "bytes"
      - "bytes1"
      - "bytes2"
      - "bytes3"
    identifiers:
      - "Identifier"
      - "YulIdentifier"
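As a rough sketch of how such definitions could drive the identifier side, the snippet below models a `Keyword` production in memory and collects the terminals that a given identifier production must not match. The `KeywordDefinition` struct, the `excluded_terminals` helper, and the Yul-only `LeaveKeyword` entry are hypothetical names used for illustration only; they are not part of the existing grammar.

```rust
/// Hypothetical in-memory form of a `Keyword` production (illustration only).
struct KeywordDefinition {
    name: &'static str,
    /// Terminals this keyword can match.
    terminals: &'static [&'static str],
    /// Identifier productions that must NOT match those terminals.
    identifiers: &'static [&'static str],
}

const KEYWORDS: &[KeywordDefinition] = &[
    KeywordDefinition {
        name: "FixedBytesKeyword",
        terminals: &["bytes", "bytes1", "bytes2", "bytes3"],
        identifiers: &["Identifier", "YulIdentifier"],
    },
    // Assumed entry, added only to show a keyword that references one identifier kind.
    KeywordDefinition {
        name: "LeaveKeyword",
        terminals: &["leave"],
        identifiers: &["YulIdentifier"],
    },
];

/// Collects every terminal that the given identifier production must reject,
/// in definition order.
fn excluded_terminals(identifier: &str) -> Vec<&'static str> {
    KEYWORDS
        .iter()
        .filter(|keyword| keyword.identifiers.iter().any(|id| *id == identifier))
        .flat_map(|keyword| keyword.terminals.iter().copied())
        .collect()
}
```

Under these assumptions, `excluded_terminals("Identifier")` yields only the `bytes*` terminals, while `excluded_terminals("YulIdentifier")` also includes `leave`.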
For identifiers, we can scan the raw identifier, then do a quick string value comparison with all possible keywords it relates to (in each version) to make sure there is no match:
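fn scan_identifier_0_6_0() -> bool {
    // `scan_raw_identifier` is assumed to return the scanned text as a `&str`.
    let identifier = scan_raw_identifier();
    match identifier {
        // List of keywords defined in 0.6.0: these are not identifiers.
        "keyword1" | "keyword2" | "keyword3" => false,
        // Anything else is a valid identifier.
        _ => true,
    }
}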
For keywords, it will scan the same raw identifier, then match it with the list of terminals it has:
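fn scan_bytes_keyword_0_4_11() -> bool {
    // Same raw scan as for identifiers (assumed to return a `&str`).
    let identifier = scan_raw_identifier();
    match identifier {
        // The keyword's own terminals; the rest of the list is elided.
        "bytes1" | "bytes2" | "bytes3" /* | ....... */ => true,
        _ => false,
    }
}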
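The two scanners above can be combined into a self-contained sketch along the following lines. The raw-identifier rule used here (`[A-Za-z_$][A-Za-z0-9_$]*`), the function names, and the short keyword lists are assumptions made for illustration; the real grammar may differ.

```rust
/// Scans the longest raw identifier at the start of `input`.
/// Assumed rule for this sketch: [A-Za-z_$][A-Za-z0-9_$]*.
fn scan_raw_identifier(input: &str) -> Option<&str> {
    let mut end = 0;
    for (index, ch) in input.char_indices() {
        let allowed = if index == 0 {
            ch.is_ascii_alphabetic() || ch == '_' || ch == '$'
        } else {
            ch.is_ascii_alphanumeric() || ch == '_' || ch == '$'
        };
        if !allowed {
            break;
        }
        end = index + ch.len_utf8();
    }
    if end == 0 {
        None
    } else {
        Some(&input[..end])
    }
}

/// Keyword scanner: succeeds only if the whole raw identifier is one of
/// the keyword's terminals.
fn scan_fixed_bytes_keyword(input: &str) -> bool {
    matches!(
        scan_raw_identifier(input),
        Some("bytes" | "bytes1" | "bytes2" | "bytes3")
    )
}

/// Identifier scanner: succeeds only if the raw identifier is not one of
/// the keywords that reference this identifier (illustrative list).
fn scan_identifier(input: &str) -> bool {
    match scan_raw_identifier(input) {
        Some("bytes" | "bytes1" | "bytes2" | "bytes3" | "if") => false,
        Some(_) => true,
        None => false,
    }
}
```

Because both scanners compare the complete raw identifier, an input such as `bytes33` is accepted as an identifier and rejected as a keyword, with no need for a trailing-context check after a prefix match.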
Notes
- This structure allows us to have Solidity keywords referencing Identifier, and Yul keywords referencing YulIdentifier, and common ones like IfKeyword referencing both.
- Since keywords reference terminals (not the other way around), they can be versioned as usual, and we won't need hacks like defining our own Keyword and YulKeyword. We can then define ReservedWord and YulReservedWord each as a single Keyword production on its own.
- This will remove the need for trailingContext or difference, as we can replace both with existing expressions.
- After keywords are deprecated, they should be turned into reserved words.
- Not all keywords were reserved before they were first introduced, so some were parsed as identifiers, and some produced errors (reserved words). How would you handle that? Maybe start moving versioning to expressions, so that you can have it legal in v1, reserved in v2, supported (as a keyword) in v3, and deprecated (back to a reserved word) in v4.
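One possible way to picture the versioned life cycle from the last note is a per-word history of status changes. This is only a sketch: `WordStatus`, `WordHistory`, the word `example`, and the version numbers are all placeholders invented for illustration, not actual Solidity history.

```rust
/// How a word is treated in a given language version.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WordStatus {
    /// Legal as an ordinary identifier.
    Identifier,
    /// Rejected as an identifier, but not usable as a keyword.
    Reserved,
    /// Supported as a keyword.
    Keyword,
}

/// (major, minor, patch); tuples compare lexicographically.
type Version = (u32, u32, u32);

/// Hypothetical status history for one word. Each entry gives the first
/// version in which the status applies; entries are sorted ascending.
struct WordHistory {
    word: &'static str,
    changes: &'static [(Version, WordStatus)],
}

impl WordHistory {
    /// Returns the status in `version`: the latest change at or before it,
    /// or `Identifier` if the word had no special meaning yet.
    fn status_in(&self, version: Version) -> WordStatus {
        self.changes
            .iter()
            .rev()
            .find(|(since, _)| *since <= version)
            .map(|(_, status)| *status)
            .unwrap_or(WordStatus::Identifier)
    }
}

// Placeholder history: legal before 0.2.0, reserved in 0.2.0, a keyword in
// 0.3.0, and deprecated (reserved again) in 0.4.0.
const EXAMPLE: WordHistory = WordHistory {
    word: "example",
    changes: &[
        ((0, 2, 0), WordStatus::Reserved),
        ((0, 3, 0), WordStatus::Keyword),
        ((0, 4, 0), WordStatus::Reserved),
    ],
};
```

With this shape, a per-version scanner could be generated by evaluating `status_in` for each word: words that are `Reserved` or `Keyword` go into the identifier's exclusion list, and only `Keyword` words go into the keyword's terminal list.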