add separate keyword productions #505

@OmarTawfik

Problem

The way we currently specify keywords has several issues:

  • Earlier versions have subtle bugs where keywords are not correctly distinguished from identifiers, because Identifier / YulIdentifier / NotAnIdentifierInAnyVersion / NotAnIdentifierInSomeVersions are inaccurate or incomplete.
  • The parser has to scan an identifier, then scan again against the union of all keywords to make sure it does not match, which is wasteful.

Suggestion

We can create a new Keyword production kind, where each keyword can specify the list of terminals it can match, and the list of identifiers it must not match. For example:

- name: "FixedBytesKeyword"
  kind: "Keyword"
  unversioned:
    terminals:
      - "bytes"
      - "bytes1"
      - "bytes2"
      - "bytes3"
    identifiers:
      - "Identifier"
      - "YulIdentifier"

For identifiers, we can scan the raw identifier, then do a quick string comparison against every keyword that references it (in each version) to make sure there is no match:

fn scan_identifier_0_6_0() -> bool {
  let identifier = scan_raw_identifier();
  match identifier {
    // keywords defined in 0.6.0 are not valid identifiers
    "keyword1" | "keyword2" | "keyword3" => false,
    _ => true,
  }
}

For keywords, the parser scans the same raw identifier, then matches it against the keyword's list of terminals:

fn scan_bytes_keyword_0_4_11() -> bool {
  let identifier = scan_raw_identifier();
  match identifier {
    // remaining terminals elided
    "bytes1" | "bytes2" | "bytes3" /* | ....... */ => true,
    _ => false,
  }
}
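
To make the two snippets above concrete, here is a minimal self-contained sketch of the same idea. It is not the actual parser code: the function signatures, the assumed shape of a raw identifier (a letter, `_`, or `$`, followed by letters, digits, `_`, or `$`), and the placeholder word lists are all assumptions for illustration.

```rust
/// Scans the longest identifier-shaped prefix of `input`.
/// Assumed shape: [A-Za-z_$][A-Za-z0-9_$]*
fn scan_raw_identifier(input: &str) -> Option<&str> {
    let mut chars = input.char_indices();
    let (_, first) = chars.next()?;
    if !(first.is_ascii_alphabetic() || first == '_' || first == '$') {
        return None;
    }
    // Byte offset of the first character that cannot continue the identifier.
    let end = chars
        .find(|&(_, c)| !(c.is_ascii_alphanumeric() || c == '_' || c == '$'))
        .map_or(input.len(), |(i, _)| i);
    Some(&input[..end])
}

/// Placeholder keyword list for one version (values are not real keywords).
const KEYWORDS_0_6_0: &[&str] = &["keyword1", "keyword2", "keyword3"];

/// Placeholder subset of the terminals of a single keyword production.
const FIXED_BYTES_TERMINALS: &[&str] = &["bytes1", "bytes2", "bytes3"];

/// An identifier is a raw identifier that equals none of the given keywords.
fn scan_identifier<'a>(input: &'a str, keywords: &[&str]) -> Option<&'a str> {
    scan_raw_identifier(input).filter(|raw| !keywords.iter().any(|k| k == raw))
}

/// A keyword is a raw identifier that equals one of its own terminals.
fn scan_keyword<'a>(input: &'a str, terminals: &[&str]) -> Option<&'a str> {
    scan_raw_identifier(input).filter(|raw| terminals.iter().any(|t| t == raw))
}
```

Because both scanners compare against the whole raw identifier, an input such as `bytes20` is not mistaken for `bytes2` followed by `0`, and `keyword10` is still accepted as an identifier. Each path scans the input once.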

Notes

  • This structure allows us to have Solidity keywords referencing Identifier, and Yul keywords referencing YulIdentifier, and common ones like IfKeyword referencing both.
  • Since keywords reference terminals (not the other way around), they can be versioned as usual, and we won't need hacks like defining our own Keyword and YulKeyword. We can then define ReservedWord and YulReservedWord each as a single Keyword production of its own.
  • This will remove the need for trailingContext or difference, as we can replace both with existing expressions.
  • After keywords are deprecated, they should be turned into reserved words.
  • Not all keywords were reserved before they were first introduced: some were parsed as identifiers, and some produced errors (reserved words). How should we handle that? One option is to move versioning into expressions, so that a word can be legal in v1, reserved in v2, supported (as a keyword) in v3, and deprecated (back to a reserved word) in v4.
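
One possible way to picture the per-version lifecycle from the last note is sketched below. This is only a rough model under assumptions: the type names, the `(major, minor, patch)` version tuple, and the example word with its version numbers are all hypothetical placeholders, not real Solidity history.

```rust
/// How a word is treated in one language version (hypothetical model).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WordStatus {
    Identifier, // legal as an ordinary identifier
    Reserved,   // rejected as an identifier, but not usable as a keyword
    Keyword,    // parsed as a keyword
}

type Version = (u32, u32, u32);

/// Status changes ordered by version; each entry applies from its version onward.
struct WordHistory {
    word: &'static str,
    changes: &'static [(Version, WordStatus)],
}

impl WordHistory {
    /// Returns the status in `version`: the latest change at or before it,
    /// or `Identifier` if the word had no special meaning yet.
    fn status_in(&self, version: Version) -> WordStatus {
        self.changes
            .iter()
            .rev()
            .find(|(from, _)| *from <= version)
            .map_or(WordStatus::Identifier, |(_, status)| *status)
    }
}

/// Placeholder word and versions: legal, then reserved, then a keyword,
/// then deprecated back to a reserved word.
const EXAMPLE: WordHistory = WordHistory {
    word: "example",
    changes: &[
        ((0, 2, 0), WordStatus::Reserved),
        ((0, 3, 0), WordStatus::Keyword),
        ((0, 4, 0), WordStatus::Reserved),
    ],
};
```

With this shape, a version-specific scanner could ask `EXAMPLE.status_in(version)` to decide whether the word is accepted as an identifier, matched as a keyword, or reported as a reserved word.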
