fparser parser object #191

@pelson

Currently fparser uses some clever magic to store global state on fparser.two.Fortran2003.Base: a dictionary that maps each class name to all of its subclasses. I propose that, rather than rely on global state, we introduce an object that encapsulates this information and acts as the driver of the recursive parse process.

  • Each Base subclass (herein referred to as a parse element) gains a new from_source class method, which MUST receive a Parser instance as well as the source to be parsed. The existing parse-via-constructor will be deprecated, and the constructor will be used exclusively for constructing instances of that type from already constructed children (e.g. Type_Declaration_Stmt(declaration_type_spec_instance)). The existing repr of parse element instances will therefore continue to produce working code.

  • The parse element will defer its subelement parsing to the Parser instance using named references (e.g. parser.match('program-unit', source)). The Parser instance will use the equivalent of today's Base.subclasses state to find the appropriate parse element and call its from_source or match class method (as appropriate). This will break the existing direct linkage between classes and allow improved modularisation of the code, as well as allowing parsers to be defined with specific extensions without intrusive changes to the class hierarchy.

  • The parser will have at least two parse methods:

    • parser.match(typename, source) - return an instance of a subclass of the given typename, or None (equivalent of today's parse element .match class method)
    • parser.parse(typename, source) - return an instance of a subclass of the given typename, or raise a NoMatchError (equivalent of today's parse element constructor)
  • Although out of scope for the initial implementation, it is envisaged that the parser will also hold state about the context of what has already been parsed. This will gradually allow issues where prior parse context is required, such as #190 ("fparser2 needs more context to be able to parse correctly"), to be addressed.
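
The design above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not fparser's actual code: the register method, the toy Name and Derived_Type_Stmt elements, the 'type-name' key, and the convention that an element's match class method takes (source, parser) and returns an instance or None are all hypothetical simplifications.

```python
import re


class NoMatchError(Exception):
    """Raised by Parser.parse when no parse element matches the source."""


class Parser:
    """Hypothetical driver holding the name -> parse element registry."""

    def __init__(self):
        # Stands in for the global Base.subclasses dictionary: each name
        # maps to the candidate parse elements to try, in order.
        self._elements = {}

    def register(self, typename, element):
        self._elements.setdefault(typename, []).append(element)

    def match(self, typename, source):
        """Return an instance matching `source`, or None."""
        for element in self._elements.get(typename, []):
            result = element.match(source, self)
            if result is not None:
                return result
        return None

    def parse(self, typename, source):
        """Return an instance matching `source`, or raise NoMatchError."""
        result = self.match(typename, source)
        if result is None:
            raise NoMatchError(f"no {typename} matches {source!r}")
        return result


class Name:
    """Toy terminal element (assumption: a bare identifier)."""

    def __init__(self, string):
        self.string = string

    def __repr__(self):
        return f"Name({self.string!r})"

    @classmethod
    def match(cls, source, parser):
        # Terminal node: receives the parser but does not need it.
        text = source.strip()
        if re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", text):
            return cls(text)
        return None


class Derived_Type_Stmt:
    """Toy element handling only the form `type <name>`."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"Derived_Type_Stmt({self.name!r})"

    @classmethod
    def match(cls, source, parser):
        m = re.fullmatch(r"\s*type\s+(.+)", source, flags=re.IGNORECASE)
        if m is None:
            return None
        # The subelement is requested by name, not by class object.
        name = parser.match("type-name", m.group(1))
        return None if name is None else cls(name)


parser = Parser()
parser.register("type-name", Name)
parser.register("derived-type-stmt", Derived_Type_Stmt)
stmt = parser.parse("derived-type-stmt", "type a")
```

Note that Derived_Type_Stmt never refers to the Name class directly; swapping in a different 'type-name' element only requires a different registration.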

Backwards compatibility

Existing code such as Derived_Type_Stmt("type a") will only be possible with a parser instance, either directly via Derived_Type_Stmt.from_source("type a", parser) or indirectly via parser.parse('derived-type-stmt', source). It is worth noting that in both the old and the new form it is necessary to have "created" a parser, but with the new interface this is explicit and non-global, and it allows greater flexibility with regard to parser customisation and future context information.
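
As a rough illustration of the two routes, here is a self-contained sketch. The stand-in Parser and the toy Derived_Type_Stmt (which accepts only `type <name>` and stores the name as a plain string) are assumptions made for the example, not fparser's implementation.

```python
class NoMatchError(Exception):
    """Raised when the source does not match the parse element."""


class Derived_Type_Stmt:
    def __init__(self, type_name):
        # The constructor only assembles already-parsed children.
        self.type_name = type_name

    def __repr__(self):
        return f"Derived_Type_Stmt({self.type_name!r})"

    def __eq__(self, other):
        return (isinstance(other, Derived_Type_Stmt)
                and self.type_name == other.type_name)

    @classmethod
    def from_source(cls, source, parser):
        # Parsing lives here, not in the constructor.
        words = source.split()
        if (len(words) != 2 or words[0].lower() != "type"
                or not words[1].isidentifier()):
            raise NoMatchError(source)
        return cls(words[1])


class Parser:
    """Stand-in parser: looks elements up by name (hypothetical)."""

    def __init__(self):
        self._elements = {"derived-type-stmt": Derived_Type_Stmt}

    def parse(self, typename, source):
        return self._elements[typename].from_source(source, self)


parser = Parser()
direct = Derived_Type_Stmt.from_source("type a", parser)
indirect = parser.parse("derived-type-stmt", "type a")
# Because the constructor takes children only, repr() round-trips.
rebuilt = eval(repr(direct))
```

In this sketch both routes yield equal objects, and the repr output is valid constructor code without any parser being involved.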

Benefits

  • Avoid global state, thus allowing multiple parsers to exist concurrently (e.g. f2003, f2008)
  • Explicit syntax separates parsing from constructors. Removes the surprising behaviour of Program_Unit(...) constructing a non-Program_Unit instance.
    • Allows the Parser to handle subtype recursion, thus removing the need for passing parent_cls when parsing, and avoiding the need for the parse elements to concern themselves with recursion errors (L269, L269)
  • Breaks the dependence on the actual class objects in the parse element definitions, allowing modularisation of definition (Fortran2003.py is currently ~9000 LoC), and a much easier process for swapping out definitions in custom Parser instances (e.g. f2003-strict vs f2003-with-standard-compiler-extensions)
    • Using named parse elements brings the code a step closer to the original Fortran standard definition (i.e. Data_Component_Def_Stmt -> data-component-def-stmt), and potentially allows the parse element to define subtype names in fewer places, rather than 3 in the current implementation (e.g. Declaration_Type_Spec in Data_Component_Def_Stmt)
  • Provides the possibility of stateful context for addressing issues such as #190 ("fparser2 needs more context to be able to parse correctly")
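
To make the "multiple parsers" and "swapping out definitions" points concrete, here is a small sketch. Everything in it is hypothetical: the with_element method, the 'intrinsic-type-spec' key, and the toy Real and Real_Star_Size elements (the latter standing in for a `real*8` style compiler extension) are illustrative only.

```python
class Parser:
    """Hypothetical parser owning its own registry (no global state)."""

    def __init__(self, elements):
        # Copy the lists so parsers never share mutable state.
        self._elements = {name: list(classes)
                          for name, classes in elements.items()}

    def with_element(self, typename, element):
        """Return a new parser with one extra element; self is unchanged."""
        extended = Parser(self._elements)
        extended._elements.setdefault(typename, []).append(element)
        return extended

    def match(self, typename, source):
        for element in self._elements.get(typename, []):
            result = element.match(source, self)
            if result is not None:
                return result
        return None


class Real:
    """Toy element: matches the bare keyword `real`."""

    @classmethod
    def match(cls, source, parser):
        return cls() if source.strip().lower() == "real" else None


class Real_Star_Size:
    """Toy extension element: matches `real*<digits>`."""

    def __init__(self, size):
        self.size = size

    @classmethod
    def match(cls, source, parser):
        head, star, size = source.strip().lower().partition("*")
        if head.strip() == "real" and star and size.strip().isdigit():
            return cls(int(size))
        return None


strict = Parser({"intrinsic-type-spec": [Real]})
relaxed = strict.with_element("intrinsic-type-spec", Real_Star_Size)

strict_result = strict.match("intrinsic-type-spec", "real*8")
relaxed_result = relaxed.match("intrinsic-type-spec", "real*8")
```

Both parsers coexist in the same process, and the extension is added without touching the Real class or any shared class hierarchy.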

Implementation plan

  • Teleconference to agree on the design and implementation plan
  • All parse elements to receive a from_source class method, and all .match methods to make use of .from_source rather than the explicit constructor form (existing form continues to be fully functional)
  • Creation of Parser class to hold the equivalent of the Base.subclasses state. All from_source calls to include the parser in the call arguments, even those that don't recurse/use the parser (e.g. terminal nodes)
  • All documentation to be updated to reflect the new interface
  • Parse-via-constructor to be deprecated
  • All appropriate unit-tests to use a new pytest fixture which is the parser context (most of the existing class constructor tests)
  • Global Base.subclasses state to be removed
  • Parse-via-constructor to be removed
  • Parse elements to gradually migrate to using parser.parse(type_name, source) or parser.match(type_name, source) within their .match class method
