DISC: state storage in parsing #414

@jbrockmendel

With #409 merged, the usage of mstridx is now uniform. There are three places with:

ymd.append(value)
assert mstridx == -1
mstridx = len(ymd)-1

I propose that mstridx should be an attribute of _ymd. My preferred implementation is a new _ymd method that gets called in some cases instead of append:

    def set_month(self, value, idx):
        assert self.mstridx is None  # or == -1 in the current usage
        self.append(value)
        self.mstridx = len(self) - 1
        self._midx = idx  # This is new, will be discussed below

Then in _parse, the append + assert + set-mstridx becomes ymd.set_month(value, i). Advantages:

  • One fewer state variable to carry around the _parse loop. This will make it simpler to refactor that into smaller functions.
  • _ymd.resolve_ymd no longer needs mstridx as an argument.
  • Analogous set_year and set_day methods would allow similar validation (e.g. a call that currently reads ymd.append(info.convertyear(...)) could record ystridx implicitly).
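
To make the proposal concrete, here is a minimal sketch of how the attribute and method could fit together. YMDSketch is a hypothetical stand-in for _ymd, and the token list, the months lookup, and the loop are toy assumptions, not the real _parse logic:

```python
class YMDSketch(list):
    """Hypothetical stand-in for dateutil's private _ymd container."""

    def __init__(self):
        super().__init__()
        self.mstridx = None  # position of the month within this list
        self._midx = None    # position of the month token in the input

    def set_month(self, value, idx):
        # A month given as a string may only be recorded once per parse.
        assert self.mstridx is None
        self.append(value)
        self.mstridx = len(self) - 1
        self._midx = idx


tokens = ["on", "Sep", "25", "2003"]
months = {"sep": 9}  # toy lookup table, not dateutil's parserinfo

ymd = YMDSketch()
for i, token in enumerate(tokens):
    if token.lower() in months:
        ymd.set_month(months[token.lower()], i)
    elif token.isdigit():
        ymd.append(int(token))
    # Anything else is simply skipped in this toy loop.
```

Here the month is the first entry of ymd (mstridx is 0) but the second input token (_midx is 1), which is the relative versus absolute distinction the new attribute is meant to preserve.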

The new attribute _midx, and the analogous _yidx, _didx are there so that we can recover information later. Keeping track of absolute position in addition to relative position may be important in cases where we need to back-track for disambiguation (see #394). This may also be helpful for addressing #125.

Related: instead of carrying around skipped_tokens and last_skipped_token_i, use a set, skipped_indices. The re-combining can be done at the end of parsing, last_skipped_token_i is easy to recover (it is the largest index in the set), and we retain information that can be useful in potential post-processing.
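
A rough sketch of the deferred re-combining, assuming adjacent skipped tokens should be joined into one string. recombine_skipped is a hypothetical helper name, and the token list is made up for illustration:

```python
def recombine_skipped(tokens, skipped_indices):
    """Join runs of adjacent skipped tokens into single strings."""
    combined = []
    previous = None
    for i in sorted(skipped_indices):
        if previous is not None and i == previous + 1:
            # Directly follows the previous skipped token: extend that run.
            combined[-1] += tokens[i]
        else:
            # Start a new run of skipped tokens.
            combined.append(tokens[i])
        previous = i
    return combined


tokens = ["Today", " ", "is", " ", "25", " ", "of", " ", "September"]
skipped_indices = {0, 1, 2, 3, 5, 6, 7}

skipped_tokens = recombine_skipped(tokens, skipped_indices)
last_skipped_token_i = max(skipped_indices)
```

With the set in hand, both of the old state variables can be derived after the loop, so neither needs to be threaded through _parse.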
