Skip to content

Conversation

@jbrockmendel
Copy link
Contributor

Implement the intermediate representation as a full-fledged (easily-subclassable) class.

A subset of parser methods are focused on taking this intermediate representation and building the final representation. These methods are moved into the new class. As a result, they actually use self! (Well a couple don't, but I kept them as-is as a token of goodwill).

Note: tokens could pretty reasonably be an attribute of the intermediate representation.

@pganssle
Copy link
Member

What exactly is subclassable about this? It has no public methods and exists only in private modules.

@jbrockmendel
Copy link
Contributor Author

Good catch. _build_result should be public. Will update.

The bit that's relevant for subclass-ability is that _result is made (or kept) an attribute of the parser class. So if you want to subclass ParseState you just can tell the parser to use your subclass by overriding parser._result.

@jbrockmendel
Copy link
Contributor Author

BTW as a follow-up I'll make a PR to ensure we get coverage for the unhit lines in _build_taware and _build_tzinfo

@pganssle
Copy link
Member

For one thing, _result is also a private attribute, so I would not suggest anyone messing with it in a subclass, though I can imagine what you meant was we'd have a parser.ResultClass class or instance attribute.

That said, if the only thing that matters is that a class implements build_result, then ParseState doesn't need to be public (and probably shouldn't be), and shouldn't be subclassed from. We would just advertise "pass us a class that implements this public interface".

I think some of this is already being done in the parserinfo class, where are actively encouraged to implement your methods. Not quite sure if it's worth making a distinction between parserinfo and parsestate here.

@pganssle
Copy link
Member

One thing I'll note is that the reason why I wanted all these broken up methods to be private wasn't because they should be private forever, but because I want to give a hard think about how the parser interface works and whether we can design something with an elegant way of implementing custom parsing behaviors, and I'm not sure a simple refactoring of the existing logic will work for that. As a result, I don't really want to commit to any significant changes to the public interface until we have an idea of what the final version should look like.

I don't think this really solves the problem that I see with customization. You either have to rely on private interfaces or completely re-implement them, neither of which is ideal.

@jbrockmendel
Copy link
Contributor Author

Several small comments instead of one long one.

_result is also a private attribute, so I would not suggest anyone messing with it in a subclass,

_result is private in the PR mainly because it is private in the status quo. The current class-in-a-class definition of _result is not a pattern I see often; I can only guess that it is defined there so that it is accessible as res = self._result() instead of res = _result().

Even if we don't put post-parsing methods into the _result/ParseState class, defining it at the normal level of indentation and pinning it to the class will be an aesthetic improvement.

@jbrockmendel
Copy link
Contributor Author

Setting aside subclassing, the methods _build_naive and _build_tzaware are nice functions (i.e. I would advocate breaking versions of them them out of parse even if we keep them in parser). Based on the ways in which these _build_foo methods actually use self, they make a lot more sense as methods of _result/ParseState than of parser.

@jbrockmendel
Copy link
Contributor Author

I think some of this is already being done in the parserinfo class, where are actively encouraged to implement your methods

Not sure what you mean here. The _build_foo methods aren't part of parserinfo... That said, I do like the idea of centralizing customizable parsing behavior in parserinfo.

@jbrockmendel
Copy link
Contributor Author

It's probably premature to separate responsibilities out of parser. But suppose for the moment that you decided it was worthwhile (say a bunch of new parsing methods are added and the class gets larger than you like). In this context, a reasonable separation of responsibilities if "everything directly dealing with parsing the string" and "everything after that".

(Note: acknowledging that the refactoring is premature is part of an effort to meet you halfway on this)

@jbrockmendel
Copy link
Contributor Author

I don't really want to commit to any significant changes to the public interface until we have an idea of what the final version should look like.

Totally on board, and this has the added benefit of not feeling like the goal posts keep moving.

@jbrockmendel
Copy link
Contributor Author

I think focusing on subclassability was a mistake on my part: it's a red herring not really central here.

So how about the following:

_construction.ParseState --> _parser._ParseState, so there is no new module and nothing new in the public namespace. The nice qualities of _build_foo are retained.

Keep _build_result private for the time being to keep options open for the future.

Make skipped_idxs an attribute of _ParseState so its one less variable to keep around, and then self._recombine_tokens makes even more sense as a method.

@jbrockmendel jbrockmendel changed the title Implement subclassable ParseState Implement intermediate representation ParseState Nov 17, 2017
@jbrockmendel
Copy link
Contributor Author

Closing for now, may revisit later. #539 implements the less controversial bits of this.

@pganssle pganssle added this to the wontfix milestone Mar 11, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants