-
Notifications
You must be signed in to change notification settings - Fork 523
fix bug with incorrectly picking out 1-character tznames #554
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Hm... This is probably OK, except that See Alpha Time from this list. I'm not entirely sure I want to foreclose these as a general rule. Plus, eventually we'll want to allow |
I've been giving this some thought (#541); catching e.g. "A.M." in the tokenization step would be really nice, not sure it would handle the other broken case here. Also if in the future we want to identify e.g. "US/Eastern" as a tzname, that may be easier to catch pre-tokenization since that would become
That's reasonable. Maybe if we pass |
|
@jbrockmendel Maybe take a look at how that last part is done in #227, since AFAICT that actually does work. I do want to apologize that mostly my attitude towards this parser stuff is to sorta reject a lot of these incremental changes. It's a weird situation because I have only a general sense about general techniques in parsing, and I think this sort of heavily contextualized parsing with ambiguous constraints and such is one of the harder kinds of parsing (compared to well-defined grammars). I haven't found the time to like.. read a book on parsing, so my answer to a lot of these strategy proposals is more or less, "I'm not sure how I'll feel about that after I have a reasonable framework to understand parsing problems in general, so it's probably not worth commenting on now." |
I totally understand where you're coming from, but appreciate that you get that it can be frustrating on this end. #227 looks like a reasonable approach (though parts of it would be easier if we caught some strings at tokenization-time). Is it high enough in the queue that I should wait for it to fix this bug? In it's current condition, this PR basically trades one bug for another. It avoids incorrectly identifying 1-character timezones, but will incorrectly reject Alpha Time in the (probably rare) case that a user chooses that. I think the thing to do is to pass |
|
As a general rule, when the bugs are both at least somewhat rare, I tend to put a good deal of weight on the status quo, but also reversibility (i.e. I'm more comfortable changing the behavior when you can get the old behavior by jumping through some simple hoops). Considering that I don't see any obvious seam where a user can reverse this behavior and we have some general, vague plans to fix this the right way later, I don't see any reason to make this particular trade-off. |
Absolutely agree. I'm currently working on updating this PR to fix this bug without introducing the new one. |
| def test_alpha_tzname_ampm(self): | ||
| # If the Alpha-style timezone is "A" it gets incorrectly | ||
| # identified as part of AM/PM | ||
| dstr = '2017-12-07 10:27:15A' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm.. I would guess that 10:27:15A means AM/PM more often than it means Alpha, though I guess they did explicitly set the tzinfos...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yah I'm not 100% sure what to do in this case, leaning towards agreeing on the "well if they were explicit...". Also I think "AM" and "A.M." are way more common than "A" so that should be reasonably safe.
| tzname is None and | ||
| tzoffset is None and | ||
| len(token) <= 5 and | ||
| all(x in string.ascii_uppercase for x in token)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this is the right thing to do. If the user does pass a tzinfos, should we be strict in this branch and only accept token in tzinfos (and maybe UTC, GMT, Z or the parserinfo equivalents)? Should the token in tzinfo check also require the hour is not None and tzname is None and tzoffset is None?
I think with this fix, the errors exposed by #540 should be fixed.