fix bug with incorrectly picking out 1-character tznames #554

jbrockmendel · 2017-12-06T18:26:35Z

I think with this fix, the errors exposed by #540 should be fixed.

pganssle · 2017-12-06T19:06:37Z

Hm... This is probably OK, except that Z is actually an artifact of a specific way of specifying time zone offsets that's broader and always uses a single character.

See Alpha Time from this list. I'm not entirely sure I want to foreclose these as a general rule. Plus, eventually we'll want to allow tzinfo strategies to somehow advertise the valid time zone name ranges, and we don't want these other things to break if you supply some 1-length time zone abbreviations. Probably best to do this on the tokenization step maybe?

jbrockmendel · 2017-12-06T19:23:01Z

Probably best to do this on the tokenization step maybe?

I've been giving this some thought (#541); catching e.g. "A.M." in the tokenization step would be really nice, not sure it would handle the other broken case here.

Also if in the future we want to identify e.g. "US/Eastern" as a tzname, that may be easier to catch pre-tokenization since that would become ["US", "/", "Eastern"].

eventually we'll want to allow tzinfo strategies to somehow advertise the valid time zone name ranges, and we don't want these other things to break if you supply some 1-length time zone abbreviations

That's reasonable. Maybe if we pass tzinfos through so that the appropriate check can be done in _could_be_tzname ?

pganssle · 2017-12-06T23:14:38Z

@jbrockmendel Maybe take a look at how that last part is done in #227, since AFAICT that actually does work.

I do want to apologize that mostly my attitude towards this parser stuff is to sorta reject a lot of these incremental changes. It's a weird situation because I have only a general sense about general techniques in parsing, and I think this sort of heavily contextualized parsing with ambiguous constraints and such is one of the harder kinds of parsing (compared to well-defined grammars). I haven't found the time to like.. read a book on parsing, so my answer to a lot of these strategy proposals is more or less, "I'm not sure how I'll feel about that after I have a reasonable framework to understand parsing problems in general, so it's probably not worth commenting on now."

jbrockmendel · 2017-12-07T18:02:14Z

my attitude towards this parser stuff is to sorta reject a lot of these incremental changes

I totally understand where you're coming from, but appreciate that you get that it can be frustrating on this end.

#227 looks like a reasonable approach (though parts of it would be easier if we caught some strings at tokenization-time). Is it high enough in the queue that I should wait for it to fix this bug?

In it's current condition, this PR basically trades one bug for another. It avoids incorrectly identifying 1-character timezones, but will incorrectly reject Alpha Time in the (probably rare) case that a user chooses that. I think the thing to do is to pass tzinfos to _parse and then to _could_be_tzname and Do It Right. Thoughts?

pganssle · 2017-12-07T18:31:17Z

As a general rule, when the bugs are both at least somewhat rare, I tend to put a good deal of weight on the status quo, but also reversibility (i.e. I'm more comfortable changing the behavior when you can get the old behavior by jumping through some simple hoops).

Considering that I don't see any obvious seam where a user can reverse this behavior and we have some general, vague plans to fix this the right way later, I don't see any reason to make this particular trade-off.

jbrockmendel · 2017-12-07T18:34:51Z

I tend to put a good deal of weight on the status quo

Absolutely agree. I'm currently working on updating this PR to fix this bug without introducing the new one.

pganssle · 2017-12-07T22:55:40Z

dateutil/test/test_internals.py

+    def test_alpha_tzname_ampm(self):
+        # If the Alpha-style timezone is "A" it gets incorrectly
+        # identified as part of AM/PM
+        dstr = '2017-12-07 10:27:15A'


Hm.. I would guess that 10:27:15A means AM/PM more often than it means Alpha, though I guess they did explicitly set the tzinfos...

Yah I'm not 100% sure what to do in this case, leaning towards agreeing on the "well if they were explicit...". Also I think "AM" and "A.M." are way more common than "A" so that should be reasonably safe.

jbrockmendel · 2017-12-18T16:11:34Z

dateutil/parser/_parser.py

+                    tzname is None and
+                    tzoffset is None and
+                    len(token) <= 5 and
+                    all(x in string.ascii_uppercase for x in token))


Not sure if this is the right thing to do. If the user does pass a tzinfos, should we be strict in this branch and only accept token in tzinfos (and maybe UTC, GMT, Z or the parserinfo equivalents)? Should the token in tzinfo check also require the hour is not None and tzname is None and tzoffset is None?

fix bug with incorrectly picking out 1-character tznames

23ded5e

check for Alpha Tznames

1d14531

pganssle reviewed Dec 7, 2017

View reviewed changes

This was referenced Dec 8, 2017

Warn on unidentified tz #540

Merged

Prepocessing results of timelex.split #541

Closed

jbrockmendel commented Dec 18, 2017

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix bug with incorrectly picking out 1-character tznames #554

fix bug with incorrectly picking out 1-character tznames #554

Uh oh!

jbrockmendel commented Dec 6, 2017

Uh oh!

pganssle commented Dec 6, 2017

Uh oh!

jbrockmendel commented Dec 6, 2017

Uh oh!

pganssle commented Dec 6, 2017

Uh oh!

jbrockmendel commented Dec 7, 2017

Uh oh!

pganssle commented Dec 7, 2017

Uh oh!

jbrockmendel commented Dec 7, 2017

Uh oh!

pganssle Dec 7, 2017

Uh oh!

jbrockmendel Dec 7, 2017

Uh oh!

jbrockmendel Dec 18, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

fix bug with incorrectly picking out 1-character tznames #554

Are you sure you want to change the base?

fix bug with incorrectly picking out 1-character tznames #554

Uh oh!

Conversation

jbrockmendel commented Dec 6, 2017

Uh oh!

pganssle commented Dec 6, 2017

Uh oh!

jbrockmendel commented Dec 6, 2017

Uh oh!

pganssle commented Dec 6, 2017

Uh oh!

jbrockmendel commented Dec 7, 2017

Uh oh!

pganssle commented Dec 7, 2017

Uh oh!

jbrockmendel commented Dec 7, 2017

Uh oh!

pganssle Dec 7, 2017

Choose a reason for hiding this comment

Uh oh!

jbrockmendel Dec 7, 2017

Choose a reason for hiding this comment

Uh oh!

jbrockmendel Dec 18, 2017

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants