respect parsed tz instead of preferring tzlocal #319

jstewmon · 2016-11-09T17:43:22Z

if the computer is in a timezone where daylight savings is observed,
such as Europe/London, and a date string specifies a timezone, the
parsed datetime may be incorrect if the parsed datetime falls within
daylight savings but the date string specified the standard time

this is especially problematic for users in GMT/BST when parsing
dates in http or email headers, which are frequently specified as
GMT

fixes #318

if the computer is in a timezone where daylight savings is observed, such as Europe/London, and a date string specifies a timezone, the parsed datetime may be incorrect if the parsed datetime falls within daylight savings but the date string specified the standard time this is especially problematic for users in GMT/BST when parsing dates in http or email headers, which are frequently specified as GMT

pganssle · 2016-11-09T17:44:36Z

dateutil/parser.py

                    raise ValueError("Offset must be tzinfo subclass, "
                                     "tz string, or int offset.")
                ret = ret.replace(tzinfo=tzinfo)
-            elif res.tzname and res.tzname in time.tzname:


This would break compatibility, I'm pretty sure we shouldn't do this as implemented. The most I'd be comfortable with is moving this down the list as the last resort rather than one of the first.

I'm not sure how this case would be hit if it's moved to the bottom.

Is the compatibility case that res.tzname is not None, but res.tzoffset is None?

Can you provide a date string that I can use to add a test for the compatibility concern?

I think I meant more in the abstract, I didn't really think much about the implementation. Really, though, I think a better fix is going to be somewhat more complicated, as I mentioned in #318. Basically, I think what should be done is something like this (eliding over the details because I haven't though through the logic fully):

# Code to determine if it's supposed to be `tzlocal()` if isinstance(ret.tzinfo, tz.tzlocal): if ret.tzname() != res.tzname or not tz.datetime_exists(ret): if tz.datetime_ambiguous(ret) and tz.enfold(ret, fold=1).tzname() == res.tzname: ret = tz.enfold(ret, fold=1) elif res.tzoffset == 0: ret = ret.replace(tzinfo=tz.tzutc()) elif res.tzoffset: ret = ret.replace(tzinfo=tz.tzoffset(res.tzname, res.tzoffset))

It probably needs to be a bit more complicated than that to handle all the edge cases properly, and to disambiguate ambiguous dates, but the logic is at least somewhat clear:

If it could be parsed as tzlocal() but applying tzlocal() to the date doesn't give you the input tzname you expected (likely because of a DST mismatch), check if it's just an ambiguous date.

If it's an ambiguous date, keep tzlocal() and set the fold attribute to 1 to disambiguate per PEP-495.

If it's not an ambiguous date, try all the other possible ways to parse the date that aren't tzlocal(), and if you find a good one, apply it.

If you didn't find a good one, that's an unpleasant situation, but to maintain backwards compatibility, just leave tzlocal() attached (we can add a FutureWarning if need be on this).

I'm thinking the best implementation would be to instead of using if / elif, after the tzinfos part has been parsed, just start a queue of possible time zones, so like in this case the queue would be something like [tzlocal(),tzutc()`], then assign the first one in the list that creates a non-imaginary date, and if none of them create non-imaginary dates, return the first one in the list.

By the way, if you don't want to implement this, that's fine, I can branch off of your current PR and make the changes my self some time relatively soon, just let me know what you want to do.

Ok, thanks for the feedback. You have more context than me on the issue, so I'll defer to your expertise, but I thought about the priority a little differently:

If an offset was parsed, it should always be the exact offset in the tzinfo.

If the offset is 0, tzinfo=tzutc(), else tzinfo=tz.tzoffset(res.tzname, res.tzoffset)

Otherwise, if an name was parsed and the name is in time.tzname, use the offset for the explicitly named zone.

elif res.tzoffset == 0: ret = ret.replace(tzinfo=tz.tzutc()) elif res.tzoffset: ret = ret.replace(tzinfo=tz.tzoffset(res.tzname, res.tzoffset)) elif res.tzname: try: tzindex = time.tzname.index(res.tzname) except ValueError: pass else: offset = tzindex and time.altzone or time.timezone ret = ret.replace(tzinfo=tz.tzoffset(res.tzname, -offset)) ret = ret.astimezone(tz.tzlocal())

This assumes the date is not ambiguous if the date string has a timezone.

If you don't think this is right, I'm happy to let you take over and resolve however you think is best.

@jstewmon I think that's a reasonable way to design things but I don't think it's the way dateutil was originally designed, which is a problem because a lot of people use dateutil and are often counting on even undocumented features. I don't love how many places dateutil depends on the user's environment (or, for example, that parser fills in missing values from datetime.now(), rather than with 1s or something), but it is what it is (at least for now), and I'm thinking that many of these features should be deprecated and switched to opt-in at some point.

Basically the problem with your approach is that if I'm some British guy in London who does something that generates strings of the form "2011-02-04 22:11:03 GMT" in the winter and "2011-08-09 11:12:03 BST", I might be counting on the fact that the parser currently will parse both of those to have the same time zone, tz.tzlocal(), and if we make your change, the original behavior will change. This seems like a somewhat far-fetched scenario, but I've been surprised before in many cases. I think we can safely infer, however, that if GMT is applied to a date that tzlocal() would never apply it to then they probably meant that to represent tz.tzutc().

I think in the dateutil 2.7.0 release, I will add a FutureWarning that some of the parser behaviors will change, and in an eventual 3.0.0 release make a bunch of backwards-incompatible changes, and de-prioritizing tzlocal() will be on that list for sure.

After thinking more about the problem, I agree with you.

I was originally thinking about the problem for scenarios where a date string is produced by a machine - in such cases, we can probably expect that a timezone in a date string is significant and should be honored. For example, the date string '2016-10-10T08:00:00 CST' means 2016-10-10 08:00:00-06:00 because CST is UTC-06:00.

However, in many cases the source of a date string may be from user input where the user considers the time to be authoritative and the timezone to be informative but not necessarily correct. I would call this a "colloquial" date string. For example, a user may write October 10, 8am CST, expecting the time to be 2016-10-10T08:00:00-05:00 because CST was given to mean US Central time, not exactly Central Standard Time.

My previous proposal would violate the expectations for colloquial date strings because CST would be taken explicitly.

I'm now wondering if the behavior of dateutil actually needs to change for this. Similarly to your suggestion on #318, the parse caller could generally enforce the offset of local timezones by passing tzinfos={time.tzname[0]: -time.timezone, time.tzname[1]: -time.altzone}. Having these examples in the documentation would probably be sufficient to avoid issues like the one that was observed by botocore users in the UK.

I think it's reasonable for the behavior to change just so that by default the closest to the "right" thing is inferred from whatever context we have, but I agree with you and would be very happy for any PRs that improve the documentation. If you don't have time, I will make sure that your suggested documentation goes somewhere in there - I think the docs are in need of a major re-organization and cleanup anyway, as they are somewhat hard to follow.

I'm going to leave this PR open for now because I'd like to branch off of it to implement the functionality I described above, and I can use a modified version of the test you added.

pganssle · 2016-11-09T17:47:46Z

dateutil/test/test_parser.py

                               ignoretz=True),
                         datetime(2003, 9, 25, 10, 36, 28))

+    def testDateCommandFormatRespectTz(self):


Pretty sure this will always fail on Windows, so this would need to be changed in the following ways:

Add a @skipIf to skip this test on Windows.

Use dateutil.tests._common.TZEnvContext for this, rather than just setting the os.environ directly like this.

Ah, thanks for the pointer here. I made the test choose the right context manager and timezone name by platform.

pganssle · 2017-11-13T21:55:55Z

New PR at #517

pganssle reviewed Nov 9, 2016

View reviewed changes

pganssle added the parser label Nov 9, 2016

use platform-specific context manager to change timezone

e3b56c4

pganssle mentioned this pull request Nov 12, 2017

Fix ambiguous parse #517

Merged

pganssle closed this Nov 13, 2017

pganssle added this to the wontfix milestone Mar 11, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

respect parsed tz instead of preferring tzlocal #319

respect parsed tz instead of preferring tzlocal #319

Uh oh!

jstewmon commented Nov 9, 2016

Uh oh!

pganssle Nov 9, 2016

Uh oh!

jstewmon Nov 9, 2016

Uh oh!

pganssle Nov 9, 2016

Uh oh!

pganssle Nov 9, 2016

Uh oh!

jstewmon Nov 9, 2016

Uh oh!

pganssle Nov 10, 2016

Uh oh!

jstewmon Nov 10, 2016

Uh oh!

pganssle Nov 10, 2016

Uh oh!

pganssle Nov 9, 2016

Uh oh!

jstewmon Nov 9, 2016

Uh oh!

pganssle commented Nov 13, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

respect parsed tz instead of preferring tzlocal #319

respect parsed tz instead of preferring tzlocal #319

Uh oh!

Conversation

jstewmon commented Nov 9, 2016

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

pganssle commented Nov 13, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants