Skip to content

Conversation

@jstewmon
Copy link

@jstewmon jstewmon commented Nov 9, 2016

if the computer is in a timezone where daylight savings is observed,
such as Europe/London, and a date string specifies a timezone, the
parsed datetime may be incorrect if the parsed datetime falls within
daylight savings but the date string specified the standard time

this is especially problematic for users in GMT/BST when parsing
dates in http or email headers, which are frequently specified as
GMT

fixes #318

if the computer is in a timezone where daylight savings is observed,
such as Europe/London, and a date string specifies a timezone, the
parsed datetime may be incorrect if the parsed datetime falls within
daylight savings but the date string specified the standard time

this is especially problematic for users in GMT/BST when parsing
dates in http or email headers, which are frequently specified as
GMT
raise ValueError("Offset must be tzinfo subclass, "
"tz string, or int offset.")
ret = ret.replace(tzinfo=tzinfo)
elif res.tzname and res.tzname in time.tzname:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would break compatibility, I'm pretty sure we shouldn't do this as implemented. The most I'd be comfortable with is moving this down the list as the last resort rather than one of the first.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how this case would be hit if it's moved to the bottom.

Is the compatibility case that res.tzname is not None, but res.tzoffset is None?

Can you provide a date string that I can use to add a test for the compatibility concern?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I meant more in the abstract, I didn't really think much about the implementation. Really, though, I think a better fix is going to be somewhat more complicated, as I mentioned in #318. Basically, I think what should be done is something like this (eliding over the details because I haven't though through the logic fully):

# Code to determine if it's supposed to be `tzlocal()`
if isinstance(ret.tzinfo, tz.tzlocal):
    if ret.tzname() != res.tzname or not tz.datetime_exists(ret):
        if tz.datetime_ambiguous(ret) and tz.enfold(ret, fold=1).tzname() == res.tzname:
            ret = tz.enfold(ret, fold=1)
        elif res.tzoffset == 0:
            ret = ret.replace(tzinfo=tz.tzutc())
        elif res.tzoffset:
            ret = ret.replace(tzinfo=tz.tzoffset(res.tzname, res.tzoffset))

It probably needs to be a bit more complicated than that to handle all the edge cases properly, and to disambiguate ambiguous dates, but the logic is at least somewhat clear:

  1. If it could be parsed as tzlocal() but applying tzlocal() to the date doesn't give you the input tzname you expected (likely because of a DST mismatch), check if it's just an ambiguous date.
  2. If it's an ambiguous date, keep tzlocal() and set the fold attribute to 1 to disambiguate per PEP-495.
  3. If it's not an ambiguous date, try all the other possible ways to parse the date that aren't tzlocal(), and if you find a good one, apply it.
  4. If you didn't find a good one, that's an unpleasant situation, but to maintain backwards compatibility, just leave tzlocal() attached (we can add a FutureWarning if need be on this).

I'm thinking the best implementation would be to instead of using if / elif, after the tzinfos part has been parsed, just start a queue of possible time zones, so like in this case the queue would be something like [tzlocal(),tzutc()`], then assign the first one in the list that creates a non-imaginary date, and if none of them create non-imaginary dates, return the first one in the list.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, if you don't want to implement this, that's fine, I can branch off of your current PR and make the changes my self some time relatively soon, just let me know what you want to do.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks for the feedback. You have more context than me on the issue, so I'll defer to your expertise, but I thought about the priority a little differently:

  1. If an offset was parsed, it should always be the exact offset in the tzinfo.
    • If the offset is 0, tzinfo=tzutc(), else tzinfo=tz.tzoffset(res.tzname, res.tzoffset)
  2. Otherwise, if an name was parsed and the name is in time.tzname, use the offset for the explicitly named zone.
            elif res.tzoffset == 0:
                ret = ret.replace(tzinfo=tz.tzutc())
            elif res.tzoffset:
                ret = ret.replace(tzinfo=tz.tzoffset(res.tzname, res.tzoffset))
            elif res.tzname:
                try:
                    tzindex = time.tzname.index(res.tzname)
                except ValueError:
                    pass
                else:
                    offset = tzindex and time.altzone or time.timezone
                    ret = ret.replace(tzinfo=tz.tzoffset(res.tzname, -offset))
                    ret = ret.astimezone(tz.tzlocal())

This assumes the date is not ambiguous if the date string has a timezone.

If you don't think this is right, I'm happy to let you take over and resolve however you think is best.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jstewmon I think that's a reasonable way to design things but I don't think it's the way dateutil was originally designed, which is a problem because a lot of people use dateutil and are often counting on even undocumented features. I don't love how many places dateutil depends on the user's environment (or, for example, that parser fills in missing values from datetime.now(), rather than with 1s or something), but it is what it is (at least for now), and I'm thinking that many of these features should be deprecated and switched to opt-in at some point.

Basically the problem with your approach is that if I'm some British guy in London who does something that generates strings of the form "2011-02-04 22:11:03 GMT" in the winter and "2011-08-09 11:12:03 BST", I might be counting on the fact that the parser currently will parse both of those to have the same time zone, tz.tzlocal(), and if we make your change, the original behavior will change. This seems like a somewhat far-fetched scenario, but I've been surprised before in many cases. I think we can safely infer, however, that if GMT is applied to a date that tzlocal() would never apply it to then they probably meant that to represent tz.tzutc().

I think in the dateutil 2.7.0 release, I will add a FutureWarning that some of the parser behaviors will change, and in an eventual 3.0.0 release make a bunch of backwards-incompatible changes, and de-prioritizing tzlocal() will be on that list for sure.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After thinking more about the problem, I agree with you.

I was originally thinking about the problem for scenarios where a date string is produced by a machine - in such cases, we can probably expect that a timezone in a date string is significant and should be honored. For example, the date string '2016-10-10T08:00:00 CST' means 2016-10-10 08:00:00-06:00 because CST is UTC-06:00.

However, in many cases the source of a date string may be from user input where the user considers the time to be authoritative and the timezone to be informative but not necessarily correct. I would call this a "colloquial" date string. For example, a user may write October 10, 8am CST, expecting the time to be 2016-10-10T08:00:00-05:00 because CST was given to mean US Central time, not exactly Central Standard Time.

My previous proposal would violate the expectations for colloquial date strings because CST would be taken explicitly.

I'm now wondering if the behavior of dateutil actually needs to change for this. Similarly to your suggestion on #318, the parse caller could generally enforce the offset of local timezones by passing tzinfos={time.tzname[0]: -time.timezone, time.tzname[1]: -time.altzone}. Having these examples in the documentation would probably be sufficient to avoid issues like the one that was observed by botocore users in the UK.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's reasonable for the behavior to change just so that by default the closest to the "right" thing is inferred from whatever context we have, but I agree with you and would be very happy for any PRs that improve the documentation. If you don't have time, I will make sure that your suggested documentation goes somewhere in there - I think the docs are in need of a major re-organization and cleanup anyway, as they are somewhat hard to follow.

I'm going to leave this PR open for now because I'd like to branch off of it to implement the functionality I described above, and I can use a modified version of the test you added.

ignoretz=True),
datetime(2003, 9, 25, 10, 36, 28))

def testDateCommandFormatRespectTz(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty sure this will always fail on Windows, so this would need to be changed in the following ways:

  1. Add a @skipIf to skip this test on Windows.
  2. Use dateutil.tests._common.TZEnvContext for this, rather than just setting the os.environ directly like this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks for the pointer here. I made the test choose the right context manager and timezone name by platform.

@pganssle pganssle added the parser label Nov 9, 2016
@pganssle pganssle mentioned this pull request Nov 12, 2017
@pganssle
Copy link
Member

New PR at #517

@pganssle pganssle closed this Nov 13, 2017
@pganssle pganssle added this to the wontfix milestone Mar 11, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

RFC822 timestamp using GMT is parsed into BST if local time is UK time

2 participants