-
Notifications
You must be signed in to change notification settings - Fork 523
respect parsed tz instead of preferring tzlocal #319
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
if the computer is in a timezone where daylight savings is observed, such as Europe/London, and a date string specifies a timezone, the parsed datetime may be incorrect if the parsed datetime falls within daylight savings but the date string specified the standard time this is especially problematic for users in GMT/BST when parsing dates in http or email headers, which are frequently specified as GMT
| raise ValueError("Offset must be tzinfo subclass, " | ||
| "tz string, or int offset.") | ||
| ret = ret.replace(tzinfo=tzinfo) | ||
| elif res.tzname and res.tzname in time.tzname: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would break compatibility, I'm pretty sure we shouldn't do this as implemented. The most I'd be comfortable with is moving this down the list as the last resort rather than one of the first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how this case would be hit if it's moved to the bottom.
Is the compatibility case that res.tzname is not None, but res.tzoffset is None?
Can you provide a date string that I can use to add a test for the compatibility concern?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I meant more in the abstract, I didn't really think much about the implementation. Really, though, I think a better fix is going to be somewhat more complicated, as I mentioned in #318. Basically, I think what should be done is something like this (eliding over the details because I haven't though through the logic fully):
# Code to determine if it's supposed to be `tzlocal()`
if isinstance(ret.tzinfo, tz.tzlocal):
if ret.tzname() != res.tzname or not tz.datetime_exists(ret):
if tz.datetime_ambiguous(ret) and tz.enfold(ret, fold=1).tzname() == res.tzname:
ret = tz.enfold(ret, fold=1)
elif res.tzoffset == 0:
ret = ret.replace(tzinfo=tz.tzutc())
elif res.tzoffset:
ret = ret.replace(tzinfo=tz.tzoffset(res.tzname, res.tzoffset))It probably needs to be a bit more complicated than that to handle all the edge cases properly, and to disambiguate ambiguous dates, but the logic is at least somewhat clear:
- If it could be parsed as
tzlocal()but applyingtzlocal()to the date doesn't give you the inputtznameyou expected (likely because of a DST mismatch), check if it's just an ambiguous date. - If it's an ambiguous date, keep
tzlocal()and set thefoldattribute to 1 to disambiguate per PEP-495. - If it's not an ambiguous date, try all the other possible ways to parse the date that aren't
tzlocal(), and if you find a good one, apply it. - If you didn't find a good one, that's an unpleasant situation, but to maintain backwards compatibility, just leave
tzlocal()attached (we can add aFutureWarningif need be on this).
I'm thinking the best implementation would be to instead of using if / elif, after the tzinfos part has been parsed, just start a queue of possible time zones, so like in this case the queue would be something like [tzlocal(),tzutc()`], then assign the first one in the list that creates a non-imaginary date, and if none of them create non-imaginary dates, return the first one in the list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, if you don't want to implement this, that's fine, I can branch off of your current PR and make the changes my self some time relatively soon, just let me know what you want to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thanks for the feedback. You have more context than me on the issue, so I'll defer to your expertise, but I thought about the priority a little differently:
- If an offset was parsed, it should always be the exact offset in the
tzinfo.- If the offset is 0,
tzinfo=tzutc(), elsetzinfo=tz.tzoffset(res.tzname, res.tzoffset)
- If the offset is 0,
- Otherwise, if an name was parsed and the name is in
time.tzname, use the offset for the explicitly named zone.
elif res.tzoffset == 0:
ret = ret.replace(tzinfo=tz.tzutc())
elif res.tzoffset:
ret = ret.replace(tzinfo=tz.tzoffset(res.tzname, res.tzoffset))
elif res.tzname:
try:
tzindex = time.tzname.index(res.tzname)
except ValueError:
pass
else:
offset = tzindex and time.altzone or time.timezone
ret = ret.replace(tzinfo=tz.tzoffset(res.tzname, -offset))
ret = ret.astimezone(tz.tzlocal())This assumes the date is not ambiguous if the date string has a timezone.
If you don't think this is right, I'm happy to let you take over and resolve however you think is best.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jstewmon I think that's a reasonable way to design things but I don't think it's the way dateutil was originally designed, which is a problem because a lot of people use dateutil and are often counting on even undocumented features. I don't love how many places dateutil depends on the user's environment (or, for example, that parser fills in missing values from datetime.now(), rather than with 1s or something), but it is what it is (at least for now), and I'm thinking that many of these features should be deprecated and switched to opt-in at some point.
Basically the problem with your approach is that if I'm some British guy in London who does something that generates strings of the form "2011-02-04 22:11:03 GMT" in the winter and "2011-08-09 11:12:03 BST", I might be counting on the fact that the parser currently will parse both of those to have the same time zone, tz.tzlocal(), and if we make your change, the original behavior will change. This seems like a somewhat far-fetched scenario, but I've been surprised before in many cases. I think we can safely infer, however, that if GMT is applied to a date that tzlocal() would never apply it to then they probably meant that to represent tz.tzutc().
I think in the dateutil 2.7.0 release, I will add a FutureWarning that some of the parser behaviors will change, and in an eventual 3.0.0 release make a bunch of backwards-incompatible changes, and de-prioritizing tzlocal() will be on that list for sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After thinking more about the problem, I agree with you.
I was originally thinking about the problem for scenarios where a date string is produced by a machine - in such cases, we can probably expect that a timezone in a date string is significant and should be honored. For example, the date string '2016-10-10T08:00:00 CST' means 2016-10-10 08:00:00-06:00 because CST is UTC-06:00.
However, in many cases the source of a date string may be from user input where the user considers the time to be authoritative and the timezone to be informative but not necessarily correct. I would call this a "colloquial" date string. For example, a user may write October 10, 8am CST, expecting the time to be 2016-10-10T08:00:00-05:00 because CST was given to mean US Central time, not exactly Central Standard Time.
My previous proposal would violate the expectations for colloquial date strings because CST would be taken explicitly.
I'm now wondering if the behavior of dateutil actually needs to change for this. Similarly to your suggestion on #318, the parse caller could generally enforce the offset of local timezones by passing tzinfos={time.tzname[0]: -time.timezone, time.tzname[1]: -time.altzone}. Having these examples in the documentation would probably be sufficient to avoid issues like the one that was observed by botocore users in the UK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's reasonable for the behavior to change just so that by default the closest to the "right" thing is inferred from whatever context we have, but I agree with you and would be very happy for any PRs that improve the documentation. If you don't have time, I will make sure that your suggested documentation goes somewhere in there - I think the docs are in need of a major re-organization and cleanup anyway, as they are somewhat hard to follow.
I'm going to leave this PR open for now because I'd like to branch off of it to implement the functionality I described above, and I can use a modified version of the test you added.
| ignoretz=True), | ||
| datetime(2003, 9, 25, 10, 36, 28)) | ||
|
|
||
| def testDateCommandFormatRespectTz(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty sure this will always fail on Windows, so this would need to be changed in the following ways:
- Add a
@skipIfto skip this test on Windows. - Use
dateutil.tests._common.TZEnvContextfor this, rather than just setting theos.environdirectly like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thanks for the pointer here. I made the test choose the right context manager and timezone name by platform.
|
New PR at #517 |
if the computer is in a timezone where daylight savings is observed,
such as Europe/London, and a date string specifies a timezone, the
parsed datetime may be incorrect if the parsed datetime falls within
daylight savings but the date string specified the standard time
this is especially problematic for users in GMT/BST when parsing
dates in http or email headers, which are frequently specified as
GMT
fixes #318