Skip to content

Conversation

@pganssle
Copy link
Member

@pganssle pganssle commented Nov 7, 2017

This is more or less the strategy I've come up with for handling singleton zones. It's fairly efficient from my profiling with tzoffset.

The next obvious candidate is tzlocal(), but I think that one is slightly trickier to handle, I'll make an issue for discussion.

@pganssle pganssle added this to the 2.7.0 milestone Nov 7, 2017
@pganssle pganssle mentioned this pull request Nov 7, 2017
3 tasks
@jbrockmendel
Copy link
Contributor

I'm not totally clear what's happening here. Is the idea that you want multiple calls to tzutc() to return the same object?

@pganssle
Copy link
Member Author

pganssle commented Nov 8, 2017

@jbrockmendel Yes, but not only for tzutc. For UTC, this actually does the same thing that #497 did, but does so slightly more efficiently (the old version would call __init__ on every tzutc() call, even though a new object did not need to be initialized).

As mentioned by @abalkin in #497, there are several fast paths in the code that are activated only when dt1.tzinfo is dt2.tzinfo, even if dt1.tzinfo == dt2.tzinfo. From what he's told me in the past, the intended implementation of tzinfo instances is that there is a single instance per zone. I think there are some performance benefits to heavily memoizing these things anyway, though the trade off is that it's harder to make it thread safe.

Generally it won't come up, but there's actually a slight difference between inter-zone and intra-zone comparison semantics that comes up when you start working with ambiguous times and fold:

>>> from dateutil import tz
>>> from datetime import datetime
>>> NYC = tz.gettz('America/New_York')
>>> dt1 = datetime(2011, 11, 6, 1, 30, fold=1, tzinfo=NYC)
>>> dt2 = datetime(2011, 11, 6, 1, 30, tzinfo=NYC)
>>> dt1 == dt2
True
>>> dt1 == dt2.replace(tzinfo=tz.gettz('America/New_York'))
False

This is because intra-zone comparisons are done by comparing the naive portion of the datetime (fold is not considered), but inter-zone comparisons are done by, effectively, converting both datetimes to UTC and comparing that. Since dt1 and dt2 represent the same wall time but different absolute times, dt1 == dt2 will be true for intra-zone but false for inter-zone comparisons (and the difference between intra- and inter- is whether dt1.tzinfo is dt2.tzinfo).

In the end, having factory metaclasses that provide constructors that return the same instance to represent the same zone will bring us more in line with the preferred tzinfo implementation.

@abalkin
Copy link

abalkin commented Nov 8, 2017

.. though the trade off is that it's harder to make it thread safe.

Why is the thread safety an issue with timezone objects? Aren't they immutable? At least the simple ones such as UTC and fixed offset timezone should certainly be immutable.

@pganssle
Copy link
Member Author

pganssle commented Nov 8, 2017

@abalkin Normally yes, and certainly UTC and tzoffset will remain immutable, but one potential race condition even for the immutable classes (though I'm guessing with minor effects) at the creation of the initial object between checking if you need to create a new object and actually creating one.

For tzutc I think this won't be a problem because the first constructor call happens at import time when creating the UTC constant. For tzoffset I'm using setdefault, so I think the GIL will take care of the races there. I think a similar strategy can be employed for other tzinfo subclasses.

The bigger race condition problem I think will be if the more complex classes like tzfile are memoized. There's already a cache for tzicalvtz and I intend to add lru-style caches for the other time zone subclasses. The implementation there is more likely to have thread-safety problems (though, I imagine, with little consequence), and that would just be compounded if there were a single cache (as opposed to separate instances that may have been created on separate threads).

I don't think it's a real problem, just something to consider in the implementation.

@pganssle
Copy link
Member Author

pganssle commented Nov 8, 2017

There doesn't seem to be any controversy about the particular strategy here, so I'm merging.

@pganssle pganssle merged commit 64c6c85 into dateutil:master Nov 8, 2017
@jbrockmendel
Copy link
Contributor

A little bit late to the party... the pattern here makes sense to me, but the threadsafety issue is bothersome. What is the failure mode here?

will bring us more in line with the preferred tzinfo implementation.

+1

@pganssle
Copy link
Member Author

pganssle commented Nov 10, 2017

@jbrockmendel I suspect it would be a very unlikely confluence of things that would lead to anything resembling a bug from the way that this is currently implemented. Assuming that setdefault is thread-safe, there is no failure mode for tz.tzoffset. For tzutc there's basically no possibility of failure because the singleton is created at import time and import is thread-safe.

If we were to drop the tz.UTC construction in tz or someone were to build their own special hook that imports tz in a thread-unsafe way, then if there have been no previous calls to tz.tzutc() from any thread and one or more threads attempts to call tz.tzutc(), then there is a possible race condition here. Thread 1 could be about to call cls.__instance = super(_TzSingleton, cls).__call__() while thread 2 checks cls.__isinstance is None and gets True, and it would enter the inner scope of the if statement. At that point both threads have the following tasks to complete:

  1. Construct a _TzSingleton and assign it to cls.__instance
  2. Return cls.__instance

If they interleave [notation: (thread, step)] like so: (1, 1), (2, 1), (1, 2), (2, 2) or (2, 1) (1, 1) (2, 2), (1, 2), then there's no problem (whichever one finished the assignment first would construct an extra tzutc instance that would be deleted when its reference count gets to 0 on the completion of the second thread's assignment of cls.__instance), but if thread 1 or thread 2 completes both steps like (1, 1) (1, 2), (2, 1), (2, 2), then thread 1 and thread 2 will return two objects such that t1utc is not t2utc.

If for some reason the downstream consumer is relying on the property that tzutc() is tzutc() in all cases for some reason, that assumption would be invalidated if one of the tzutc instances is the one from the thread that finished the constructor first. All subsequent calls would still return the one from, say, thread 2. Otherwise the only consequence would be that datetime comparisons are considered inter- rather than intra- zone, which has no effect on the outcome for fixed-offset zones and might be slightly slower. It's similar to if one thread returned dateutil.tz.tzutc() and another one returned pytz.UTC except the two objects would also compare equal.

This can be fixed by just acquiring a lock around lines 10 and 11 in _TzSingleton, but given that every link in the chain that leads to a problem is incredibly unlikely, I think we're OK.

@pganssle pganssle mentioned this pull request Mar 11, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants