Add LRU cache to gettz and factories #691

@pganssle

Description

Now that #635 is resolved, gettz, tzoffset and tzstr satisfy the property that two tzinfo objects (tz1 and tz2) constructed with identical arguments will always be the same object (tz1 is tz2), but the cached objects expire whenever no instances of a given time zone exist outside of the cache. This satisfies the "timezone semantics" reasons for adding a cache and will have good performance characteristics so long as at least one object holds a strong reference to the time zone.

But this does sacrifice performance for people who would benefit from a cache because they create the same object over and over again, even though it goes out of scope every time. For example:

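from datetime import datetime
from dateutil import tz

def NY_timestamp(dt):
    # gettz is called on every invocation; the returned object goes out of
    # scope as soon as the function returns
    dt = dt.astimezone(tz.gettz('America/New_York'))
    return dt.timestamp()

a = []
for i in range(10000):
    a.append(NY_timestamp(datetime.now()))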

I think the above would needlessly construct the tz.gettz('America/New_York') object 10000 times, when even a modest LRU cache would save the trouble.
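To make the cost concrete, here is a minimal sketch of a weak-reference-only cache. `FakeTz` and `get_fake_tz` are hypothetical stand-ins, not dateutil's classes or its actual implementation, and the counts assume CPython's reference counting, where an object is freed as soon as its last strong reference disappears.

```python
import weakref


class FakeTz:
    """Hypothetical stand-in for a time zone class (not dateutil's)."""
    constructed = 0  # counts how many instances were ever built

    def __init__(self, name):
        FakeTz.constructed += 1
        self.name = name


# Weak-only cache: entries vanish once no strong reference remains.
_instances = weakref.WeakValueDictionary()


def get_fake_tz(name):
    """Return the live instance for `name`, building one if none exists."""
    obj = _instances.get(name)
    if obj is None:
        obj = FakeTz(name)
        _instances[name] = obj
    return obj


# While a strong reference is held, repeated calls return the same object.
held = get_fake_tz("America/New_York")
same_object = get_fake_tz("America/New_York") is held
built_while_held = FakeTz.constructed

# Once nothing outside the cache refers to the object, each call rebuilds it
# (assuming CPython, where the discarded result is freed immediately).
del held
for _ in range(1000):
    get_fake_tz("America/New_York")
built_after_release = FakeTz.constructed - built_while_held
```

Under those assumptions, one object is built while `held` is alive, and a new one is built on every iteration of the loop once it is released.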

I think we should leave the _instances dictionary as a WeakValueDictionary and add a second _lru_cache dictionary that holds a limited (and configurable) number of strong references to recently used time zones. I'm not sure what the best implementation of the cache itself is; preferably something where left-append, right-pop and reordering a single element are all O(1), so I'm thinking of something built on a doubly-linked list (maybe OrderedDict).
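A rough sketch of that two-tier idea, just to show the shape. `TwoTierCache`, `FakeTz`, `factory` and `maxsize` are names made up for illustration and not a proposal for the final API; the only borrowed names are `_instances` and `_lru_cache` from the paragraph above. It relies on `OrderedDict.move_to_end` and `OrderedDict.popitem(last=False)`, both of which are O(1).

```python
import weakref
from collections import OrderedDict


class FakeTz:
    """Hypothetical stand-in for a time zone class (not dateutil's)."""

    def __init__(self, name):
        self.name = name


class TwoTierCache:
    """Weak cache for identity plus a bounded LRU of strong references."""

    def __init__(self, factory, maxsize=8):
        self._factory = factory    # builds a new object from a key
        self._maxsize = maxsize    # assumed configurable bound
        self._instances = weakref.WeakValueDictionary()
        self._lru_cache = OrderedDict()

    def get(self, key):
        # The weak cache guarantees that a live object is always reused.
        obj = self._instances.get(key)
        if obj is None:
            obj = self._factory(key)
            self._instances[key] = obj

        # Record the key as most recently used; the strong reference keeps
        # the object alive even if the caller drops it.
        self._lru_cache[key] = obj
        self._lru_cache.move_to_end(key)

        # Evict the least recently used entry once over the bound.
        if len(self._lru_cache) > self._maxsize:
            self._lru_cache.popitem(last=False)
        return obj
```

With something like this, the loop in the example would build the object once, because the LRU tier keeps it alive between calls. An object evicted from the LRU tier still stays in `_instances` for as long as anything else refers to it, so the `tz1 is tz2` guarantee is unaffected.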

Presumably we can take some cues from the implementation of functools.lru_cache, but keep in mind that its pure-Python implementation may not have amazing performance: CPython uses a C implementation, so the Python fallback may not have had much time spent on optimizing it.
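For reference, this is roughly how the standard-library decorator behaves when wrapped around a factory. `FakeTz` and `cached_fake_tz` are hypothetical names for illustration only; `maxsize` and `cache_info()` are the real functools.lru_cache interface.

```python
from functools import lru_cache


class FakeTz:
    """Hypothetical stand-in for a time zone class (not dateutil's)."""

    def __init__(self, name):
        self.name = name


@lru_cache(maxsize=8)
def cached_fake_tz(name):
    # Only runs on a cache miss; hits return the stored object.
    return FakeTz(name)


for _ in range(10000):
    cached_fake_tz("America/New_York")

info = cached_fake_tz.cache_info()  # named tuple: hits, misses, maxsize, currsize
```

Note that lru_cache on its own does not give the identity guarantee: if an entry is evicted while something still holds the object, the next call builds a second, distinct object. That is why the weak-reference dictionary would need to stay in place alongside it.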
