7

As raised in cpython issue 88306, python WeakKeyDictionary fails for non hashable types. According to the discussion in the python issue above, this is an unnecessary restriction, using ids of the keys instead of hash would work just fine: In this special case ids are unique identifiers for the keys in the WeakKeyDictionary, because the keys are automatically removed when the original object is deleted. It is important to be aware that using ids instead of hashes is only feasible in this very special case.

We can tweak weakref.WeakKeyDictionary (see gist) to achieve the desired behaviour. In summary, this implementation wraps the weakref keys as follows:

class _IdKey:
    def __init__(self, key):
        self._id = id(key)

    def __hash__(self):
        return self._id

    def __eq__(self, other: typing_extensions.Self):
        return self._id == other._id

    def __repr__(self):
        return f"<_IdKey(_id={self._id})>"


class _IdWeakRef(_IdKey):
    def __init__(self, key, remove: typing.Callable[[typing.Any], None]):
        super().__init__(key)
        # hold weak ref to avoid garbage collection of the remove callback
        self._ref = weakref.ref(key, lambda _: remove(self))

    def __call__(self):
        # used in weakref.WeakKeyDictionary.__copy__
        return self._ref()

    def __repr__(self):
        return f"<_IdKey(_id={self._id},{self._ref})>"

class WeakKeyIdDictionary(weakref.WeakKeyDictionary):
   """
   overrides all methods involving dictionary access key 
   """
   ... https://gist.github.com/barmettl/b198f0cf6c22047df77483e8aa28f408

However, this depends on the details of the implementation of weakref.WeakKeyDictionary (using python3.10 here) and is likely to break in future (or even past) versions of python. Of course, alternatively one can just rewrite an entirely new class.

It is also possible to implement a custom __hash__ method for all classes, but this won't work when dealing with external code and will give unreliable hashes for use cases beyond weakref.WeakKeyDictionary. We can also monkey patch __hash__, but this is not possible in particular for built in classes and will have unintended effects in other parts of the code.

Thus the following question: How should one store non hashable items in a WeakKeyDictionary?

1 Answer 1

3

There is a way which does not rely on knowing the internals of WeakKeyDictionary:

from weakref import WeakKeyDictionary, WeakValueDictionary

class Id:
    def __init__(self, key):
        self._id = id(key)
    def __hash__(self):
        return self._id
    def __eq__(self, other):
        return self._id == other._id

class WeakUnhashableKeyDictionary:
    def __init__(self, *args, **kwargs):
        # TODO Do something to initialize given args and kwargs.
        self.keys = WeakValueDictionary()
        self.values = WeakKeyDictionary()

    def __getitem__(self, key):
        return self.values.__getitem__(Id(key))
    
    def __setitem__(self, key, value):
        _id = Id(key)
        # NOTE This works because key holds on _id iif key exists,
        # and _id holds on value iif _id exists. Transitivity. QED.
        # Because key is only stored as a value, it does not need to be hashable.
        self.keys.__setitem__(_id, key)
        self.values.__setitem__(_id, value)

    def __delitem__(self, key):
        self.keys.__delitem__(Id(key))
        self.values.__delitem__(Id(key))
        
    # etc. other methods should be relatively simple to implement.
    # TODO Might require some locks or care in the ordering of operations to work threaded.
    # TODO Add clean error handling.

This is just a generalization of my answer to a method caching problem.

Sign up to request clarification or add additional context in comments.

12 Comments

I think this can produce false positives from __getitem__ if a new object is allocated with an old key's ID while the corresponding Id instance and self.values entry are still around. That can happen if something like the iteration guard, (implemented here) delays cleanup of the self.keys entry.
Yup, got a false match in testing. Of course, this WeakUnhashableKeyDictionary class doesn't actually implement iteration, so I had to iterate over d.keys.values() manually to activate the iteration guard. Still, I think Id should probably hold a weak reference to the key and check self.keyref() is other.keyref() in __eq__ to avoid false matches.
With the extra validation added, I no longer get false matches.
I guess it's impossible to implement iteration on top of either of self.keys or self.values without triggering this iteration guard. If that's the case then yes we need this additional weakref to handle the case where entries are indirectly deleted while they are being iterated over. Are there any other situations were deletion is delayed?
You can try to cheat your way out by iterating over the keys of self.values but I fear circumventing the iteration guard is going to cause trouble eventually: ideone.com/8FURDf
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.