PEP 814: Add frozendict built-in type

You can read the PEP at: PEP 814 – Add frozendict built-in type.

Abstract

A new public immutable type frozendict is added to the builtins module.

We expect frozendict to be safe by design, as it prevents any unintended modifications. This addition benefits not only CPython’s standard library, but also third-party maintainers who can take advantage of a reliable, immutable dictionary type.


  • frozendict is not a dict subclass, but inherits directly from object.
  • The insertion order is preserved.
  • A frozendict can be hashed with hash(frozendict) if all keys and values can be hashed. Pseudo-code: hash(frozenset(frozendict.items())).

PEP 814 lists relations and differences with (old) PEP 416 (frozendict) and PEP 603 (collections.frozenmap). PEP 603 was not submitted to the Steering Council yet. While PEP 416 used sandboxing as a motivation, PEP 814 is more focused on concurrency (asyncio, concurrent.interpreters, free threading).

Donghee & Victor

86 Likes

I’m assuming frozendict would be serialisable by default with the json module (like tuple). Correct?

1 Like

Right, the json module can be modified to serialize a frozendict as a dict.

3 Likes

Will frozendict(some_dict) == some_dict be True in the same way that frozenset(some_set) == some_set is?

2 Likes

Right, you can compare frozendict to dict and it just works as expected:

>>> frozendict() == dict()
True
>>> frozendict(x=1) == dict(x=1)
True
>>> frozendict(x=1, y=2) == dict(x=1, y=2)
True
>>> frozendict(y=2, x=1) == dict(x=1, y=2)
True
12 Likes

That is maybe worth adding to the PEP itself.

3 Likes

Updates:

  • I published the reference implementation as PR gh-141508. Playing with the reference implementation may answer to some questions.
  • I implemented support for frozendict in json (marshal and pickle also support frozendict).
  • I prepared a PEP update to document frozendict == dict comparison.
15 Likes

Since collections.abc.MutableMapping inherits from collections.abc.Mapping, has it been considered to make dict inherit from frozendict instead?

Alternatively, has it been considered to add a is_frozen flag to dict instead so that an existing dict can be frozen in constant time after it’s iteratively built?

Couldn’t be happier to see this PEP this morning! Half of the magic wand wish I made in the PEP 603 discussion thread could come true!

1 Like

Great job, strongly in favor!

8 Likes

Both of these seem like negatives to implement. Having to check if a frozen type has actually been frozen yet defeats the concurrency safety use case. If we need an efficient builder for it later, the builder shouldn’t be the type itself. I’m sure people can create that without making it a switch on the type itself, or maybe even just an internal fast method when creating from a dict.

I also don’t see why you would want dict to inherit from frozendict. Anything that can take either should be able to sufficiently take any Mapping, and checking that instead works better with duck typing and permissive apis.

15 Likes

I’m glad to see this PEP. It kind of makes me pine again for PEP 351 - The freeze protocol.

Are there any places in the CPython implementation where a PyDict_Check() or PyDict_CheckExact() could reasonably take a PyFrozenDict_Type? Should there be C APIs that tests for and/or accepts read-only mappings that would accept both a dict and a frozendict?

6 Likes

Beautiful. Thank you!

Any chance for spoilers regarding planned performance optimisations in the future?

1 Like

Sure, I just modified the code for that. There are multiple places accepting a frozendict. Examples:

  • collections.defaultdict.__or__()
  • str.maketrans() (first argument)

Other functions already accept any mapping, I just added a fast-path for frozendict. Examples:

  • collections.OrderedDict.update()
  • list.extend()

The current implementation uses these 2 macros:

#define _PyAnyDict_CheckExact(ob) \
    (PyDict_CheckExact(ob) || PyFrozenDict_CheckExact(ob))
#define _PyAnyDict_Check(ob) \
    (PyDict_Check(ob) || PyFrozenDict_Check(ob))

Do you think that they should be made public? The “Any” name comes from similar functions for set and frozenset: PyAnySet_Check() and PyAnySet_CheckExact().

1 Like

I think so; it seems like it would be a commonly used API.

1 Like

I don’t know if this is relevant here, but I find this error message confusing ('tuple' as a set element):

>>> fd = frozendict(x=[1])
>>> hash(fd)
Traceback (most recent call last):
  File "<python-input-18>", line 1, in <module>
    hash(fd)
    ~~~~^^^^
TypeError: cannot use 'tuple' as a set element (unhashable type: 'list')
4 Likes

Is there a plan of letting pprint know about frozendict?

>>> import pprint
>>> d = { i : (i-1, i, i+1) for i in range(8) }
>>> pprint.pprint(d)
{0: (-1, 0, 1),
 1: (0, 1, 2),
 2: (1, 2, 3),
 3: (2, 3, 4),
 4: (3, 4, 5),
 5: (4, 5, 6),
 6: (5, 6, 7),
 7: (6, 7, 8)}
>>> pprint.pprint(frozendict(d))
frozendict({0: (-1, 0, 1), 1: (0, 1, 2), 2: (1, 2, 3), 3: (2, 3, 4), 4: (3, 4, 5), 5: (4, 5, 6), 6: (5, 6, 7), 7: (6, 7, 8)})
9 Likes

In the section comparing to PEP 603 the complexity for a copy on a frozendict is listed as 0(n). Why is that? For a (shallow) copy we can return a new reference.

3 Likes

Thanks for proposing this! I’d like to suggest something that, despite being a new feature, would benefit from being considered in the initial PEP (or at least initial implementation).

I assume an everyday use pattern for a frozendict would be: 1) prepare some data in a mutable dict. 2) convert to frozendict, discard the original. 3) spawn threads/interpreters/coroutines that read the frozen&shared version.

We need the mutable version in step 1 because we may want to build the data structure stepwise. Step 2 is potentially expensive in time and peak memory. Assuming it’s common to get rid of the mutable version (because after startup the concurrent system will only need the frozen one), you’re copying the entire thing and doubling the memory needs while that happens.

Something that would help with that is an in-place transformation from dict to frozendict. That could be possible by switching the ob_type from PyDict_Type to PyFrozenDict_Type… assuming that both have the same memory layout. In the current implementation that doesn’t happen, but it’s so close that I don’t think it would be hard to achieve. With that, you could have a freeze() method in e that mutates it into a frozendict (and you may want to have the same method in frozendict implemented as a no-op).

typical usage would be:

shared_data = build_complicated_dict()
shared_data.freeze()
start_serving_requests(shared_data)  # spawns threads/subinterpreters/etc

This would often halve memory usage on startup and would shave some startup time.
Would this make sense?

We can also make it a bit more generic and somehow check for the mapping protocol.

Without talking to the PEP authors I can guarantee that pprint would get updated if the PEP was accepted.

The potential issue with that is C code could do some shenanigans with the underlying data and invalidate that those two “copies” are the same. If you know you’re working with a frozendict you’re going to avoid bothering with copies anyway, so it shouldn’t be a big deal in the end.

But now you have made that dictionary frozen for everyone who holds a reference to it, which means side-effects at a distance in a way that could be unexpected (e.g. context switch in a thread and now suddenly you’re going to get an exception trying to mutate what was a dict a microsecond ago but is now frozen). That seems like asking for really nasty debugging issues just to optimize some creation time.

Remember this is not being pitched as a perf win, but as a way to lessen bugs in concurrent code because you can’t share immutable dicts write now. And we tend to try and design around programmer ergonomics more than perf and this suggestion seems to go against that general goal.

20 Likes