Should decode() offer best-effort per-label decoding? #248

@Kludex

idna.decode() accepts a full multi-label domain and decodes it (e.g. idna.decode("www.xn--galit-femmes-hommes-9ybf.gouv.fr") -> "www.égalité-femmes-hommes.gouv.fr"), which is great. But it is all-or-nothing: if a single label is invalid, the whole call raises, even for the labels that are fine.

>>> idna.decode("a.b.c.xn--pokxncvks")
idna.core.InvalidCodepoint: Codepoint U+3253 at position 1 of '㉓㋎㋍㋓㋕㋘' not allowed

There seems to be no option for a best-effort result: neither uts46=True nor strict=False changes this on 3.13.

This matters for "decode for display" use cases (HTTP clients exposing a Unicode host). The relevant standards specify per-label processing that keeps an invalid xn-- label unchanged rather than failing the whole domain.

Because decode() doesn't offer this, downstream projects have to re-implement the label splitting themselves. Concretely, I received a PR in httpx2 (pydantic/httpx2#979) that implements the following:

labels = []
for label in host.split("."):
    if label.startswith("xn--"):
        try:
            label = idna.decode(label)
        except idna.IDNAError:
            pass
    labels.append(label)
host = ".".join(labels)

Since decode() already accepts a whole domain and owns label splitting internally, the "keep invalid label raw" behavior arguably belongs here (maybe a flag? e.g. ignore_invalid_punycode=True, mirroring UTS #46's IgnoreInvalidPunycode) rather than being re-implemented in every consumer.
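
For illustration, the requested semantics could be sketched as a standalone helper. This is only a rough sketch, not the library's API: decode_host_best_effort, its decode_label parameter, and stdlib_punycode_label are hypothetical names. The stand-in decoder uses the standard library's "punycode" codec so the example runs without idna installed; it performs no IDNA validation, so it accepts labels that idna.decode would reject.

```python
from typing import Callable, Tuple, Type


def decode_host_best_effort(
    host: str,
    decode_label: Callable[[str], str],
    errors: Tuple[Type[Exception], ...] = (ValueError,),
) -> str:
    """Decode each xn-- label on its own; keep a label raw if decoding fails.

    Hypothetical helper. `decode_label` is whatever per-label decoder the
    caller trusts (for example idna.decode). The default `errors` tuple
    relies on idna.IDNAError deriving from UnicodeError, which is a
    ValueError subclass.
    """
    labels = []
    for label in host.split("."):
        if label.startswith("xn--"):
            try:
                label = decode_label(label)
            except errors:
                pass  # invalid label: leave the original xn-- form in place
        labels.append(label)
    return ".".join(labels)


def stdlib_punycode_label(label: str) -> str:
    """Stand-in decoder (assumption): raw Punycode only, no IDNA checks."""
    return label[4:].encode("ascii").decode("punycode")


if __name__ == "__main__":
    # The second xn-- label is not valid Punycode, so it is kept as-is.
    print(decode_host_best_effort("www.xn--mnchen-3ya.xn--!!!", stdlib_punycode_label))
    # prints: www.münchen.xn--!!!
```

With idna installed, passing idna.decode as decode_label should reproduce the httpx2 loop above. A built-in flag would presumably do the same thing inside decode(), where the label splitting already happens.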

Would the maintainers be open to such an option? Or is all-or-nothing the intended contract and per-label recovery deliberately left to callers?

AI Disclaimer

This issue was created with the assistance of Claude Code. I've edited accordingly.
