idna.decode() accepts a full multi-label domain and decodes it (e.g. idna.decode("www.xn--galit-femmes-hommes-9ybf.gouv.fr") -> "www.égalité-femmes-hommes.gouv.fr"), which is great. But it is all-or-nothing: if a single label is invalid, the whole call raises, even for the labels that are fine.
```pycon
>>> idna.decode("a.b.c.xn--pokxncvks")
idna.core.InvalidCodepoint: Codepoint U+3253 at position 1 of '㉓㋎㋍㋓㋕㋘' not allowed
```
There seems to be no option to get a best-effort result (passing uts46=True or strict=False doesn't change this on 3.13).
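This matters for "decode for display" use cases, such as HTTP clients exposing a Unicode host. The relevant standards specify per-label processing that keeps an invalid xn-- label unchanged rather than failing the whole domain. The WHATWG URL Standard's "domain to Unicode" algorithm and Go's x/net/idna both behave this way (links at the end of this issue).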
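To make the per-label behavior described above concrete, here is a minimal sketch of "record the error, keep the label, never fail". The function name `to_unicode_best_effort` and the injected `decode_label` callable are my own illustrative assumptions, not part of idna or of any spec text.

```python
from typing import Callable, List, Tuple


def to_unicode_best_effort(
    domain: str,
    decode_label: Callable[[str], str],  # hypothetical hook: a single-label decoder
) -> Tuple[str, List[Tuple[int, str]]]:
    """Decode each xn-- label on its own; keep a label unchanged if it fails.

    Returns the best-effort domain plus a list of (label index, error message)
    entries, so the caller can still see that something was invalid.
    """
    labels = domain.split(".")
    errors: List[Tuple[int, str]] = []
    for i, label in enumerate(labels):
        if not label.lower().startswith("xn--"):
            continue  # no ACE prefix, nothing to decode
        try:
            labels[i] = decode_label(label)
        except ValueError as exc:
            # idna.IDNAError derives from UnicodeError, which is a ValueError.
            errors.append((i, str(exc)))  # record the error, keep the raw label
    return ".".join(labels), errors
```

With the idna package installed, this would be called as `to_unicode_best_effort(host, idna.decode)`. The decoder is passed in only to keep the sketch independent of any one library.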
Because decode() doesn't offer this, downstream projects need to re-implement the label splitting themselves. Concretely, I have a PR in httpx2 (pydantic/httpx2#979) that implements the following:
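```python
import idna

# `host` is the (possibly punycode-encoded) hostname to decode for display.
labels = []
for label in host.split("."):
    if label.startswith("xn--"):
        try:
            label = idna.decode(label)
        except idna.IDNAError:
            pass  # keep the invalid xn-- label unchanged
    labels.append(label)
host = ".".join(labels)
```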
Since decode() already accepts a whole domain and owns label splitting internally, the "keep invalid label raw" behavior arguably belongs here (maybe a flag? e.g. ignore_invalid_punycode=True, mirroring UTS #46's IgnoreInvalidPunycode) rather than being re-implemented in every consumer.
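As a rough illustration of the contract such a flag could have, here is a sketch written as a standalone helper. Everything in it is hypothetical: `decode_host`, the injected `decode_label` callable, and the keyword name `ignore_invalid_punycode` are assumptions for discussion, not idna's actual API. This sketch only touches xn-- labels and leaves all other labels as they are.

```python
from typing import Callable


def decode_host(
    host: str,
    decode_label: Callable[[str], str],  # hypothetical hook: a single-label decoder
    *,
    ignore_invalid_punycode: bool = False,  # hypothetical flag name
) -> str:
    """Sketch of the proposed contract; not idna's actual API."""
    labels = []
    for label in host.split("."):
        if label.lower().startswith("xn--"):
            try:
                label = decode_label(label)
            except ValueError:
                # idna.IDNAError derives from UnicodeError, which is a ValueError.
                if not ignore_invalid_punycode:
                    raise  # default: all-or-nothing, as today
                # flag set: fall through and keep the raw xn-- label
        labels.append(label)
    return ".".join(labels)
```

Defaulting the flag to False would keep today's behavior for existing callers, while a display-oriented consumer could opt in with something like `decode_host(host, idna.decode, ignore_invalid_punycode=True)`.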
Would the maintainers be open to such an option? Or is all-or-nothing the intended contract and per-label recovery deliberately left to callers?
AI Disclaimer
This issue was created with the assistance of Claude Code. I've edited accordingly.
References for the per-label behavior:
- WHATWG URL Standard, "domain to Unicode": reports errors, never fails outright: https://url.spec.whatwg.org/#concept-domain-to-unicode
- Go's x/net/idna implements exactly this; a comment literally says "Spec says keep the old label": https://github.com/golang/net/blob/8c4c965e028475082408749b50ed7a686df0d265/idna/idna.go#L393-L401