Handle multi-frame zstd streams split across chunks #946

Merged
Kludex merged 1 commit into pydantic:main from mbeijen:zstd.patch
May 16, 2026

@mbeijen mbeijen commented May 16, 2026

Contributor

Fixes encode/httpx#3538

Without this fix, zstd decompression fails against some servers, for example:

Python 3.14.4 (main, Apr 14 2026, 14:26:14) [Clang 22.1.3 ] on linux
>>> import httpx2; httpx2.get('https://help.netflix.com/en/node/30081')
...
EOFError: Already at the end of a Zstandard frame.

This is especially important because httpx2 now enables zstd content decoding automatically: it uses the built-in zstd support, so decoding is no longer active only when the zstandard package is installed!
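
For readers without network access, the error can likely be reproduced locally along these lines. This is a minimal sketch, not the code path inside httpx2: it assumes the failure comes from reusing a single decompressor after a frame has ended exactly at a chunk boundary. The names `codec`, `make_decompressor`, `frame_one` and `frame_two` are made up for illustration, and on Python versions without `compression.zstd` the sketch falls back to `bz2`, whose decompressor has the same `eof` / `unused_data` interface.

```python
# Prefer the stdlib zstd module (Python 3.14+); fall back to bz2 as a stand-in,
# since its decompressor exposes the same eof / unused_data interface.
try:
    from compression import zstd as codec
    make_decompressor = codec.ZstdDecompressor
except ImportError:
    import bz2 as codec
    make_decompressor = codec.BZ2Decompressor

# Two independently compressed frames, as a server might send back to back.
frame_one = codec.compress(b"first frame, ")
frame_two = codec.compress(b"second frame")

decompressor = make_decompressor()
# The first chunk ends exactly where the first frame ends.
first_output = decompressor.decompress(frame_one)

# Reusing the finished decompressor for the next chunk raises EOFError.
try:
    decompressor.decompress(frame_two)
except EOFError as exc:
    reuse_error = exc
else:
    reuse_error = None
```

After the first call `decompressor.eof` is true, so every later call on the same object fails regardless of what is passed in.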

Comment thread src/httpx2/httpx2/_decoders.py
@Kludex

Kludex commented May 16, 2026

Member

@mbeijen I'm confused. Do you agree with the suggestion or not? 👀

@mbeijen

mbeijen commented May 16, 2026

Contributor Author

Sorry for the confusion @Kludex — I went the opposite direction on the squash and should have explained.

Your suggestion (top-of-decode() check with if self.decompressor.eof and data:) handles the multi-frame-across-calls case, but it doesn't cover one scenario: an empty decode(b"") call after EOF. The and data guard skips the swap, then decompressor.decompress(b"") is called on the EOF'd decompressor and raises EOFError: Already at the end of a Zstandard frame. on stdlib compression.zstd (verified on 3.14.4). That happens internally via MultiDecoder.flush(), which calls child.decode(b"") to drain residue across stacked encodings.
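
A rough illustration of that scenario, assuming a stripped-down decoder. `TopCheckDecoder` is a hypothetical class, not the one in `_decoders.py`; it also skips `unused_data` handling to keep the focus on the empty call. On Python versions without `compression.zstd` it falls back to `bz2` as a stand-in with the same decompressor interface.

```python
try:
    from compression import zstd as codec  # stdlib zstd, Python 3.14+
    make_decompressor = codec.ZstdDecompressor
except ImportError:
    import bz2 as codec  # stand-in: same eof / EOFError behaviour
    make_decompressor = codec.BZ2Decompressor


class TopCheckDecoder:
    """Hypothetical decoder using the `eof and data` check at the top."""

    def __init__(self) -> None:
        self.decompressor = make_decompressor()

    def decode(self, data: bytes) -> bytes:
        # The swap only happens when there is new data to decode.
        if self.decompressor.eof and data:
            self.decompressor = make_decompressor()
        return self.decompressor.decompress(data)


decoder = TopCheckDecoder()
body = decoder.decode(codec.compress(b"payload"))  # the frame ends here

# A flush-style empty call skips the swap and hits the finished decompressor.
try:
    decoder.decode(b"")
except EOFError as exc:
    flush_error = exc
else:
    flush_error = None
```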

The post-decompress swap avoids this because the decompressor is always fresh when the next decode() is entered, regardless of what data is. I also dropped the (now-unreachable) top check in the squashed version, which is why the diff looks different from what you reviewed.
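
The post-decompress swap could look roughly like the following. This is a sketch of the idea only, not the merged diff: `SwapAfterDecoder` and `make_decompressor` are illustrative names, error wrapping is omitted, and `bz2` is used as a stand-in where `compression.zstd` is unavailable.

```python
try:
    from compression import zstd as codec  # stdlib zstd, Python 3.14+
    make_decompressor = codec.ZstdDecompressor
except ImportError:
    import bz2 as codec  # stand-in: same eof / unused_data interface
    make_decompressor = codec.BZ2Decompressor


class SwapAfterDecoder:
    """Hypothetical decoder that replaces the decompressor right after EOF."""

    def __init__(self) -> None:
        self.decompressor = make_decompressor()

    def decode(self, data: bytes) -> bytes:
        output = bytearray()
        while True:
            output += self.decompressor.decompress(data)
            if not self.decompressor.eof:
                break  # frame still open: wait for the next chunk
            # Frame finished: keep any bytes that belong to the next frame
            # and start over with a fresh decompressor.
            data = self.decompressor.unused_data
            self.decompressor = make_decompressor()
            if not data:
                break
        return bytes(output)


# Two frames, delivered in 5-byte chunks that ignore frame boundaries.
stream = codec.compress(b"hello ") + codec.compress(b"world")
chunks = [stream[i:i + 5] for i in range(0, len(stream), 5)]

decoder = SwapAfterDecoder()
decoded = b"".join(decoder.decode(chunk) for chunk in chunks)
decoded += decoder.decode(b"")  # flush-style call after the last frame
```

Because the swap happens before `decode()` returns, the empty flush call always lands on a decompressor that has not yet reached EOF.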

Happy to switch to a top-check + if not data: return b"" early-return instead if you prefer that shape — it's the same number of lines and avoids the seen_eof flag. Let me know.
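
For comparison, the top-check plus early-return shape might be sketched as follows. Again this is an assumption about how it could be written, not code from the PR: `EarlyReturnDecoder` and `make_decompressor` are hypothetical names, and `bz2` stands in for zstd on Python versions without `compression.zstd`.

```python
try:
    from compression import zstd as codec  # stdlib zstd, Python 3.14+
    make_decompressor = codec.ZstdDecompressor
except ImportError:
    import bz2 as codec  # stand-in: same eof / unused_data interface
    make_decompressor = codec.BZ2Decompressor


class EarlyReturnDecoder:
    """Hypothetical decoder: empty input returns early, swap happens on entry."""

    def __init__(self) -> None:
        self.decompressor = make_decompressor()

    def decode(self, data: bytes) -> bytes:
        if not data:
            return b""  # never touch a possibly finished decompressor
        output = bytearray()
        while data:
            if self.decompressor.eof:
                # Previous frame is done and more bytes arrived: start fresh.
                self.decompressor = make_decompressor()
            output += self.decompressor.decompress(data)
            # Empty until the frame ends; afterwards it holds the next frame.
            data = self.decompressor.unused_data
        return bytes(output)


stream = codec.compress(b"hello ") + codec.compress(b"world")
chunks = [stream[i:i + 5] for i in range(0, len(stream), 5)]

decoder = EarlyReturnDecoder()
decoded = b"".join(decoder.decode(chunk) for chunk in chunks)
decoded += decoder.decode(b"")  # returns b"" without calling decompress()
```

The trade-off is that the decompressor may sit in the EOF state between calls, so every entry point has to guard against it, whereas the post-decompress swap keeps that state from ever escaping `decode()`.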

Fixes encode/httpx#3538

Co-authored-by: Marcelo Trylesinski <marcelotryle@gmail.com>
@mbeijen

mbeijen commented May 16, 2026

Contributor Author

HAHAHAHA, Claude answered for me while I was checking whether, and how, we could go either way. Anyway, as the text above says: we can avoid the seen_eof flag, but we must handle both empty frames and EOF.

@Kludex Kludex changed the title from "bugfix for zstd decompressobj reuse" to "Handle multi-frame zstd streams split across chunks" May 16, 2026
@Kludex Kludex merged commit b0dc1ea into pydantic:main May 16, 2026
8 checks passed
@Kludex Kludex mentioned this pull request May 17, 2026
2 tasks