fix: Content-Encoding: gzip along with Transfer-Encoding: chunked sometimes terminates early #1608
Merged
BenWhitehead merged 3 commits into main on Mar 29, 2022
… sometimes terminates early

#### The issue

When `GZIPInputStream` finishes processing an individual member, it calls `InputStream#available()` to determine whether there is more of the stream to try to process. If the call to `available()` returns 0, `GZIPInputStream` concludes that it has processed the entirety of the underlying stream. That conclusion is spurious, because `InputStream#available()` is allowed to return 0 whenever more bytes could only be obtained by blocking. When `GZIPInputStream` is reading from a `Transfer-Encoding: chunked` response and a chunk boundary happens to align closely enough with a member boundary, none of the next member's bytes have been received yet, so `available()` returns 0 and `GZIPInputStream` stops without consuming the whole response.

#### The fix

Add a new `OptimisticAvailabilityInputStream`, which provides an optimistic "estimate" of the number of `available()` bytes in the underlying stream. When instantiating a `GZIPInputStream` for a response, automatically decorate the provided `InputStream` with an `OptimisticAvailabilityInputStream`.

#### Verification

This scenario isn't unique to processing chunked responses; it can be replicated reliably using a `java.io.SequenceInputStream` over two underlying `java.io.ByteArrayInputStream`s. See `GzipSupportTest.java` for a reproduction.

The need for this class has been verified for the following JVMs:

* ```
  openjdk version "1.8.0_292"
  OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_292-b10)
  OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.292-b10, mixed mode)
  ```
* ```
  openjdk version "11.0.14.1" 2022-02-08
  OpenJDK Runtime Environment Temurin-11.0.14.1+1 (build 11.0.14.1+1)
  OpenJDK 64-Bit Server VM Temurin-11.0.14.1+1 (build 11.0.14.1+1, mixed mode)
  ```
* ```
  openjdk version "17" 2021-09-14
  OpenJDK Runtime Environment Temurin-17+35 (build 17+35)
  OpenJDK 64-Bit Server VM Temurin-17+35 (build 17+35, mixed mode, sharing)
  ```
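The reproduction and the fix described in this commit can be sketched in a few lines of plain Java. This is a minimal, self-contained illustration under stated assumptions: `GzipRepro`, `gzipMember`, `twoMembers` and `decompress` are hypothetical names, and the body of `OptimisticAvailabilityInputStream` (report at least 1 available byte until the delegate has returned -1) is an assumed implementation that only borrows the PR's class name; it is not necessarily the code merged in #1608. Like the verification above, it places each complete gzip member in its own `ByteArrayInputStream` and joins them with a `SequenceInputStream`, so the member boundary coincides with a stream boundary.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipRepro {

  /**
   * Hypothetical sketch of an "optimistic" wrapper: until the delegate reports end of
   * stream (a read returning -1), available() never returns 0.
   */
  static final class OptimisticAvailabilityInputStream extends FilterInputStream {
    private int lastRead = 0;

    OptimisticAvailabilityInputStream(InputStream delegate) {
      super(delegate);
    }

    @Override
    public int available() throws IOException {
      if (lastRead == -1) {
        return 0; // the delegate has already reported end of stream
      }
      int actual = super.available();
      // Optimistic estimate: claim at least one byte while more data may still arrive.
      return actual > 0 ? actual : 1;
    }

    @Override
    public int read() throws IOException {
      return lastRead = super.read();
    }

    // FilterInputStream.read(byte[]) delegates here, so it is tracked as well.
    @Override
    public int read(byte[] b, int off, int len) throws IOException {
      return lastRead = super.read(b, off, len);
    }
  }

  /** Compresses text as a single, complete gzip member. */
  static byte[] gzipMember(String text) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
      gzip.write(text.getBytes(StandardCharsets.UTF_8));
    }
    return bytes.toByteArray();
  }

  /** Two gzip members in separate streams, so the member boundary is a stream boundary. */
  static InputStream twoMembers(String first, String second) throws IOException {
    return new SequenceInputStream(
        new ByteArrayInputStream(gzipMember(first)),
        new ByteArrayInputStream(gzipMember(second)));
  }

  /** Reads everything GZIPInputStream yields, optionally decorating the source first. */
  static String decompress(InputStream raw, boolean optimistic) throws IOException {
    InputStream source = optimistic ? new OptimisticAvailabilityInputStream(raw) : raw;
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (GZIPInputStream gzip = new GZIPInputStream(source)) {
      byte[] buf = new byte[4096];
      int n;
      while ((n = gzip.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
    }
    return new String(out.toByteArray(), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    // On the JVMs listed above, this is expected to stop after the first member.
    System.out.println("plain:      " + decompress(twoMembers("hello ", "world"), false));
    // With the wrapper, both members should be decoded.
    System.out.println("optimistic: " + decompress(twoMembers("hello ", "world"), true));
  }
}
```

On the JVMs listed above, the plain stream is expected to yield only `hello `, while the wrapped stream should yield `hello world`; other JDK versions may behave differently for the plain case. Returning 1 rather than a large constant from `available()` is a deliberate choice in this sketch: in those JDK versions `GZIPInputStream` only tests whether `available()` is positive, and a small estimate is less likely to mislead other callers that size buffers from it.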
Neenu1995 approved these changes on Mar 28, 2022