Fix HTTP startline validation#16022
Conversation
Motivation: The code assumed that oversized bit shifting would result in zeroing out values due to overflow. However, the Java Language Specification instead says that shifts effectively only consider the lower six bits of the shift amount, resulting in modular-arithmetic shifts. The consequence is that, for instance, shifing by the capital letter 'M' produces the same bit mask as carriage-return '\r', which is an illegal character in an HTTP start line. This incorrectly rejected valid URIs. Modification: Make the shifting conditional and only use it on character values less than or equal to 64 (the Long bit size). Also add tests to check that valid URLs are accepted. Result: Fixes netty#16020
codec-http/src/main/java/io/netty/handler/codec/http/HttpUtil.java
Outdated
Show resolved
Hide resolved
Co-authored-by: Jonas Konrad <me@yawk.at>
|
As for performance, previously we had: And in this PR: 😑 |
|
Yea, not ideal and I'm sure this can still be optimized, it should be a relatively common task... But let's get the fix out first. |
|
|
||
| private static long charMask(CharSequence token, int i) { | ||
| char c = token.charAt(i); | ||
| return c < 64 ? 1L << c : 0; |
There was a problem hiding this comment.
Why not using s lookup table for this or
long mask = ((c - 64) >> 63);
return mask & (1L << c);
Assuming c to be >= 0.
I made it by phone so could be wrong but I think you got the idea; mask should use the msb to create a full 0s or 1s mask to produce the actual value. In this way there won't be any comparison.
A lookup table could be used similarly by computing an index to lookup into a table with N + 1 elements
| charMask(token, i + 1) | | ||
| charMask(token, i + 2) | | ||
| charMask(token, i + 3); | ||
| if ((chars & ILLEGAL_REQUEST_LINE_TOKEN_OCTET_MASK) != 0) { |
There was a problem hiding this comment.
I am not sure batching would pay off here: validations are a strange affair!
If we assume is rare to return false, computing the 4 ors and one and with the illegal request line is still not ideal!
Just writing straight comparison chart per char (no need to or and batch) should perform 4 more ands (which are bad) and 4 comparisons, but will make the loads independent..in short comparison outcomes are processed even before CPU does fetch and decode and keeping the 4 char loads independent would make the pipeline able to keep on speculating and processing more chars.
There was a problem hiding this comment.
It could payoff only If we could use VarHandle and read from a byte[] 4 chars in parallel into a long, saving 3 loads to happen.
Clearly charMask should use the branchless version I explained and be made batchy as well
There was a problem hiding this comment.
In short, I would try just the simple loop case here (using the branchless code I have provided)
|
@franz1981 I tried your suggestions but I'm not finding them profitable on my system. Batching and using the conditional remains the fastest approach so far. |
|
With the masking from #16022 (comment) With conditional in charMask but no batching: With both suggestions together: |
|
I will check on my Ryzen next week but my guess is that you haven't enough random samples w a distribution of chars below/above 64. If the validation is really stressed, branch mispredict should be rather dominant when happen, in my XP |
|
Indeed HttpRequestEncoderInsertBenchmark is just way too predictable to represent a valid benchmark (if we assume in the real world chars really can be both below/above 64 - which separate numbers from letters, for example) |
|
Let me pull it in as it is for now to fix the regression. We can work on perf improvements as a followup |
Motivation: The code assumed that oversized bit shifting would result in zeroing out values due to overflow. However, the Java Language Specification instead says that shifts effectively only consider the lower six bits of the shift amount, resulting in modular-arithmetic shifts. The consequence is that, for instance, shifing by the capital letter 'M' produces the same bit mask as carriage-return '\r', which is an illegal character in an HTTP start line. This incorrectly rejected valid URIs. Modification: Make the shifting conditional and only use it on character values less than or equal to 64 (the Long bit size). Also add tests to check that valid URLs are accepted. Result: Fixes #16020 --------- Co-authored-by: Jonas Konrad <me@yawk.at>
Motivation: The code assumed that oversized bit shifting would result in zeroing out values due to overflow. However, the Java Language Specification instead says that shifts effectively only consider the lower six bits of the shift amount, resulting in modular-arithmetic shifts. The consequence is that, for instance, shifing by the capital letter 'M' produces the same bit mask as carriage-return '\r', which is an illegal character in an HTTP start line. This incorrectly rejected valid URIs. Modification: Make the shifting conditional and only use it on character values less than or equal to 64 (the Long bit size). Also add tests to check that valid URLs are accepted. Result: Fixes #16020 --------- Co-authored-by: Jonas Konrad <me@yawk.at>
Motivation: The code assumed that oversized bit shifting would result in zeroing out values due to overflow. However, the Java Language Specification instead says that shifts effectively only consider the lower six bits of the shift amount, resulting in modular-arithmetic shifts. The consequence is that, for instance, shifing by the capital letter 'M' produces the same bit mask as carriage-return '\r', which is an illegal character in an HTTP start line. This incorrectly rejected valid URIs. Modification: Make the shifting conditional and only use it on character values less than or equal to 64 (the Long bit size). Also add tests to check that valid URLs are accepted. Result: Fixes #16020 --------- Co-authored-by: Jonas Konrad <me@yawk.at>
Motivation: The code assumed that oversized bit shifting would result in zeroing out values due to overflow. However, the Java Language Specification instead says that shifts effectively only consider the lower six bits of the shift amount, resulting in modular-arithmetic shifts. The consequence is that, for instance, shifing by the capital letter 'M' produces the same bit mask as carriage-return '\r', which is an illegal character in an HTTP start line. This incorrectly rejected valid URIs. Modification: Make the shifting conditional and only use it on character values less than or equal to 64 (the Long bit size). Also add tests to check that valid URLs are accepted. Result: Fixes #16020 --------- Co-authored-by: Jonas Konrad <me@yawk.at>
Motivation: The code assumed that oversized bit shifting would result in zeroing out values due to overflow. However, the Java Language Specification instead says that shifts effectively only consider the lower six bits of the shift amount, resulting in modular-arithmetic shifts. The consequence is that, for instance, shifing by the capital letter 'M' produces the same bit mask as carriage-return '\r', which is an illegal character in an HTTP start line. This incorrectly rejected valid URIs. Modification: Make the shifting conditional and only use it on character values less than or equal to 64 (the Long bit size). Also add tests to check that valid URLs are accepted. Result: Fixes #16020 --------- Co-authored-by: Jonas Konrad <me@yawk.at>
Motivation: The code assumed that oversized bit shifting would result in zeroing out values due to overflow. However, the Java Language Specification instead says that shifts effectively only consider the lower six bits of the shift amount, resulting in modular-arithmetic shifts. The consequence is that, for instance, shifing by the capital letter 'M' produces the same bit mask as carriage-return '\r', which is an illegal character in an HTTP start line. This incorrectly rejected valid URIs. Modification: Make the shifting conditional and only use it on character values less than or equal to 64 (the Long bit size). Also add tests to check that valid URLs are accepted. Result: Fixes #16020 Co-authored-by: Chris Vest <christianvest_hansen@apple.com> Co-authored-by: Jonas Konrad <me@yawk.at>
Motivation: The code assumed that oversized bit shifting would result in zeroing out values due to overflow. However, the Java Language Specification instead says that shifts effectively only consider the lower six bits of the shift amount, resulting in modular-arithmetic shifts. The consequence is that, for instance, shifing by the capital letter 'M' produces the same bit mask as carriage-return '\r', which is an illegal character in an HTTP start line. This incorrectly rejected valid URIs. Modification: Make the shifting conditional and only use it on character values less than or equal to 64 (the Long bit size). Also add tests to check that valid URLs are accepted. Result: Fixes #16020 Co-authored-by: Chris Vest <christianvest_hansen@apple.com> Co-authored-by: Jonas Konrad <me@yawk.at>
### What changes were proposed in this pull request? This PR aims to upgrade Netty to `4.2.9.Final` by overriding the transitive one from the Apache Spark 4.1.x. ### Why are the changes needed? To bring the latest bug fixed version in Apache Spark K8s Operator like the Apache Spark 4.2.x. - https://netty.io/news/2025/12/11/4-2-8.html - [CVE-2025-67735](GHSA-84h7-rjj3-6jx4) - https://netty.io/news/2025/12/15/4-2-9.html - netty/netty#16022 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs. **BEFORE** ``` $ gradle spark-operator:dependencyInsight --configuration compileClasspath --dependency io.netty | grep '^io.netty' | awk -F: '{print $NF}' | sort | uniq -c 2 4.1.119.Final -> 4.2.7.Final 9 4.1.130.Final -> 4.2.7.Final 50 4.2.7.Final ``` **AFTER** ``` $ gradle spark-operator:dependencyInsight --configuration compileClasspath --dependency io.netty | grep '^io.netty' | awk -F: '{print $NF}' | sort | uniq -c 2 4.1.119.Final -> 4.2.9.Final 9 4.1.130.Final -> 4.2.9.Final 3 4.2.7.Final -> 4.2.9.Final 22 4.2.9.Final 25 4.2.9.Final (selected by rule) ``` ### Was this patch authored or co-authored using generative AI tooling? Yes (`Opus 4.5` on `Claude Code v2.1.5`) Closes #477 from dongjoon-hyun/SPARK-55288. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Motivation:
The code assumed that oversized bit shifting would result in zeroing out values due to overflow. However, the Java Language Specification instead says that shifts effectively only consider the lower six bits of the shift amount, resulting in modular-arithmetic shifts. The consequence is that, for instance, shifing by the capital letter 'M' produces the same bit mask as carriage-return '\r', which is an illegal character in an HTTP start line. This incorrectly rejected valid URIs.
Modification:
Make the shifting conditional and only use it on character values less than or equal to 64 (the Long bit size). Also add tests to check that valid URLs are accepted.
Result:
Fixes #16020