Fix an unfortunate bug with Visual Studio 2015#1862
Conversation
WalkthroughThis pull request updates the Changes
Sequence Diagram(s)sequenceDiagram
participant F as chunkmemset_16
participant C as Compiler Version Checker
participant D as Data Loading Routine
F->>C: Check MSVC compiler version
alt Compiler is old (<= threshold)
C-->>F: Old compiler detected
F->>D: Load half-chunk from memory
D-->>F: Return half-chunk
F->>D: Insert half-chunk via alternate instruction (_mm256_inserti128_si256)
else Compiler is new
C-->>F: New compiler detected
F->>D: Load and broadcast half-chunk (_mm256_broadcastsi128_si256)
end
Possibly related PRs
Suggested labels
Suggested reviewers
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
ee685a4 to
5100a4b
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## develop #1862 +/- ##
========================================
Coverage 80.35% 80.36%
========================================
Files 139 139
Lines 11176 11176
Branches 2867 2867
========================================
+ Hits 8981 8982 +1
- Misses 1221 1228 +7
+ Partials 974 966 -8 ☔ View full report in Codecov by Sentry. |
Evidently this instruction, despite the intrinsic having a register operand, is a memory-register instruction. There seems to be no alignment requirement for the source operand. Because of this, compilers when not optimized are doing the unaligned load and then dumping back to the stack to do the broadcasting load. In doing this, MSVC seems to be dumping to the stack with an aligned move at an unaligned address, causing a segfault. GCC does not seem to make this mistake, as it stashes to an aligned address. If we're on Visual Studio 2015, let's just do the longer 9 cycle sequence of a 128 bit load followed by a vinserti128. This _should_ fix this (issue zlib-ng#1861).
5100a4b to
14324cf
Compare
…5 Debug x86 builds. Re-enable AVX2 on MSVC 2015. References: - zlib-ng/zlib-ng#1861 - zlib-ng/zlib-ng#1862
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
arch/x86/chunkset_avx2.c (1)
35-35: Enhance the comment with more details about the bug.The comment could be more descriptive about the Visual Studio 2015 bug and its impact. Consider adding details about the segfault that occurs due to unaligned loads and aligned moves.
- /* See explanation in chunkset_avx512.c */ + /* Visual Studio 2015 has a compiler bug where unaligned loads are written back + * to the stack using an aligned move at an unaligned address, causing segfaults. + * See detailed explanation in chunkset_avx512.c */
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
arch/x86/chunkset_avx2.c(1 hunks)arch/x86/chunkset_avx512.c(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (30)
- GitHub Check: Ubuntu GCC ARM HF No NEON ASAN
- GitHub Check: Ubuntu GCC ARM HF No ACLE ASAN
- GitHub Check: Ubuntu GCC ARM HF ASAN
- GitHub Check: Ubuntu GCC ARM HF No NEON ASAN
- GitHub Check: Ubuntu GCC ARM HF No ACLE ASAN
- GitHub Check: Ubuntu GCC ARM HF ASAN
- GitHub Check: Ubuntu GCC ARM HF No NEON ASAN
- GitHub Check: Ubuntu GCC ARM HF No ACLE ASAN
- GitHub Check: Ubuntu GCC ARM HF ASAN
- GitHub Check: Ubuntu GCC ARM HF No NEON ASAN
- GitHub Check: Ubuntu GCC ARM HF No ACLE ASAN
- GitHub Check: Ubuntu GCC ARM HF ASAN
- GitHub Check: Ubuntu GCC ARM HF No NEON ASAN
- GitHub Check: Ubuntu GCC ARM HF No ACLE ASAN
- GitHub Check: Ubuntu GCC ARM HF ASAN
- GitHub Check: Ubuntu GCC ARM HF No NEON ASAN
- GitHub Check: Ubuntu GCC ARM HF No ACLE ASAN
- GitHub Check: Ubuntu GCC ARM HF ASAN
- GitHub Check: Ubuntu GCC ARM HF No NEON ASAN
- GitHub Check: Ubuntu GCC ARM HF No ACLE ASAN
- GitHub Check: Ubuntu GCC ARM HF ASAN
- GitHub Check: Ubuntu GCC ARM HF No NEON ASAN
- GitHub Check: Ubuntu GCC ARM HF No ACLE ASAN
- GitHub Check: Ubuntu GCC ARM HF ASAN
- GitHub Check: Ubuntu GCC ARM HF No NEON ASAN
- GitHub Check: Ubuntu GCC ARM HF No ACLE ASAN
- GitHub Check: Ubuntu GCC ARM HF ASAN
- GitHub Check: Ubuntu GCC ARM HF No NEON ASAN
- GitHub Check: Ubuntu GCC ARM HF No ACLE ASAN
- GitHub Check: Ubuntu GCC ARM HF ASAN
🔇 Additional comments (2)
arch/x86/chunkset_avx2.c (1)
36-41: LGTM! The workaround is well-implemented.The conditional compilation block correctly implements the workaround for Visual Studio 2015's compiler bug by using a longer sequence with a 128-bit load followed by
vinserti128. This should prevent the segfault caused by unaligned loads.arch/x86/chunkset_avx512.c (1)
49-59: LGTM! Well-documented and consistent implementation.The comment provides excellent context about the Visual Studio 2015 compiler bug, its performance impact, and the rationale for the workaround. The implementation is consistent with the AVX2 version.
Evidently this instruction, despite the intrinsic having a register operand,
is a memory-register instruction. There seems to be no alignment requirement
for the source operand. Because of this, compilers when not optimized are doing
the unaligned load and then dumping back to the stack to do the broadcasting load.
In doing this, MSVC seems to be dumping to the stack with an aligned move at an
unaligned address, causing a segfault. GCC does not seem to make this mistake, as
it stashes to an aligned address.
If we're on Visual Studio 2015, let's just do the longer 9 cycle sequence of a 128
bit load followed by a vinserti128. This should fix this (issue #1861).
Summary by CodeRabbit