Use aligned loads in the chorba portions of the clmul crc routines #2019
Dead2 merged 1 commit into zlib-ng:develop
Since we already go to the trouble of doing aligned loads, we may as well let the compiler know that the alignment is guaranteed. We can't guarantee an aligned store, but at least with an aligned load the compiler can elide a separate load by folding it into a subsequent xor multiplication when not copying.
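As a rough illustration of the point above, here is a minimal sketch that is not taken from the zlib-ng sources. The helper names are hypothetical, and a plain `_mm_xor_si128` stands in for the folding arithmetic. With legacy SSE encoding, a memory operand on an instruction such as `pxor` must be 16-byte aligned, so a compiler can generally only merge the load into the arithmetic instruction when the load is expressed with `_mm_load_si128`. The code it actually generates depends on the compiler and flags.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: XOR together nblocks 16-byte blocks using unaligned
 * loads. Without VEX encoding, each load typically stays a separate movdqu. */
static __m128i xor_blocks_unaligned(const uint8_t *src, size_t nblocks) {
    __m128i acc = _mm_setzero_si128();
    for (size_t i = 0; i < nblocks; i++)
        acc = _mm_xor_si128(acc, _mm_loadu_si128((const __m128i *)(src + 16 * i)));
    return acc;
}

/* Hypothetical helper: same computation, but the caller promises 16-byte
 * alignment, so the compiler is free to use the load as a memory operand. */
static __m128i xor_blocks_aligned(const uint8_t *src, size_t nblocks) {
    assert(((uintptr_t)src & 15) == 0); /* the promise made by _mm_load_si128 */
    __m128i acc = _mm_setzero_si128();
    for (size_t i = 0; i < nblocks; i++)
        acc = _mm_xor_si128(acc, _mm_load_si128((const __m128i *)(src + 16 * i)));
    return acc;
}

/* Returns 1 when both variants produce the same 128-bit result. */
static int xor_variants_agree(void) {
    __m128i storage[4];                  /* __m128i storage is 16-byte aligned */
    uint8_t *buf = (uint8_t *)storage;
    for (size_t i = 0; i < sizeof(storage); i++)
        buf[i] = (uint8_t)(i * 7 + 3);   /* arbitrary test pattern */
    __m128i a = xor_blocks_unaligned(buf, 4);
    __m128i b = xor_blocks_aligned(buf, 4);
    return _mm_movemask_epi8(_mm_cmpeq_epi8(a, b)) == 0xFFFF;
}
```

Both helpers compute the same value; the difference is only in what the compiler is allowed to assume. Comparing the disassembly of the two functions on a non-AVX build is one way to check whether the separate load disappears.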
Force-pushed from 324ff19 to 6b5aac9.
Walkthrough: Modified the CRC32 folding implementation in x86 SIMD code to replace unaligned loads with aligned loads.
Code graph analysis (1): arch/x86/crc32_fold_pclmulqdq_tpl.h (3)
I did confirm that this is in fact happening on the non-copying variant in several places:
Codecov Report
All modified and coverable lines are covered by tests.
@@             Coverage Diff              @@
##           develop    #2019      +/-   ##
===========================================
- Coverage    82.23%   81.24%   -1.00%
===========================================
  Files          163      163
  Lines        12863    12863
  Branches      3171     3171
===========================================
- Hits         10578    10450     -128
- Misses        1243     1372     +129
+ Partials      1042     1041       -1