Allow the compiler to inline chunkcopy_safe more readily#1781
Allow the compiler to inline chunkcopy_safe more readily#1781Dead2 merged 1 commit intozlib-ng:developfrom
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## develop #1781 +/- ##
===========================================
- Coverage 33.32% 33.16% -0.16%
===========================================
Files 66 66
Lines 5504 5490 -14
Branches 1227 1225 -2
===========================================
- Hits 1834 1821 -13
+ Misses 3410 3407 -3
- Partials 260 262 +2 ☔ View full report in Codecov by Sentry. |
|
I'm little worried that the small lengths might be the most common ones in some compressed files, thus this end up being heavily data dependent. Support for |
Even so I think the inlining calculus still wins from this overall. Restrict should be fairly portable by this point and I don't think we'd be the only zlib fork using that keyword. |
MSVC 2019 CI is just completely failing as |
10eb361 to
c1706b9
Compare
|
@KungFuJesus This needs a rebase |
As it turns out, trying to peel off the remainder with so many branches caused the code size to inflate a bit too much that this function wouldn't inline without some fairly aggressive optimization flags. Only catching vector sized chunks here makes the loop body small enough and having the byte by byte copy idiom at the bottom gives the compiler some flexibility that it is likely to do something there.
c1706b9 to
03ee4bc
Compare
|
Rebased. In my testing, at least with your cov-analysis + minigzip benchmark, this is ~10% faster (without -O3): It probably needs wider testing before any conclusions, though. |
By simplifying this copy ladder a bit we can shrink the code size enough that less aggressive optimizations still manage to make this function inline. By leveraging the restrict keyword, we tell the compiler that for the byte by byte copy idiom at the bottom of the ladder we have no overlap so it's safe to attempt to substitute its own safe copy sequence for those bytes that don't involve a byte at a time. I believe this should be a good compromise here that doesn't require -O3 but should still compliment it.