Skip to content

Allow the compiler to inline chunkcopy_safe more readily#1781

Merged
Dead2 merged 1 commit intozlib-ng:developfrom
KungFuJesus:fix_inline_chunkcopy
Sep 26, 2024
Merged

Allow the compiler to inline chunkcopy_safe more readily#1781
Dead2 merged 1 commit intozlib-ng:developfrom
KungFuJesus:fix_inline_chunkcopy

Conversation

@KungFuJesus
Copy link
Copy Markdown
Collaborator

By simplifying this copy ladder a bit we can shrink the code size enough that less aggressive optimizations still manage to make this function inline. By leveraging the restrict keyword, we tell the compiler that for the byte by byte copy idiom at the bottom of the ladder we have no overlap so it's safe to attempt to substitute its own safe copy sequence for those bytes that don't involve a byte at a time. I believe this should be a good compromise here that doesn't require -O3 but should still compliment it.

@codecov
Copy link
Copy Markdown

codecov bot commented Sep 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 33.16%. Comparing base (a689e10) to head (03ee4bc).
Report is 3 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1781      +/-   ##
===========================================
- Coverage    33.32%   33.16%   -0.16%     
===========================================
  Files           66       66              
  Lines         5504     5490      -14     
  Branches      1227     1225       -2     
===========================================
- Hits          1834     1821      -13     
+ Misses        3410     3407       -3     
- Partials       260      262       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mtl1979
Copy link
Copy Markdown
Collaborator

mtl1979 commented Sep 15, 2024

I'm little worried that the small lengths might be the most common ones in some compressed files, thus this end up being heavily data dependent. Support for restrict keyword is still quite new and limited to pure C compilers (gcc, maybe other less common), ruling out Visual C++ and Clang. C++ compilers have different meaning for restrict keyword.

@KungFuJesus
Copy link
Copy Markdown
Collaborator Author

I'm little worried that the small lengths might be the most common ones in some compressed files, thus this end up being heavily data dependent. Support for restrict keyword is still quite new and limited to pure C compilers (gcc, maybe other less common), ruling out Visual C++ and Clang. C++ compilers have different meaning for restrict keyword.

Even so I think the inlining calculus still wins from this overall. Restrict should be fairly portable by this point and I don't think we'd be the only zlib fork using that keyword.

@mtl1979
Copy link
Copy Markdown
Collaborator

mtl1979 commented Sep 15, 2024

I'm little worried that the small lengths might be the most common ones in some compressed files, thus this end up being heavily data dependent. Support for restrict keyword is still quite new and limited to pure C compilers (gcc, maybe other less common), ruling out Visual C++ and Clang. C++ compilers have different meaning for restrict keyword.

Even so I think the inlining calculus still wins from this overall. Restrict should be fairly portable by this point and I don't think we'd be the only zlib fork using that keyword.

MSVC 2019 CI is just completely failing as restricted isn't supported yet...

@KungFuJesus KungFuJesus force-pushed the fix_inline_chunkcopy branch 3 times, most recently from 10eb361 to c1706b9 Compare September 16, 2024 21:49
@Dead2 Dead2 added Rebase needed Please do a 'git rebase develop yourbranch' optimization labels Sep 18, 2024
@Dead2
Copy link
Copy Markdown
Member

Dead2 commented Sep 20, 2024

@KungFuJesus This needs a rebase

As it turns out, trying to peel off the remainder with so many branches
caused the code size to inflate a bit too much that this function
wouldn't inline without some fairly aggressive optimization flags. Only
catching vector sized chunks here makes the loop body small enough and
having the byte by byte copy idiom at the bottom gives the compiler some
flexibility that it is likely to do something there.
@KungFuJesus
Copy link
Copy Markdown
Collaborator Author

Rebased. In my testing, at least with your cov-analysis + minigzip benchmark, this is ~10% faster (without -O3):

# a689e107
adam@eggsbenedict ~/scratch/zlib-ng/build-nocompflags $ perf stat -r 10 -d -- ./minigzip -c -d -k cov-analysis-linux64-2023.6.2.tar.gz > /dev/null

 Performance counter stats for './minigzip -c -d -k cov-analysis-linux64-2023.6.2.tar.gz' (10 runs):

          5,522.80 msec task-clock                       #    1.000 CPUs utilized               ( +-  0.03% )
                13      context-switches                 #    2.354 /sec                        ( +-  5.96% )
                 3      cpu-migrations                   #    0.543 /sec                        ( +- 20.96% )
               187      page-faults                      #   33.860 /sec                        ( +-  0.13% )
    22,562,134,750      cycles                           #    4.085 GHz                         ( +-  0.03% )  (62.47%)
    35,954,472,621      instructions                     #    1.59  insn per cycle              ( +-  0.01% )  (74.98%)
     4,037,091,326      branches                         #  730.987 M/sec                       ( +-  0.01% )  (74.99%)
       177,107,826      branch-misses                    #    4.39% of all branches             ( +-  0.04% )  (75.01%)
     7,053,311,112      L1-dcache-loads                  #    1.277 G/sec                       ( +-  0.01% )  (75.02%)
       281,874,403      L1-dcache-load-misses            #    4.00% of all L1-dcache accesses   ( +-  0.07% )  (75.02%)
        13,441,014      LLC-loads                        #    2.434 M/sec                       ( +-  0.76% )  (49.99%)
         9,197,534      LLC-load-misses                  #   68.43% of all LL-cache accesses    ( +-  0.84% )  (49.97%)

           5.52109 +- 0.00158 seconds time elapsed  ( +-  0.03% )
# 03ee4bc5b
 Performance counter stats for './minigzip -c -d -k cov-analysis-linux64-2023.6.2.tar.gz' (10 runs):

          5,095.01 msec task-clock                       #    0.996 CPUs utilized               ( +-  0.48% )
               242      context-switches                 #   47.497 /sec                        ( +- 93.64% )
                 5      cpu-migrations                   #    0.981 /sec                        ( +- 44.53% )
               188      page-faults                      #   36.899 /sec                        ( +-  0.22% )
    20,808,417,599      cycles                           #    4.084 GHz                         ( +-  0.45% )  (62.47%)
    36,090,040,181      instructions                     #    1.73  insn per cycle              ( +-  0.44% )  (74.98%)
     4,075,138,765      branches                         #  799.830 M/sec                       ( +-  0.81% )  (75.00%)
       176,667,610      branch-misses                    #    4.34% of all branches             ( +-  0.12% )  (75.01%)
     7,404,718,059      L1-dcache-loads                  #    1.453 G/sec                       ( +-  0.63% )  (75.02%)
       284,674,157      L1-dcache-load-misses            #    3.84% of all L1-dcache accesses   ( +-  0.26% )  (75.02%)
        14,164,805      LLC-loads                        #    2.780 M/sec                       ( +-  2.81% )  (49.98%)
         8,427,790      LLC-load-misses                  #   59.50% of all LL-cache accesses    ( +- 10.73% )  (49.96%)

            5.1170 +- 0.0474 seconds time elapsed  ( +-  0.93% )

It probably needs wider testing before any conclusions, though.

@Dead2 Dead2 removed the Rebase needed Please do a 'git rebase develop yourbranch' label Sep 25, 2024
Copy link
Copy Markdown
Member

@Dead2 Dead2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Dead2 Dead2 merged commit b80eb4c into zlib-ng:develop Sep 26, 2024
@Dead2 Dead2 mentioned this pull request Dec 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants