Change compiler optimizations to -O3 -flto #11207

markovamaria · 2022-08-29T17:46:24Z

Optimization update from -O2 to -O3 -flto gives up to 5% performance gain in 'redis-benchmarks-spec-client-runner' tests geomean where GCC 9.4.0 is used for build

enjoy-binbin · 2022-08-30T02:30:06Z

thanks, where do you get the performance gain result, i can recall this one: #9601 (comment)

markovamaria · 2022-08-30T10:00:43Z

thanks, where do you get the performance gain result, i can recall this one: #9601 (comment)

Hi @enjoy-binbin ,
I've measured data on intel-redis joint lab (bare-metal / biredis, Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz )

Comparison was performed for Jun'29 commit built with GCC 9.4.0, O3 flto options (through REDIS_*FLAGS) vs. the same redis bin built with GCC 9.4.0 with default options.

Measurement showed:
+5,13% in geomean for tests form 'redis-benchmarks-spec-client-runner'
21 test gives more that 5% gain, from them - 4 tests gives more 10% gain
2 tests showed about -1% to -2% in comparison with default GCC. (memtier_benchmark-1key-set-10-elements-smembers,
memtier_benchmark-1Mkeys-load-stream-1-fields-with-100B-values)

markovamaria · 2022-08-30T10:17:36Z

Adding @filipecosta90 to review

yossigo · 2022-08-30T14:34:49Z

@markovamaria @filipecosta90 Do you know which specific optimization options contribute to this improvement? Given the broad spectrum of architectures/compilers/compiler versions used, I assume -O3 may also have some negative impact, whereas -O2 with additional, specific optimizations could reduce the chance of that.

oranagra · 2022-08-31T05:52:35Z

@markovamaria thanks. please post more details on the benchmarks you conducted and their results.
IIRC there are some regressions too.

I'm a little bit paranoid about bugs resulting from eager optimizations, maybe combined with some places where redis assumes things that the compiler concludes that are invalid. something like this.

p.s. i see a compilation error on centos, can you please look into it?

markovamaria · 2022-08-31T14:09:15Z

@markovamaria @filipecosta90 Do you know which specific optimization options contribute to this improvement? Given the broad spectrum of architectures/compilers/compiler versions used, I assume -O3 may also have some negative impact, whereas -O2 with additional, specific optimizations could reduce the chance of that.

-O3
Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the following optimization flags:

-fgcse-after-reload 
-fipa-cp-clone
-floop-interchange 
-floop-unroll-and-jam 
-fpeel-loops 
-fpredictive-commoning 
-fsplit-loops 
-fsplit-paths 
-ftree-loop-distribution 
-ftree-partial-pre 
-funswitch-loops 
-fvect-cost-model=dynamic 
-fversion-loops-for-strides

Taken from : https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

I didn't analyzed these options separately, so I can't answer which is then really helped.
Confirm that biggest jump happened between O2 and O3 moving. Flto with combination with O3 gives less that 1% extra gain.

markovamaria · 2022-08-31T17:58:21Z

@markovamaria thanks. please post more details on the benchmarks you conducted and their results. IIRC there are some regressions too.

Please find .xlsx with details attached. File contains raw data collected on bare metal from joint lab (biredis03), test by test comparison and summary conclusions for GCC 9.4.0 + O3 flto vs. GCC 9.4.0 with details. Data includes 2 variations (w/ and w/o dependencies - *FLAGS or REDIS_*FLAGS).
W/ dependencies behavior is suggested in PR through optimization flag.

for_PR11207.xlsx

p.s. i see a compilation error on centos, can you please look into it?

Sure, I'll check

Warning appeared with O3, on CentOS during inlininig procedure

filipecosta90 · 2022-09-26T21:52:02Z

@markovamaria please check PR markovamaria#1 to your branch. It should fix the failing CI and we can then retake this effort forward.

markovamaria · 2022-09-26T22:20:45Z

@markovamaria please check PR markovamaria#1 to your branch. It should fix the failing CI and we can then retake this effort forward.

Hi @filipecosta90, thanks for your suggestion. I've considered such change, but decided that it can hide potential issue because of function "lpGetEdgeStreamID" includes scenario with 'returns 0' and then last_id is not defined. In positive cases function returns 1 and last_id is defined.
I suggest fix with processing of return code for "lpGetEdgeStreamID". Please review if this approach appropriate or not from code's logic point of view.

filipecosta90 · 2022-09-27T10:40:14Z

WRT:

I suggest fix with processing of return code for "lpGetEdgeStreamID". Please review if this approach appropriate or not from code's logic point of view.

pinging @guybe7 if he believes that we should cover that scenario or if the lp will never be null and we can just move along with the test fix. It's also worth noting that on lpGetEdgeStreamID half of the code is never used on any cases ( probably worth cleaning in a different PR @guybe7 ? )

guybe7 · 2022-09-27T11:55:43Z

@filipecosta90 lpGetEdgeStreamID will return 0 if lp is NULL (which is impossible in that case, otherwise it would have crashed in lpFirst(lp)) or if lp is empty (which should also be impossible, any empty listpacks should be trimmed from the rax)
i think an assertion is in order instead of an if

re the cleanup: yes, i'm aware there's some dead code there, it's there for symmetry (we can comment it out but i don't think it disturbs anyone)

filipecosta90 · 2022-09-27T13:23:52Z

@filipecosta90 lpGetEdgeStreamID will return 0 if lp is NULL (which is impossible in that case, otherwise it would have crashed in lpFirst(lp)) or if lp is empty (which should also be impossible, any empty listpacks should be trimmed from the rax) i think an assertion is in order instead of an if

re the cleanup: yes, i'm aware there's some dead code there, it's there for symmetry (we can comment it out but i don't think it disturbs anyone)

@markovamaria given the above I suggest we move forward with PR markovamaria#1 . agree?

markovamaria · 2022-09-27T15:25:42Z

@filipecosta90 lpGetEdgeStreamID will return 0 if lp is NULL (which is impossible in that case, otherwise it would have crashed in lpFirst(lp)) or if lp is empty (which should also be impossible, any empty listpacks should be trimmed from the rax) i think an assertion is in order instead of an if
re the cleanup: yes, i'm aware there's some dead code there, it's there for symmetry (we can comment it out but i don't think it disturbs anyone)

@markovamaria given the above I suggest we move forward with PR markovamaria#1 . agree?

done

oranagra · 2022-09-27T19:12:33Z

Please find .xlsx with details attached. File contains raw data collected on bare metal from joint lab (biredis03), test by test comparison and summary conclusions for GCC 9.4.0 + O3 flto vs. GCC 9.4.0 with details. Data includes 2 variations (w/ and w/o dependencies - *FLAGS or REDIS_*FLAGS). W/ dependencies behavior is suggested in PR through optimization flag.

for_PR11207.xlsx

So if I understand correctly, when building deps with the more aggressive optimizations, we had slightly better results, but also some regressions, we it was eventually decided to apply these just on redis' src dir?

markovamaria · 2022-09-30T15:37:13Z

Please find .xlsx with details attached. File contains raw data collected on bare metal from joint lab (biredis03), test by test comparison and summary conclusions for GCC 9.4.0 + O3 flto vs. GCC 9.4.0 with details. Data includes 2 variations (w/ and w/o dependencies - *FLAGS or REDIS_*FLAGS). W/ dependencies behavior is suggested in PR through optimization flag.
for_PR11207.xlsx

So if I understand correctly, when building deps with the more aggressive optimizations, we had slightly better results, but also some regressions, we it was eventually decided to apply these just on redis' src dir?

something like this. Result are similar. There is choice.
I suggest start with easiest 'src'.

oranagra · 2022-10-02T07:16:31Z

@yossigo i guess now is a good time to merge this? anything you wanna check before we proceed?

enjoy-binbin · 2022-10-03T03:58:20Z

look like there is a build issue in test-sanitizer-address (clang): https://github.com/redis/redis/actions/runs/3170623367/jobs/5163322525

adlist.o: file not recognized: file format not recognized
clang: error: linker command failed with exit code 1 (use -v to see invocation)

in my local env, by doing this, can make it work (but I'm not familiar with this part, so just raise it, it also passed in the GH daily)

ifdef SANITIZER
ifeq ($(SANITIZER),address)
	MALLOC=libc
	CFLAGS+=-fsanitize=address -fno-sanitize-recover=all -fno-omit-frame-pointer
-       LDFLAGS+=-fsanitize=address
+       LDFLAGS+=-fsanitize=address -flto -fuse-ld=gold
else
ifeq ($(SANITIZER),undefined)
	MALLOC=libc
	CFLAGS+=-fsanitize=undefined -fno-sanitize-recover=all -fno-omit-frame-pointer
-       LDFLAGS+=-fsanitize=address
+       LDFLAGS+=-fsanitize=address -flto -fuse-ld=gold
else
ifeq ($(SANITIZER),thread)
	CFLAGS+=-fsanitize=thread -fno-sanitize-recover=all -fno-omit-frame-pointer
-       LDFLAGS+=-fsanitize=address
+       LDFLAGS+=-fsanitize=address -flto -fuse-ld=gold
else
    $(error "unknown sanitizer=${SANITIZER}")
endif
endif
endif
endif

Optimization update from -O2 to -O3 -flto gives up to 5% performance gain in 'redis-benchmarks-spec-client-runner' tests geomean where GCC 9.4.0 is used for build * Fix for false-positive warning in bitops.c Warning appeared with O3, on CentOS during inlininig procedure * Fixed unitialized streamID within streamTrim() (#1) Co-authored-by: filipe oliveira <filipecosta.90@gmail.com>

Optimization update

2127d18

Optimization update from -O2 to -O3 -flto gives up to 5% performance gain in 'redis-benchmarks-spec-client-runner' tests geomean where GCC 9.4.0 is used for build

filipecosta90 added the action:run-benchmark Triggers the benchmark suite for this Pull Request label Sep 19, 2022

Fix for false-positive warning in bitops.c

98cd4da

Warning appeared with O3, on CentOS during inlininig procedure

markovamaria force-pushed the def_options_update branch 2 times, most recently from 5a69986 to 98cd4da Compare September 19, 2022 20:01

filipecosta90 self-requested a review September 26, 2022 21:52

filipecosta90 assigned oranagra and unassigned oranagra Sep 26, 2022

filipecosta90 requested review from oranagra and yossigo September 26, 2022 21:52

filipecosta90 requested a review from guybe7 September 27, 2022 10:40

markovamaria force-pushed the def_options_update branch from 1caca3d to 98cd4da Compare September 27, 2022 14:50

Fixed unitialized streamID within streamTrim() (#1)

0094f9e

oranagra approved these changes Sep 27, 2022

View reviewed changes

filipecosta90 approved these changes Sep 29, 2022

View reviewed changes

oranagra changed the title ~~Optimization update~~ Change compiler optimizations to -O3 -flto Oct 2, 2022

oranagra merged commit 3469c65 into redis:unstable Oct 2, 2022

tezc mentioned this pull request Oct 3, 2022

Pass -flto flag to the linker #11350

Merged

filipecosta90 added action:run-benchmark Triggers the benchmark suite for this Pull Request and removed action:run-benchmark Triggers the benchmark suite for this Pull Request labels Nov 14, 2022

Change compiler optimizations to -O3 -flto #11207

Change compiler optimizations to -O3 -flto #11207

Uh oh!

Conversation

markovamaria commented Aug 29, 2022

Uh oh!

enjoy-binbin commented Aug 30, 2022

Uh oh!

markovamaria commented Aug 30, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

markovamaria commented Aug 30, 2022

Uh oh!

yossigo commented Aug 30, 2022

Uh oh!

oranagra commented Aug 31, 2022

Uh oh!

markovamaria commented Aug 31, 2022

Uh oh!

markovamaria commented Aug 31, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

filipecosta90 commented Sep 26, 2022

Uh oh!

markovamaria commented Sep 26, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

filipecosta90 commented Sep 27, 2022

Uh oh!

guybe7 commented Sep 27, 2022

Uh oh!

filipecosta90 commented Sep 27, 2022

Uh oh!

markovamaria commented Sep 27, 2022

Uh oh!

oranagra commented Sep 27, 2022

Uh oh!

markovamaria commented Sep 30, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

oranagra commented Oct 2, 2022

Uh oh!

enjoy-binbin commented Oct 3, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

markovamaria commented Aug 30, 2022 •

edited

Loading

markovamaria commented Aug 31, 2022 •

edited

Loading

markovamaria commented Sep 26, 2022 •

edited

Loading

markovamaria commented Sep 30, 2022 •

edited

Loading

enjoy-binbin commented Oct 3, 2022 •

edited

Loading