Skip to content

Conversation

@markovamaria
Copy link
Contributor

Optimization update from -O2 to -O3 -flto gives up to 5% performance gain in 'redis-benchmarks-spec-client-runner' tests geomean where GCC 9.4.0 is used for build

Optimization update from -O2 to -O3 -flto gives up to 5% performance gain in 'redis-benchmarks-spec-client-runner' tests geomean where GCC 9.4.0 is used for build
@enjoy-binbin
Copy link
Contributor

thanks, where do you get the performance gain result, i can recall this one: #9601 (comment)

@markovamaria
Copy link
Contributor Author

markovamaria commented Aug 30, 2022

thanks, where do you get the performance gain result, i can recall this one: #9601 (comment)

Hi @enjoy-binbin ,
I've measured data on intel-redis joint lab (bare-metal / biredis, Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz )

Comparison was performed for Jun'29 commit built with GCC 9.4.0, O3 flto options (through REDIS_*FLAGS) vs. the same redis bin built with GCC 9.4.0 with default options.

Measurement showed:
+5,13% in geomean for tests form 'redis-benchmarks-spec-client-runner'
21 test gives more that 5% gain, from them - 4 tests gives more 10% gain
2 tests showed about -1% to -2% in comparison with default GCC. (memtier_benchmark-1key-set-10-elements-smembers,
memtier_benchmark-1Mkeys-load-stream-1-fields-with-100B-values)

@markovamaria
Copy link
Contributor Author

Adding @filipecosta90 to review

@yossigo
Copy link
Collaborator

yossigo commented Aug 30, 2022

@markovamaria @filipecosta90 Do you know which specific optimization options contribute to this improvement? Given the broad spectrum of architectures/compilers/compiler versions used, I assume -O3 may also have some negative impact, whereas -O2 with additional, specific optimizations could reduce the chance of that.

@oranagra
Copy link
Member

@markovamaria thanks. please post more details on the benchmarks you conducted and their results.
IIRC there are some regressions too.

I'm a little bit paranoid about bugs resulting from eager optimizations, maybe combined with some places where redis assumes things that the compiler concludes that are invalid. something like this.

p.s. i see a compilation error on centos, can you please look into it?

@markovamaria
Copy link
Contributor Author

@markovamaria @filipecosta90 Do you know which specific optimization options contribute to this improvement? Given the broad spectrum of architectures/compilers/compiler versions used, I assume -O3 may also have some negative impact, whereas -O2 with additional, specific optimizations could reduce the chance of that.

-O3
Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the following optimization flags:

-fgcse-after-reload 
-fipa-cp-clone
-floop-interchange 
-floop-unroll-and-jam 
-fpeel-loops 
-fpredictive-commoning 
-fsplit-loops 
-fsplit-paths 
-ftree-loop-distribution 
-ftree-partial-pre 
-funswitch-loops 
-fvect-cost-model=dynamic 
-fversion-loops-for-strides

Taken from : https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

I didn't analyzed these options separately, so I can't answer which is then really helped.
Confirm that biggest jump happened between O2 and O3 moving. Flto with combination with O3 gives less that 1% extra gain.

@markovamaria
Copy link
Contributor Author

markovamaria commented Aug 31, 2022

@markovamaria thanks. please post more details on the benchmarks you conducted and their results. IIRC there are some regressions too.

Please find .xlsx with details attached. File contains raw data collected on bare metal from joint lab (biredis03), test by test comparison and summary conclusions for GCC 9.4.0 + O3 flto vs. GCC 9.4.0 with details. Data includes 2 variations (w/ and w/o dependencies - *FLAGS or REDIS_*FLAGS).
W/ dependencies behavior is suggested in PR through optimization flag.

for_PR11207.xlsx

p.s. i see a compilation error on centos, can you please look into it?

Sure, I'll check

@filipecosta90 filipecosta90 added the action:run-benchmark Triggers the benchmark suite for this Pull Request label Sep 19, 2022
Warning appeared with O3, on CentOS during inlininig procedure
@markovamaria markovamaria force-pushed the def_options_update branch 2 times, most recently from 5a69986 to 98cd4da Compare September 19, 2022 20:01
@filipecosta90
Copy link
Contributor

@markovamaria please check PR markovamaria#1 to your branch. It should fix the failing CI and we can then retake this effort forward.

@markovamaria
Copy link
Contributor Author

markovamaria commented Sep 26, 2022

@markovamaria please check PR markovamaria#1 to your branch. It should fix the failing CI and we can then retake this effort forward.

Hi @filipecosta90, thanks for your suggestion. I've considered such change, but decided that it can hide potential issue because of function "lpGetEdgeStreamID" includes scenario with 'returns 0' and then last_id is not defined. In positive cases function returns 1 and last_id is defined.
I suggest fix with processing of return code for "lpGetEdgeStreamID". Please review if this approach appropriate or not from code's logic point of view.

@filipecosta90
Copy link
Contributor

WRT:

I suggest fix with processing of return code for "lpGetEdgeStreamID". Please review if this approach appropriate or not from code's logic point of view.

pinging @guybe7 if he believes that we should cover that scenario or if the lp will never be null and we can just move along with the test fix. It's also worth noting that on lpGetEdgeStreamID half of the code is never used on any cases ( probably worth cleaning in a different PR @guybe7 ? )

@guybe7
Copy link
Collaborator

guybe7 commented Sep 27, 2022

@filipecosta90 lpGetEdgeStreamID will return 0 if lp is NULL (which is impossible in that case, otherwise it would have crashed in lpFirst(lp)) or if lp is empty (which should also be impossible, any empty listpacks should be trimmed from the rax)
i think an assertion is in order instead of an if

re the cleanup: yes, i'm aware there's some dead code there, it's there for symmetry (we can comment it out but i don't think it disturbs anyone)

@filipecosta90
Copy link
Contributor

@filipecosta90 lpGetEdgeStreamID will return 0 if lp is NULL (which is impossible in that case, otherwise it would have crashed in lpFirst(lp)) or if lp is empty (which should also be impossible, any empty listpacks should be trimmed from the rax) i think an assertion is in order instead of an if

re the cleanup: yes, i'm aware there's some dead code there, it's there for symmetry (we can comment it out but i don't think it disturbs anyone)

@markovamaria given the above I suggest we move forward with PR markovamaria#1 . agree?

@markovamaria
Copy link
Contributor Author

@filipecosta90 lpGetEdgeStreamID will return 0 if lp is NULL (which is impossible in that case, otherwise it would have crashed in lpFirst(lp)) or if lp is empty (which should also be impossible, any empty listpacks should be trimmed from the rax) i think an assertion is in order instead of an if
re the cleanup: yes, i'm aware there's some dead code there, it's there for symmetry (we can comment it out but i don't think it disturbs anyone)

@markovamaria given the above I suggest we move forward with PR markovamaria#1 . agree?

done

@oranagra
Copy link
Member

Please find .xlsx with details attached. File contains raw data collected on bare metal from joint lab (biredis03), test by test comparison and summary conclusions for GCC 9.4.0 + O3 flto vs. GCC 9.4.0 with details. Data includes 2 variations (w/ and w/o dependencies - *FLAGS or REDIS_*FLAGS). W/ dependencies behavior is suggested in PR through optimization flag.

for_PR11207.xlsx

So if I understand correctly, when building deps with the more aggressive optimizations, we had slightly better results, but also some regressions, we it was eventually decided to apply these just on redis' src dir?

@markovamaria
Copy link
Contributor Author

markovamaria commented Sep 30, 2022

Please find .xlsx with details attached. File contains raw data collected on bare metal from joint lab (biredis03), test by test comparison and summary conclusions for GCC 9.4.0 + O3 flto vs. GCC 9.4.0 with details. Data includes 2 variations (w/ and w/o dependencies - *FLAGS or REDIS_*FLAGS). W/ dependencies behavior is suggested in PR through optimization flag.
for_PR11207.xlsx

So if I understand correctly, when building deps with the more aggressive optimizations, we had slightly better results, but also some regressions, we it was eventually decided to apply these just on redis' src dir?

something like this. Result are similar. There is choice.
I suggest start with easiest 'src'.

@oranagra
Copy link
Member

oranagra commented Oct 2, 2022

@yossigo i guess now is a good time to merge this? anything you wanna check before we proceed?

@oranagra oranagra changed the title Optimization update Change compiler optimizations to -O3 -flto Oct 2, 2022
@oranagra oranagra merged commit 3469c65 into redis:unstable Oct 2, 2022
@enjoy-binbin
Copy link
Contributor

enjoy-binbin commented Oct 3, 2022

look like there is a build issue in test-sanitizer-address (clang): https://github.com/redis/redis/actions/runs/3170623367/jobs/5163322525

adlist.o: file not recognized: file format not recognized
clang: error: linker command failed with exit code 1 (use -v to see invocation)

in my local env, by doing this, can make it work (but I'm not familiar with this part, so just raise it, it also passed in the GH daily)

ifdef SANITIZER
ifeq ($(SANITIZER),address)
	MALLOC=libc
	CFLAGS+=-fsanitize=address -fno-sanitize-recover=all -fno-omit-frame-pointer
-       LDFLAGS+=-fsanitize=address
+       LDFLAGS+=-fsanitize=address -flto -fuse-ld=gold
else
ifeq ($(SANITIZER),undefined)
	MALLOC=libc
	CFLAGS+=-fsanitize=undefined -fno-sanitize-recover=all -fno-omit-frame-pointer
-       LDFLAGS+=-fsanitize=address
+       LDFLAGS+=-fsanitize=address -flto -fuse-ld=gold
else
ifeq ($(SANITIZER),thread)
	CFLAGS+=-fsanitize=thread -fno-sanitize-recover=all -fno-omit-frame-pointer
-       LDFLAGS+=-fsanitize=address
+       LDFLAGS+=-fsanitize=address -flto -fuse-ld=gold
else
    $(error "unknown sanitizer=${SANITIZER}")
endif
endif
endif
endif

@filipecosta90 filipecosta90 added action:run-benchmark Triggers the benchmark suite for this Pull Request and removed action:run-benchmark Triggers the benchmark suite for this Pull Request labels Nov 14, 2022
madolson pushed a commit to madolson/redis that referenced this pull request Apr 19, 2023
Optimization update from -O2 to -O3 -flto gives up to 5% performance gain
in 'redis-benchmarks-spec-client-runner' tests geomean where GCC 9.4.0 is used for build

* Fix for false-positive warning in bitops.c
   Warning appeared with O3, on CentOS during inlininig procedure
* Fixed unitialized streamID within streamTrim() (#1)

Co-authored-by: filipe oliveira <filipecosta.90@gmail.com>
enjoy-binbin pushed a commit to enjoy-binbin/redis that referenced this pull request Jul 31, 2023
Optimization update from -O2 to -O3 -flto gives up to 5% performance gain
in 'redis-benchmarks-spec-client-runner' tests geomean where GCC 9.4.0 is used for build

* Fix for false-positive warning in bitops.c
   Warning appeared with O3, on CentOS during inlininig procedure
* Fixed unitialized streamID within streamTrim() (#1)

Co-authored-by: filipe oliveira <filipecosta.90@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

action:run-benchmark Triggers the benchmark suite for this Pull Request

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

6 participants