Compiling the run-time system at higher levels of C optimization by xavierleroy · Pull Request #226 · ocaml/ocaml

xavierleroy · 2015-08-04T12:59:09Z

Currently, the run-time system and the C stub code are compiled at a conservative -O (= -O1) optimization level. This pull request investigates the use of higher optimization levels.

Optimization level

High optimization levels are risky, because optimizing C compilers assume that the C source is free of undefined behaviors w.r.t. the ISO C standards, and optimize accordingly. There are two main sources of such undefined behaviors in the OCaml sources:

Overflow in signed integer arithmetic is undefined in ISO C. OCaml (esp. the bytecode interpreter) assumes defined behavior, with wrap around on overflow, like in Java.
Wild pointer casts. For conforming programs, the C compiler can assume that e.g. a double * pointer and an int * pointer never alias. This may not be safe for the OCaml runtime system, because it makes a lot of casts between the value type and various pointer types.

Consequently, I propose to use the -O3 optimization level but tame it by selecting:

-fwrapv to tell the compiler we expect signed integer arithmetic to overflow and to wrap around;
-fno-strict-aliasing to turn off type-based aliasing analysis and tolerate wild pointer casts.

These options are supported by Clang and by GCC (since version 3.4). On other compilers, the configure script selects -O optimization, like before.
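To make the two hazards concrete, here is a minimal sketch in C. The helper names `wrapping_add32` and `double_bits` are hypothetical and do not come from the OCaml sources. They show portable ways of getting the behavior that -fwrapv and -fno-strict-aliasing are meant to guarantee for code written in the more direct style. The sketch assumes a two's-complement target and 64-bit IEEE 754 doubles.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: signed 32-bit addition that wraps around on overflow.
   Writing `a + b` directly on int32_t operands is undefined on overflow in
   ISO C unless the compiler is told otherwise (e.g. with -fwrapv).
   Unsigned arithmetic is always defined modulo 2^32; converting the result
   back to int32_t is implementation-defined, and GCC and Clang wrap. */
int32_t wrapping_add32(int32_t a, int32_t b)
{
    return (int32_t)((uint32_t)a + (uint32_t)b);
}

/* Hypothetical helper: read the bit pattern of a double.
   The direct form `*(uint64_t *)&d` violates the aliasing rules that
   type-based alias analysis relies on; copying the bytes with memcpy is
   well defined with or without -fno-strict-aliasing. */
uint64_t double_bits(double d)
{
    uint64_t bits;
    assert(sizeof bits == sizeof d);  /* assumption: 64-bit doubles */
    memcpy(&bits, &d, sizeof bits);
    return bits;
}
```

With GCC or Clang, `wrapping_add32(INT32_MAX, 1)` yields `INT32_MIN`, and on an IEEE 754 platform `double_bits(1.0)` yields `0x3FF0000000000000`.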

Performance impact

The impact of these optimizations on OCaml performance is significant for the bytecode interpreter, and barely noticeable for natively-compiled programs.

Compiling ocamlrun with gcc -O3 instead of gcc -O speeds up bytecode interpretation by a factor of 1.2 to 2.2 (!) on small bytecoded benchmarks, and by a factor of 1.3 on a larger program (CompCert). This is primarily due to better code being generated for the Next macro of the bytecode interpreter. (gcc -O pessimizes this macro; higher optimization levels do the right thing.) With gcc -O2, performance is almost as good.
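For readers unfamiliar with this kind of macro, the sketch below shows one common way an interpreter's dispatch step is written: threaded code using the GCC/Clang "labels as values" extension. The three-opcode instruction set, the function `run`, and the macro `NEXT` are all hypothetical, and this is not the runtime's actual code. It only illustrates why the quality of the code generated for the dispatch step matters, since that step is executed once per instruction.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical three-instruction bytecode, for illustration only. */
enum { OP_PUSH, OP_ADD, OP_HALT };

/* Runs a program and returns the value on top of the stack at OP_HALT. */
long run(const long *code)
{
    /* Jump table indexed by opcode (GCC/Clang labels-as-values extension). */
    static void *const dispatch[] = { &&do_push, &&do_add, &&do_halt };
    long stack[16];
    size_t sp = 0;
    const long *pc = code;

/* Fetch the next opcode and jump straight to its handler. */
#define NEXT goto *dispatch[*pc++]

    NEXT;
do_push:
    assert(sp < 16);
    stack[sp++] = *pc++;        /* the operand follows the opcode */
    NEXT;
do_add:
    assert(sp >= 2);
    sp--;
    stack[sp - 1] += stack[sp]; /* pop two values, push their sum */
    NEXT;
do_halt:
    assert(sp >= 1);
    return stack[sp - 1];
#undef NEXT
}
```

Running the program `{ OP_PUSH, 2, OP_PUSH, 40, OP_ADD, OP_HALT }` returns 42.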

The same small set of benchmarks compiled in native code show no significant speedup or slowdown when the runtime system is compiled with gcc -O2, and a barely significant 1.03 speedup when the runtime system is compiled with gcc -O3.

My small set of benchmarks doesn't exercise much of the runtime system. Other measurements are needed.

Other proposed changes

These changes are unrelated to optimization but affect the configure script as well.

Fail if GCC version 1.x or 2.x is used. We used to detect the 2.x versions that are known to cause problems. 1.x and 2.x are too old anyway.
Put GCC in gnu99 (= ISO C99 + GNU extensions) mode instead of the default gnu89 (= ANSI C + GNU extensions) mode. I want to encourage the use of C99. Besides, clang is gnu99 or even gnu11 by default, so GCC/Clang portability is increased. As to why GNU extensions and not strict ISO C mode, some useful libc functions are not declared in standard includes in strict ISO C mode.
For GCC and Clang, add -Werror (treat warnings as errors). The warnings emitted by GCC and Clang in -Wall mode are almost always indicative of a C programming error. I'm tired of seeing these warnings ignored by OCaml developers who touch the runtime system.
In accordance with Doligez's law of warnings, -Werror is selected only for development versions of OCaml (with +dev in the version string), but is turned off for released versions. We must fix warnings during OCaml development, but released versions may be compiled in nonstandard C environments that we never tried, so they should have a chance to compile even with some warnings.

git-svn-id: http://caml.inria.fr/svn/ocaml/branches/cc-optim@16328 f963ae5c-01c2-4b8c-9fe0-0dff7051ff02

…e optimizations prudently turned off. Auxiliary changes: - Put GCC in gnu99 mode (= C99 + GNU extensions). - Check C99 conformance, warn if not. - Reject if gcc is too old ( < 3.0 ) - Stop C compilation on warnings if this is a development version of OCaml. (I'm tired of C warnings being ignored.) git-svn-id: http://caml.inria.fr/svn/ocaml/branches/cc-optim@16329 f963ae5c-01c2-4b8c-9fe0-0dff7051ff02

The intent is to produce fewer warnings when configuring with -verbose. Note that the warning on "implicit declaration of function" remains, for relatively good reasons. git-svn-id: http://caml.inria.fr/svn/ocaml/branches/cc-optim@16330 f963ae5c-01c2-4b8c-9fe0-0dff7051ff02

mshinwell · 2015-08-04T13:18:51Z

I am in favour. Previously, when I looked at this, I also came to the conclusion that -fwrapv -fno-strict-aliasing seemed to be the correct set of options to use.

I seem to recall that my experiments didn't show much of an increase in performance for native code. We have some better benchmarking infrastructure at Jane Street these days though, and we can try the tests again.

git-svn-id: http://caml.inria.fr/svn/ocaml/branches/cc-optim@16331 f963ae5c-01c2-4b8c-9fe0-0dff7051ff02

adrien-n · 2015-08-05T07:03:47Z

I like this too.

Note that there's another ticket somewhere about using SipHash in the stdlib. It performs similarly to the current hash function if the code is compiled with one specific optimization. In other words, there's a use for that in native code too, but currently it's waiting for higher optimization levels to be used to build the runtime.

I'd prefer to see more warnings enabled rather than have -Werror. For instance, -Wextra and -Wwrite-strings are very interesting (and another dozen which I don't have off-hand but should be debated in any case). However, I fear that adding -Werror now will hinder enabling additional warnings: no one will want 40 new warnings and -Werror both at once. -Werror could be added a couple of months after the warnings, for instance.

An alternative could be -Werror=Wall if that works for "meta" warning options (i.e. that it doesn't require specific warnings to be effective).

Should we have some kind of benchmark in the source tree for that? Optimizations in GCC and LLVM are very variable wrt. versions or C code and it would be good to be able to track this easily.

I also fear that js_of_ocaml people will complain because the bytecode interpreter is getting faster compared to jsoo's output in web browser. :P (just kidding obviously)

Your benchmarks show a nice gain for the interpreter, but that doesn't mean the same gains exist for C stubs. Unfortunately, this change will trickle down to other people's bindings, and I'm worried about the number of possible failures there (sometimes for code that hasn't been actively maintained for years). Do you think it would make sense to have something less aggressive like -O2 (and warnings) for stubs instead?

edit: GCC is c11 or gnu11 now too; surprisingly it seems to not break many things (the switch to C++11 however...)

bobzhang · 2015-08-06T00:08:06Z

How will it affect the compile time of building a compiler? Can we use -O when in dev mode?

mshinwell · 2015-08-06T06:24:58Z

Xavier: (prompted by what adrien-n writes) if I remember correctly, it is possible to pass C files to the OCaml compilers, which will then invoke the C compiler. Does your change affect the optimization flags that will be used in such a situation? I'm not sure whether we should do that. Although one might argue that if these flags are safe for the runtime, they are very likely safe for Joe Bloggs's C stubs.

adrien-n · 2015-08-06T07:08:18Z

Mark: I don't believe the optimization passes of GCC are buggy, I believe Joe Bloggs's C stubs are. :P

As for the compile time, -O3 can be quite slow but this mostly happens with large(r) compilation units. If you want to save time on that front, you should try ccache: it's a small (5% or so) overhead with an empty cache and then it runs 50 to 100 times faster than an actual compile. In other words you won't spend more time on the initial build but then it will be much faster.

xavierleroy · 2015-08-06T07:22:20Z

Concerning -Werror, I agree the ideal situation would be to stop on warnings of the "default" or "-Wall" categories, and keep other warnings as warnings. I don't know how to do that in GCC or Clang, however. -Werror=all doesn't seem to do anything, only -Werror=xxx where xxx is a specific warning name. If you know the correct incantation, let me know.

At any rate, I am adamant that no C code that triggers a -Wall warning should enter the Caml sources, and I've seen too many counterexamples recently. So, I prefer a catch-all -Werror to nothing. Remember that this is just for the working sources: releases will not activate -Werror.

xavierleroy · 2015-08-06T07:30:53Z

Concerning C stubs in third-party libraries, there are two ways to build them:

gcc -I`ocamlc -where` <my options> -c mystub.c

or

ocamlc -c mystub.c

In the first case, the third party is in control of flags. In the second case, the C compiler flags used to compile OCaml (technically, $(BYTECCCOMPFLAGS)) are automatically applied. With my proposal, this would include gnu99 mode, higher optimization, and -Werror if a development version of OCaml is used.

Higher optimization does expose bugs in C stub code. Indeed, I found one this way in the Unix library: 0178ea4. It's not a bad thing: latent bugs are bugs; the earlier they are exposed, the earlier they are fixed.

The one thing I'm uneasy about is to force -Werror on users of method #2 if a development version of OCaml is used. This could hinder large-scale testing of dev versions via OPAM. I'll look into removing -Werror from the flags used by ocamlc -c for C sources.

xavierleroy · 2015-08-06T07:33:07Z

Concerning warnings beyond -Wall, e.g. -Wextra, I'm open to suggestions and results of experimentation. My experience with using -Wextra in Zarith is that it's a bit "noisy". In particular, the warning "comparison between signed and unsigned integers" will trigger often in the OCaml runtime sources, for code that is correct (because the signed integer is positive and the unsigned integer is a word count, with top bit = 0).
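The following sketch illustrates the kind of comparison this warning is about. The function names are hypothetical and the code is not taken from the runtime. It assumes GCC or Clang, where -Wsign-compare is part of -Wextra when compiling C.

```c
#include <assert.h>
#include <stddef.h>

/* Mixed comparison: `i` is converted to an unsigned type before comparing,
   so a negative `i` would become a huge value. This is the pattern that
   -Wsign-compare reports, even when the caller never passes a negative
   index. */
int naive_less(long i, size_t n)
{
    return i < n;
}

/* Same comparison with the invariant made explicit: the caller guarantees
   that `i` is non-negative, so the cast cannot change its value and the
   warning goes away. */
int index_less(long i, size_t n)
{
    assert(i >= 0);
    return (size_t)i < n;
}
```

For non-negative indices the two functions agree. They differ only when the invariant is violated: `naive_less(-1, 1)` returns 0, because -1 converts to the largest `size_t` value.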

xavierleroy · 2015-08-06T07:36:10Z

Concerning the time it takes to compile the OCaml sources, I noticed no significant slowdown by using -O3. This said, OCaml sources compile quite quickly with a modern machine and parallel make, so this is absolutely not a concern for me.

As to using -O for development and -O3 for releases, this is a recipe for disaster. Higher optimizations will expose bugs in the C code that must be seen during development.

xavierleroy · 2015-08-06T07:42:18Z

Concerning -O2 versus -O3, I'm on the fence. For the bytecode interpreter, -O2 gives impressive speedups already, -O3 adds little. For the ocamlopt runtime system, -O2 has no effect and -O3 has a tiny effect, barely significant.

-O2 is aggressive enough already to break buggy C code that happens to work with -O.

Among the optimizations that -O3 adds, some (e.g. auto-vectorization) may not be beneficial for the runtime system. Typically, loops are short (iterating over the fields of a memory block), and I fear that XMM vectorization could make them slower because of startup and teardown overheads.

xavierleroy · 2015-08-06T07:45:23Z

Concerning benchmarks, some of the tests in testsuite/ are actually suitable as micro-benchmarks. But what we want is macro-benchmarks obtained on real applications. Since real applications have dependencies, I think it's a job for OPAM: just like some packages have test suites, it should be possible for a package to expose a "make benchmark" script that OPAM could call.

… compiling a C source file. The risk of breakage of 3rd-party libraries is too high. There might be cleaner ways to achieve this effect, e.g. split BYTECCCOMPOPTS into BYTECCCOMPOPTS and BYTECCEXTRAWARNINGS. git-svn-id: http://caml.inria.fr/svn/ocaml/branches/cc-optim@16337 f963ae5c-01c2-4b8c-9fe0-0dff7051ff02

bschommer · 2015-08-06T09:03:12Z

I don't know if this is the right place to raise a warning, but using -O3 is a bad idea, since it is known to break various packages (that's the reason why, for example, Gentoo recommends using -O2), so using -O3 for the stubs might break various third-party stubs. Also, a lot of people are very reluctant to fix bugs reported by people who compile with -O3. And last but not least, even if the newer GCC versions do not expose that many optimization bugs, there have been a lot of them in older GCC versions which are still in use.

xavierleroy · 2015-08-06T09:32:23Z

Yes, it is exactly the place. Thanks for the info. I'm fine with being prudent and using -O2, at least at first. Will update the PR accordingly.

git-svn-id: http://caml.inria.fr/svn/ocaml/branches/cc-optim@16338 f963ae5c-01c2-4b8c-9fe0-0dff7051ff02

avsm · 2015-08-06T11:08:14Z

I had a look at the difference in -O2 and -O3 in clang, and it looks like there's just one significant optimisation pass that's added in 3.5. More info here: http://stackoverflow.com/questions/15548023/clang-optimization-levels

The new pass in -O3 is "argpromotion", which looks like an alias analysis to turn pointer arguments into values, to eliminate unnecessary alloca. It doesn't seem to be a significant win in native code, but I didn't test the bytecode interpreter.

The other worry with -O3 is non-x86 architectures, where GCC's bugginess rises dramatically. -O2 has been reasonably safe in my experience on most OpenBSD arches (where the system compiler is still gcc 4.2.1, and large parts of the base system are compiled at this level of optimisation).

xavierleroy · 2015-08-06T16:56:37Z

Sobering news about gcc -O3, and reassuring news about gcc -O2.

If we decide to go the way of this PR, we should merge it on trunk early, well ahead of the next release, so that it gets widely tested.

git-svn-id: http://caml.inria.fr/svn/ocaml/branches/cc-optim@16375 f963ae5c-01c2-4b8c-9fe0-0dff7051ff02

configure: uninitialized variable $nativeccprofopts git-svn-id: http://caml.inria.fr/svn/ocaml/branches/cc-optim@16377 f963ae5c-01c2-4b8c-9fe0-0dff7051ff02

xavierleroy · 2015-08-25T14:50:07Z

I went ahead and merged in trunk, SVN commit 16379, so that it can get extensive CI testing. In case of problems, there are 2 lines to change in configure to go back to the old behavior.

oandrieu · 2015-09-21T07:45:54Z

Commenting a bit late on this (just learned about it on caml-list).

Are you aware that -fwrapv has problems on some early gcc-4 versions?
http://thiemonagel.de/2010/01/signed-integer-overflow/
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=28230

Another option exists, -fno-strict-overflow, which does more or less the same thing and might be better.
http://www.airs.com/blog/archives/120

xavierleroy · 2015-10-11T11:00:53Z

Thanks for the info about -fwrapv. After chasing the links you give, it seems to me that -fno-strict-overflow was introduced (in gcc 4.2) at the same time as the -fwrapv problems were fixed... So, I don't see the benefit of using -fno-strict-overflow instead of -fwrapv if gcc's version is >= 4.2.

More generally, OCaml's runtime system, especially the bytecode interpreter, requires two's-complement wrap-around on overflow to function properly. So it makes sense to inform GCC of this fact by passing -fwrapv.
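As an illustration of why wrap-around matters, the sketch below models arithmetic on tagged integers, assuming a 64-bit platform where an OCaml integer n is 63 bits wide and is represented by the machine word 2n+1. The names `tag_int`, `untag_int`, `tagged_add`, `MAX_INT63` and `MIN_INT63` are hypothetical, and this is not the runtime's code. The sketch goes through unsigned arithmetic so that it is well defined even without -fwrapv; it relies on GCC and Clang's implementation-defined behavior for converting to a signed type and for right-shifting negative values.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_INT63 INT64_C(4611686018427387903)  /* 2^62 - 1 */
#define MIN_INT63 (-MAX_INT63 - 1)              /* -2^62 */

/* Encode n as the tagged word 2n+1. */
int64_t tag_int(int64_t n)
{
    assert(n >= MIN_INT63 && n <= MAX_INT63);
    return (int64_t)(((uint64_t)n << 1) | 1u);
}

/* Decode a tagged word (arithmetic right shift on GCC and Clang). */
int64_t untag_int(int64_t v)
{
    return v >> 1;
}

/* Add two tagged words: (2x+1) + (2y+1) - 1 = 2(x+y) + 1.
   The sum is computed modulo 2^64, i.e. with wrap-around. */
int64_t tagged_add(int64_t a, int64_t b)
{
    return (int64_t)((uint64_t)a + (uint64_t)b - 1u);
}
```

Under these assumptions, adding the tagged forms of `MAX_INT63` and 1 yields the tagged form of `MIN_INT63`, which matches OCaml's modular integer semantics. The same sum written directly on signed operands would be undefined in ISO C.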

The remaining question is: should we reduce optimization level (to -O1 instead of -O2) if GCC is older than 4.2? That could make sense.

gasche · 2017-01-07T14:17:29Z

MPR#7452 (Compiler segfault on large generated source code) seems related to runtime optimization levels: the segfault cannot be reproduced with an O1-compiled runtime.

xavierleroy · 2017-01-08T09:09:51Z

Evidence for MPR#7452 points to a hardware issue.


xavierleroy added 3 commits August 4, 2015 12:08

Experiment: configure gcc and clang with higher optimization levels.

2cb6ed2

git-svn-id: http://caml.inria.fr/svn/ocaml/branches/cc-optim@16328 f963ae5c-01c2-4b8c-9fe0-0dff7051ff02

Remove autoconf tests that are no longer used.

3b9aaea

git-svn-id: http://caml.inria.fr/svn/ocaml/branches/cc-optim@16331 f963ae5c-01c2-4b8c-9fe0-0dff7051ff02

dbuenzli mentioned this pull request Aug 6, 2015

Add {build,run}-benchmark and a benchmark filter ? ocaml/opam#2278

Closed

Be prudent, select -O2 optimization instead of -O3.

bdf3fe8

git-svn-id: http://caml.inria.fr/svn/ocaml/branches/cc-optim@16338 f963ae5c-01c2-4b8c-9fe0-0dff7051ff02

xavierleroy added 2 commits August 25, 2015 13:40

Update wrt trunk (r16374).

4ee7472

git-svn-id: http://caml.inria.fr/svn/ocaml/branches/cc-optim@16375 f963ae5c-01c2-4b8c-9fe0-0dff7051ff02

asmrun/Makefile: Remove hard-wired '-O' options

ee9d50e

configure: uninitialized variable $nativeccprofopts git-svn-id: http://caml.inria.fr/svn/ocaml/branches/cc-optim@16377 f963ae5c-01c2-4b8c-9fe0-0dff7051ff02

bactrian merged commit ee9d50e into trunk Aug 25, 2015

xavierleroy deleted the cc-optim branch October 27, 2015 13:38

gasche mentioned this pull request Dec 6, 2015

Report from the Downstream trenches #333

Closed

gasche mentioned this pull request Nov 15, 2016

Clarify the use of C compiler related variables in the build system. #911

Merged

mshinwell mentioned this pull request Mar 19, 2019

Position [Lprologue] correctly #2292

Merged


dra27 mentioned this pull request Oct 9, 2020

Update HACKING.adoc #9966

Merged

