Compiling the run-time system at higher levels of C optimization #226

Merged: bactrian merged 8 commits into trunk from cc-optim on Aug 25, 2015

@xavierleroy
Contributor

Currently, the run-time system and the C stub code are compiled at a conservative -O (= -O1) optimization level. This pull request investigates the use of higher optimization levels.

Optimization level

High optimization levels are risky, because optimizing C compilers assume that the C source is free of undefined behaviors w.r.t. the ISO C standards, and optimize accordingly. There are two main sources of such undefined behaviors in the OCaml sources:

  • Overflow in signed integer arithmetic is undefined in ISO C. OCaml (esp. the bytecode interpreter) assumes defined behavior, with wrap around on overflow, like in Java.
  • Wild pointer casts. For conforming programs, the C compiler can assume that e.g. a double * pointer and an int * pointer never alias. This may not be safe for the OCaml runtime system, because it makes a lot of casts between the value type and various pointer types.
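To illustrate the aliasing point, here is a minimal sketch. It is not code from the OCaml runtime: the function name `double_bits` is hypothetical, and a 64-bit IEEE 754 `double` is assumed. Reading a `double` through a `uint64_t *` cast is the kind of access that type-based alias analysis is allowed to miscompile; copying the bytes with `memcpy` expresses the same intent in a way that is well-defined in ISO C.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not taken from the OCaml sources.

   The cast-based version would be written as
       return *(uint64_t *) d;
   which reads a double object through an lvalue of an unrelated type.
   Under strict aliasing the compiler may assume this never happens.

   The memcpy version below copies the object representation instead,
   which is well-defined at any optimization level. */
uint64_t double_bits(const double *d)
{
  uint64_t u;
  assert(sizeof u == sizeof *d);   /* assumption: 64-bit double */
  memcpy(&u, d, sizeof u);
  return u;
}
```

With GCC and Clang the `memcpy` call on a fixed, small size is typically compiled to a plain register move, so the well-defined form need not cost anything.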

Consequently, I propose to use -O3 optimization level but tame it by selecting

  • -fwrapv to tell the compiler we expect signed integer arithmetic to overflow and to wrap around;
  • -fno-strict-aliasing to turn off type-based aliasing analysis and tolerate wild pointer casts.

These options are supported by Clang and by GCC (since version 3.4). On other compilers, the configure script selects -O optimization, like before.
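As a rough sketch of what wrap-around semantics mean in practice (the helper names are hypothetical and this is not code from the runtime): unsigned arithmetic is defined to wrap in ISO C, so a wrapping signed addition can be expressed by going through `unsigned int`. The conversion back to `int` is implementation-defined rather than undefined; GCC and Clang reduce it modulo 2^N. With `-fwrapv`, a plain signed `a + b` is compiled with the same wrapping behaviour.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: addition that wraps around on overflow.
   The sum is computed on unsigned values, where wrap-around is
   defined by ISO C, then converted back to int. */
int wrapping_add(int a, int b)
{
  return (int) ((unsigned int) a + (unsigned int) b);
}

/* An overflow test that relies on wrap-around.  Written directly as
   `x + 1 < x`, a compiler not given -fwrapv may assume that signed
   overflow never happens and fold the test to 0. */
int increment_overflows(int x)
{
  return wrapping_add(x, 1) < x;
}

/* Usage examples (assuming GCC or Clang conversion behaviour). */
void wrapping_add_examples(void)
{
  assert(wrapping_add(INT_MAX, 1) == INT_MIN);
  assert(increment_overflows(INT_MAX));
  assert(!increment_overflows(0));
}
```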

Performance impact

The impact of these optimizations on OCaml performance is significant for the bytecode interpreter, and barely noticeable for natively-compiled programs.

Compiling ocamlrun with gcc -O3 instead of gcc -O speeds up bytecode interpretation by a factor of 1.2 to 2.2 (!) on small bytecoded benchmarks, and by a factor of 1.3 on a larger program (CompCert). This is primarily due to better code being generated for the Next macro of the bytecode interpreter. (gcc -O pessimizes this macro; higher optimization levels do the right thing.) With gcc -O2, performance is almost as good.
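For readers unfamiliar with this style of interpreter, here is a toy sketch of a `Next`-style dispatch macro. It assumes the threaded-code technique, where each instruction ends with its own indirect jump through a table of label addresses (a GNU extension available in GCC and Clang), with a plain `switch` as fallback. The opcodes, the `run` function and the stack layout are hypothetical and much simpler than the real interpreter; the bytecode is assumed to be well-formed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a toy stack machine; not OCaml's instruction set. */
enum { PUSHCONST, ADD, STOP };

#define STACK_SIZE 16

long run(const long *pc)
{
  long stack[STACK_SIZE];
  size_t sp = 0;

#if defined(__GNUC__)
  /* Threaded code: every instruction ends with its own indirect jump. */
  static void *const jumptable[] = { &&lbl_PUSHCONST, &&lbl_ADD, &&lbl_STOP };
# define Instruct(name) lbl_##name
# define Next goto *jumptable[*pc++]
  Next;
#else
  /* Portable fallback: one central switch inside a loop. */
# define Instruct(name) case name
# define Next break
  for (;;) switch (*pc++)
#endif
  {
    Instruct(PUSHCONST):
      assert(sp < STACK_SIZE);
      stack[sp++] = *pc++;          /* operand follows the opcode */
      Next;
    Instruct(ADD):
      assert(sp >= 2);
      stack[sp - 2] += stack[sp - 1];
      sp--;
      Next;
    Instruct(STOP):
      assert(sp >= 1);
      return stack[sp - 1];
  }
#undef Instruct
#undef Next
}
```

For example, running the program `{ PUSHCONST, 2, PUSHCONST, 40, ADD, STOP }` pushes two constants, adds them and returns 42. How well the compiler handles the repeated indirect jumps is exactly the kind of code-generation detail that varies with the optimization level.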

The same small set of benchmarks compiled in native code show no significant speedup or slowdown when the runtime system is compiled with gcc -O2, and a barely significant 1.03 speedup when the runtime system is compiled with gcc -O3.

My small set of benchmarks doesn't exercise much of the runtime system. Other measurements are needed.

Other proposed changes

These changes are unrelated to optimization but affect the configure script as well.

  • Fail if GCC version 1.x or 2.x is used. We used to detect the 2.x versions that are known to cause problems. 1.x and 2.x are too old anyway.
  • Put GCC in gnu99 (= ISO C99 + GNU extensions) mode instead of the default gnu89 (= ANSI C + GNU extensions) mode. I want to encourage the use of C99. Besides, clang is gnu99 or even gnu11 by default, so GCC/Clang portability is increased. As to why GNU extensions and not strict ISO C mode, some useful libc functions are not declared in standard includes in strict ISO C mode.
  • For GCC and Clang, add -Werror (treat warnings as errors). The warnings emitted by GCC and Clang in -Wall mode are almost always indicative of a C programming error. I'm tired of seeing these warnings ignored by OCaml developers who touch the runtime system.
  • In accordance with Doligez's law of warnings, -Werror is selected only for development versions of OCaml (with +dev in the version string), but is turned off for released versions. We must fix warnings during OCaml development, but released versions may be compiled in nonstandard C environments that we never tried, so they should have a chance to compile even with some warnings.

…e optimizations prudently turned off.

Auxiliary changes:
- Put GCC in gnu99 mode (= C99 + GNU extensions).
- Check C99 conformance, warn if not.
- Reject if gcc is too old ( < 3.0 )
- Stop C compilation on warnings if this is a development version of OCaml.
  (I'm tired of C warnings being ignored.)


git-svn-id: http://caml.inria.fr/svn/ocaml/branches/cc-optim@16329 f963ae5c-01c2-4b8c-9fe0-0dff7051ff02
The intent is to produce fewer warnings when configuring with -verbose.
Note that the warning on "implicit declaration of function" remains,
for relatively good reasons.


git-svn-id: http://caml.inria.fr/svn/ocaml/branches/cc-optim@16330 f963ae5c-01c2-4b8c-9fe0-0dff7051ff02
@mshinwell
Contributor

I am in favour. Previously, when I looked at this, I also came to the conclusion that -fwrapv -fno-strict-aliasing seemed to be the correct set of options to use.

I seem to recall that my experiments didn't show much of an increase in performance for native code. We have some better benchmarking infrastructure at Jane Street these days though, and we can try the tests again.

@adrien-n
Contributor

adrien-n commented Aug 5, 2015

I like this too.

Note that there's another ticket somewhere about using SipHash in the stdlib. It has similar performance characteristics compared to the current hash function if the code is compiled with one specific optimization. In other words, there's a use for that in native code too, but currently it's waiting for higher optimization levels to be used to build the runtime.

I'd prefer to see more warnings enabled rather than have -Werror. For instance, -Wextra and -Wwrite-strings are very interesting (and another dozen which I don't have off-hand but which should be debated in any case). However, I fear that adding -Werror now will hinder enabling additional warnings: no one will want 40 new warnings and -Werror both at once. -Werror could be added a couple of months after the warnings, for instance.

An alternative could be -Werror=Wall if that works for "meta" warning options (i.e. that it doesn't require specific warnings to be effective).

Should we have some kind of benchmark in the source tree for that? Optimizations in GCC and LLVM are very variable wrt. versions or C code and it would be good to be able to track this easily.

I also fear that js_of_ocaml people will complain because the bytecode interpreter is getting faster compared to jsoo's output in web browser. :P (just kidding obviously)

Your benchmarks show a nice gain for the interpreter but that doesn't mean the same gains exist for C stubs. Unfortunately this change will trickle down to other people's bindings and I'm a bit worried about the number of possible failures there (sometimes for code that hasn't been actively maintained for years). Do you think it would make sense to have something less aggressive like -O2 (and warnings) for stubs instead?

edit: GCC is c11 or gnu11 now too; surprisingly it seems to not break many things (the switch to C++11 however...)

@bobzhang
Member

bobzhang commented Aug 6, 2015

How will it affect the time it takes to build the compiler? Can we use -O when in dev mode?

@mshinwell
Contributor

Xavier: (prompted by what adrien-n writes) if I remember correctly, it is possible to pass C files to the OCaml compilers, which will then invoke the C compiler. Does your change affect the optimization flags that will be used in such a situation? I'm not sure whether we should do that. Although one might argue that if these flags are safe for the runtime, they are very likely safe for Joe Bloggs's C stubs.

@adrien-n
Contributor

adrien-n commented Aug 6, 2015

Mark: I don't believe the optimization passes of GCC are buggy, I believe Joe Bloggs's C stubs are. :P

As for the compile time, -O3 can be quite slow but this mostly happens with large(r) compilation units. If you want to save time on that front, you should try ccache: it's a small (5% or so) overhead with an empty cache and then it runs 50 to 100 times faster than an actual compile. In other words you won't spend more time on the initial build but then it will be much faster.

@xavierleroy
Contributor Author

Concerning -Werror, I agree the ideal situation would be to stop on errors of the "default" or "-Wall" categories, and keep other warnings as warnings. I don't know how to do that in GCC or Clang, however. -Werror=all doesn't seem to do anything, only -Werror=xxx where xxx is a specific warning name. If you know the correct incantation, let me know.

At any rate, I am adamant that no C code that triggers a -Wall warning should enter the Caml sources, and I've seen too many counterexamples recently. So, I prefer a catch-all -Werror to nothing. Remember that this is just for the working sources: releases will not activate -Werror.

@xavierleroy
Contributor Author

Concerning C stubs in third-party libraries, there are two ways to build them:

gcc -I`ocamlc -where` <my options> -c mystub.c

or

ocamlc -c mystub.c

In the first case, the third party is in control of flags. In the second case, the C compiler flags used to compile OCaml (technically, $(BYTECCCOMPFLAGS)) are automatically applied. With my proposal, this would include gnu99 mode, higher optimization, and -Werror if a development version of OCaml is used.

Higher optimization does expose bugs in C stub code. Indeed I found one this way in the Unix library: 0178ea4 It's not a bad thing: latent bugs are bugs; the earlier they are exposed, the earlier they are fixed.

The one thing I'm uneasy about is to force -Werror on users of method #2 if a development version of OCaml is used. This could hinder large-scale testing of dev versions via OPAM. I'll look into removing -Werror from the flags used by ocamlc -c for C sources.

@xavierleroy
Contributor Author

Concerning warnings beyond -Wall, e.g. -Wextra, I'm open to suggestions and results of experimentation. My experience with using -Wextra in Zarith is that it's a bit "noisy". In particular, the warning "comparison between signed and unsigned integers" will trigger often in the OCaml runtime sources, for code that is correct (because the signed integer is positive and the unsigned integer is a word count, with top bit = 0).
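A small sketch of the signed/unsigned pattern described above (the function and parameter names are hypothetical, not taken from the runtime): a signed loop index compared against an unsigned word count. The index is never negative, so the comparison is correct, yet writing `i < wosize` directly is what `-Wsign-compare` (enabled by `-Wextra` for C) reports. An explicit cast documents the conversion and keeps the compiler quiet.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: count the nonzero fields of a block of
   `wosize` words.  `i` is signed, `wosize` is an unsigned word count. */
long count_nonzero_fields(const long *fields, size_t wosize)
{
  long i;
  long n = 0;

  assert(fields != NULL || wosize == 0);
  /* `i < wosize` would convert i to unsigned implicitly and trigger
     the warning; the cast states the same conversion explicitly.
     It is safe because i is always >= 0 here. */
  for (i = 0; (size_t) i < wosize; i++) {
    if (fields[i] != 0) n++;
  }
  return n;
}
```

Whether sprinkling such casts over the runtime is worth the reduced noise is the trade-off under discussion here.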

@xavierleroy
Contributor Author

Concerning the time it takes to compile the OCaml sources, I noticed no significant slowdown from using -O3. That said, the OCaml sources compile quite quickly on a modern machine with parallel make, so this is absolutely not a concern for me.

As to using -O for development and -O3 for releases, this is a recipe for disaster. Higher optimizations will expose bugs in the C code that must be seen during development.

@xavierleroy
Contributor Author

Concerning -O2 versus -O3, I'm on the fence. For the bytecode interpreter, -O2 gives impressive speedups already, -O3 adds little. For the ocamlopt runtime system, -O2 has no effect and -O3 has a tiny effect, barely significant.

-O2 is aggressive enough already to break buggy C code that happens to work with -O.

Among the optimizations that -O3 adds, some (e.g. auto-vectorization) may not be beneficial for the runtime system. Typically, loops are short (iterating over the fields of a memory block), and I fear that XMM vectorization could make them slower because of startup and teardown overheads.

@xavierleroy
Contributor Author

Concerning benchmarks, some of the tests in testsuite/ are actually suitable as micro-benchmarks. But what we want is macro-benchmarks obtained on real applications. Since real applications have dependencies, I think it's a job for OPAM: just like some packages have test suites, it should be possible for a package to expose a "make benchmark" script that OPAM could call.

… compiling a C source file.

The risk of breakage of 3rd-party libraries is too high.
There might be cleaner ways to achieve this effect, e.g. split BYTECCCOMPOPTS into BYTECCCOMPOPTS and BYTECCEXTRAWARNINGS.


git-svn-id: http://caml.inria.fr/svn/ocaml/branches/cc-optim@16337 f963ae5c-01c2-4b8c-9fe0-0dff7051ff02
@bschommer
Contributor

I don't know if this is the right place to raise this concern, but using -O3 is a bad idea since it is known to break various packages (that's the reason why, for example, Gentoo recommends using -O2), so using -O3 for the stubs might break third-party stubs. Also, a lot of people are very reluctant to fix bugs reported by users who compile with -O3. And last but not least, even if the newer GCC versions do not expose that many optimization bugs, there have been a lot of them in older GCC versions, which are still in use.

@xavierleroy
Contributor Author

Yes, it is exactly the place. Thanks for the info. I'm fine with being prudent and using -O2, at least at first. Will update the PR accordingly.

@avsm
Member

avsm commented Aug 6, 2015

I had a look at the difference in -O2 and -O3 in clang, and it looks like there's just one significant optimisation pass that's added in 3.5. More info here: http://stackoverflow.com/questions/15548023/clang-optimization-levels

The new pass in -O3 is "argpromotion", which looks like an alias analysis to turn pointer arguments into values, to eliminate unnecessary alloca. It doesn't seem to be a significant win in native code, but I didn't test the bytecode interpreter.

The other worry is -O3 on non-x86 architectures, where GCC's bugginess rises dramatically. -O2 has been reasonably safe in my experience on most OpenBSD arches (where the system compiler is still gcc 4.2.1, and large parts of the base system are compiled at this level of optimisation).

@xavierleroy
Contributor Author

Sobering news about gcc -O3, and reassuring news about gcc -O2.

If we decide to go the way of this PR, we should merge it on trunk early, well ahead of the next release, so that it gets widely tested.

configure: uninitialized variable $nativeccprofopts


git-svn-id: http://caml.inria.fr/svn/ocaml/branches/cc-optim@16377 f963ae5c-01c2-4b8c-9fe0-0dff7051ff02
@bactrian bactrian merged commit ee9d50e into trunk Aug 25, 2015
@xavierleroy
Contributor Author

I went ahead and merged into trunk, SVN commit 16379, so that it can get extensive CI testing. In case of problems, there are 2 lines to change in configure to go back to the old behavior.

@oandrieu
Copy link
Copy Markdown
Contributor

Commenting a bit late on this (just learned about it on caml-list).

Are you aware that -fwrapv has problems on some early gcc-4 versions?
http://thiemonagel.de/2010/01/signed-integer-overflow/
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=28230

Another option exists, -fno-strict-overflow that does more or less the same thing and might be better.
http://www.airs.com/blog/archives/120

@xavierleroy
Contributor Author

Thanks for the info about -fwrapv. After chasing the links you give, it seems to me that -fno-strict-overflow was introduced (in gcc 4.2) at the same time as the -fwrapv problems were fixed... So, I don't see the benefit of using -fno-strict-overflow instead of -fwrapv if gcc's version is >= 4.2.

More generally, OCaml's runtime system, especially the bytecode interpreter, requires two's-complement wrap-around on overflow to function properly. So it makes sense to inform gcc of this fact by passing -fwrapv.

The remaining question is: should we reduce optimization level (to -O1 instead of -O2) if GCC is older than 4.2? That could make sense.
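The version test implied by this question can be sketched as follows. This is only an illustration of the policy being floated, not what the configure script does (configure is a shell script); the function names are hypothetical.

```c
#include <assert.h>

/* Nonzero if version major.minor is at least want_major.want_minor. */
int version_at_least(int major, int minor, int want_major, int want_minor)
{
  return major > want_major
      || (major == want_major && minor >= want_minor);
}

/* Hypothetical policy from the question above: -O2 starting with
   GCC 4.2 (where the -fwrapv problems are reported fixed), -O1 for
   older versions. */
const char *opt_level_for_gcc(int major, int minor)
{
  assert(major >= 0 && minor >= 0);
  return version_at_least(major, minor, 4, 2) ? "-O2" : "-O1";
}
```

From C code compiled by GCC, the running compiler's version is available as `__GNUC__` and `__GNUC_MINOR__`, so `opt_level_for_gcc(__GNUC__, __GNUC_MINOR__)` would report the level this policy picks for the current compiler.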

@xavierleroy xavierleroy deleted the cc-optim branch October 27, 2015 13:38
@gasche
Member

gasche commented Jan 7, 2017

MPR#7452 (Compiler segfault on large generated source code) seems related to runtime optimization levels: the segfault cannot be reproduced with an O1-compiled runtime.

@xavierleroy
Contributor Author

Evidence for MPR#7452 points to a hardware issue.

stedolan pushed a commit to stedolan/ocaml that referenced this pull request Feb 20, 2020
Multicore-safe implementation of Lazy
mshinwell added a commit to mshinwell/ocaml that referenced this pull request Jul 30, 2020
lthls pushed a commit to lthls/ocaml that referenced this pull request Sep 23, 2020
lthls pushed a commit to lthls/ocaml that referenced this pull request Sep 23, 2020
lthls pushed a commit to lthls/ocaml that referenced this pull request Sep 24, 2020
@dra27 dra27 mentioned this pull request Oct 9, 2020
chambart pushed a commit to chambart/ocaml-1 that referenced this pull request Sep 9, 2021
EmileTrotignon pushed a commit to EmileTrotignon/ocaml that referenced this pull request Jan 12, 2024
* also: upgrade node version now that Vercel has node version 14.x available
EmileTrotignon pushed a commit to EmileTrotignon/ocaml that referenced this pull request Jan 12, 2024
- Remove uninteresting or not well defined complexities (length, finding
  an element)
- Rename "accessing cell `i`" into "random access"
- The "adding an element" complexity for Hashtbl and Buffer was wrong