
make: optionally build with gcc's link time optimization#1927

Merged
thomaseichinger merged 2 commits intoRIOT-OS:masterfrom
kaspar030:add_lto_flags
Jan 19, 2015


@kaspar030
Contributor

This commit enables building with link time optimization when
"LTO=yes" is specified in the application's Makefile (or on the command line, e.g. LTO=yes make ...).

Size comparisons:

             native w/o LTO   native w/ LTO   msba2 w/o LTO   msba2 w/ LTO
hello_world           29814           24817           48332          43948
default               71171           59282          118448         107108

Tested only with native and msba2.
On msba2, the syscalls needed __attribute__((used)), which makes default use 16 bytes more in .text when not using LTO.

@kaspar030 kaspar030 added the "Area: build system" and "Type: enhancement" labels Nov 3, 2014
@N8Fear

N8Fear commented Nov 3, 2014

IIRC lto needs at least gcc-4.7.x while gcc-4.9.x is recommended. Does this fail for lower gcc versions? Or is it just quietly ignored?

@kaspar030
Contributor Author

@N8Fear Well, it's disabled by default. You have to manually add LTO=yes to your Makefile or to make's command line. I didn't try with unsupported compilers, but if specified, the flags will probably break the build.

@N8Fear

N8Fear commented Nov 3, 2014

If it can be defined in the application's Makefile, it could lead to unnecessary errors that a user may not understand correctly. Wouldn't it make sense to check the compiler version and drop LTO if it is not supported?
For example the linux kernel does this (<kernel-sources>/scripts/gcc-version.sh).

@Kijewski
Contributor

Kijewski commented Nov 3, 2014

Ideally we would find out per board whether -flto works correctly, same as #1784.

Do -flto and --gc-sections work together? Does this PR affect #1782 which does some linker magic, too?

@kaspar030
Contributor Author

IMHO we should add the option (LTO=yes) now, so we can play with it if we want to. It took me some fiddling with the options and the syscall attributes to get it to compile.

As soon as we have more insight on implications / compatibility stuff etc. we can automate LTO=yes/no. No need for a perfect solution with the first commit.

@kaspar030
Contributor Author

@Kijewski AFAIK -flto will drop anything not needed for the link. That would implicitly include --gc-sections.

@thomaseichinger
Member

I compiled some applications with these options and with --gc-sections; please compare the results below.
Although the results with LTO look promising, the binary does not run on the iot-lab_M3 board. On the other two platforms, LTO produces executable binaries.
@kaspar030 do you know whether LTO has any requirements on the linker script or similar? (I haven't looked into it yet.)

iot-lab_M3
no options   
   text    data     bss     dec     hex filename
  33016     168    8436   41620    a294 RIOT/examples/default/bin/iot-lab_M3/default.elf

-flto -ffat-lto-objects
   text    data     bss     dec     hex filename
   9904     108     784   10796    2a2c RIOT/examples/default/bin/iot-lab_M3/default.elf

--gc-sections
   text    data     bss     dec     hex filename
  28304     168    8432   36904    9028 RIOT/examples/default/bin/iot-lab_M3/default.elf

-flto -ffat-lto-objects --gc-sections
   text    data     bss     dec     hex filename
      0       0     512     512     200 RIOT/examples/default/bin/iot-lab_M3/default.elf

stm32f0discovery
no options
   text    data     bss     dec     hex filename
  19096     148    4804   24048    5df0 RIOT/RIOT/examples/default/bin/stm32f0discovery/default.elf

--gc-sections
   text    data     bss     dec     hex filename
  15244     148    4804   20196    4ee4 RIOT/examples/default/bin/stm32f0discovery/default.elf

-flto -ffat-lto-objects
   text    data     bss     dec     hex filename
  12880     104    2840   15824    3dd0 RIOT/examples/default/bin/stm32f0discovery/default.elf

samr21-xpro
no options
   text    data     bss     dec     hex filename
  24040     152   10472   34664    8768 RIOT/examples/default/bin/samr21-xpro/default.elf

--gc-sections
   text    data     bss     dec     hex filename
  17744     152   10472   28368    6ed0 RIOT/examples/default/bin/samr21-xpro/default.elf

-flto -ffat-lto-objects
   text    data     bss     dec     hex filename
  23992     152   10472   34616    8738 RIOT/examples/default/bin/samr21-xpro/default.elf

@jnohlgard
Member

@thomaseichinger I would guess the failures on iot-lab_M3 are related to the fact that the binary was completely stripped of all code when compiled with -flto -ffat-lto-objects --gc-sections (sizeof(.text) == 0).

Try setting __attribute__((used)) on some key functions, maybe the reset vector and/or the board/cpu startup functions, and see whether .text is kept from being stripped.

Here's a related post about similar problems when using LTO: http://www.coocox.org/forum/topic.php?id=3002

@thomaseichinger
Member

@gebart The combination -flto -ffat-lto-objects -Wl,--gc-sections does not produce executable binaries for any tested platform (sizeof(.text) == 0). Using only -flto -ffat-lto-objects, however, works on all other Cortex-M platforms except the iot-lab_M3. As far as I checked, they all share the same linker script, so I was wondering what makes the difference here.
Thanks for the pointers, will try to read more into this topic.

@kaspar030
Contributor Author

Rebased, added iot-lab_M3 support:
Without LTO:
text data bss dec hex
44700 292 8448 53440 d0c0

With LTO:
text data bss dec hex
37908 284 8448 46640 b630

@kaspar030
Contributor Author

@thomaseichinger The interrupt vector table (the table of pointers to the interrupt handlers) needed to be marked __attribute__((used)).

@jnohlgard
Member

Is this fully functional now?
I have been experimenting with LTO on my side as well (Mulle platform), but I am having trouble with the gold linker: some of the linker script commands and constructs that I have been using give syntax errors in ld.gold but work fine with ld.bfd (the classic BFD linker).
Are you using the gold linker too? How do you tell the linker to place a section at a specific address?

The following syntax is ok using the BFD linker, but not with GOLD:
    .sectionname 0xdeadbeef : {
        KEEP(*(.text*))
    } > flash

The problem is that GOLD does not understand the address specification after the section name.

@kaspar030
Contributor Author

@gebart This is functional on native and iot-lab_M3.

@kaspar030
Contributor Author

bump
Any reason not to include this change so we can add support to more platforms?
Mind you, this is fully optional and will only be used if activated by specifying LTO=yes in the applications Makefile.

@kaspar030
Contributor Author

rebased

@OlegHahm
Member

Sounds good to me.

@jnohlgard
Member

Is it necessary to mark all syscalls with __attribute__((used))?
I would expect the linker to see that they are used, since there will be calls to them from the kernel or the C library.

The interrupt vector table on the other hand is only accessed by hardware IRQs which makes it necessary to tell the linker that it is used.

@thomaseichinger
Member

I'm ok with adding this optionally although for me it currently bloats the binary. I didn't test with any 4.9.* gcc version though.

% make clean flash BOARD=iot-lab_M3 LTO=yes
...
   text    data     bss     dec     hex filename
  46440     172    8444   55056    d710 RIOT/examples/default/bin/iot-lab_M3/default.elf
% make clean flash BOARD=iot-lab_M3 LTO=no
...
   text    data     bss     dec     hex filename
  33120     168    8444   41732    a304 RIOT/examples/default/bin/iot-lab_M3/default.elf

@jnohlgard
Member

@thomaseichinger I tried it with GCC 4.9.2 and newlib master from git (2014-12-12 20:00 CET-ish):

% arm-none-eabi-gcc --version
arm-none-eabi-gcc (Gentoo 4.9.2 p1.0, pie-0.6.1) 4.9.2
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

% make clean all BOARD=iot-lab_M3 LTO=yes 
   text    data     bss     dec     hex filename
  48148    2560    9184   59892    e9f4 .../riot/examples/default/bin/iot-lab_M3/default.elf
% make clean all BOARD=iot-lab_M3 LTO=no 
   text    data     bss     dec     hex filename
  54964    2568    9188   66720   104a0 .../riot/examples/default/bin/iot-lab_M3/default.elf

and with 4.8.3:

% arm-none-eabi-gcc --version
arm-none-eabi-gcc (Gentoo 4.8.3 p1.1, pie-0.5.9) 4.8.3
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

% make clean all BOARD=iot-lab_M3 LTO=no 
   text    data     bss     dec     hex filename
  54884    2568    9188   66640   10450 /home/kim/eistec/src/riot/examples/default/bin/iot-lab_M3/default.elf

I don't know why I get such a large file for LTO=no compared to your results, I'm guessing newlib configuration differences.

Curiously, I cannot link with LTO using GCC 4.8.3:

% make clean all BOARD=iot-lab_M3 LTO=yes 
...
arm-none-eabi-gcc \
.../src/riot/examples/default/bin/iot-lab_M3/cpu/syscalls.o \
.../src/riot/examples/default/bin/iot-lab_M3/cpu/startup.o -o \
.../src/riot/examples/default/bin/iot-lab_M3/default.elf \
-Wl,--start-group \
.../src/riot/examples/default/bin/iot-lab_M3/at86rf231.a \
.../src/riot/examples/default/bin/iot-lab_M3/auto_init.a \
.../src/riot/examples/default/bin/iot-lab_M3/core.a \
.../src/riot/examples/default/bin/iot-lab_M3/cortex-m3_common.a \
.../src/riot/examples/default/bin/iot-lab_M3/cpu.a \
.../src/riot/examples/default/bin/iot-lab_M3/ieee802154.a \
.../src/riot/examples/default/bin/iot-lab_M3/isl29020.a \
.../src/riot/examples/default/bin/iot-lab_M3/l3g4200d.a \
.../src/riot/examples/default/bin/iot-lab_M3/lps331ap.a \
.../src/riot/examples/default/bin/iot-lab_M3/lsm303dlhc.a \
.../src/riot/examples/default/bin/iot-lab_M3/netdev_802154.a \
.../src/riot/examples/default/bin/iot-lab_M3/netdev_base.a \
.../src/riot/examples/default/bin/iot-lab_M3/periph.a \
.../src/riot/examples/default/bin/iot-lab_M3/posix.a \
.../src/riot/examples/default/bin/iot-lab_M3/ps.a \
.../src/riot/examples/default/bin/iot-lab_M3/shell.a \
.../src/riot/examples/default/bin/iot-lab_M3/shell_commands.a \
.../src/riot/examples/default/bin/iot-lab_M3/sys.a \
.../src/riot/examples/default/bin/iot-lab_M3/timex.a \
.../src/riot/examples/default/bin/iot-lab_M3/transceiver.a \
.../src/riot/examples/default/bin/iot-lab_M3/uart0.a \
.../src/riot/examples/default/bin/iot-lab_M3/vtimer.a \
.../src/riot/examples/default/bin/iot-lab_M3/iot-lab_M3_base.a \
.../src/riot/examples/default/bin/iot-lab_M3/default.a  -lm \
-Wl,--end-group \
-Wl,-Map=.../src/riot/examples/default/bin/iot-lab_M3/default.map \
-ggdb -g3 -std=gnu99 -mcpu=cortex-m3  -mlittle-endian -static -lgcc -mthumb \
-mthumb-interwork -nostartfiles \
-T.../src/riot/cpu/stm32f1/stm32f103re_linkerscript.ld \
-specs=nano.specs -lc -lnosys -flto -ffat-lto-objects

/tmp/cchmhSnE.s: Assembler messages:
/tmp/cchmhSnE.s:3585: Error: offset out of range
/tmp/cchmhSnE.s:3646: Error: offset out of range
/tmp/cchmhSnE.s:3665: Error: offset out of range
lto-wrapper: /usr/x86_64-pc-linux-gnu/arm-none-eabi/gcc-bin/4.8.3/arm-none-eabi-gcc returned 1 exit status
/usr/libexec/gcc/arm-none-eabi/ld: lto-wrapper failed
collect2: error: ld returned 1 exit status
.../src/riot/examples/default/../../Makefile.include:153: recipe for target 'all' failed
make: *** [all] Error 1

@kaspar030
Contributor Author

Try some benchmarks. Synthetic messaging tests get >25% faster.

@OlegHahm OlegHahm added this to the Release 2014.12 milestone Dec 17, 2014
@OlegHahm
Member

So, does anyone object to this? @thomaseichinger, @gebart, do you agree to enable this for testing purposes for a while before we think about how to integrate it into mainline properly?

@thomaseichinger
Member

Fine for me.

@jnohlgard
Member

I am fine with this, even if it may break some builds, since it is optional extra functionality.
I would like to hear a comment on my previous question, "Is it necessary to mark all syscalls with __attribute__((used))?", on msba2; it seems like a hack. I noticed that compiling without LTO but with -ffunction-sections -fdata-sections and -Wl,--gc-sections on msba2 results in a 4-byte .text and a 0-byte .data.

@jnohlgard
Member

I found a bug/oversight in the lpc2387 linker script, will post a PR later today. __attribute__((used)) is not necessary in msba2.

@OlegHahm OlegHahm modified the milestones: Release 2014.12, Release NEXT MAJOR Jan 6, 2015
@thomaseichinger
Member

@kaspar030 Needs rebase.

@kaspar030 kaspar030 force-pushed the add_lto_flags branch 2 times, most recently from 361f195 to 9fffd99 January 8, 2015 14:59
This commit enables building using link time optimization when
specifying "LTO=yes" in the application's Makefile.
Enables gcc LTO for stm32f1 based boards
@kaspar030
Contributor Author

rebased, dropped LPC2387 stuff.

@kaspar030
Contributor Author

So, do we agree on integrating this (optionally) now and enabling it for toolchain/architecture combinations later?

@jnohlgard
Member

ACK from my side, @thomaseichinger ?

I would like to revisit this at a later date when LTO support within the various toolchains is more mature and widespread, but I'm fine with the current changes for the time being since it is nothing intrusive.

@thomaseichinger
Member

Fine with me. Let's see how this evolves. ACK & Go

thomaseichinger added a commit that referenced this pull request Jan 19, 2015
make: optionally build with gcc's link time optimization
@thomaseichinger thomaseichinger merged commit 5ae38d6 into RIOT-OS:master Jan 19, 2015
@kaspar030 kaspar030 deleted the add_lto_flags branch January 19, 2015 16:21


6 participants