Today we’re in the happy position of being able to pick between several high-quality C++ compilers. Focusing on ARM targets (which is pretty much what everybody uses these days), we’re left with the following options:
- Arm compiler (armcc or more recently armclang)
- IAR (icc)
- GCC (g++)
- Clang (clang++)
Out of those four compilers the first two are proprietary and pretty expensive, so I sadly can’t bench them since I don’t own a license. We can console ourselves with the fact that neither IAR Systems nor Keil seems to have the manpower to keep up with the rapidly evolving language, so neither compiler has full C++17 support yet. If you like to play with the latest language and library additions like I do, you’re better off with GCC or Clang anyway.
Benchmarking compilers for embedded platforms is always about binary size (.text and .data sections), so that’s what I was aiming for as well. The most interesting optimization options are therefore -Os (or -Oz in the case of Clang) and -flto, which enables link-time optimization.
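For readers less familiar with the section names, here is a rough sketch of what typically ends up where in a bare-metal GCC or Clang build. All object and function names are made up for this illustration and are not taken from the test project. The actual placement depends on the linker script and on what the optimizer decides to keep.

```cpp
#include <cstdint>

// All names below are hypothetical and exist only for this illustration.

// Read-only table: usually placed in .rodata, which stays in flash.
const std::uint8_t kLedPattern[4] = {0x1, 0x2, 0x4, 0x8};

// Non-zero initial value: placed in .data. It lives in RAM, but the
// initial value is also stored in flash and copied over at startup.
std::uint32_t g_period_ms = 1000;

// No initializer, so zero-initialized: placed in .bss (RAM only).
std::uint32_t g_step;

// Machine code for functions goes into .text.
std::uint8_t next_pattern() {
    const std::uint8_t pattern = kLedPattern[g_step % 4];
    ++g_step;
    return pattern;
}

std::uint32_t elapsed_ms() {
    return g_step * g_period_ms;
}
```

If the sizes are read with the GNU size tool (an assumption about the workflow), read-only data is counted in its text column, so both the function bodies and kLedPattern would show up there, while g_period_ms would show up under data.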
I created a Git project currently containing six test cases running on an STM32F4Discovery board, which features an STM32F407VG Cortex-M4 processor.
Test cases
- BLINKY
  The classic “hello world” program of the embedded world. It simply toggles all 4 LEDs on the board at an interval of 1 s.
- HELLO_WORLD
  Prints a semihosted “Hello World\n” message to the debugger console.
- I2C_SLAVE
  Uses the first I2C peripheral (I2C1) of the board to transmit some bytes in slave mode. Each transmission is started by resetting Pin7 of GPIOC.
- ENCODER
  Initializes TIM3 to run in encoder mode. Pin6 and Pin7 of GPIOA are used as encoder inputs. The current value of TIM3 is output to the debugger console.
- TYPE_TREE_MENU
  Uses a binary tree made with std::pair to create a browsable menu. User input is read continuously from serial port USART3 and allows browsing the menu by sending the following characters:
  - ‘4’ -> left node
  - ‘6’ -> right node
  - ‘8’ -> previous node
  - ‘0’ -> exit
  The current position in the binary tree is sent back on the serial port.
- LUA
  Embeds Lua 5.3.4 and runs code I shamelessly copied from some examples.
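To give an idea of what the TYPE_TREE_MENU case is about, here is a minimal host-side sketch of a menu tree built from nested std::pair and browsed with the same keys. It is not the code from the test project: Item, Menu, navigate, walk and the menu entries are hypothetical, and the serial port is replaced by a plain string of key presses.

```cpp
#include <string>
#include <utility>

// Hypothetical leaf type: a menu entry with a name.
struct Item {
    const char* name;
};

// A leaf can only be the end of a path.
inline const char* navigate(const Item& leaf, const char* path) {
    return *path == '\0' ? leaf.name : nullptr;
}

// An inner node is a std::pair: '4' descends left, '6' descends right.
// Returns nullptr if the path does not exist in the tree.
template <typename L, typename R>
const char* navigate(const std::pair<L, R>& node, const char* path) {
    if (*path == '\0') return "(submenu)";
    if (*path == '4') return navigate(node.first, path + 1);
    if (*path == '6') return navigate(node.second, path + 1);
    return nullptr;
}

// Replays key presses and returns the name of the node the user ends up on.
template <typename Tree>
const char* walk(const Tree& tree, const std::string& keys) {
    std::string path;  // steps taken from the root so far
    for (char key : keys) {
        if (key == '0') break;  // exit
        if (key == '8') {       // previous node
            if (!path.empty()) path.pop_back();
        } else if (key == '4' || key == '6') {
            path.push_back(key);
            // Undo the step if there is nothing below the current node.
            if (navigate(tree, path.c_str()) == nullptr) path.pop_back();
        }
    }
    return navigate(tree, path.c_str());
}

// Example tree: ((LED, UART), About)
using Menu = std::pair<std::pair<Item, Item>, Item>;

inline Menu make_menu() {
    return Menu(std::make_pair(Item{"LED"}, Item{"UART"}), Item{"About"});
}
```

With this tree, walk(make_menu(), "46") ends on the UART entry, and a following ‘8’ goes back up to the enclosing submenu. Since the whole tree is encoded in the type, the compiler sees its full shape at compile time, which is presumably what makes such a test interesting for a size benchmark.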
Results
- BLINKY

| Compiler | Optimizations | Text (bytes) | Data (bytes) |
| --- | --- | ---: | ---: |
| GCC 7.3.0 | -O2 | 4316 | 20 |
| GCC 7.3.0 | -Os | 4036 | 20 |
| GCC 7.3.0 | -O2 -flto | 2520 | 12 |
| GCC 7.3.0 | -Os -flto | 2480 | 12 |
| GCC 8.2.0 | -O2 | 4320 | 20 |
| GCC 8.2.0 | -Os | 4044 | 20 |
| GCC 8.2.0 | -O2 -flto | 2524 | 12 |
| GCC 8.2.0 | -Os -flto | 2480 | 12 |
| Clang 6.0.1 | -O2 | 4208 | 20 |
| Clang 6.0.1 | -Os | 4160 | 20 |
| Clang 6.0.1 | -Oz | 4008 | 20 |
| Clang 6.0.1 | -O2 -flto | 2784 | 12 |
| Clang 6.0.1 | -Os -flto | 2504 | 12 |
| Clang 6.0.1 | -Oz -flto | 2412 | 12 |

- HELLO_WORLD

| Compiler | Optimizations | Text (bytes) | Data (bytes) |
| --- | --- | ---: | ---: |
| GCC 7.3.0 | -O2 | 6984 | 120 |
| GCC 7.3.0 | -Os | 6696 | 120 |
| GCC 7.3.0 | -O2 -flto | 5196 | 112 |
| GCC 7.3.0 | -Os -flto | 5140 | 112 |
| GCC 8.2.0 | -O2 | 6996 | 120 |
| GCC 8.2.0 | -Os | 6712 | 120 |
| GCC 8.2.0 | -O2 -flto | 5208 | 112 |
| GCC 8.2.0 | -Os -flto | 5148 | 112 |
| Clang 6.0.1 | -O2 | 6892 | 120 |
| Clang 6.0.1 | -Os | 6840 | 120 |
| Clang 6.0.1 | -Oz | 6668 | 120 |
| Clang 6.0.1 | -O2 -flto | 5444 | 112 |
| Clang 6.0.1 | -Os -flto | 5164 | 112 |
| Clang 6.0.1 | -Oz -flto | 5060 | 112 |

- I2C_SLAVE

| Compiler | Optimizations | Text (bytes) | Data (bytes) |
| --- | --- | ---: | ---: |
| GCC 7.3.0 | -O2 | 8080 | 120 |
| GCC 7.3.0 | -Os | 7736 | 120 |
| GCC 7.3.0 | -O2 -flto | 6552 | 112 |
| GCC 7.3.0 | -Os -flto | 6360 | 112 |
| GCC 8.2.0 | -O2 | 8092 | 120 |
| GCC 8.2.0 | -Os | 7760 | 120 |
| GCC 8.2.0 | -O2 -flto | 6568 | 112 |
| GCC 8.2.0 | -Os -flto | 6368 | 112 |
| Clang 6.0.1 | -O2 | 8232 | 120 |
| Clang 6.0.1 | -Os | 8140 | 120 |
| Clang 6.0.1 | -Oz | 7904 | 120 |
| Clang 6.0.1 | -O2 -flto | 6880 | 112 |
| Clang 6.0.1 | -Os -flto | 6404 | 112 |
| Clang 6.0.1 | -Oz -flto | 6432 | 112 |

- ENCODER

| Compiler | Optimizations | Text (bytes) | Data (bytes) |
| --- | --- | ---: | ---: |
| GCC 7.3.0 | -O2 | 7624 | 120 |
| GCC 7.3.0 | -Os | 7296 | 120 |
| GCC 7.3.0 | -O2 -flto | 5504 | 112 |
| GCC 7.3.0 | -Os -flto | 5444 | 112 |
| GCC 8.2.0 | -O2 | 7600 | 120 |
| GCC 8.2.0 | -Os | 7312 | 120 |
| GCC 8.2.0 | -O2 -flto | 5520 | 112 |
| GCC 8.2.0 | -Os -flto | 5448 | 112 |
| Clang 6.0.1 | -O2 | 7584 | 120 |
| Clang 6.0.1 | -Os | 7516 | 120 |
| Clang 6.0.1 | -Oz | 7264 | 120 |
| Clang 6.0.1 | -O2 -flto | 6040 | 112 |
| Clang 6.0.1 | -Os -flto | 5740 | 112 |
| Clang 6.0.1 | -Oz -flto | 5596 | 112 |

- TYPE_TREE_MENU

| Compiler | Optimizations | Text (bytes) | Data (bytes) |
| --- | --- | ---: | ---: |
| GCC 7.3.0 | -O2 | 10552 | 124 |
| GCC 7.3.0 | -Os | 10332 | 124 |
| GCC 7.3.0 | -O2 -flto | 8128 | 116 |
| GCC 7.3.0 | -Os -flto | 8112 | 116 |
| GCC 8.2.0 | -O2 | 10676 | 124 |
| GCC 8.2.0 | -Os | 10344 | 124 |
| GCC 8.2.0 | -O2 -flto | 8108 | 116 |
| GCC 8.2.0 | -Os -flto | 8068 | 116 |
| Clang 6.0.1 | -O2 | 10456 | 124 |
| Clang 6.0.1 | -Os | 10292 | 124 |
| Clang 6.0.1 | -Oz | 10116 | 124 |
| Clang 6.0.1 | -O2 -flto | 8932 | 116 |
| Clang 6.0.1 | -Os -flto | 8424 | 116 |
| Clang 6.0.1 | -Oz -flto | 8240 | 116 |

- LUA

| Compiler | Optimizations | Text (bytes) | Data (bytes) |
| --- | --- | ---: | ---: |
| GCC 7.3.0 | -O2 | 139344 | 584 |
| GCC 7.3.0 | -Os | 127704 | 584 |
| GCC 7.3.0 | -O2 -flto | 137824 | 580 |
| GCC 7.3.0 | -Os -flto | 125768 | 580 |
| GCC 8.2.0 | -O2 | 140156 | 576 |
| GCC 8.2.0 | -Os | 128460 | 576 |
| GCC 8.2.0 | -O2 -flto | 138444 | 580 |
| GCC 8.2.0 | -Os -flto | 126612 | 580 |
| Clang 6.0.1 | -O2 | 169728 | 580 |
| Clang 6.0.1 | -Os | 137184 | 580 |
| Clang 6.0.1 | -Oz | 132208 | 580 |
| Clang 6.0.1 | -O2 -flto | 255728 | 576 |
| Clang 6.0.1 | -Os -flto | 144600 | 576 |
| Clang 6.0.1 | -Oz -flto | 132084 | 576 |
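To put the LTO numbers into perspective, the relative saving can be computed from any two rows of a result table. The helper below is a small sketch. reduction_percent is a name made up here, and the sample values in the comments are copied from the GCC 8.2.0 and Clang 6.0.1 rows.

```cpp
// Relative size reduction in percent when going from `before` to `after`
// bytes. A negative result means the binary grew.
inline double reduction_percent(long before, long after) {
    return 100.0 * static_cast<double>(before - after) /
           static_cast<double>(before);
}

// Text size without and with -flto:
//   BLINKY, GCC 8.2.0 -Os:    reduction_percent(4044, 2480)     is about 38.7
//   LUA,    GCC 8.2.0 -Os:    reduction_percent(128460, 126612) is about 1.4
//   LUA,    Clang 6.0.1 -O2:  reduction_percent(169728, 255728) is about -50.7
```

In other words, LTO pays off most on the small test cases, barely matters for GCC on the Lua build, and makes the Clang -O2 Lua build considerably larger.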
All in all I’d say GCC and Clang are currently pretty much on par, with GCC probably having the slightest advantage when LTO is used.