GCC vs Clang Cortex-M4 benchmarks

Today we’re in the happy position of being able to pick between several high-quality C++ compilers. Focusing on ARM targets (which is pretty much what everybody uses these days), we’re left with the following options:

  • Arm compiler (armcc or more recently armclang)
  • IAR (icc)
  • GCC (g++)
  • Clang (clang++)

Out of those four compilers the first two are proprietary and pretty expensive, so sadly I can’t bench them since I don’t own a license. We can console ourselves with the fact that neither IAR Systems nor Keil seems to have the manpower to keep up with the rapidly evolving language, so neither compiler has full C++17 support yet. If you like to play with the latest language and library additions like I do, you’re better off with GCC or Clang anyway.

Benchmarking compilers for embedded platforms is always about binary size (.text and .data sections), so that’s what I was aiming for as well. The most interesting optimization options are therefore -Os (or -Oz in the case of Clang) and -flto, which enables link-time optimization.
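As a rough illustration of what ends up in which column, the sketch below shows where a typical GCC or Clang bare-metal build places different kinds of objects. It assumes the sizes are read in the Berkeley format of a tool like size, where read-only data is counted as text. The names banner, counter and ticks are made up for the example and are not taken from the test project.

```cpp
#include <cstddef>

// Read-only data: typically placed in .rodata, which lives in flash and is
// counted as "text" by the Berkeley-format output of size (assumption).
const char banner[] = "Hello World\n";

// Initialized, writable data: typically placed in .data. The initial value is
// stored in flash and copied to RAM by the startup code.
int counter = 42;

// Zero-initialized data: typically placed in .bss, which only occupies RAM.
int ticks;

// Code: placed in .text.
std::size_t banner_length() { return sizeof(banner) - 1; }
int next_count() { return ++counter + ticks; }
```

With -flto the linker sees the whole program at once, so objects and functions that are never referenced are more likely to be dropped entirely.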

I created a git project currently containing six test cases running on an STM32F4Discovery board, which features an STM32F407VG Cortex-M4 processor.

Test cases

  1. BLINKY
    The classic “hello world” program of the embedded world. It simply toggles all 4 LEDs on the board at an interval of 1 s.
  2. HELLO_WORLD
    Printing a semihosted “Hello World\n” message to the debugger console.
  3. I2C_SLAVE
    Utilizes the first I2C peripheral (I2C1) of the board to transmit some bytes as a slave. Each transmission is started by resetting Pin7 of GPIOC.
  4. ENCODER
    Initializes TIM3 to run in encoder mode. Pin6 and 7 of GPIOA are used as encoder inputs. The current value of TIM3 is output to the debugger console.
  5. TYPE_TREE_MENU
    Uses a binary tree made with std::pair to create a browsable menu. The user input is continuously read from serial port USART3 and allows browsing the menu by sending the following characters:

    • ‘4’ -> left node
    • ‘6’ -> right node
    • ‘8’ -> previous node
    • ‘0’ -> exit

    The current position in the binary tree is sent back on the serial port.

  6. LUA
    Embeds Lua 5.3.4 and runs code I shamelessly copied from some examples.
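To give an idea of what test case 5 is about, here is a minimal host-side sketch of a menu built from nested std::pair types. It is not the code from the test project: the Leaf type, the exists and step helpers, and the 'L'/'R' path encoding are all hypothetical, and it assumes that “previous node” means going back up to the parent.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical leaf type; an inner node is simply a std::pair<Left, Right>.
struct Leaf { const char* name; };

// A leaf has no children, so a path is only valid if it ends exactly here.
inline bool exists(const Leaf&, const std::string& path, std::size_t depth) {
    return depth == path.size();
}

// Follow the path ('L' = first, 'R' = second) through the nested pairs.
template <class Left, class Right>
bool exists(const std::pair<Left, Right>& node, const std::string& path,
            std::size_t depth) {
    if (depth == path.size()) return true;
    return path[depth] == 'L' ? exists(node.first, path, depth + 1)
                              : exists(node.second, path, depth + 1);
}

// Apply one received character to the current position and return the new one.
template <class Tree>
std::string step(const Tree& menu, std::string pos, char key) {
    switch (key) {
        case '4': if (exists(menu, pos + 'L', 0)) pos += 'L'; break;  // left node
        case '6': if (exists(menu, pos + 'R', 0)) pos += 'R'; break;  // right node
        case '8': if (!pos.empty()) pos.pop_back(); break;            // back to parent
        default: break;  // '0' (exit) and unknown keys leave the position unchanged
    }
    return pos;
}

// Example menu: the root has a submenu on the left and a single entry on the right.
using Menu = std::pair<std::pair<Leaf, Leaf>, Leaf>;

inline Menu make_menu() {
    return Menu{std::pair<Leaf, Leaf>{Leaf{"A"}, Leaf{"B"}}, Leaf{"C"}};
}
```

Starting at the root (an empty path), sending ‘4’ and then ‘6’ moves to the path "LR", and ‘8’ goes back to "L". On the real board the position would be sent back over USART3 instead of being returned as a std::string.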

 

Results

  1. BLINKY
    Compiler      Optimizations   Text (bytes)   Data (bytes)
    GCC 7.3.0     -O2             4316           20
    GCC 7.3.0     -Os             4036           20
    GCC 7.3.0     -O2 -flto       2520           12
    GCC 7.3.0     -Os -flto       2480           12
    GCC 8.2.0     -O2             4320           20
    GCC 8.2.0     -Os             4044           20
    GCC 8.2.0     -O2 -flto       2524           12
    GCC 8.2.0     -Os -flto       2480           12
    Clang 6.0.1   -O2             4208           20
    Clang 6.0.1   -Os             4160           20
    Clang 6.0.1   -Oz             4008           20
    Clang 6.0.1   -O2 -flto       2784           12
    Clang 6.0.1   -Os -flto       2504           12
    Clang 6.0.1   -Oz -flto       2412           12

     

  2. HELLO_WORLD
    Compiler      Optimizations   Text (bytes)   Data (bytes)
    GCC 7.3.0     -O2             6984           120
    GCC 7.3.0     -Os             6696           120
    GCC 7.3.0     -O2 -flto       5196           112
    GCC 7.3.0     -Os -flto       5140           112
    GCC 8.2.0     -O2             6996           120
    GCC 8.2.0     -Os             6712           120
    GCC 8.2.0     -O2 -flto       5208           112
    GCC 8.2.0     -Os -flto       5148           112
    Clang 6.0.1   -O2             6892           120
    Clang 6.0.1   -Os             6840           120
    Clang 6.0.1   -Oz             6668           120
    Clang 6.0.1   -O2 -flto       5444           112
    Clang 6.0.1   -Os -flto       5164           112
    Clang 6.0.1   -Oz -flto       5060           112

     

  3. I2C_SLAVE
    Compiler      Optimizations   Text (bytes)   Data (bytes)
    GCC 7.3.0     -O2             8080           120
    GCC 7.3.0     -Os             7736           120
    GCC 7.3.0     -O2 -flto       6552           112
    GCC 7.3.0     -Os -flto       6360           112
    GCC 8.2.0     -O2             8092           120
    GCC 8.2.0     -Os             7760           120
    GCC 8.2.0     -O2 -flto       6568           112
    GCC 8.2.0     -Os -flto       6368           112
    Clang 6.0.1   -O2             8232           120
    Clang 6.0.1   -Os             8140           120
    Clang 6.0.1   -Oz             7904           120
    Clang 6.0.1   -O2 -flto       6880           112
    Clang 6.0.1   -Os -flto       6404           112
    Clang 6.0.1   -Oz -flto       6432           112

     

  4. ENCODER
    Compiler      Optimizations   Text (bytes)   Data (bytes)
    GCC 7.3.0     -O2             7624           120
    GCC 7.3.0     -Os             7296           120
    GCC 7.3.0     -O2 -flto       5504           112
    GCC 7.3.0     -Os -flto       5444           112
    GCC 8.2.0     -O2             7600           120
    GCC 8.2.0     -Os             7312           120
    GCC 8.2.0     -O2 -flto       5520           112
    GCC 8.2.0     -Os -flto       5448           112
    Clang 6.0.1   -O2             7584           120
    Clang 6.0.1   -Os             7516           120
    Clang 6.0.1   -Oz             7264           120
    Clang 6.0.1   -O2 -flto       6040           112
    Clang 6.0.1   -Os -flto       5740           112
    Clang 6.0.1   -Oz -flto       5596           112

     

  5. TYPE_TREE_MENU
    Compiler      Optimizations   Text (bytes)   Data (bytes)
    GCC 7.3.0     -O2             10552          124
    GCC 7.3.0     -Os             10332          124
    GCC 7.3.0     -O2 -flto       8128           116
    GCC 7.3.0     -Os -flto       8112           116
    GCC 8.2.0     -O2             10676          124
    GCC 8.2.0     -Os             10344          124
    GCC 8.2.0     -O2 -flto       8108           116
    GCC 8.2.0     -Os -flto       8068           116
    Clang 6.0.1   -O2             10456          124
    Clang 6.0.1   -Os             10292          124
    Clang 6.0.1   -Oz             10116          124
    Clang 6.0.1   -O2 -flto       8932           116
    Clang 6.0.1   -Os -flto       8424           116
    Clang 6.0.1   -Oz -flto       8240           116

     

  6. LUA
    Compiler      Optimizations   Text (bytes)   Data (bytes)
    GCC 7.3.0     -O2             139344         584
    GCC 7.3.0     -Os             127704         584
    GCC 7.3.0     -O2 -flto       137824         580
    GCC 7.3.0     -Os -flto       125768         580
    GCC 8.2.0     -O2             140156         576
    GCC 8.2.0     -Os             128460         576
    GCC 8.2.0     -O2 -flto       138444         580
    GCC 8.2.0     -Os -flto       126612         580
    Clang 6.0.1   -O2             169728         580
    Clang 6.0.1   -Os             137184         580
    Clang 6.0.1   -Oz             132208         580
    Clang 6.0.1   -O2 -flto       255728         576
    Clang 6.0.1   -Os -flto       144600         576
    Clang 6.0.1   -Oz -flto       132084         576

 

All in all I’d say GCC and Clang are currently pretty much on par, with probably the slightest advantage for GCC when using its LTO.
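To put the effect of LTO into numbers, the following sketch computes the relative change in text size for a few rows from the tables above. The helper savings_percent and the constant names are made up for this example. The input values are copied from the BLINKY and LUA results.

```cpp
#include <cstdint>

// Relative change in percent when going from `before` to `after` bytes.
// Positive values mean the binary got smaller, negative values mean it grew.
constexpr double savings_percent(std::uint32_t before, std::uint32_t after) {
    return 100.0 * (static_cast<double>(before) - static_cast<double>(after))
           / static_cast<double>(before);
}

// BLINKY, GCC 8.2.0: -Os vs. -Os -flto
constexpr double blinky_gcc_os = savings_percent(4044, 2480);

// LUA, GCC 8.2.0: -Os vs. -Os -flto
constexpr double lua_gcc_os = savings_percent(128460, 126612);

// LUA, Clang 6.0.1: -O2 vs. -O2 -flto (the binary grows here)
constexpr double lua_clang_o2 = savings_percent(169728, 255728);
```

For the small BLINKY test LTO saves roughly 39 % of the text size with GCC, while for LUA the gain is only about 1.4 %. With Clang at -O2, enabling LTO makes the LUA binary grow by about 51 %.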