Hardware CFLAGS

This guide describes how to determine the appropriate machine (-m*) compiler flags for your processor. It assumes that you have already read and followed the guidelines in the Safe Cflags tutorial. Because it relies on -march=native to detect the available processor features, this guide is largely amd64- and x86-centric.

Why would I want to use these flags?
You paid good money for that processor, so you might as well use it to its full potential. :) The various -march CPU types available in GCC represent the lowest common denominator for a particular processor family: each one enables only the features that are common to every CPU in that family. For example, -march=core2 enables MMX, SSE, SSE2, SSE3, and SSSE3. A Core i7 processor also supports SSE4.1, SSE4.2, and AES instructions, but they will not be used unless explicitly enabled.

Using -march=native instead
The -march=native option checks which instruction sets the processor supports and enables them automatically. There are very few reasons not to use it, so use it whenever you can. The main exceptions are:


 * distcc
 * You cannot use -march=native on a distributed compiler network. Processor features are detected on the machine that performs the compilation, so a node may generate code containing instructions that the host system does not support and therefore cannot execute. distcc will not distribute work if it encounters -march=native.


 * -mtune=generic
 * By default, -mtune is set to the same value as -march. However, because GCC lacks a proper cost model for Intel Core and later processors, GCC 4.5 and earlier actually generate better code with -mtune=generic than with the corresponding processor-specific -mtune option [1]. The obvious workaround is to specify "-march=native -mtune=generic", but this drops some flags that would otherwise be set (in particular the --param flags).


 * [1] http://gcc.gnu.org/PR45483#c3

Determining available processor features
$ echo "" | gcc -march=native -v -E - 2>&1 | grep cc1

This should give you something like the following:

/usr/libexec/gcc/x86_64-unknown-linux-gnu/4.5.1/cc1 -E -quiet -v - -D_FORTIFY_SOURCE=2 -march=core2 -mcx16 -msahf -msse4.1 --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=6144 -mtune=core2
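Picking the relevant flags out of that line by eye is error-prone, so here is a minimal sketch of doing it in the shell. The function name extract_machine_flags is hypothetical (it is not part of GCC or any Gentoo tool), and the sketch assumes the cc1 line contains no quoted arguments with embedded spaces. It keeps every -m* flag and every --param pair and discards the rest.

```shell
# Hypothetical helper (not a GCC tool): keep only the -m* flags and
# --param pairs from a cc1 command line, in their original order.
# The body runs in a subshell so "set -f" and the variables stay local.
extract_machine_flags() (
    set -f                      # no globbing while the line is word-split
    out=""
    take_next=0
    for word in $1; do
        if [ "$take_next" -eq 1 ]; then
            out="$out $word"    # the value that follows a bare --param
            take_next=0
            continue
        fi
        case "$word" in
            --param)      out="$out $word"; take_next=1 ;;
            --param=*|-m*) out="$out $word" ;;
        esac
    done
    printf '%s\n' "${out# }"    # strip the leading space
)

# The sample line shown above, used as input.
sample='/usr/libexec/gcc/x86_64-unknown-linux-gnu/4.5.1/cc1 -E -quiet -v - -D_FORTIFY_SOURCE=2 -march=core2 -mcx16 -msahf -msse4.1 --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=6144 -mtune=core2'
extract_machine_flags "$sample"
```

On a real system the input would come from the gcc command shown earlier instead of the sample variable.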

Edit your CFLAGS accordingly:

CFLAGS="-O2 -pipe -march=core2 -mcx16 -msahf -msse4.1 --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=6144 -mtune=generic"

Ta-da.
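The only change between the detected flags and the CFLAGS line above is the -mtune value. As a small sketch, that substitution can be scripted with sed; the function name retune_generic and the variable names are made up for this illustration.

```shell
# Hypothetical helper: replace whatever -mtune value GCC reported with
# -mtune=generic. A flag string without -mtune is passed through unchanged.
retune_generic() {
    printf '%s\n' "$1" | sed 's/-mtune=[^ ]*/-mtune=generic/'
}

# The machine flags taken from the example cc1 line.
native_flags='-march=core2 -mcx16 -msahf -msse4.1 --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=6144 -mtune=core2'

# Prepend the generic optimisation flags, as in the CFLAGS line above.
NEW_CFLAGS="-O2 -pipe $(retune_generic "$native_flags")"
printf 'CFLAGS="%s"\n' "$NEW_CFLAGS"
```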

Determining what -march= actually does
One disadvantage of spelling everything out like this is that you can run into problems when switching to an older GCC version that does not support some of these flags; the usual symptom is the vague "C compiler cannot create executables" error. If you switch GCC versions frequently, this can become very annoying. One way to minimize the chances of this happening is to drop any redundant flags, i.e. flags that are already enabled by the -march flag.

$ echo "int main() { return 0; }" | gcc -march=core2 -v -Q -x c - 2>&1

[...]
GNU C (Gentoo 4.5.1-r1 p1.3, pie-0.4.5) version 4.5.1 (x86_64-unknown-linux-gnu)
        compiled by GNU C version 4.5.1, GMP version 5.0.1, MPFR version 3.0.0-p3, MPC version 0.8.2
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
options passed: -v - -D_FORTIFY_SOURCE=2 -march=core2
options enabled: -falign-loops -fargument-alias -fasynchronous-unwind-tables -fauto-inc-dec -fbranch-count-reg -fcommon -fdelete-null-pointer-checks -fdwarf2-cfi-asm -fearly-inlining -feliminate-unused-debug-types -ffunction-cse -fgcse-lm -fident -finline-functions-called-once -fira-share-save-slots -fira-share-spill-slots -fivopts -fkeep-static-consts -fleading-underscore -fmath-errno -fmerge-debug-strings -fmove-loop-invariants -fpeephole -freg-struct-return -fsched-critical-path-heuristic -fsched-dep-count-heuristic -fsched-group-heuristic -fsched-interblock -fsched-last-insn-heuristic -fsched-rank-heuristic -fsched-spec -fsched-spec-insn-heuristic -fsched-stalled-insns-dep -fshow-column -fsigned-zeros -fsplit-ivs-in-unroller -ftrapping-math -ftree-cselim -ftree-forwprop -ftree-loop-im -ftree-loop-ivcanon -ftree-loop-optimize -ftree-parallelize-loops= -ftree-phiprop -ftree-pta -ftree-reassoc -ftree-scev-cprop -ftree-slp-vectorize -ftree-vect-loop-version -funit-at-a-time -funwind-tables -fvar-tracking -fvar-tracking-assignments -fvect-cost-model -fzero-initialized-in-bss -m128bit-long-double -m64 -m80387 -maccumulate-outgoing-args -malign-stringops -mcx16 -mfancy-math-387 -mfp-ret-in-387 -mfused-madd -mglibc -mieee-fp -mmmx -mno-sse4 -mpush-args -mred-zone -msahf -msse -msse2 -msse3 -mssse3 -mtls-direct-seg-refs
[...]

(BTW, this is also a useful trick for finding out which flags the different -O levels enable.)

Here we can see that -march=core2 already enables both -mcx16 and -msahf but not -msse4.1. So the final result would be:

CFLAGS="-O2 -pipe -march=core2 -msse4.1 --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=6144 -mtune=generic"
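The pruning step can also be sketched in the shell: compare each candidate flag against the "options enabled" list and keep only the ones that are not already there. The function name drop_redundant and the variable names are hypothetical, and the sketch assumes both lists are plain space-separated strings. The enabled list below is the -m* portion of the example output for -march=core2.

```shell
# Hypothetical helper: print the flags from $1 that do not appear in the
# "options enabled" list given as $2. Runs in a subshell to keep state local.
drop_redundant() (
    set -f                          # no globbing while $1 is word-split
    out=""
    for flag in $1; do
        case " $2 " in
            *" $flag "*) ;;         # already implied by -march, drop it
            *) out="$out $flag" ;;  # not implied, keep it
        esac
    done
    printf '%s\n' "${out# }"
)

# Candidate flags: the spelled-out machine flags from earlier in this guide.
candidates='-march=core2 -mcx16 -msahf -msse4.1 --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=6144 -mtune=generic'

# The -m* entries from the "options enabled" output of -march=core2.
enabled='-m128bit-long-double -m64 -m80387 -maccumulate-outgoing-args -malign-stringops -mcx16 -mfancy-math-387 -mfp-ret-in-387 -mfused-madd -mglibc -mieee-fp -mmmx -mno-sse4 -mpush-args -mred-zone -msahf -msse -msse2 -msse3 -mssse3 -mtls-direct-seg-refs'

drop_redundant "$candidates" "$enabled"
```

Matching whole, space-delimited words matters here: a plain substring test would wrongly treat -msse4.1 as covered by nothing but could confuse -msse with -msse2, so the pattern pads both the list and the flag with spaces.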