CFLAGS

What are CFLAGS?
CFLAGS are C compiler flags, usually GCC (GNU Compiler Collection) options. CFLAGS can be used to customize and optimize applications when you build them from source. This is an important feature when using Gentoo Linux as the majority of packages are built from source code.

CFLAGS are commonly used to specify the architecture of your computer, as well as the CPU you are using and any other special options you would like to enable or disable. This information is important to GCC because it tells it exactly how to customize the assembly instructions it creates from the application's source code. If you're the impatient type and wish to get started immediately, you can check out the Safe CFLAGs Guide.

Selecting the best CFLAGS for your system
Technically GCC could run without any CFLAGS, since they are extra options, but the whole premise behind Gentoo is to custom compile packages for your particular system, which almost always gives you somewhat improved speed.

Basic (safe) optimizations
There are several basic optimizations considered safe for general use. The first is -O (a capital letter O, not the digit zero). To see exactly which optimization flags each level enables, and which other optimizations can be enabled individually, read the GCC man page or the online documentation.

-O
-O turns on some basic optimizations that don't greatly impact compile time but can noticeably improve the speed of the compiled code. There is usually no reason to use -O instead of -O2, though.

-O2
-O2 turns on all -O optimizations plus nearly all other optimizations that do not trade binary size for speed. -O2 generally produces faster code than -O and is usually just as safe. This is the optimization level most commonly used for packages and distributions in the Linux world and for the Linux kernel. If you want a fast system that "just works", -O2 is probably good for you.

-O3
-O3 turns on all -O2 optimizations and also some optimizations that increase binary size and make debugging harder or even impossible. Using -O3 as your default optimization level might be a bad idea; see the "-O2 or -O3?" section below.

-Os
-Os optimizes for size. -Os enables all -O2 optimizations that do not usually increase code size and performs further optimizations designed to reduce code size.

-Os is very useful for large applications, like Firefox, as it can reduce load time, memory usage, cache misses, disk usage and so on. Because of this, code compiled with -Os can be faster than code compiled with -O2 or -O3. It's also recommended for older computers with little RAM, disk space or CPU cache. Beware, however, that -Os is not as well tested as -O2 and might trigger compiler bugs.

Note that only one of the above flags may be chosen. If you choose more than one, the last one specified will have the effect.
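The last-flag-wins rule can be sketched in shell. The helper name effective_opt_level is hypothetical and made up for this illustration; it only mimics how GCC chooses among several -O options and does not call the compiler.

```shell
# Hypothetical helper: print the -O option that would take effect,
# i.e. the last one in the argument list (earlier ones are overridden).
effective_opt_level() {
    last=""
    for flag in "$@"; do
        case "$flag" in
            -O*) last="$flag" ;;
        esac
    done
    printf '%s\n' "$last"
}

effective_opt_level -O3 -pipe -Os -O2   # prints -O2
```

This matters in practice when a build system appends its own -O flag after the flags you set: the one that comes last is the one that counts.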

-fomit-frame-pointer
-fomit-frame-pointer tells GCC not to keep a frame pointer in a register for functions that don't need one, freeing up an additional register on the CPU. This is mainly useful on x86, since most other arches, like AMD64, already have it on by default at -O2 or greater. Binary size may increase slightly. This flag breaks debugging on x86 and possibly other arches unless you're compiling with GCC 4.x and the -fvar-tracking flag.

Glibc has a USE flag that makes it build with -fomit-frame-pointer and some other optimizations wherever that is safe. To read more about optimising glibc try this HOWTO.

-march=
-march tells GCC to generate code for a specific CPU type. Basically, you just need to know what your CPU is and the GCC name for it. Remember that the resulting binaries may not run on other CPUs!

This flag takes the following form: -march=pentium4

Of course you want to replace pentium4 with whatever CPU you're actually using.

Check the gcc manual page (for your version) for a complete list (under Hardware Models and Configurations).

If you are using gcc-4.2.2 or newer you can also use -march=native or -mtune=native.

To see which target options -march=native enables on your machine, run:

gcc -c -Q -march=native --help=target

You can also inspect the options that GCC passes to the compiler proper:

gcc -### -march=native -E /usr/include/stdlib.h 2>&1 | grep "/usr/libexec/gcc/.*cc1"

See the GCC documentation for details. This comes in especially handy if you have an Intel Core* CPU and are planning to switch between gcc-4.2 and gcc-4.3.

-mtune=/-mcpu=
-mtune, or -mcpu in older versions of GCC, is similar to -march and accepts the same values. Unlike -march, it only tunes the code for the given CPU and does not use instructions that older CPUs lack, so it doesn't break compatibility with older arches. The -march and -mtune/-mcpu options can be mixed to get the desired effect. If you aren't going to share your binaries with other computers, you don't need this flag and should only set an appropriate -march instead, which also tunes for that CPU. The exception is arches like PPC where -march isn't available. See the Safe CFLAGs Guide for more info.
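As a rough sketch of how the two flags combine, the following make.conf-style lines put two setups side by side: one for a machine that only builds for itself, and one for binaries shared with other x86-64 machines. The variable names LOCAL_CFLAGS and SHARED_CFLAGS are hypothetical and exist only for this comparison. x86-64 and generic are values GCC accepts on x86-64, but the right choice depends on the CPUs you need to support.

```shell
# Built and run on the same machine: -march=native lets GCC use every
# instruction the local CPU supports (and tunes for that CPU as well).
LOCAL_CFLAGS="-O2 -march=native"

# Shared with other x86-64 machines: -march restricts the instruction set
# to the common baseline, while -mtune only adjusts tuning and does not
# change which CPUs can run the result.
SHARED_CFLAGS="-O2 -march=x86-64 -mtune=generic"

# Pick one of the two for the actual build.
CFLAGS="${LOCAL_CFLAGS}"
```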

GCC documentation about -march and -mcpu/-mtune: x86/x86-64, PPC

-pipe
The final and most common flag is -pipe. It tells GCC to use pipes rather than temporary files to pass data between the stages of compilation, which saves some compile time. Be aware that -pipe makes GCC use more RAM, so don't use it if you have very little.
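Putting the basic flags together, a conservative setup might look like the excerpt below. This is a minimal sketch that assumes the usual Gentoo location /etc/portage/make.conf, a GCC new enough to accept -march=native, and that C++ packages should get the same flags (the CXXFLAGS line is not discussed in this article).

```shell
# /etc/portage/make.conf (excerpt)
# -O2: the commonly used safe optimization level
# -march=native: generate code for the CPU of this machine
# -pipe: use pipes instead of temporary files between compiler stages
CFLAGS="-O2 -march=native -pipe"

# Assumption: build C++ code with the same flags as C code.
CXXFLAGS="${CFLAGS}"
```

If the binaries have to run on other machines, replace -march=native with the name of the oldest CPU you need to support.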

-O2 or -O3?
This section aims to be a balanced appraisal of these two optimization levels, since for most people running Gentoo Linux the choice falls between them. There is a lot of discussion about the merits of each, with data showing better performance one way or the other. There is no straight answer as to which is best, so this section does not give one; it merely describes the merits and faults of each level. As always, use the CFLAGS best suited to your computer and how you intend to use it.

Some general pointers
If you're new to Linux, use -O2. You can always experiment later.

If you are running a server, use -O2. System reliability might suffer with -O3.

If you want a system that just works, use -O2.

If you are running a legacy system (Pentium equivalent or older), check out -Os above. It is generally recommended for these chips because their cache is exceedingly small by modern standards.

It is very easy to get a system built using -O3 instead of -O2. A lot of packages that break with -O3 already replace the flag with -O2.

-O3 could be used by those feeling a bit more adventurous and willing to experiment and get their hands dirty.

-O3 makes debugging a lot harder, so it should not be used if you intend to do programming or development. For this and other reasons, most Linux developers use -O2.

The kernel is compiled using -O2. This is not affected if you place -O3 in your CFLAGS.

What effects -O3 has
-O3 is the highest optimization level and can produce faster code, but very few applications benefit from it, usually image and video decoders and the like. The side effects, such as larger binary size, affect everything. Larger binaries use more memory, load more slowly, cause more disk I/O, and so on. Compiling a whole system with -O3 therefore makes a few applications run slightly faster at the expense of the rest of the system running slightly slower and becoming less responsive.
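One way to check the size effect on your own code is to build the same source file at each level and compare the results. The helper below is a minimal sketch: the name compare_opt_sizes is hypothetical, it assumes gcc and mktemp are installed, and it reports the size of the whole output file, which is only a rough proxy for code size.

```shell
# Hypothetical helper: build one C file at several optimization levels
# and report the size of each resulting binary in bytes.
compare_opt_sizes() {
    src="$1"
    outdir="$(mktemp -d)" || return 1
    for level in -O2 -O3 -Os; do
        out="$outdir/a$level"
        if gcc "$level" -o "$out" "$src"; then
            printf '%s\t%s bytes\n' "$level" "$(wc -c < "$out" | tr -d ' ')"
        else
            printf '%s\tbuild failed\n' "$level"
        fi
    done
    rm -rf "$outdir"
}

# Usage: compare_opt_sizes path/to/program.c
```

For a tiny test program the three sizes are often nearly identical; the differences tend to show up only in larger programs.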

Linux caches regularly used programs and files in RAM (that's the "cache" part when you run free -m on the command line), so the program may only need loading from the hard disk once (depending on the program and computer usage). Therefore this is less of a problem on systems with large amounts of RAM. A large CPU cache also helps as it is better suited to larger binaries, so you are more likely to see some sort of speed up. So if you have a high-end system, you will suffer less from the problems associated with -O3.

The choice: Summarised
-O2 is the usual default choice. It is ideal for desktop systems because its small binaries load faster from disk, use less RAM and cause fewer cache misses on modern CPUs. Servers should also use -O2, since it is considered to be the highest reliable optimization level.

-O3 often degrades overall performance, as it produces larger binaries which use more disk, RAM and CPU cache space. Only specific applications, such as python and sqlite, gain improved performance. This optimization level also greatly increases compilation time.

-Os should be used on systems with limited CPU cache (old CPUs, Atom and similar). Large executables such as Firefox may become more responsive when built with -Os.

Some other points
As an interesting aside, AMD recommends using function inlining with their processors (Athlon and up).

If you do encounter problems compiling with -O3, please report them to the GCC developers to help improve the reliability and speed of this optimisation level. They can't do this without your feedback. It is, after all, intended to produce faster code.

CFLAGS for developers
-Wall
 * This enables all the warnings about constructions that some users consider questionable, and that are easy to avoid (or modify to prevent the warning), even in conjunction with macros.

-Wextra
 * Enables some extra warnings that are not enabled by -Wall.

-Werror
 * Turn warnings into errors, so GCC will stop compiling after a warning (because it becomes an error).

-ggdb, -ggdb3
 * Include as much debugging information as possible in the generated binary that is useful for gdb. Level 3 includes macros.
 * NOTE: Do not compile glibc with -ggdb3. See http://bugs.gentoo.org/112444 for more.
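The developer flags above are usually combined. The lines below are a minimal sketch of two such sets; the variable names DEV_CFLAGS and STRICT_CFLAGS and the file name demo.c are placeholders chosen for this illustration.

```shell
# Flags for a development build: keep -O2, add warnings and gdb debug info.
DEV_CFLAGS="-O2 -Wall -Wextra -ggdb"

# Stricter variant: every warning becomes an error and stops the build.
STRICT_CFLAGS="${DEV_CFLAGS} -Werror"

# Example invocation (demo.c is a placeholder file name):
#   gcc $STRICT_CFLAGS -o demo demo.c
```

-Werror is kept out of the first set because a newer compiler may introduce new warnings, which would then break a build that used to pass.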

What about all the other flags?
GCC has well over a hundred individual optimization flags, and it would be beyond the scope of this article to describe them all. Most of the important ones, however, are already enabled by -O2. If you're feeling really ambitious, you can look up every single GCC optimization option at gcc.gnu.org.

There is a grossly incomplete CFLAGS matrix covering people's experiences with some of the less tested CFLAGS.

I'll probably add more to this document at a later time. Have fun!