April 2008 ~ codingfreak

Compilers have become quite smart: they can perform all sorts of code transformations, from simple inlining to sophisticated register analysis, that make compiled code run faster.

In most situations, faster is better than smaller, because disk space and memory are quite cheap for desktop users. However, for embedded systems small is often at least as important as fast, because such systems commonly have severe memory constraints and no disk space. This makes code optimization a very important task.

Optimization is a complex process. For each high-level statement in the source code there are usually many possible combinations of machine instructions that can achieve the same final result. The compiler must consider these possibilities and choose among them.
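
As a small illustration of that choice, here is a sketch with two made-up functions (times_eight_mul and times_eight_shift are names invented for this example). Both compute the same value for unsigned integers, and the compiler is free to emit a multiply, a shift, or some other instruction sequence for either one.

```c
/* Two hypothetical ways to write "multiply by eight" in C.
   For unsigned int they always give the same result, so the
   compiler may pick whichever machine instructions it prefers. */
unsigned int times_eight_mul(unsigned int x)
{
    return x * 8u;      /* written as a multiplication */
}

unsigned int times_eight_shift(unsigned int x)
{
    return x << 3;      /* written as a left shift by three bits */
}
```

To see what the compiler actually chose, the file can be compiled with the -S option (for example gcc -O0 -S and then gcc -O2 -S) and the two generated assembly files compared. The exact output depends on the GCC version and the target processor.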

In general, different code must be generated for different processors, as they use incompatible assembly and machine languages. Each type of processor also has its own characteristics -- some CPUs provide a large number of registers for holding intermediate results of calculations, while others must store and fetch intermediate results from memory. Appropriate code must be generated in each case.

Furthermore, different amounts of time are needed for different instructions, depending on how they are ordered. GCC takes all these factors into account and tries to produce the fastest executable for a given system when compiling with optimization.

Turning optimization flags ON makes the compiler attempt to improve the performance and/or code size at the expense of compilation time and possibly the ability to debug the program.

NOTE: While the GCC optimizer does a good job of code optimization, it can sometimes result in larger or slower images (the opposite of what you may be after). It’s important to test your image to ensure that you’re getting what you expect. When you don’t get what you expect, changing the options you provide to the optimizer can usually remedy the situation.
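
One rough way to test an image is to time a piece of it under different options. The sketch below is only an illustration under assumptions: workload() and time_workload() are hypothetical names, and the loop stands in for whatever code you really care about.

```c
#include <time.h>

/* Hypothetical workload: adds up the integers 0 .. n-1. */
static unsigned long workload(unsigned long n)
{
    unsigned long total = 0;
    unsigned long i;

    for (i = 0; i < n; i++)
        total += i;
    return total;
}

/* Returns the CPU time, in seconds, used by one call to workload(n). */
double time_workload(unsigned long n)
{
    /* Storing into a volatile variable keeps the result from
       being discarded as unused. */
    volatile unsigned long sink;
    clock_t start, end;

    start = clock();
    sink = workload(n);
    end = clock();

    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Build the same source at each level you are considering and compare the reported times. The size utility from binutils can be used in the same way to compare image sizes. Keep in mind that an optimizer may simplify a toy loop like this one heavily, so real measurements should use the real code.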

Let us look at the various optimization levels provided by GCC.

LEVEL 0: -O0 Optimization
At this optimization level GCC does not perform any optimization and compiles the source code in the most straightforward way possible. Each statement in the source code is converted directly to the corresponding instructions in the executable file, without rearrangement. This is the best option to use when debugging with a source code debugger (such as the GNU Debugger, GDB). It is the default level of optimization if no optimization level option is specified. It can be specified as:

[bash]$ gcc -O0 hello.c -o hello
or
[bash]$ gcc hello.c -o hello

NOTE: -O0 is actually -(Capital O)(Number 0). Similarly -O1 is -(Capital O)(Number 1).

LEVEL 1: -O1 Optimization (-O)
At the first level of optimization, the optimizer tries to reduce both the size of the compiled code and its execution time, without performing optimizations that take a great deal of compilation time. Compilation may take longer with -O1 than with -O0, but depending upon the source being compiled, this is usually not noticeable. The two goals, smaller code and faster code, sometimes conflict; the set of optimizations provided in -O1 supports both in most cases.

The -O1 level is usually a safe choice if you still want to debug the resulting image. Check out the table given below for optimizations enabled at different levels.

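NOTE: Any optimization can also be requested individually, outside of any level, by giving its name with the -f prefix. In recent GCC releases, however, most optimizations stay disabled unless an -O level is also given on the command line. For example, to request the defer-pop optimization on its own, we would write: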

[bash]$ gcc -fdefer-pop hello.c -o hello

We could also enable level 1 optimization and then disable any particular optimization using the -fno- prefix, like this:

[bash]$ gcc -O1 -fno-defer-pop -o test test.c

This command would enable the first level of optimization and then specifically disable the defer-pop optimization.

LEVEL 2: -O2 Optimization
The second level of optimization enables everything in -O1 plus a large number of additional optimizations. Only optimizations that do not require a speed-space tradeoff are used, so the executable should not increase in size. The compiler will take longer to compile programs and require more memory than with -O1. This option is generally the best choice for deployment of a program, because it provides maximum optimization without increasing the executable size. It is the default optimization level for releases of GNU packages. The second level is enabled as shown below:

[bash]$ gcc -O2 hello.c -o hello

LEVEL 2.5: -Os Optimization
The special optimization level (-Os or size) enables all -O2 optimizations that do not increase code size; it puts the emphasis on size over speed. This includes all second-level optimizations, except for the alignment optimizations. The alignment optimizations skip space to align functions, loops, jumps and labels to an address that is a multiple of a power of two, in an architecture-dependent manner. Skipping to these boundaries can increase performance as well as the size of the resulting code and data spaces; therefore, these particular optimizations are disabled.

The -Os optimization level simply disables some -O2 optimizations such as -falign-loops, -falign-jumps, -falign-labels, and -falign-functions. Each of these has the potential to increase the size of the resulting image, and therefore they are disabled to help build a smaller executable. The size optimization level is enabled as:

[bash]$ gcc -Os hello.c -o hello

In gcc 3.2.2, reorder-blocks is enabled at -Os, but in gcc 3.3.2 reorder-blocks is disabled.
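
If you want to confirm from inside the code which mode a file was built in, GCC predefines the macro __OPTIMIZE__ in every optimizing compilation and __OPTIMIZE_SIZE__ when optimizing for size. The sketch below uses them; the function name optimization_mode and its return codes are made up for this example.

```c
/* Hypothetical helper: reports how this file was compiled.
   Returns 0 for no optimization, 1 for an optimizing build,
   and 2 for a build that optimizes for size (-Os). */
int optimization_mode(void)
{
#if defined(__OPTIMIZE_SIZE__)
    return 2;   /* checked first, since __OPTIMIZE__ is also set with -Os */
#elif defined(__OPTIMIZE__)
    return 1;
#else
    return 0;
#endif
}
```

Compiled with -O0 this should return 0, with -O1, -O2 or -O3 it should return 1, and with -Os it should return 2.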

LEVEL 3: -O3 Optimization
The third and highest level enables even more optimizations by putting emphasis on speed over size. This includes the optimizations enabled at -O2 plus rename-registers. The inline-functions optimization is also enabled here, which can increase performance but can also drastically increase the size of the object, depending upon the functions that are inlined. The third level is enabled as:

[bash]$ gcc -O3 hello.c -o hello

Although -O3 can produce fast code, the increase in the size of the image can have adverse effects on its speed. For example, if the size of the image exceeds the size of the available instruction cache, severe performance penalties can be observed. Therefore, it may be better simply to compile at -O2 to increase the chances that the image fits in the instruction cache.
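
To make the inlining tradeoff concrete, here is a minimal sketch with hypothetical names (square and sum_of_squares are not from any real project). A small helper called inside a loop is a typical candidate for inlining: copying its body into the caller removes the call overhead, but every copy adds to the code size.

```c
#include <stddef.h>

/* Small helper: a typical inlining candidate. */
static long square(long x)
{
    return x * x;
}

/* Calls square() once per element. If the compiler inlines it,
   the loop body contains the multiplication directly instead of
   a function call. */
long sum_of_squares(const int *values, size_t count)
{
    long total = 0;
    size_t i;

    for (i = 0; i < count; i++)
        total += square(values[i]);
    return total;
}
```

Compiling this with -S at different levels and looking for a call to square in the assembly shows whether inlining happened. Which level first inlines such a helper varies between GCC versions, so treat the result as specific to your compiler.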

Architecture Specification
The optimizations discussed thus far can yield significant improvements in software performance and object size, but specifying the target architecture also can yield meaningful benefits.

The -march option of gcc allows the CPU type to be specified.

The default architecture is i386. Code compiled for i386 runs on all other i386/x86 processors, but it can perform worse on more recent processors than code built specifically for them. If you're concerned about portability of an image, you should compile it with the default. If you're more interested in performance, pick the architecture that matches your own.
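
One visible effect of -march is on the macros GCC predefines for the selected CPU. On x86 targets, __SSE2__ is defined when SSE2 instructions are enabled for code generation. The sketch below relies on that; built_with_sse2 is a name invented for this example.

```c
/* Hypothetical helper: returns 1 if this file was compiled with
   SSE2 code generation enabled, 0 otherwise. */
int built_with_sse2(void)
{
#ifdef __SSE2__
    return 1;
#else
    return 0;
#endif
}
```

On a 32-bit x86 target, building with -march=i386 should give 0, while -march=pentium4 should give 1, since that CPU type includes SSE2. On x86-64 targets SSE2 is normally enabled by default, so the function returns 1 there regardless.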


4.03.2008

Optimization levels in GCC