/* Ajith - Syntax Higlighter - End ----------------------------------------------- */

2.19.2008

Compilation process in GCC

Compiling a C program - Compiling a simple C program using gcc.

Above article is a high level view of compliation in gcc. In present article let us see in depth view of the compilation order or flow in gcc which is a default C compiler in GCC.

gcc goes through a sequence of different intermediate steps before generating final executable. Those intermediate steps are the result of different tools which are invoked internally to complete the compilation of the source code.

The whole Compilation process is broken down into following phases:

  • Preprocessing
  • Compilation
  • Assembly
  • Linking

As an example, we will examine these compilation stages individually using the same ‘hello.c’ program given below:

#include <stdio.h>
#define STRING "Hello World"

int main (void)
{
printf ("My First program - %s\n",STRING);
return 0;
}

NOTE: We dont need to manually go through all the intermediate stages to generate a executable using gcc. All these stages are directly taken care transparently by gcc internally, and can be seen using the -v option.

Although 'hello.c' program is very simple it uses external header files and libraries, and so exercises all the major steps of the compilation process. If we compile the 'hello.c' with a simple command namely gcc hello.c then we end-up creating an executable file called a.out.

Back in the days of the PDP computer, a.out stood for "assembler output". Today, it simply means an older executable file format. Modern versions of Unix and Linux use the ELF executable file format. The ELF format is much more sophisticated. So even though the default filename of the output of gcc is "a.out", its actually in ELF format.

The Preprocessor
Basically C Preprocessor is responsible for 3 tasks namely:
  • Text Substitution,
  • Stripping of Comments, and
  • File Inclusion.
Text Substitution and File Inclusion is requested in our source code using Preprocessor Directives.

The lines in our code that begin with the “#” character are preprocessor directives.

In 'hello.c' program the first preprocessor directive requests a standard header file, stdio.h, be included into our source file. The other one requests a string substitution to take place in our code.

So in preprocessor stage those included header files and defined macros are expanded and merged within the source file to produce a transitory source file, which the standard calls a Translation unit. Translation units are also known as compilation units.

The C preprocessor, often known as cpp, is a macro processor that is used automatically by the C compiler to transform a C program before compilation. To perform just preprocessing operation use the following command:
[bash]$ cpp hello.c > hello.i
The result is a file named hello.i which contains the source code with all macros expanded.

By convention, preprocessed files are given the file extension ‘.i’ for C programs and ‘.ii’ for C++ programs.

NOTE: By default the preprocessed file is not saved to disk unless the -save-temps option is used in gcc so we are just using the redirection operator to save a copy of preprocessed file.

By using gcc's “-E” flag we can directly do the pre-processing operation.
[bash]$ gcc -E hello.c -o hello.i
Now we will try to check the contents of the preproccessed input file. Since the stdio.h file is fairly large the resultant output is cleaned up a bit.
# 1 "hello.c"
# 1 "/usr/include/stdio.h" 1 3
# 1 "/usr/include/_ansi.h" 1 3
# 1 "/usr/include/sys/config.h" 1 3
# 14 "/usr/include/sys/config.h" 3
# 25 "/usr/include/sys/config.h" 3
# 44 "/usr/include/sys/config.h" 3

# 40 "/usr/include/stdio.h" 2 3
# 1 "/usr/include/sys/reent.h" 1 3

int main(void){

printf ("My First Program - %s\n", "HELLO WORLD" );
return 0;
}
Since our program has requested the stdio.h header be included into our source which in turn, requested a whole bunch of other header files. So, the preprocessor made a note of the file and line number where the request was made and made this information available to the next steps in the compilation process. Thus, the lines,
# 40 "/usr/include/stdio.h" 2 3
# 1 "/usr/include/sys/reent.h" 1 3
indicates that the reent.h file was requested on line 40 of stdio.h. The preprocessor creates a line number and file name entry before what might be "interesting" to subsequent compilation steps, so that if there is an error, the compiler can report exactly where the error occurred.

The Compiler
The next stage of the process is the actual compilation of the preprocessed source code to assembly language, for a specific processor (Depending upon the target processor architecture the source code is converted into particular assembly language and it can be known as cross-compilation).

By using “-S” flag with gcc we can convert the preprocessed C source code into assembly language without creating an object file:
[bash]$ gcc -Wall -S hello.i -o hello.s
The resulting assembly language is stored in the file ‘hello.s’.

NOTE: The assembly code generated depends upon the PC architecture and other reasons. I am just attaching the assembly code generated with the Cygwin - Linux Emulator.


NOTE: Above Assembly code contains a call to the external printf function.

The Assembler
We know that MACHINES (i.e. a computer) can understand only Machine-Level Code. So we require an ASSEMBLER that converts assembly code in "hello.s" file into machine code. Eventhough it is a straightforward one-to-one mapping of assembly language statements to their machine language counterparts, it is tedious and error-prone if done manually.

NOTE: ASSEMBLER was one of the first software tools developed after the invention of the digital electronic computer.

If there are any calls to external functions in the assembly code, the Assembler leaves the addresses of the external functions undefined, to be filled in later by the Linker.

The Assembler as in gcc can be invoked as shown below.
[bash]$ as hello.s -o hello.o
As with gcc, the output file is specified with the -o option. The resulting file ‘hello.o’ contains the machine level code for 'hello.c' program.

or

By using “-c” flag in gcc we can convert the assembly code into machine level code:
[bash]$ gcc -c hello.c
The Linker
The final stage of the compilation process is producing a single executable program file by linking set of object files. An object file and an executable file come in several formats such as ELF (Executable and Linking Format) and COFF (Common Object-File Format). For example, ELF is used on Linux systems, while COFF is used on Windows systems.

In practice, an executable file requires many external functions from system and C run-time libraries. The linker will resolve all of these dependencies and plug in the actual address of the functions.

The linker also does a few additional tasks for us. It combines our program with some standard routines that are needed to make our program run. For example, there is standard code required at the beginning of our program that sets up the running environment, such as passing in command-line parameters and environment variables. Also, there is code that needs to be run at the end of our program so that it can pass back a return code, among other tasks. It turns out that this is no small amount of code.

Actually the mechanism used internally by gcc to link up different files is a bit complicated. For example, the full command for linking the 'hello.c' program might look as shown below:
[bash]$ ld -dynamic-linker /lib/ld-linux.so.2/usr/lib/crt1.o /usr/lib/crti.o /usr/lib/gcc-lib/i686/3.3.1/crtbegin.o-L/usr/lib/gcc-lib/i686/3.3.1 hello.o -lgcc -lgcc_eh -lc -lgcc -lgcc_eh/usr/lib/gcc-lib/i686/3.3.1/crtend.o /usr/lib/crtn.o
Luckily we are never asked to type the above command directly -- the entire linking process is handled transparently by gcc when invoked as follows:
[bash]$ gcc hello.o
This links the object file ‘hello.o’ to the C standard library, and produces an executable file ‘a.out’.
[bash]$ ./a.out
My First Program - Hello World
An object file for a C++ program can be linked to the C++ standard library in the same way with a single g++ command.

19 comments :

  1. Hi i am vasanth, Nice notes buddy I appreciate your work. Thanks

    ReplyDelete
  2. Rather clear, nice graphs, thanks!

    ReplyDelete
  3. Thanx dude gr8 work ever clear....

    ReplyDelete
  4. your work is really appreciable...
    thnx

    ReplyDelete
  5. Thanks dude! Really helpful!!

    ReplyDelete
  6. Thanks dude!! Really helpful!

    ReplyDelete
  7. thnx it was really helpful

    ReplyDelete
  8. Helpful for me! thanks.

    ReplyDelete
  9. Good explanation keep up good work.

    ReplyDelete
  10. Excellent explanation. Running gcc has never been clearer.

    ReplyDelete
  11. good article; now i want to know more about linux internals

    ReplyDelete
  12. Quite clear and accessible :-D
    btw, some images are dead

    ReplyDelete

Your comments are moderated