
3.04.2012

Memory Layout of a C program - Part 1

A running program is called a process.

When we run a program, its executable image is loaded into an area of memory, normally called the process address space, in an organized manner. It is organized into the following areas of memory, called segments:
  • text segment 
  • data segment 
  • stack segment 
  • heap segment 
Figure 1: Memory layout of a C program

text segment

It is also called the code segment.

This is the area where the compiled code of the program itself resides. This is the machine language representation of the program steps to be carried out, including all functions making up the program, both user defined and system.

Linux/Unix, for example, arranges things so that multiple running instances of the same program share their code if possible: only one copy of the instructions for the same program resides in memory at any time. The text segment is also often read-only, to prevent a program from accidentally modifying its instructions.

The portion of the executable file containing the text segment is the text section.

data segment

initialized data
It contains both static and global data that are initialized with non-zero values. Each process running the same program has its own data segment. This segment can be further classified into an initialized read-only area and an initialized read-write area.

For example, the global string defined by char s[] = "hello world" in C and a C statement like int debug = 1 outside main() (i.e. global) would be stored in the initialized read-write area. A global C statement like const char *string = "hello world" causes the string literal "hello world" to be stored in the initialized read-only area and the character pointer variable string in the initialized read-write area.

The portion of the executable file containing the data segment is the data section.
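The declarations discussed above can be put into a small translation unit to see which objects are writable. This is only a sketch: the variables s, debug and string come from the text, while the helper functions capitalize_s() and repoint_string() are hypothetical names added for illustration. Where exactly each object is placed is up to the compiler and linker.

```c
/* Initialized read-write data: the array itself holds the characters. */
char s[] = "hello world";

/* Initialized read-write data: a global with a non-zero initializer. */
int debug = 1;

/* The pointer variable is read-write data; the literal it points to
   is typically placed in a read-only area by the toolchain. */
const char *string = "hello world";

/* Hypothetical helper: writing to the array is fine because the
   array lives in writable memory. */
void capitalize_s(void)
{
    s[0] = 'H';
}

/* Hypothetical helper: changing the pointer is fine too. Only the
   pointer variable is written; the string literal is never modified. */
void repoint_string(void)
{
    string = s;
}
```

Calling capitalize_s() and then repoint_string() from main() leaves string pointing at the modified array. Writing through a pointer to the literal itself would be undefined behavior and is deliberately not attempted here.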

Uninitialized data - bss segment
BSS stands for "Block Started by Symbol", named after an ancient assembler operator. The uninitialized data area starts at the end of the data segment and contains all global and static variables that are initialized to zero or have no explicit initialization in the source code.

For example, a variable declared static int i and a global variable declared int j would both be contained in the BSS segment.

Each process running the same program has its own BSS area. When running, the BSS data are placed in the data segment.

In the Linux/Unix executable format, only variables that are initialized to a nonzero value occupy space in the executable's disk file. BSS variables are recorded in the BSS section of the executable file, which stores only how much memory they need, not their contents.
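A short sketch of objects that would typically end up in the BSS is given below. The variables i and j are the ones from the text; the array buffer and the function bss_is_zeroed() are hypothetical additions. The C language guarantees that objects with static storage duration and no initializer start out as zero; the placement in the BSS is the usual toolchain behavior, not something the language itself promises.

```c
#include <stddef.h>

static int i;               /* no initializer: typically BSS */
int j;                      /* no initializer: typically BSS */

/* Hypothetical example of a larger object: it needs 4096 bytes at
   run time but usually adds almost nothing to the file on disk. */
static char buffer[4096];

/* Returns 1 if every BSS object above still holds zero. */
int bss_is_zeroed(void)
{
    if (i != 0 || j != 0)
        return 0;
    for (size_t k = 0; k < sizeof buffer; k++) {
        if (buffer[k] != 0)
            return 0;
    }
    return 1;
}
```

On Linux, running the size utility on the compiled object is one way to see the text, data and bss sizes reported separately.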

The remaining two areas of memory, the heap and the stack, are where storage is allocated while the program runs.

heap segment

The heap is where dynamically allocated memory (obtained by malloc(), calloc(), realloc() and, in C++, new) comes from. As memory is allocated on the heap, the process's address space grows. Although it is possible to give memory back to the system and shrink a process's address space, this is often not done: freed memory usually stays with the process so that it can be reused by later allocations.

Freed memory (free() and delete) goes back to the heap. Heap memory does not have to be returned in the same order in which it was acquired (it doesn't have to be returned at all), so calls to malloc and free in arbitrary order can eventually cause heap fragmentation. The heap must keep track of different regions, and whether they are in use or available to malloc. One scheme is to have a linked list of available blocks (the "free store"), with each block handed out by malloc preceded by a size count that goes with it. Some people use the term arena to describe the set of blocks managed by a memory allocator (in SunOS, the area between the end of the data segment and the current position of the break).
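The free-list scheme mentioned above can be sketched as a toy allocator. Everything here is hypothetical and for illustration only (the names toy_alloc, toy_free, toy_block_size, block_t and the 4096-byte arena are assumptions, and a C11 compiler is assumed). It is not how any particular malloc is implemented: it uses first fit, splits blocks, and never merges neighbors, so it also shows how fragmentation can arise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define ARENA_BYTES 4096
#define ALIGN _Alignof(max_align_t)

/* Header that precedes every block, free or in use. */
typedef struct block {
    size_t size;          /* usable bytes that follow the header */
    struct block *next;   /* next free block (meaningful only while free) */
} block_t;

/* Header size rounded up so the payload stays suitably aligned. */
#define HEADER (((sizeof(block_t) + ALIGN - 1) / ALIGN) * ALIGN)

static block_t *free_list;   /* head of the "free store" */
static int arena_ready;

static size_t round_up(size_t n)
{
    return (n + ALIGN - 1) / ALIGN * ALIGN;
}

void *toy_alloc(size_t n)
{
    if (!arena_ready) {
        /* Borrow one region from the real malloc to act as the arena. */
        free_list = malloc(ARENA_BYTES);
        if (free_list == NULL)
            return NULL;
        free_list->size = ARENA_BYTES - HEADER;
        free_list->next = NULL;
        arena_ready = 1;
    }
    if (n == 0 || n > ARENA_BYTES)
        return NULL;
    n = round_up(n);

    /* First fit: walk the free list until a block is large enough. */
    for (block_t **link = &free_list; *link != NULL; link = &(*link)->next) {
        block_t *b = *link;
        if (b->size < n)
            continue;
        if (b->size >= n + HEADER + ALIGN) {
            /* Split: the tail becomes a new, smaller free block. */
            block_t *rest = (block_t *)((unsigned char *)b + HEADER + n);
            rest->size = b->size - n - HEADER;
            rest->next = b->next;
            b->size = n;
            *link = rest;
        } else {
            *link = b->next;   /* hand out the whole block */
        }
        return (unsigned char *)b + HEADER;
    }
    return NULL;   /* no free block is large enough */
}

void toy_free(void *p)
{
    if (p == NULL)
        return;
    block_t *b = (block_t *)((unsigned char *)p - HEADER);
    b->next = free_list;   /* push onto the front; no merging of neighbors */
    free_list = b;
}

/* Reads the size count stored just before the payload. */
size_t toy_block_size(const void *p)
{
    assert(p != NULL);
    return ((const block_t *)((const unsigned char *)p - HEADER))->size;
}
```

The key point is that toy_free() needs no size argument: it steps back from the payload pointer to the header, which is where the size count lives.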

It is typical for the heap to grow upward. This means that successive items added to the heap are placed at addresses numerically greater than those of previous items. It is also typical for the heap to start immediately after the BSS area of the data segment. The end of the heap is marked by a pointer known as the "break" (your programs will "break" if they reference memory past the break).

When the heap manager needs more memory, it can push the break further away using the system calls brk and sbrk. We don't usually call brk ourselves, but if we malloc enough memory, brk will eventually be called.
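For experimentation, the break can be inspected with sbrk(0), which returns its current position. The sketch below assumes a Linux/glibc system where sbrk() is available; the function names current_break() and grow_break() are hypothetical. Moving the break by hand in a program that also uses malloc is not recommended in real code, so this is best treated as a demonstration only.

```c
#define _DEFAULT_SOURCE   /* assumption: asks glibc to declare sbrk() */
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Returns the current program break, or 0 if sbrk() reports an error. */
uintptr_t current_break(void)
{
    void *now = sbrk(0);
    return now == (void *)-1 ? 0 : (uintptr_t)now;
}

/* Tries to push the break 'bytes' further away and returns how far it
   actually moved (0 if the request was refused). */
size_t grow_break(size_t bytes)
{
    void *before = sbrk((intptr_t)bytes);   /* returns the old break */
    if (before == (void *)-1)
        return 0;
    void *after = sbrk(0);
    if (after == (void *)-1)
        return 0;
    return (size_t)((uintptr_t)after - (uintptr_t)before);
}
```

Printing current_break() before and after a series of malloc calls is one way to watch an allocator request more memory, although many allocators also obtain large blocks by other means, so the break does not necessarily move on every allocation.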

stack segment

The stack segment is where local (automatic) variables are allocated. In a C program, local variables are all variables declared inside the opening left curly brace of a function body (including main()) or of any other block, provided they are not declared static. Data is pushed onto and popped off the stack following the Last In, First Out (LIFO) rule.

On the standard PC x86 computer architecture the stack grows toward address zero; on some other architectures it grows in the opposite direction.

A “stack pointer” register tracks the top of the stack; it is adjusted each time a value is “pushed” onto the stack. When a function is called, a stack frame (or procedure activation record) is created and pushed onto the top of the stack. This stack frame contains information such as the address to jump back to when the function is finished (the return address), parameters, local variables, and any other information needed by the invoked function. The order of the information may vary by system and compiler.

When a function returns, the stack frame is POPped from the stack.  Typically the stack grows downward, meaning that items deeper in the call chain are at numerically lower addresses and toward the heap.
Figure 2: Stack layout

When a program begins executing in the function main(), space is allocated on the stack for all variables declared within main(), as seen in Figure 2(a). If main() calls a function, func1(), additional storage is allocated for the variables in func1() at the top of the stack, as shown in Figure 2(b). Notice that the parameters passed by main() to func1() are also stored on the stack.

If func1() were to call any additional functions, storage would be allocated at the new top of the stack, as seen in the figure. When func1() returns, storage for its local variables is deallocated, and the top of the stack returns to the position shown in Figure 2(c). If main() were to call another function, storage would be allocated for that function at the top shown in the figure. As can be seen, the memory allocated in the stack area is used and reused during program execution. It should be clear that memory allocated in this area will contain garbage values left over from previous usage.
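The direction of stack growth described above can be probed by recording the address of a local variable at several call depths. This is a rough sketch with hypothetical names (record, stack_direction, DEPTH); the call goes through a volatile function pointer only to discourage the compiler from inlining the recursion. The result depends on the architecture and compiler, so no particular value is promised here.

```c
#include <assert.h>
#include <stdint.h>

#define DEPTH 8

static uintptr_t frame_addr[DEPTH];   /* one recorded address per depth */

static void record(int depth);
/* Calling through a volatile pointer keeps each call a real call. */
static void (*volatile record_ptr)(int) = record;

static void record(int depth)
{
    volatile char local = 0;          /* lives in this call's frame */
    assert(depth >= 0 && depth < DEPTH);
    frame_addr[depth] = (uintptr_t)&local;
    if (depth + 1 < DEPTH)
        record_ptr(depth + 1);
    local = 1;   /* work after the call, so it is not a tail call */
}

/* Returns  1 if deeper calls are at lower addresses (stack grows down),
           -1 if deeper calls are at higher addresses (stack grows up),
            0 if the recorded addresses show no consistent direction. */
int stack_direction(void)
{
    int down = 0, up = 0;

    record_ptr(0);
    for (int d = 1; d < DEPTH; d++) {
        if (frame_addr[d] < frame_addr[d - 1])
            down++;
        else if (frame_addr[d] > frame_addr[d - 1])
            up++;
    }
    if (down == DEPTH - 1)
        return 1;
    if (up == DEPTH - 1)
        return -1;
    return 0;
}
```

On an x86 PC, where the text says the stack grows toward address zero, one would expect stack_direction() to report a downward-growing stack.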

Although it is theoretically possible for the stack and heap to grow into each other, the operating system prevents that event.

See Part 2 for a detailed look at these sections with examples.

The relationship among the different sections/segments is summarized below.

Figure 3: Relationship between Sections and Segments