
Tuesday 1 October 2013

Memory layout of a C program

Below is the typical memory layout of a C program.

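  Higher-numbered address
  ----------------------------------------
   Command line arguments and
   environment variables
  ----------------------------------------
   Stack
     |
     V

     ^
     |
   Heap
  ----------------------------------------
   Uninitialized data segment
  ----------------------------------------
   Initialized data segment
  ----------------------------------------
   ROdata
  ----------------------------------------
   Text segment
  ----------------------------------------
  Lower-numbered address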


Let us use the example C program below to understand the memory layout of a C program.


  /* mem_layout.c */
  #include <stdio.h>
  #include <stdlib.h>

  /* initialized and uninitialized globals */
  int init_global = 10, uninit_global;

  /* static global variable */
  static int static_variable = 100;

  /* string constant */
  const char str[] = "see-programming";

  /* numeric constant */
  const int num = 10;

  void fun2() {
        int *ptr, num1, num2;
        /* dynamic memory allocation */
        ptr = (int *) malloc(sizeof(int));
        *ptr = 100;
        free(ptr);
        return;
  }

  void fun1() {
        fun2();
        return;
  }

  int main() {
        fun1();
        return 0;
  }



Let us use GDB (the GNU debugger) to poke around inside the C program while it is executing.  GDB works on the executable file, so when compiling the C program we need to use the -g option to produce debugging information for GDB.


  jp@jp-VirtualBox:~/$ gcc -g mem_layout.c -o memlayout
  jp@jp-VirtualBox:~/$ ls
  memlayout     mem_layout.c
  jp@jp-VirtualBox:~/$ gdb ./memlayout 
  GNU gdb (GDB) 7.2-ubuntu
  (gdb) break fun2
  Breakpoint 1 at 0x80484a8: file mem_layout.c, line 36.
  (gdb) run
  Starting program: /home/jp/cpgms/lab_pgms/temp/memlayout "INDIA"

We have created an executable named memlayout and loaded it in GDB.  We have also set a breakpoint at the function fun2() (break fun2 pauses execution at the given location).


Text Segment:
It holds the machine code that the CPU executes.  To prevent the machine code from being overwritten, for example by a stack or heap overflow, the text segment is kept read-only.  It is located below the stack and heap segments.  The symbols corresponding to the functions in the given C program refer to addresses in the text segment, and those addresses can be accessed using function pointers.

Let us find the address of the various functions in our C program using GDB.
  (gdb) print main
  $6 = {int ()} 0x804842c <main>
  (gdb) print fun1
  $7 = {void ()} 0x804841f <fun1>
  (gdb) print fun2
  $8 = {void ()} 0x80483f4 <fun2>


RO Data Segment:
It is also called the read-only data segment.  If the const qualifier is applied to a declaration, the value cannot be changed after initialization.  Consider the example below.

Example:
const char str[] = "see-programming";
const int num = 10;

Data of this kind, when declared at global scope as in our program, is typically stored in the RO data segment.  The segment is kept read-only since the values cannot be altered after initialization.

Let us print the addresses of the RO data in our program using GDB.
  (gdb) p &num
  $1 = (const int *) 0x8048510
  (gdb) p &str
  $2 = (char (*)[16]) 0x8048500


Initialized Data Segment:
It is also called the data segment.  It is located above the text/code segment.  It holds the initialized static and global variables.  The initialized data segment is kept read-write, since the values of the variables can be modified during program execution.  The lifetime of initialized data is the complete execution time of the program.  Data whose lifetime is the complete execution time of the program is recorded in the executable file.  Memory for initialized data is allocated statically (static memory allocation).

Let us print the address of the initialized data using GDB.
  (gdb) p &init_global
  $3 = (int *) 0x804a018
  (gdb) p &static_variable
  $4 = (int *) 0x804a01c


Uninitialized Data Segment:
It is also called the BSS segment.  It is placed above the data segment.  It contains the uninitialized static and global variables, and the data in the BSS segment is initialized to zero before the program starts.  The lifetime of uninitialized data is the complete execution time of the program.  Memory for uninitialized data is allocated statically (static memory allocation).

Let us print the address and value of uninitialized data using GDB.
  (gdb) print &uninit_global
  $11 = (int *) 0x804a028
  (gdb) print uninit_global
  $12 = 0


Heap Segment:
It is placed above the BSS segment.  Dynamic memory allocation and deallocation take place in the heap segment.  Standard library functions such as malloc(), calloc() and realloc() are used to perform dynamic memory allocation, and the library function free() is used to deallocate dynamically allocated memory.  The heap segment's address space grows from lower-numbered addresses to higher-numbered addresses.  The area of memory used for dynamic memory allocation is called the heap.  The lifetime of heap data is the time between its explicit creation (using malloc) and its explicit destruction (using free).  Dynamic memory allocation is also called heap allocation.

Let us print the address of the dynamically allocated block using GDB.
  (gdb) run
  Starting program: /home/jp/cpgms/lab_pgms/temp/a.out 
  Breakpoint 1, fun2 () at dummy.c:16
  16 ptr = (int *) malloc(sizeof(int));
  (gdb) step
  17 *ptr = 100;
  (gdb) print ptr
  $9 = (int *) 0x804b008


We have already set a breakpoint at fun2(), so program execution pauses at fun2().  The step command then executes the next line of source code, which here is the call to malloc().


Stack:
All local variables and parameters of a function are stored in the stack segment.  It is located above the heap segment.  The stack segment grows from higher-numbered addresses to lower-numbered addresses.  Whenever a function is called, the space for its local variables and parameters is allocated on the top of the stack.  If that function calls another function, the space for the local variables and parameters of the newly called function is again allocated on the top of the stack.  When a function returns, its local variables and parameters are popped from the stack, and the space that was allocated for them is reclaimed.  When a function is called, information such as the return address (where to return to), the caller's environment and a few machine registers is also stored on the stack.

Let us print the address of local variables in fun2() using GDB.
  (gdb) p &num1
  $11 = (int *) 0xbffff388
  (gdb) p &num2
  $12 = (int *) 0xbffff384

When the program starts executing, the local variables of main() are pushed onto the stack.

  ----------------------------------------
   local variables of main()
  ----------------------------------------

main() calls fun1().  So, the local variables and parameters of fun1() are pushed onto the stack, i.e. memory for them is allocated on the top of the stack.

  ----------------------------------------
   local variables and parameters of fun1()
  ----------------------------------------
   local variables of main()
  ----------------------------------------

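fun1() calls fun2().  So, the memory for the local variables and parameters of fun2() is allocated on the top of the stack.

  ----------------------------------------
   local variables and parameters of fun2()
  ----------------------------------------
   local variables and parameters of fun1()
  ----------------------------------------
   local variables of main()
  ----------------------------------------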

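When fun2() returns, its local variables, parameters and other details are popped from the stack, and the memory allocated on the stack for fun2() is reclaimed.

  ----------------------------------------
   local variables and parameters of fun1()
  ----------------------------------------
   local variables of main()
  ----------------------------------------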


Let us reassemble the information which we obtained using GDB.

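  ----------------------------------------
   Command line arguments and
   environment variables
  ----------------------------------------
   Stack
     address of num1 - 0xbffff388
     address of num2 - 0xbffff384
     |
     V

     ^
     |
   Heap
     address stored in ptr - 0x804b008
  ----------------------------------------
   Uninitialized data segment
     address of uninit_global - 0x804a028
  ----------------------------------------
   Initialized data segment
     address of init_global - 0x804a018
     address of static_variable - 0x804a01c
  ----------------------------------------
   ROdata
     address of num - 0x8048510
     address of str - 0x8048500
  ----------------------------------------
   Text segment
     {int ()} 0x804842c <main>
     {void ()} 0x804841f <fun1>
     {void ()} 0x80483f4 <fun2>
  ----------------------------------------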
                                           
Consider the example program below.  We are going to list the sizes of the BSS and data segments using the size command.


  #include <stdio.h>
  int global = 10;
  static int static_global = 20;
  int uninit_global1, uninit_global2;

  int main(){
        printf("Hello World");
        return 0;
  }


The above program has two initialized and two uninitialized global variables.  Let us list the sizes of the text, data and BSS segments using the size command.

  jp@jp-VirtualBox:~/$ size ./a.out
  text   data   bss   dec    hex filename
   893    264    16   1173   495 ./a.out


Now let us remove all the initialized and uninitialized global variables from the above program.

  #include <stdio.h>

  int main(){
        printf("Hello World");
        return 0;
  }


Let us again list the size of text, data and BSS segment using size command.
  jp@jp-VirtualBox:~/$ size a.out 
  text   data   bss   dec     hex filename
   893    256     8   1157     485 a.out

The size of two initialized integer global variables is 8 bytes (4 bytes each on this system).
The size of two uninitialized integer global variables is 8 bytes.

From the above output, we can infer that the BSS and data segments each shrank by 8 bytes (data from 264 to 256, BSS from 16 to 8).  That is because we removed two initialized and two uninitialized global variables of type int.



2 comments:

  1. Nice explanation, but I have a doubt.
    main() -> calls -> func1() -> calls -> func2()
    If func2 has
    static int a = 4;
    where will it be stored?
    According to the above explanation, I understand that it must go to the initialized data segment.
    Am I right?

  2. Where are volatile, register and extern variables stored?
