This blog is under construction

Saturday, 20 July 2013

C program to implement Knuth–Morris–Pratt algorithm

Write a C program to implement Knuth–Morris–Pratt algorithm.
  • Knuth-Morris-Pratt(KMP) is a string matching algorithm.
  • It helps to find the search string in the given target string with minimal comparisons.
  • For the target string of length n and patter string of length m, the running time of the KMP algorithm is O(m+n).
  • KMP algorithm involves two phases.  They are finding the overlap in the pattern and finding pattern in target string.
How to find the overlap in the given pattern?
Here, we need to find suffix(in the pattern) that matches prefix of the given pattern.  For example, "abcdabd" is the input string.  "ab" at the end is the suffix that matches the prefix "ab" at the start  => abcdabd

Algorithm to find overlap in the given pattern:
Input   :  pattern in which overlap needs to be found

Output:  Overlap array

        i<-2, j<-0, len<-strlen(pattern)

        overlap[0] <- -1, overlap[1] <-  0

        while  i < len 

                if  pattern[i - 1] equal to pattern[j]

                        j <- j + 1, ptr[i++] <- j;
                otherwise if j greater than 0
                        j <- ptr[j]
                otherwise
                        ptr[i++] <- 0
        end while

i = 2, j = 0
Pattern :   a | b | c | d | a | b | d |
Overlap :  -1 | 0 | 0 |

i = 3, j = 0
Pattern :  a | b | c | d | a | b | d |
Overlap : -1 | 0 | 0 | 0 |

i = 4, j = 0
Pattern :  a | b | c | d | a | b | d |
Overlap : -1 | 0 | 0 | 0 | 0 |

i = 5, j = 0
Pattern :  a | b | c | d | a | b | d |
Overlap : -1 | 0 | 0 | 0 | 0 | 1 |

i = 6, j = 1
Pattern :  a | b | c | d | a | b | d |
Overlap : -1 | 0 | 0 | 0 | 0 | 1 | 2 |


Algorithm for Knuth-Morris-Pratt(KMP) Searching:
Input    : Target string, Pattern and Overlap array
Output :  Index of the Pattern in Target string
        i->0, j->0, count->0
        while i + j is lesser than  strlen(Target)
                if pattern[j] equals Target[i + j],
                        if j equals strlen(word) - 1)
                                return index of pattern
                        j <- j + 1;
                otherwise,
                        i <- i + j - overlap[j];
                        if overlap[j] is greater than -1)
                                j <- overlap[j]
                        otherwise,
                                j <- 0;
         end while

Target : abc abcdabcdabd
Pattern: abcdabd

i => 0 , j=> 0
Target[i + j] and Pattern[j] are same.  So, increment j.
Target :  abc abcdabcdabd
Pattern:  abcdabd

i => 0, j=> 1
Target[i + j] and Pattern[j] are same.  So, increment j.
Target :  abc abcdabcdabd
Pattern:  abcdabd

i => 0 , j=> 2
Target[i + j] and Pattern[j] are same.  So, increment j.
Target :  abc abcdabcdabd
Pattern:  abcdabd


i => 0 , j=> 3
Target[i + j] and Pattern[j] are not same.
Target : abc abcdabcdabd
Pattern: abcdabd
i <- i + j - overlap[3] => 0 + 3 - 0 => 3
j <- overlap[j] => overlap[3] => 0

i => 3 , j=> 0
Target[i + j] and Pattern[j] are not same.
Target :  abc abcdabcdabd
Pattern:       abcdabd
i <- i + j - overlap[0] => 3 + 0 - (-1) => 4
j <- 0

i => 4 , j=> 0
Target[i + j] and Pattern[j] are same.  So, increment j.
Target : abc abcdabcdabd
Pattern:       abcdabd


i => 4 , j=> 1
Target[i + j] and Pattern[j] are same.  So, increment j.
Target : abc abcdabcdabd
Pattern:       abcdabd


i => 4 , j=> 2
Target[i + j] and Pattern[j] are same.  So, increment j.
Target : abc abcdabcdabd
Pattern:       abcdabd


i => 4 , j=> 3
Target[i + j] and Pattern[j] are same.  So, increment j.
Target : abc abcdabcdabd
Pattern:       abcdabd


i => 4 , j=> 4
Target[i + j] and Pattern[j] are same.  So, increment j.
Target : abc abcdabcdabd
Pattern:       abcdabd

i => 4 , j=> 5
Target[i + j] and Pattern[j] are same.  So, increment j.
Target : abc abcdabcdabd
Pattern:       abcdabd

i => 4 , j=> 6
Target[i + j] and Pattern[j] are not same.
Target : abc abcdabcdabd
Pattern:       abcdabd
i <- i + j - overlap[6] => 4 + 6 - 2 => 8
j <- overlap[j] => overlap[6] => 2

i => 8 , j=> 2
Target[i + j] and Pattern[j] are same.  So, increment j.
Target : abc abcdabcdabd
Pattern:              abcdabd

i => 8 , j=> 3
Target[i + j] and Pattern[j] are same.  So, increment j.
Target : abc abcdabcdabd
Pattern:             abcdabd

i => 8 , j=> 4
Target[i + j] and Pattern[j] are same.  So, increment j.
Target : abc abcdabcdabd
Pattern:              abcdabd

i => 8 , j=> 5
Target[i + j] and Pattern[j] are same.  So, increment j.
Target : abc abcdabcdabd
Pattern:              abcdabd

i => 8 , j=> 6
Target : abc abcdabcdabd
Pattern:              abcdabd
Finally, match Found!!!


C program to implement Knuth–Morris–Pratt algorithm(KMP String Matching):


  #include <stdio.h>
  #include <string.h>
  #include <stdlib.h>

  /* 
   * finds the position of the pattern in the given target string
   * target - str, patter - word
   */

  int  kmpSearch(char *str, char *word, int *ptr) {
        int i = 0, j = 0;

        while ((i + j) < strlen(str)) {
                /* match found on the target and pattern string char */
                if (word[j] == str[i + j]) {
                        if (j == (strlen(word) - 1)) {
                                printf("%s located at the index %d\n",
                                        word, i + 1);
                                return;
                        }
                        j = j + 1;
                } else {
                        /* manipulating next indices to compare */
                        i = i + j - ptr[j];
                        if (ptr[j] > -1) {
                                j = ptr[j];
                        } else {
                                j = 0;
                        }
                }
        }
  }

  /* find the overlap array for the given pattern */
  void findOverlap(char *word, int *ptr) {
        int i = 2, j = 0, len = strlen(word);

        ptr[0] = -1;
        ptr[1] = 0;

        while (i < len) {
                if (word[i - 1] == word[j]) {
                        j = j + 1;
                        ptr[i] = j;
                        i = i + 1;
                } else if (j > 0) {
                        j = ptr[j];
                } else {
                        ptr[i] = 0;
                        i = i + 1;
                }
        }
        return;
  }

  int main() {
        char word[256], str[1024];;
        int *ptr, i;

        /* get the target string from the user */
        printf("Enter your target string:");
        fgets(str, 1024, stdin);
        str[strlen(str) - 1] = '\0';

        /* get the pattern string from the user */
        printf("Enter your pattern string:");
        fgets(word, 256, stdin);
        word[strlen(word) - 1] = '\0';

        /* dynamic memory allocation for overlap array */
        ptr = (int *)calloc(1, sizeof(int) * (strlen(word)));

        /* finds overlap array */
        findOverlap(word, ptr);
        /* find the index of the pattern in target string */
        kmpSearch(str, word, ptr);

        return 0;
  }



  Output:
  jp@jp-VirtualBox:~/$ ./a.out
  Enter your target string: abc abcdabcdabd
  Enter your pattern string: abcdabd
  abcdabd located at the index 9


3 comments:

  1. Error at line 10 saying function call missing and 48 declaration not allowed...can pls say wts wrng

    ReplyDelete
  2. Error at line 10 saying function call missing and 48 declaration not allowed...can pls say wts wrng

    ReplyDelete