Write a C program to implement Knuth–Morris–Pratt algorithm.
Algorithm for Knuth-Morris-Pratt(KMP) Searching:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
/*
* finds the position of the pattern in the given target string
* target - str, patter - word
*/
int kmpSearch(char *str, char *word, int *ptr) {
int i = 0, j = 0;
while ((i + j) < strlen(str)) {
/* match found on the target and pattern string char */
if (word[j] == str[i + j]) {
if (j == (strlen(word) - 1)) {
printf("%s located at the index %d\n",
word, i + 1);
return;
}
j = j + 1;
} else {
/* manipulating next indices to compare */
i = i + j - ptr[j];
if (ptr[j] > -1) {
j = ptr[j];
} else {
j = 0;
}
}
}
}
- Knuth-Morris-Pratt(KMP) is a string matching algorithm.
- It helps to find the search string in the given target string with minimal comparisons.
- For the target string of length n and patter string of length m, the running time of the KMP algorithm is O(m+n).
- KMP algorithm involves two phases. They are finding the overlap in the pattern and finding pattern in target string.
How to find the overlap in the given pattern?
Here, we need to find suffix(in the pattern) that matches prefix of the given pattern. For example, "abcdabd" is the input string. "ab" at the end is the suffix that matches the prefix "ab" at the start => abcdabd
Algorithm to find overlap in the given pattern:
Input : pattern in which overlap needs to be found
Output: Overlap array
i<-2, j<-0, len<-strlen(pattern)
overlap[0] <- -1, overlap[1] <- 0
while i < len
if pattern[i - 1] equal to pattern[j]
j <- j + 1, ptr[i++] <- j;
otherwise if j greater than 0
j <- ptr[j]
otherwise
ptr[i++] <- 0
end while
i = 2, j = 0
Pattern : a | b | c | d | a | b | d |
Overlap : -1 | 0 | 0 |
i = 3, j = 0
Pattern : a | b | c | d | a | b | d |
Overlap : -1 | 0 | 0 | 0 |
i = 4, j = 0
Pattern : a | b | c | d | a | b | d |
Overlap : -1 | 0 | 0 | 0 | 0 |
i = 5, j = 0
Pattern : a | b | c | d | a | b | d |
Overlap : -1 | 0 | 0 | 0 | 0 | 1 |
i = 6, j = 1
Pattern : a | b | c | d | a | b | d |
Overlap : -1 | 0 | 0 | 0 | 0 | 1 | 2 |
Pattern : a | b | c | d | a | b | d |
Overlap : -1 | 0 | 0 |
i = 3, j = 0
Pattern : a | b | c | d | a | b | d |
Overlap : -1 | 0 | 0 | 0 |
i = 4, j = 0
Pattern : a | b | c | d | a | b | d |
Overlap : -1 | 0 | 0 | 0 | 0 |
i = 5, j = 0
Pattern : a | b | c | d | a | b | d |
Overlap : -1 | 0 | 0 | 0 | 0 | 1 |
i = 6, j = 1
Pattern : a | b | c | d | a | b | d |
Overlap : -1 | 0 | 0 | 0 | 0 | 1 | 2 |
Algorithm for Knuth-Morris-Pratt(KMP) Searching:
Input : Target string, Pattern and Overlap array
Output : Index of the Pattern in Target string
i->0, j->0, count->0
while i + j is lesser than strlen(Target)
if pattern[j] equals Target[i + j],
if j equals strlen(word) - 1)
return index of pattern
j <- j + 1;
otherwise,
i <- i + j - overlap[j];
if overlap[j] is greater than -1)
j <- overlap[j]
otherwise,
j <- 0;
end while
Target : abc abcdabcdabd
Pattern: abcdabd
i => 0 , j=> 0
Target[i + j] and Pattern[j] are same. So, increment j.
Target[i + j] and Pattern[j] are same. So, increment j.
Target : abc abcdabcdabd
Pattern: abcdabd
i => 0, j=> 1
Target[i + j] and Pattern[j] are same. So, increment j.
Target[i + j] and Pattern[j] are same. So, increment j.
Target : abc abcdabcdabd
Pattern: abcdabd
i => 0 , j=> 2
Target[i + j] and Pattern[j] are same. So, increment j.
Target[i + j] and Pattern[j] are same. So, increment j.
Target : abc abcdabcdabd
Pattern: abcdabd
i => 0 , j=> 3
Target[i + j] and Pattern[j] are not same.
Target[i + j] and Pattern[j] are not same.
Target : abc abcdabcdabd
Pattern: abcdabd
i <- i + j - overlap[3] => 0 + 3 - 0 => 3
j <- overlap[j] => overlap[3] => 0
i => 3 , j=> 0
Target[i + j] and Pattern[j] are not same.
Target[i + j] and Pattern[j] are not same.
Target : abc abcdabcdabd
Pattern: abcdabd
i <- i + j - overlap[0] => 3 + 0 - (-1) => 4
j <- 0
i => 4 , j=> 0
Target[i + j] and Pattern[j] are same. So, increment j.
Target[i + j] and Pattern[j] are same. So, increment j.
Target : abc abcdabcdabd
Pattern: abcdabd
i => 4 , j=> 1
Target[i + j] and Pattern[j] are same. So, increment j.
Target[i + j] and Pattern[j] are same. So, increment j.
Target : abc abcdabcdabd
Pattern: abcdabd
i => 4 , j=> 2
Target[i + j] and Pattern[j] are same. So, increment j.
Target[i + j] and Pattern[j] are same. So, increment j.
Target : abc abcdabcdabd
Pattern: abcdabd
i => 4 , j=> 3
Target[i + j] and Pattern[j] are same. So, increment j.
Target[i + j] and Pattern[j] are same. So, increment j.
Target : abc abcdabcdabd
Pattern: abcdabd
i => 4 , j=> 4
Target[i + j] and Pattern[j] are same. So, increment j.
Target[i + j] and Pattern[j] are same. So, increment j.
Target : abc abcdabcdabd
Pattern: abcdabd
i => 4 , j=> 5
Target[i + j] and Pattern[j] are same. So, increment j.
Target[i + j] and Pattern[j] are same. So, increment j.
Target : abc abcdabcdabd
Pattern: abcdabd
i => 4 , j=> 6
Target[i + j] and Pattern[j] are not same.
Target[i + j] and Pattern[j] are not same.
Target : abc abcdabcdabd
Pattern: abcdabd
i <- i + j - overlap[6] => 4 + 6 - 2 => 8
j <- overlap[j] => overlap[6] => 2
i => 8 , j=> 2
Target[i + j] and Pattern[j] are same. So, increment j.
Target[i + j] and Pattern[j] are same. So, increment j.
Target : abc abcdabcdabd
Pattern: abcdabd
i => 8 , j=> 3
Target[i + j] and Pattern[j] are same. So, increment j.
Target[i + j] and Pattern[j] are same. So, increment j.
Target : abc abcdabcdabd
Pattern: abcdabd
i => 8 , j=> 4
Target[i + j] and Pattern[j] are same. So, increment j.
Target[i + j] and Pattern[j] are same. So, increment j.
Target : abc abcdabcdabd
Pattern: abcdabd
i => 8 , j=> 5
Target[i + j] and Pattern[j] are same. So, increment j.
Target[i + j] and Pattern[j] are same. So, increment j.
Target : abc abcdabcdabd
Pattern: abcdabd
i => 8 , j=> 6
Target : abc abcdabcdabd
Pattern: abcdabd
Finally, match Found!!!
C program to implement Knuth–Morris–Pratt algorithm(KMP String Matching):
#include <string.h>
#include <stdlib.h>
/*
* finds the position of the pattern in the given target string
* target - str, patter - word
*/
int kmpSearch(char *str, char *word, int *ptr) {
int i = 0, j = 0;
while ((i + j) < strlen(str)) {
/* match found on the target and pattern string char */
if (word[j] == str[i + j]) {
if (j == (strlen(word) - 1)) {
printf("%s located at the index %d\n",
word, i + 1);
return;
}
j = j + 1;
} else {
/* manipulating next indices to compare */
i = i + j - ptr[j];
if (ptr[j] > -1) {
j = ptr[j];
} else {
j = 0;
}
}
}
}
/* find the overlap array for the given pattern */
void findOverlap(char *word, int *ptr) {
int i = 2, j = 0, len = strlen(word);
ptr[0] = -1;
ptr[1] = 0;
while (i < len) {
if (word[i - 1] == word[j]) {
j = j + 1;
ptr[i] = j;
i = i + 1;
} else if (j > 0) {
j = ptr[j];
} else {
ptr[i] = 0;
i = i + 1;
}
}
return;
}
int main() {
char word[256], str[1024];;
int *ptr, i;
/* get the target string from the user */
printf("Enter your target string:");
fgets(str, 1024, stdin);
str[strlen(str) - 1] = '\0';
/* get the pattern string from the user */
printf("Enter your pattern string:");
fgets(word, 256, stdin);
word[strlen(word) - 1] = '\0';
/* dynamic memory allocation for overlap array */
ptr = (int *)calloc(1, sizeof(int) * (strlen(word)));
/* finds overlap array */
findOverlap(word, ptr);
/* find the index of the pattern in target string */
kmpSearch(str, word, ptr);
return 0;
}
Output:
jp@jp-VirtualBox:~/$ ./a.out
Enter your target string: abc abcdabcdabd
Enter your pattern string: abcdabd
abcdabd located at the index 9
Enter your target string: abc abcdabcdabd
Enter your pattern string: abcdabd
abcdabd located at the index 9
Download?
ReplyDeleteError at line 10 saying function call missing and 48 declaration not allowed...can pls say wts wrng
ReplyDeleteError at line 10 saying function call missing and 48 declaration not allowed...can pls say wts wrng
ReplyDelete