Final Project, Preliminary Presentations
Input.
TEXT
PATTERN
Output.
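The index of the first occurrence of PATTERN in TEXT, if TEXT contains PATTERN as a substring
-1 if PATTERN does not appear
idx = 0;
matches = 0;
// <= (not <) so that a match ending at the last character of TEXT is found
while (idx <= TEXT.length - PATTERN.length) {
    // all of PATTERN matched starting at idx
    if (matches == PATTERN.length) return idx;
    if (TEXT[idx + matches] == PATTERN[matches]) {
        matches++;
    } else {
        // mismatch: shift PATTERN one position to the right and start over
        idx++;
        matches = 0;
    }
}
return -1;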
lec21-naive-pattern-matching.zip
Resetting the state of the algorithm
Method should be called by update buttons (Input Group), but implementation should be handled by algorithm (Algo Group)
resetState method
Throughout:
TEXT has length $n$
PATTERN has length $m$
Last time, we showed that the worst-case running time is $\Theta(n \cdot m)$.
TEXT = 'aaaaaaaaaaaaa...a'
PATTERN = 'aaaa....ab'
Question. What redundant/unnecessary work is being done by the algorithm?
What does naive pattern matching do?
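To make the redundant work concrete, here is a minimal sketch that counts character comparisons in the naive search on the worst-case input above. The function name `countNaiveComparisons` and the sizes n = 20, m = 5 are illustrative choices, not part of the course code.

```javascript
// Naive search, instrumented with a comparison counter.
// countNaiveComparisons is a hypothetical helper name used only for this illustration.
function countNaiveComparisons(text, pattern) {
  let idx = 0;          // current alignment of pattern against text
  let matches = 0;      // characters matched at this alignment
  let comparisons = 0;  // total character comparisons performed
  while (idx <= text.length - pattern.length) {
    if (matches === pattern.length) return { index: idx, comparisons };
    comparisons++;
    if (text[idx + matches] === pattern[matches]) {
      matches++;
    } else {
      // Everything learned at this alignment is thrown away here.
      idx++;
      matches = 0;
    }
  }
  return { index: -1, comparisons };
}

// Worst-case shape: text is all 'a', pattern is 'a...ab'.
const worstText = "a".repeat(20);
const worstPattern = "a".repeat(4) + "b";
console.log(countNaiveComparisons(worstText, worstPattern));
// → { index: -1, comparisons: 80 }, i.e. (n - m + 1) * m comparisons
```

Every alignment re-compares the same run of 'a' characters that was already matched at the previous alignment.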

Suppose the following string is matched up to index i = 3, and mismatched at index i = 4. What should our next comparison be?

Suppose the following string is matched up to index i = 4, and mismatched at index i = 5. What should our next comparison be?



Question. Can we perform pattern matching search in such a way that the textIndex never decreases?
If we have already matched characters in TEXT, then we know they are the same as the previous characters in PATTERN

Question. Suppose we’ve matched $P_{k}$ with our text, but $P[k+1]$ is a mismatch. Under what condition can we match $P_i$ with our text?

Answer. We can match $P_i$ with the text if $P_i$ is a suffix of $P_k$

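Definition. Given a pattern $P$ of length $m$, the associated prefix function is an array $\pi$ of length $m$ defined as follows: for $1 \leq k \leq m$, $\pi[k]$ is the largest $i < k$ such that $P_i$ is a suffix of $P_k$, where $P_j$ denotes the first $j$ characters of $P$.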
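A brute-force sketch of the prefix function, assuming the usual convention that pi[k] is the length of the longest proper prefix of the first k characters of P that is also a suffix of them. The name `prefixFunctionBruteForce` and the unused slot pi[0] (added so that the 0-indexed JavaScript array can be indexed by prefix length) are assumptions made for this illustration.

```javascript
// Brute-force prefix function, computed directly from the assumed definition.
// pi[k] = largest i < k such that the first i characters of P
//         are a suffix of the first k characters of P.
// pi[0] is unused padding so that pi can be indexed by prefix length k = 1..m.
function prefixFunctionBruteForce(P) {
  const pi = new Array(P.length + 1).fill(0);
  for (let k = 1; k <= P.length; k++) {
    const Pk = P.slice(0, k);
    // Try candidate lengths from longest to shortest; the first hit is the maximum.
    for (let i = k - 1; i > 0; i--) {
      if (Pk.endsWith(P.slice(0, i))) {
        pi[k] = i;
        break;
      }
    }
  }
  return pi;
}

console.log(prefixFunctionBruteForce("ababaca"));
// → [0, 0, 0, 1, 2, 3, 0, 1]
```

This version is deliberately simple rather than fast: it does string comparisons for every pair (k, i).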
Write the prefix function of this pattern:

Question. Given the prefix function $\pi$, how can we compute matches faster?
Idea.
Use matches and $\pi$ to do more efficient shifts:
In Pseudo-code!
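T a text of length n, P a pattern of length m, pi the prefix function of P
(T and P are 0-indexed; pi[k] is the value for the first k characters of P)
let matched = 0
for (i from 0 to n - 1):
    while matched > 0 and P[matched] != T[i]
        matched = pi[matched]
    if P[matched] == T[i]
        matched++
    if matched == m
        return i - m + 1   (index where the match starts)
return -1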
lec21-kmp-pattern-matching
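A runnable sketch of the search loop in JavaScript, not the code from the course archive. It assumes 0-indexed strings (so the next pattern character to compare is P[matched]), a prefix function indexed by the number of matched characters with pi[0] unused, and that the index where the match starts should be returned. The name `kmpSearch` and the hand-written table for 'ababaca' are illustrative.

```javascript
// Search for P in T using a precomputed prefix function pi.
// Assumption: pi[k] is the value for the first k characters of P (pi[0] unused).
function kmpSearch(T, P, pi) {
  const n = T.length;
  const m = P.length;
  if (m === 0) return 0; // empty pattern matches at the start
  let matched = 0;       // number of pattern characters currently matched
  for (let i = 0; i < n; i++) {
    // On a mismatch, fall back to the next shorter prefix that still fits.
    while (matched > 0 && P[matched] !== T[i]) {
      matched = pi[matched];
    }
    if (P[matched] === T[i]) {
      matched++;
    }
    if (matched === m) {
      return i - m + 1; // start index of the match
    }
  }
  return -1;
}

// Prefix function of 'ababaca', written out by hand for this example.
const examplePattern = "ababaca";
const examplePi = [0, 0, 0, 1, 2, 3, 0, 1];
console.log(kmpSearch("bacbabababacaab", examplePattern, examplePi));
// → 6
```

Note that the text index i only moves forward; all backtracking happens inside the pattern via pi.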
Question. What is the running time of the method?
let matched = 0
for (i from 0 to n - 1):
    while matched > 0 and P[matched] != T[i]
        matched = pi[matched]
    if P[matched] == T[i]
        matched++
    if matched == m
        return i - m + 1   (index where the match starts)
return -1
Observations.
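while loop does at most matched iterations
To do k iterations, matched must be incremented k times
Each for loop iteration increments matched at most once
Total number of while loop iterations is $\leq$ number of for loop iterations, so the search takes $O(n)$ time once $\pi$ is known
Computing $\pi$ of $P$ efficiently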
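One standard way to do this, sketched here as an assumption about the approach rather than the course's own code, is to match P against itself with the same fallback loop used in the search. The name `prefixFunction` and the unused slot pi[0] are illustrative conventions.

```javascript
// Prefix function of P, computed by matching P against itself.
// Assumption: pi[k] is the value for the first k characters of P (pi[0] unused).
function prefixFunction(P) {
  const m = P.length;
  const pi = new Array(m + 1).fill(0);
  let matched = 0; // length of the longest prefix that is a suffix of P's first k - 1 characters
  for (let k = 2; k <= m; k++) {
    const c = P[k - 1]; // the character that extends the prefix to length k
    // Fall back through shorter candidates until one can be extended by c.
    while (matched > 0 && P[matched] !== c) {
      matched = pi[matched];
    }
    if (P[matched] === c) {
      matched++;
    }
    pi[k] = matched;
  }
  return pi;
}

console.log(prefixFunction("ababaca"));
// → [0, 0, 0, 1, 2, 3, 0, 1]
```

The same counting argument applies: matched increases at most once per for loop iteration, so the while loop runs at most m times in total and the computation takes $O(m)$ time.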