Thoroughly Understanding the KMP Algorithm in Data Structures
2014-03-08 21:23
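How can the naive search algorithm be sped up? With KMP. There are of course other algorithms as well, which will be covered later.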
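Knuth–Morris–Pratt string search algorithm
Start at the left-hand side of the string, string[0], and try to match the pattern, working right. At each step we are trying to match string[i] == pattern[j].
Given a search pattern, pre-build a table, next[], that shows where to reset j to when there is a mismatch at pattern position j.
If the match fails at j > 0, keep i the same and reset j to next[j-1], the table entry for the last position that did match (next[j] is the entry used when the mismatch is at position j+1). If the match fails at j = 0, move i to i+1.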
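The search loop described above can be sketched as follows. This is a minimal illustration, not the author's code: the function names `build_next` and `kmp_search` are assumptions made for this example, and the table is assumed to hold, for each j, the length of the longest prefix of pattern[0..j] that is also a proper suffix of it.

```python
def build_next(pattern):
    """next_[j] = length of the longest proper prefix of pattern[:j+1]
    that is also a suffix of pattern[:j+1] (hypothetical helper)."""
    next_ = [0] * len(pattern)
    j = 0
    for i in range(1, len(pattern)):
        # Fall back to shorter prefix-suffix matches until one extends.
        while j > 0 and pattern[i] != pattern[j]:
            j = next_[j - 1]
        if pattern[i] == pattern[j]:
            j += 1
        next_[i] = j
    return next_


def kmp_search(string, pattern):
    """Return the start index of every occurrence of pattern in string."""
    if not pattern:
        return []
    next_ = build_next(pattern)
    matches = []
    i = j = 0
    while i < len(string):
        if string[i] == pattern[j]:
            i += 1
            j += 1
            if j == len(pattern):
                matches.append(i - j)   # full match ends just before i
                j = next_[j - 1]        # keep going, allowing overlaps
        elif j > 0:
            j = next_[j - 1]            # keep i the same, reset j
        else:
            i += 1                      # mismatch at pattern[0]: advance i
    return matches


print(kmp_search("ABABABAC", "ABABAC"))  # [2]
```

Note that i never moves backwards, which is what saves work compared with the naive search.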
How to build the table
Everything else below is just how to build the table: construct a table showing where to reset j to.
If mismatch string[i] != pattern[0], just move the string to i+1, j = 0.
If mismatch string[i] != pattern[1], we leave i the same, j = 0.
pattern = 10
string = ...1100000
If mismatch string[i] != pattern[2], we leave i the same and change j, but we need to consider repeats in pattern[0]..pattern[1].
pattern = 110
string = ...11100000
i stays the same, j goes from 2 back to 1
pattern = 100
string = ...10100000
i stays the same, j goes from 2 back to 0
If mismatch string[i] != pattern[j], we leave i the same and change j, but we need to consider repeats in pattern[0]..pattern[j-1].
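The three small cases above can be checked by brute force. The sketch below is only an illustration: `reset_position` is a hypothetical helper name, and it assumes that "repeats in pattern[0]..pattern[j-1]" means the longest prefix of the already matched part that is also a suffix of it, excluding the whole matched part.

```python
def reset_position(pattern, j):
    """Where j should go after a mismatch at pattern[j], by brute force.

    pattern[:j] has already matched the text, so we look for the longest
    prefix of pattern[:j] that is also a suffix of it (not the whole thing).
    """
    matched = pattern[:j]
    for length in range(j - 1, 0, -1):      # try the longest candidate first
        if matched[:length] == matched[-length:]:
            return length
    return 0                                 # no repeat: restart the pattern


print(reset_position("10", 1))   # 0
print(reset_position("110", 2))  # 1
print(reset_position("100", 2))  # 0
```

This quadratic check is only for building intuition; the table is what makes the lookup constant-time during the search.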
Given a certain pattern, construct a table showing where to reset j to.
Construct a table of next[j]
For each j, figure out: next[j] = length of the longest prefix in "pattern[0]..pattern[j-1]" that matches a suffix of "pattern[1]..pattern[j]".
next[j] = "length of the longest matching substring"
That is:
prefix must include pattern[0]
suffix must include pattern[j]
prefix and suffix are different (neither is the whole of pattern[0]..pattern[j])
When pattern position j+1 is compared with s[k] and they do not match, set j' = next[j]. s[k] is then compared with pattern position j', so j' moves to where j+1 was.
Example for pattern "ABABAC":
j | 0 | 1 | 2 | 3 | 4 | 5 |
substring 0 to j | A | AB | ABA | ABAB | ABABA | ABABAC |
longest prefix-suffix match | none | none | A | AB | ABA | none |
next[j] | 0 | 0 | 1 | 2 | 3 | 0 |
notes | no prefix and suffix that are different, i.e. next[0] = 0 for all patterns | | | | | |
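The rows of this table can be reproduced directly from the definition. The sketch below is a deliberately slow, definition-following check rather than the efficient construction; the name `next_by_definition` is an assumption made for this example.

```python
def next_by_definition(pattern):
    """For each j, the length of the longest prefix of pattern[:j+1] that is
    also a suffix of it, where prefix and suffix are not the whole substring."""
    table = []
    for j in range(len(pattern)):
        sub = pattern[:j + 1]
        best = 0
        for length in range(1, len(sub)):    # proper prefixes only
            if sub[:length] == sub[-length:]:
                best = length                # later hits are longer
        table.append(best)
    return table


pattern = "ABABAC"
table = next_by_definition(pattern)
for j, n in enumerate(table):
    # columns: j, substring 0 to j, longest prefix-suffix match, next[j]
    print(j, pattern[:j + 1], pattern[:n] or "none", n)
```

For "ABABAC" the last column comes out as 0, 0, 1, 2, 3, 0, matching the next[j] row.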
Given j, let n = next[j]. Then:
"pattern[0]..pattern[n-1]" = "pattern[j-(n-1)]..pattern[j]"
that is,
"pattern[0]..pattern[next[j]-1]" = "pattern[j-(next[j]-1)]..pattern[j]"
e.g. j = 4, n = 3:
"pattern[0]..pattern[2]" = "pattern[2]..pattern[4]"
If the match fails at pattern position j+1 (while comparing it with the current text character), keep i the same and reset the pattern position to n (= next[j]).
We have already matched pattern[0]..pattern[n-1], since it equals the suffix of pattern[1]..pattern[j] that was just matched.
e.g. We have matched ABABA so far.
If the next one fails, say we have matched ABA so far and then see if the next one matches.
That is, keep i the same and just reset j to 3 (= precisely the length of the longest prefix-suffix match).
Then, if the match after ABA fails too, by the same rule we say we have matched A so far, reset to j = 1, and try again from there.
In other words, it starts by trying to match the longest prefix-suffix, but if that fails it works down to the shorter ones until exhausted (no prefix-suffix matches left).
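The "work down to the shorter ones" step can be traced in a few lines. This is a small sketch under the assumption that the table for "ABABAC" is 0, 0, 1, 2, 3, 0 as in the example; `fallback_chain` is a hypothetical name used only here.

```python
def fallback_chain(pattern, next_, j):
    """Prefixes that are tried, longest first, after pattern[0..j] has
    matched and the following character fails."""
    chain = []
    n = next_[j]
    while n > 0:
        chain.append(pattern[:n])   # pretend we have matched this much
        n = next_[n - 1]            # if that fails too, go shorter
    return chain


pattern = "ABABAC"
next_ = [0, 0, 1, 2, 3, 0]          # table for "ABABAC"
print(fallback_chain(pattern, next_, 4))  # ['ABA', 'A']
```

After ABABA the candidates are ABA (j = 3) and then A (j = 1); once the chain is empty, j is 0 and i moves on.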
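Algorithm to construct table of next[j]
Do this once, when the pattern comes in. The pattern is pattern[0]...pattern[m-1].
Here, i and j both index the pattern.
In other words, the pattern is compared against a shifted copy of itself.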
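next[0] = 0
i = 1; j = 0    // on the first step i = 1, j = 0
while (i < m) {
    if (pattern[i] == pattern[j]) {
        next[i] = j + 1; i++; j++
    } else if (j > 0) {
        // e.g. [0],[1],[2] == [4],[5],[6], but now [3] != [7];
        // maybe there is another prefix and suffix pair we can shift right to
        j = next[j-1]
        // next[j] is what position j+1 uses (treat this as a rule). The other reason for j-1 is that
        // only pattern[0]..pattern[j-1] has a matched prefix and suffix; position j itself did not
        // match the prefix. Drawing a picture makes this clear.
    } else {
        // the pattern index is 0 and does not match the text-side character at index i
        // (here the text is the pattern itself): move i right by one; the pattern shifts right by one and its index stays 0
        next[i] = 0; i++
    }
}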
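A runnable version of this construction might look like the sketch below. It assumes the table definition used throughout (length of the longest prefix-suffix match of pattern[0..j]); the name `build_next` and the trailing underscore in `next_` (to avoid shadowing Python's built-in `next`) are choices made for this example.

```python
def build_next(pattern):
    """Build the next[] table in O(m) by matching the pattern against itself."""
    m = len(pattern)
    next_ = [0] * m          # next_[0] = 0 for every pattern
    i, j = 1, 0              # i scans the pattern, j is the matched prefix length
    while i < m:
        if pattern[i] == pattern[j]:
            next_[i] = j + 1          # prefix-suffix match grows by one
            i += 1
            j += 1
        elif j > 0:
            j = next_[j - 1]          # try the next shorter prefix-suffix match
        else:
            next_[i] = 0              # nothing matches at all
            i += 1
    return next_


print(build_next("ABABAC"))  # [0, 0, 1, 2, 3, 0]
```

Each iteration either advances i or strictly decreases j, and j can only grow when i advances, so the loop runs at most about 2m times.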