
Thoroughly Understanding the KMP Algorithm in Data Structures

2014-03-08 21:23
How can the naive search algorithm be sped up? With KMP. There are other algorithms too, of course, which will be covered later.



Knuth–Morris–Pratt string search algorithm

Start at the LHS of the string, string[0], trying to match the pattern, working right.
Trying to match string[i] == pattern[j].

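Given a search pattern, pre-build a table, next[], showing, when there is a mismatch at pattern position j, where to reset j to.

If the match fails, keep i the same and reset j using the table. With the table as defined in this article (next[k] describes pattern[0]..pattern[k]), a mismatch at pattern position j > 0 resets j to next[j-1]; a mismatch at j = 0 moves i to i+1 instead.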
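The search loop described above can be sketched in Python as follows. This is only an illustration, not the original author's code: the function name `kmp_search` and the parameter `next_table` are made up here, and the sketch assumes the table convention used later in this article, where next[k] is the length of the longest prefix of pattern[0]..pattern[k] that is also a suffix of it (and not the whole substring). Under that assumption, a mismatch at pattern position j > 0 resets j to next[j-1].

```python
def kmp_search(text, pattern, next_table):
    """Return the start index of the first match of pattern in text, or -1.

    next_table is assumed to be pre-built for pattern (hypothetical parameter).
    """
    if not pattern:
        return 0  # assumption: an empty pattern matches at index 0
    i = 0  # index into text; it never moves backwards
    j = 0  # index into pattern = number of pattern characters matched so far
    while i < len(text):
        if text[i] == pattern[j]:
            i += 1
            j += 1
            if j == len(pattern):
                return i - j  # the full match ends just before i
        elif j > 0:
            j = next_table[j - 1]  # keep i the same, fall back in the pattern
        else:
            i += 1  # mismatch at pattern[0]: move on in the text
    return -1


# Hand-written table for "ABABAC", taken from the worked example in this article.
print(kmp_search("ABABABAC", "ABABAC", [0, 0, 1, 2, 3, 0]))  # 2
```

Note that i is never decremented, which is where the speed-up over the naive search comes from.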



How to build the table

Everything else below is just how to build the table.

Construct a table showing where to reset j to

If mismatch string[i] != pattern[0], just move string to i+1, j = 0

If mismatch string[i] != pattern[1], we leave i the same, j = 0
pattern = 10
string = ...1100000

If mismatch string[i] != pattern[2], we leave i the same, and change j, but we need to consider repeats in pattern[0]..pattern[1]
pattern = 110
string = ...11100000
i stays the same, j goes from 2 back to 1

pattern = 100
string = ...10100000
i stays the same, j goes from 2 back to 0

If mismatch string[i] != pattern[j], we leave i the same, and change j, but we need to consider repeats in pattern[0]..pattern[j-1]

Given a certain pattern, construct a table showing where to reset j to.



Construct a table of next[j]

For each j, figure out:
next[j] = length of the longest prefix in "pattern[0]..pattern[j-1]" that matches a suffix of "pattern[1]..pattern[j]"
next[j] = "length of the longest matching substring"
That is:
the prefix must include pattern[0]

the suffix must include pattern[j]

the prefix and suffix are different (neither is the whole of pattern[0]..pattern[j])
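To make the definition concrete, here is a deliberately slow brute-force sketch that computes next[j] straight from the three conditions above. The function name `next_by_definition` is invented for this illustration; it is meant only for checking small examples by hand, not as the efficient construction.

```python
def next_by_definition(pattern):
    """Brute-force next[] computed directly from the definition (slow)."""
    table = []
    for j in range(len(pattern)):
        sub = pattern[: j + 1]  # pattern[0]..pattern[j]
        best = 0
        # Candidate lengths 1..j: the prefix must be shorter than sub itself,
        # so it lies in pattern[0]..pattern[j-1] and the suffix in pattern[1]..pattern[j].
        for length in range(1, j + 1):
            if sub[:length] == sub[-length:]:
                best = length  # lengths increase, so the last hit is the longest
        table.append(best)
    return table


print(next_by_definition("ABABAC"))  # [0, 0, 1, 2, 3, 0]
```

For the short patterns used earlier, the same function gives [0, 1, 0] for "110" and [0, 0, 0] for "100", which agrees with j going back to 1 and to 0 respectively after a mismatch at pattern[2].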


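Example for pattern "ABABAC":

next[j] = length of the longest prefix in "pattern[0]..pattern[j-1]" that matches a suffix of "pattern[1]..pattern[j]"

When pattern position j+1 is compared with s[k] and does not match:

j' = next[j], and pattern[j'] is now compared with s[k]; position j' has moved to where position j+1 was.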

j                              0      1      2      3      4       5
substring 0 to j               A      AB     ABA    ABAB   ABABA   ABABAC
longest prefix-suffix match    none   none   A      AB     ABA     none
next[j]                        0      0      1      2      3       0

notes (j = 0): no prefix and suffix that are different, i.e. next[0] = 0 for all patterns
Given j, let n = next[j]
"pattern[0]..pattern[n-1]" = "pattern[j-(n-1)]..pattern[j]"

"pattern[0]..pattern[next[j]-1]" = "pattern[j-(next[j]-1)]..pattern[j]"

e.g. j = 4, n = 3,

"pattern[0]..pattern[2]" = "pattern[2]..pattern[4]"

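The identity pattern[0]..pattern[n-1] = pattern[j-(n-1)]..pattern[j] can be checked mechanically for every j. The sketch below is only a sanity check with an invented helper name, `identity_holds`; the table values are the ones from the "ABABAC" example.

```python
def identity_holds(pattern, next_table):
    """Check pattern[0..n-1] == pattern[j-(n-1)..j] for every j, with n = next[j]."""
    for j, n in enumerate(next_table):
        prefix = pattern[:n]                   # pattern[0]..pattern[n-1]
        suffix = pattern[j - (n - 1): j + 1]   # pattern[j-(n-1)]..pattern[j]
        if prefix != suffix:
            return False
    return True


print(identity_holds("ABABAC", [0, 0, 1, 2, 3, 0]))  # True
```

When n = 0 both slices are empty, so positions with no prefix-suffix match pass trivially.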
If the match fails at pattern position j+1, keep i the same and reset the pattern to position n (next[j]).
We have already matched pattern[0]..pattern[n-1], because pattern[0]..pattern[n-1] = pattern[j-(n-1)]..pattern[j].

e.g. We have matched ABABA so far.
If the next one fails, say we have matched ABA so far and then see if the next one matches.
That is, keep i the same, just reset j to 3 (= precisely the length of the longest prefix-suffix match).
Then, if the match after ABA fails too, by the same rule we say we have matched A so far, reset to j = 1, and try again from there.
In other words, it starts by trying to match the longest prefix-suffix, but if that fails it works down to the shorter ones until exhausted (no prefix-suffix matches left).
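The "work down to the shorter ones" behaviour can be traced with a few lines of Python. This is a sketch with a made-up helper name, `fallback_chain`; it lists the values j is reset to, in order, after pattern[0]..pattern[j] has matched and the following comparisons keep failing.

```python
def fallback_chain(next_table, j):
    """List the successive reset positions tried after matching pattern[0..j]."""
    chain = []
    n = next_table[j]
    while n > 0:
        chain.append(n)        # pretend only the first n characters are matched
        n = next_table[n - 1]  # next shorter prefix-suffix match
    return chain


# "ABABA" (j = 4) has been matched: try ABA (j = 3), then A (j = 1), then give up.
print(fallback_chain([0, 0, 1, 2, 3, 0], 4))  # [3, 1]
```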



Algorithm to construct the table of next[j]

Do this once, when the pattern comes in.
pattern[0]...pattern[m-1]
Here, i and j both index the pattern.
In other words, the pattern is being compared against a second copy of itself.



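next[0] = 0

i = 1
j = 0
m = pattern.length

while (i < m)
{
    // on the 1st step, i = 1, j = 0
    if (pattern[j] == pattern[i])
    {
        next[i] = j + 1    // note: the index is i, not j
        i++
        j++
    }
    else    // pattern[j] != pattern[i]
    {
        if (j > 0)
        {
            // e.g. [0],[1],[2] match [4],[5],[6]
            // but now [3] != [7]
            // maybe there is another pattern we can shift right though,
            // i.e. a shorter prefix that is also a suffix
            j = next[j - 1]
            // next[j] is the value used when position j+1 fails (worth remembering as a rule).
            // We use j-1 because the prefix/suffix match only covers positions 0..j-1;
            // position j itself has not matched the prefix. Drawing a diagram makes this clear.
        }
        else    // j == 0
        {
            // pattern index 0 does not match position i:
            // move i right by one; the pattern shifts right by one and j stays 0
            next[i] = 0
            i++
            j = 0    // redundant, just to make it clear what we are looping with
        }
    }
}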
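For readers who want to run the construction, here is one possible Python translation of the pseudocode above. The function name `build_next` and the variable `next_table` are choices made for this sketch (next is a built-in name in Python, so it is avoided); the control flow follows the pseudocode step by step.

```python
def build_next(pattern):
    """Build next[] for pattern, mirroring the pseudocode above."""
    m = len(pattern)
    next_table = [0] * m  # next[0] = 0 for every pattern
    i = 1  # position whose next[] value is being computed
    j = 0  # length of the prefix currently matched against the text ending at i-1
    while i < m:
        if pattern[j] == pattern[i]:
            next_table[i] = j + 1  # stored at index i, not j
            i += 1
            j += 1
        elif j > 0:
            j = next_table[j - 1]  # fall back to a shorter prefix-suffix match
        else:
            next_table[i] = 0  # nothing matches at position i
            i += 1
    return next_table


print(build_next("ABABAC"))  # [0, 0, 1, 2, 3, 0]
```

The result for "ABABAC" matches the table in the worked example. Each step either advances i or shrinks j, so the loop takes a number of steps proportional to the pattern length.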





                                            