
Algorithm Design and Analysis 20 BoyMoore.ppt

Uploaded by: lxhqcj  Document ID: 7281983  Upload date: 2019-05-12  Format: PPT  Pages: 22  Size: 203KB

String Matching II
Algorithm: Design & Analysis [20]

In the last class
    Simple string matching
    KMP flowchart construction
    Jump at fail
    KMP scan

String Matching II
    Boyer-Moore's heuristics
        Skipping unnecessary comparisons
        Combining fail-match knowledge into the jump
    Horspool algorithm
    Boyer-Moore algorithm

Skipping over Characters in Text
A longer pattern contains more information about impossible positions in the text, for example when we know that the pattern does not contain a specific character. Examining the text characters one by one, moving forward, does not make the best use of this information.

An Example
T: If you wish to understand others you must ...
P: must
[Figure: successive alignments of "must" under T; compared characters are marked "match" or "mismatch", the rest are "just passed by"]
The characters of P are checked in reverse order. The matching copy of P begins at t38, and the match is achieved in 18 comparisons.

Distance of Jumping Forward
With knowledge of P, the distance that j, the pointer into T, jumps forward is determined by the text character itself, independent of its location in T.
[Figure: P = p1 ... A ... A ... pm under T = t1 ... tj=A ... tn; the rightmost A in P is at location pk; the new j lies m-k to the right of the current j, and the next scan starts there]
charJump[A] = m-k

Computing the Jump: Algorithm
Input: pattern string P; m, the length of P; alphabet size alpha = |Σ|.
Output: array charJump, indexed 0, ..., alpha-1, storing the jump offset for each character in the alphabet.

    void computeJumps(char[] P, int m, int alpha, int[] charJump) {
        char ch;
        int k;
        for (ch = 0; ch < alpha; ch++)
            charJump[ch] = m;        // for every char not in P, jump by m
        // k stops before m: the last character of P must not get a jump
        // of 0, or the Horspool scan would never advance
        for (k = 1; k < m; k++)
            charJump[p[k]] = m-k;
    }

Because k increases, a symbol that occurs more than once in P gets its jump from its rightmost occurrence. Running time: Θ(|Σ|+m).

Scan by charJump: Horspool's Algorithm

    // P and T are indexed from 0 here; j is the text position under P's last character
    int horspoolScan(char[] P, char[] T, int m, int[] charjump) {
        int j = m-1, k, match = -1;
        while (endText(T, j) == false) {                  // up to n loops
            k = 0;
            while (k < m and P[m-k-1] == T[j-k])          // up to m loops
                k++;
            if (k == m) {
                match = j-m+1;                            // start of the matching copy
                break;
            } else
                j = j + charjump[T[j]];
        }
        return match;
    }

So, in the worst case: Θ(mn).

Partially Matched Substring
P: b a t s a n d c a t s
T: ... d a t s ...
The suffix "ats" of P has been matched, and the current j is at the text character d. Since charJump[d] = 4, the new j moves P forward by only 1 character. Then "cat" will be over "ats", and a mismatch is to be expected.
If we remember the matched suffix, we can get a better jump: P moves forward 7 characters, so that its earlier "ats" lines up with the matched text. A new cycle of backward scanning then starts from the new j.

Basic Idea
[Figure: T, the text, with a mismatch at tj and the matched suffix to its right]

Forward to Match the Suffix
[Figure: P = p1 ... pk pk+1 ... pm under T = t1 ... tj tj+1 ... tn; mismatch at pk, matched suffix pk+1 ... pm]
A substring identical to the matched suffix occurs in P, as p(r+1) ... p(r+m-k). P slides forward by slide[k] so that this substring lines up with the matched text, and j moves from the old j to the new j by matchJump[k].

Partial Match for the Suffix
[Figure: same layout, with the prefix p1 ... pq of P lined up under the end of the matched text]
No entire substring identical to the matched suffix occurs in P, so only a prefix p1 ... pq of P can be lined up with the end of the matched suffix. Again P slides by slide[k] and j jumps by matchJump[k].
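Before the matched-suffix jump is formalized, the character-jump scans can be tried out in runnable form. The following is a minimal 0-indexed C sketch, not the lecture's own code: the names `compute_jumps`, `horspool_scan` and `char_jump_scan` are made up here, the alphabet is assumed to be all byte values, and the end of the text is found with `strlen` instead of `endText`. `char_jump_scan` is an assumed variant that keys the jump on the mismatched text character and adds a guard of one pattern position so that P always moves right. Under that assumption it reproduces the 18 comparisons of the "must" example.

```c
#include <limits.h>
#include <string.h>

/* char_jump[c] = m for bytes that do not occur in p[0..m-2]; otherwise the
   distance from the rightmost such occurrence to the last position of p. */
void compute_jumps(const char *p, int m, int char_jump[UCHAR_MAX + 1]) {
    for (int c = 0; c <= UCHAR_MAX; c++)
        char_jump[c] = m;
    for (int k = 0; k < m - 1; k++)          /* last character excluded */
        char_jump[(unsigned char)p[k]] = m - 1 - k;
}

/* Horspool scan: the jump is keyed on the text character under the last
   pattern position. Returns the 0-based start of the first match, or -1. */
int horspool_scan(const char *p, const char *t) {
    int m = (int)strlen(p), n = (int)strlen(t);
    int char_jump[UCHAR_MAX + 1];
    if (m == 0) return 0;
    compute_jumps(p, m, char_jump);
    for (int j = m - 1; j < n; j += char_jump[(unsigned char)t[j]]) {
        int k = 0;                           /* characters matched so far */
        while (k < m && p[m - 1 - k] == t[j - k])
            k++;
        if (k == m)
            return j - m + 1;
    }
    return -1;
}

/* Assumed variant: the jump is keyed on the mismatched text character.
   *comparisons receives the number of character comparisons performed. */
int char_jump_scan(const char *p, const char *t, int *comparisons) {
    int m = (int)strlen(p), n = (int)strlen(t);
    int char_jump[UCHAR_MAX + 1];
    *comparisons = 0;
    if (m == 0) return 0;
    compute_jumps(p, m, char_jump);
    int j = m - 1, k = m - 1;                /* 0-based positions in t and p */
    while (j < n) {
        (*comparisons)++;
        if (t[j] == p[k]) {
            if (k == 0) return j;            /* whole pattern matched */
            j--; k--;
        } else {
            int by_char = char_jump[(unsigned char)t[j]];
            int guard = m - k;               /* moves P right by exactly one */
            j += by_char > guard ? by_char : guard;
            k = m - 1;
        }
    }
    return -1;
}
```

On the text "If you wish to understand others you must", both functions return 37, which is t38 in the 1-based numbering of the slides.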

matchjump and slide
[Figure: P = p1 ... pr pr+1 ... p(r+m-k) ... pm shown before and after the slide, under T = t1 ... tj tj+1 ... tn, with old j, new j, slide[k] and matchJump[k] marked]
slide[k]: the distance P slides forward after a mismatch at pk, with m-k characters matched to the right.
matchJump[k]: the distance that j, the pointer into T, jumps; that is, matchJump[k] = slide[k] + m - k.
The length of the frame (the matched suffix) is m-k.

Determining the slide
Let r (r < k) be the largest index such that p(r+1) starts a substring matching the matched suffix of P, and pr ≠ pk; then slide[k] = k-r. The case pr = pk is senseless, since pk is a mismatch.
If no such r is found, the longest prefix of P, of length q, that matches the end of the matched suffix is lined up; this prefix may be empty. Then slide[k] = m-q.

Computing matchJump: Example
P = "wowwow" (m = 6). The values are computed from k = 6 down to k = 1.

    k   matched suffix    how the slide is found                         slide[k]     m-k   matchJump[k]
    6   (empty)           matched is empty                               1            0     1
    5   w       (1 char)  r = 3                                          5-3 = 2      1     3
    4   ow      (2 chars) candidate has pr = pk, not lined up;
                          no r found, but a prefix of length 1           m-1 = 5      2     7
    3   wow     (3 chars) r = 0                                          3-0 = 3      3     6
    2   wwow    (4 chars) no r found, but a prefix of length 3           m-3 = 3      4     7
    1   owwow   (5 chars) no r found, but a prefix of length 3           m-3 = 3      5     8
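The definition of slide[k] can be checked directly by brute force. The sketch below is an illustration under assumptions, not the lecture's algorithm: the function names are invented, positions k are 1-based as on the slides while the C string is 0-based, and both cases of the definition (a full reoccurrence at r, or a prefix of length q) are folded into one test, "the smallest slide s that agrees with the matched suffix and does not put pk's own character back under the mismatch".

```c
#include <string.h>

/* slide[k] for a mismatch at 1-based pattern position k, found by trying
   every slide s = 1, 2, ... and returning the first consistent one. */
int slide_by_definition(const char *p, int k) {
    int m = (int)strlen(p);
    for (int s = 1; s < m; s++) {
        int ok = 1;
        /* Every matched position i that still has a pattern character
           under it after the slide must see the same character. */
        for (int i = k + 1; i <= m && ok; i++)
            if (i - s >= 1 && p[i - s - 1] != p[i - 1])
                ok = 0;
        /* p(k-s) == pk would only repeat the known mismatch. */
        if (ok && k - s >= 1 && p[k - s - 1] == p[k - 1])
            ok = 0;
        if (ok)
            return s;
    }
    return m;      /* empty prefix: P slides past the matched text */
}

/* matchJump[k] = slide[k] + (m - k) */
int match_jump_by_definition(const char *p, int k) {
    return slide_by_definition(p, k) + (int)strlen(p) - k;
}
```

For "wowwow" this gives matchJump values 8, 7, 6, 7, 3, 1 for k = 1, ..., 6, as in the example, and for "batsandcats" with a mismatch at k = 8 it gives a slide of 7. Each call costs O(m^2) time, so this is a way to validate a faster table construction, not a replacement for it.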

Finding r by Recursion
[Figure: P = p1 ... pk pk+1 pk+2 ... ps ps+1 ..., with sufx[k+1] = s]
Suppose sufx[k+1] = s.
Case 1: p(k+1) = ps. Then sufx[k] = sufx[k+1] - 1.
Case 2: p(k+1) ≠ ps. Then continue recursively with s = sufx[s].

Computing the slides: the Algorithm

    for (k = 1; k <= m; k++)
        matchjump[k] = m+1;              // initialized as impossible values
    sufx[m] = m+1;
    for (k = m-1; k >= 0; k--) {
        s = sufx[k+1];
        while (s <= m) {
            if (p[k+1] == p[s]) break;
            matchjump[s] = min(matchjump[s], s-(k+1));
            s = sufx[s];
        }
        sufx[k] = s-1;
    }

Remember: slide[k] = k-r. Here, the role of k is played by s, and that of r by k+1.

Computing the matchjump: Whole Procedure

    void computeMatchjumps(char[] P, int m, int[] matchjump) {
        int k, r, s, low, shift;
        int[] sufx = new int[m+1];
        // (the slide computation shown above goes here)
        low = 1; shift = sufx[0];
        while (shift <= m) {             // slides for a suffix matched only by a shorter prefix
            for (k = low; k <= shift; k++)
                matchjump[k] = min(matchjump[k], shift);
            low = shift+1; shift = sufx[shift];
        }
        for (k = 1; k <= m; k++)
            matchjump[k] += (m-k);       // turn the slide into matchjump by adding m-k
        return;
    }

Boyer-Moore Scan Algorithm

    int boyerMooreScan(char[] P, char[] T, int[] charjump, int[] matchjump) {
        int match, j, k;
        match = -1;
        j = m; k = m;                    // first comparison location
        while (endText(T, j) == false) {
            if (k < 1) {
                match = j+1;             // success
                break;
            }
            if (t[j] == p[k]) {          // scan from right to left
                j--; k--;
            } else {
                j += max(charjump[t[j]], matchjump[k]);   // take the better of the two heuristics
                k = m;
            }
        }
        return match;
    }

Home Assignment
pp.508-: 11.16, 11.19, 11.20, 11.25
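The pieces of the lecture can be assembled into one runnable program. What follows is a hedged C sketch rather than the lecture's code: the names `compute_match_jumps` and `boyer_moore_scan` are invented here, the 1-based slide notation pk is mapped onto `p[k - 1]`, the alphabet is assumed to be all byte values, `endText` is replaced by a length check, and the result is reported as a 0-based index. An allocation failure is reported as -1, the same value as "no match", which is a simplification.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

static int min_int(int a, int b) { return a < b ? a : b; }

/* Fills match_jump[1..m] (the array must hold m+1 ints).
   Returns 0 on success, -1 if memory could not be allocated. */
int compute_match_jumps(const char *p, int m, int *match_jump) {
    int *sufx = malloc((size_t)(m + 1) * sizeof *sufx);
    if (sufx == NULL) return -1;
    for (int k = 1; k <= m; k++)
        match_jump[k] = m + 1;                 /* impossible slide */
    sufx[m] = m + 1;
    for (int k = m - 1; k >= 0; k--) {
        int s = sufx[k + 1];
        while (s <= m) {
            if (p[k] == p[s - 1]) break;       /* p(k+1) == ps */
            match_jump[s] = min_int(match_jump[s], s - (k + 1));
            s = sufx[s];
        }
        sufx[k] = s - 1;
    }
    /* Slides for suffixes that are matched only by a shorter prefix. */
    int low = 1, shift = sufx[0];
    while (shift <= m) {
        for (int k = low; k <= shift; k++)
            match_jump[k] = min_int(match_jump[k], shift);
        low = shift + 1;
        shift = sufx[shift];
    }
    for (int k = 1; k <= m; k++)
        match_jump[k] += m - k;                /* slide -> matchJump */
    free(sufx);
    return 0;
}

/* Returns the 0-based start of the first occurrence of p in t, or -1. */
int boyer_moore_scan(const char *p, const char *t) {
    int m = (int)strlen(p), n = (int)strlen(t);
    int char_jump[UCHAR_MAX + 1];
    if (m == 0) return 0;
    int *match_jump = malloc((size_t)(m + 1) * sizeof *match_jump);
    if (match_jump == NULL || compute_match_jumps(p, m, match_jump) != 0) {
        free(match_jump);
        return -1;
    }
    for (int c = 0; c <= UCHAR_MAX; c++)
        char_jump[c] = m;
    for (int k = 1; k < m; k++)
        char_jump[(unsigned char)p[k - 1]] = m - k;

    int match = -1, j = m, k = m;              /* 1-based, as on the slides */
    while (j <= n) {
        if (k < 1) { match = j; break; }       /* 1-based j+1 is 0-based j */
        if (t[j - 1] == p[k - 1]) {
            j--; k--;
        } else {
            int cj = char_jump[(unsigned char)t[j - 1]];
            j += cj > match_jump[k] ? cj : match_jump[k];
            k = m;
        }
    }
    free(match_jump);
    return match;
}
```

For "wowwow" the table comes out as 8, 7, 6, 7, 3, 1 for k = 1, ..., 6, in agreement with the worked example, and scanning "If you wish to understand others you must" for "must" returns 37.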
