Algorithm Design and Analysis 19: StrMatch.ppt

Uploaded by: lxhqcj  Document no.: 7281984  Upload date: 2019-05-12  Format: PPT  Pages: 29  Size: 262KB

String Matching
Algorithm: Design & Analysis 19

In the last class:
Optimal Binary Search Tree
Separating Sequence of Words
Dynamic Programming Algorithms

String Matching
Simple String Matching
KMP Flowchart Construction
Jump at Fail
KMP Scan

String Matching: Problem Description
Search the text T, a string of characters of length n,
for the pattern P, a string of characters of length m (usually m ≪ n).
The result: if T contains P as a substring, return the index in T where the substring starts; otherwise, fail.

Straightforward Solution
[Figure: P = p_1 … p_m aligned with T = t_1 … t_n starting at t_i. Starting from the first matched character, the matched window p_1 … p_{k-1} = t_i … t_{i+k-2} expands to the right; the next comparison is p_k vs. t_{i+k-1}.]
Note: If p_k fails to match t_{i+k-1}, backtracking occurs and a new cycle of character matching starts from t_{i+1}. In the worst case, nearly n backtrackings occur and each cycle makes nearly m - 1 comparisons, so the cost is Θ(mn).

Disadvantages of Backtracking
More comparisons are needed.
Up to m - 1 of the most recently matched characters have to be readily available for re-examination (consider texts that are too long to be loaded in their entirety).

An Intuitive Finite Automaton for Matching a Given Pattern
[Figure: automaton for the pattern “AABC” over the alphabet {A, B, C}: nodes 1–4 beginning at the start node, a stop node * (“matched!”), and edges labeled A, B, C.]
Advantage: each character in the text is checked only once.
Difficulty: constructing the automaton; too many edges (for a large alphabet) have to be defined and stored.

Why no backtracking? Memorize the prefix.
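As an illustration of the contrast between backtracking and an automaton, here is a small C sketch (C is chosen because the slides' pseudocode is C-like). naiveMatch and automatonMatch are hypothetical names. The table is filled with the common "restart state" construction, which is one way to build such an automaton and is an assumption here, not the slides' method. The alphabet is fixed to {A, B, C} as in the “AABC” example, and returned indices are 1-based.

```c
#include <string.h>

#define ALPHA 3   /* assumed alphabet {A, B, C}, as in the "AABC" example */
#define MAXM 16   /* hypothetical upper bound on the pattern length */

/* Map 'A', 'B', 'C' to 0, 1, 2; any other character maps to -1. */
static int code(char c) {
    return (c >= 'A' && c < 'A' + ALPHA) ? c - 'A' : -1;
}

/* Straightforward matcher: after a mismatch it backtracks and restarts
   one position to the right. Returns the 1-based start index or -1. */
int naiveMatch(const char *P, const char *T) {
    int m = (int)strlen(P), n = (int)strlen(T);
    for (int i = 0; i + m <= n; i++) {
        int k = 0;
        while (k < m && T[i + k] == P[k]) k++;
        if (k == m) return i + 1;
    }
    return -1;
}

/* Automaton matcher: state = number of pattern characters matched so far.
   Assumes P is non-empty, uses only A, B, C, and has at most MAXM characters. */
int automatonMatch(const char *P, const char *T) {
    int m = (int)strlen(P);
    int delta[MAXM + 1][ALPHA];
    int x = 0;                                  /* state reached by reading P[1..q-1] */
    for (int c = 0; c < ALPHA; c++) delta[0][c] = 0;
    delta[0][code(P[0])] = 1;
    for (int q = 1; q < m; q++) {
        for (int c = 0; c < ALPHA; c++)
            delta[q][c] = delta[x][c];          /* mismatch: behave like state x */
        delta[q][code(P[q])] = q + 1;           /* match: advance one state */
        x = delta[x][code(P[q])];
    }
    /* Scan: every text character is read exactly once, with no backtracking. */
    int state = 0;
    for (int j = 0; T[j] != '\0'; j++) {
        int c = code(T[j]);
        state = (c < 0) ? 0 : delta[state][c];
        if (state == m) return j - m + 2;       /* 1-based start index */
    }
    return -1;
}
```

For instance, both functions return 6 for P = “AABC” and T = “ABAABAABCA”. The automaton version reads each text character once, while naiveMatch may re-read characters after every backtrack. The table has one row per state and one column per alphabet symbol, which is the storage cost the slide names as the difficulty for large alphabets.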

The Knuth-Morris-Pratt Flowchart
[Figure: KMP flowchart with a “Get next text char.” node and nodes 1–6 labeled with the characters of P, each with a success link and a failure link, and a * exit.]
An example: T = “ACABAABABA”, P = “ABABCB”.

Matched Frame
[Figure: P = “ABABABCB” against T = “… ABABAB x …”. The first six characters of P form the matched frame, and p_7 = C is to be compared next with x.]
If x is not C:
[Figure: P = “ABABABCB” shifted right by 2, so that p_1 … p_4 = “ABAB” stays matched and p_5 is compared next with x.]
Moving the matched frame 2 characters to the right is equivalent to moving the pointer into P backward (from p_7 to p_5) while the pointer into T stays at x.
[Figure: P = “ABABABCB” against T = “… ABABABABCB …”.]
Moving 4 characters may result in an error: here it would skip the copy of P that begins 2 characters to the right.

Sliding the Matched Frame
When a mismatch occurs, p_1 … p_{k-1} (the matched frame) matches t_i … t_{j-1}, and p_k ≠ t_j.
The matched frame slides, and its width changes as well: the new matched frame p_1 … p_{r-1} lies under t_{j-r+1} … t_{j-1}, where p_1 … p_{r-1} = p_{k-r+1} … p_{k-1}, and the next comparison is p_r vs. t_j. The new matched frame should be as large as possible.

Fail Links
Out of each node k of the KMP flowchart is a fail link, leading to node r, where r is the largest non-negative integer satisfying r < k and p_1 … p_{r-1} matches p_{k-r+1} … p_{k-1}. The value r is stored in fail[k]. Note: r is independent of T.
[Figure: two copies of P offset by k - r; the pointer into P moves backward from k to r, while the pointer into T only moves forward.]
This means: when the comparison fails at node k, the next comparison is p_r (instead of p_k) against the same text character t_j.

Computing the Fail Links
Thinking recursively, let fail[k-1] = s, so that p_1 … p_{s-1} matches p_{k-s} … p_{k-2} and p_s is to be compared with p_{k-1}.
Case 1: p_s = p_{k-1}. Then fail[k] = s + 1.
Case 2: p_s ≠ p_{k-1}. Then p_1 … p_{fail[s]-1} still matches the characters just before p_{k-1}, so p_{fail[s]} is compared with p_{k-1} next, thinking recursively.

Recursion on Node fail[s]
Thinking recursively, s starts as fail[k-1].
In case 2 (p_s ≠ p_{k-1}), p_s is replaced by p_{fail[s]}; that is, fail[s] is assumed as the new value of s.
Then proceed with the new s: if case 1 applies (p_s = p_{k-1}), fail[k] = s + 1; if case 2 applies (p_s ≠ p_{k-1}), take another new s. If s drops to 0, the search stops and fail[k] = 1.

Computing Fail Links: an Example
Constructing the KMP flowchart for P = “ABABABCB”, assuming that fail[1] to fail[6] have been computed.
[Figure: KMP flowchart for “ABABABCB” with numbered nodes 0–9, a “Get next text char.” node, and a * exit.]
fail[7]: fail[6] = 4 and p_6 = p_4, so fail[7] = fail[6] + 1 = 5 (case 1).
fail[8]: fail[7] = 5, but p_7 ≠ p_5, so let s = fail[5] = 3. But p_7 ≠ p_3; going back again, let s = fail[3] = 1. Still p_7 ≠ p_1. Further, let s = fail[1] = 0, so fail[8] = 0 + 1 = 1 (case 2).

Constructing KMP Flowchart
Input: P, a string of characters; m, the length of P.
Output: fail, the array of failure links, filled in.

    void kmpSetup(char[] P, int m, int[] fail)
        int k, s;
        fail[1] = 0;
        for (k = 2; k <= m; k++)
            s = fail[k-1];
            while (s >= 1)
                if (p_s == p_{k-1})
                    break;
                s = fail[s];
            fail[k] = s + 1;

The for loop executes m - 1 times, and in each iteration the while loop executes at most m times, since fail[s] is always less than s. So the complexity is roughly O(m²).

Number of Character Comparisons
Successful comparisons: at most one for each k, totaling at most m - 1.
Unsuccessful comparisons: each is always followed by a decrease of s. Since s is initialized as 0, increases by one at a time, and is never negative, the number of decreases cannot be larger than the number of increases.
In kmpSetup, the lines fail[k] = s + 1 and s = fail[k-1] (in the next iteration) combine to increase s by 1, which happens m - 2 times.
Total: at most (m - 1) + (m - 2) = 2m - 3 character comparisons.

KMP Scan: the Algorithm
Input: P and T, the pattern and text; m, the length of P; fail, the array of failure links for P.
Output: the index in T where a copy of P begins, or -1 if there is no match.

    int kmpScan(char[] P, char[] T, int m, int[] fail)
        int match, j, k;    // j indexes T, and k indexes P
        match = -1;
        j = 1; k = 1;
        while (endText(T, j) == false)
            if (k > m)
                match = j - m;
                break;
            if (k == 0)
                j++; k = 1;
            else if (t_j == p_k)
                j++; k++;       // one character matched
            else
                k = fail[k];    // following the failure link
        if (match == -1 && k > m)   // P ends at the last character of T (endText true once j > n)
            match = j - m;
        return match;

Each time a new cycle of the while loop begins, p_1 … p_{k-1} have been matched. The loop body is executed at most 2n times. Why?
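The two slide procedures, kmpSetup and kmpScan, can be cross-checked with a C transcription. This is a sketch under assumptions: positions stay 1-based as on the slides, so p_k is P[k - 1] and fail must have room for indices 1 through m; the unspecified endText(T, j) is modelled as j > n; and the check after the loop reports a copy of P that ends at the last text character.

```c
#include <string.h>

/* kmpSetup from the slides (1-based: p_k is P[k - 1]; fail[1..m] is filled). */
void kmpSetup(const char *P, int m, int *fail) {
    fail[1] = 0;
    for (int k = 2; k <= m; k++) {
        int s = fail[k - 1];
        while (s >= 1) {
            if (P[s - 1] == P[k - 2])    /* p_s == p_{k-1}: case 1 */
                break;
            s = fail[s];                 /* case 2: try the next shorter frame */
        }
        fail[k] = s + 1;
    }
}

/* kmpScan from the slides: returns the 1-based index in T where P begins,
   or -1. endText(T, j) is modelled here as j > n (an assumption). */
int kmpScan(const char *P, const char *T, int m, const int *fail) {
    int n = (int)strlen(T);
    int j = 1, k = 1;                    /* j indexes T, k indexes P */
    while (j <= n) {
        if (k > m) return j - m;
        if (k == 0) { j++; k = 1; }      /* fail link 0: get next text char */
        else if (T[j - 1] == P[k - 1]) { j++; k++; }  /* one character matched */
        else k = fail[k];                /* follow the failure link */
    }
    return (k > m) ? j - m : -1;         /* match ending at the last character of T */
}
```

With P = “ABABABCB”, kmpSetup fills fail[1..8] with 0, 1, 1, 2, 3, 4, 5, 1, in agreement with the worked example, and kmpScan finds the copy of P in “ABABABABCB” at index 3, the case where moving 4 characters would have been wrong.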

Skipping Characters in String Matching
[Figure: the pattern “must” is matched against the text “If you wish to understand others you must …”; at each alignment the characters of P are checked in reverse order, from p_m back to p_1.]
The copy of P begins at t_38. Matching is achieved in 18 comparisons.

Distance of Jumping Forward
With knowledge of P, the distance the pointer into T jumps forward is determined by the text character itself, independent of its location in T.
[Figure: a mismatch occurs at t_j = A (current j). The new j lines up the rightmost A in P, p_k, with t_j; the jump is charJump[A] = m - k.]

Computing the Jump: Algorithm
Input: pattern string P; m, the length of P; alphabet size alpha = |Σ|.
Output: array charJump, indexed 0, …, alpha - 1, storing the jumping offset for each character in the alphabet.

    void computeJumps(char[] P, int m, int alpha, int[] charJump)
        int ch;    // int rather than char, so the test ch < alpha cannot wrap around
        int k;
        for (ch = 0; ch < alpha; ch++)
            charJump[ch] = m;       // for all chars not in P, jump by m
        for (k = 1; k <= m; k++)
            charJump[p_k] = m - k;

The increasing order of k ensures that, for symbols occurring more than once in P, the jump is computed from the rightmost occurrence. Cost: Θ(|Σ| + m).

Partially Matched Substring
[Figure: P = “batsandcats” against the text “… dats …”. The suffix “ats” is matched and p_8 = c mismatches t_j = d. Since charJump[d] = 4, the new j moves P forward by only 1 character.]
If we remember the matched suffix, we can get a better jump:
[Figure: P moved forward by 7 characters, so that the “ats” of “bats” lines up with the matched text.]

Forward to Match the Suffix
[Figure: the suffix p_{k+1} … p_m matches t_{j+1} … t_{j+m-k} and p_k mismatches t_j. A substring identical to the matched suffix occurs elsewhere in P as p_{r+1} … p_{r+m-k}; P slides forward by slide[k] to line it up with the matched text, and j moves from old j to new j by matchJump[k].]

Partial Match for the Suffix
[Figure: no complete substring identical to the matched suffix occurs elsewhere in P. Instead, the longest prefix p_1 … p_q that matches the end of the matched suffix is lined up with it (the prefix may be empty); P slides by slide[k] and j jumps by matchJump[k].]

matchJump and slide
slide[k]: the distance P slides forward after a mismatch at p_k, with m - k characters matched to the right.
matchJump[k]: the distance that j, the pointer into T, jumps; that is, matchJump[k] = slide[k] + m - k.
Let r (r < k) be the largest index such that p_{r+1} … p_{r+m-k} matches the matched suffix p_{k+1} … p_m and p_r ≠ p_k (for r = 0 there is no p_r, so this condition holds); then slide[k] = k - r.
If no such r is found, the longest prefix of P, of length q, that matches the end of the matched suffix is lined up. Then slide[k] = m - q.

Computing matchJump: Example
P = “wowwow” (m = 6), computed from right to left (k = 6 down to 1):
k = 6: the matched suffix is empty. slide[6] = 1 and m - k = 0, so matchJump[6] = 1.
k = 5: 1 character (“w”) matched. slide[5] = 5 - 3 = 2 and m - k = 1, so matchJump[5] = 3.
k = 4: 2 characters (“ow”) matched. The copy of “ow” at p_2 p_3 cannot be lined up, since p_1 = p_4. None is found, but a prefix of length 1 matches, so slide[4] = m - 1 = 5 and matchJump[4] = 7.
k = 3: 3 characters (“wow”) matched. slide[3] = 3 - 0 = 3 and m - k = 3, so matchJump[3] = 6.
k = 2: 4 characters (“wwow”) matched. None is found, but a prefix of length 3 matches, so slide[2] = m - 3 = 3 and matchJump[2] = 7.
k = 1: 5 characters (“owwow”) matched. None is found, but a prefix of length 3 matches, so slide[1] = m - 3 = 3 and matchJump[1] = 8.

The Boyer-Moore Algorithm

    void computeMatchjumps(char[] P, int m, int[] matchjump)
        int k, r, s, low, shift;
        int[] sufx = new int[m+1];
        for (k = 1; k <= m; k++)
            matchjump[k] = m + 1;
        sufx[m] = m + 1;
        for (k = m - 1; k >= 0; k--)
            s = sufx[k+1];
            while (s <= m)
                if (p_{k+1} == p_s)
                    break;
                matchjump[s] = min(matchjump[s], s - (k+1));
                s = sufx[s];
            sufx[k] = s - 1;
        // Computing slide[k] (remaining steps):
        // computing prefix lengths is necessary;
        // change slide values to matchjump values by addition.

sufx[k] = x means that a substring starting from p_{k+1} matches the suffix starting from p_{x+1}.

Home Assignment
pp. 508–: 11.4, 11.8, 11.9, 11.13, 11.18
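The pieces can be assembled into a C sketch of Boyer-Moore, under stated assumptions. computeJumps follows the slides, with an int loop variable. computeMatchJumpsNaive is a hypothetical name: it fills matchJump by brute force, straight from the slide's definition of slide[k], rather than with the slides' sufx-based computeMatchjumps, whose remaining steps are not shown. The scan uses the usual Boyer-Moore rule j += max(charJump[t_j], matchJump[k]), which the slides do not spell out. Positions are 1-based, and the alphabet is assumed to be all 256 byte values.

```c
#include <string.h>

#define ALPHA 256   /* assumed alphabet: all byte values */

/* computeJumps as on the slides (1-based: p_k is P[k - 1]). */
void computeJumps(const char *P, int m, int *charJump) {
    for (int ch = 0; ch < ALPHA; ch++)
        charJump[ch] = m;                           /* chars not in P: jump by m */
    for (int k = 1; k <= m; k++)
        charJump[(unsigned char)P[k - 1]] = m - k;  /* rightmost occurrence wins */
}

/* Hypothetical brute-force stand-in for computeMatchjumps: applies the
   definition of slide[k] directly, then matchJump[k] = slide[k] + m - k.
   matchJump needs room for indices 1..m. Worst case O(m^3). */
void computeMatchJumpsNaive(const char *P, int m, int *matchJump) {
    for (int k = 1; k <= m; k++) {
        int len = m - k;                            /* matched suffix p_{k+1..m} */
        int slide = -1;
        /* largest r < k with p_{r+1..r+len} == suffix and p_r != p_k */
        for (int r = k - 1; r >= 0 && slide < 0; r--)
            if (strncmp(P + r, P + k, (size_t)len) == 0 &&
                (r == 0 || P[r - 1] != P[k - 1]))
                slide = k - r;
        if (slide < 0) {
            /* otherwise line up the longest prefix p_1..p_q ending the suffix */
            int q = len > 0 ? len - 1 : 0;
            while (q > 0 && strncmp(P, P + m - q, (size_t)q) != 0)
                q--;
            slide = m - q;
        }
        matchJump[k] = slide + len;
    }
}

/* Scan right to left; on a mismatch at p_k, advance j by
   max(charJump[t_j], matchJump[k]) (assumed standard Boyer-Moore rule).
   Returns the 1-based index where P begins in T, or -1. */
int boyerMooreScan(const char *P, const char *T, int m,
                   const int *charJump, const int *matchJump) {
    int n = (int)strlen(T);
    int j = m, k = m;                               /* j indexes T, k indexes P */
    while (j <= n) {
        if (k < 1)
            return j + 1;                           /* all of P matched */
        if (T[j - 1] == P[k - 1]) {
            j--; k--;
        } else {
            int cj = charJump[(unsigned char)T[j - 1]];
            j += (cj > matchJump[k]) ? cj : matchJump[k];
            k = m;
        }
    }
    return -1;
}
```

For P = “wowwow” the brute-force table gives matchJump[1..6] = 8, 7, 6, 7, 3, 1, matching the worked example. For “batsandcats”, charJump[d] = 4 and matchJump[8] = 10, that is, a 7-character slide plus the 3 matched characters. Because matchJump[k] is always at least m - k + 1, taking the maximum guarantees that the pattern moves forward after every mismatch. The brute-force construction is only meant as a cross-check for a linear-time method such as the slides' sufx links.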
