Format: PPT, Pages: 29, Size: 262KB
This document (算法设计与分析19StrMatch.ppt) was uploaded by site member lxhqcj.

算法设计与分析19StrMatch.ppt

String Matching
Algorithm: Design & Analysis [19]

In the last class
- Optimal Binary Search Tree
- Separating Sequence of Word
- Dynamic Programming Algorithms

String Matching
- Simple String Matching
- KMP Flowchart Construction
- Jump at Fail
- KMP Scan

String Matching: Problem Description
Search the text T, a string of characters of length n, for the pattern P, a string of characters of length m (usually m ≪ n).
The result: if T contains P as a substring, return the index in T at which the substring starts; otherwise, fail.

Straightforward Solution
[Figure: P = p_1 … p_{k-1} p_k … p_m aligned under T = t_1 … t_i … t_{i+k-2} t_{i+k-1} … t_{i+m-1} … t_n. Labels: first matched character; matched window, expanding to the right; next comparison.]
Note: if p_k fails to match t_{i+k-1}, backtracking occurs: a new cycle of character matching starts from t_{i+1}. In the worst case nearly n backtracks occur, with nearly m-1 comparisons in each cycle, so the cost is Θ(mn).

Disadvantages of Backtracking
- More comparisons are needed.
- Up to m-1 of the most recently matched characters must be readily available for re-examination (consider texts that are too long to be loaded in their entirety).

An Intuitive Finite Automaton for Matching a Given Pattern
[Figure: automaton for the pattern "AABC" over the alphabet {A, B, C}, with a start node, nodes 1 to 4, and a stop node * (matched!); the edges are labeled A, B, and C.]
- Advantage: each character in the text is checked only once.
- Difficulty: constructing the automaton. For a large alphabet there are too many edges to define and store.
Why is there no backtracking? The automaton memorizes the matched prefix.

The Knuth-Morris-Pratt Flowchart
[Figure: KMP flowchart with a "Get next text char." node, character nodes 1 to 6, and a stop node *, connected by success and failure links. An example: T = "ACABAABABA", P = "ABABCB".]

Matched Frame
    P:     A B A B A B C B
    T: ... A B A B A B x ...
The six matched characters form the matched frame; x is to be compared next. If x is not C:
    P:         A B A B A B C B
    T: ... A B A B A B x ...
The matched frame moves right by 2 characters, which is equivalent to moving the pointer for P backward.
    P:     A B A B A B C B
    T: ... A B A B A B A B C B ...
Moving by 4 characters may result in an error: here it would skip the match that begins 2 characters to the right.

Sliding the Matched Frame
When a mismatch occurs:
    P:           p_1 … p_{k-1} p_k …
    T: t_1 … t_i … t_{j-1} t_j …
Here p_1 … p_{k-1} is the matched frame, and the mismatch is between p_k and t_j.
The matched frame slides, and its width changes as well:
    P (after sliding):   p_1 … p_{r-1} p_r …
    P (before sliding):  p_1 … p_{k-r+1} … p_{k-1}
    T:                   t_1 … t_i … t_{j-r+1} … t_{j-1} t_j …
The new matched frame p_1 … p_{r-1} is as large as possible; the next comparison is p_r against t_j.

Fail Links
Out of each node of the KMP flowchart is a fail link leading to node r, where r is the largest non-negative integer satisfying r < k such that p_1, …, p_{r-1} matches p_{k-r+1}, …, p_{k-1}. The value is stored in fail[k].
Note: r is independent of T.
[Figure: two copies of P offset by k-r, with nodes k and r marked. Labels: pointer for P backward; pointer for T forward.]
Which means: on failure at node k, the text character that failed against p_k is compared next with p_r.

Computing the Fail Links
Thinking recursively, let fail[k-1] = s:
[Figure: p_1 … p_{s-1} p_s p_{s+1} … aligned under p_1 … p_{k-r+1} … p_{k-2} p_{k-1} p_k … p_m. Labels: matched; to be compared.]
Case 1: p_s = p_{k-1}, so fail[k] = s+1.
Case 2: p_s ≠ p_{k-1}.
[Figure: a third copy, p_1 … p_{fail[s]-1} p_{fail[s]}, aligned under the two above. Label: to be compared, thinking recursively.]

Recursion on Node fail[s]
Thinking recursively, at the beginning s = fail[k-1]. In case 2 (p_s ≠ p_{k-1}), p_s is replaced by p_{fail[s]}; that is, s takes the new value fail[s]. Then proceed with the new s:
- if case 1 applies (p_s = p_{k-1}): fail[k] = s+1;
- if case 2 applies (p_s ≠ p_{k-1}): take another new s.

Computing Fail Links: an Example
Constructing the KMP flowchart for P = "ABABABCB", assuming that fail[1] to fail[6] have been computed.
[Figure: flowchart with a "Get next text char." node, character nodes A B A B A B C B, and a stop node *; the nodes are numbered 0 to 9.]
- fail[7]: fail[6] = 4 and p_6 = p_4, so fail[7] = fail[6]+1 = 5 (case 1).
- fail[8]: fail[7] = 5, but p_7 ≠ p_5, so let s = fail[5] = 3. But p_7 ≠ p_3, so going back further, let s = fail[3] = 1. Still p_7 ≠ p_1. Finally, let s = fail[1] = 0, so fail[8] = 0+1 = 1 (case 2).

Constructing KMP Flowchart
Input: P, a string of characters; m, the length of P.
Output: fail, the array of failure links, filled.
(Arrays are indexed from 1, as on the slides.)

    void kmpSetup(char[] P, int m, int[] fail)
        int k, s;
        fail[1] = 0;
        for (k = 2; k <= m; k++)
            s = fail[k-1];
            while (s >= 1)
                if (P[s] == P[k-1])
                    break;
                s = fail[s];
            fail[k] = s + 1;

The for loop executes m-1 times, and the while loop executes at most m times per iteration, since fail[s] is always less than s. So the complexity is roughly O(m^2). Counting character comparisons gives a tighter bound.

Number of Character Comparisons
- Successful comparisons: at most one for each k, totaling at most m-1.
- Unsuccessful comparisons: each is always followed by a decrease of s. Since s is initialized to 0, s increases by only one each time, and s is never negative, the number of decreases cannot exceed the number of increases.
In kmpSetup, the two lines "fail[k] = s + 1" and "s = fail[k-1]" combine to increase s by 1, and this is done m-2 times. So there are at most m-2 unsuccessful comparisons and at most (m-1) + (m-2) = 2m-3 comparisons in total.

KMP Scan: the Algorithm
Input: P and T, the pattern and text; m, the length of P; fail, the array of failure links for P.
Output: the index in T where a copy of P begins, or -1 if there is no match.

    int kmpScan(char[] P, char[] T, int m, int[] fail)
        int match, j, k;                        // j indexes T, and k indexes P
        match = -1; j = 1; k = 1;
        while (endText(T, j) == false)
            if (k > m) break;
            if (k == 0) { j++; k = 1; }
            else if (T[j] == P[k]) { j++; k++; }  // one character matched
            else k = fail[k];                    // following the failure link
        if (k > m) match = j - m;               // also covers a match ending at the last text character
        return match;

Each time a new cycle begins, p_1, …, p_{k-1} are matched.
The loop body is executed at most 2n times. Why?

Skipping Characters in String Matching
[Figure: the pattern "must" slid along the text "If you wish to understand others you must ...", checking the characters of P in reverse order.]
The copy of P begins at t_38. The match is achieved in 18 comparisons.

Distance of Jumping Forward
With knowledge of P, the distance that the pointer of T jumps forward is determined by the text character itself, independent of its location in T.
[Figure: P = p_1 … A … A … p_m under T = t_1 … t_j = A … t_n, then slid so that the rightmost A of P lines up with t_j. Labels: current j; new j; rightmost A = p_k; charJump['A'] = m-k.]

Computing the Jump: Algorithm
Input: pattern string P; m, the length of P; alphabet size alpha = |Σ|.
Output: array charJump, indexed 0, …, alpha-1, storing the jump offset for each character in the alphabet.

    void computeJumps(char[] P, int m, int alpha, int[] charJump)
        char ch;
        int k;
        for (ch = 0; ch < alpha; ch++)
            charJump[ch] = m;          // for all characters not in P, jump by m
        for (k = 1; k <= m; k++)
            charJump[P[k]] = m - k;

The increasing order of k ensures that, for symbols duplicated in P, the jump is computed from the rightmost occurrence. Running time: Θ(|Σ| + m).

Partially Matched Substring
    P:     b a t s a n d c a t s
    T: ... . . . . . . . d a t s ...
The matched suffix is "ats" and the current j points at d. With charJump['d'] = 4, the new j moves P by only 1 character.
Remembering the matched suffix, we can get a better jump:
    P:                   b a t s a n d c a t s
    T: ... . . . . . . . d a t s ...
The new j moves P by 7 characters.

Forward to Match the Suffix
[Figure: P = p_1 … p_k p_{k+1} … p_m under T = t_1 … t_j t_{j+1} … t_n, with matched suffix p_{k+1} … p_m and a mismatch at p_k. A substring equal to the matched suffix occurs in P: p_{r+1} … p_{r+m-k} is lined up with the matched text. Labels: old j; new j; slide[k]; matchJump[k].]

Partial Match for the Suffix
[Figure: the same setting, but no entire substring equal to the matched suffix occurs in P, so the prefix p_1 … p_q (which may be empty) is lined up with the end of the matched text. Labels: old j; new j; slide[k]; matchJump[k].]

matchJump and slide
- slide[k]: the distance P slides forward after a mismatch at p_k, with m-k characters matched to the right.
- matchJump[k]: the distance that j, the pointer of T, jumps; that is, matchJump[k] = slide[k] + m - k.
Let r (r < k) be the largest index such that p_{r+1} starts a substring matching the matched suffix of P and p_r ≠ p_k; then slide[k] = k - r.
If no such r is found, the longest prefix of P, of length q, that matches a suffix of the matched suffix is lined up. Then slide[k] = m - q.

Computing matchJump: Example
P = "wowwow" (m = 6). The values are computed from k = 6 down to k = 1.
- k = 6: the matched suffix is empty; slide[6] = 1 and m-k = 0, so matchJump[6] = 1.
- k = 5: 1 character matched ("w"); slide[5] = 5-3 = 2 and m-k = 1, so matchJump[5] = 3.
- k = 4: 2 characters matched ("ow"); the earlier occurrence of "ow" is preceded by a character equal to p_k, so it is not lined up. No r is found, but there is a prefix of length 1, so slide[4] = m-1 = 5 and matchJump[4] = 7.
- k = 3: 3 characters matched ("wow"); slide[3] = 3-0 = 3 and m-k = 3, so matchJump[3] = 6.
- k = 2: 4 characters matched ("wwow"); no r is found, but there is a prefix of length 3, so slide[2] = m-3 = 3 and matchJump[2] = 7.
- k = 1: 5 characters matched ("owwow"); no r is found, but there is a prefix of length 3, so slide[1] = m-3 = 3 and matchJump[1] = 8.

The Boyer-Moore Algorithm
sufx[k] = x means that a substring starting from p_{k+1} matches the suffix starting from p_{x+1}.

    void computeMatchJumps(char[] P, int m, int[] matchJump)
        int k, r, s, low, shift;
        int[] sufx = new int[m+1];
        for (k = 1; k <= m; k++)
            matchJump[k] = m + 1;
        sufx[m] = m + 1;
        // computing slide[k]
        for (k = m-1; k >= 0; k--)
            s = sufx[k+1];
            while (s <= m)
                if (P[k+1] == P[s])
                    break;
                matchJump[s] = min(matchJump[s], s - (k+1));
                s = sufx[s];
            sufx[k] = s - 1;
        // computing prefix length is necessary;
        // change slide value to matchJump by addition;

Home Assignment
pp.508- : 11.4, 11.8, 11.9, 11.13, 11.18
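
As an illustration of the KMP slides, the following is a minimal Python sketch of the fail-link construction and the scan, with the straightforward backtracking matcher as a baseline. The function names `naive_scan`, `kmp_setup`, and `kmp_scan` are hypothetical, both strings are padded with one leading character so that the indices stay 1-based as on the slides, and the returned index is 1-based (or -1 on failure). The scan checks `k > m` after the loop, an assumption made here so that a match ending at the last text character is still reported.

```python
def naive_scan(P, T):
    """Straightforward matcher with backtracking; returns a 1-based index or -1."""
    m, n = len(P), len(T)
    for i in range(n - m + 1):            # 0-based candidate start of a new cycle
        k = 0
        while k < m and T[i + k] == P[k]:
            k += 1
        if k == m:
            return i + 1
    return -1


def kmp_setup(P):
    """Return the fail array for P; fail[1..m] is used, fail[0] is padding."""
    m = len(P)
    p = " " + P                           # pad so that p[k] is the k-th character
    fail = [0] * (m + 1)                  # fail[1] = 0
    for k in range(2, m + 1):
        s = fail[k - 1]
        while s >= 1:
            if p[s] == p[k - 1]:          # case 1: extend the matched frame
                break
            s = fail[s]                   # case 2: recurse on node fail[s]
        fail[k] = s + 1
    return fail


def kmp_scan(P, T):
    """Return the 1-based index where P first occurs in T, or -1."""
    m, n = len(P), len(T)
    p, t = " " + P, " " + T
    fail = kmp_setup(P)
    j = k = 1                             # j indexes T, k indexes P
    while j <= n and k <= m:
        if k == 0:
            j += 1
            k = 1
        elif t[j] == p[k]:                # one character matched
            j += 1
            k += 1
        else:
            k = fail[k]                   # follow the failure link
    return j - m if k > m else -1


if __name__ == "__main__":
    print(kmp_setup("ABABABCB")[1:])
    print(kmp_scan("AABC", "AAABC"))
```

The text pointer `j` never moves backward in `kmp_scan`, which is the property the slides contrast with the backtracking matcher.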
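
The Boyer-Moore tables can be sketched in the same way. This Python version follows the charJump and matchJump slides; the two final steps of `compute_match_jumps` (the prefix-based slides and the addition of m-k) are only named in comments on the slide, so their bodies here are an assumption, checked against the "wowwow" example. The scan that combines the two tables by taking the larger jump is also an assumption, since the slides do not list it. All function names are hypothetical, and indices are 1-based through padding.

```python
def compute_char_jumps(P):
    """charJump[c] = m - k for the rightmost position k (1-based) of c in P."""
    m = len(P)
    jumps = {}
    for k, ch in enumerate(P, start=1):   # increasing k: the rightmost occurrence wins
        jumps[ch] = m - k
    return jumps                          # characters absent from P jump by m


def compute_match_jumps(P):
    """Return matchJump[1..m] (index 0 is padding)."""
    m = len(P)
    p = " " + P
    match_jump = [m + 1] * (m + 1)        # m + 1 is an impossibly large slide
    sufx = [0] * (m + 1)
    sufx[m] = m + 1
    for k in range(m - 1, -1, -1):        # computing slide[k]
        s = sufx[k + 1]
        while s <= m:
            if p[k + 1] == p[s]:
                break
            match_jump[s] = min(match_jump[s], s - (k + 1))
            s = sufx[s]
        sufx[k] = s - 1
    low, shift = 1, sufx[0]               # assumed step: slides based on matching prefixes
    while shift <= m:
        for k in range(low, shift + 1):
            match_jump[k] = min(match_jump[k], shift)
        low, shift = shift + 1, sufx[shift]
    for k in range(1, m + 1):             # assumed step: matchJump = slide + (m - k)
        match_jump[k] += m - k
    return match_jump


def boyer_moore_scan(P, T):
    """Return the 1-based index where P occurs in T, or -1 (assumed combination)."""
    m, n = len(P), len(T)
    p, t = " " + P, " " + T
    char_jump = compute_char_jumps(P)
    match_jump = compute_match_jumps(P)
    j = k = m                             # compare P right to left
    while j <= n:
        if k < 1:
            return j + 1
        if t[j] == p[k]:
            j -= 1
            k -= 1
        else:
            j += max(char_jump.get(t[j], m), match_jump[k])
            k = m
    return -1


if __name__ == "__main__":
    print(compute_match_jumps("wowwow")[1:])
    print(boyer_moore_scan("must", "If you wish to understand others you must"))
```

A dictionary stands in for the alphabet-indexed charJump array, so the table costs Θ(m) to build here instead of Θ(|Σ| + m).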
