String Matching II

Algorithm: Design & Analysis [20]

In the last class

- Simple String Matching
- KMP Flowchart Construction
- Jump at Fail
- KMP Scan

String Matching II

- Boyer-Moore's heuristics
  - Skipping unnecessary comparisons
  - Combining knowledge of the failed match into the jump
- Horspool Algorithm
- Boyer-Moore Algorithm

Skipping over Characters in Text

A longer pattern contains more information about impossible positions in the text, for example when we know that the pattern does not contain a specific character. Examining the characters of the text one by one in forward order does not make the best use of this information.

An Example

T: If you wish to understand others you must ...
P: must

- The characters of P are checked in reverse order (right to left).
- The matching copy of P begins at t38.
- The match is achieved in 18 comparisons.

(Figure: the text with successive placements of "must" beneath it; characters are marked as just passed by, match, or mismatch.)

Distance of Jumping Forward

With knowledge of P, the distance by which the pointer into T jumps forward is determined by the text character itself, independent of its location in T.

If tj = A and the rightmost A in P is at location pk, then charJump[A] = m-k. The new j is m-k positions to the right of the current j, which lines up pk with tj and starts the next scan at the right end of P.

(Figure: P placed under T before and after the jump, with "current j", "new j" and the distance m-k marked.)

Computing the Jump: Algorithm

Input: pattern string P; m, the length of P; alphabet size alpha = |Σ|.
Output: array charJump, indexed 0, ..., alpha-1, storing the jumping offset for each character in the alphabet.

    void computeJumps(char[] P, int m, int alpha, int[] charJump) {
        char ch; int k;
        // For every character not in P, jump by m.
        for (ch = 0; ch < alpha; ch++)
            charJump[ch] = m;
        // P[k] is pk, with P indexed 1..m as on the slides.
        // Stopping before k = m keeps every jump at least 1, which horspoolScan relies on.
        for (k = 1; k < m; k++)
            charJump[P[k]] = m - k;
    }

The increasing order of k ensures that, for a symbol occurring more than once in P, the jump is computed from its rightmost occurrence.

Cost: Θ(|Σ|+m).

Scan by charJump: Horspool's Algorithm

    int horspoolScan(char[] P, char[] T, int m, int[] charjump) {
        // Here P and T are indexed from 0, so pk is P[k-1];
        // j is the text index under the last character of P.
        int j = m - 1, k, match = -1;
        while (endText(T, j) == false) {              // up to n loops
            k = 0;
            while (k < m && P[m-k-1] == T[j-k])       // up to m loops
                k++;
            if (k == m) { match = j - m + 1; break; } // index in T where the match starts
            else j = j + charjump[T[j]];
        }
        return match;
    }

So, in the worst case: Θ(mn).

Partially Matched Substring

P: b a t s a n d c a t s
T: ... d a t s ...

The suffix "ats" has matched and the mismatch is at tj = d. With charJump[d] = 4, the new j is only 1 character beyond the old right end of P, so P moves only 1 character. "cat" will then be over "ats", so a mismatch is expected.

Remembering the matched suffix, we can get a better jump: P moves 7 characters, which lines up the earlier "ats" in P with the matched text. A new cycle of backward scanning then starts from the new j.

Basic Idea

(Figure: the text T with the mismatch at tj and the matched suffix to its right.)

Forward to Match the Suffix

The mismatch is at pk against tj, and the suffix pk+1 ... pm has matched tj+1 ... tj+m-k. If a substring pr+1 ... pr+m-k that is the same as the matched suffix occurs in P, then P slides forward by slide[k] so that this substring lies under the matched text, and j moves from the old j to the new j by matchJump[k].

Partial Match for the Suffix

If no entire substring that is the same as the matched suffix occurs in P, then P slides until a prefix p1 ... pq of P lies under the end of the matched text. This prefix may be empty.

matchjump and slide

- slide[k]: the distance P slides forward after a mismatch at pk, with m-k characters matched to the right.
- matchjump[k]: the distance that j, the pointer into T, jumps, that is: matchjump[k] = slide[k] + m - k.

The length of the frame (the matched suffix) is m-k.

Determining the slide

Let r (r < k) be the largest index such that pr+1 starts a substring matching the matched suffix of P, and pr ≠ pk; then slide[k] = k-r. (pr = pk is senseless, since pk is a mismatch.)

If no such r is found, the longest prefix of P, of length q, that matches a suffix of the matched suffix is lined up. Then slide[k] = m-q.

Computing matchJump: Example

P = "wowwow" (m = 6). The entries are computed from k = 6 down to k = 1.

    k   matched suffix   slide[k]                                  m-k   matchJump[k]
    6   (empty)          1                                         0     1
    5   w                5-3 = 2                                   1     3
    4   ow               no r found, prefix of length 1: m-1 = 5   2     7
    3   wow              3-0 = 3                                   3     6
    2   wwow             no r found, prefix of length 3: m-3 = 3   4     7
    1   owwow            no r found, prefix of length 3: m-3 = 3   5     8

For k = 4, the "ow" at p2 p3 is not lined up, because the character before it, p1, equals pk.

Finding r by Recursion

Suppose sufx[k+1] = s.

- Case 1: pk+1 = ps. Then sufx[k] = sufx[k+1] - 1.
- Case 2: pk+1 ≠ ps. Continue recursively with s = sufx[s].

(Figure: P with the positions p1, pk, pk+1, pk+2, ps and ps+1 marked.)

Computing the slides: the Algorithm

    for (k = 1; k <= m; k++)
        matchjump[k] = m + 1;            // initialized as impossible values
    sufx[m] = m + 1;
    for (k = m-1; k >= 0; k--) {
        s = sufx[k+1];
        while (s <= m) {
            if (P[k+1] == P[s]) break;
            // Remember slide[k] = k-r; here k is s, and r is k+1.
            matchjump[s] = min(matchjump[s], s - (k+1));
            s = sufx[s];
        }
        sufx[k] = s - 1;
    }

Computing the matchjump: Whole Procedure

    void computeMatchjumps(char[] P, int m, int[] matchjump) {
        int k, r, s, low, shift;
        int[] sufx = new int[m+1];
        // ... compute sufx and the slides as on the previous slide ...
        // Compute slides for the case where only a shorter prefix matches the suffix.
        low = 1; shift = sufx[0];
        while (shift <= m) {
            for (k = low; k <= shift; k++)
                matchjump[k] = min(matchjump[k], shift);
            low = shift + 1; shift = sufx[shift];
        }
        // Turn the slides into matchjump by adding m-k.
        for (k = 1; k <= m; k++)
            matchjump[k] += (m - k);
        return;
    }

Boyer-Moore Scan Algorithm

    int boyerMooreScan(char[] P, char[] T, int m, int[] charjump, int[] matchjump) {
        // P and T are indexed from 1 here, so T[j] is tj and P[k] is pk.
        int match, j, k;
        match = -1;
        j = m; k = m;                    // first comparison location
        while (endText(T, j) == false) {
            if (k < 1) {
                match = j + 1;           // success
                break;
            }
            if (T[j] == P[k]) {          // scan from right to left
                j--; k--;
            } else {
                // take the better of the two heuristics
                j += max(charjump[T[j]], matchjump[k]);
                k = m;
            }
        }
        return match;
    }

Home Assignment

pp.508-
11.16, 11.19, 11.20, 11.25
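
Appendix sketch (not part of the original slides). To make the two tables and the scan concrete, here is a minimal Python sketch of the procedures above. Several things in it are assumptions made for illustration: the function names are invented; Python strings are 0-based, so `p[k - 1]` stands for pk; charJump is kept in a dict whose missing entries mean "jump by m"; the loop bounds garbled in the slides are read as `<=` where the recursion needs them; and the char-jump loop stops before k = m. The returned position is 1-based, as on the slides.

```python
def compute_char_jumps(p):
    """charJump as a dict; characters absent from the dict jump by m."""
    m = len(p)
    jumps = {}
    for k in range(1, m):              # k is the 1-based position; p_m is skipped
        jumps[p[k - 1]] = m - k        # a later (more rightward) k overwrites
    return jumps


def compute_match_jumps(p):
    """Return [matchJump[1], ..., matchJump[m]] for pattern p."""
    m = len(p)
    pp = " " + p                       # pad so that pp[k] is p_k
    match_jump = [m + 1] * (m + 1)     # slot 0 unused; m+1 is an impossible slide
    sufx = [0] * (m + 1)
    sufx[m] = m + 1
    # First pass: slides for which a full copy of the matched suffix exists.
    for k in range(m - 1, -1, -1):
        s = sufx[k + 1]
        while s <= m:
            if pp[k + 1] == pp[s]:
                break
            match_jump[s] = min(match_jump[s], s - (k + 1))
            s = sufx[s]
        sufx[k] = s - 1
    # Second pass: slides when only a shorter prefix matches the suffix.
    low, shift = 1, sufx[0]
    while shift <= m:
        for k in range(low, shift + 1):
            match_jump[k] = min(match_jump[k], shift)
        low, shift = shift + 1, sufx[shift]
    # Turn slides into jumps of the text pointer by adding m-k.
    return [match_jump[k] + (m - k) for k in range(1, m + 1)]


def boyer_moore_scan(p, t):
    """Return the 1-based position in t where p first occurs, or -1."""
    m, n = len(p), len(t)
    char_jump = compute_char_jumps(p)
    match_jump = compute_match_jumps(p)
    j = k = m                          # first comparison location
    while j <= n:
        if k < 1:
            return j + 1               # success
        if t[j - 1] == p[k - 1]:       # scan from right to left
            j -= 1
            k -= 1
        else:                          # take the better of the two heuristics
            j += max(char_jump.get(t[j - 1], m), match_jump[k - 1])
            k = m
    return -1
```

Under these assumptions, `compute_match_jumps("wowwow")` reproduces the table from the example slide, and `boyer_moore_scan("must", "If you wish to understand others you must")` returns 38, the position given on the example slide.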