Hashing
Algorithm: Design & Analysis 10

In the last class
  Implementing Dictionary ADT
  Definition of red-black tree
  Black height
  Insertion into a red-black tree
  Deletion from a red-black tree

Hashing
  Hashing
  Collision Handling for Hashing
  Closed Address Hashing
  Open Address Hashing
  Hash Functions
  Array Doubling and Amortized Analysis

Hashing: the Idea
  [Figure: a hash function maps the key space onto the hash table $E_0, E_1, \dots, E_{h-1}$.]
  Key space: very large, but only a small part is used in an application.
  Hash table: of feasible size.
  Hash function: takes the value of a specific key $x$ and returns a calculated array index for the key; if $H(x) = k$, then $x$ is stored at $E_k$.
  Issues: index distribution; collision handling.

Collision Handling: Closed Address
  Each address is a linked list.

Closed Address: Analysis
  Assumption: simple uniform hashing. For $j = 0, 1, 2, \dots, h-1$, the average length of the list at $E_j$ is $n/h$.
  The average cost of an unsuccessful search:
    Any key that is not in the table is equally likely to hash to any of the $h$ addresses.
    The average cost to determine that the key $k$ is not in the list $E_{H(k)}$ is the cost to search to the end of that list.
    So, the average cost of an unsuccessful search is $n/h$.

Closed Address: Analysis (cont.)
  The average cost of a successful search:
    Define $\alpha = n/h$ as the load factor.
    The cost is the cost for computing the hash plus the number of elements in front of the searched one in the same linked list.

Collision Handling: Open Address
  All elements are stored in the hash table itself; no linked list is used. So $\alpha$, the load factor, cannot be larger than 1.
  A collision is settled by "rehashing": a function is used to get a new hashing address for each collided address, i.e. the hash table slots are probed successively until a valid location is found.
  The probe sequence can be seen as a permutation of $(0, 1, 2, \dots, h-1)$.

Commonly Used Probing
  (In these formulas $m$ is the table size and $i = 0, 1, \dots, m-1$ is the probe number.)
  Linear probing: given an ordinary hash function $h'$, which is called an auxiliary hash function, the hash function is
    $h(k, i) = (h'(k) + i) \bmod m$
  Quadratic probing: given an auxiliary function $h'$ and nonzero auxiliary constants $c_1$ and $c_2$, the hash function is
    $h(k, i) = (h'(k) + c_1 i + c_2 i^2) \bmod m$
  Double hashing: given auxiliary functions $h_1$ and $h_2$, the hash function is
    $h(k, i) = (h_1(k) + i \, h_2(k)) \bmod m$

Linear Probing: an Example
  [Figure: hash table H with indexes 0 to 7; each key is placed by hashing, and by rehashing when its slot is taken.]
  Hash function: $h(x) = 5x \bmod 8$
  Rehash function: $rh(j) = (j + 1) \bmod 8$
  Keys, in insertion order: 1055, 1492, 1776, 1918, 1812, 1945
  1812 and 1945 collide with earlier keys; 1945 follows a chain of rehashings.

Equally Likely Permutations
  Assumption: each key is equally likely to have any of the $m!$ permutations of $(0, 1, \dots, m-1)$ as its probe sequence.
  Note: both linear and quadratic probing have only $m$ distinct probe sequences, as determined by the first probe.

Analysis for Open Address Hash
  Assuming uniform hashing, the average number of probes in an unsuccessful search is at most $\frac{1}{1-\alpha}$ $(\alpha = n/m < 1)$.

Analysis for Open Address Hash
  Assuming uniform hashing, the average number of probes in a successful search is at most $\frac{1}{\alpha} \ln \frac{1}{1-\alpha}$ $(\alpha = n/m < 1)$.
  For your reference: half full: 1.387; 90% full: 2.559.

Hashing Function
  A good hash function satisfies the assumption of simple uniform hashing.
  Heuristic hashing functions:
    The division method: $h(k) = k \bmod m$
    The multiplication method: $h(k) = \lfloor m (kA \bmod 1) \rfloor$ $(0 < A < 1)$
  No single function can avoid the worst case $\Theta(n)$, so "universal hashing" is proposed.
  Rich resource about hashing functions: Gonnet and Baeza-Yates, Handbook of Algorithms and Data Structures, Addison-Wesley, 1991.

Array Doubling
  The cost of a search in a hash table is $\Theta(1 + \alpha)$; if we can keep $\alpha$ constant, the cost will be $\Theta(1)$.
  Space allocation techniques such as array doubling may be needed.
  The problem: "unusually expensive" individual operations.

Looking at the Memory Allocation
  hashingInsert(HASHTABLE H, ITEM x)
    // size and num belong to the table and are both 0 initially
    if size = 0 then
      allocate a block of size 1;
      size = 1;
    if num = size then
      allocate a block of size 2*size;
      move all items into the new table;
      size = 2*size;
    insert x into the table;
    num = num + 1;
    return
  Elementary insertion: cost 1
  Insertion with expansion: cost size

Worst-case Analysis of the Insertion
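To see how the actual costs are distributed, the insertion procedure and its cost model can be simulated. The following is a minimal Python sketch, not the course's own code: the function name `insertion_costs` is hypothetical, and charging one unit for storing the new item on top of the `size` units for moving the old items is an assumption about how the slide's "cost size" is counted.

```python
def insertion_costs(n):
    """Return the actual cost of each of n insertions under array doubling."""
    size = 0    # allocated capacity of the table
    num = 0     # number of items currently stored
    costs = []
    for _ in range(n):
        cost = 1                 # storing the new item (assumed unit cost)
        if size == 0:
            size = 1             # first allocation, nothing to move
        elif num == size:
            cost += size         # expansion: move every existing item
            size *= 2
        num += 1
        costs.append(cost)
    return costs


costs = insertion_costs(16)
# 1-based positions of the insertions that triggered an expansion
expansions = [i for i, c in enumerate(costs, start=1) if c > 1]
print(expansions)
print(sum(costs))
```

Under these assumptions only a few insertions are expensive, and the total for n insertions stays below 3n, which is the observation the amortized analysis formalizes.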
  For n executions of the insertion operation:
  A bad analysis: the worst case for one insertion is the case when expansion is required, which costs up to n. So, the worst-case cost of the whole sequence is in $O(n^2)$.
  Of course NOT!
  Note that, with the insertion procedure as written, expansion is required during the $i$-th operation only if $i - 1$ is an exact power of 2 ($i = 2^k + 1$); only then is the cost of the $i$-th operation proportional to $i$, and otherwise it is 1. The expansion costs $1 + 2 + 4 + \dots$ sum to less than $2n$, so the total cost of n insertions is in $O(n)$.

Amortized Time Analysis
  Amortized equation: amortized cost = actual cost + accounting cost
  Design goals for the accounting cost:
    In any legal sequence of operations, the sum of the accounting costs is nonnegative.
    The amortized cost of each operation is fairly regular, in spite of the wide fluctuation possible for the actual cost of individual operations.

Accounting Scheme for Stack Push
  Push operation with array doubling (actual cost):
    No resize triggered: 1
    Resize ($n \to 2n$) triggered: $tn + 1$ ($t$ is a constant)
  Accounting scheme (specifying the accounting cost):
    No resize triggered: $2t$
    Resize ($n \to 2n$) triggered: $-nt + 2t$
  So, the amortized cost of each individual push operation is $1 + 2t \in \Theta(1)$.

Home Assignment
  pp.302-
  6.1
  6.2
  6.18
  6.19
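The accounting scheme for stack push can be checked numerically. The sketch below is illustrative only: the function name `push_costs`, the starting capacity of 1, and the value `t = 3` are assumptions chosen for the demonstration, not part of the slides.

```python
def push_costs(n_pushes, t=3):
    """Return (actual, accounting, amortized) cost for each push into a doubling array."""
    capacity = 1    # assumed initial capacity
    count = 0       # items currently on the stack
    rows = []
    for _ in range(n_pushes):
        if count == capacity:
            n = capacity
            actual = t * n + 1            # resize n -> 2n triggered
            accounting = -n * t + 2 * t
            capacity *= 2
        else:
            actual = 1                    # no resize triggered
            accounting = 2 * t
        count += 1
        rows.append((actual, accounting, actual + accounting))
    return rows


rows = push_costs(100)
balance = 0
lowest = 0
for actual, accounting, amortized in rows:
    balance += accounting
    lowest = min(lowest, balance)
print({amortized for _, _, amortized in rows})   # every push has the same amortized cost
print(lowest)                                     # the accounting sum never drops below this
```

With these assumptions every push has amortized cost 1 + 2t, and the running sum of accounting costs never becomes negative, which are the two design goals stated for the accounting cost.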
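The linear probing example from the open addressing slides (hash function 5x mod 8, rehash function (j + 1) mod 8) can also be traced in code. This is a minimal sketch; the function name `linear_probe_insert` and the use of `None` for an empty slot are assumptions made for illustration.

```python
def linear_probe_insert(table, key):
    """Insert key with h(x) = 5x mod m and rh(j) = (j + 1) mod m; return the slots probed."""
    m = len(table)
    j = (5 * key) % m            # initial hash address
    probes = [j]
    while table[j] is not None:  # slot taken: rehash to the next slot
        if len(probes) == m:
            raise RuntimeError("hash table is full")
        j = (j + 1) % m
        probes.append(j)
    table[j] = key
    return probes


table = [None] * 8
for key in (1055, 1492, 1776, 1918, 1812, 1945):
    print(key, linear_probe_insert(table, key))
print(table)
```

Tracing the six keys in order, 1812 hashes to slot 4 (taken by 1492) and lands in slot 5, while 1945 hashes to slot 5 and follows the chain 5, 6, 7 before it finds a free slot.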