<< Chapter < Page Chapter >> Page >

key := (key + (key_3<<7))

key := key + (j OR key_4>>2) * (key_4) * (j + key_1) XOR j

key := key AND _prime_ // _prime_ is a prime number

j := (j+1)

while collision

return key

Rebuilding the table requires allocating a larger array and recursively using the set operation to insert all the elements of the old array into the new larger array. It is common to increase the array size exponentially, for example by doubling the old array size.

function remove(key)

i := find_slot(key)

if slot[i] is unoccupied

return // key is not in the table

j := i

loop

j := (j+1) modulo num_slots

if slot[j] is unoccupied

exit loop

k := hash(slot[j].key) modulo num_slots

if (j>i and (k<= i or k>j)) or

(j<i and (k<= i and k>j)) (note 2)

slot[i] := slot[j]

i := j

mark slot[i] as unoccupied

For all records in a cluster, there must be no vacant slots between their natural hash position and their current position (else lookups will terminate before finding the record). At this point in the pseudocode, i is a vacant slot that might be invalidating this property for subsequent records in the cluster. j is such a subsequent record. k is the raw hash where the record at j would naturally land in the hash table if there were no collisions. This test is asking if the record at j is invalidly positioned with respect to the required properties of a cluster now that i is vacant.

Another technique for removal is simply to mark the slot as deleted. However this eventually requires rebuilding the table simply to remove deleted records. The methods above provide O(1) updating and removal of existing records, with occasional rebuilding if the high water mark of the table size grows.

The O(1) remove method above is only possible in linearly probed hash tables with single-slot stepping. In the case where many records are to be deleted in one operation, marking the slots for deletion and later rebuilding may be more efficient.

Open addressing versus chaining

Chained hash tables have the following benefits over open addressing:

  • They are simple to implement effectively and only require basic data structures.
  • From the point of view of writing suitable hash functions, chained hash tables are insensitive to clustering, only requiring minimization of collisions. Open addressing depends upon better hash functions to avoid clustering. This is particularly important if novice programmers can add their own hash functions, but even experienced programmers can be caught out by unexpected clustering effects.
  • They degrade in performance more gracefully. Although chains grow longer as the table fills, a chained hash table cannot "fill up" and does not exhibit the sudden increases in lookup times that occur in a near-full table with open addressing. (see right)
  • If the hash table stores large records, about 5 or more words per record, chaining uses less memory than open addressing.
  • If the hash table is sparse (that is, it has a big array with many free array slots), chaining uses less memory than open addressing even for small records of 2 to 4 words per record due to its external storage.

Get Jobilize Job Search Mobile App in your pocket Now!

Get it on Google Play Download on the App Store Now




Source:  OpenStax, Data structures and algorithms. OpenStax CNX. Jul 29, 2009 Download for free at http://cnx.org/content/col10765/1.1
Google Play and the Google Play logo are trademarks of Google Inc.

Notification Switch

Would you like to follow the 'Data structures and algorithms' conversation and receive update notifications?

Ask