From Rotor (Shared Source CLI):
/*
Implementation Notes:
Dictionary was copied from Hashtable's source - any bug fixes here
probably need to be made to Dictionary as well.

This Hashtable uses double hashing. There are hashsize buckets in the
table, and each bucket can contain 0 or 1 element. We use a bit to mark
whether there's been a collision when we inserted multiple elements
(i.e., an inserted item was hashed at least a second time and we probed
this bucket, but it was already in use). Using the collision bit, we
can terminate lookups & removes for elements that aren't in the hash
table more quickly. We steal the most significant bit from the hash code
to store the collision bit.

Our hash function is of the following form:

h(key, n) = h1(key) + n*h2(key)

where n is the number of times we've hit a collided bucket and rehashed
(on this particular lookup). Here are our hash functions:

h1(key) = GetHash(key); // default implementation calls key.GetHashCode();
h2(key) = 1 + (((h1(key) >> 5) + 1) % (hashsize - 1));

The h1 can return any number. h2 must return a number between 1 and
hashsize - 1 that is relatively prime to hashsize (not a problem if
hashsize is prime). (Knuth's Art of Computer Programming, Vol. 3, p. 528-9)
If this is true, then we are guaranteed to visit every bucket in exactly
hashsize probes, since the least common multiple of hashsize and h2(key)
will be hashsize * h2(key). (This is the first number where adding h2 to
h1 mod hashsize will be 0 and we will search the same bucket twice).

We previously used a different h2(key, n) that was not constant. That is a
horrifically bad idea, unless you can prove that series will never produce
any identical numbers that overlap when you mod them by hashsize, for all
subranges from i to i+hashsize, for all i. It's not worth investigating,
since there was no clear benefit from using that hash function, and it was
broken.

For efficiency reasons, we've implemented this by storing h1 and h2 in a
temporary, and setting a variable called seed equal to h1. We do a probe,
and if we collided, we simply add h2 to seed each time through the loop.

A good test for h2() is to subclass Hashtable, provide your own implementation
of GetHash() that returns a constant, then add many items to the hash table.
Make sure Count equals the number of items you inserted.

Note that when we remove an item from the hash table, we set the key
equal to buckets, if there was a collision in this bucket. Otherwise
we'd either wipe out the collision bit, or we'd still have an item in
the hash table.
*/

The Insert Method of Hashtable:
// Inserts an entry into this hashtable. This method is called from the Set
// and Add methods. If the add parameter is true and the given key already
// exists in the hashtable, an exception is thrown.
private void Insert (Object key, Object nvalue, bool add) {
    if (key == null) {
        throw new ArgumentNullException("key", Environment.GetResourceString("ArgumentNull_Key"));
    }
    if (count >= loadsize)
        expand();
    uint seed;
    uint incr;
    // Assume we only have one thread writing concurrently. Modify
    // buckets to contain new data, as long as we insert in the right order.
    uint hashcode = InitHash(key, buckets.Length, out seed, out incr);
    int ntry = 0;
    int emptySlotNumber = -1; // We use the empty slot number to cache the first empty slot. We chose to reuse slots
                              // created by remove that have the collision bit set over using up new slots.

    do {
        int bucketNumber = (int) (seed % (uint)buckets.Length);

        if (emptySlotNumber == -1 && (buckets[bucketNumber].key == buckets) && (buckets[bucketNumber].hash_coll < 0))//(((buckets[bucketNumber].hash_coll & unchecked(0x80000000))!=0)))
            emptySlotNumber = bucketNumber;

        //We need to check if the collision bit is set because we have the possibility where the first
        //item in the hash-chain has been deleted.
        if ((buckets[bucketNumber].key == null) ||
            (buckets[bucketNumber].key == buckets && ((buckets[bucketNumber].hash_coll & unchecked(0x80000000))==0))) {

            if (emptySlotNumber != -1) // Reuse slot
                bucketNumber = emptySlotNumber;

            // We pretty much have to insert in this order. Don't set hash
            // code until the value & key are set appropriately.
            buckets[bucketNumber].val = nvalue;
            buckets[bucketNumber].key = key;
            buckets[bucketNumber].hash_coll |= (int) hashcode;
            count++;
            version++;
            return;
        }

        if (((buckets[bucketNumber].hash_coll & 0x7FFFFFFF) == hashcode) &&
            KeyEquals (key, buckets[bucketNumber].key)) {
            if (add) {
                throw new ArgumentException(Environment.GetResourceString("Argument_AddingDuplicate__", buckets[bucketNumber].key, key));
            }
            buckets[bucketNumber].val = nvalue;
            version++;
            return;
        }

        if (emptySlotNumber == -1) // We don't need to set the collision bit here since we already have an empty slot
            buckets[bucketNumber].hash_coll |= unchecked((int)0x80000000);
        seed += incr;
    } while (++ntry < buckets.Length);

    if (emptySlotNumber != -1)
    {
        // We pretty much have to insert in this order. Don't set hash
        // code until the value & key are set appropriately.
        buckets[emptySlotNumber].val = nvalue;
        buckets[emptySlotNumber].key = key;
        buckets[emptySlotNumber].hash_coll |= (int) hashcode;
        count++;
        version++;
        return;
    }

    // If you see this assert, make sure load factor & count are reasonable.
    // Then verify that our double hash function (h2, described at top of file)
    // meets the requirements described above. You should never see this assert.
    BCLDebug.Assert(false, "hash table insert failed! Load factor too high, or our double hashing function is incorrect.");
    throw new InvalidOperationException(Environment.GetResourceString("InvalidOperation_HashInsertFailed"));
}
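
The probing scheme from the implementation notes can be tried in isolation. What follows is only a minimal sketch, not the Rotor code: the names ProbeSketch, InitProbe and ProbeSequence are made up for illustration, the h2 formula is the one quoted in the notes, and clearing the top bit of the hash code is an assumption based on the remark that the most significant bit is reserved for the collision flag.

```csharp
using System;
using System.Collections.Generic;

// Illustrative helper, not part of Rotor: reproduces the probe sequence
// h(key, n) = h1(key) + n*h2(key) described in the implementation notes.
public static class ProbeSketch
{
    // Computes the starting seed (h1) and the increment (h2) for a hash code.
    // Assumption: the top bit is cleared because the notes reserve it
    // for the collision flag.
    public static void InitProbe(int rawHash, int hashsize, out uint seed, out uint incr)
    {
        seed = unchecked((uint)rawHash) & 0x7FFFFFFF;
        // h2 always lies in [1, hashsize - 1], so it is coprime to a prime hashsize.
        incr = 1 + (((seed >> 5) + 1) % ((uint)hashsize - 1));
    }

    // Returns the bucket indexes visited during the first hashsize probes.
    public static List<int> ProbeSequence(int rawHash, int hashsize)
    {
        uint seed, incr;
        InitProbe(rawHash, hashsize, out seed, out incr);
        var visited = new List<int>(hashsize);
        for (int ntry = 0; ntry < hashsize; ntry++)
        {
            visited.Add((int)(seed % (uint)hashsize));
            seed += incr; // the same step the Insert loop takes after a collision
        }
        return visited;
    }
}
```

With a prime size, ProbeSketch.ProbeSequence(12345, 11) yields an increment of 7 and touches all 11 buckets. With a composite size of 12, the same hash code yields an increment of 2 and reaches only 6 distinct buckets, which illustrates why h2 has to be relatively prime to hashsize.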
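
The test for h2() suggested in the implementation notes (a GetHash() that returns a constant) could look roughly like this. It is a sketch that assumes the protected virtual GetHash(object) hook of System.Collections.Hashtable; the class names ConstantHashHashtable and ConstantHashCheck and the constant 42 are hypothetical.

```csharp
using System;
using System.Collections;

// Hypothetical test subclass: every key hashes to the same value, so every
// insert after the first collides and must be placed by the h2 probe step.
public class ConstantHashHashtable : Hashtable
{
    protected override int GetHash(object key)
    {
        return 42; // arbitrary constant chosen for illustration
    }
}

public static class ConstantHashCheck
{
    // Inserts itemCount distinct keys and returns the resulting Count.
    public static int InsertMany(int itemCount)
    {
        var table = new ConstantHashHashtable();
        for (int i = 0; i < itemCount; i++)
        {
            table.Add(i, "value " + i);
        }
        return table.Count;
    }
}
```

If the probe sequence failed to reach a free bucket, Add would throw or items would go missing; ConstantHashCheck.InsertMany(1000) is expected to return 1000.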
Double Hashing in "Introduction to Algorithms":
Double hashing is one of the best methods available for open addressing because the permutations produced have many of the characteristics of randomly chosen permutations. Double hashing uses a hash function of the form
h(k, i) = (h1(k) + i*h2(k)) mod m,
where h1 and h2 are auxiliary hash functions. The initial probe is to position T[h1(k)]; successive probe positions are offset from previous positions by the amount h2(k), modulo m. Thus, unlike the case of linear or quadratic probing, the probe sequence here depends in two ways upon the key k, since the initial probe position, the offset, or both, may vary. Figure 11.5 gives an example of insertion by double hashing.
Figure 11.5: Insertion by double hashing. Here we have a hash table of size 13 with h1(k) = k mod 13 and h2(k) = 1 + (k mod 11). Since 14 ≡ 1 (mod 13) and 14 ≡ 3 (mod 11), the key 14 is inserted into empty slot 9, after slots 1 and 5 are examined and found to be occupied.
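
The insertion described in the caption can be replayed with a few lines of code. This is a sketch under stated assumptions: only slots 1 and 5 (the ones the caption names as occupied) are marked as in use, the rest of the figure's table is not reproduced, keys are assumed non-negative, and the names Figure115Sketch and ProbesForInsert are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class Figure115Sketch
{
    const int M = 13; // table size from the caption

    static int H1(int k) { return k % M; }
    static int H2(int k) { return 1 + (k % 11); }

    // Returns the slots examined while inserting key k, ending with the
    // empty slot that receives it. occupied[i] is true if slot i is in use.
    public static List<int> ProbesForInsert(int k, bool[] occupied)
    {
        var probes = new List<int>();
        for (int i = 0; i < M; i++)
        {
            int slot = (H1(k) + i * H2(k)) % M;
            probes.Add(slot);
            if (!occupied[slot])
            {
                occupied[slot] = true; // claim the first free slot
                return probes;
            }
        }
        throw new InvalidOperationException("table is full");
    }
}
```

For key 14, h1 is 1 and h2 is 4, so with slots 1 and 5 occupied the examined slots are 1, 5 and 9, matching the caption.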
The value h2(k) must be relatively prime to the hash-table size m for the entire hash table to be searched. (See Exercise 11.4-3.) A convenient way to ensure this condition is to let m be a power of 2 and to design h2 so that it always produces an odd number. Another way is to let m be prime and to design h2 so that it always returns a positive integer less than m. For example, we could choose m prime and let
h1(k) = k mod m,
h2(k) = 1 + (k mod m'),
where m' is chosen to be slightly less than m (say, m - 1). For example, if k = 123456, m = 701, and m' = 700, we have h1(k) = 80 and h2(k) = 257, so the first probe is to position 80, and then every 257th slot (modulo m) is examined until the key is found or every slot is examined.
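
The numbers in this example are easy to check. The sketch below takes h1(k) = k mod m and h2(k) = 1 + (k mod m'), which is an assumption consistent with the quoted values h1(k) = 80 and h2(k) = 257; the class and method names are illustrative only.

```csharp
using System;

public static class ClrsExampleSketch
{
    public static int H1(int k, int m) { return k % m; }
    public static int H2(int k, int mPrime) { return 1 + (k % mPrime); }

    // i-th probe position: h(k, i) = (h1(k) + i*h2(k)) mod m.
    // The multiplication is done in long to avoid int overflow for large i.
    public static int Probe(int k, int i, int m, int mPrime)
    {
        return (int)((H1(k, m) + (long)i * H2(k, mPrime)) % m);
    }
}
```

For k = 123456, m = 701 and m' = 700, the first four probe positions (i = 0 to 3) come out as 80, 337, 594 and 150.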
Double hashing improves over linear or quadratic probing in that Θ(m^2) probe sequences are used, rather than Θ(m), since each possible (h1(k), h2(k)) pair yields a distinct probe sequence. As a result, the performance of double hashing appears to be very close to the performance of the "ideal" scheme of uniform hashing.