[Repost] Wiki: MurmurHash

Original: http://en.wikipedia.org/wiki/MurmurHash

From Wikipedia, the free encyclopedia

MurmurHash is a non-cryptographic hash function suitable for general hash-based lookup.^[1]^[2]^[3] It was created by Austin Appleby in 2008,^[4]^[5] and exists in a number of variants,^[6] all of which have been released into the public domain. When compared to other popular hash functions, MurmurHash performed well in a random distribution of regular keys.^[7]

Variants

The current version is MurmurHash3,^[8]^[9] which yields a 32-bit or 128-bit hash value.

The older MurmurHash2^[10] yields a 32-bit or 64-bit value. Slower versions of MurmurHash2 are available for big-endian and aligned-only machines. The MurmurHash2A variant adds the Merkle–Damgård construction so that it can be called incrementally. Two variants generate 64-bit values: MurmurHash64A, which is optimized for 64-bit processors, and MurmurHash64B, which is optimized for 32-bit ones. MurmurHash2-160 generates a 160-bit hash, and MurmurHash1 is obsolete.

Implementations

The canonical implementation is in C++, but there are efficient ports for a variety of popular languages, including Python,^[11] C,^[12] C#,^[9]^[13] Perl,^[14] Ruby,^[15] PHP,^[16] Haskell,^[17] Scala,^[18] Java,^[19]^[20] Erlang,^[21] and JavaScript.^[22]^[23]

It has been adopted into a number of open-source projects, most notably libstdc++ (ver 4.6), Perl,^[24] nginx (ver 1.0.1),^[25] Rubinius,^[26] libmemcached (the C driver for Memcached),^[27] maatkit,^[28] Hadoop,^[1] Kyoto Cabinet,^[29] RaptorDB,^[30] OlegDB,^[31] Cassandra,^[32] and Clojure.^[33]

Algorithm

Murmur3_32(key, len, seed)
    // Note: In this version, all integer arithmetic is performed with unsigned 32-bit integers.
    //       In the case of overflow, the result is constrained by the application of modulo
    //       2^32 arithmetic.

    c1 ← 0xcc9e2d51
    c2 ← 0x1b873593
    r1 ← 15
    r2 ← 13
    m ← 5
    n ← 0xe6546b64

    hash ← seed

    for each fourByteChunk of key
        k ← fourByteChunk

        k ← k * c1
        k ← (k << r1) OR (k >> (32-r1))
        k ← k * c2

        hash ← hash XOR k
        hash ← (hash << r2) OR (hash >> (32-r2))
        hash ← hash * m + n

    with any remainingBytesInKey
        remainingBytes ← SwapEndianOrderOf(remainingBytesInKey)
        // Note: Endian swapping is only necessary on big-endian machines.
        //       The purpose is to place the meaningful digits towards the low end of the value,
        //       so that these digits have the greatest potential to affect the low range digits
        //       in the subsequent multiplication.  Consider that locating the meaningful digits
        //       in the high range would produce a greater effect upon the high digits of the
        //       multiplication, and notably, that such high digits are likely to be discarded
        //       by the modulo arithmetic under overflow.  We don't want that.

        remainingBytes ← remainingBytes * c1
        remainingBytes ← (remainingBytes << r1) OR (remainingBytes >> (32 - r1))
        remainingBytes ← remainingBytes * c2

        hash ← hash XOR remainingBytes

    hash ← hash XOR len

    hash ← hash XOR (hash >> 16)
    hash ← hash * 0x85ebca6b
    hash ← hash XOR (hash >> 13)
    hash ← hash * 0xc2b2ae35
    hash ← hash XOR (hash >> 16)
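The last five steps of the pseudocode form a finalization mix that runs once, after the key length has been XORed into the hash. As a minimal sketch, they can be pulled out into a standalone C function. The names fmix32, flipped_output_bits and fmix32_self_check are illustrative assumptions and do not appear in the article. For an empty key there are no blocks, no tail, and len is 0, so the whole hash reduces to this mix applied to the seed, which makes it easy to spot-check by hand.

```c
#include <assert.h>
#include <stdint.h>

/* Finalization mix: the last five steps of the pseudocode, applied after
 * the key length has been XORed into the hash. */
uint32_t fmix32(uint32_t h) {
	h ^= h >> 16;
	h *= 0x85ebca6bu;
	h ^= h >> 13;
	h *= 0xc2b2ae35u;
	h ^= h >> 16;
	return h;
}

/* Hypothetical helper (not in the article): number of output bits that
 * change when input bit `bit` (0..31) of x is flipped. */
int flipped_output_bits(uint32_t x, int bit) {
	uint32_t diff = fmix32(x) ^ fmix32(x ^ (UINT32_C(1) << bit));
	int count = 0;
	while (diff != 0) {
		count += (int) (diff & 1u);
		diff >>= 1;
	}
	return count;
}

/* Spot checks: an empty key hashed with seed s yields fmix32(s). */
void fmix32_self_check(void) {
	assert(fmix32(0) == 0);            /* zero stays zero through every step */
	assert(fmix32(1) == 0x514e28b7u);  /* empty key, seed 1 */
	assert(flipped_output_bits(1, 0) > 0);
}
```

Calling flipped_output_bits over a range of inputs gives a rough feel for how widely a single-bit change in the pre-finalization hash spreads across the output bits.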

A sample C implementation follows:

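#include <stdint.h>
#include <string.h>

uint32_t murmur3_32(const char *key, uint32_t len, uint32_t seed) {
	static const uint32_t c1 = 0xcc9e2d51;
	static const uint32_t c2 = 0x1b873593;
	static const uint32_t r1 = 15;
	static const uint32_t r2 = 13;
	static const uint32_t m = 5;
	static const uint32_t n = 0xe6546b64;

	uint32_t hash = seed;

	/* Body: mix each complete four-byte block into the hash. */
	const uint32_t nblocks = len / 4;
	uint32_t i;
	for (i = 0; i < nblocks; i++) {
		/* Copy the block out with memcpy. Casting key to uint32_t * could
		   read misaligned memory and violates strict aliasing. The value is
		   in host byte order, so a big-endian machine must byte-swap k here
		   to match little-endian results. */
		uint32_t k;
		memcpy(&k, key + i * 4, sizeof k);
		k *= c1;
		k = (k << r1) | (k >> (32 - r1));
		k *= c2;

		hash ^= k;
		hash = ((hash << r2) | (hash >> (32 - r2))) * m + n;
	}

	/* Tail: the 1 to 3 bytes left over after the last complete block. */
	const uint8_t *tail = (const uint8_t *) (key + nblocks * 4);
	uint32_t k1 = 0;

	switch (len & 3) {
	case 3:
		k1 ^= (uint32_t) tail[2] << 16;
		/* fall through */
	case 2:
		k1 ^= (uint32_t) tail[1] << 8;
		/* fall through */
	case 1:
		k1 ^= tail[0];

		k1 *= c1;
		k1 = (k1 << r1) | (k1 >> (32 - r1));
		k1 *= c2;
		hash ^= k1;
	}

	/* Finalization: fold in the length, then mix the bits. */
	hash ^= len;
	hash ^= (hash >> 16);
	hash *= 0x85ebca6b;
	hash ^= (hash >> 13);
	hash *= 0xc2b2ae35;
	hash ^= (hash >> 16);

	return hash;
}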
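
The sample above reads each four-byte block in the host's byte order, so its output can differ between little-endian and big-endian machines. The following is a sketch of one way to avoid that, assuming little-endian block order is the desired result: assemble each block byte by byte. The names murmur3_32_le, rotl32 and murmur3_32_le_self_check are illustrative and are not part of the article or of the canonical C++ source.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Rotate a 32-bit value left by r bits (0 < r < 32). */
static uint32_t rotl32(uint32_t x, unsigned r) {
	return (x << r) | (x >> (32 - r));
}

/* Hypothetical variant: Murmur3_32 with explicit little-endian loads,
 * so the result does not depend on host byte order or key alignment. */
uint32_t murmur3_32_le(const void *key, uint32_t len, uint32_t seed) {
	const uint8_t *p = (const uint8_t *) key;
	uint32_t hash = seed;
	uint32_t i, j, k;

	/* Body: complete four-byte blocks, least significant byte first. */
	for (i = 0; len - i >= 4; i += 4) {
		k = (uint32_t) p[i]
		  | ((uint32_t) p[i + 1] << 8)
		  | ((uint32_t) p[i + 2] << 16)
		  | ((uint32_t) p[i + 3] << 24);
		k *= 0xcc9e2d51u;
		k = rotl32(k, 15);
		k *= 0x1b873593u;
		hash ^= k;
		hash = rotl32(hash, 13) * 5 + 0xe6546b64u;
	}

	/* Tail: pack the remaining 1 to 3 bytes into the low end of k. */
	if (len - i > 0) {
		k = 0;
		for (j = len - i; j > 0; j--)
			k = (k << 8) | p[i + j - 1];
		k *= 0xcc9e2d51u;
		k = rotl32(k, 15);
		k *= 0x1b873593u;
		hash ^= k;
	}

	/* Finalization. */
	hash ^= len;
	hash ^= hash >> 16;
	hash *= 0x85ebca6bu;
	hash ^= hash >> 13;
	hash *= 0xc2b2ae35u;
	hash ^= hash >> 16;
	return hash;
}

/* Spot checks against values worked through by hand from the pseudocode. */
void murmur3_32_le_self_check(void) {
	assert(murmur3_32_le("", 0, 0) == 0);
	assert(murmur3_32_le("", 0, 1) == 0x514e28b7u);
	assert(murmur3_32_le("test", (uint32_t) strlen("test"), 0) == 0xba6bd213u);
}
```

On a little-endian machine this sketch should agree with the sample implementation for every key and seed. The byte-wise loads trade some speed for portability.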

Posted 2014-05-13 22:08 by Scan.
