Some Bit Manipulation Tricks (Part 1)

http://graphics.stanford.edu/~seander/bithacks.html

Bit Twiddling Hacks
By Sean Eron Anderson
seander@cs.stanford.edu

Individually, the code snippets here are in the public domain (unless otherwise noted); feel free to use them however you please. The aggregate collection and descriptions are © 1997-2005 Sean Eron Anderson. The code and descriptions are distributed in the hope that they will be useful, but WITHOUT ANY WARRANTY and without even the implied warranty of merchantability or fitness for a particular purpose. As of May 5, 2005, all the code has been tested thoroughly. Thousands of people have read it. Moreover, Professor Randal Bryant, the Dean of Computer Science at Carnegie Mellon University, has personally tested almost everything with his Uclid code verification system. What he hasn't tested, I have checked against all possible inputs. To the first person to inform me of a legitimate bug in the code, I'll pay a bounty of US$10 (by check or Paypal).

Contents

About the operation counting methodology

Compute the sign of an integer

Compute the integer absolute value (abs) without branching

Compute the minimum (min) or maximum (max) of two integers without branching

Determining if an integer is a power of 2

Sign extending

Sign extending from a constant bit width

Sign extending from a variable bit-width

Sign extending from a variable bit-width in 3 operations

Conditionally set or clear bits without branching

Merge bits from two values according to a mask

Counting bits set

Counting bits set, naive way

Counting bits set by lookup table

Counting bits set, Brian Kernighan's way

Counting bits set in 12, 24, or 32-bit words using 64-bit instructions

Counting bits set, in parallel

Computing parity (1 if an odd number of bits set, 0 otherwise)

Compute parity of a word the naive way

Compute parity by lookup table

Compute parity of a byte using 64-bit multiply and modulus division

Compute parity in parallel

Swapping Values

Swapping values with XOR

Swapping individual bits with XOR

Reversing bit sequences

Reverse bits the obvious way

Reverse bits in word by lookup table

Reverse the bits in a byte with 3 operations (64-bit multiply and modulus division)

Reverse the bits in a byte with 4 operations (64-bit multiply, no division)

Reverse the bits in a byte with 7 operations (no 64-bit, only 32)

Reverse an N-bit quantity in parallel with 5 * lg(N) operations

Modulus division (aka computing remainders)

Computing modulus division by 1 << s without a division operation (obvious)

Computing modulus division by (1 << s) - 1 without a division operation

Computing modulus division by (1 << s) - 1 in parallel without a division operation

Finding integer log base 2 of an integer (aka the position of the highest bit set)

Find the log base 2 of an integer with the MSB N set in O(N) operations (the obvious way)

Find the integer log base 2 of an integer with a 64-bit IEEE float

Find the log base 2 of an integer with a lookup table

Find the log base 2 of an N-bit integer in O(lg(N)) operations

Find the log base 2 of an N-bit integer in O(lg(N)) operations with multiply and lookup

Find integer log base 10 of an integer

Find integer log base 2 of a 32-bit IEEE float

Find integer log base 2 of the pow(2, r)-root of a 32-bit IEEE float (for unsigned integer r)

Counting consecutive trailing zero bits (or finding bit indices)

Count the consecutive zero bits (trailing) on the right in parallel

Count the consecutive zero bits (trailing) on the right by binary search

Count the consecutive zero bits (trailing) on the right by casting to a float

Count the consecutive zero bits (trailing) on the right with modulus division and lookup

Count the consecutive zero bits (trailing) on the right with multiply and lookup

Round up to the next highest power of 2 by float casting

Round up to the next highest power of 2

Interleaving bits (aka computing Morton Numbers)

Interleave bits the obvious way

Interleave bits by table lookup

Interleave bits with 64-bit multiply

Interleave bits by Binary Magic Numbers

Testing for ranges of bytes in a word (and counting occurrences found)

Determine if a word has a zero byte

Determine if a word has a byte less than n

Determine if a word has a byte greater than n

Determine if a word has a byte between m and n

About the operation counting methodology

When totaling the number of operations for algorithms here, any C operator is counted as one operation. Intermediate assignments, which need not be written to RAM, are not counted. Of course, this operation counting approach only serves as an approximation of the actual number of machine instructions and CPU time. All operations are assumed to take the same amount of time, which is not true in reality, but CPUs have been heading increasingly in this direction over time. There are many nuances that determine how fast a system will run a given sample of code, such as cache sizes, memory bandwidths, instruction sets, etc. In the end, benchmarking is the best way to determine whether one method is really faster than another, so consider the techniques below as possibilities to test on your target architecture.

Compute the sign of an integer

    int v;      // we want to find the sign of v
    int sign;   // the result goes here

    // CHAR_BIT is the number of bits per byte (normally 8).
    sign = -(v < 0);  // if v < 0 then -1, else 0.
    // or, to avoid branching on CPUs with flag registers (IA32):
    sign = -(int)((unsigned int)((int)v) >> (sizeof(int) * CHAR_BIT - 1));
    // or, for one less instruction (but not portable):
    sign = v >> (sizeof(int) * CHAR_BIT - 1);
The last expression above evaluates to sign = v >> 31 for 32-bit integers. This is one operation faster than the obvious way, sign = -(v < 0). This trick works because when signed integers are shifted right, the value of the far left bit is copied to the other bits. The far left bit is 1 when the value is negative and 0 otherwise; all 1 bits gives -1. Unfortunately, this behavior is architecture-specific.

Alternatively, if you prefer the result be either -1 or +1, then use:

    sign = +1 | (v >> (sizeof(int) * CHAR_BIT - 1));  // if v < 0 then -1, else +1
Alternatively, if you prefer the result be either -1, 0, or +1, then use:

    sign = (v != 0) | -(int)((unsigned int)((int)v) >> (sizeof(int) * CHAR_BIT - 1));
    // Or, for more speed but less portability:
    sign = (v != 0) | (v >> (sizeof(int) * CHAR_BIT - 1));  // -1, 0, or +1
    // Or, for portability, brevity, and (perhaps) speed:
    sign = (v > 0) - (v < 0); // -1, 0, or +1
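As a minimal sketch, the portable sign expressions can be wrapped in small functions and spot-checked. The function names are illustrative assumptions, not part of the original page. The shift is done on an unsigned value, so the result does not depend on how the compiler implements signed right-shift.

```c
#include <limits.h>

/* Returns -1 for negative v and 0 otherwise.
   The shift operates on an unsigned copy of v, so it is well defined. */
int sign_neg_or_zero(int v)
{
    return -(int)((unsigned int)v >> (sizeof(int) * CHAR_BIT - 1));
}

/* Returns -1, 0, or +1 using only comparisons. */
int sign_three_way(int v)
{
    return (v > 0) - (v < 0);
}
```

With these definitions, sign_neg_or_zero(-5) gives -1 and sign_three_way(7) gives +1.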
Caveat: On March 7, 2003, Angus Duggan pointed out that the 1989 ANSI C specification leaves the result of signed right-shift implementation-defined, so on some systems this hack might not work. For greater portability, Toby Speight suggested on September 28, 2005 that CHAR_BIT be used here and throughout rather than assuming bytes were 8 bits long. On March 4, 2006, Angus recommended the more portable versions above, which involve casting.

Compute the integer absolute value (abs) without branching

    int v;      // we want to find the absolute value of v
    int r;      // the result goes here

    r = (v ^ (v >> (sizeof(int) * CHAR_BIT - 1))) -
        (v >> (sizeof(int) * CHAR_BIT - 1));
Some CPUs don't have an integer absolute value instruction (or the compiler fails to use it). On machines where branching is expensive, the above expression can be faster than the obvious approach, r = (v < 0) ? -v : v, even though the number of operations is the same.

Caveats: On March 7, 2003, Angus Duggan pointed out that the 1989 ANSI C specification leaves the result of signed right-shift implementation-defined, so on some systems this hack might not work. I've read that ANSI C does not require values to be represented as two's complement, so it may not work for that reason as well (on a vanishingly small number of old machines that still use one's complement). On March 14, 2004, Keith H. Duggar sent me the solution above; it is superior to the one I initially came up with, r=(+1|(v>>(sizeof(int)*CHAR_BIT-1)))*v, because a multiply is not used. Unfortunately, this method was patented in the USA on June 6, 2000 by Vladimir Yu Volkonsky and assigned to Sun Microsystems. (For an unpatented non-multiplying alternative, consider (v ^ (v >> (sizeof(int) * CHAR_BIT - 1))) + ((unsigned) v >> (sizeof(int) * CHAR_BIT - 1)), suggested by Thorbjørn Willoch on June 21, 2004, or my variation, (v ^ (v >> (sizeof(int) * CHAR_BIT - 1))) + (v < 0). Both take one more operation than the patented one, though they may be computed in parallel on modern CPUs.)

Compute the minimum (min) or maximum (max) of two integers without branching

    int x;  // we want to find the minimum of x and y
    int y;
    int r;  // the result goes here

    r = y + ((x - y) & -(x < y)); // min(x, y)
On machines where branching is expensive, the above expression can be faster than the obvious approach, r = (x < y) ? x : y, even though it involves two more instructions. It works because if x < y, then -(x < y) will be all ones, so r = y + ((x - y) & ~0) = y + x - y = x. Otherwise, if x >= y, then -(x < y) will be all zeros, so r = y + ((x - y) & 0) = y. On machines like the Pentium, evaluating (x < y) as 0 or 1 requires a branch instruction, so there may be no advantage. MIPS, ARM, and IA64, however, do not require branches.

To find the maximum, use:

    r = x - ((x - y) & -(x < y)); // max(x, y)
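The two expressions can be sketched as functions to confirm the reasoning above. The names are hypothetical, and the sketch assumes x - y does not overflow int and that the machine uses two's complement, so that -(x < y) is either all ones or all zeros.

```c
/* Branch-free minimum: the mask -(x < y) selects (x - y) or 0. */
int min_branchless(int x, int y)
{
    return y + ((x - y) & -(x < y));
}

/* Branch-free maximum: subtract the same masked difference from x. */
int max_branchless(int x, int y)
{
    return x - ((x - y) & -(x < y));
}
```

For example, min_branchless(3, 7) yields 3 and max_branchless(-4, 2) yields 2. Whether this beats a conditional move is something to benchmark on the target compiler.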
Quick and dirty versions:

If you know that INT_MIN <= x - y <= INT_MAX, then you can use the following, which are faster because (x - y) only needs to be evaluated once. (Note that the 1989 ANSI C specification doesn't specify the result of signed right-shift, so these aren't portable.)

    r = y + ((x - y) & ((x - y) >> (sizeof(int) * CHAR_BIT - 1))); // min(x, y)
    r = x - ((x - y) & ((x - y) >> (sizeof(int) * CHAR_BIT - 1))); // max(x, y)
On March 7, 2003, Angus Duggan pointed out the right-shift portability issue. On May 3, 2005, Randal E. Bryant alerted me to the need for the precondition, INT_MIN <= x - y <= INT_MAX, and suggested the non-quick and dirty version as a fix. Both of these issues concern only the quick and dirty version. Nigel Horspoon pointed out on July 6, 2005 that gcc produced the same code on a Pentium as the obvious solution because of how it evaluates (x < y).

Determining if an integer is a power of 2

    unsigned int v; // we want to see if v is a power of 2
    bool f;         // the result goes here

    f = (v & (v - 1)) == 0;
Note that 0 is incorrectly considered a power of 2 here. To remedy this, use:

    f = !(v & (v - 1)) && v;
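A minimal sketch of the corrected test as a function (the name is an assumption made for illustration). Clearing the lowest set bit with v & (v - 1) leaves zero exactly when at most one bit was set, and the extra check rules out v == 0.

```c
#include <stdbool.h>

/* True when exactly one bit of v is set. */
bool is_power_of_two(unsigned int v)
{
    return v != 0 && (v & (v - 1)) == 0;
}
```

Here is_power_of_two(64) is true, while is_power_of_two(0) and is_power_of_two(96) are false.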
Sign extending from a constant bit width

Sign extension is automatic for built-in types, such as chars and ints. But suppose you have a signed two's complement number, x, that is stored using only b bits. Moreover, suppose you want to convert x to an int, which has more than b bits. A simple copy will work if x is positive, but if negative, the sign must be extended. For example, if we have only 4 bits to store a number, then -3 is represented as 1101 in binary. If we have 8 bits, then -3 is 11111101. The most significant bit of the 4-bit representation is replicated sinistrally to fill in the destination when we convert to a representation with more bits; this is sign extending. In C, sign extension from a constant bit width is trivial, since bit fields may be specified in structs or unions. For example, to convert from 5 bits to a full integer:

    int x; // convert this from using 5 bits to a full int
    int r; // resulting sign extended number goes here
    struct {signed int x:5;} s;
    r = s.x = x;
The following is a C++ template function that uses the same language feature to convert from B bits in one operation (though the compiler is generating more, of course).

    template <typename T, unsigned B>
    inline T signextend(const T x)
    {
      struct {T x:B;} s;
      return s.x = x;
    }

    int r = signextend<signed int,5>(x);  // sign extend 5 bit number x to r
John Byrd caught a typo in the code (attributed to html formatting) on May 2, 2005. On March 4, 2006, Pat Wood pointed out that the ANSI C standard requires that the bitfield have the keyword "signed" to be signed; otherwise, the sign is undefined.

Sign extending from a variable bit-width

Sometimes we need to extend the sign of a number but we don't know a priori the number of bits, b, in which it is represented. (Or we could be programming in Java, which lacks bitfields.)

    unsigned b; // number of bits representing the number in x
    int x;      // sign extend this b-bit number to r
    int r;      // resulting sign-extended number
    int const m = 1 << (b - 1); // mask can be pre-computed if b is fixed
    r = -(x & m) | x;
The code above requires five operations. Sean A. Irvine suggested that I add sign extension methods to this page on June 13, 2004, and he provided m = (1 << (b - 1)) - 1; r = -(x & ~m) | x; as a starting point from which I optimized.

Sign extending from a variable bit-width in 3 operations

The following may be slow on some machines, due to the effort required for multiplication and division. This version is 4 operations. If you know that your initial bit width, b, is greater than 1, you might do this type of sign extension in 3 operations by using r = (x * multipliers[b]) / multipliers[b], which requires only one array lookup.

    unsigned b; // number of bits representing the number in x
    int x;      // sign extend this b-bit number to r
    int r;      // resulting sign-extended number
    #define M(B) (1 << ((sizeof(x) * CHAR_BIT) - B)) // CHAR_BIT=bits/byte
    int const multipliers[] =
    {
      0,     M(1),  M(2),  M(3),  M(4),  M(5),  M(6),  M(7),
      M(8),  M(9),  M(10), M(11), M(12), M(13), M(14), M(15),
      M(16), M(17), M(18), M(19), M(20), M(21), M(22), M(23),
      M(24), M(25), M(26), M(27), M(28), M(29), M(30), M(31),
      M(32)
    }; // (add more if using more than 64 bits)
    int const divisors[] =
    {
      1,    ~M(1),  M(2),  M(3),  M(4),  M(5),  M(6),  M(7),
      M(8),  M(9),  M(10), M(11), M(12), M(13), M(14), M(15),
      M(16), M(17), M(18), M(19), M(20), M(21), M(22), M(23),
      M(24), M(25), M(26), M(27), M(28), M(29), M(30), M(31),
      M(32)
    }; // (add more for 64 bits)
    #undef M
    r = (x * multipliers[b]) / divisors[b];
The following variation is not portable, but on architectures that employ an arithmetic right-shift, maintaining the sign, it should be fast.

    const int s = -b; // OR:  sizeof(x) * CHAR_BIT - b;
    r = (x << s) >> s;
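A hedged sketch of the mask-based variable-width method, r = -(x & m) | x, written with unsigned arithmetic so the negation is well defined. The function name and signature are assumptions for illustration. It assumes 1 <= b < the bit width of int, that the bits of x above position b - 1 are zero, and that converting the unsigned result back to int wraps as on two's complement machines.

```c
/* Sign-extend the b-bit two's complement value held in the low bits of x. */
int sign_extend(unsigned int x, unsigned int b)
{
    unsigned int m = 1u << (b - 1);      /* mask for the sign bit of the b-bit value */
    return (int)((0u - (x & m)) | x);    /* fill all bits above the sign bit when it is set */
}
```

With 4 bits, the pattern 1101 passed as sign_extend(0xD, 4) comes back as -3, matching the example given earlier for a constant bit width.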
Randal E. Bryant pointed out a bug on May 3, 2005 in an earlier version (that used multipliers[] for divisors[]), where it failed on the case of x=1 and b=1.

Conditionally set or clear bits without branching

    bool f;         // conditional flag
    unsigned int m; // the bit mask
    unsigned int w; // the word to modify:  if (f) w |= m; else w &= ~m;

    w ^= (-f ^ w) & m;
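The expression can be sketched as a function and compared with the branching form it replaces. The function name is hypothetical, and the flag is converted to unsigned before negation so that a true flag becomes an all-ones word.

```c
#include <stdbool.h>

/* Sets the bits of m in w when f is true and clears them when f is false. */
unsigned int set_or_clear(unsigned int w, unsigned int m, bool f)
{
    w ^= (-(unsigned int)f ^ w) & m;
    return w;
}
```

For instance, set_or_clear(0x0F, 0x30, true) gives 0x3F, and set_or_clear(0x3F, 0x30, false) gives 0x0F.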
On some architectures, the lack of branching can more than make up for what appears to be twice as many operations. For instance, informal speed tests on an AMD Athlon™ XP 2100+ indicated it was 5-10% faster. Glenn Slayden informed me of this expression on December 11, 2003.

Merge bits from two values according to a mask

    unsigned int a;    // value to merge in non-masked bits
    unsigned int b;    // value to merge in masked bits
    unsigned int mask; // 1 where bits from b should be selected; 0 where from a.
    unsigned int r;    // result of (a & ~mask) | (b & mask) goes here

    r = a ^ ((a ^ b) & mask);
This shaves one operation from the obvious way of combining two sets of bits according to a bit mask. If the mask is a constant, then there may be no advantage. Ron Jeffery sent this to me on February 9, 2006.

Counting bits set (naive way)

    unsigned int v; // count the number of bits set in v
    unsigned int c; // c accumulates the total bits set in v

    for (c = 0; v; v >>= 1)
    {
      c += v & 1;
    }
The naive approach requires one iteration per bit, until no more bits are set. So on a 32-bit word with only the high bit set, it will go through 32 iterations.

Counting bits set by lookup table

    const unsigned char BitsSetTable256[] =
    {
      0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4,
      1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
      1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
      2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
      1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
      2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
      2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
      3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
      1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
      2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
      2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
      3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
      2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
      3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
      3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
      4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8
    };

    unsigned int v; // count the number of bits set in 32-bit value v
    unsigned int c; // c is the total bits set in v

    // Option 1:
    c = BitsSetTable256[v & 0xff] +
        BitsSetTable256[(v >> 8) & 0xff] +
        BitsSetTable256[(v >> 16) & 0xff] +
        BitsSetTable256[v >> 24];

    // Option 2:
    unsigned char * p = (unsigned char *) &v;
    c = BitsSetTable256[p[0]] +
        BitsSetTable256[p[1]] +
        BitsSetTable256[p[2]] +
        BitsSetTable256[p[3]];

    // To initially generate the table algorithmically
    // (the table must then not be declared const):
    BitsSetTable256[0] = 0;
    for (int i = 0; i < 256; i++)
    {
      BitsSetTable256[i] = (i & 1) + BitsSetTable256[i / 2];
    }
Counting bits set, Brian Kernighan's way

    unsigned int v; // count the number of bits set in v
    unsigned int c; // c accumulates the total bits set in v

    for (c = 0; v; c++)
    {
      v &= v - 1; // clear the least significant bit set
    }
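A minimal sketch of the loop above as a function (the name is an assumption). Each pass clears the lowest set bit, so the loop body runs once per set bit.

```c
/* Counts set bits by repeatedly clearing the least significant set bit. */
unsigned int count_bits_kernighan(unsigned int v)
{
    unsigned int c;
    for (c = 0; v; c++)
    {
        v &= v - 1;
    }
    return c;
}
```

For 0x80000000 the loop runs once and returns 1; for 0xFF it runs eight times.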
Brian Kernighan's method goes through as many iterations as there are set bits. So if we have a 32-bit word with only the high bit set, then it will only go once through the loop.

Published in 1988, the C Programming Language 2nd Ed. (by Brian W. Kernighan and Dennis M. Ritchie) mentions this in exercise 2-9. On April 19, 2006 Don Knuth pointed out to me that this method "was first published by Peter Wegner in CACM 3 (1960), 322. (Also discovered independently by Derrick Lehmer and published in 1964 in a book edited by Beckenbach.)"

Counting bits set in 12, 24, or 32-bit words using 64-bit instructions

    unsigned int v; // count the number of bits set in v
    unsigned int c; // c accumulates the total bits set in v

    // option 1, for at most 12-bit values in v:
    c = (v * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;

    // option 2, for at most 24-bit values in v:
    c =  ((v & 0xfff) * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;
    c += (((v & 0xfff000) >> 12) * 0x1001001001001ULL & 0x84210842108421ULL)
         % 0x1f;

    // option 3, for at most 32-bit values in v:
    c =  ((v & 0xfff) * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;
    c += (((v & 0xfff000) >> 12) * 0x1001001001001ULL & 0x84210842108421ULL) %
         0x1f;
    c += ((v >> 24) * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;
This method requires a 64-bit CPU with fast modulus division to be efficient. The first option takes only 3 operations; the second option takes 10; and the third option takes 15.

Rich Schroeppel originally created a 9-bit version, similar to option 1; see the Programming Hacks section of Beeler, M., Gosper, R. W., and Schroeppel, R. HAKMEM. MIT AI Memo 239, Feb. 29, 1972. His method was the inspiration for the variants above, devised by Sean Anderson. Randal E. Bryant offered a couple of bug fixes on May 3, 2005.

Counting bits set, in parallel

    unsigned int v; // count bits set in this (32-bit value)
    unsigned int c; // store the total here
    const int S[] = {1, 2, 4, 8, 16}; // Magic Binary Numbers
    const int B[] = {0x55555555, 0x33333333, 0x0F0F0F0F, 0x00FF00FF, 0x0000FFFF};

    c = v;
    c = v - ((v >> 1) & B[0]);
    c = ((c >> S[1]) & B[1]) + (c & B[1]);
    c = ((c >> S[2]) + c) & B[2];
    c = ((c >> S[3]) + c) & B[3];
    c = ((c >> S[4]) + c) & B[4];
The B array, expressed as binary, is:

    B[0] = 0x55555555 = 01010101 01010101 01010101 01010101
    B[1] = 0x33333333 = 00110011 00110011 00110011 00110011
    B[2] = 0x0F0F0F0F = 00001111 00001111 00001111 00001111
    B[3] = 0x00FF00FF = 00000000 11111111 00000000 11111111
    B[4] = 0x0000FFFF = 00000000 00000000 11111111 11111111
We can adjust the method for larger integer sizes by continuing with the patterns for the Binary Magic Numbers, B and S. If there are k bits, then we need the arrays S and B to be ceil(lg(k)) elements long, and we must compute the same number of expressions for c as S or B are long. For a 32-bit v, 16 operations are used.

The best method for counting bits in a 32-bit integer v is the following:

    unsigned int const w = v - ((v >> 1) & 0x55555555);                    // temp
    unsigned int const x = (w & 0x33333333) + ((w >> 2) & 0x33333333);     // temp
    unsigned int const c = ((x + (x >> 4) & 0xF0F0F0F) * 0x1010101) >> 24; // count
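The three-line method can be sketched as a function, assuming unsigned int is exactly 32 bits wide. The function name is hypothetical, and the parentheses around x + (x >> 4) are added only to make the operator precedence explicit.

```c
/* Counts set bits in a 32-bit word: 2-bit sums, then 4-bit sums,
   then byte sums, which the multiply adds up into the top byte. */
unsigned int count_bits_parallel(unsigned int v)
{
    unsigned int w = v - ((v >> 1) & 0x55555555u);
    unsigned int x = (w & 0x33333333u) + ((w >> 2) & 0x33333333u);
    return (((x + (x >> 4)) & 0x0F0F0F0Fu) * 0x01010101u) >> 24;
}
```

As a spot check, count_bits_parallel(0x12345678) returns 13, and an all-ones word returns 32.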
The best bit counting method takes only 12 operations, which is the same as the lookup-table method, but avoids the memory and potential cache misses of a table. It is a hybrid between the purely parallel method above and the earlier methods using multiplies (in the section on counting bits with 64-bit instructions), though it doesn't use 64-bit instructions. The counting of bits set in the bytes is done in parallel, and the sum total of the bits set in the bytes is computed by multiplying by 0x1010101 and shifting right 24 bits.

See Ian Ashdown's nice newsgroup post for more information on counting the number of bits set (also known as sideways addition). The best bit counting method was brought to my attention on October 5, 2005 by Andrew Shapira; he found it in pages 187-188 of Software Optimization Guide for AMD Athlon™ 64 and Opteron™ Processors. Charlie Gordon suggested a way to shave off one operation from the purely parallel version on December 14, 2005, and Don Clugston trimmed three more from it on December 30, 2005. He also pointed out that if rotate operations were available, changing the shifts to rotates would eliminate the need for the AND masks because the overcounted sum total could instead be right-shifted 3 at the end. I made a typo with Don's suggestion that Eric Cole spotted on January 8, 2006.

Computing parity the naive way

    unsigned int v;       // word value to compute the parity of
    bool parity = false;  // parity will be the parity of v

    while (v)
    {
      parity = !parity;
      v = v & (v - 1);
    }
The above code uses an approach like Brian Kernighan's bit counting, above. The time it takes is proportional to the number of bits set.

Compute parity of a byte using 64-bit multiply and modulus division

    unsigned char b;  // byte value to compute the parity of
    bool parity =
      (((b * 0x0101010101010101ULL) & 0x8040201008040201ULL) % 0x1FF) & 1;
The method above takes around 4 operations, but only works on bytes.

Compute parity in parallel

    unsigned int v;  // word value to compute the parity of
    v ^= v >> 16;
    v ^= v >> 8;
    v ^= v >> 4;
    v &= 0xf;
    return (0x6996 >> v) & 1;
The method above takes around 9 operations, and works for 32-bit words. It may be optimized to work just on bytes in 5 operations by removing the two lines immediately following "unsigned int v;". The method first shifts and XORs the eight nibbles of the 32-bit value together, leaving the result in the lowest nibble of v. Next, the binary number 0110 1001 1001 0110 (0x6996 in hex) is shifted to the right by the value represented in the lowest nibble of v. This number is like a miniature 16-bit parity-table indexed by the low four bits in v. The result has the parity of v in its lowest bit, which is masked and returned.

Thanks to Mathew Hendry for pointing out the shift-lookup idea at the end on Dec. 15, 2002. That optimization shaves two operations off using only shifting and XORing to find the parity.

Compute parity by lookup table

    const bool ParityTable[] =
    {
      0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
      1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
      1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
      0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
      1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
      0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
      0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
      1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
      1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
      0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
      0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
      1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
      0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
      1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
      1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
      0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0
    };

    unsigned char b;  // byte value to compute the parity of
    bool parity = ParityTable[b];

    // OR, for 32-bit words:
    unsigned int v;   // word value to compute the parity of
    bool parity = ParityTable[(v & 0x000000ff)] ^
                  ParityTable[(v & 0x0000ff00) >>  8] ^
                  ParityTable[(v & 0x00ff0000) >> 16] ^
                  ParityTable[(v & 0xff000000) >> 24];

    // Variation:
    unsigned char * p = (unsigned char *) &v;
    bool parity = ParityTable[p[0]] ^
                  ParityTable[p[1]] ^
                  ParityTable[p[2]] ^
                  ParityTable[p[3]];
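To compare the approaches without the table, here is a small sketch of the naive and the parallel (shift, XOR, then 0x6996 lookup) parity computations as functions. The names are illustrative assumptions and the word is assumed to be 32 bits.

```c
#include <stdbool.h>

/* Flips the result once per set bit (Kernighan-style loop). */
bool parity_naive(unsigned int v)
{
    bool parity = false;
    while (v)
    {
        parity = !parity;
        v &= v - 1;
    }
    return parity;
}

/* Folds the word down to one nibble, then uses 0x6996 as a 16-entry parity table. */
bool parity_parallel(unsigned int v)
{
    v ^= v >> 16;
    v ^= v >> 8;
    v ^= v >> 4;
    v &= 0xf;
    return (0x6996 >> v) & 1;
}
```

Both functions return true for 0x12345678, which has 13 bits set, and false for 0x80000001, which has 2.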
Randal E. Bryant encouraged the addition of the (admittedly) obvious last variation with variable p on May 3, 2005. Bruce Rawles found a typo in an instance of the table variable's name on September 27, 2005, and he received a $10 bug bounty.

Swapping values with XOR

    #define SWAP(a, b) (((a) ^= (b)), ((b) ^= (a)), ((a) ^= (b)))
This is an old trick to exchange the values of the variables a and b without using extra space for a temporary variable. On January 20, 2005, Iain A. Fleming pointed out that the macro above doesn't work when you swap with the same memory location, such as SWAP(a[i], a[j]) with i == j. So if that may occur, consider defining the macro as (((a) == (b)) || (((a) ^= (b)), ((b) ^= (a)), ((a) ^= (b)))).

Swapping individual bits with XOR

    unsigned int i, j; // positions of bit sequences to swap
    unsigned int n;    // number of consecutive bits in each sequence
    unsigned int b;    // bits to swap reside in b
    unsigned int r;    // bit-swapped result goes here

    int x = ((b >> i) ^ (b >> j)) & ((1 << n) - 1); // XOR temporary
    r = b ^ ((x << i) | (x << j));
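A minimal sketch of the bit-range swap as a function. The name and signature are assumptions, the temporary is kept unsigned, and the two ranges are assumed not to overlap and to fit inside the word.

```c
/* Swaps the n bits starting at position i with the n bits starting at position j. */
unsigned int swap_bit_ranges(unsigned int b, unsigned int i, unsigned int j, unsigned int n)
{
    unsigned int x = ((b >> i) ^ (b >> j)) & ((1u << n) - 1u); /* XOR of the two ranges */
    return b ^ ((x << i) | (x << j));
}
```

Swapping the two nibbles of 0xF0 with swap_bit_ranges(0xF0, 0, 4, 4) gives 0x0F.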
As an example of swapping ranges of bits, suppose we have b = 00101111 (expressed in binary) and we want to swap the n = 3 consecutive bits starting at i = 1 (the second bit from the right) with the 3 consecutive bits starting at j = 5; the result would be r = 11100011 (binary).

This method of swapping is similar to the general purpose XOR swap trick, but intended for operating on individual bits. The variable x stores the result of XORing the pairs of bit values we want to swap, and then the bits are set to the result of themselves XORed with x. Of course, the result is undefined if the sequences overlap.

Reverse bits the obvious way

    unsigned int v;           // reverse the bits in this
    unsigned int t = v << 1;  // t will have the reversed bits of v
    int i;

    v >>= 1;
    for (i = sizeof(v) * CHAR_BIT - 2; i; i--)
    {
      t |= v & 1;
      t <<= 1;
      v >>= 1;
    }
    t |= v;
On October 15, 2004, Michael Hoisie pointed out a bug in the original version. Randal E. Bryant suggested removing an extra operation on May 3, 2005. Behdad Esfabod suggested a slight change that eliminated one iteration of the loop on May 18, 2005.

Reverse bits in word by lookup table

    const unsigned char BitReverseTable256[] =
    {
      0x00, 0x80, 0x40, 0xC0, 0x20, 0xA0, 0x60, 0xE0, 0x10, 0x90, 0x50, 0xD0, 0x30, 0xB0, 0x70, 0xF0,
      0x08, 0x88, 0x48, 0xC8, 0x28, 0xA8, 0x68, 0xE8, 0x18, 0x98, 0x58, 0xD8, 0x38, 0xB8, 0x78, 0xF8,
      0x04, 0x84, 0x44, 0xC4, 0x24, 0xA4, 0x64, 0xE4, 0x14, 0x94, 0x54, 0xD4, 0x34, 0xB4, 0x74, 0xF4,
      0x0C, 0x8C, 0x4C, 0xCC, 0x2C, 0xAC, 0x6C, 0xEC, 0x1C, 0x9C, 0x5C, 0xDC, 0x3C, 0xBC, 0x7C, 0xFC,
      0x02, 0x82, 0x42, 0xC2, 0x22, 0xA2, 0x62, 0xE2, 0x12, 0x92, 0x52, 0xD2, 0x32, 0xB2, 0x72, 0xF2,
      0x0A, 0x8A, 0x4A, 0xCA, 0x2A, 0xAA, 0x6A, 0xEA, 0x1A, 0x9A, 0x5A, 0xDA, 0x3A, 0xBA, 0x7A, 0xFA,
      0x06, 0x86, 0x46, 0xC6, 0x26, 0xA6, 0x66, 0xE6, 0x16, 0x96, 0x56, 0xD6, 0x36, 0xB6, 0x76, 0xF6,
      0x0E, 0x8E, 0x4E, 0xCE, 0x2E, 0xAE, 0x6E, 0xEE, 0x1E, 0x9E, 0x5E, 0xDE, 0x3E, 0xBE, 0x7E, 0xFE,
      0x01, 0x81, 0x41, 0xC1, 0x21, 0xA1, 0x61, 0xE1, 0x11, 0x91, 0x51, 0xD1, 0x31, 0xB1, 0x71, 0xF1,
      0x09, 0x89, 0x49, 0xC9, 0x29, 0xA9, 0x69, 0xE9, 0x19, 0x99, 0x59, 0xD9, 0x39, 0xB9, 0x79, 0xF9,
      0x05, 0x85, 0x45, 0xC5, 0x25, 0xA5, 0x65, 0xE5, 0x15, 0x95, 0x55, 0xD5, 0x35, 0xB5, 0x75, 0xF5,
      0x0D, 0x8D, 0x4D, 0xCD, 0x2D, 0xAD, 0x6D, 0xED, 0x1D, 0x9D, 0x5D, 0xDD, 0x3D, 0xBD, 0x7D, 0xFD,
      0x03, 0x83, 0x43, 0xC3, 0x23, 0xA3, 0x63, 0xE3, 0x13, 0x93, 0x53, 0xD3, 0x33, 0xB3, 0x73, 0xF3,
      0x0B, 0x8B, 0x4B, 0xCB, 0x2B, 0xAB, 0x6B, 0xEB, 0x1B, 0x9B, 0x5B, 0xDB, 0x3B, 0xBB, 0x7B, 0xFB,
      0x07, 0x87, 0x47, 0xC7, 0x27, 0xA7, 0x67, 0xE7, 0x17, 0x97, 0x57, 0xD7, 0x37, 0xB7, 0x77, 0xF7,
      0x0F, 0x8F, 0x4F, 0xCF, 0x2F, 0xAF, 0x6F, 0xEF, 0x1F, 0x9F, 0x5F, 0xDF, 0x3F, 0xBF, 0x7F, 0xFF
    };

    unsigned int v; // reverse 32-bit value, 8 bits at a time
    unsigned int c; // c will get v reversed

    // Option 1:
    c = (BitReverseTable256[v & 0xff] << 24) |
        (BitReverseTable256[(v >> 8) & 0xff] << 16) |
        (BitReverseTable256[(v >> 16) & 0xff] << 8) |
        (BitReverseTable256[(v >> 24) & 0xff]);

    // Option 2:
    unsigned char * p = (unsigned char *) &v;
    unsigned char * q = (unsigned char *) &c;
    q[3] = BitReverseTable256[p[0]];
    q[2] = BitReverseTable256[p[1]];
    q[1] = BitReverseTable256[p[2]];
    q[0] = BitReverseTable256[p[3]];
The first method takes about 17 operations, and the second takes about 12, assuming your CPU can load and store bytes easily.

Reverse the bits in a byte with 3 operations (64-bit multiply and modulus division):

    unsigned char b; // reverse this (8-bit) byte

    b = (b * 0x0202020202ULL & 0x010884422010ULL) % 1023;
The multiply operation creates five separate copies of the 8-bit byte pattern to fan-out into a 64-bit value. The AND operation selects the bits that are in the correct (reversed) positions, relative to each 10-bit group of bits. The multiply and the AND operations copy the bits from the original byte so they each appear in only one of the 10-bit sets. The reversed positions of the bits from the original byte coincide with their relative positions within any 10-bit set. The last step, which involves modulus division by 2^10 - 1, has the effect of merging together each set of 10 bits (from positions 0-9, 10-19, 20-29, ...) in the 64-bit value. They do not overlap, so the addition steps underlying the modulus division behave like or operations.

This method was attributed to Rich Schroeppel in the Programming Hacks section of Beeler, M., Gosper, R. W., and Schroeppel, R. HAKMEM. MIT AI Memo 239, Feb. 29, 1972.

Reverse the bits in a byte with 4 operations (64-bit multiply, no division):

    unsigned char b; // reverse this byte

    b = ((b * 0x80200802ULL) & 0x0884422110ULL) * 0x0101010101ULL >> 32;
The following shows the flow of the bit values with the boolean variables a, b, c, d, e, f, g, and h, which comprise an 8-bit byte. Notice how the first multiply fans out the bit pattern to multiple copies, while the last multiply combines them in the fifth byte from the right.

                                                                                        abcd efgh (-> hgfe dcba)
*                                                      1000 0000  0010 0000  0000 1000  0000 0010 (0x80200802)
-------------------------------------------------------------------------------------------------
                                            0abc defg  h00a bcde  fgh0 0abc  defg h00a  bcde fgh0
&                                           0000 1000  1000 0100  0100 0010  0010 0001  0001 0000 (0x0884422110)
-------------------------------------------------------------------------------------------------
                                            0000 d000  h000 0c00  0g00 00b0  00f0 000a  000e 0000
*                                           0000 0001  0000 0001  0000 0001  0000 0001  0000 0001 (0x0101010101)
-------------------------------------------------------------------------------------------------
                                            0000 d000  h000 0c00  0g00 00b0  00f0 000a  000e 0000
                                 0000 d000  h000 0c00  0g00 00b0  00f0 000a  000e 0000
                      0000 d000  h000 0c00  0g00 00b0  00f0 000a  000e 0000
           0000 d000  h000 0c00  0g00 00b0  00f0 000a  000e 0000
0000 d000  h000 0c00  0g00 00b0  00f0 000a  000e 0000
-------------------------------------------------------------------------------------------------
0000 d000  h000 dc00  hg00 dcb0  hgf0 dcba  hgfe dcba  hgfe 0cba  0gfe 00ba  00fe 000a  000e 0000
>> 32
-------------------------------------------------------------------------------------------------
                                            0000 d000  h000 dc00  hg00 dcb0  hgf0 dcba  hgfe dcba
&                                                                                       1111 1111
-------------------------------------------------------------------------------------------------
                                                                                        hgfe dcba
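The three steps in the diagram (fan out, select, gather) can be written as one C function and checked against a bit-by-bit reversal. This is only a sketch: the function names `reverse_byte_mul64`, `reverse_byte_naive`, and `check_reverse_byte_mul64` are made up for illustration, and the constants are the ones shown in the diagram.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the bits of one byte using the multiply/mask/multiply steps
 * from the diagram. Function names here are illustrative only. */
static uint8_t reverse_byte_mul64(uint8_t b)
{
    uint64_t fanned   = (uint64_t)b * 0x80200802ULL;  /* several shifted copies of the byte */
    uint64_t selected = fanned & 0x0884422110ULL;     /* keep one bit of interest per position */
    uint64_t gathered = selected * 0x0101010101ULL;   /* add five byte-shifted copies together */
    return (uint8_t)(gathered >> 32);                 /* fifth byte from the right */
}

/* Reference implementation: move one bit at a time. */
static uint8_t reverse_byte_naive(uint8_t b)
{
    uint8_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (uint8_t)((r << 1) | (b & 1));
        b >>= 1;
    }
    return r;
}

/* Exhaustive check over all 256 byte values. */
static void check_reverse_byte_mul64(void)
{
    for (unsigned int v = 0; v < 256; v++)
        assert(reverse_byte_mul64((uint8_t)v) == reverse_byte_naive((uint8_t)v));
}
```

The last multiply may overflow 64 bits, but unsigned arithmetic wraps and only bits 32 to 39 of the product are used, so the result is unaffected.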
Note that the last two steps can be combined on some processors because the registers can be accessed as bytes; just multiply so that a register stores the upper 32 bits of the result and then take the low byte. Thus, it may take only 6 operations.

Devised by Sean Anderson, July 13, 2001.

Reverse the bits in a byte with 7 operations (no 64-bit):

b = ((b * 0x0802LU & 0x22110LU) | (b * 0x8020LU & 0x88440LU)) * 0x10101LU >> 16;
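As a hedged illustration, the 7-operation expression can be split into named steps and verified for every byte value. The names `reverse_byte_mul32`, `reverse_byte_ref`, and `check_reverse_byte_mul32` are assumptions made for this sketch, and `uint32_t` is used in place of `unsigned long` so the width is fixed.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit-only byte reversal; names are illustrative, not from the original. */
static uint8_t reverse_byte_mul32(uint8_t b)
{
    uint32_t x    = b;
    uint32_t low  = (x * 0x0802U) & 0x22110U;  /* this mask keeps four of the eight bits */
    uint32_t high = (x * 0x8020U) & 0x88440U;  /* this mask keeps the other four bits */
    /* The multiply adds three byte-shifted copies; the reversed byte
     * ends up in bits 16..23 of the (wrapping) 32-bit product. */
    return (uint8_t)(((low | high) * 0x10101U) >> 16);
}

/* Reference implementation: move one bit at a time. */
static uint8_t reverse_byte_ref(uint8_t b)
{
    uint8_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (uint8_t)((r << 1) | (b & 1));
        b >>= 1;
    }
    return r;
}

/* Exhaustive check over all 256 byte values. */
static void check_reverse_byte_mul32(void)
{
    for (unsigned int v = 0; v < 256; v++)
        assert(reverse_byte_mul32((uint8_t)v) == reverse_byte_ref((uint8_t)v));
}
```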
Devised by Sean Anderson, July 13, 2001. Typo spotted and correction supplied by Mike Keith, January 3, 2002.

Reverse an N-bit quantity in parallel in 5 * lg(N) operations:

unsigned int v; // 32-bit word to reverse bit order

// swap odd and even bits
v = ((v >> 1) & 0x55555555) | ((v & 0x55555555) << 1);
// swap consecutive pairs
v = ((v >> 2) & 0x33333333) | ((v & 0x33333333) << 2);
// swap nibbles ...
v = ((v >> 4) & 0x0F0F0F0F) | ((v & 0x0F0F0F0F) << 4);
// swap bytes
v = ((v >> 8) & 0x00FF00FF) | ((v & 0x00FF00FF) << 8);
// swap 2-byte long pairs
v = ( v >> 16             ) | ( v               << 16);
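A minimal sketch of the parallel reversal wrapped in a function, assuming a 32-bit word (`uint32_t`). The names `reverse_bits32`, `reverse_bits32_naive`, and `check_reverse_bits32` are hypothetical and exist only to make the snippet testable.

```c
#include <assert.h>
#include <stdint.h>

/* Parallel reversal of a 32-bit word: five swap stages, lg(32) = 5. */
static uint32_t reverse_bits32(uint32_t v)
{
    v = ((v >> 1) & 0x55555555U) | ((v & 0x55555555U) << 1);  /* odd/even bits */
    v = ((v >> 2) & 0x33333333U) | ((v & 0x33333333U) << 2);  /* bit pairs */
    v = ((v >> 4) & 0x0F0F0F0FU) | ((v & 0x0F0F0F0FU) << 4);  /* nibbles */
    v = ((v >> 8) & 0x00FF00FFU) | ((v & 0x00FF00FFU) << 8);  /* bytes */
    v = (v >> 16) | (v << 16);                                /* 16-bit halves */
    return v;
}

/* Reference implementation: move one bit at a time. */
static uint32_t reverse_bits32_naive(uint32_t v)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (v & 1U);
        v >>= 1;
    }
    return r;
}

/* Compare both implementations on a few sample words. */
static void check_reverse_bits32(void)
{
    const uint32_t samples[] = { 0U, 1U, 0x80000000U, 0x12345678U,
                                 0xDEADBEEFU, 0xFFFFFFFFU };
    for (unsigned int i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        assert(reverse_bits32(samples[i]) == reverse_bits32_naive(samples[i]));
        /* Reversing twice must give back the original word. */
        assert(reverse_bits32(reverse_bits32(samples[i])) == samples[i]);
    }
}
```

For example, `reverse_bits32(0x12345678)` yields `0x1E6A2C48`: the nibbles appear in reverse order, each with its own four bits reversed.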
The following variation is also O(lg(N)); however, it requires more operations to reverse v. Its virtue is in taking slightly less memory by computing the constants on the fly.

unsigned int s = sizeof(v) * CHAR_BIT; // bit size; must be power of 2
unsigned int mask = ~0;
while ((s >>= 1) > 0)
{
  mask ^= (mask << s);
  v = ((v >> s) & mask) | ((v << s) & ~mask);
}
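To make the on-the-fly constants concrete, here is a sketch of the loop as a function over `uint32_t`. The names `reverse_bits32_loop` and `check_reverse_bits32_loop` are assumptions for illustration. The comment lists the masks the loop produces for a 32-bit word, which are the same constants as in the unrolled version, applied from the widest swap down to the narrowest.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Loop-based reversal; the mask for each stage is derived from the previous one.
 * For a 32-bit word the successive masks are:
 *   s = 16: 0x0000FFFF, s = 8: 0x00FF00FF, s = 4: 0x0F0F0F0F,
 *   s = 2:  0x33333333, s = 1: 0x55555555                       */
static uint32_t reverse_bits32_loop(uint32_t v)
{
    unsigned int s = sizeof(v) * CHAR_BIT;  /* bit size; must be a power of 2 */
    uint32_t mask = ~(uint32_t)0;
    while ((s >>= 1) > 0) {
        mask ^= (mask << s);
        v = ((v >> s) & mask) | ((v << s) & ~mask);
    }
    return v;
}

/* Sanity checks: known values, and reversing twice is the identity. */
static void check_reverse_bits32_loop(void)
{
    const uint32_t samples[] = { 0U, 1U, 0x12345678U, 0xDEADBEEFU, 0xFFFFFFFFU };
    assert(reverse_bits32_loop(1U) == 0x80000000U);
    for (unsigned int i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(reverse_bits32_loop(reverse_bits32_loop(samples[i])) == samples[i]);
}
```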
The methods above are best suited to situations where N is large. See Edwin Freed's article on Binary Magic Numbers in Dr. Dobb's Journal (1983) for more information. The second variation was suggested by Ken Raeburn on September 13, 2005. On March 19, 2006, Veldmeijer mentioned that the first version could do without ANDs in the last line.

Compute modulus division by 1 << s without a division operator

const unsigned int n;           // numerator
const unsigned int s;
const unsigned int d = 1U << s; // So d will be one of: 1, 2, 4, 8, 16, 32, ...
unsigned int m;                 // m will be n % d
m = n & (d - 1);
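A small sketch of the same masking idea as a function, compared against the `%` operator. The names `mod_pow2` and `check_mod_pow2` are hypothetical, and the sketch assumes `s` is smaller than the bit width of `unsigned int`.

```c
#include <assert.h>

/* n % (1 << s) computed with a mask; assumes s < bit width of unsigned int. */
static unsigned int mod_pow2(unsigned int n, unsigned int s)
{
    const unsigned int d = 1U << s;  /* 1, 2, 4, 8, ... */
    return n & (d - 1);              /* keep only the low s bits */
}

/* Compare against the % operator over a range of inputs. */
static void check_mod_pow2(void)
{
    for (unsigned int s = 0; s < 16; s++)
        for (unsigned int n = 0; n < 5000; n++)
            assert(mod_pow2(n, s) == n % (1U << s));
}
```

For instance, `mod_pow2(29, 3)` keeps the low three bits of 29 (binary 11101) and returns 5, which is 29 % 8.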
Most programmers learn this trick early, but it was included for the sake of completeness.

Compute modulus division by (1 << s) - 1 without a division operator

unsigned int n;                       // numerator
const unsigned int s;                 // s > 0
const unsigned int d = (1U << s) - 1; // so d is either 1, 3, 7, 15, 31, ...
unsigned int m;                       // n % d goes here.

for (m = n; n > d; n = m)
{
  for (m = 0; n; n >>= s)
  {
    m += n & d;
  }
}
// Now m is a value from 0 to d, but since with modulus division
// we want m to be 0 when it is d.
m = m == d ? 0 : m;
This method of modulus division by an integer that is one less than a power of 2 takes at most 5 + (4 + 5 * ceil(N / s)) * ceil(lg(N / s)) operations, where N is the number of bits in the numerator. In other words, it takes at most O(N * lg(N)) time. Devised by Sean Anderson, August 15, 2001. Before Sean A. Irvine corrected me on June 17, 2004, I mistakenly commented that we could alternatively assign m = ((m + 1) & d) - 1; at the end. Michael Miller spotted a typo in the code April 25, 2005.
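The loop works because 1 << s leaves a remainder of 1 when divided by d, so adding up the s-bit digits of n preserves the remainder. Below is a hedged sketch that wraps the loop in a function and compares it with `%`. The names `mod_pow2_minus1` and `check_mod_pow2_minus1` are made up for this example, and it assumes 0 < s < the bit width of `unsigned int`.

```c
#include <assert.h>

/* n % ((1 << s) - 1) without a division operator; assumes 0 < s < bit width. */
static unsigned int mod_pow2_minus1(unsigned int n, unsigned int s)
{
    const unsigned int d = (1U << s) - 1;  /* 1, 3, 7, 15, 31, ... */
    unsigned int m;

    /* Repeatedly replace n by the sum of its s-bit digits until it fits in 0..d. */
    for (m = n; n > d; n = m) {
        for (m = 0; n; n >>= s) {
            m += n & d;
        }
    }
    /* m is now in 0..d; a value of d means the remainder is 0. */
    return m == d ? 0 : m;
}

/* Compare against the % operator over a range of inputs. */
static void check_mod_pow2_minus1(void)
{
    for (unsigned int s = 1; s <= 16; s++) {
        const unsigned int d = (1U << s) - 1;
        for (unsigned int n = 0; n < 5000; n++)
            assert(mod_pow2_minus1(n, s) == n % d);
        assert(mod_pow2_minus1(0xFFFFFFFFU, s) == 0xFFFFFFFFU % d);
    }
}
```

As a worked case, `mod_pow2_minus1(100, 3)` splits 100 (octal 144) into the digits 1, 4, 4, sums them to 9, then sums the digits of 9 (octal 11) to 2, which equals 100 % 7.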

Posted on 2010-01-12 23:01 by 一个人的天空@
