Some Bit Twiddling Tricks (Part 2)

Find the log base 2 of an integer with a lookup table

    static const char LogTable256[] = 
    {
      0, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3,
      4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
      5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
      5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
      6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
      6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
      6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
      6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
      7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
      7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
      7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
      7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
      7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
      7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
      7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
      7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7
    };

    unsigned int v; // 32-bit word to find the log of
    unsigned r = 0; // r will be lg(v)
    register unsigned int t, tt; // temporaries

    if (tt = v >> 16)
    {
      r = (t = v >> 24) ? 24 + LogTable256[t] : 16 + LogTable256[tt & 0xFF];
    }
    else 
    {
      r = (t = v >> 8) ? 8 + LogTable256[t] : LogTable256[v];
    }
The lookup table method takes only about 7 operations to find the log of a 32-bit value. If extended for 64-bit quantities, it would take roughly 9 operations. Another operation can be trimmed off by using four tables, with the possible additions incorporated into each. Using int table elements may be faster, depending on your architecture.
    // To initially generate the log table algorithmically:
    LogTable256[0] = LogTable256[1] = 0;
    for (int i = 2; i < 256; i++) 
    {
      LogTable256[i] = 1 + LogTable256[i / 2];
    }
Behdad Esfahbod and I shaved off a fraction of an operation (on average) on May 18, 2005.
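As an illustration only, the table lookup can be wrapped in a function that fills its table on first use with the recurrence shown above. The names log_table_256, init_log_table and log2_table are hypothetical, and uint32_t is used so that the 32-bit assumption is explicit.

```c
#include <stdint.h>

/* Hypothetical names; the table is generated at run time with the
   recurrence quoted above instead of being written out by hand. */
static unsigned char log_table_256[256];
static int log_table_ready = 0;

static void init_log_table(void)
{
    log_table_256[0] = log_table_256[1] = 0;
    for (int i = 2; i < 256; i++) {
        log_table_256[i] = (unsigned char)(1 + log_table_256[i / 2]);
    }
    log_table_ready = 1;
}

/* floor(log2(v)) for v > 0; returns 0 for v == 0, as the table does. */
static unsigned int log2_table(uint32_t v)
{
    uint32_t t, tt;

    if (!log_table_ready) {
        init_log_table();
    }
    if ((tt = v >> 16) != 0) {
        /* The highest set bit is in one of the two upper bytes. */
        return (t = v >> 24) ? 24u + log_table_256[t]
                             : 16u + log_table_256[tt & 0xFF];
    }
    /* The highest set bit is in one of the two lower bytes. */
    return (t = v >> 8) ? 8u + log_table_256[t] : log_table_256[v];
}
```

The lazy initialization adds a branch that the hand-written table avoids, so this form is for checking the idea rather than for measuring speed.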
Find the log base 2 of an N-bit integer in O(lg(N)) operations

    unsigned int v;  // 32-bit value to find the log2 of 
    const unsigned int b[] = {0x2, 0xC, 0xF0, 0xFF00, 0xFFFF0000};
    const unsigned int S[] = {1, 2, 4, 8, 16};
    int i;

    register unsigned int r = 0; // result of log2(v) will go here
    for (i = 4; i >= 0; i--) // unroll for speed...
    {
      if (v & b[i])
      {
        v >>= S[i];
        r |= S[i];
      } 
    }

    // OR (IF YOUR CPU BRANCHES SLOWLY):

    register unsigned int shift;

    shift = ( ( v & 0xFFFF0000 ) != 0 ) << 4; v >>= shift; r |= shift;
    shift = ( ( v & 0xFF00     ) != 0 ) << 3; v >>= shift; r |= shift;
    shift = ( ( v & 0xF0       ) != 0 ) << 2; v >>= shift; r |= shift;
    shift = ( ( v & 0xC        ) != 0 ) << 1; v >>= shift; r |= shift;
    shift = ( ( v & 0x2        ) != 0 ) << 0; v >>= shift; r |= shift;

    // OR (IF YOU KNOW v IS A POWER OF 2):

    const unsigned int b[] = {0xAAAAAAAA, 0xCCCCCCCC, 0xF0F0F0F0, 0xFF00FF00, 
                              0xFFFF0000};
    register unsigned int r = (v & b[0]) != 0;
    for (i = 4; i > 0; i--) // unroll for speed...
    {
      r |= ((v & b[i]) != 0) << i;
    }
Of course, to extend the code to find the log of a 33- to 64-bit number, we would append another element, 0xFFFFFFFF00000000, to b, append 32 to S, and loop from 5 to 0. This method is much slower than the earlier table-lookup version, but if you don't want a big table or your architecture is slow to access memory, it's a good choice. The second variation involves more operations, but it may be faster on machines with high branch costs (e.g. PowerPC); it was sent to me by Eric Cole on January 7, 2006. The third variation was suggested to me by John Owens on April 24, 2002; it's faster, but it is only suitable when the input is known to be a power of 2. On May 25, 2003, Ken Raeburn suggested improving the general case by using smaller numbers for b[], which load faster on some architectures (for instance, if the word size is 16 bits, then only one load instruction may be needed). These values work for the general version, but not for the special-case version below it, where v is a power of 2; Glenn Slayden brought this oversight to my attention on December 12, 2003.
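The 64-bit extension described above might look like the following sketch. The function name log2_u64 and the use of uint64_t are assumptions made for the example; the extra mask and shift are the ones the text names.

```c
#include <stdint.h>

/* floor(log2(v)) for a 64-bit v > 0; returns 0 for v == 0.
   Hypothetical name; b[] and S[] are the arrays from the 32-bit
   version with one more element appended to each. */
static unsigned int log2_u64(uint64_t v)
{
    static const uint64_t b[] = {0x2, 0xC, 0xF0, 0xFF00, 0xFFFF0000,
                                 0xFFFFFFFF00000000ULL};
    static const unsigned int S[] = {1, 2, 4, 8, 16, 32};
    unsigned int r = 0;

    for (int i = 5; i >= 0; i--) {
        if (v & b[i]) {      /* anything in the upper half of the window? */
            v >>= S[i];      /* keep only that upper half */
            r |= S[i];       /* and record how far we shifted */
        }
    }
    return r;
}
```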
Find the log base 2 of an N-bit integer in O(lg(N)) operations with multiply and lookup

    unsigned int v; // find the log base 2 of 32-bit v
    int r;          // result goes here

    static const int MultiplyDeBruijnBitPosition[32] = 
    {
      0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8, 
      31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
    };

    v |= v >> 1; // first round down to power of 2 
    v |= v >> 2;
    v |= v >> 4;
    v |= v >> 8;
    v |= v >> 16;
    v = (v >> 1) + 1;

    // The cast truncates the product to 32 bits where unsigned long is wider.
    r = MultiplyDeBruijnBitPosition[(unsigned int)(v * 0x077CB531UL) >> 27];
The code above computes the log base 2 of a 32-bit integer with a small table lookup and multiply. It requires only 15 operations, compared to (up to) 20 for the previous method. The purely table-based method requires the fewest operations, but this offers a reasonable compromise between table size and speed. If v is known to be a power of 2, then only the last line is needed (3 operations).
Eric Cole devised this January 8, 2006 after reading about the entry below to round up to a power of 2 and the method below for computing the number of trailing bits with a multiply and lookup using a DeBruijn sequence.

Find integer log base 10 of an integer

    unsigned int v; // non-zero 32-bit integer value to compute the log base 10 of 
    int r;          // result goes here
    int t;          // temporary

    static unsigned int const PowersOf10[] = 
        {1, 10, 100, 1000, 10000, 100000,
         1000000, 10000000, 100000000, 1000000000};

    t = (IntegerLogBase2(v) + 1) * 1233 >> 12; // (use a lg2 method from above)
    r = t - (v < PowersOf10[t]);
The integer log base 10 is computed by first using one of the techniques above for finding the log base 2. By the relationship log10(v) = log2(v) / log2(10), we need to multiply it by 1/log2(10), which is approximately 1233/4096, or 1233 followed by a right shift of 12. Adding one is needed because the IntegerLogBase2 rounds down. Finally, since the value t is only an approximation that may be off by one, the exact value is found by subtracting the result of v < PowersOf10[t].
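To make the arithmetic concrete, here is a small self-contained sketch. The names ilog2_u32 and ilog10_u32 are hypothetical, and the loop-based log2 is only a stand-in for whichever IntegerLogBase2 method is preferred.

```c
#include <stdint.h>

/* Simple stand-in for IntegerLogBase2: floor(log2(v)) for v > 0. */
static unsigned int ilog2_u32(uint32_t v)
{
    unsigned int r = 0;
    while (v >>= 1) {
        r++;
    }
    return r;
}

/* floor(log10(v)) for a non-zero 32-bit v. */
static unsigned int ilog10_u32(uint32_t v)
{
    static const uint32_t powers_of_10[] =
        {1, 10, 100, 1000, 10000, 100000,
         1000000, 10000000, 100000000, 1000000000};

    /* 1233/4096 approximates 1/log2(10); t may be one too large. */
    unsigned int t = (ilog2_u32(v) + 1) * 1233 >> 12;

    /* Correct the estimate by comparing against the power of ten. */
    return t - (v < powers_of_10[t]);
}
```

For v = 9, the log base 2 is 3, so t = 4 * 1233 >> 12 = 1, and because 9 < 10 the result is corrected to 0. For v = 10 the same t = 1 is already exact.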
This method takes 6 more operations than IntegerLogBase2. It may be sped up (on machines with fast memory access) by modifying the log base 2 table-lookup method above so that the entries hold what is computed for t (that is, pre-add, -multiply, and -shift). Doing so would require a total of only 9 operations to find the log base 10, assuming 4 tables were used (one for each byte of v).

Eric Cole suggested I add a version of this on January 7, 2006.

Find integer log base 2 of a 32-bit IEEE float

    const float v; // find int(log2(v)), where v > 0.0 && finite(v) && isnormal(v)
    int c;         // 32-bit int c gets the result;

    c = *(const int *) &v;  // OR, for portability:  memcpy(&c, &v, sizeof c);
    c = (c >> 23) - 127;
The above is fast, but IEEE 754-compliant architectures utilize subnormal (also called denormal) floating point numbers. These have the exponent bits set to zero (signifying pow(2,-127)), and the mantissa is not normalized, so it contains leading zeros and thus the log2 must be computed from the mantissa. To accommodate subnormal numbers, use the following:
    const float v;              // find int(log2(v)), where v > 0.0 && finite(v)
    int c;                      // 32-bit int c gets the result;
    int x = *(const int *) &v;  // OR, for portability:  memcpy(&x, &v, sizeof x);

    c = x >> 23;          

    if (c)
    {
      c -= 127;
    }
    else
    { // subnormal, so recompute using mantissa: c = intlog2(x) - 149;
      register unsigned int t; // temporary
      // Note that LogTable256 was defined earlier
      if (t = x >> 16)
      {
        c = LogTable256[t] - 133;
      }
      else
      {
        c = (t = x >> 8) ? LogTable256[t] - 141 : LogTable256[x] - 149;
      }
    }
On June 20, 2004, Sean A. Irvine suggested that I include code to handle subnormal numbers. On June 11, 2005, Falk Hüffner pointed out that ISO C99 6.5/7 specified undefined behavior for the common type punning idiom *(int *)&, though it has worked on 99.9% of C compilers. He proposed using memcpy for maximum portability or a union with a float and an int for better code generation than memcpy on some compilers.
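A version that follows the memcpy suggestion could be sketched as below. It assumes float is a 32-bit IEEE 754 type; the function name ilog2_float is hypothetical, and the subnormal branch uses a plain loop on the mantissa instead of the table lookup.

```c
#include <stdint.h>
#include <string.h>

/* int(log2(v)) for finite v > 0, assuming a 32-bit IEEE 754 float. */
static int ilog2_float(float v)
{
    uint32_t x;
    int c;

    memcpy(&x, &v, sizeof x);   /* well-defined way to read the bits */
    c = (int)(x >> 23);         /* sign bit is 0 because v > 0 */
    if (c != 0) {
        return c - 127;         /* normal number: remove the bias */
    }

    /* Subnormal: value is mantissa * 2^-149, so add floor(log2(mantissa)). */
    c = -149;
    while (x >>= 1) {
        c++;
    }
    return c;
}
```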
Find integer log base 2 of the pow(2, r)-root of a 32-bit IEEE float (for unsigned integer r)

    const int r;
    const float v; // find int(log2(pow((double) v, 1. / pow(2, r)))), 
                   // where isnormal(v) and v > 0
    int c;         // 32-bit int c gets the result;

    c = *(const int *) &v;  // OR, for portability:  memcpy(&c, &v, sizeof c);
    c = ((((c - 0x3f800000) >> r) + 0x3f800000) >> 23) - 127;
So, if r is 0, for example, we have c = int(log2((double) v)). If r is 1, then we have c = int(log2(sqrt((double) v))). If r is 2, then we have c = int(log2(pow((double) v, 1./4))).
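The cases listed above can be checked with a short sketch. The function name ilog2_root_float is hypothetical, float is assumed to be 32-bit IEEE 754, and for v < 1 the code relies on right shifts of negative integers being arithmetic, which is implementation-defined in C.

```c
#include <stdint.h>
#include <string.h>

/* int(log2(v^(1 / 2^r))) for normal v > 0 (assumes IEEE 754 binary32). */
static int ilog2_root_float(float v, unsigned int r)
{
    int32_t c;

    memcpy(&c, &v, sizeof c);
    /* Subtracting the bits of 1.0f leaves exponent and mantissa as a
       fixed-point log2 estimate; shifting right by r divides it by 2^r. */
    return ((((c - 0x3f800000) >> r) + 0x3f800000) >> 23) - 127;
}
```

With v = 16.0f, r = 0 gives 4, r = 1 gives 2 (the log of the square root, 4), and r = 2 gives 1 (the log of the fourth root, 2).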
On June 11, 2005, Falk Hüffner pointed out that ISO C99 6.5/7 left the type punning idiom *(int *)& undefined, and he suggested using memcpy.

Count the consecutive zero bits (trailing) on the right in parallel

    unsigned int v;      // 32-bit word input to count zero bits on right
    unsigned int c = 32; // c will be the number of zero bits on the right,
                         // so if v is 1101000 (base 2), then c will be 3
    const unsigned int B[] = {0x55555555, 0x33333333, 0x0F0F0F0F, 0x00FF00FF, 0x0000FFFF};
    const unsigned int S[] = {1, 2, 4, 8, 16}; // Our Magic Binary Numbers

    for (int i = 4; i >= 0; --i) // unroll for more speed
    {
      if (v & B[i])
      {
        v <<= S[i];
        c -= S[i];
      }
    }

    if (v)
    {
      c--;
    }
Here, we are basically doing the same operations as finding the log base 2 in parallel, but the values of b are inverted (in order to count from the right rather than the left), we shift v up rather than down, and c starts at the maximum and is decreased. We also have the additional step at the end, decrementing c if there is anything left in v. The number of operations is at most 4 * lg(N) + 2, roughly, for N bit words.
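For experimentation, the loop can be packaged as a function. This is only a sketch: the name ctz_parallel is hypothetical and uint32_t makes the 32-bit width explicit.

```c
#include <stdint.h>

/* Number of trailing zero bits in v; returns 32 when v == 0. */
static unsigned int ctz_parallel(uint32_t v)
{
    static const uint32_t B[] = {0x55555555, 0x33333333, 0x0F0F0F0F,
                                 0x00FF00FF, 0x0000FFFF};
    static const unsigned int S[] = {1, 2, 4, 8, 16};
    unsigned int c = 32;

    for (int i = 4; i >= 0; --i) {
        if (v & B[i]) {   /* a set bit sits in a low half at this scale */
            v <<= S[i];   /* push it toward the top of the word */
            c -= S[i];    /* each shift up rules out that many zeros */
        }
    }
    if (v) {
        c--;              /* a bit survived, so the count is one less */
    }
    return c;
}
```

Tracing v = 1101000 (base 2): the word is shifted up by 16, 8 and 4 bits, leaving c = 4 and only the top bit set; the final decrement gives 3.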
Count the consecutive zero bits (trailing) on the right by binary search

    unsigned int v;     // 32-bit word input to count zero bits on right
    unsigned int c = 0; // c will be the number of zero bits on the right,
                        // so if v is 1101000 (base 2), then c will be 3
    if (v & 0x1) 
    {
      c = 0;
    }
    else
    {
      if ((v & 0xffff) == 0) 
      {  
        v >>= 16;  
        c += 16;
      }
      if ((v & 0xff) == 0) 
      {  
        v >>= 8;  
        c += 8;
      }
      if ((v & 0xf) == 0) 
      {  
        v >>= 4;
        c += 4;
      }
      if ((v & 0x3) == 0) 
      {  
        v >>= 2;
        c += 2;
      }
      if ((v & 0x1) == 0)
      {
        c++;
      }
    }
The code above is similar to the previous method, but it computes the number of trailing zeros by accumulating c in a manner akin to binary search. In the first step, it checks if the bottom 16 bits of v are zeros, and if so, shifts v right 16 bits and adds 16 to c, which reduces the number of bits in v to consider by half. Each of the subsequent conditional steps likewise halves the number of bits until there is only 1. This method is faster than the last one (by about 33%) because the bodies of the if statements are executed less often.
Matt Whitlock suggested this on January 25, 2006.

Count the consecutive zero bits (trailing) on the right by casting to a float

    unsigned int v;            // find the number of trailing zeros in v
    int r;                     // the result goes here
    float f = (float)(v & -v); // cast the least significant bit in v to a float
    r = (*(unsigned int *)&f >> 23) - 0x7f;
Although this only takes about 6 operations, the time to convert an integer to a float can be high on some machines. The exponent of the 32-bit IEEE floating point representation is shifted down, and the bias is subtracted to give the position of the least significant 1 bit set in v. If v is zero, then the result is -127.
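The same idea can be written without the pointer cast, using memcpy as suggested elsewhere in this post. This is a sketch under the assumption that float is 32-bit IEEE 754; the function name ctz_float is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Trailing zero count via the float exponent; returns -127 for v == 0. */
static int ctz_float(uint32_t v)
{
    /* v & -v isolates the lowest set bit, a power of 2 that converts
       to float exactly. */
    float f = (float)(v & -v);
    uint32_t bits;

    memcpy(&bits, &f, sizeof bits);
    return (int)(bits >> 23) - 0x7f;   /* biased exponent minus the bias */
}
```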
Count the consecutive zero bits (trailing) on the right with modulus division and lookup

    unsigned int v;  // find the number of trailing zeros in v
    int r;           // put the result in r
    const int Mod37BitPosition[] = // maps a bit value mod 37 to its position
    {
      32, 0, 1, 26, 2, 23, 27, 0, 3, 16, 24, 30, 28, 11, 0, 13, 4,
      7, 17, 0, 25, 22, 31, 15, 29, 10, 12, 6, 0, 21, 14, 9, 5,
      20, 8, 19, 18
    };
    r = Mod37BitPosition[(-v & v) % 37];
The code above finds the number of zeros that are trailing on the right, so binary 0100 would produce 2. It makes use of the fact that the first 32 bit position values are relatively prime with 37, so performing a modulus division with 37 gives a unique number from 0 to 36 for each. These numbers may then be mapped to the number of zeros using a small lookup table. It uses only 4 operations; however, indexing into a table and performing modulus division may make it unsuitable for some situations. I came up with this independently and then searched for a subsequence of the table values, and found it was invented earlier by Reiser, according to Hacker's Delight.
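One way to see where the table comes from is to build it at run time: each of the 32 single-bit values lands in its own slot modulo 37. The sketch below does that; the name ctz_mod37 is hypothetical, and the four slots that no power of 2 reaches are simply left at 0.

```c
#include <stdint.h>

/* Trailing zero count by modulus and lookup; returns 32 for v == 0. */
static int ctz_mod37(uint32_t v)
{
    static int table[37];   /* zero-initialized; filled on first call */
    static int ready = 0;

    if (!ready) {
        table[0] = 32;      /* (-v & v) is 0 only when v == 0 */
        for (int i = 0; i < 32; i++) {
            /* 2^i mod 37 is distinct for every i in 0..31 */
            table[(UINT32_C(1) << i) % 37] = i;
        }
        ready = 1;
    }
    return table[(-v & v) % 37];
}
```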
Count the consecutive zero bits (trailing) on the right with multiply and lookup

    unsigned int v;  // find the number of trailing zeros in 32-bit v 
    int r;           // result goes here
    const int MultiplyDeBruijnBitPosition[32] = 
    {
      0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8, 
      31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
    };
    // The cast truncates the product to 32 bits where unsigned long is wider.
    r = MultiplyDeBruijnBitPosition[(unsigned int)((v & -v) * 0x077CB531UL) >> 27];
Converting bit vectors to indices of set bits is an example use for this. It requires one more operation than the earlier one involving modulus division, but the multiply may be faster. The expression (v & -v) extracts the least significant 1 bit from v. The constant 0x077CB531UL is a de Bruijn sequence, which produces a unique pattern of bits in the high 5 bits for each possible bit position that it is multiplied against. When there are no bits set, it returns 0. More information can be found by reading the paper Using de Bruijn Sequences to Index a 1 in a Computer Word by Charles E. Leiserson, Harald Prokop, and Keith H. Randall.
On October 8, 2005 Andrew Shapira suggested I add this.

Round up to the next highest power of 2 by float casting

    unsigned int const v; // Round this 32-bit value to the next highest power of 2
    unsigned int r;       // Put the result here. (So v=3 -> r=4; v=8 -> r=8)

    if (v > 1) 
    {
      float f = (float)v;
      // 1U rather than 1, so that a shift by 31 does not overflow a signed int
      unsigned int const t = 1U << ((*(unsigned int *)&f >> 23) - 0x7f);
      r = t << (t < v);
    }
    else 
    {
      r = 1;
    }
The code above uses 8 operations, but works on all v <= (1<<31).
Quick and dirty version, for domain of 1 < v < (1<<25):

    float f = (float)(v - 1);  
    r = 1 << ((*(unsigned int*)(&f) >> 23) - 126);
Although the quick and dirty version only uses around 6 operations, it is roughly three times slower than the technique below (which involves 12 operations) when benchmarked on an Athlon™ XP 2100+ CPU. Some CPUs will fare better with it, though.
On September 27, 2005 Andi Smithers suggested I include a technique for casting to floats to find the lg of a number for rounding up to a power of 2. Similar to the quick and dirty version here, his version worked with values less than (1<<25), due to mantissa rounding, but it used one more operation.

Round up to the next highest power of 2

    unsigned int v; // compute the next highest power of 2 of 32-bit v

    v--;
    v |= v >> 1;
    v |= v >> 2;
    v |= v >> 4;
    v |= v >> 8;
    v |= v >> 16;
    v++;
In 12 operations, this code computes the next highest power of 2 for a 32-bit integer. The result may be expressed by the formula 1 << (lg(v - 1) + 1). Note that in the edge case where v is 0, it returns 0, which isn't a power of 2; you might append the expression v += (v == 0) to remedy this if it matters. It would be faster by 2 operations to use the formula and the log base 2 method that uses a lookup table, but in some situations, lookup tables are not suitable, so the above code may be best. (On an Athlon™ XP 2100+ I've found the above shift-right and then OR code is as fast as using a single BSR assembly language instruction, which scans in reverse to find the highest set bit.) It works by copying the highest set bit to all of the lower bits, and then adding one, which results in carries that set all of the lower bits to 0 and one bit beyond the highest set bit to 1. If the original number was a power of 2, then the decrement will reduce it to one less, so that we round up to the same original value.
Devised by Sean Anderson, September 14, 2001.

Interleave bits the obvious way

    unsigned short x;   // Interleave bits of x and y, so that all of the
    unsigned short y;   // bits of x are in the even positions and y in the odd;
    unsigned int z = 0; // z gets the resulting Morton Number.

    // CHAR_BIT comes from <limits.h>; 1U keeps the shifts unsigned.
    for (int i = 0; i < sizeof(x) * CHAR_BIT; i++) // unroll for more speed...
    {
      z |= (x & 1U << i) << i | (y & 1U << i) << (i + 1);
    }
Interleaved bits (aka Morton numbers) are useful for linearizing 2D integer coordinates, so x and y are combined into a single number that can be compared easily and has the property that a number is usually close to another if their x and y values are close.
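A small worked example may help. The sketch below wraps the obvious loop in a function (the name interleave_obvious is hypothetical) so the result for a concrete coordinate pair can be checked.

```c
#include <limits.h>
#include <stdint.h>

/* Morton number of (x, y): bits of x in even positions, y in odd ones. */
static uint32_t interleave_obvious(uint16_t x, uint16_t y)
{
    uint32_t z = 0;

    for (unsigned int i = 0; i < sizeof(x) * CHAR_BIT; i++) {
        uint32_t bit = UINT32_C(1) << i;
        /* bit i of x moves to position 2i, bit i of y to position 2i + 1 */
        z |= (x & bit) << i | (y & bit) << (i + 1);
    }
    return z;
}
```

For x = 3 (binary 011) and y = 5 (binary 101), the interleaved bits read y2 x2 y1 x1 y0 x0 = 1 0 0 1 1 1, which is 39.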
Interleave bits by table lookup

    const unsigned short MortonTable256[] = 
    {
      0x0000, 0x0001, 0x0004, 0x0005, 0x0010, 0x0011, 0x0014, 0x0015, 
      0x0040, 0x0041, 0x0044, 0x0045, 0x0050, 0x0051, 0x0054, 0x0055, 
      0x0100, 0x0101, 0x0104, 0x0105, 0x0110, 0x0111, 0x0114, 0x0115, 
      0x0140, 0x0141, 0x0144, 0x0145, 0x0150, 0x0151, 0x0154, 0x0155, 
      0x0400, 0x0401, 0x0404, 0x0405, 0x0410, 0x0411, 0x0414, 0x0415, 
      0x0440, 0x0441, 0x0444, 0x0445, 0x0450, 0x0451, 0x0454, 0x0455, 
      0x0500, 0x0501, 0x0504, 0x0505, 0x0510, 0x0511, 0x0514, 0x0515, 
      0x0540, 0x0541, 0x0544, 0x0545, 0x0550, 0x0551, 0x0554, 0x0555, 
      0x1000, 0x1001, 0x1004, 0x1005, 0x1010, 0x1011, 0x1014, 0x1015, 
      0x1040, 0x1041, 0x1044, 0x1045, 0x1050, 0x1051, 0x1054, 0x1055, 
      0x1100, 0x1101, 0x1104, 0x1105, 0x1110, 0x1111, 0x1114, 0x1115, 
      0x1140, 0x1141, 0x1144, 0x1145, 0x1150, 0x1151, 0x1154, 0x1155, 
      0x1400, 0x1401, 0x1404, 0x1405, 0x1410, 0x1411, 0x1414, 0x1415, 
      0x1440, 0x1441, 0x1444, 0x1445, 0x1450, 0x1451, 0x1454, 0x1455, 
      0x1500, 0x1501, 0x1504, 0x1505, 0x1510, 0x1511, 0x1514, 0x1515, 
      0x1540, 0x1541, 0x1544, 0x1545, 0x1550, 0x1551, 0x1554, 0x1555, 
      0x4000, 0x4001, 0x4004, 0x4005, 0x4010, 0x4011, 0x4014, 0x4015, 
      0x4040, 0x4041, 0x4044, 0x4045, 0x4050, 0x4051, 0x4054, 0x4055, 
      0x4100, 0x4101, 0x4104, 0x4105, 0x4110, 0x4111, 0x4114, 0x4115, 
      0x4140, 0x4141, 0x4144, 0x4145, 0x4150, 0x4151, 0x4154, 0x4155, 
      0x4400, 0x4401, 0x4404, 0x4405, 0x4410, 0x4411, 0x4414, 0x4415, 
      0x4440, 0x4441, 0x4444, 0x4445, 0x4450, 0x4451, 0x4454, 0x4455, 
      0x4500, 0x4501, 0x4504, 0x4505, 0x4510, 0x4511, 0x4514, 0x4515, 
      0x4540, 0x4541, 0x4544, 0x4545, 0x4550, 0x4551, 0x4554, 0x4555, 
      0x5000, 0x5001, 0x5004, 0x5005, 0x5010, 0x5011, 0x5014, 0x5015, 
      0x5040, 0x5041, 0x5044, 0x5045, 0x5050, 0x5051, 0x5054, 0x5055, 
      0x5100, 0x5101, 0x5104, 0x5105, 0x5110, 0x5111, 0x5114, 0x5115, 
      0x5140, 0x5141, 0x5144, 0x5145, 0x5150, 0x5151, 0x5154, 0x5155, 
      0x5400, 0x5401, 0x5404, 0x5405, 0x5410, 0x5411, 0x5414, 0x5415, 
      0x5440, 0x5441, 0x5444, 0x5445, 0x5450, 0x5451, 0x5454, 0x5455, 
      0x5500, 0x5501, 0x5504, 0x5505, 0x5510, 0x5511, 0x5514, 0x5515, 
      0x5540, 0x5541, 0x5544, 0x5545, 0x5550, 0x5551, 0x5554, 0x5555
    };

    unsigned short x; // Interleave bits of x and y, so that all of the
    unsigned short y; // bits of x are in the even positions and y in the odd;
    unsigned int z;   // z gets the resulting 32-bit Morton Number.

    z = MortonTable256[y >> 8]   << 17 | 
        MortonTable256[x >> 8]   << 16 |
        MortonTable256[y & 0xFF] <<  1 | 
        MortonTable256[x & 0xFF];
For more speed, use an additional table with values that are MortonTable256 pre-shifted one bit to the left. This second table could then be used for the y lookups, thus reducing the operations by two, but almost doubling the memory required. Extending this same idea, four tables could be used, with two of them pre-shifted by 16 to the left of the previous two, so that we would only need 11 operations total.
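The two-table variant could be sketched as follows. The tables are generated at run time here instead of being written out; the names morton_x, morton_y, init_morton_tables and interleave_tables are hypothetical, and init_morton_tables must be called once before the first lookup.

```c
#include <stdint.h>

static uint16_t morton_x[256];  /* same values as MortonTable256 */
static uint16_t morton_y[256];  /* the same values pre-shifted left by one */

static void init_morton_tables(void)
{
    for (unsigned int i = 0; i < 256; i++) {
        uint16_t m = 0;
        for (unsigned int b = 0; b < 8; b++) {
            /* bit b of the index moves to bit 2b of the entry */
            m |= (uint16_t)(((i >> b) & 1u) << (2 * b));
        }
        morton_x[i] = m;
        morton_y[i] = (uint16_t)(m << 1);
    }
}

/* Interleave x (even bit positions) and y (odd bit positions). */
static uint32_t interleave_tables(uint16_t x, uint16_t y)
{
    return (uint32_t)morton_y[y >> 8] << 16 |
           (uint32_t)morton_x[x >> 8] << 16 |
           morton_y[y & 0xFF] |
           morton_x[x & 0xFF];
}
```

Compared with the single-table code, the two shifts by 1 and by 17 for the y lookups disappear, at the cost of a second 512-byte table.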
Interleave bits with 64-bit multiply

In 11 operations, this version interleaves bits of two bytes (rather than shorts, as in the other versions), but many of the operations are 64-bit multiplies so it isn't appropriate for all machines.
    unsigned char x;  // Interleave bits of (8-bit) x and y, so that all of the
    unsigned char y;  // bits of x are in the even positions and y in the odd;
    unsigned short z; // z gets the resulting 16-bit Morton Number.

    z = ((x * 0x0101010101010101ULL & 0x8040201008040201ULL) * 
         0x0102040810204081ULL >> 49) & 0x5555 |
        ((y * 0x0101010101010101ULL & 0x8040201008040201ULL) * 
         0x0102040810204081ULL >> 48) & 0xAAAA;
Holger Bettag was inspired to suggest this technique on October 10, 2004 after reading the multiply-based bit reversals here.
Interleave bits by Binary Magic Numbers

    const unsigned int B[] = {0x55555555, 0x33333333, 0x0F0F0F0F, 0x00FF00FF};
    const unsigned int S[] = {1, 2, 4, 8};

    unsigned int x; // Interleave lower 16 bits of x and y, so the bits of x
    unsigned int y; // are in the even positions and bits from y in the odd;
    unsigned int z; // z gets the resulting 32-bit Morton Number.

    x = (x | (x << S[3])) & B[3];
    x = (x | (x << S[2])) & B[2];
    x = (x | (x << S[1])) & B[1];
    x = (x | (x << S[0])) & B[0];

    y = (y | (y << S[3])) & B[3];
    y = (y | (y << S[2])) & B[2];
    y = (y | (y << S[1])) & B[1];
    y = (y | (y << S[0])) & B[0];

    z = x | (y << 1);
Determine if a word has a zero byte

    // Fewer operations:
    unsigned int v; // 32-bit word to check if any 8-bit byte in it is 0
    bool hasZeroByte = ~((((v & 0x7F7F7F7F) + 0x7F7F7F7F) | v) | 0x7F7F7F7F);
The code above may be useful when doing a fast string copy in which a word is copied at a time; it uses 5 operations. On the other hand, testing for a null byte in the obvious ways (which follow) takes at least 7 operations (when counted in the most sparing way), and at most 12.
    // More operations:
    bool hasNoZeroByte = ((v & 0xff) && (v & 0xff00) && (v & 0xff0000) && (v & 0xff000000));
    // OR:
    unsigned char * p = (unsigned char *) &v;  
    bool hasNoZeroByte = *p && *(p + 1) && *(p + 2) && *(p + 3);
The code at the beginning of this section (labeled "Fewer operations") works by first zeroing the high bits of the 4 bytes in the word. Subsequently, it adds a number that will result in an overflow to the high bit of a byte if any of the low bits were initially set. Next the high bits of the original word are ORed with these values; thus, the high bit of a byte is set iff any bit in the byte was set. Finally, we determine if any of these high bits are zero by ORing with ones everywhere except the high bits and inverting the result. Extending to 64 bits is trivial; simply increase the constants to be 0x7F7F7F7F7F7F7F7F.
For an additional improvement, a fast pretest that requires only 4 operations may be performed to determine if the word may have a zero byte. The test also returns true if the high byte is 0x80, so there are occasional false positives, but the slower and more reliable version above may then be used on candidates for an overall increase in speed with correct output.

    bool hasZeroByte = ((v + 0x7efefeff) ^ ~v) & 0x81010100;
    if (hasZeroByte) // or may just have 0x80 in the high byte
    {
      hasZeroByte = ~((((v & 0x7F7F7F7F) + 0x7F7F7F7F) | v) | 0x7F7F7F7F);
    }
There is yet a faster method: use hasless(v, 1), which is defined below; it works in 4 operations and requires no subsequent verification. It simplifies to

    bool hasZeroByte = (v - 0x01010101UL) & ~v & 0x80808080UL;
The subexpression (v - 0x01010101UL) evaluates to a high bit set in any byte whenever the corresponding byte in v is zero or greater than 0x80. The sub-expression ~v & 0x80808080UL evaluates to high bits set in bytes where the byte of v doesn't have its high bit set (so the byte was less than 0x80). Finally, by ANDing these two sub-expressions the result is the high bits set where the bytes in v were zero, since the high bits set due to a value greater than 0x80 in the first sub-expression are masked off by the second.
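As a cross-check, the 4-operation test and the 5-operation test from the start of this section can be placed side by side. This is a sketch with hypothetical function names, using uint32_t so that the constants match the word width exactly.

```c
#include <stdint.h>

/* 4-operation test: non-zero iff some byte of v is 0. */
static int has_zero_byte(uint32_t v)
{
    return ((v - UINT32_C(0x01010101)) & ~v & UINT32_C(0x80808080)) != 0;
}

/* 5-operation test from the start of the section, for comparison. */
static int has_zero_byte_5op(uint32_t v)
{
    return (uint32_t)~((((v & UINT32_C(0x7F7F7F7F)) + UINT32_C(0x7F7F7F7F)) | v)
                       | UINT32_C(0x7F7F7F7F)) != 0;
}
```

For v = 0x11003344, the subtraction gives 0x0FFF3243 and ~v is 0xEEFFCCBB; ANDing them with 0x80808080 leaves 0x00800000, which marks the zero byte. For v = 0x80808080, the subtraction gives 0x7F7F7F7F, so no high bit survives and the result is 0.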
Paul Messmer suggested the fast pretest improvement on October 2, 2004. Juha Järvi later suggested hasless(v, 1) on April 6, 2005, which he found on Paul Hsieh's Assembly Lab; previously it was written in a newsgroup post on April 27, 1987 by Alan Mycroft.

Determine if a word has a byte less than n

Test if a word x contains an unsigned byte with value < n. Specifically for n=1, it can be used to find a 0-byte by examining one long at a time, or any byte by XORing x with a mask first. Uses 4 arithmetic/logical operations when n is constant.

Requirements: x>=0; 0<=n<=128

    #define hasless(x,n) (((x)-~0UL/255*(n))&~(x)&~0UL/255*128)
To count the number of bytes in x that are less than n in 7 operations, use
    #define countless(x,n) \
    (((~0UL/255*(127+(n))-((x)&~0UL/255*127))&~(x)&~0UL/255*128)/128%255)
Juha Järvi sent this clever technique to me on April 6, 2005. The countless macro was added by Sean Anderson on April 10, 2005, inspired by Juha's countmore, below.

Determine if a word has a byte greater than n

Test if a word x contains an unsigned byte with value > n. Uses 3 arithmetic/logical operations when n is constant.

Requirements: x>=0; 0<=n<=127

    #define hasmore(x,n) (((x)+~0UL/255*(127-(n))|(x))&~0UL/255*128)
To count the number of bytes in x that are more than n in 6 operations, use:
    #define countmore(x,n) \
    (((((x)&~0UL/255*127)+~0UL/255*(127-(n))|(x))&~0UL/255*128)/128%255)
The macro hasmore was suggested by Juha Järvi on April 6, 2005, and he added countmore on April 8, 2005.

Determine if a word has a byte between m and n

When m < n, this technique tests if a word x contains an unsigned byte value, such that m < value < n. It uses 7 arithmetic/logical operations when n and m are constant.

Note: Bytes that equal n can be reported by likelyhasbetween as false positives, so this should be checked by character if a certain result is needed.

Requirements: x>=0; 0<=m<=127; 0<=n<=128

    #define likelyhasbetween(x,m,n) \
    ((((x)-~0UL/255*(n))&~(x)&((x)&~0UL/255*127)+~0UL/255*(127-(m)))&~0UL/255*128)
This technique would be suitable for a fast pretest. A variation that takes one more operation (8 total for constant m and n) but provides the exact answer is:
    #define hasbetween(x,m,n) \
    ((~0UL/255*(127+(n))-((x)&~0UL/255*127)&~(x)&((x)&~0UL/255*127)+~0UL/255*(127-(m)))&~0UL/255*128)
To count the number of bytes in x that are between m and n (exclusive) in 10 operations, use:
#define countbetween(x,m,n) (hasbetween(x,m,n)/128%255)
Juha Järvi suggested likelyhasbetween on April 6, 2005. From there, Sean Anderson created hasbetween and countbetween on April 10, 2005.
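A short usage sketch for these macros follows. Because they are built from ~0UL, they operate on every byte of an unsigned long, so counts depend on sizeof(unsigned long). The helper demo_word is hypothetical and exists only to build a test value whose lowest byte is 0x05 and whose other bytes are all 0xFF.

```c
/* Macros copied from the sections above. */
#define hasless(x,n) (((x)-~0UL/255*(n))&~(x)&~0UL/255*128)
#define countless(x,n) \
(((~0UL/255*(127+(n))-((x)&~0UL/255*127))&~(x)&~0UL/255*128)/128%255)
#define hasmore(x,n) (((x)+~0UL/255*(127-(n))|(x))&~0UL/255*128)
#define countmore(x,n) \
(((((x)&~0UL/255*127)+~0UL/255*(127-(n))|(x))&~0UL/255*128)/128%255)
#define hasbetween(x,m,n) \
((~0UL/255*(127+(n))-((x)&~0UL/255*127)&~(x)&((x)&~0UL/255*127)+~0UL/255*(127-(m)))&~0UL/255*128)
#define countbetween(x,m,n) (hasbetween(x,m,n)/128%255)

/* Hypothetical demo value: lowest byte 0x05, every other byte 0xFF. */
static unsigned long demo_word(void)
{
    return (~0UL & ~0xFFUL) | 0x05UL;
}
```

With this word, hasless(x, 10) is non-zero and countless(x, 10) is 1, because only the 0x05 byte is below 10, while countmore(x, 100) equals sizeof(unsigned long) - 1, one for each 0xFF byte. Some compilers may warn about the unparenthesized mix of + and | in the macros; the expressions rely on normal C precedence.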

Posted on 2010-01-12 23:04 by 一个人的天空
