Bit Twiddling Hacks: a classic article on bit manipulation techniques
2011-03-09 21:44
http://graphics.stanford.edu/~seander/bithacks.html
By Sean Eron Anderson
seander@cs.stanford.edu
Individually, the code snippets here are in the public domain (unless otherwise noted) — feel free to use them however you please. The aggregate collection and descriptions are © 1997-2005 Sean Eron Anderson.
The code and descriptions are distributed in the hope that they will be useful, but
WITHOUT ANY WARRANTY and without even the implied warranty of merchantability or fitness for a particular purpose. As of May 5, 2005, all the code has been tested thoroughly. Thousands of people have read it. Moreover,
Professor Randal Bryant, the Dean of Computer Science at Carnegie Mellon University, has personally tested almost everything with his
Uclid code verification system. What he hasn't tested, I have checked against all possible inputs on a 32-bit machine.
To the first person to inform me of a legitimate bug in the code, I'll pay a bounty of US$10 (by check or Paypal). If directed to a charity, I'll pay US$20.
Contents
About the operation counting methodology
Compute the sign of an integer
Detect if two integers have opposite signs
Compute the integer absolute value (abs) without branching
Compute the minimum (min) or maximum (max) of two integers without branching
Determining if an integer is a power of 2
Sign extending
Sign extending from a constant bit-width
Sign extending from a variable bit-width
Sign extending from a variable bit-width in 3 operations
Conditionally set or clear bits without branching
Conditionally negate a value without branching
Merge bits from two values according to a mask
Counting bits set
Counting bits set, naive way
Counting bits set by lookup table
Counting bits set, Brian Kernighan's way
Counting bits set in 14, 24, or 32-bit words using 64-bit instructions
Counting bits set, in parallel
Count bits set (rank) from the most-significant bit up to a given position
Select the bit position (from the most-significant bit) with the given count (rank)
Computing parity (1 if an odd number of bits set, 0 otherwise)
Compute parity of a word the naive way
Compute parity by lookup table
Compute parity of a byte using 64-bit multiply and modulus division
Compute parity of word with a multiply
Compute parity in parallel
Swapping Values
Swapping values with subtraction and addition
Swapping values with XOR
Swapping individual bits with XOR
Reversing bit sequences
Reverse bits the obvious way
Reverse bits in word by lookup table
Reverse the bits in a byte with 3 operations (64-bit multiply and modulus division)
Reverse the bits in a byte with 4 operations (64-bit multiply, no division)
Reverse the bits in a byte with 7 operations (no 64-bit, only 32)
Reverse an N-bit quantity in parallel with 5 * lg(N) operations
Modulus division (aka computing remainders)
Computing modulus division by 1 << s without a division operation (obvious)
Computing modulus division by (1 << s) - 1 without a division operation
Computing modulus division by (1 << s) - 1 in parallel without a division operation
Finding integer log base 2 of an integer (aka the position of the highest bit set)
Find the log base 2 of an integer with the MSB N set in O(N) operations (the obvious way)
Find the integer log base 2 of an integer with a 64-bit IEEE float
Find the log base 2 of an integer with a lookup table
Find the log base 2 of an N-bit integer in O(lg(N)) operations
Find the log base 2 of an N-bit integer in O(lg(N)) operations with multiply and lookup
Find integer log base 10 of an integer
Find integer log base 10 of an integer the obvious way
Find integer log base 2 of a 32-bit IEEE float
Find integer log base 2 of the pow(2, r)-root of a 32-bit IEEE float (for unsigned integer r)
Counting consecutive trailing zero bits (or finding bit indices)
Count the consecutive zero bits (trailing) on the right linearly
Count the consecutive zero bits (trailing) on the right in parallel
Count the consecutive zero bits (trailing) on the right by binary search
Count the consecutive zero bits (trailing) on the right by casting to a float
Count the consecutive zero bits (trailing) on the right with modulus division and lookup
Count the consecutive zero bits (trailing) on the right with multiply and lookup
Round up to the next highest power of 2 by float casting
Round up to the next highest power of 2
Interleaving bits (aka computing Morton Numbers)
Interleave bits the obvious way
Interleave bits by table lookup
Interleave bits with 64-bit multiply
Interleave bits by Binary Magic Numbers
Testing for ranges of bytes in a word (and counting occurrences found)
Determine if a word has a zero byte
Determine if a word has a byte equal to n
Determine if a word has byte less than n
Determine if a word has a byte greater than n
Determine if a word has a byte between m and n
Compute the lexicographically next bit permutation
Sign extending from a variable bit-width in 3 operations
The following may be slow on some machines, due to the effort required for multiplication and division. This version is 4 operations. If you know that your initial bit-width, b, is greater than 1, you might do this type of sign extension in 3 operations
by using r = (x * multipliers[b]) / multipliers[b], which requires only one array lookup.
The following variation is not portable, but on architectures that employ an arithmetic right-shift, maintaining the sign, it should be fast.
Randal E. Bryant pointed out a bug on May 3, 2005 in an earlier version (that used multipliers[] for divisors[]), where it failed on the case of x=1 and b=1.
Conditionally set or clear bits without branching
On some architectures, the lack of branching can more than make up for what appears to be twice as many operations. For instance, informal speed tests on an AMD Athlon™ XP 2100+ indicated it was 5-10% faster. An Intel Core 2 Duo ran the superscalar version
about 16% faster than the first. Glenn Slayden informed me of the first expression on December 11, 2003. Marco Yu shared the superscalar version with me on April 3, 2007 and alerted me to a typo 2 days later.
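The expressions being compared are not reproduced in this copy, so the following is only a minimal sketch of the idea, assuming the goal is the branch-free equivalent of `if (f) w |= m; else w &= ~m;`. The function names `set_or_clear` and `set_or_clear_superscalar` are hypothetical.

```c
#include <stdbool.h>

/* Sets the bits selected by m in w when f is true, clears them otherwise. */
static unsigned int set_or_clear(unsigned int w, unsigned int m, bool f)
{
    /* all is every bit set when f is true, zero when f is false */
    unsigned int const all = 0u - (unsigned int)f;
    return w ^ ((all ^ w) & m);
}

/* Same result, written as two independent halves that can run in parallel. */
static unsigned int set_or_clear_superscalar(unsigned int w, unsigned int m, bool f)
{
    unsigned int const all = 0u - (unsigned int)f;
    return (w & ~m) | (all & m);
}
```

In the second form, the two operands of the final OR do not depend on each other, which is presumably why it suits superscalar CPUs.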
Conditionally negate a value without branching
If you need to negate only when a flag is false, then use the following to avoid branching:
If you need to negate only when a flag is true, then use this:
Avraham Plotnitzky suggested I add the first version on June 2, 2009. Motivated to avoid the multiply, I came up with the second version on June 8, 2009. Alfonso De Gregorio pointed out that some parens were missing on November 26, 2009, and received a bug
bounty.
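Since the snippets themselves are missing here, the two cases can be sketched as follows. This is an illustration under assumptions rather than the author's exact code: the function names are hypothetical, the first function uses a multiply and the second avoids it, and neither should be called with INT_MIN, whose negation overflows.

```c
#include <stdbool.h>

/* Returns v when dont_negate is true, -v otherwise (uses a multiply). */
static int negate_unless(int v, bool dont_negate)
{
    int const f = dont_negate;   /* 0 or 1 */
    /* f == 1: 1 ^ 0 == 1;  f == 0: 0 ^ -1 == -1 */
    return (f ^ (f - 1)) * v;
}

/* Returns -v when negate is true, v otherwise (no multiply). */
static int negate_if(int v, bool negate)
{
    int const f = negate;        /* 0 or 1 */
    /* f == 1: (v ^ -1) + 1 == ~v + 1 == -v;  f == 0: (v ^ 0) + 0 == v */
    return (v ^ -f) + f;
}
```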
This shaves one operation from the obvious way of combining two sets of bits according to a bit mask. If the mask is a constant, then there may be no advantage.
Ron Jeffery sent this to me on February 9, 2006.
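As an illustration (the original snippet is not present in this copy), a masked merge can be sketched like this. It assumes a mask bit of 1 selects the bit from b and 0 selects it from a; the function names are hypothetical.

```c
/* Obvious way: four operations (NOT, AND, AND, OR). */
static unsigned int merge_bits_obvious(unsigned int a, unsigned int b, unsigned int mask)
{
    return (a & ~mask) | (b & mask);
}

/* XOR form: three operations. Where mask is 1, a ^ (a ^ b) gives b;
   where mask is 0, a ^ 0 gives a. */
static unsigned int merge_bits(unsigned int a, unsigned int b, unsigned int mask)
{
    return a ^ ((a ^ b) & mask);
}
```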
The naive approach requires one iteration per bit, until no more bits are set. So on a 32-bit word with only the high bit set, it will go through 32 iterations.
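A minimal sketch of the naive loop described here, with the hypothetical name `popcount_naive`; it tests the lowest bit and shifts right until the word is zero.

```c
/* Counts set bits by examining one bit per iteration. */
static unsigned int popcount_naive(unsigned int v)
{
    unsigned int c;
    for (c = 0; v; v >>= 1)
    {
        c += v & 1u;   /* add the lowest bit, then shift it out */
    }
    return c;
}
```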
On July 14, 2009 Hallvard Furuseth suggested the macro compacted table.
Counting bits set, Brian Kernighan's way
Brian Kernighan's method goes through as many iterations as there are set bits. So if we have a 32-bit word with only the high bit set, then it will only go once through the loop.
Published in 1988, the C Programming Language 2nd Ed. (by Brian W. Kernighan and Dennis M. Ritchie) mentions this in exercise 2-9. On April 19, 2006 Don Knuth pointed out to me that this method "was first published by Peter Wegner in CACM 3 (1960), 322.
(Also discovered independently by Derrick Lehmer and published in 1964 in a book edited by Beckenbach.)"
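The loop can be sketched as below. The name `popcount_kernighan` is hypothetical; the key step is that `v &= v - 1` clears the lowest set bit, so the body runs once per set bit.

```c
/* Counts set bits; iterates once per set bit rather than once per bit. */
static unsigned int popcount_kernighan(unsigned int v)
{
    unsigned int c;
    for (c = 0; v; c++)
    {
        v &= v - 1;   /* clear the least significant set bit */
    }
    return c;
}
```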
Counting bits set in 14, 24, or 32-bit words using 64-bit instructions
This method requires a 64-bit CPU with fast modulus division to be efficient. The first option takes only 3 operations; the second option takes 10; and the third option takes 15.
Rich Schroeppel originally created a 9-bit version, similar to option 1; see the Programming Hacks section of
Beeler, M., Gosper, R. W., and Schroeppel, R. HAKMEM. MIT AI Memo 239, Feb. 29, 1972. His method was the inspiration for the variants above, devised by Sean Anderson. Randal E. Bryant offered
a couple bug fixes on May 3, 2005. Bruce Dawson tweaked what had been a 12-bit version and made it suitable for 14 bits using the same number of operations on February 1, 2007.
It uses 7 arithmetic/logical operations when n and m are constant.
Note: Bytes that equal n can be reported by
Requirements: x>=0; 0<=m<=127; 0<=n<=128
This technique would be suitable for a fast pretest. A variation that takes one more operation (8 total for constant m and n) but provides the exact answer is:
To count the number of bytes in x that are between m and n (exclusive) in 10 operations, use:
Juha Järvi suggested
Compute the lexicographically next bit permutation
Suppose we have a pattern of N bits set to 1 in an integer and we want the next permutation of N 1 bits in a lexicographical sense. For example, if N is 3 and the bit pattern is 00010011, the next patterns would be 00010101, 00010110, 00011001, 00011010,
00011100, 00100011, and so forth. The following is a fast way to compute the next permutation.
The __builtin_ctz(v) GNU C compiler intrinsic for x86 CPUs returns the number of trailing zeros. If you are using Microsoft compilers for x86, the intrinsic is _BitScanForward. These both emit a bsf instruction, but equivalents may be available for other
architectures. If not, then consider using one of the methods for counting the consecutive zero bits mentioned earlier.
Here is another version that tends to be slower because of its division operator, but it does not require counting the trailing zeros.
Thanks to Dario Sneidermanis of Argentina, who provided this on November 28, 2009.
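The code for this section is not reproduced in this copy. A portable sketch of the division-based variant, which needs no trailing-zero count, might look like the following. The function name is hypothetical, and it assumes v is nonzero and that the next permutation still fits in an unsigned int.

```c
/* Returns the next larger value with the same number of 1 bits as v. */
static unsigned int next_bit_permutation(unsigned int v)
{
    /* Fill v's trailing zeros with ones, then add 1: this clears the lowest
       run of ones and sets the bit just above it. */
    unsigned int const t = (v | (v - 1u)) + 1u;

    /* (t & -t) isolates the newly set bit and (v & -v) the lowest set bit
       of v. Their quotient, shifted right once and decremented, is the
       remaining ones packed into the lowest positions. */
    return t | ((((t & (0u - t)) / (v & (0u - v))) >> 1) - 1u);
}
```

Starting from 00010011 (19), repeated calls step through 21, 22, 25, 26, 28 and 35, which matches the sequence of patterns listed in the text.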
About the operation counting methodology
When totaling the number of operations for algorithms here, any C operator is counted as one operation. Intermediate assignments, which need not be written to RAM, are not counted. Of course, this operation counting approach only serves as an approximation of the actual number of machine instructions and CPU time. All operations are assumed to take the same amount of time, which is not true in reality, but CPUs have been heading increasingly in this direction over time. There are many nuances that determine
how fast a system will run a given sample of code, such as cache sizes, memory bandwidths, instruction sets, etc. In the end, benchmarking is the best way to determine whether one method is really faster than another, so consider the techniques below as possibilities
to test on your target architecture.
Compute the sign of an integer
int v;      // we want to find the sign of v
int sign;   // the result goes here

// CHAR_BIT is the number of bits per byte (normally 8).
sign = -(v < 0);  // if v < 0 then -1, else 0.
// or, to avoid branching on CPUs with flag registers (IA32):
sign = -(int)((unsigned int)((int)v) >> (sizeof(int) * CHAR_BIT - 1));
// or, for one less instruction (but not portable):
sign = v >> (sizeof(int) * CHAR_BIT - 1);
The last expression above evaluates to sign = v >> 31 for 32-bit integers. This is one operation faster than the obvious way, sign = -(v < 0). This trick works because when signed integers are shifted right, the value of the far left bit is copied to the
other bits. The far left bit is 1 when the value is negative and 0 otherwise; all 1 bits gives -1. Unfortunately, this behavior is architecture-specific.
Alternatively, if you prefer the result be either -1 or +1, then use:
sign = +1 | (v >> (sizeof(int) * CHAR_BIT - 1)); // if v < 0 then -1, else +1
On the other hand, if you prefer the result be either -1, 0, or +1, then use:
sign = (v != 0) | -(int)((unsigned int)((int)v) >> (sizeof(int) * CHAR_BIT - 1));
// Or, for more speed but less portability:
sign = (v != 0) | (v >> (sizeof(int) * CHAR_BIT - 1));  // -1, 0, or +1
// Or, for portability, brevity, and (perhaps) speed:
sign = (v > 0) - (v < 0); // -1, 0, or +1
If instead you want to know if something is non-negative, resulting in +1 or else 0, then use:
sign = 1 ^ ((unsigned int)v >> (sizeof(int) * CHAR_BIT - 1)); // if v < 0 then 0, else 1
Caveat: On March 7, 2003, Angus Duggan pointed out that the 1989 ANSI C specification leaves the result of signed right-shift implementation-defined, so on some systems this hack might not work. For greater portability, Toby Speight suggested on September
28, 2005 that CHAR_BIT be used here and throughout rather than assuming bytes were 8 bits long. Angus recommended the more portable versions above, involving casting on March 4, 2006.
Rohit Garg suggested the version for non-negative integers on September 12, 2009.
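To make the portable variants easy to try, here is a minimal sketch that wraps them in functions. The function names are hypothetical, and the code assumes that int has no padding bits, so that sizeof(int) * CHAR_BIT is its width.

```c
#include <limits.h>

/* Returns -1 if v is negative and 0 otherwise, without a signed right shift. */
static int sign_mask(int v)
{
    return -(int)((unsigned int)v >> (sizeof(int) * CHAR_BIT - 1));
}

/* Returns -1, 0, or +1. */
static int sign3(int v)
{
    return (v > 0) - (v < 0);
}

/* Returns 1 if v is non-negative and 0 otherwise. */
static int is_non_negative(int v)
{
    return (int)(1u ^ ((unsigned int)v >> (sizeof(int) * CHAR_BIT - 1)));
}
```

Shifting the unsigned conversion of v moves the sign bit into bit 0 with well-defined behavior, which sidesteps the implementation-defined signed shift mentioned in the caveat.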
Detect if two integers have opposite signs
int x, y;               // input values to compare signs
bool f = ((x ^ y) < 0); // true iff x and y have opposite signs
Manfred Weis suggested I add this entry on November 26, 2009.
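The test works because the sign bit of x ^ y is set exactly when the sign bits of x and y differ. A small sketch follows, with the hypothetical name `opposite_signs`; note that zero counts as non-negative.

```c
#include <stdbool.h>

/* True iff exactly one of x and y is negative. */
static bool opposite_signs(int x, int y)
{
    return (x ^ y) < 0;
}
```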
Compute the integer absolute value (abs) without branching
int v;           // we want to find the absolute value of v
unsigned int r;  // the result goes here
int const mask = v >> sizeof(int) * CHAR_BIT - 1;

r = (v + mask) ^ mask;
Patented variation:
r = (v ^ mask) - mask;
Some CPUs don't have an integer absolute value instruction (or the compiler fails to use it). On machines where branching is expensive, the above expression can be faster than the obvious approach, r = (v < 0) ? -(unsigned)v : v, even though the number
of operations is the same.
On March 7, 2003, Angus Duggan pointed out that the 1989 ANSI C specification leaves the result of signed right-shift implementation-defined, so on some systems this hack might not work. I've read that ANSI C does not require values to be represented as two's complement, so it may not work for that reason as well (on a diminishingly small number of old machines that still use one's complement).
On March 14, 2004, Keith H. Duggar sent me the patented variation above; it is superior to the one I initially came up with, r=(+1|(v>>(sizeof(int)*CHAR_BIT-1)))*v, because a multiply is not used. Unfortunately, this method was patented in the USA on June 6, 2000 by Vladimir Yu Volkonsky and assigned to Sun Microsystems. On August 13, 2006, Yuriy Kaminskiy told me that the patent is likely invalid because the method was published well before the patent was even filed, such as in How to Optimize for the Pentium Processor by Agner Fog, dated November 9, 1996. Yuriy also mentioned that this document was translated to Russian in 1997, which Vladimir could have read. Moreover, the Internet Archive also has an old link to it.
On January 30, 2007, Peter Kankowski shared with me an abs version he discovered that was inspired by Microsoft's Visual C++ compiler output. It is featured here as the primary solution. On December 6, 2007, Hai Jin complained that the result was signed, so when computing the abs of the most negative value, it was still negative. On April 15, 2008 Andrew Shapira pointed out that the obvious approach could overflow, as it lacked an (unsigned) cast then; for maximum portability he suggested (v < 0) ? (1 + ((unsigned)(-1-v))) : (unsigned)v. But citing the ISO C99 spec on July 9, 2008, Vincent Lefèvre convinced me to remove it because even on non-2s-complement machines -(unsigned)v will do the right thing. The evaluation of -(unsigned)v first converts the negative value of v to an unsigned by adding 2**N, yielding a 2s complement representation of v's value that I'll call U. Then, U is negated, giving the desired result, -U = 0 - U = 2**N - U = 2**N - (v+2**N) = -v = abs(v).
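The following sketch puts the obvious approach and the mask-based approach side by side. It is an illustration under assumptions: the function names are hypothetical, the branchless version relies on right shift of a negative int being arithmetic (implementation-defined), and it does the add and XOR in unsigned arithmetic so that the most negative value does not trigger signed overflow.

```c
#include <limits.h>

/* Obvious approach; -(unsigned int)v is well defined even for INT_MIN. */
static unsigned int abs_portable(int v)
{
    return (v < 0) ? -(unsigned int)v : (unsigned int)v;
}

/* Branchless approach; assumes an arithmetic right shift of negative values. */
static unsigned int abs_branchless(int v)
{
    /* mask is all ones when v is negative, zero otherwise */
    int const mask = v >> (sizeof(int) * CHAR_BIT - 1);
    return ((unsigned int)v + (unsigned int)mask) ^ (unsigned int)mask;
}
```

When mask is all ones, adding it subtracts 1 and the XOR inverts every bit, which together negate a two's complement value; when mask is zero, both steps leave v unchanged.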
Compute the minimum (min) or maximum (max) of two integers without branching
int x;  // we want to find the minimum of x and y
int y;
int r;  // the result goes here

r = y ^ ((x ^ y) & -(x < y)); // min(x, y)
On some rare machines where branching is very expensive and no conditional move instructions exist, the above expression might be faster than the obvious approach, r = (x < y) ? x : y, even though it involves two more instructions. (Typically, the obvious
approach is best, though.) It works because if x < y, then -(x < y) will be all ones, so r = y ^ ((x ^ y) & ~0) = y ^ x ^ y = x. Otherwise, if x >= y, then -(x < y) will be all zeros, so r = y ^ ((x ^ y) & 0) = y. On some machines, evaluating (x < y) as 0 or
1 requires a branch instruction, so there may be no advantage.
To find the maximum, use:
r = x ^ ((x ^ y) & -(x < y)); // max(x, y)
Quick and dirty versions:
If you know that INT_MIN <= x - y <= INT_MAX, then you can use the following, which are faster because (x - y) only needs to be evaluated once.
r = y + ((x - y) & ((x - y) >> (sizeof(int) * CHAR_BIT - 1))); // min(x, y)
r = x - ((x - y) & ((x - y) >> (sizeof(int) * CHAR_BIT - 1))); // max(x, y)
Note that the 1989 ANSI C specification leaves the result of signed right-shift implementation-defined, so these aren't portable. If exceptions are thrown on overflows, then the values of x and y should be unsigned or cast to unsigned for the subtractions to avoid unnecessarily
throwing an exception. However, the right-shift needs a signed operand to produce all one bits when negative, so cast to signed there.
On March 7, 2003, Angus Duggan pointed out the right-shift portability issue. On May 3, 2005, Randal E. Bryant alerted me to the need for the precondition, INT_MIN <= x - y <= INT_MAX, and suggested the non-quick and dirty version as a fix. Both of these
issues concern only the quick and dirty version. Nigel Horspoon observed on July 6, 2005 that gcc produced the same code on a Pentium as the obvious solution because of how it evaluates (x < y). On July 9, 2008 Vincent Lefèvre pointed out the potential for
overflow exceptions with subtractions in r = y + ((x - y) & -(x < y)), which was the previous version. Timothy B. Terriberry suggested using xor rather than add and subtract to avoid casting and the risk of overflows on June 2, 2009.
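For experimentation, the XOR-based expressions can be wrapped as below. This is a sketch with hypothetical function names; because it uses no subtraction, it has no precondition on x - y.

```c
#include <limits.h>

/* -(x < y) is all ones when x < y, which selects x; otherwise it selects y. */
static int min_branchless(int x, int y)
{
    return y ^ ((x ^ y) & -(x < y));
}

/* Same mask, but XORed into x, so x < y selects y and otherwise x. */
static int max_branchless(int x, int y)
{
    return x ^ ((x ^ y) & -(x < y));
}
```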
Determining if an integer is a power of 2
unsigned int v; // we want to see if v is a power of 2
bool f;         // the result goes here

f = (v & (v - 1)) == 0;
Note that 0 is incorrectly considered a power of 2 here. To remedy this, use:
f = v && !(v & (v - 1));
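The check relies on v & (v - 1) clearing the lowest set bit of v: subtracting 1 turns that bit off and turns on every bit below it. A power of 2 has exactly one set bit, so the result is zero. A small sketch of the corrected test, with the hypothetical name `is_power_of_two`:

```c
#include <stdbool.h>

/* True iff v has exactly one bit set; zero is rejected explicitly. */
static bool is_power_of_two(unsigned int v)
{
    return v && !(v & (v - 1));
}
```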
Sign extending from a constant bit-width
Sign extension is automatic for built-in types, such as chars and ints. But suppose you have a signed two's complement number, x, that is stored using only b bits. Moreover, suppose you want to convert x to an int, which has more than b bits. A simple copy will work if x is positive, but if negative, the sign must be extended. For example, if we have only 4 bits to store a number, then -3 is represented as 1101 in binary. If we have 8 bits, then -3 is 11111101. The most-significant bit of the 4-bit representation
is replicated sinistrally to fill in the destination when we convert to a representation with more bits; this is sign extending. In C, sign extension from a constant bit-width is trivial, since bit fields may be specified in structs or unions. For example,
to convert from 5 bits to a full integer:
int x; // convert this from using 5 bits to a full int
int r; // resulting sign extended number goes here
struct {signed int x:5;} s;
r = s.x = x;
The following is a C++ template function that uses the same language feature to convert from B bits in one operation (though the compiler is generating more, of course).
template <typename T, unsigned B>
inline T signextend(const T x)
{
  struct {T x:B;} s;
  return s.x = x;
}

int r = signextend<signed int,5>(x);  // sign extend 5 bit number x to r
John Byrd caught a typo in the code (attributed to html formatting) on May 2, 2005. On March 4, 2006, Pat Wood pointed out that the ANSI C standard requires that the bitfield have the keyword "signed" to be signed; otherwise, the sign is undefined.
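As a concrete check of the bit-field approach, here is a minimal C sketch with the hypothetical name `sign_extend_5`. One caveat is an assumption worth stating: storing a value that does not fit in a signed 5-bit field is an implementation-defined conversion in C, and the sketch assumes the common behavior of keeping the low 5 bits.

```c
/* Interprets the low 5 bits of x as a signed two's complement number. */
static int sign_extend_5(int x)
{
    struct { signed int x : 5; } s;
    s.x = x;      /* assumed to keep the low 5 bits on typical compilers */
    return s.x;   /* reading the field back sign-extends it to int */
}
```

For example, the 5-bit pattern 11101 (0x1D) comes back as -3 and 10000 (0x10) as -16 on compilers that behave this way, such as GCC and Clang.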
Sign extending from a variable bit-width
Sometimes we need to extend the sign of a number but we don't know a priori the number of bits, b, in which it is represented. (Or we could be programming in a language like Java, which lacks bitfields.)

unsigned b; // number of bits representing the number in x
int x;      // sign extend this b-bit number to r
int r;      // resulting sign-extended number
int const m = 1U << (b - 1); // mask can be pre-computed if b is fixed

x = x & ((1U << b) - 1);  // (Skip this if bits in x above position b are already zero.)
r = (x ^ m) - m;
The code above requires four operations, but when the bitwidth is a constant rather than variable, it requires only two fast operations, assuming the upper bits are already zeroes.
A slightly faster but less portable method that doesn't depend on the bits in x above position b being zero is:
int const m = CHAR_BIT * sizeof(x) - b; // CHAR_BIT is defined in <limits.h>
r = (x << m) >> m;
Sean A. Irvine suggested that I add sign extension methods to this page on June 13, 2004, and he provided m = (1 << (b - 1)) - 1; r = -(x & ~m) | x; as a starting point from which I optimized to get m = 1U << (b - 1); r = -(x & m) | x. But then on May 11, 2007, Shay Green suggested the version above, which requires one less operation than mine. Vipin Sharma suggested I add a step to deal with situations where x had possible ones in bits other than the b bits we wanted to sign-extend on Oct. 15, 2008. On December 31, 2009 Chris Pirazzi suggested I add the faster version, which requires two operations for constant bit-widths and three for variable widths.
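A minimal sketch of the xor-and-subtract method as a function follows. The name sign_extend is hypothetical, and the sketch assumes 1 <= b < 32 with a 32-bit int. The masking is done on an unsigned copy so that the final subtraction stays within the range of int.

```c
/* Hypothetical helper: sign-extends the low b bits of x.
   Assumes 1 <= b < 32 and a 32-bit int. */
static int sign_extend(int x, unsigned b)
{
    unsigned const m = 1U << (b - 1);            /* sign bit of the b-bit field */
    unsigned const u = (unsigned)x & ((1U << b) - 1); /* drop bits above the field */

    /* Flipping the sign bit and subtracting it maps 0..2^(b-1)-1 to
       itself and 2^(b-1)..2^b-1 to the corresponding negative values. */
    return (int)(u ^ m) - (int)m;
}
```

For b = 4, the pattern 1101 yields -3 and 0101 yields 5; stray ones above bit b are ignored because of the mask.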
Sign extending from a variable bit-width in 3 operations
The following may be slow on some machines, due to the effort required for multiplication and division. This version is 4 operations. If you know that your initial bit-width, b, is greater than 1, you might do this type of sign extension in 3 operations by using r = (x * multipliers[b]) / multipliers[b], which requires only one array lookup.
unsigned b; // number of bits representing the number in x
int x;      // sign extend this b-bit number to r
int r;      // resulting sign-extended number
#define M(B) (1U << ((sizeof(x) * CHAR_BIT) - B)) // CHAR_BIT=bits/byte
static int const multipliers[] =
{
  0,     M(1),  M(2),  M(3),  M(4),  M(5),  M(6),  M(7),
  M(8),  M(9),  M(10), M(11), M(12), M(13), M(14), M(15),
  M(16), M(17), M(18), M(19), M(20), M(21), M(22), M(23),
  M(24), M(25), M(26), M(27), M(28), M(29), M(30), M(31),
  M(32)
}; // (add more if using more than 64 bits)
static int const divisors[] =
{
  1,    ~M(1),  M(2),  M(3),  M(4),  M(5),  M(6),  M(7),
  M(8),  M(9),  M(10), M(11), M(12), M(13), M(14), M(15),
  M(16), M(17), M(18), M(19), M(20), M(21), M(22), M(23),
  M(24), M(25), M(26), M(27), M(28), M(29), M(30), M(31),
  M(32)
}; // (add more for 64 bits)
#undef M
r = (x * multipliers[b]) / divisors[b];
The following variation is not portable, but on architectures that employ an arithmetic right-shift, maintaining the sign, it should be fast.
const int s = -b; // OR:  sizeof(x) * CHAR_BIT - b;
r = (x << s) >> s;
Randal E. Bryant pointed out a bug on May 3, 2005 in an earlier version (that used multipliers[] for divisors[]), where it failed on the case of x=1 and b=1.
Conditionally set or clear bits without branching
bool f;         // conditional flag
unsigned int m; // the bit mask
unsigned int w; // the word to modify:  if (f) w |= m; else w &= ~m;

w ^= (-f ^ w) & m;

// OR, for superscalar CPUs:
w = (w & ~m) | (-f & m);
On some architectures, the lack of branching can more than make up for what appears to be twice as many operations. For instance, informal speed tests on an AMD Athlon™ XP 2100+ indicated it was 5-10% faster. An Intel Core 2 Duo ran the superscalar version
about 16% faster than the first. Glenn Slayden informed me of the first expression on December 11, 2003. Marco Yu shared the superscalar version with me on April 3, 2007 and alerted me to a typo 2 days later.
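To make the behaviour concrete, here is a small sketch of the first expression as a function. The name set_or_clear is hypothetical; the flag is widened to unsigned before negation so that the all-ones mask is formed without relying on signed arithmetic.

```c
#include <stdbool.h>

/* Hypothetical helper: returns w with the bits in m set when f is
   true and cleared when f is false, without branching. */
static unsigned int set_or_clear(unsigned int w, unsigned int m, bool f)
{
    /* all is 0xFFFFFFFF... when f is true and 0 when f is false. */
    unsigned int const all = 0u - (unsigned int)f;

    /* Inside the mask, (all ^ w) is ~w or w, so the outer xor either
       sets the bit or clears it; outside the mask w is unchanged. */
    return w ^ ((all ^ w) & m);
}
```

For instance, set_or_clear(0xF0, 0x0F, true) gives 0xFF, and set_or_clear(0xFF, 0x0F, false) gives 0xF0.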
Conditionally negate a value without branching
If you need to negate only when a flag is false, then use the following to avoid branching:

bool fDontNegate;  // Flag indicating we should not negate v.
int v;             // Input value to negate if fDontNegate is false.
int r;             // result = fDontNegate ? v : -v;

r = (fDontNegate ^ (fDontNegate - 1)) * v;
If you need to negate only when a flag is true, then use this:
bool fNegate;  // Flag indicating if we should negate v.
int v;         // Input value to negate if fNegate is true.
int r;         // result = fNegate ? -v : v;

r = (v ^ -fNegate) + fNegate;
Avraham Plotnitzky suggested I add the first version on June 2, 2009. Motivated to avoid the multiply, I came up with the second version on June 8, 2009. Alfonso De Gregorio pointed out that some parens were missing on November 26, 2009, and received a bug
bounty.
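The second version can be sketched as a function as follows. The name negate_if is hypothetical, and the sketch assumes two's complement ints. As with ordinary negation, v must not be the most negative int.

```c
#include <stdbool.h>

/* Hypothetical helper: returns -v when negate is true, else v.
   Assumes two's complement; v must not be INT_MIN. */
static int negate_if(int v, bool negate)
{
    int const f = (int)negate;  /* 0 or 1 */

    /* -f is 0 or -1. Xor with -1 flips every bit of v, and adding 1
       afterwards completes the two's complement negation. */
    return (v ^ -f) + f;
}
```

So negate_if(5, true) is -5, while negate_if(5, false) leaves 5 unchanged.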
Merge bits from two values according to a mask
unsigned int a;    // value to merge in non-masked bits
unsigned int b;    // value to merge in masked bits
unsigned int mask; // 1 where bits from b should be selected; 0 where from a.
unsigned int r;    // result of (a & ~mask) | (b & mask) goes here

r = a ^ ((a ^ b) & mask);
This shaves one operation from the obvious way of combining two sets of bits according to a bit mask. If the mask is a constant, then there may be no advantage.
Ron Jeffery sent this to me on February 9, 2006.
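A short sketch, with the hypothetical name merge_bits, shows why the xor form matches the obvious one: where mask is 1, a ^ (a ^ b) collapses to b, and where mask is 0, a is left alone.

```c
/* Hypothetical helper: takes bits from b where mask is 1 and from a
   where mask is 0, using three operations instead of four. */
static unsigned int merge_bits(unsigned int a, unsigned int b,
                               unsigned int mask)
{
    return a ^ ((a ^ b) & mask);
}
```

For example, merge_bits(0xAAAA, 0x5555, 0x00FF) yields 0xAA55, the same as (a & ~mask) | (b & mask).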
Counting bits set (naive way)
unsigned int v; // count the number of bits set in v
unsigned int c; // c accumulates the total bits set in v

for (c = 0; v; v >>= 1)
{
  c += v & 1;
}
The naive approach requires one iteration per bit, until no more bits are set. So on a 32-bit word with only the high bit set, it will go through 32 iterations.
Counting bits set by lookup table
static const unsigned char BitsSetTable256[256] =
{
#   define B2(n) n,     n+1,     n+1,     n+2
#   define B4(n) B2(n), B2(n+1), B2(n+1), B2(n+2)
#   define B6(n) B4(n), B4(n+1), B4(n+1), B4(n+2)
    B6(0), B6(1), B6(1), B6(2)
};

unsigned int v; // count the number of bits set in 32-bit value v
unsigned int c; // c is the total bits set in v

// Option 1:
c = BitsSetTable256[v & 0xff] +
    BitsSetTable256[(v >> 8) & 0xff] +
    BitsSetTable256[(v >> 16) & 0xff] +
    BitsSetTable256[v >> 24];

// Option 2:
unsigned char * p = (unsigned char *) &v;
c = BitsSetTable256[p[0]] +
    BitsSetTable256[p[1]] +
    BitsSetTable256[p[2]] +
    BitsSetTable256[p[3]];

// To initially generate the table algorithmically:
BitsSetTable256[0] = 0;
for (int i = 0; i < 256; i++)
{
  BitsSetTable256[i] = (i & 1) + BitsSetTable256[i / 2];
}
On July 14, 2009 Hallvard Furuseth suggested the macro compacted table.
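The run-time table generation and the first lookup option might be combined as in the sketch below. The names bits_set_table, init_bits_set_table, and popcount_table are hypothetical. The table is left non-const here because it is filled at run time, and v is assumed to hold at most 32 bits.

```c
/* Hypothetical run-time table: entry i holds the number of set bits in i. */
static unsigned char bits_set_table[256];

/* Fills the table; each entry reuses the entry for i / 2, which has
   already been computed, and adds the lowest bit of i. */
static void init_bits_set_table(void)
{
    bits_set_table[0] = 0;
    for (int i = 1; i < 256; i++)
        bits_set_table[i] = (unsigned char)((i & 1) + bits_set_table[i / 2]);
}

/* Counts set bits in a 32-bit value with four table lookups.
   init_bits_set_table() must have been called first. */
static unsigned int popcount_table(unsigned int v)
{
    return bits_set_table[v & 0xff] +
           bits_set_table[(v >> 8) & 0xff] +
           bits_set_table[(v >> 16) & 0xff] +
           bits_set_table[(v >> 24) & 0xff];
}
```

After a single call to init_bits_set_table(), popcount_table(0x80000001) returns 2.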
Counting bits set, Brian Kernighan's way
unsigned int v; // count the number of bits set in v
unsigned int c; // c accumulates the total bits set in v

for (c = 0; v; c++)
{
  v &= v - 1; // clear the least significant bit set
}
Brian Kernighan's method goes through as many iterations as there are set bits. So if we have a 32-bit word with only the high bit set, then it will only go once through the loop.
Published in 1988, the C Programming Language 2nd Ed. (by Brian W. Kernighan and Dennis M. Ritchie) mentions this in exercise 2-9. On April 19, 2006 Don Knuth pointed out to me that this method "was first published by Peter Wegner in CACM 3 (1960), 322.
(Also discovered independently by Derrick Lehmer and published in 1964 in a book edited by Beckenbach.)"
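For comparison, here is a sketch that places this method next to the naive shift-and-test loop. The function names are hypothetical. Both return the same count; they differ only in how many times the loop body runs.

```c
/* Hypothetical helper: the loop body runs once per set bit. */
static unsigned int popcount_kernighan(unsigned int v)
{
    unsigned int c;
    for (c = 0; v; c++)
        v &= v - 1;  /* clear the least significant set bit */
    return c;
}

/* Hypothetical helper: the loop body runs once per bit position up to
   and including the highest set bit. */
static unsigned int popcount_naive(unsigned int v)
{
    unsigned int c;
    for (c = 0; v; v >>= 1)
        c += v & 1;
    return c;
}
```

On 0x80000000 the first loop executes once and the second 32 times, yet both report a single set bit.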
Counting bits set in 14, 24, or 32-bit words using 64-bit instructions
unsigned int v; // count the number of bits set in v
unsigned int c; // c accumulates the total bits set in v

// option 1, for at most 14-bit values in v:
c = (v * 0x200040008001ULL & 0x111111111111111ULL) % 0xf;

// option 2, for at most 24-bit values in v:
c =  ((v & 0xfff) * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;
c += (((v & 0xfff000) >> 12) * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;

// option 3, for at most 32-bit values in v:
c =  ((v & 0xfff) * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;
c += (((v & 0xfff000) >> 12) * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;
c += ((v >> 24) * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;
This method requires a 64-bit CPU with fast modulus division to be efficient. The first option takes only 3 operations; the second option takes 10; and the third option takes 15.
Rich Schroeppel originally created a 9-bit version, similar to option 1; see the Programming Hacks section of Beeler, M., Gosper, R. W., and Schroeppel, R. HAKMEM. MIT AI Memo 239, Feb. 29, 1972. His method was the inspiration for the variants above, devised by Sean Anderson. Randal E. Bryant offered a couple bug fixes on May 3, 2005. Bruce Dawson tweaked what had been a 12-bit version and made it suitable for 14 bits using the same number of operations on February 1, 2007.
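Option 1 can be sketched as a function to show how it works. The name popcount14 is hypothetical. The multiplication places four copies of v at bit offsets 0, 15, 30, and 45; the mask keeps one bit of each nibble, so every original bit survives exactly once at a nibble boundary; and since 16 is congruent to 1 modulo 15, the final modulus adds the nibbles together.

```c
#include <stdint.h>

/* Hypothetical helper: counts set bits in v, which must fit in 14 bits. */
static unsigned int popcount14(unsigned int v)
{
    /* Four shifted copies of v, without overlap because 14 < 15. */
    uint64_t const spread = (uint64_t)v * 0x200040008001ULL;

    /* Keep bit 0 of every nibble, then sum the nibbles via % 15. */
    return (unsigned int)((spread & 0x111111111111111ULL) % 0xf);
}
```

For example, popcount14(0x3FFF) returns 14 and popcount14(0x2AAA) returns 7.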
It uses 7 arithmetic/logical operations when n and m are constant.
Note: Bytes that equal n can be reported by likelyhasbetween as false positives, so this should be checked by character if a certain result is needed.
Requirements: x>=0; 0<=m<=127; 0<=n<=128
#define likelyhasbetween(x,m,n) \
((((x)-~0UL/255*(n))&~(x)&((x)&~0UL/255*127)+~0UL/255*(127-(m)))&~0UL/255*128)
This technique would be suitable for a fast pretest. A variation that takes one more operation (8 total for constant m and n) but provides the exact answer is:
#define hasbetween(x,m,n) \
((~0UL/255*(127+(n))-((x)&~0UL/255*127)&~(x)&((x)&~0UL/255*127)+~0UL/255*(127-(m)))&~0UL/255*128)
To count the number of bytes in x that are between m and n (exclusive) in 10 operations, use:
#define countbetween(x,m,n) (hasbetween(x,m,n)/128%255)
Juha Järvi suggested likelyhasbetween on April 6, 2005. From there, Sean Anderson created hasbetween and countbetween on April 10, 2005.
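The exact test and the count can be restated as a function with explicit parentheses, which may make the per-byte reasoning easier to follow. This is a sketch rather than the original macros: the name count_between and the intermediate variables are hypothetical, and the same requirements (0 <= m <= 127, 0 <= n <= 128) are assumed.

```c
/* Hypothetical helper: counts the bytes b of x with m < b < n.
   Requires 0 <= m <= 127 and 0 <= n <= 128. */
static unsigned long count_between(unsigned long x, unsigned long m,
                                   unsigned long n)
{
    unsigned long const ones = ~0UL / 255;          /* 0x01 in every byte */
    unsigned long const low7 = x & (ones * 127);    /* low 7 bits of each byte */

    /* Per byte, the high bit of each term is set when:
       (127 + n) - low7   : low7 < n
       ~x                 : the byte itself is below 128
       low7 + (127 - m)   : low7 > m
       No term can carry or borrow across byte boundaries. */
    unsigned long const hit = ((ones * (127 + n) - low7)
                               & ~x
                               & (low7 + ones * (127 - m)))
                              & (ones * 128);

    /* Move each flag to bit 0 of its byte, then sum the bytes. */
    return hit / 128 % 255;
}
```

With x = 0x10203040, m = 0x1F, and n = 0x31, the bytes 0x20 and 0x30 qualify, so the count is 2.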
Compute the lexicographically next bit permutation
Suppose we have a pattern of N bits set to 1 in an integer and we want the next permutation of N 1 bits in a lexicographical sense. For example, if N is 3 and the bit pattern is 00010011, the next patterns would be 00010101, 00010110, 00011001, 00011010, 00011100, 00100011, and so forth. The following is a fast way to compute the next permutation.
unsigned int v; // current permutation of bits
unsigned int w; // next permutation of bits

unsigned int t = v | (v - 1); // t gets v's least significant 0 bits set to 1
// Next set to 1 the most significant bit to change,
// set to 0 the least significant ones, and add the necessary 1 bits.
w = (t + 1) | (((~t & -~t) - 1) >> (__builtin_ctz(v) + 1));
The __builtin_ctz(v) GNU C compiler intrinsic for x86 CPUs returns the number of trailing zeros. If you are using Microsoft compilers for x86, the intrinsic is _BitScanForward. These both emit a bsf instruction, but equivalents may be available for other
architectures. If not, then consider using one of the methods for counting the consecutive zero bits mentioned earlier.
Here is another version that tends to be slower because of its division operator, but it does not require counting the trailing zeros.
unsigned int t = (v | (v - 1)) + 1;
w = t | ((((t & -t) / (v & -v)) >> 1) - 1);
Thanks to Dario Sneidermanis of Argentina, who provided this on November 28, 2009.
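As a portable sketch, the division-based variant can be wrapped in a function and stepped through the example sequence. The name next_bit_permutation is hypothetical, and 0u - t is written in place of -t only to keep the negation explicitly unsigned. The input must be nonzero.

```c
/* Hypothetical helper: returns the next larger value with the same
   number of set bits as v. v must be nonzero. */
static unsigned int next_bit_permutation(unsigned int v)
{
    /* Fill the trailing zeros of v, then add 1: this clears the lowest
       run of ones and sets the bit just above it. */
    unsigned int const t = (v | (v - 1)) + 1;

    /* (t & -t) / (v & -v) is 2^(length of the cleared run); shifting
       right by one and subtracting 1 yields the remaining ones of that
       run, right-justified. */
    return t | ((((t & (0u - t)) / (v & (0u - v))) >> 1) - 1);
}
```

Starting from 00010011 (0x13), repeated calls give 0x15, 0x16, 0x19, 0x1A, 0x1C, and then 0x23, matching the sequence listed above.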