A Comparison of Seven Fast Bit-Count Methods
2015-01-13 00:03
Reposted from: http://blog.chinaunix.net/u/13991/showart_115947.html http://idning.iteye.com/blog/732769
Code: http://infolab.stanford.edu/~manku/bitcount/bitcount.c
Compiled from various sources by Gurmeet Singh Manku
A common problem asked in job interviews is to count the number of bits that are on in an unsigned integer. Here are seven solutions to this problem. Source
code in C is available.
Iterated Count runs in time proportional to the total number of bits. It simply loops through all the bits, terminating slightly earlier because of the while condition. Useful if 1's are sparse and among the least significant bits.
Sparse Ones runs in time proportional to the number of 1 bits. The line n &= (n - 1) simply sets the rightmost 1 bit in n to 0.
Dense Ones runs in time proportional to the number of 0 bits. It is the same as Sparse Ones, except that it first toggles all bits (n ^= (unsigned int)-1, which is the same as n = ~n), and then decrements a count that starts at the number of bits in an int (8 * sizeof(int)) once for every 1 bit that remains.
Precompute_8bit assumes an array bits_in_char such that bits_in_char[i] contains the number of 1 bits in the binary representation of i. It repeatedly updates count by masking out the last eight bits in n, and indexing into bits_in_char.
Precompute_16bit is a variant of Precompute_8bit in that an array bits_in_16bits[] stores the number of 1 bits in successive 16 bit numbers (shorts).
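The listings for these first four routines did not survive in this copy of the article, so here is a minimal sketch of what the descriptions above amount to. The function names, the use of CHAR_BIT, and the way the lookup table is filled are my own assumptions, not the author's bitcount.c.

```c
#include <assert.h>
#include <limits.h>

/* Iterated Count: test the low bit, then shift right until n is 0. */
int bitcount_iterated(unsigned int n)
{
    int count = 0;
    while (n) {
        count += (int)(n & 0x1u);
        n >>= 1;
    }
    return count;
}

/* Sparse Ones: every pass clears the rightmost 1 bit. */
int bitcount_sparse(unsigned int n)
{
    int count = 0;
    while (n) {
        count++;
        n &= (n - 1);
    }
    return count;
}

/* Dense Ones: flip all bits, then count down once per original 0 bit. */
int bitcount_dense(unsigned int n)
{
    int count = (int)(CHAR_BIT * sizeof(unsigned int));
    n = ~n;
    while (n) {
        count--;
        n &= (n - 1);
    }
    return count;
}

/* Precompute_8bit: look up one byte of n at a time. */
static unsigned char bits_in_char[256];

/* Fills the table; must run once before bitcount_precompute_8bit. */
void init_bits_in_char(void)
{
    for (unsigned int i = 0; i < 256; i++)
        bits_in_char[i] = (unsigned char)bitcount_sparse(i);
}

int bitcount_precompute_8bit(unsigned int n)
{
    int count = 0;
    while (n) {
        count += bits_in_char[n & 0xffu];
        n >>= 8;
    }
    return count;
}

/* Hypothetical helper: asserts that all four routines agree on n. */
void check_all_agree(unsigned int n)
{
    int expected = bitcount_iterated(n);
    init_bits_in_char();
    assert(bitcount_sparse(n) == expected);
    assert(bitcount_dense(n) == expected);
    assert(bitcount_precompute_8bit(n) == expected);
}
```

Calling check_all_agree on a handful of values is a cheap way to catch a wrong table or a wrong starting count in Dense Ones.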
Parallel Count carries out bit counting in a parallel fashion. Consider n after the first line has finished executing. Imagine splitting n into pairs of bits. Each pair contains the number of ones in those two bit positions in the original n. After the second line has finished executing, each nibble contains the number of ones in those four bit positions in the original n. Continuing this for five iterations, the 32 bits contain the number of ones among those thirty-two bit positions in the original n; a sixth iteration extends this to 64-bit integers. That is what we wanted to compute.
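To make the pairing concrete, the sketch below isolates the first step and then writes the whole routine for 64-bit input, where six steps are needed. It uses explicit masks and uint64_t instead of the article's macros; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* First step only: every 2-bit pair of the result holds the number of
   ones that were in that pair of n. */
uint32_t pair_counts(uint32_t n)
{
    return (n & 0x55555555u) + ((n >> 1) & 0x55555555u);
}

/* Full parallel count for 64-bit input: six halving steps. */
int bitcount64_parallel(uint64_t n)
{
    n = (n & UINT64_C(0x5555555555555555)) + ((n >> 1)  & UINT64_C(0x5555555555555555));
    n = (n & UINT64_C(0x3333333333333333)) + ((n >> 2)  & UINT64_C(0x3333333333333333));
    n = (n & UINT64_C(0x0F0F0F0F0F0F0F0F)) + ((n >> 4)  & UINT64_C(0x0F0F0F0F0F0F0F0F));
    n = (n & UINT64_C(0x00FF00FF00FF00FF)) + ((n >> 8)  & UINT64_C(0x00FF00FF00FF00FF));
    n = (n & UINT64_C(0x0000FFFF0000FFFF)) + ((n >> 16) & UINT64_C(0x0000FFFF0000FFFF));
    n = (n & UINT64_C(0x00000000FFFFFFFF)) + ((n >> 32) & UINT64_C(0x00000000FFFFFFFF));
    return (int)n;
}

/* Hypothetical self-check: binary 1101 has pairs 11 and 01, which hold
   2 and 1 ones, so the result is binary 1001. */
void check_pair_counts(void)
{
    assert(pair_counts(0xDu) == 0x9u);
}
```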
Nifty Parallel Count works the same way as Parallel Count for the first three iterations. At the end of the third line (just before the return), each byte
of n contains the number of ones in those eight bit positions in the original n. A little thought then explains why the remainder modulo 255 works.
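The "little thought" can be checked directly. After the three steps, n is a four-digit number in base 256 whose digits are the per-byte counts, and because 256 leaves remainder 1 when divided by 255, n % 255 equals the sum of those digits (which is at most 32, so it never wraps). The sketch below assumes 32-bit input, and its helper names are mine.

```c
#include <assert.h>
#include <stdint.h>

/* The three parallel steps: every byte of the result holds the number
   of ones in the corresponding byte of n. */
uint32_t byte_counts(uint32_t n)
{
    n = (n & 0x55555555u) + ((n >> 1) & 0x55555555u);
    n = (n & 0x33333333u) + ((n >> 2) & 0x33333333u);
    n = (n & 0x0F0F0F0Fu) + ((n >> 4) & 0x0F0F0F0Fu);
    return n;
}

/* Sum of the four base-256 digits (bytes) of v. */
unsigned int sum_of_bytes(uint32_t v)
{
    return (v & 0xFFu) + ((v >> 8) & 0xFFu) + ((v >> 16) & 0xFFu) + (v >> 24);
}

/* Hypothetical self-check: the remainder modulo 255 is the byte sum. */
void check_mod_255(uint32_t n)
{
    uint32_t packed = byte_counts(n);
    assert(packed % 255u == sum_of_bytes(packed));
}
```

For example, 0xFF00F731 has bytes with 8, 0, 7 and 3 ones, so byte_counts returns 0x08000703, and 0x08000703 % 255 is 18.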
MIT Hackmem Count is funky. Consider a 3 bit number as being 4a+2b+c. If we shift it right 1 bit, we have 2a+b. Subtracting this from the original gives
2a+b+c. If we right-shift the original 3-bit number by two bits, we get a, and so with another subtraction we have a+b+c, which is the number of bits in the original number. How is this insight employed? The first assignment statement in the routine computes tmp.
Consider the octal representation of tmp. Each digit in the octal representation is simply the number of 1's in the corresponding three bit positions in n. The last return statement sums these octal digits to produce the final answer. The
key idea is to add adjacent pairs of octal digits together and then compute the remainder modulo 63. This is accomplished by right-shifting tmp by three bits, adding it to tmp itself and ANDing with a suitable mask. This yields a number
in which groups of six adjacent bits (starting from the LSB) contain the number of 1's among those six positions in n. This number modulo 63 yields the final answer. For 64-bit numbers, we would have to add triples of octal digits and use modulus
511 (2^9 - 1, since each triple of octal digits occupies nine bits). This is HACKMEM 169, as used in X11 sources. Source: MIT AI Lab memo (HAKMEM), 1972.
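The 3-bit identity and the way tmp packs one count per octal digit can both be verified with a few lines. This is a small sketch, assuming a 32-bit unsigned int; the function names are hypothetical.

```c
#include <assert.h>

/* For a 3-bit value x = 4a + 2b + c:
   x - (x >> 1) - (x >> 2) = (4a+2b+c) - (2a+b) - a = a + b + c. */
unsigned int ones_in_3bit(unsigned int x)
{
    return x - (x >> 1) - (x >> 2);
}

/* The tmp value from the routine: every octal digit of the result is
   the number of ones in the matching octal digit of n. The masks stop
   bits shifted in from the neighbouring digit from being subtracted. */
unsigned int octal_digit_counts(unsigned int n)
{
    return n - ((n >> 1) & 033333333333u) - ((n >> 2) & 011111111111u);
}

/* Hypothetical self-check over all eight 3-bit values. */
void check_3bit_identity(void)
{
    static const unsigned int expected[8] = { 0, 1, 1, 2, 1, 2, 2, 3 };
    for (unsigned int x = 0; x < 8; x++)
        assert(ones_in_3bit(x) == expected[x]);
}
```

For n = 0757 (binary 111 101 111) the digit counts are 3, 2 and 3, so octal_digit_counts returns 0323.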
Which of the several bit counting routines is the fastest? Results of speed trials on an i686 are summarized in the benchmark table. "No Optimization" was compiled with plain gcc. "Some Optimization" was gcc -O3. "Heavy
Optimization" corresponds to gcc -O3 -mcpu=i686 -march=i686 -fforce-addr -funroll-loops -frerun-cse-after-loop -frerun-loop-opt -malign-functions=4.
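The timing code itself is not reproduced in this post, so the following is only a sketch of how a "million counts per second" figure could be measured with the standard clock() function. The function measure_mcps, its parameters and the checksum are my own assumptions and not the harness in bitcount.c.

```c
#include <assert.h>
#include <time.h>

/* Routine under test (Sparse Ones, as described in the article). */
int bitcount_sparse_ones(unsigned int n)
{
    int count = 0;
    while (n) {
        count++;
        n &= (n - 1);
    }
    return count;
}

/* Calls count_fn on 0 .. iterations-1 and returns the rate in Mcps.
   The sum of all counts is stored in *checksum so that the compiler
   cannot discard the calls. Returns 0.0 if the run was too short for
   clock() to measure. */
double measure_mcps(int (*count_fn)(unsigned int),
                    unsigned long iterations,
                    unsigned long *checksum)
{
    unsigned long sum = 0;
    clock_t start = clock();
    for (unsigned long i = 0; i < iterations; i++)
        sum += (unsigned long)count_fn((unsigned int)i);
    clock_t end = clock();

    *checksum = sum;
    double seconds = (double)(end - start) / CLOCKS_PER_SEC;
    if (seconds <= 0.0)
        return 0.0;
    return (double)iterations / seconds / 1e6;
}

/* Hypothetical self-check: the values 0..255 contain 1024 one bits. */
void check_harness(void)
{
    unsigned long checksum = 0;
    (void)measure_mcps(bitcount_sparse_ones, 256ul, &checksum);
    assert(checksum == 1024ul);
}
```

A real comparison would pass each routine in turn with a large iteration count and the same compiler flags; feeding consecutive integers favours the routines whose cost depends on the number of set bits.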
Thanks to Seth Robertson who suggested performing speed trials by extending bitcount.c.
Seth also pointed me to MIT_Hackmem routine. Thanks to Denny Gursky who suggested the idea of Precompute_11bit.
That would require three sums (11-bit, 11-bit and 10-bit precomputed counts). I then tried Precompute_16bit which turned out to be even faster.
If you have niftier solutions up your sleeve, please send me an e-mail.
Question 1: why does (unsigned int)(-1) / 3 have the bit pattern 0101...0101?
A quick experiment:
>>> bin(int('11111111', 2) // 3)
'0b1010101'
>>> bin(int('11111111', 2) // 5)
'0b110011'
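This is just a small trick. Working the division out on paper for 11111111 / 101 (that is, 255 / 5):
      00110011
101 | 11111111
      101
       101
       101
         0111
          101
           101
           101
             0
The quotient repeats the pattern 0011. Dividing by 11 (that is, 3) in the same way gives 01010101.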
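The same fact can be checked in C from the other direction: multiplying the repeating pattern by 3, 5 or 17 gives all ones, so dividing all ones by those numbers must give the pattern back. The sketch below fixes the width at 32 bits with uint32_t; mask_for_step is a hypothetical name for what the article's MASK(c) macro computes.

```c
#include <assert.h>
#include <stdint.h>

/* All ones divided by 2^(2^c) + 1, for steps c = 0 .. 4. */
uint32_t mask_for_step(unsigned int c)
{
    return (uint32_t)(UINT32_MAX / ((UINT32_C(1) << (1u << c)) + 1u));
}

/* Hypothetical self-check of the three masks used by Nifty Parallel Count. */
void check_masks(void)
{
    /* pattern * divisor == all ones, with no remainder */
    assert(0x55555555u * 3u  == UINT32_MAX);
    assert(0x33333333u * 5u  == UINT32_MAX);
    assert(0x0F0F0F0Fu * 17u == UINT32_MAX);

    /* so all ones / divisor == pattern */
    assert(mask_for_step(0) == 0x55555555u);
    assert(mask_for_step(1) == 0x33333333u);
    assert(mask_for_step(2) == 0x0F0F0F0Fu);
}
```

The remaining two steps follow the same rule: dividing by 257 gives 0x00FF00FF and dividing by 65537 gives 0x0000FFFF.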
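Question 2: what does the % 63 do (and, likewise, the n % 255)?
1. Suppose the value n just before the modulo is:
000111 001111
  b      a
n = b*64 + a
  = 63b + (a + b)
so
n % 63 = [63b + (a + b)] % 63
       = (63b % 63 + (a + b) % 63) % 63    by the property (a%m + b%m)%m = (a+b)%m
       = (a + b)    (as long as a + b < 63)
2. Suppose n is:
000011 000111 001111
  c      b      a
n = c*64^2 + b*64 + a
  = c*(64^2 - 1 + 1) + 64b + a
  = c*(64^2 - 1) + c + 64b + a
  = c*(64 - 1)(64 + 1) + c + 64b + a
  = c*65*63 + 63b + (a + b + c)
so n % 63 = a + b + c
3. Now consider higher powers such as 64^4, 64^5, ...
64^4 = (64^4 - 1 + 1) = (64^4 - 1) + 1
and (64^4 - 1) can always be factored as (64 - 1) * (...), so it is divisible by 63.
Therefore n % 63 = the sum of the digits of n written in base 64, provided that sum is below 63. That holds here because a 32-bit number has at most 32 one bits.
This also explains why the modulus has to be 63: when the number is read in base 64, the only choice is 64 - 1 = 63. The % 255 in Nifty Parallel Count is the same argument in base 256.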
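The derivation can be checked numerically with a small sketch. digit_sum_base64 is a hypothetical helper that adds up the 6-bit digits of n; the claim being tested is that n and its digit sum always leave the same remainder modulo 63.

```c
#include <assert.h>
#include <stdint.h>

/* Sum of the base-64 digits (6-bit groups) of n. */
unsigned int digit_sum_base64(uint32_t n)
{
    unsigned int sum = 0;
    while (n) {
        sum += n & 63u;   /* lowest base-64 digit */
        n >>= 6;          /* drop it */
    }
    return sum;
}

/* Hypothetical self-check: n and its digit sum agree modulo 63. */
void check_mod_63(uint32_t n)
{
    assert(n % 63u == digit_sum_base64(n) % 63u);
}
```

Taking the value from case 2 above, 000011 000111 001111 is 3*64^2 + 7*64 + 15 = 12751, its digit sum is 3 + 7 + 15 = 25, and 12751 % 63 is also 25.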
Basic properties of the modulo operation:
(a + b) % n = (a % n + b % n) % n (1)
(a - b) % n = (a % n - b % n) % n (2)
(a * b) % n = (a % n * b % n) % n (3)
Fast Bit Counting Routines
Precompute_16bit

    /* must be filled in before use: bits_in_16bits[i] = number of 1 bits in i */
    static char bits_in_16bits [0x1u << 16] ;

    int bitcount (unsigned int n)
    {
        /* works only for 32-bit ints */
        return bits_in_16bits [n         & 0xffffu]
            +  bits_in_16bits [(n >> 16) & 0xffffu] ;
    }
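The listing above relies on a table that is filled elsewhere. One possible way to fill it, shown here as a sketch with names of my own choosing, uses the fact that i has the same ones as i >> 1 plus its lowest bit, so every entry can be derived from an earlier one.

```c
#include <assert.h>

static char bits_in_16bits[0x1u << 16];

/* Fills the table in increasing order; entry i >> 1 is always ready
   before entry i is computed. Must run once before any lookup. */
void init_bits_in_16bits(void)
{
    bits_in_16bits[0] = 0;
    for (unsigned int i = 1; i < (0x1u << 16); i++)
        bits_in_16bits[i] = (char)(bits_in_16bits[i >> 1] + (int)(i & 1u));
}

/* Same lookup as in the article; works only for 32-bit ints. */
int bitcount_precompute_16bit(unsigned int n)
{
    return bits_in_16bits[n & 0xffffu]
         + bits_in_16bits[(n >> 16) & 0xffffu];
}

/* Hypothetical self-check on the two extreme table entries. */
void check_table(void)
{
    init_bits_in_16bits();
    assert(bits_in_16bits[0] == 0);
    assert(bits_in_16bits[0xffff] == 16);
}
```

The table takes 64 KB, so part of its measured speed depends on how much of it stays in cache during the trial.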
Parallel Count

    #define TWO(c)     (0x1u << (c))
    #define MASK(c)    (((unsigned int)(-1)) / (TWO(TWO(c)) + 1u))
    /* outer parentheses added so that COUNT is safe inside larger expressions */
    #define COUNT(x,c) (((x) & MASK(c)) + (((x) >> (TWO(c))) & MASK(c)))

    int bitcount (unsigned int n)
    {
        n = COUNT(n, 0) ;
        n = COUNT(n, 1) ;
        n = COUNT(n, 2) ;
        n = COUNT(n, 3) ;
        n = COUNT(n, 4) ;
        /* n = COUNT(n, 5) ;    for 64-bit integers */
        return n ;
    }
Nifty Parallel Count

    #define MASK_01010101 (((unsigned int)(-1))/3)
    #define MASK_00110011 (((unsigned int)(-1))/5)
    #define MASK_00001111 (((unsigned int)(-1))/17)

    int bitcount (unsigned int n)
    {
        n = (n & MASK_01010101) + ((n >> 1) & MASK_01010101) ;
        n = (n & MASK_00110011) + ((n >> 2) & MASK_00110011) ;
        n = (n & MASK_00001111) + ((n >> 4) & MASK_00001111) ;
        return n % 255 ;
    }
MIT HACKMEM Count

    int bitcount(unsigned int n)
    {
        /* works for 32-bit numbers only    */
        /* fix last line for 64-bit numbers */
        register unsigned int tmp;

        tmp = n - ((n >> 1) & 033333333333)
                - ((n >> 2) & 011111111111);
        return ((tmp + (tmp >> 3)) & 030707070707) % 63;
    }
Speed trial results

    No Optimization         Some Optimization       Heavy Optimization
    Precomp_16  52.94       Precomp_16  76.22       Precomp_16  80.58
    Precomp_8   29.74       Precomp_8   49.83       Precomp_8   51.65
    Parallel    19.30       Parallel    36.00       Parallel    38.55
    MIT         16.93       MIT         17.10       Nifty       31.82
    Nifty       12.78       Nifty       16.07       MIT         29.71
    Sparse       5.70       Sparse      15.01       Sparse      14.62
    Dense        5.30       Dense       14.11       Dense       14.56
    Iterated     3.60       Iterated     3.84       Iterated     9.24

    All figures are in Mcps (million counts per second).