boost::flat_map performance test
2015-06-30 10:11
1266 views
Reposted from: boost::flat_map and its performance compared to map and unordered_map
I have run a benchmark on different data structures very recently at my company, so I feel I need to drop a word. It is very complicated to benchmark something correctly.
Benchmarking
On the web we rarely find (if ever) a well-engineered benchmark. Until today I only found benchmarks that were done the journalist way (pretty quickly, sweeping dozens of variables under the carpet).

1) You need to consider cache warming
Most people running benchmarks are afraid of timer discrepancy, so they run their code thousands of times and take the total time; they are merely careful to use the same number of iterations for every operation, and then consider the results comparable.
The truth is that in the real world this makes little sense, because your cache will not be warm and your operation will likely be called just once. Therefore you need to benchmark using RDTSC and time things by calling them only once. Intel has published a paper describing how to use RDTSC (using a cpuid instruction to flush the pipeline, and calling it at least 3 times at the beginning of the program to stabilize it).
2) RDTSC accuracy measurement
I also recommend doing this:
C++
<span class="pln">u64 g_correctionFactor</span><span class="pun">;</span> <span class="com">// number of clocks to offset after each measurment to remove the overhead of the measurer itself.</span><span class="pln">
u64 g_accuracy</span><span class="pun">;</span>
<span class="kwd">static</span><span class="pln"> u64 </span><span class="kwd">const</span><span class="pln"> errormeasure </span><span class="pun">=</span> <span class="pun">~((</span><span class="pln">u64</span><span class="pun">)</span><span class="lit">0</span><span class="pun">);</span>
<span class="com">#ifdef</span><span class="pln"> _MSC_VER
</span><span class="com">#pragma</span><span class="pln"> intrinsic</span><span class="pun">(</span><span class="pln">__rdtsc</span><span class="pun">)</span>
<span class="kwd">inline</span><span class="pln"> u64 </span><span class="typ">GetRDTSC</span><span class="pun">()</span>
<span class="pun">{</span>
<span class="kwd">int</span><span class="pln"> a</span><span class="pun">[</span><span class="lit">4</span><span class="pun">];</span><span class="pln">
__cpuid</span><span class="pun">(</span><span class="pln">a</span><span class="pun">,</span> <span class="lit">0x80000000</span><span class="pun">);</span> <span class="com">// flush OOO instruction pipeline</span>
<span class="kwd">return</span><span class="pln"> __rdtsc</span><span class="pun">();</span>
<span class="pun">}</span>
<span class="kwd">inline</span> <span class="kwd">void</span> <span class="typ">WarmupRDTSC</span><span class="pun">()</span>
<span class="pun">{</span>
<span class="kwd">int</span><span class="pln"> a</span><span class="pun">[</span><span class="lit">4</span><span class="pun">];</span><span class="pln">
__cpuid</span><span class="pun">(</span><span class="pln">a</span><span class="pun">,</span> <span class="lit">0x80000000</span><span class="pun">);</span> <span class="com">// warmup cpuid.</span><span class="pln">
__cpuid</span><span class="pun">(</span><span class="pln">a</span><span class="pun">,</span> <span class="lit">0x80000000</span><span class="pun">);</span><span class="pln">
__cpuid</span><span class="pun">(</span><span class="pln">a</span><span class="pun">,</span> <span class="lit">0x80000000</span><span class="pun">);</span>
<span class="com">// measure the measurer overhead with the measurer (crazy he..)</span><span class="pln">
u64 minDiff </span><span class="pun">=</span><span class="pln"> LLONG_MAX</span><span class="pun">;</span><span class="pln">
u64 maxDiff </span><span class="pun">=</span> <span class="lit">0</span><span class="pun">;</span> <span class="com">// this is going to help calculate our PRECISION ERROR MARGIN</span>
<span class="kwd">for</span> <span class="pun">(</span><span class="kwd">int</span><span class="pln"> i </span><span class="pun">=</span> <span class="lit">0</span><span class="pun">;</span><span class="pln"> i </span><span class="pun"><</span> <span class="lit">80</span><span class="pun">;</span> <span class="pun">++</span><span class="pln">i</span><span class="pun">)</span>
<span class="pun">{</span><span class="pln">
u64 tick1 </span><span class="pun">=</span> <span class="typ">GetRDTSC</span><span class="pun">();</span><span class="pln">
u64 tick2 </span><span class="pun">=</span> <span class="typ">GetRDTSC</span><span class="pun">();</span><span class="pln">
minDiff </span><span class="pun">=</span> <span class="typ">Aska</span><span class="pun">::</span><span class="typ">Min</span><span class="pun">(</span><span class="pln">minDiff</span><span class="pun">,</span><span class="pln"> tick2 </span><span class="pun">-</span><span class="pln"> tick1</span><span class="pun">);</span> <span class="com">// make many takes, take the smallest that ever come.</span><span class="pln">
maxDiff </span><span class="pun">=</span> <span class="typ">Aska</span><span class="pun">::</span><span class="typ">Max</span><span class="pun">(</span><span class="pln">maxDiff</span><span class="pun">,</span><span class="pln"> tick2 </span><span class="pun">-</span><span class="pln"> tick1</span><span class="pun">);</span>
<span class="pun">}</span><span class="pln">
g_correctionFactor </span><span class="pun">=</span><span class="pln"> minDiff</span><span class="pun">;</span><span class="pln">
printf</span><span class="pun">(</span><span class="str">"Correction factor %llu clocks\n"</span><span class="pun">,</span><span class="pln"> g_correctionFactor</span><span class="pun">);</span><span class="pln">
g_accuracy </span><span class="pun">=</span><span class="pln"> maxDiff </span><span class="pun">-</span><span class="pln"> minDiff</span><span class="pun">;</span><span class="pln">
printf</span><span class="pun">(</span><span class="str">"Measurement Accuracy (in clocks) : %llu\n"</span><span class="pun">,</span><span class="pln"> g_accuracy</span><span class="pun">);</span>
<span class="pun">}</span>
<span class="com">#endif</span>
This is a discrepancy measurer: it takes the minimum of all measured values, to avoid occasionally getting a -10**18 (the first negative 64-bit values).
Notice the use of intrinsics and not inline assembly. First, inline assembly is rarely supported by compilers nowadays; but much worse, the compiler creates a full ordering barrier around inline assembly because it cannot statically analyze the inside, so this is a problem when benchmarking real-world code, especially when calling things just once. An intrinsic is suited here because it doesn't break the compiler's free re-ordering of instructions.
3) Parameters
The last problem is that people usually test too few variations of the scenario. A container's performance is affected by:
1. the allocator
2. the size of the contained type
3. the cost of the copy, assignment, move, and construction operations of the contained type
4. the number of elements in the container (the size of the problem)
5. whether the type has trivial versions of the point-3 operations
6. whether the type is POD
Point 1 is important because containers do allocate from time to time, and it matters a lot whether they allocate using the CRT "new" or some user-defined operation, like pool allocation, a freelist, or others.
Point 2 is because some containers (say A) will lose time copying stuff around, and the bigger the type, the bigger the overhead. The problem is that when comparing to another container B, A may win over B for small types and lose for larger types.
Point 3 is the same as point 2, except it multiplies the cost by some weighting factor.
Point 4 is a question of big-O mixed with cache issues. Containers with bad complexity can largely outperform low-complexity containers for small numbers of elements (like vector vs. map, because the vector's cache locality is good, while map fragments the memory). Then, at some crossing point, they lose, because the contained overall size starts to "leak" into main memory and cause cache misses, on top of the fact that the asymptotic complexity starts to be felt.
Point 5 is about compilers being able to elide operations that are empty or trivial at compile time. This can greatly optimize some operations, because the containers are templates, so each type will have its own performance profile.
Point 6 is the same as point 5: PODs can benefit from the fact that copy construction is just a memcpy, and some containers can have a specific implementation for these cases, using partial template specialization or SFINAE to select algorithms according to the traits of T.
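Point 6 can be sketched as follows: a hypothetical copy helper that picks memcpy for trivially copyable types and falls back to element-wise copy construction otherwise (illustrative only, not boost's actual implementation):

```cpp
#include <cstddef>
#include <cstring>
#include <new>
#include <type_traits>

// Hypothetical helper: bulk copy-construct n objects of T from src into
// (possibly raw) storage at dst, dispatching on a compile-time trait.
template <typename T>
void copy_construct_n(const T* src, T* dst, std::size_t n) {
    if constexpr (std::is_trivially_copyable_v<T>) {
        std::memcpy(dst, src, n * sizeof(T));  // POD fast path: one memcpy
    } else {
        for (std::size_t i = 0; i < n; ++i)
            ::new (static_cast<void*>(dst + i)) T(src[i]);  // generic path
    }
}
```

The `if constexpr` branch is resolved at compile time, so each instantiated type pays only for the path it needs; pre-C++17 code would do the same with tag dispatch or SFINAE.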
About the flat map
Apparently the flat map is a sorted vector wrapper, like Loki's AssocVector, but with some supplementary modernizations coming with C++11, exploiting move semantics to accelerate insertion and deletion of single elements.

This is still an ordered container. Most people usually don't need the ordering part, hence the existence of unordered containers.

Have you considered that maybe you need a flat_unorderedmap? Which would be something like google::sparse_map or something like that: an open address hash map.
The problem with open address hash maps is that at the time of rehash they have to copy everything around to the new extended flat land, whereas a standard unordered map just has to recreate the hash index while the allocated data stays where it is. The disadvantage, of course, is that the memory is fragmented like hell.
The criterion for a rehash in an open address hash map is when the capacity exceeds the size of the bucket vector multiplied by the load factor. A typical load factor is 0.8, so you need to care about that: if you can pre-size your hash map before filling it, always pre-size to intended_filling * (1/0.8) + epsilon. This guarantees never having to spuriously rehash and recopy everything during filling.
The advantage of closed address maps (std::unordered..) is that you don’t have to care about those parameters.
But boost's flat_map is an ordered vector, so it will always have log(N) asymptotic complexity, which is worse than the open address hash map's (amortized constant time). You should consider that as well.
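The sorted-vector idea behind flat_map can be sketched in a few lines (a toy illustration, not boost's class):

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Toy flat map: key/value pairs kept sorted by key in one contiguous vector.
// find is O(log N) via binary search; insert is O(N) because the tail shifts.
struct ToyFlatMap {
    std::vector<std::pair<int, int>> data;  // invariant: sorted by .first

    void insert(int key, int value) {
        auto pos = std::lower_bound(data.begin(), data.end(), key,
            [](const std::pair<int, int>& p, int k) { return p.first < k; });
        if (pos != data.end() && pos->first == key)
            pos->second = value;             // overwrite existing key
        else
            data.insert(pos, {key, value});  // shifts the tail right: O(N)
    }

    const int* find(int key) const {
        auto pos = std::lower_bound(data.begin(), data.end(), key,
            [](const std::pair<int, int>& p, int k) { return p.first < k; });
        return (pos != data.end() && pos->first == key) ? &pos->second : nullptr;
    }
};
```

The contiguous storage is what buys the cache locality discussed above; the price is the O(N) shift on every out-of-order insert.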
Benchmark results
This is a test involving different maps (with an int key and __int64/somestruct as value) and std::vector. Tested types information:
C++
typeid=__int64 . sizeof=8 . ispod=yes
typeid=struct MediumTypePod . sizeof=184 . ispod=yes

Insertion
EDIT: My previous results included a bug: they actually tested ordered insertion, which exhibited very fast behavior for the flat maps. I left those results further down because they are interesting.
This is the correct test:
I have checked the implementation: there is no such thing as a deferred sort implemented in the flat maps here. Each insertion sorts on the fly, so this benchmark exhibits the asymptotic tendencies:
map : N * log(N)
hashmaps : amortized N
vector and flatmaps : N * N
Warning: hereafter, the test for std::map and both flat_maps is buggy and actually tests ordered insertion:
We can see that ordered insertion results in back pushing and is extremely fast. However, from non-charted results of my benchmark, I can also say that this is not near the optimality of back-insertion in a vector, which is 3 million cycles; we observe 4.8M here for boost (160% of the optimal).
Random search of 3 elements (clocks renormalized to 1)
in size = 100
in size = 10000
Iteration
over size 100 (only MediumPod type)
over size 10000 (only MediumPod type)
best regards