IDictionary Options - Performance Test - SortedList vs. SortedDictionary vs. Dictionary vs. Hashtable
2012-06-21 10:53
http://blog.bodurov.com/Performance-SortedList-SortedDictionary-Dictionary-Hashtable
Please note that the advantage of Hashtable over the generic Dictionary for insert and search operations demonstrated here exists because the tests are based on the NON-generic IDictionary interface, so each insert or search action is accompanied by a check of the key type. For more information see Sean's comment below. (Thanks Sean!) Without that check, Dictionary seems to perform better than Hashtable. The rest of the test conclusions are not affected. So to summarize: the generic Dictionary is the absolute winner for insert and search operations if used through the generic dictionary interface or directly.

This is a sequence of tests comparing the performance results for four different implementations of IDictionary: the generic Dictionary, the generic SortedDictionary, the old non-generic Hashtable, and the generic SortedList. I compared the following parameters: memory used in bytes, time for insertion in ticks, time for item search in ticks, and time for looping with foreach in ticks. The test was performed 8,000 times across the four implementations, in random order, so each implementation was tested at least 1,000 times. I performed the tests in five stages, to observe the relationship between the number of entries and the performance. In the first stage the collections had 50 items, in the second 500, in the third 5,000, in the fourth 50,000, and in the fifth 100,000.

In this particular test, lower numbers for memory usage or time taken mean better performance. So if we want to present a performance chart for each of the parameters, we have to derive a performance coefficient from the raw data.
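Sean's point about the non-generic interface can be seen directly in code: all four collections implement System.Collections.IDictionary, so a harness like the one below exercises them through that one interface, which is exactly the path that adds a runtime key-type check for the generic Dictionary. This is a minimal sketch, not the article's harness; the class and method names are made up for illustration:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public class FourImplementations
{
    // Insert and read back through the NON-generic IDictionary interface.
    // For Dictionary<string, string> this path verifies the key and value
    // types at runtime on every call, which is the overhead Sean describes.
    public static string InsertAndFetch(IDictionary dict)
    {
        dict.Add("key1", "value1");
        return (string)dict["key1"];
    }

    public static void Main()
    {
        Console.WriteLine(InsertAndFetch(new Dictionary<string, string>()));
        Console.WriteLine(InsertAndFetch(new SortedDictionary<string, string>()));
        Console.WriteLine(InsertAndFetch(new Hashtable()));
        Console.WriteLine(InsertAndFetch(new SortedList()));
    }
}
```

Calling the same four objects through IDictionary<string, string> (or directly) skips that runtime check, which is why the summary above crowns the generic Dictionary when used generically.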
I have used the following formula to calculate the performance coefficient:

Performance coefficient for value x = min(value 1, value 2, ... value n) / value x

This way, because the best-performing value is the lowest one, it is transformed into a coefficient of 1, and any other value becomes a fraction of that.

This is the chart of the memory usage. The results stay consistent as the number of items in the collections increases. The best memory footprint is the SortedList's, followed by Hashtable and SortedDictionary; the Dictionary has the highest memory usage. Despite all that, we have to note that the differences are not significant, and unless your solution requires extreme sensitivity about memory usage, you should consider the timing parameters (time taken for insert operations and time taken for searching a key) as more important. It is important to note that this test does not take into consideration the effects of garbage collection and thus can only be read in a very general way.

This is the chart for the time taken for insert operations. When the number of records is small, the differences between all four implementations are not significant, but with the increase of items in the collection the performance of the SortedList drops dramatically; SortedDictionary is better, but still takes significantly more time for inserts than the other two implementations. Hashtable is next in the list, and the ultimate leader is the generic Dictionary.

This is the chart for the time taken for search operations. The absolute leader is Hashtable, but the test does not consider the type of item being stored; that could be a subject for a future test. The next best performer is the generic Dictionary, followed by the other two implementations. The differences here between SortedList and SortedDictionary are not significant.
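The coefficient formula above can be sketched in a few lines of C#. The sample values here are the Insert Ticks averages from the 50-entry run below (Dictionary, SortedDictionary, Hashtable, SortedList); the class name is made up for illustration:

```csharp
using System;
using System.Linq;

public class CoefficientDemo
{
    // Performance coefficient for value x = min(all values) / x, so the
    // best (lowest) raw value maps to 1.00 and the rest to fractions of it.
    public static double[] Coefficients(double[] values)
    {
        double min = values.Min();
        return values.Select(v => min / v).ToArray();
    }

    public static void Main()
    {
        // Insert Ticks averages from the 50-entry stage:
        // Dictionary, SortedDictionary, Hashtable, SortedList
        double[] insertTicks = { 3019.60, 5000.54, 2775.27, 4350.48 };
        foreach (double c in Coefficients(insertTicks))
            Console.WriteLine("{0:N2}", c); // prints 0.92, 0.55, 1.00, 0.64
    }
}
```

The printed coefficients match the "Performance Coefficient" rows in the raw data: Hashtable had the lowest insert time in that stage, so it gets 1.00.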
This is the chart for the time taken for for-each collection loop operations: Here the leader is SortedList, then Dictionary, consistently better than Hashtable, and the worst performer is SortedDictionary.

Here you can see the task manager during the test; it shows us a picture of the memory usage. Area A is the insertion phase, when more and more memory is being allocated. Area B is from the end of the insertion until the garbage collection of the object.

This is the code I used for this test. The variable NumberInsertedKeys was changed to 50, 500, 5000, 50000, or 100000 for the different stages.

using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;

/* to compile enter in the command prompt:
C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\csc IDictTest.cs
to run enter in the command prompt:
IDictTest */

namespace IDictTest
{
    public class RunResult
    {
        public decimal MemoryUsed;
        public decimal InsertTicks;
        public decimal SearchTicks;
        public decimal ForEachTicks;
    }

    public class Program
    {
        private static int SearchIndex = 27;
        //private const int NumberInsertedKeys = 50;
        //private const int NumberInsertedKeys = 500;
        //private const int NumberInsertedKeys = 5000;
        private const int NumberInsertedKeys = 50000;
        //private const int NumberInsertedKeys = 100000;
        private const int NumberTests = 8000;
        private static readonly string[] Letters =
            { "A", "B", "C", "D", "E", "F", "G", "H", "I", "J" };

        public static void Main(string[] args)
        {
            try
            { // TRY STARTS HERE ----------
                List<RunResult> listDictionary = new List<RunResult>();
                List<RunResult> listSortedDictionary = new List<RunResult>();
                List<RunResult> listHashtable = new List<RunResult>();
                List<RunResult> listSorderList = new List<RunResult>();

                Stopwatch watch = Stopwatch.StartNew();
                for (int i = 0; i < NumberTests; i++)
                {
                    SearchIndex += 1;
                    Random rand = new Random();
                    int randInt = rand.Next(0, 4);
                    if (randInt == 0)
                    {
                        listDictionary.Add(
                            Test("Dictionary", new Dictionary<string, string>(), i));
                    }
                    else
                    if (randInt == 1)
                    {
                        listSortedDictionary.Add(
                            Test("SortedDictionary",
                                 new SortedDictionary<string, string>(), i));
                    }
                    else if (randInt == 2)
                    {
                        listHashtable.Add(Test("Hashtable", new Hashtable(), i));
                    }
                    else if (randInt == 3)
                    {
                        listSorderList.Add(Test("SortedList", new SortedList(), i));
                    }
                }
                Console.Clear();
                Msg("Time taken (minutes): {0} or about {1} minutes and {2} seconds",
                    watch.Elapsed.TotalMinutes, watch.Elapsed.Minutes, watch.Elapsed.Seconds);

                RunResult resultDict = CalculateAvg(listDictionary);
                RunResult resultSortDict = CalculateAvg(listSortedDictionary);
                RunResult resultHash = CalculateAvg(listHashtable);
                RunResult resultSortList = CalculateAvg(listSorderList);

                RunResult min = new RunResult
                {
                    MemoryUsed = Math.Min(Math.Min(Math.Min(resultDict.MemoryUsed,
                        resultSortDict.MemoryUsed), resultHash.MemoryUsed),
                        resultSortList.MemoryUsed),
                    InsertTicks = Math.Min(Math.Min(Math.Min(resultDict.InsertTicks,
                        resultSortDict.InsertTicks), resultHash.InsertTicks),
                        resultSortList.InsertTicks),
                    SearchTicks = Math.Min(Math.Min(Math.Min(resultDict.SearchTicks,
                        resultSortDict.SearchTicks), resultHash.SearchTicks),
                        resultSortList.SearchTicks),
                    ForEachTicks = Math.Min(Math.Min(Math.Min(resultDict.ForEachTicks,
                        resultSortDict.ForEachTicks), resultHash.ForEachTicks),
                        resultSortList.ForEachTicks)
                };

                // print the results, each with its own test count
                PrintResults(resultDict, listDictionary.Count, min, "Dictionary");
                PrintResults(resultSortDict, listSortedDictionary.Count, min, "SortedDictionary");
                PrintResults(resultHash, listHashtable.Count, min, "Hashtable");
                PrintResults(resultSortList, listSorderList.Count, min, "SortedList");
            } // TRY ENDS HERE ----------
            catch (Exception ex)
            {
                Msg("{0}", ex);
            }
        }

        private static RunResult CalculateAvg(List<RunResult> list)
        {
            decimal sumMemory = 0;
            decimal sumInsert = 0;
            decimal sumSearch = 0;
            decimal sumForEach = 0;
            for (int i = 0; i < list.Count; i++)
            {
                RunResult curr = list[i];
                sumMemory += curr.MemoryUsed;
                sumInsert += curr.InsertTicks;
                sumSearch += curr.SearchTicks;
                sumForEach +=
                    curr.ForEachTicks;
                // uncomment to print each line
                //Msg("{0,11} {1,13} {2,14}",
                //    curr.MemoryUsed, curr.InsertTicks, curr.SearchTicks);
            }
            return new RunResult
            {
                MemoryUsed = sumMemory / list.Count,
                InsertTicks = sumInsert / list.Count,
                SearchTicks = sumSearch / list.Count,
                ForEachTicks = sumForEach / list.Count,
            };
        }

        private static void PrintResults(RunResult result, int count,
                                         RunResult min, string name)
        {
            Msg("--------- Results for {0}", name);
            Msg("# Tests {0}", count);
            Msg("Memory Used Insert Ticks Search Ticks ForEach Ticks");
            Msg("Average Values:");
            Msg("{0,11:N} {1,13:N} {2,14:N} {3,14:N}",
                result.MemoryUsed, result.InsertTicks,
                result.SearchTicks, result.ForEachTicks);
            Msg("Performance Coefficient:");
            Msg("{0,11:N} {1,13:N} {2,14:N} {3,14:N}",
                min.MemoryUsed / result.MemoryUsed,
                min.InsertTicks / result.InsertTicks,
                min.SearchTicks / result.SearchTicks,
                min.ForEachTicks / result.ForEachTicks);
            Msg("");
        }

        private static void Msg(string name, params object[] args)
        {
            Console.WriteLine(name, args);
        }

        private static RunResult Test(string name, IDictionary dict, int n)
        {
            Console.Clear();
            Msg("Currently executing test {1} of {2} for {0} object",
                name, n + 1, NumberTests);
            RunResult rr = new RunResult();
            Stopwatch watch;
            Random rand = new Random();

            long memoryStart = System.GC.GetTotalMemory(true);
            long insertTicksSum = 0;
            for (int i = 0; i < NumberInsertedKeys; i++)
            {
                string key = GetRandomLetter(rand, i) + "_key" + i;
                string value = "value" + i;
                watch = Stopwatch.StartNew();
                dict.Add(key, value);
                watch.Stop();
                insertTicksSum += watch.ElapsedTicks;
            }
            rr.MemoryUsed = System.GC.GetTotalMemory(true) - memoryStart;
            rr.InsertTicks = insertTicksSum;

            watch = Stopwatch.StartNew();
            object searchResult = dict["C_key" + SearchIndex];
            watch.Stop();
            rr.SearchTicks = watch.ElapsedTicks;

            watch = Stopwatch.StartNew();
            foreach (var curr in dict) { }
            watch.Stop();
            rr.ForEachTicks = watch.ElapsedTicks;

            return rr;
        }

        private static string GetRandomLetter(Random rand, int i)
        {
            if (i == SearchIndex)
            {
                return
                    "C";
            }
            return Letters[rand.Next(0, 10)];
        }
    }
}

The computer used for the test has the following characteristics:
Processor: Intel(R) Core(TM)2 Duo CPU T7300 @ 2.00 GHz
Memory (RAM): 2.00 GB
System type: 32-bit Operating System, Windows Vista

This is the raw data. Memory Used is in bytes; the insert, search, and looping times are in ticks:

8000 tests 50 collection entries
Time taken (minutes): 0.324318411666667 or about 0 minutes and 19 seconds
--------- Results for Dictionary
# Tests 2063
Memory Used Insert Ticks Search Ticks ForEach Ticks
Average Values:
   4,748.48      3,019.60        126.40        198.21
Performance Coefficient:
       0.73          0.92          0.83          0.80
--------- Results for SortedDictionary
# Tests 2063
Memory Used Insert Ticks Search Ticks ForEach Ticks
Average Values:
   4,345.04      5,000.54        193.05        377.99
Performance Coefficient:
       0.80          0.55          0.55          0.42
--------- Results for Hashtable
# Tests 2063
Memory Used Insert Ticks Search Ticks ForEach Ticks
Average Values:
   4,096.38      2,775.27        105.36        209.91
Performance Coefficient:
       0.85          1.00          1.00          0.75
--------- Results for SortedList
# Tests 2063
Memory Used Insert Ticks Search Ticks ForEach Ticks
Average Values:
   3,488.50      4,350.48        167.89        158.41
Performance Coefficient:
       1.00          0.64          0.63          1.00

8000 tests 500 collection entries
Time taken (minutes): 1.26410570333333 or about 1 minutes and 15 seconds
--------- Results for Dictionary
# Tests 2158
Memory Used Insert Ticks Search Ticks ForEach Ticks
Average Values:
  53,353.52     28,345.86        123.94      1,051.16
Performance Coefficient:
       0.73          0.96          0.92          0.71
--------- Results for SortedDictionary
# Tests 2158
Memory Used Insert Ticks Search Ticks ForEach Ticks
Average Values:
  48,949.40     67,007.68        232.22      2,562.93
Performance Coefficient:
       0.80          0.41          0.49          0.29
--------- Results for Hashtable
# Tests 2158
Memory Used Insert Ticks Search Ticks ForEach Ticks
Average Values:
  48,053.46     27,341.10        114.07      1,303.74
Performance Coefficient:
       0.81          1.00          1.00          0.57
--------- Results for SortedList
# Tests 2158
Memory Used Insert Ticks Search Ticks ForEach Ticks
Average Values:
  39,077.27     54,255.06        209.31        744.81
Performance Coefficient:
       1.00          0.50          0.54          1.00

8000 tests 5000 collection entries
Time taken (minutes): 11.14550681 or about 11 minutes and 8 seconds
--------- Results for Dictionary
# Tests 1578
Memory Used Insert Ticks Search Ticks ForEach Ticks
Average Values:
 527,352.00    274,339.83        171.81      9,724.85
Performance Coefficient:
       0.80          0.98          0.85          0.72
--------- Results for SortedDictionary
# Tests 1578
Memory Used Insert Ticks Search Ticks ForEach Ticks
Average Values:
 498,944.56    853,251.69        288.00     22,777.03
Performance Coefficient:
       0.85          0.32          0.50          0.31
--------- Results for Hashtable
# Tests 1578
Memory Used Insert Ticks Search Ticks ForEach Ticks
Average Values:
 480,048.17    268,945.01        145.19     11,911.69
Performance Coefficient:
       0.88          1.00          1.00          0.59
--------- Results for SortedList
# Tests 1578
Memory Used Insert Ticks Search Ticks ForEach Ticks
Average Values:
 424,512.12    913,408.34        247.98      7,011.65
Performance Coefficient:
       1.00          0.29          0.59          1.00

8000 tests 50000 collection entries
Time taken (minutes): 299.723793791667 or about 4 hours, 59 minutes and 43 seconds
--------- Results for Dictionary
# Tests 1574
Memory Used Insert Ticks Search Ticks ForEach Ticks
Average Values:
 5,427,604.19  2,818,130.50      302.05    100,556.45
Performance Coefficient:
       0.82          1.00          0.87          0.73
--------- Results for SortedDictionary
# Tests 1574
Memory Used Insert Ticks Search Ticks ForEach Ticks
Average Values:
 5,318,944.00 10,506,892.94      525.14    294,219.13
Performance Coefficient:
       0.84          0.27          0.50          0.25
--------- Results for Hashtable
# Tests 1574
Memory Used Insert Ticks Search Ticks ForEach Ticks
Average Values:
 5,005,100.00  2,867,786.62      263.39    111,733.71
Performance Coefficient:
       0.89          0.98          1.00          0.66
--------- Results for SortedList
# Tests 1574
Memory Used Insert Ticks Search Ticks ForEach Ticks
Average Values:
 4,443,275.91 82,146,134.48      493.06     73,798.76
Performance Coefficient:
       1.00          0.03          0.53          1.00
Moritz Kroll Posted on 1/3/2008 11:45:36 AM Thanks for these interesting measurements! I was exactly looking for this kind of information. People considering which collection to use, might also be interested in the MSDN page "SortedList and SortedDictionary Collection Types" (http://msdn2.microsoft.com/en-us/library/5z658b67.aspx), which tells us the upper bounds of some operations on these collections in big O notation. |
Vinit Posted on 4/25/2008 1:17:46 AM Thanks for the above test! I was also looking for this option. I guess I will go with Hashtable, as it looks the fastest of the lot. |
Flavius Posted on 6/11/2008 5:18:08 AM Thanks! I just started to think about what to use and then found your article. It's good! |
Flavius Posted on 6/11/2008 5:55:46 AM You forgot to add measure units (bytes) for memory allocation chart. |
Vladimir Bodurov Posted on 6/11/2008 8:15:29 AM Thanks Flavius, I've added that now. |
Gary Chojnowski Posted on 8/12/2008 8:37:12 PM I found your results useful. However, it may be important to point out that the hashtable does not support generics. |
Tanveer Ansari Posted on 8/25/2008 6:23:02 AM Thanks for an article with immediate practical use. It helped in finalizing which data structure to use for my app. As Gary has pointed out above, the generic Dictionary has the advantage of type safety, so it would seem that we should also factor in the unboxing overhead for the Hashtable's value object. In my test for retrieval time using the Hashtable with unboxing vs. the generic Dictionary, the Hashtable still performed three times as fast as the generic Dictionary.
--Time in ticks--
Time taken to fetch all elements of Dictionary<> was 10559
Time taken to fetch all elements of Hashtable without unboxing was 1431
Time taken to fetch all elements of Hashtable WITH unboxing was 2846 |
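The boxing difference Tanveer describes can be shown in a minimal sketch (class and method names are made up for illustration): the non-generic Hashtable stores values as object, so an int is boxed on insert and must be cast (and unboxed) on every read, while Dictionary<string, int> stores and returns the int directly:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public class UnboxingDemo
{
    // Hashtable stores values as object: the int is boxed on insert
    // and every read needs a cast plus an unbox.
    public static int ViaHashtable()
    {
        var table = new Hashtable();
        table["answer"] = 42;          // boxes the int into an object
        return (int)table["answer"];   // cast + unbox on read
    }

    // The generic dictionary stores the int directly: no boxing, no cast.
    public static int ViaDictionary()
    {
        var dict = new Dictionary<string, int>();
        dict["answer"] = 42;
        return dict["answer"];
    }

    public static void Main()
    {
        Console.WriteLine(ViaHashtable() + ViaDictionary()); // prints 84
    }
}
```

The unboxing cost only applies when the stored values are value types; for reference types (like the strings in the article's test) the read is just a cast.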
JR Posted on 2/15/2009 1:27:20 PM This article was awesome! Thanks for sharing your findings with us. It was very helpful to the code I am writing. One thing that would have been nice to see would have been insertion time for the sorted collections in the case that the strings were already sorted, as that happens sometimes. Thanks! |
Jeffery Posted on 4/17/2009 12:28:42 AM Hi, I just did the same test but using ASP.NET 3.5 and IIS 7.0, and the results seem to indicate otherwise: Dictionary is better than SortedDictionary! Take a look at http://jefferytay.wordpress.com/2009/04/16/performance-of-generics-sorteddictionary-and-dictionary/ for specifics |
Vladimir Bodurov Posted on 4/17/2009 12:47:25 AM Hi Jeffery, First of all thank you for the excellent work you have done! I like the attitude of never trusting any source on the Internet and always checking yourself. Undoubtedly that's the way to go. However, there is one problem that I see with your test: you always process SortedDictionary before Dictionary. If you read the description of my tests, I tried them in different orders, and what is more important, each was performed separately. When you start performing operations on the SortedDictionary you initiate actions of the virtual machine related to large memory allocation, and then when you reach the Dictionary the hard work is already done. I would encourage trying the same experiment in a different order. But it is true that my test was with .NET 2.0, so it may have changed in 3.5. |
ET Posted on 4/23/2009 1:00:30 PM I think you're missing an important variable/dimension in your tests - the number of items in the collection. |
Vladimir Bodurov Posted on 4/23/2009 1:14:18 PM Yes, you are right. In the coming days I will make a revision of the test, as it seems that, as pointed out by Jeffery, tests done with a bigger sample show Dictionary performing better than SortedDictionary. |
Jeffery Posted on 4/23/2009 8:16:50 PM Hi Vladimir, Actually, based on your results, I got my team to convert all Dictionary objects to SortedDictionary in their development, because I remembered that Dictionary does not maintain any sort order, and SortedDictionary, which uses a red-black tree, improves searching speed. However, when I was doing some coding, I realized that for some weird reason my new code seemed to be performing much slower; that's when I decided to take our code for a test run :) |
Vladimir Bodurov Posted on 4/23/2009 9:43:43 PM Sorry man, my data is just not big enough, and when I was doing it something influenced SortedDictionary to perform better than Dictionary, and because of the small sample that wasn't normalized. I'll soon have results with a very large sample, but my preliminary data shows that you seem to be right and Dictionary performs better than SortedDictionary. |
Jeffery Posted on 4/29/2009 2:32:08 PM Hi Vladimir, Thanks for your extremely detailed test, which tallies with mine. Now it just makes me wonder: since Dictionary is better than SortedDictionary, why is the latter still there? Any ideas? |
Vladimir Bodurov Posted on 4/29/2009 3:20:00 PM Because if you try foreach iteration with SortedDictionary the keys will be sorted while Dictionary will not give you sorted keys. |
Jeffery Posted on 4/30/2009 7:01:55 PM Well, if I really did a foreach, that is a possibility, but it does not make sense for SortedDictionary to perform a sort on the key list each and every time we request it; I do not believe Microsoft developers would write such lousy code. The interesting thing is that I cannot seem to find the source for Dictionary using Reflector; any idea where it resides? I will try to perform a test next week to check the differences between doing a foreach and direct indexing for Dictionary and SortedDictionary; there must be some benefit to SortedDictionary. |
Vladimir Bodurov Posted on 4/30/2009 10:37:13 PM SortedDictionary does not perform a sort each time you request a key; it simply stores the data as a binary tree, and that's why it can iterate the keys in order. If you had to show the keys in order, how would you do it with a Dictionary? If the keys are not just 1, 2, 3, 4 but something like 2, 24, 255, 999 and you don't know the gaps? With SortedDictionary you just use foreach. The code of Dictionary is in mscorlib, System.Collections.Generic. |
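Vladimir's point above can be shown in a short sketch (names are made up for illustration): because SortedDictionary stores entries in a search tree, a plain foreach yields the keys 2, 24, 255, 999 in order no matter how they were inserted, with no explicit sort anywhere:

```csharp
using System;
using System.Collections.Generic;

public class SortedIterationDemo
{
    // Insert keys out of order; SortedDictionary's tree keeps them sorted,
    // so plain enumeration returns them in ascending key order.
    public static List<int> OrderedKeys()
    {
        var sorted = new SortedDictionary<int, string>();
        sorted.Add(255, "c");
        sorted.Add(2, "a");
        sorted.Add(999, "d");
        sorted.Add(24, "b");

        var keys = new List<int>();
        foreach (KeyValuePair<int, string> pair in sorted)
            keys.Add(pair.Key);
        return keys;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", OrderedKeys())); // prints 2, 24, 255, 999
    }
}
```

With a plain Dictionary the same foreach enumerates in an unspecified order, and producing sorted output would require collecting and sorting the keys yourself.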
Nugpot Posted on 5/5/2009 4:10:36 PM Your pictures look nice, and it looks like you tried your best, but your results are definitely not accurate. I did similar tests today and went on the web to see what other people found. Just to help everybody, I am going to add my results. This is for adding 2,000,000 items to each list, timing how long it takes to load the data, search 3 values in the data (one at the end, one at the beginning, and one that doesn't exist), then seeing how long it takes to unload the data. Here are the results. Not your pretty pictures, but at least values that make sense.
5/5/2009 2:22:39 PM - Array
5/5/2009 2:22:40 PM - End Adding data: 00:00:00.7499952
5/5/2009 2:22:40 PM - Search complete: 00:00:00.0312498
5/5/2009 2:22:40 PM - Data Cleared: 00:00:00.0312498
5/5/2009 2:22:40 PM - Total time: 00:00:00.8124948
5/5/2009 2:22:40 PM - ArrayList
5/5/2009 2:22:41 PM - End Adding data: 00:00:00.7343703
5/5/2009 2:22:41 PM - Search complete: 00:00:00.0312498
5/5/2009 2:22:41 PM - Data Cleared: 00:00:00.0312498
5/5/2009 2:22:41 PM - Total time: 00:00:00.7968699
5/5/2009 2:22:41 PM - List<>
5/5/2009 2:22:42 PM - End Adding data: 00:00:00.7343703
5/5/2009 2:22:42 PM - Search complete: 00:00:00.0624996
5/5/2009 2:22:42 PM - Data Cleared: 00:00:00.0312498
5/5/2009 2:22:42 PM - Total time: 00:00:00.8281197
5/5/2009 2:22:42 PM - HashTable
5/5/2009 2:22:44 PM - End Adding data: 00:00:02.2812354
5/5/2009 2:22:44 PM - Search complete: 00:00:00
5/5/2009 2:22:44 PM - Data Cleared: 00:00:00.0468747
5/5/2009 2:22:44 PM - Total time: 00:00:02.3281101
5/5/2009 2:22:44 PM - Dictionary<>
5/5/2009 2:22:46 PM - End Adding data: 00:00:01.4218659
5/5/2009 2:22:46 PM - Search complete: 00:00:00
5/5/2009 2:22:46 PM - Data Cleared: 00:00:00.0312498
5/5/2009 2:22:46 PM - Total time: 00:00:01.4687406
5/5/2009 2:22:46 PM - SortedList<>
5/5/2009 3:57:03 PM - End Adding data: 01:34:17.1991614
5/5/2009 3:57:03 PM - Search complete: 00:00:00.1562490
5/5/2009 3:57:05 PM - Data Cleared: 00:00:01.7343639
5/5/2009 3:57:05 PM - Total time: 01:34:19.1678988
5/5/2009 3:57:05 PM - SortedDictionary<>
5/5/2009 3:57:16 PM - End Adding data: 00:00:11.4061770
5/5/2009 3:57:16 PM - Search complete: 00:00:00
5/5/2009 3:57:16 PM - Data Cleared: 00:00:00.0156249
5/5/2009 3:57:16 PM - Total time: 00:00:11.4218019 |
Vladimir Bodurov Posted on 5/5/2009 4:34:43 PM Thanks for posting your results Nugpot, but I really don't understand what you mean when you say "but your results are definitely not accurate", because your results are just like my results. In particular, your insert performance:
1. Dictionary - best
2. Hashtable - next
3. SortedDictionary - next
4. SortedList - worst
Just like my test. And your search performance:
1. Hashtable, Dictionary, SortedDictionary - you don't have a precise measurement
2. SortedList
Again, that fits my test. If you have different data the numbers will be different, but the order will be the same. It is also not clear what you mean by "unload data", as you didn't post a reference to where your code is. Your insert and search results are just like mine; I can't speak for the rest because I haven't seen it, but if you compare things they must be similar, not apples with screwdrivers. |
Shital Shah Posted on 10/13/2009 1:15:41 PM What was the hardware used for these tests? |
Sean Posted on 1/5/2010 1:52:21 PM FYI your average calculation is wrong. Averaging 2, 3, 4 would come out as ((((2 + 3) / 2) + 4) / 2) or 3.25, instead of 3. Also, you should time how long it takes to execute all the lookups at once and take the average off of that, instead of timing each individual one. I ran some tests designed purely to test the performance of lookups, with no calling of GC (as .NET should pick when for you in the real world), although I did semi-track inserts. The test was inserting X keys, then doing X*2 lookups (X where the key existed, X where it didn't). I let each of those tests of X keys run 10-15 times, and I came up with:
Key/Value      Keys     Lookup                                        Insert
string/string  20,000   Dictionary roughly 10% faster                 Dictionary roughly 30% faster
int/string     20,000   Dictionary roughly 50% faster                 Roughly equal
string/int     20,000   Dictionary roughly 10% faster                 Dictionary roughly 30% faster
int/string     200,000  Dictionary roughly 50% faster                 Dictionary roughly 30% faster
int/string     2,000    Dictionary roughly 50% faster                 Roughly equal
int/string     200      Dictionary roughly 50% faster                 Roughly equal
string/string  200      Dictionary roughly 10% faster                 Roughly equal
string/string  2,000    Dictionary roughly 10% faster                 Roughly equal
string/string  20,000   Dictionary roughly 10% faster                 Hashtable roughly 15% faster
string/string  200,000  Dictionary edged it out by about 3% each time Hashtable roughly 15% faster |
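Sean's arithmetic point is easy to verify: a pairwise running average weights later samples more heavily, while summing first and dividing once gives the true mean. A small sketch (class and method names made up for illustration):

```csharp
using System;
using System.Linq;

public class AverageDemo
{
    // The flawed approach Sean describes: fold each new sample into the
    // previous average, which gives later samples more weight.
    public static double RunningPairwiseAverage(double[] samples)
    {
        return samples.Aggregate((acc, x) => (acc + x) / 2);
    }

    // The correct approach: one sum, one division.
    public static double TrueAverage(double[] samples)
    {
        return samples.Sum() / samples.Length;
    }

    public static void Main()
    {
        double[] samples = { 2, 3, 4 };
        Console.WriteLine(RunningPairwiseAverage(samples)); // prints 3.25
        Console.WriteLine(TrueAverage(samples));            // prints 3
    }
}
```

For samples {2, 3, 4} the pairwise fold computes ((2 + 3) / 2 + 4) / 2 = 3.25, exactly the bias Sean describes, while the true mean is 3.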
Vladimir Bodurov Posted on 1/6/2010 4:32:33 AM Thanks for sharing your results Sean. You are right about the avg formula. The way I was doing it would shift the result toward the latest run, but in the end the result would still average out. Still, I corrected my code to make it cleaner, and I also extended my test with a foreach loop test to show why you are getting consistently better results for Dictionary compared to Hashtable. The thing is that you are not isolating the looping performance from the rest, and because of that, as you can see in my latest run, since Dictionary is better than Hashtable for looping you get a better overall result. All other tests are consistent with my previous results, and the old ones can be seen at:
http://blog.bodurov.com/images/IDictTest01_old.gif
http://blog.bodurov.com/images/IDictTest02_old.gif
http://blog.bodurov.com/images/IDictTest03_old.gif |
Jeffery Posted on 1/7/2010 5:56:08 PM Hi Vladimir, I'd appreciate it if you could change the link to my post to http://jefferytay.wordpress.com/2009/04/16/performance-of-generics-sorteddictionary-and-dictionary/ |
Sean Posted on 1/15/2010 12:58:05 PM I made a few slight changes to your code and came up with different results. With string keys, Dictionary performed searches roughly 10% faster with all combinations of keys I tried. With int keys, Dictionary performed searches roughly 40% faster with all combinations of keys I tried.
1: I created another test method called TestGeneric which accepts IDictionary<String, String>. If you look in Reflector, calling Dictionary through a non-generic IDictionary reference results in validation code being called at runtime to ensure the objects passed in are actually TKey (in this case string), as opposed to at compile time when it's generic (below is a sample of what it is actually doing). This appears to have provided the biggest performance increase.
void IDictionary.Add(object key, object value)
{
    Dictionary<TKey, TValue>.VerifyKey(key);
    Dictionary<TKey, TValue>.VerifyValueType(value);
    this.Add((TKey) key, (TValue) value);
}
2: I changed the key to C_key{i} instead of {random_letter}_key{i} (to make #3 possible).
3: I changed your search from a single lookup to NumberInsertedKeys lookups, just checking for each value that had been inserted as the key (so not using a foreach loop, to eliminate that possible conflict):
watch = Stopwatch.StartNew();
for (int i = 0; i < NumberInsertedKeys; i++)
{
    string key = "C_key" + i.ToString();
    string searchResult = dict[key];
}
watch.Stop();
4: I commented out the randomness that determines which collection is tested.
Also, when I commented out the garbage collection code used to test memory usage, the Dictionary tests seemed to improve more (I personally prefer to test without garbage collection, since in production my code will rely on .NET to select when to run it for me anyway). But that test does appear to be impacting the results of the other tests.
(Disclaimer: I only ran it with string keys and 50000 entries.) I switched it back to using random letters and looking up every entry and got roughly the same results; I had to make more changes to do that, though, by caching the random entries and reusing them.
Random rand = new Random();
for (int i = 0; i < NumberInsertedKeys; i++)
{
    Keys[i] = GetRandomLetter(rand, i) + "_key" + i;
    Values[i] = "value" + i;
}
........................
for (int i = 0; i < NumberSearches; i++)
{
    string key = Keys[i];     // "C_key" + i.ToString(); // GetRandomLetter(rand, i) + "_key" + i;
    string value = Values[i]; // "value" + i;
    .........
} |
Eric Posted on 3/3/2010 8:10:05 AM Thank you this was really very helpful. Many of the constructors for these collections support an initial capacity. While I understand that it is not always possible to predict this capacity, I do quite often make a range estimate. If I recall, the default initial capacity is quite low. This would result in frequent extensions of the List style collections. The dictionary style collections would expand more slowly. It might be interesting to examine some effects of initial capacity. Thanks again for your previous work. |
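Eric's suggestion maps to constructors that the framework really provides: Dictionary<TKey, TValue>, Hashtable, and SortedList all accept an initial capacity (SortedDictionary, being tree-based, has no capacity constructor). A minimal sketch, with the class name and the capacity value chosen only for illustration:

```csharp
using System.Collections;
using System.Collections.Generic;

public class CapacityDemo
{
    const int Expected = 50000; // an estimate of the final item count

    public static Dictionary<string, string> BuildPresized()
    {
        // Pre-sizing avoids the repeated grow-and-rehash (or, for the
        // array-backed SortedList, grow-and-copy) that starting from the
        // small default capacity causes.
        var dict = new Dictionary<string, string>(Expected);
        for (int i = 0; i < Expected; i++)
            dict["key" + i] = "value" + i;
        return dict;
    }

    public static void Main()
    {
        var dict = BuildPresized();
        var table = new Hashtable(Expected); // same idea, non-generic
        var list = new SortedList(Expected); // same idea, sorted, array-backed
        System.Console.WriteLine(dict.Count);
    }
}
```

When the estimate is wrong the collections still grow as usual; pre-sizing only removes the intermediate resize steps, it never limits capacity.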
Raj Aththanayake Posted on 4/10/2010 5:13:28 PM Excellent blog. I was looking for this type of information. I was comparing the SortedDictionary and the SortedList; now I know the SortedList is much better on performance. |
Joshua DeLong Posted on 5/26/2010 8:07:56 AM Has anyone tried this type of thing with millions? I'm looking at putting roughly 30 million integer pairs into a hashtable for lookup purposes only. Any ideas which storage method would be best? |
phoog Posted on 11/25/2010 3:54:20 AM Another factor: string.GetHashCode() iterates through the entire string. So presumably, the longer the keys, the worse performance you'll find from Dictionary. Another point about SortedList: If the items are added in order, its insertion performance is better than SortedDictionary, because SortedDictionary rebalances the tree, and SortedList just adds the item on the end without reshuffling anything. |
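phoog's second point can be sketched like this (names and sizes are made up for illustration): inserting already-ordered keys lets SortedList append at the end of its internal arrays, while reverse-ordered keys force it to shift every existing element on each insert:

```csharp
using System;
using System.Collections;
using System.Diagnostics;

public class SortedInsertDemo
{
    // Time how long it takes to add all the keys to a non-generic SortedList.
    public static long TimeInserts(int[] keys)
    {
        var list = new SortedList();
        var watch = Stopwatch.StartNew();
        foreach (int key in keys)
            list.Add(key, "value");
        watch.Stop();
        return watch.ElapsedTicks;
    }

    public static void Main()
    {
        const int n = 20000;
        var ordered = new int[n];
        var reversed = new int[n];
        for (int i = 0; i < n; i++)
        {
            ordered[i] = i;      // each insert lands at the end: no shifting
            reversed[i] = n - i; // each insert lands at the front: full shift
        }
        Console.WriteLine("ordered:  {0} ticks", TimeInserts(ordered));
        Console.WriteLine("reversed: {0} ticks", TimeInserts(reversed));
        // The ordered run is typically dramatically faster, which is
        // consistent with phoog's observation.
    }
}
```

This also explains the article's worst-case SortedList insert numbers: the test inserts keys in effectively random order, so many inserts land in the middle of the array.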
Scanpst Posted on 4/13/2011 6:02:20 AM Sean, Thanks for the change in the code. I tried it and it worked great for me. It was very helpful. Joshua, I have a project where I need to do this with millions also. Did you ever figure out what storage method worked best? Any tips would be great! |