您的位置：首页 > 其它

list 、set 、map 粗浅性能对比分析

2013-07-10 09:41 639 查看

分析

分类： j2se2013-07-08
00:45 732人阅读评论(0) 收藏举报

不知道有多少同学和我一样，工作五年了还没有仔细看过list、set的源码

，一直停留在老师教导的：“LinkedList插入性能比ArrayList好，LinkedList顺序遍历性能比ArrayList好”的世界里。可是真是如此么？本文很“肤浅”的对比和分析了几种常用的集合，“高手”可以就此打住不必往下阅读了。。。

本文分开介绍了List、Map、Set：

（测试环境：win7、jdk、4G、i3；文章示例为了节省篇幅，只会列出测试大体形式和遍历次数）

第一部分：List

1.add(E e)

==============================性能测试=================================

（1）表格统计的差异在于new的方式不同：

[java] view
plaincopy

for (int i = 0; i < 10000; i++) {

List<Integer> list = new LinkedList<Integer>();

for (int j = 0; j < 10000; j++) {

list.add(j);

}

list = null;

}

new方式	消耗时间（毫秒）
new LinkedList<Integer>()	3420
new ArrayList<Integer>()	3048
new ArrayList<Integer>(10000)	2280
new Vector<Integer>()	4045
new Vector<Integer>(10000)	3859

（2）我们变化一下集合的长度和遍历次数：

[java] view
plaincopy

for (int i = 0; i < 1000000; i++) {

List<Integer> list = new LinkedList<Integer>();

for (int j = 0; j < 100; j++) {

list.add(j);

}

list = null;

}

new方式	消耗时间（毫秒）
new LinkedList<Integer>()	2286
new ArrayList<Integer>()	2832
new ArrayList<Integer>(10000)	1714
new Vector<Integer>()	3937
new Vector<Integer>(10000)	3184

*************************************************源码分析*************************************************

似乎并没有看到期待的差异，若非“冒充下大牛”纠结一下区别：

1、若集合长度比较小（小于百位），LinkedList比ArrayList的插入性能稍胜一筹。

2、但是当长度达到五位数时，ArrayList的插入性能反而会比LinkedList好一些。

3、若你能预知ArrayList的长度，可以完胜LinkedList。（原因后面会说）

4、Vector的作用不在一个纬度上，不过也没有想象中的那么差。

LinkedList.add:

[java] view
plaincopy

先调用了自身的add：

public boolean add(E e) {

addBefore(e, header);

return true;

}

然后又调用了addBefore，表示插入在谁之前（有点绕，因为是双项链）：

private Entry<E> addBefore(E e, Entry<E> entry) {

Entry<E> newEntry = new Entry<E>(e, entry, entry.previous);

newEntry.previous.next = newEntry;

newEntry.next.previous = newEntry;

size++;

modCount++;

return newEntry;

}

用于维护指定前后关系：

private static class Entry<E> {

E element;

Entry<E> next;

Entry<E> previous;

Entry(E element, Entry<E> next, Entry<E> previous) {

this.element = element;

this.next = next;

this.previous = previous;

}

}

也就是说LinkedList的上下级维护关系是借助了指针，若任意位置插入，则是先查找到before前的entry，然后重指原有位置前后元素的指针。

ArrayList.add：

首先判断下是否超过了最大长度数组长度，若超过会重新扩展拷贝（这也就说明了，为什么当我们在newArrayList时指定合适的长度会提升性能）

[java] view
plaincopy

相比LinkedList来说就简单了狠多：

public boolean add(E e) {

ensureCapacity(size + 1); // Increments modCount!!

elementData[size++] = e;

return true;

}

//首先判断下是否超过了最大长度数组长度，若超过会重新扩展拷贝（这也就说明了，为什么当我们在newArrayList时指定合适的长度会提升性能）

public void ensureCapacity(int minCapacity) {

modCount++;

int oldCapacity = elementData.length;

if (minCapacity > oldCapacity) {

Object oldData[] = elementData;

int newCapacity = (oldCapacity * 3)/2 + 1;

if (newCapacity < minCapacity)

newCapacity = minCapacity;

// minCapacity is usually close to size, so this is a win:

elementData = Arrays.copyOf(elementData, newCapacity);

}

}

//默认长度10

public ArrayList() {

this(10);

}

通过上面的源码可以理解出为什么当集合的长度会左右LinkedList和ArrayList的插入性能PK。影响ArrayList性能损耗的一大元凶就是数组的不断增长拷贝，LinkedList由于自身的特性当数据插入时需要借助特殊的内部类“Entry”来维护插入数据的上下级链。

2.add(int index, E element)

通过上面的分析出现了一个想法，如果是

随机插入位置呢？

==============================性能测试=================================

(1)

[java] view
plaincopy

for (int i = 0; i < 100; i++) {

List<Integer> list = new LinkedList<Integer>();

for (int j = 0; j < 10000; j++) {

list.add(list.size() / 2, j);

}

list = null;

}

new方式	消耗时间
new LinkedList<Integer>()	15782
new ArrayList<Integer>()	3513
new ArrayList<Integer>(10000)	3490

(2)我们变化一下集合的长度和遍历次数：

[java] view
plaincopy

for (int i = 0; i < 100000; i++) {

List<Integer> list = new LinkedList<Integer>();

for (int j = 0; j < 100; j++) {

list.add(list.size() / 2, j);

}

list = null;

}

new方式	消耗时间
new LinkedList<Integer>()	880
new ArrayList<Integer>()	1262
new ArrayList<Integer>(10000)	2308

*************************************************源码分析*************************************************

从纯插入性能上来说，随机插入LikedList要比ArrayList性能好：（至于为什么当LinkedList集合长度为10000时性能变化会如此之大，我们后面的get会有介绍）

LikedList.add(int index, E element) :

大体分为两个步骤

[java] view
plaincopy

public void add(int index, E element) {

addBefore(element, (index==size ? header : entry(index)));

}

//第一步获取到addBefore需要的插入位置（entry），第二步如上

private Entry<E> entry(int index) {

if (index < 0 || index >= size)

throw new IndexOutOfBoundsException("Index: "+index+

", Size: "+size);

Entry<E> e = header;

if (index < (size >> 1)) {

for (int i = 0; i <= index; i++)

e = e.next;

} else {

for (int i = size; i > index; i--)

e = e.previous;

}

return e;

}

所以从代码上看，LikedList.add(int index, E element) 没有理由在集合长度为10000时会有这么大的差异，除非entry(int index)存在问题（我们后面可能要说的hash也正是为了避免遍历）。

ArrayList.add(int index, E element) :

大体分为三步

第二步暴漏了为什么现在初始指定了集合长度反而不一定是好事。

[java] view
plaincopy

public void add(int index, E element) {

if (index > size || index < 0)

throw new IndexOutOfBoundsException(

"Index: "+index+", Size: "+size);

//第一步：扩充长度

ensureCapacity(size+1); // Increments modCount!!

//第二步：挪动数据（这一步决定了为什么现在初始指定了集合长度反而不一定是好事）

System.arraycopy(elementData, index, elementData, index + 1,

size - index);

//第三步：插入

elementData[index] = element;

size++;

}

3.get(int index)

==============================性能测试=================================

[java] view
plaincopy

List<Integer> list = new LinkedList<Integer>();

for (int j = 0; j < 10000; j++) {

list.add(j);

}

for (int i = 0; i < 100; i++) {

for (int j = 0; j < 10000; j++) {

list.get(j);

}

}

new方式	消耗时间
new LinkedList<Integer>()	8349
new ArrayList<Integer>(10000)	15

*************************************************源码分析*************************************************

通过这个表首先能够知道如果直接或间接（如模板语言）调用for循环遍历，LikedList的性能要差很多。

LinkedList.get(int index)：

[java] view
plaincopy

public E get(int index) {

return entry(index).element;

}

//性能损耗在这里，每次都需要线性搜索（遍历链定位位置）

private Entry<E> entry(int index) {

if (index < 0 || index >= size)

throw new IndexOutOfBoundsException("Index: "+index+

", Size: "+size);

Entry<E> e = header;

if (index < (size >> 1)) {

for (int i = 0; i <= index; i++)

e = e.next;

} else {

for (int i = size; i > index; i--)

e = e.previous;

}

return e;

}

ArrayList.get(int index)：

[java] view
plaincopy

public E get(int index) {

//检查是否越界

RangeCheck(index);

//直接从数组中取出对应index的数据

return (E) elementData[index];

}

4.迭代器

==============================性能测试=================================

[java] view
plaincopy

List<Integer> list = new LinkedList<Integer>();

for (int j = 0; j < 100; j++) {

list.add(j);

}

for (int i = 0; i < 1000000; i++) {

Iterator<Integer> it = list.iterator();

while (it.hasNext()) {

it.next();

}

it = null;

}

new方式	消耗时间
new LinkedList<Integer>()	1335
new ArrayList<Integer>();	3940

*************************************************源码分析*************************************************

LinkedList.iterator():

[java] view
plaincopy

private class ListItr implements ListIterator<E> {

//header为LikedList的成员变量Entry

private Entry<E> lastReturned = header;

private Entry<E> next;

private int nextIndex;

private int expectedModCount = modCount;

//这里我们会默认传入0，

ListItr(int index) {

if (index < 0 || index > size)

throw new IndexOutOfBoundsException("Index: "+index+

", Size: "+size);

//下面的if和else很有意思，为了实现“所有操作都是按照双重链接列表的需要执行的。在列表中编索引的操作将从开头或结尾遍历列表（从靠近指定索引的一端）。”

//size为外部LikedList的成员变量

if (index < (size >> 1)) {

next = header.next;

for (nextIndex=0; nextIndex<index; nextIndex++)

next = next.next;

} else {

next = header;

for (nextIndex=size; nextIndex>index; nextIndex--)

next = next.previous;

}

}

//判断是否已经超过长度

public boolean hasNext() {

return nextIndex != size;

}

//便宜至下一个元素并返回element

public E next() {

checkForComodification();

if (nextIndex == size)

throw new NoSuchElementException();

lastReturned = next;

next = next.next;

nextIndex++;

return lastReturned.element;

}

...

final void checkForComodification() {

if (modCount != expectedModCount)

throw new ConcurrentModificationException();

}

}

也就是说我们的LinkedList的Iterator性能损耗主要在变换指针，所以对比for遍历方式来说性能提升了很多。

ArrayList.iterator():

[java] view
plaincopy

public Iterator<E> iterator() {

return new Itr();

}

private class Itr implements Iterator<E> {

/**

* Index of element to be returned by subsequent call to next.

*/

int cursor = 0;

/**

* Index of element returned by most recent call to next or

* previous. Reset to -1 if this element is deleted by a call

* to remove.

*/

int lastRet = -1;

/**

* The modCount value that the iterator believes that the backing

* List should have. If this expectation is violated, the iterator

* has detected concurrent modification.

*/

int expectedModCount = modCount;

public boolean hasNext() {

return cursor != size();

}

public E next() {

checkForComodification();

try {

//主要内容在这里，又绕回了ArrayList的获取方式

E next = get(cursor);

lastRet = cursor++;

return next;

} catch (IndexOutOfBoundsException e) {

checkForComodification();

throw new NoSuchElementException();

}

}

}

ArrayList“很懒”的没有做什么，主要借助了父类AbstractList，获取方式采用了模板方法设计模式（Template Method Pattern）。

总结：无论你遍历ArrayList还是LinkedList都可以尽量采用迭代器。

通过add和get的分析，我们最常用的场景就是单页数据获取，然后利用jstl或velocity等遍历展示：

1、如果能确定遍历采用的for循环建议使用ArrayList并采用带参的构造器指定集合长度（也就是每页的个数）；

2、若遍历采用迭代器建议采用LinkedList，因为我们还会有时对dao获取的数据采用缓存策略。

（jstl用的迭代器，可看源码org.apache.taglibs.standard.tag.common.core.ForEachSupport.toForEachIterator方法）

第二部分：Map

1.put(K key, V value)

==============================性能测试=================================

表格统计的差异在于new的方式不同：

[java] view
plaincopy

for (int i = 0; i < 100000; i++) {

Map<Integer, String> map = new TreeMap<Integer, String>();

for (int j = 0; j < 100; j++) {

map.put(j, "s" + j);

}

map = null;

}

new方式	运行时间
new TreeMap<Integer, String>()	2945
new HashMap<Integer, String>()	2029
new HashMap<Integer, String>(100)	1845

*************************************************源码分析*************************************************

TreeMap.put(K key, V value)：

首先确认一点TreeMap采用的“红黑树”二叉查找方式。（具体算法请自行google）

该映射根据其键的自然顺序进行排序，或者根据创建映射时提供的 Comparator 进行排序，具体取决于使用的构造方法。此实现为 containsKey、get、put 和 remove 操作提供受保证的 log(n) 时间开销。

插入大体分为四步：

1、判断是否为初次；

2、获取指定Comparator，若没有指定获取key的父类的实现方式。

3、利用Comparator确定数据存放位置；

4、树的旋转。

[java] view
plaincopy

public V put(K key, V value) {

Entry<K,V> t = root;

//判断是否存在根节点，或者集合是否存在了数据

if (t == null) {

root = new Entry<K,V>(key, value, null);

size = 1;

modCount++;

return null;

}

int cmp;

Entry<K,V> parent;

// split comparator and comparable paths

//这点是获取构造器传入的Comparator

Comparator<? super K> cpr = comparator;

if (cpr != null) {

do {

parent = t;

cmp = cpr.compare(key, t.key);

if (cmp < 0)

t = t.left;

else if (cmp > 0)

t = t.right;

else

return t.setValue(value);

} while (t != null);

}

//若没有指定Comparator，查找父类实现

else {

if (key == null)

throw new NullPointerException();

Comparable<? super K> k = (Comparable<? super K>) key;

//遍历放入确切位置

do {

parent = t;

cmp = k.compareTo(t.key);

if (cmp < 0)

t = t.left;

else if (cmp > 0)

t = t.right;

else

return t.setValue(value);

} while (t != null);

}

Entry<K,V> e = new Entry<K,V>(key, value, parent);

if (cmp < 0)

parent.left = e;

else

parent.right = e;

//树的旋转

fixAfterInsertion(e);

size++;

modCount++;

return null;

}

HashMap.put(K key, V value)：

1、HashMap 的实例有两个参数影响其性能：初始容量和加载因子。容量是哈希表中桶的数量，初始容量只是哈希表在创建时的容量。加载因子是哈希表在其容量自动增加之前可以达到多满的一种尺度。当哈希表中的条目数超出了加载因子与当前容量的乘积时，则要对该哈希表进行 rehash 操作（即重建内部数据结构），从而哈希表将具有大约两倍的桶数；

2、key的hashcode至关重要。

[java] view
plaincopy

//构造一个带指定初始容量和加载因子的空 HashMap

public HashMap(int initialCapacity, float loadFactor) {

if (initialCapacity < 0)

throw new IllegalArgumentException("Illegal initial capacity: " +

initialCapacity);

if (initialCapacity > MAXIMUM_CAPACITY)

initialCapacity = MAXIMUM_CAPACITY;

if (loadFactor <= 0 || Float.isNaN(loadFactor))

throw new IllegalArgumentException("Illegal load factor: " +

loadFactor);

// Find a power of 2 >= initialCapacity

int capacity = 1;

//取大于等于初始容量最近的2的倍数

while (capacity < initialCapacity)

capacity <<= 1;

this.loadFactor = loadFactor;

threshold = (int)(capacity * loadFactor);

table = new Entry[capacity];

init();

}

public V put(K key, V value) {

if (key == null)

return putForNullKey(value);

//计算key的hash值

int hash = hash(key.hashCode());

//计算出在第几个“水桶”，计算方式是&与了table.length-1，这应该也是水桶数据是2的倍数的原因

int i = indexFor(hash, table.length);

//通过这步可以理解为什么看到好多文档要求key的hashCode“均匀分布”便于均匀落入各个水桶

for (Entry<K,V> e = table[i]; e != null; e = e.next) {

Object k;

if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {

V oldValue = e.value;

e.value = value;

e.recordAccess(this);

return oldValue;

}

}

modCount++;

addEntry(hash, key, value, i);

return null;

}

2.get

==============================性能测试=================================

（1）表格统计的差异在于new的方式不同：

[java] view
plaincopy

Map<Integer, String> map = new TreeMap<Integer, String>();

for (int j = 0; j < 100; j++) {

map.put(j, "s" + j);

}

for (int i = 0; i < 100000; i++) {

for (int j = 0; j < 100; j++) {

map.get(j);

}

}

new方式	运行时间
new TreeMap<Integer, String>()	542
new HashMap<Integer, String>(100)	280

（2）变化下集合的大小（有时我们使用Map存放上万条乃至几十万条数据用于小数据本地快速缓存）：

[java] view
plaincopy

Map<Integer, String> map = new HashMap<Integer, String>(300000);

for (int j = 0; j < 300000; j++) {

map.put(j, "sasd");

}

beginTime = System.currentTimeMillis();

for (int i = 0; i < 100; i++) {

for (int j = 0; j < 300000; j++) {

map.get(j);

}

}

new方式	运行时间
new TreeMap<Integer, String>()	5100
new HashMap<Integer, String>(300000)	1400

（3）当hashcode为极端情况下：

[java] view
plaincopy

Map<HashKeyTest, String> map = new TreeMap<HashKeyTest, String>();

for (int j = 0; j < 10000; j++) {

map.put(new HashKeyTest(j), "sasd");

}

beginTime = System.currentTimeMillis();

for (int i = 0; i < 10; i++) {

for (int j = 0; j < 10000; j++) {

map.get(new HashKeyTest(j));

}

}

public class HashKeyTest implements Comparable<HashKeyTest> {

private int value;

...

@Override

public boolean equals(Object obj) {

if (this == obj)

return true;

if (obj == null)

return false;

if (getClass() != obj.getClass())

return false;

HashKeyTest other = (HashKeyTest) obj;

if (value != other.value)

return false;

return true;

}

//模拟hashCode的极端情况，返回相同的值

@Override

public int hashCode() {

return 345345435;

}

//用于TreeMap的二叉树对比排序

@Override

public int compareTo(HashKeyTest o) {

int thisVal = this.value;

int anotherVal = o.getValue();

return (thisVal < anotherVal ? -1 : (thisVal == anotherVal ? 0 : 1));

}

}

new方式	消耗时间
new TreeMap<HashKeyTest, String>()	18
new HashMap<HashKeyTest, String>()	3480

*************************************************源码分析*************************************************

TreeMap.get(Object key):

基于红黑树查找

[java] view
plaincopy

public V get(Object key) {

Entry<K,V> p = getEntry(key);

return (p==null ? null : p.value);

}

final Entry<K,V> getEntry(Object key) {

// Offload comparator-based version for sake of performance

if (comparator != null)

return getEntryUsingComparator(key);

if (key == null)

throw new NullPointerException();

Comparable<? super K> k = (Comparable<? super K>) key;

Entry<K,V> p = root;

while (p != null) {

int cmp = k.compareTo(p.key);

if (cmp < 0)

p = p.left;

else if (cmp > 0)

p = p.right;

else

return p;

}

return null;

}

HashMap.get(Object key)：

代码中很简明的可以看出，如果hashCode不均为出现的问题

[java] view
plaincopy

public V get(Object key) {

if (key == null)

return getForNullKey();

int hash = hash(key.hashCode());

//若hashCode相同，会导致查找方式就是这里的不停的遍历

for (Entry<K,V> e = table[indexFor(hash, table.length)];

e != null;

e = e.next) {

Object k;

if (e.hash == hash && ((k = e.key) == key || key.equals(k)))

return e.value;

}

return null;

}

总结：

1、key如果是自定义对象，一定要有效地重载hashCode()，可参考《覆盖equals时总要覆盖hashCode》；

2、尽量保证hashCode“均为分布”，以便于均匀填充“水桶”；

3、当我们确定Map存放的数据比较多且hashCode分配严重不均匀，切记不要使用HashMap，建议使用TreeMap；

4、正常情况下，也就是我们常把表id作为key时，建议使用HashMap。

内容来自用户分享和网络整理，不保证内容的准确性，如有侵权内容，可联系管理员处理

标签：

相关文章推荐

新的分享

章节导航