
HashMap source code analysis for JDK 1.6, and the main changes in 1.7 and 1.8

2017-07-02 23:35

HashMap source code analysis

Based on JDK 1.6.0_45

[b]Map[/b]
A Map can expose its contents as a Set of keys (keySet()), a Collection of values (values()), or a Set of key-value pairs (entrySet()).

The equals method


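To use your own class as a key in a hash table, you must override both hashCode() and equals(). equals() must satisfy the following five properties:

Reflexivity, symmetry and transitivity.

Consistency: for any non-null references x and y, repeated calls to x.equals(y) must consistently return true or consistently return false, provided that no information used in the equals comparison has been modified.

Non-nullity: for any non-null reference x, x.equals(null) must return false.

Below is the code of equals() in the Object class. For any non-null references x and y, this method returns true if and only if x and y refer to the same object.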

public boolean equals(Object obj) {
return (this == obj);
}

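When this method is overridden, it is generally necessary to override hashCode as well, in order to uphold the general contract of hashCode, which states that equal objects must have equal hash codes.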

@Override
public boolean equals(Object o) {
if (o == this)
return true;
if (!(o instanceof Complex))
return false;
Complex c = (Complex) o;
// ....
}
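The fragment above stops before the field comparison. As a minimal sketch of how it could be completed together with a matching hashCode, assume a hypothetical Complex class with two double fields, re and im (these fields and the hashing recipe are illustrative assumptions, not taken from the JDK):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical value class: the fields re and im are assumptions for illustration.
final class Complex {
    private final double re;
    private final double im;

    Complex(double re, double im) {
        this.re = re;
        this.im = im;
    }

    @Override
    public boolean equals(Object o) {
        if (o == this)
            return true;
        if (!(o instanceof Complex))
            return false; // also covers o == null
        Complex c = (Complex) o;
        // Double.compare agrees with doubleToLongBits, which hashCode uses below
        return Double.compare(re, c.re) == 0 && Double.compare(im, c.im) == 0;
    }

    @Override
    public int hashCode() {
        // Built from the same fields as equals, so equal objects get equal hash codes
        long r = Double.doubleToLongBits(re);
        long i = Double.doubleToLongBits(im);
        int result = 17;
        result = 31 * result + (int) (r ^ (r >>> 32));
        result = 31 * result + (int) (i ^ (i >>> 32));
        return result;
    }

    public static void main(String[] args) {
        Map<Complex, String> map = new HashMap<Complex, String>();
        map.put(new Complex(1.0, 2.0), "found");
        // A different but equal instance locates the same entry
        System.out.println(map.get(new Complex(1.0, 2.0)));
    }
}
```

If hashCode were left at the Object default, the second instance would usually hash to a different bucket and the lookup would miss even though equals returns true.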

The hashCode method

    public int hashCode()

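Returns the hash code value of the object. This method is supported for the benefit of hash tables such as those provided by java.util.Hashtable.

The general contract of hashCode is:

Whenever hashCode is invoked on the same object more than once during an execution of a Java application, it must consistently return the same integer, provided that no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.

If two objects are equal according to the equals(Object) method, then calling hashCode on each of the two objects must produce the same integer result.

If two objects are unequal according to the equals(java.lang.Object) method, calling hashCode on each of them is not required to produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects can improve the performance of hash tables.

As far as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but the Java programming language does not require this implementation technique.)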

[b]Fields[/b]
HashMap allows null as a key.

transient int size;

This field holds the number of key-value pairs contained in the HashMap.

transient Entry[] table;
int threshold;
final float loadFactor;

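capacity is the length of the table array. threshold is the number of key-value pairs at which the HashMap grows; it equals capacity multiplied by the load factor. When size++ >= threshold holds in addEntry, HashMap automatically calls resize to enlarge the table, and each resize doubles the capacity. A HashMap combines an array with linked lists: constructing a HashMap allocates the array Entry[] table, and each array slot (bucket) is the head of a linked list of entries.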

transient volatile int modCount;

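modCount is the number of times this HashMap has been structurally modified. A structural modification is one that changes the number of key-value pairs or rehashes the table (a resize call that doubles the capacity). Iterators use it to fail fast with ConcurrentModificationException.

The static constants are: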
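The fail-fast behaviour driven by modCount can be observed with java.util.HashMap directly. The following is a small sketch (the class and method names are made up for this example): adding a new key during iteration is a structural modification and makes the iterator throw, while overwriting the value of an existing key is not structural and is tolerated.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

final class FailFastDemo {
    private static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<String, Integer>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        return map;
    }

    /** Adds new keys while iterating; returns true if the iterator failed fast. */
    static boolean addingKeysFailsFast() {
        Map<String, Integer> map = sample();
        try {
            for (String key : map.keySet()) {
                map.put("new-" + key, 0); // new key: structural modification
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    /** Overwrites existing values while iterating; returns the sum of all values afterwards. */
    static int overwritingValuesSum() {
        Map<String, Integer> map = sample();
        for (String key : map.keySet()) {
            map.put(key, 10); // existing key: the entry count does not change
        }
        int sum = 0;
        for (int v : map.values()) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("adding keys fails fast: " + addingKeysFailsFast());
        System.out.println("sum after overwriting: " + overwritingValuesSum());
    }
}
```

The check is best-effort detection of bugs, not a synchronization mechanism, so code should not rely on the exception for correctness.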

static final int DEFAULT_INITIAL_CAPACITY = 16;
static final int MAXIMUM_CAPACITY = 1 << 30;
static final float DEFAULT_LOAD_FACTOR = 0.75f;

The static nested class Entry is defined as follows:

static class Entry<K,V> implements Map.Entry<K,V> {
final K key;
V value;
final int hash;
Entry<K,V> next;
//…
}

[b]Constructors[/b]

public HashMap() {
this.loadFactor = DEFAULT_LOAD_FACTOR;
threshold = (int)(DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR);
table = new Entry[DEFAULT_INITIAL_CAPACITY];
init();
}
public HashMap(int initialCapacity) {
this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

public HashMap(int initialCapacity, float loadFactor) {
if (initialCapacity < 0)
throw new IllegalArgumentException("Illegal initial capacity: " +
initialCapacity);
if (initialCapacity > MAXIMUM_CAPACITY)
initialCapacity = MAXIMUM_CAPACITY;
if (loadFactor <= 0 || Float.isNaN(loadFactor))
throw new IllegalArgumentException("Illegal load factor: " +
loadFactor);

// Find the smallest power of 2 >= initialCapacity
int capacity = 1;
while (capacity < initialCapacity)
capacity <<= 1;

this.loadFactor = loadFactor;
threshold = (int)(capacity * loadFactor);
table = new Entry[capacity];
init();
}

// Empty method: an initialization hook for subclasses

void init() {
}

capacity is the smallest power of two that is greater than or equal to initialCapacity.
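The rounding loop from the constructor can be isolated to see which capacities result from typical arguments. This is a sketch only: roundUpToPowerOfTwo is a name chosen for this example, not a method of the JDK 1.6 class.

```java
final class CapacityDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same loop as in the JDK 1.6 constructor shown above, wrapped in a helper.
    // The helper name is hypothetical.
    static int roundUpToPowerOfTwo(int initialCapacity) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;
        return capacity;
    }

    public static void main(String[] args) {
        int[] requested = {0, 1, 10, 16, 17, 1000};
        for (int r : requested) {
            System.out.println(r + " -> " + roundUpToPowerOfTwo(r));
        }
    }
}
```

Note that a request of 17 already yields a table of 32 slots, so the requested value is a lower bound rather than the exact size.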

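[b]The put method[/b]
A new entry is placed at the head of its chain, so the earliest entry ends up at the tail.
To get an element from a HashMap, the map first computes the hash of the key and locates the corresponding bucket in the array, then walks the linked list in that bucket and uses the key's equals method to find the entry. It follows that get is fastest when every bucket's list holds only one element.

The first idea that comes to mind is to take the hash code modulo the array length, which spreads the elements fairly evenly. The modulo operation is relatively expensive, however, and a bitwise AND is faster.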

static int indexFor(int h, int length) {
return h & (length-1);
}

When the array size (capacity) is a power of two, the modulo operation can be replaced by a bitwise AND with length - 1.
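The equivalence between the mask and the modulo can be checked directly. In the sketch below, indexFor is copied from the source above, while nonNegativeMod is a helper written for this comparison (Java's % can return a negative result for a negative h, so the helper normalizes it).

```java
final class IndexForDemo {
    // Copied from the JDK 1.6 source above
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Hypothetical helper: mathematical modulo, always in [0, length)
    static int nonNegativeMod(int h, int length) {
        return ((h % length) + length) % length;
    }

    public static void main(String[] args) {
        int[] hashes = {0, 1, 15, 16, 17, 12345, -1, -17, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int h : hashes) {
            // For a power-of-two length the two columns are identical
            System.out.println(h + ": mask=" + indexFor(h, 16) + " mod=" + nonNegativeMod(h, 16));
        }
        // For a length that is not a power of two the mask skips most buckets:
        // 10 - 1 = 9 = 1001b, so only indices 0, 1, 8 and 9 can ever be produced.
        System.out.println("2 & 9 = " + indexFor(2, 10));
    }
}
```

This is the reason the constructor rounds the capacity up to a power of two: with any other length, length - 1 has zero bits and whole ranges of buckets would never be used.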

static int hash(int h) {
h ^= (h >>> 20) ^ (h >>> 12);
return h ^ (h >>> 7) ^ (h >>> 4);
}
public V put(K key, V value) {
if (key == null)
return putForNullKey(value);
int hash = hash(key.hashCode());
int i = indexFor(hash, table.length);
for (Entry<K,V> e = table[i]; e != null; e = e.next) {
Object k;
if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
V oldValue = e.value;
e.value = value;
e.recordAccess(this);
return oldValue;
}
}

modCount++;
addEntry(hash, key, value, i);
return null;
}
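Because indexFor keeps only the low bits, two hash codes that differ only in their high bits would always collide without the supplemental hash function. The sketch below reuses hash and indexFor from the source; the two sample hash codes are arbitrary values picked for illustration.

```java
final class HashSpreadDemo {
    // Copied from the JDK 1.6 source above
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Copied from the JDK 1.6 source above
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int h1 = 0x10000; // sample hash codes that differ only above bit 15
        int h2 = 0x20000;
        // Without hash(): the low four bits are identical, so both land in bucket 0
        System.out.println(indexFor(h1, 16) + " " + indexFor(h2, 16));
        // With hash(): the high bits are folded into the low bits before masking
        System.out.println(indexFor(hash(h1), 16) + " " + indexFor(hash(h2), 16));
    }
}
```

For these two inputs the raw indices are both 0, while the spread indices are 1 and 2.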

// The null key always goes into table[0]
private V putForNullKey(V value) {
for (Entry<K,V> e = table[0]; e != null; e = e.next) {
if (e.key == null) {
V oldValue = e.value;
e.value = value;
e.recordAccess(this);
return oldValue;
}
}
modCount++;
addEntry(0, null, value, 0);
return null;
}

// Empty method: called when put overwrites the value of an existing entry
void recordAccess(HashMap<K,V> m) {
}

// If the for loop in put finds no matching key, a new entry is added to table[i]
void addEntry(int hash, K key, V value, int bucketIndex) {
Entry<K,V> e = table[bucketIndex];
table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
if (size++ >= threshold)
resize(2 * table.length);
}
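The head insertion done by addEntry (the new entry's next pointer is set to the old head of the bucket) can be reproduced with a bare linked list. This is a simplified sketch with a hypothetical Node class standing in for Entry; it tracks only keys.

```java
final class HeadInsertDemo {
    // Hypothetical stand-in for HashMap.Entry, reduced to key and next
    static final class Node {
        final String key;
        final Node next;

        Node(String key, Node next) {
            this.key = key;
            this.next = next;
        }
    }

    /** Inserts the keys one by one at the head and returns the chain from head to tail. */
    static String chainAfterInserting(String... keys) {
        Node head = null;
        for (String k : keys) {
            // Mirrors: table[bucketIndex] = new Entry(hash, key, value, e)
            head = new Node(k, head);
        }
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.next) {
            sb.append(n.key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Inserted in the order a, b, c; the newest key is at the head
        System.out.println(chainAfterInserting("a", "b", "c"));
    }
}
```

Inserting a, b, c yields the chain c, b, a, which is why the entry added first is the last one reached during a lookup in that bucket.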

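[b]Resizing (resize)[/b]
So when does a HashMap resize? When the number of elements exceeds the current threshold, that is capacity * loadFactor, the array is enlarged. If the current capacity is already MAXIMUM_CAPACITY, resize leaves the table size unchanged and only sets threshold to Integer.MAX_VALUE. Otherwise a resize normally doubles both capacity and threshold. The default loadFactor is 0.75, so with the default array size of 16 the array grows to 2 * 16 = 32 once the element count exceeds 16 * 0.75 = 12, and the position of every element in the new array is then recomputed. This is an expensive operation, so if the number of elements is known in advance, sizing the HashMap for that count up front can noticeably improve performance.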
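To put numbers on the cost of growing from the default size, the sketch below simulates only the JDK 1.6 bookkeeping (capacity, threshold and the size++ >= threshold check) and counts resizes. Both helper names are made up for this example, and the formula expected / loadFactor + 1 is a common presizing idiom, not something taken from the JDK source.

```java
final class PresizeDemo {
    /** Counts how many times a JDK 1.6 style table would double while adding `entries` distinct keys. */
    static int countResizes(int initialCapacity, float loadFactor, int entries) {
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;
        int threshold = (int) (capacity * loadFactor);
        int size = 0;
        int resizes = 0;
        for (int n = 0; n < entries; n++) {
            if (size++ >= threshold) { // same check as addEntry in JDK 1.6
                capacity *= 2;
                threshold = (int) (capacity * loadFactor);
                resizes++;
            }
        }
        return resizes;
    }

    /** Hypothetical helper: an initial capacity large enough that `expected` entries need no resize. */
    static int capacityFor(int expected, float loadFactor) {
        return (int) (expected / loadFactor) + 1;
    }

    public static void main(String[] args) {
        System.out.println("default capacity: " + countResizes(16, 0.75f, 100) + " resizes");
        int presized = capacityFor(100, 0.75f);
        System.out.println("presized to " + presized + ": " + countResizes(presized, 0.75f, 100) + " resizes");
    }
}
```

With 100 entries the default table is rebuilt four times (16 to 32, 64, 128, 256), while a map created with new HashMap(134) starts at 256 slots and is never rebuilt.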

void resize(int newCapacity) {
Entry[] oldTable = table;
int oldCapacity = oldTable.length;
if (oldCapacity == MAXIMUM_CAPACITY) {
threshold = Integer.MAX_VALUE;
return;
}

Entry[] newTable = new Entry[newCapacity];
transfer(newTable);
table = newTable;
threshold = (int)(newCapacity * loadFactor);
}

void transfer(Entry[] newTable) {
Entry[] src = table;
int newCapacity = newTable.length;
for (int j = 0; j < src.length; j++) {
Entry<K,V> e = src[j];
if (e != null) {
src[j] = null;
do {
Entry<K,V> next = e.next;
int i = indexFor(e.hash, newCapacity);
e.next = newTable[i];
newTable[i] = e;
e = next;
} while (e != null);
}
}
}

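transfer walks every bucket of the old table and every entry within each bucket. Entries that shared a bucket in the old table do not necessarily land in the same bucket of the new table. As in addEntry, an entry copied later is placed at the head of its new bucket, so the relative order of entries within a chain is reversed.

The traversal copies only references; no objects are copied.

[b]The get method[/b]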
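Both effects of transfer, splitting one bucket across two and reversing chain order, show up in a small simulation. The Entry class below is a stripped-down stand-in that keeps only hash and next, and the chain and hashesIn helpers exist only for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

final class TransferDemo {
    // Simplified stand-in for HashMap.Entry: only the hash and the next pointer
    static final class Entry {
        final int hash;
        Entry next;

        Entry(int hash, Entry next) {
            this.hash = hash;
            this.next = next;
        }
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    /** Same relinking logic as the JDK 1.6 transfer shown above. */
    static Entry[] transfer(Entry[] src, int newCapacity) {
        Entry[] newTable = new Entry[newCapacity];
        for (int j = 0; j < src.length; j++) {
            Entry e = src[j];
            src[j] = null;
            while (e != null) {
                Entry next = e.next;
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i]; // head insertion into the new bucket
                newTable[i] = e;
                e = next;
            }
        }
        return newTable;
    }

    /** Builds a chain whose head-to-tail order matches the argument order. */
    static Entry chain(int... hashes) {
        Entry head = null;
        for (int k = hashes.length - 1; k >= 0; k--) {
            head = new Entry(hashes[k], head);
        }
        return head;
    }

    static List<Integer> hashesIn(Entry head) {
        List<Integer> out = new ArrayList<Integer>();
        for (Entry e = head; e != null; e = e.next) {
            out.add(e.hash);
        }
        return out;
    }

    public static void main(String[] args) {
        Entry[] old = new Entry[4];
        old[1] = chain(1, 5, 9, 13); // all four hashes map to bucket 1 when length is 4
        Entry[] grown = transfer(old, 8);
        System.out.println("bucket 1: " + hashesIn(grown[1]));
        System.out.println("bucket 5: " + hashesIn(grown[5]));
    }
}
```

The old chain 1, 5, 9, 13 ends up as 9, 1 in bucket 1 and 13, 5 in bucket 5: the same Entry objects are relinked, split by one extra hash bit, in reversed order.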

public V get(Object key) {
if (key == null)
return getForNullKey();
int hash = hash(key.hashCode());
for (Entry<K,V> e = table[indexFor(hash, table.length)];
e != null;
e = e.next) {
Object k;
if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
return e.value;
}
return null;
}

private V getForNullKey() {
for (Entry<K,V> e = table[0]; e != null; e = e.next) {
if (e.key == null)
return e.value;
}
return null;
}
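Since get returns null both for a missing key and for a key mapped to null, the two cases have to be told apart with containsKey. A short sketch against java.util.HashMap (the keys and values are arbitrary examples):

```java
import java.util.HashMap;
import java.util.Map;

final class NullKeyDemo {
    static Map<String, String> sample() {
        Map<String, String> map = new HashMap<String, String>();
        map.put(null, "value-for-null-key"); // the null key is allowed
        map.put("present", null);            // null values are allowed too
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = sample();
        System.out.println(map.get(null));
        // Both lookups return null, but only one of the keys is in the map
        System.out.println(map.get("present") + " / containsKey: " + map.containsKey("present"));
        System.out.println(map.get("absent") + " / containsKey: " + map.containsKey("absent"));
    }
}
```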

[b]remove[/b]

public V remove(Object key) {
Entry<K,V> e = removeEntryForKey(key);
return (e == null ? null : e.value);
}
final Entry<K,V> removeEntryForKey(Object key) {
int hash = (key == null) ? 0 : hash(key.hashCode());
int i = indexFor(hash, table.length);
Entry<K,V> prev = table[i];
Entry<K,V> e = prev;

while (e != null) {
Entry<K,V> next = e.next;
Object k;
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k)))) {
modCount++;
size--;
if (prev == e)
table[i] = next;
else
prev.next = next;
e.recordRemoval(this);
return e;
}
prev = e;
e = next;
}

return e;
}

clear

public void clear() {
modCount++;
Entry[] tab = table;
for (int i = 0; i < tab.length; i++)
tab[i] = null;
size = 0;
}

[b]HashMap in JDK 7[/b]

public HashMap(int initialCapacity, float loadFactor) {
if (initialCapacity < 0)
throw new IllegalArgumentException("Illegal initial capacity: " +initialCapacity);
if (initialCapacity > MAXIMUM_CAPACITY)
initialCapacity = MAXIMUM_CAPACITY;
if (loadFactor <= 0 || Float.isNaN(loadFactor))
throw new IllegalArgumentException("Illegal load factor: " +loadFactor);

this.loadFactor = loadFactor;
threshold = initialCapacity;
init();
}

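threshold is computed quite differently from JDK 1.6 here: the constructor ignores the load factor and simply stores the initial capacity in threshold. This only applies until the table is first sized, however, because the resize function (which changes the capacity and is what growing the table calls) contains the following code: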

threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
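In other words, once the size has changed, threshold depends on the load factor again, almost exactly as in JDK 1.6 (ignoring the case where the capacity reaches its maximum of 1,073,741,824).

addEntry also differs from JDK 1.6. Its source is: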

void addEntry(int hash, K key, V value, int bucketIndex) {
if ((size >= threshold) && (null != table[bucketIndex])) {
resize(2 * table.length);
hash = (null != key) ? hash(key) : 0;
bucketIndex = indexFor(hash, table.length);
}

createEntry(hash, key, value, bucketIndex);
}

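As the code above shows, JDK 1.6 decides whether to grow simply by checking whether the current count has reached the threshold, whereas JDK 1.7 additionally requires that the bucket the new entry is about to go into is not empty.

[b]HashMap in JDK 8[/b]
The JDK 1.8 HashMap adds red-black trees (a bucket whose chain grows long is converted from a linked list into a red-black tree), so the underlying implementation works differently.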
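The two growth conditions can be placed side by side as plain predicates. This is only a sketch of the checks quoted above: the method names are invented for the comparison, and the JDK 1.6 version takes the value of size before the ++ in size++ >= threshold.

```java
final class ResizeConditionDemo {
    /** JDK 1.6 addEntry: size++ >= threshold, evaluated after the entry is linked in. */
    static boolean jdk6Resizes(int sizeBeforeIncrement, int threshold) {
        return sizeBeforeIncrement >= threshold;
    }

    /** JDK 1.7 addEntry: evaluated before insertion, and only if the target bucket is occupied. */
    static boolean jdk7Resizes(int size, int threshold, boolean targetBucketOccupied) {
        return size >= threshold && targetBucketOccupied;
    }

    public static void main(String[] args) {
        int threshold = 12; // 16 * 0.75
        System.out.println("1.6, size 12: " + jdk6Resizes(12, threshold));
        System.out.println("1.7, size 12, empty bucket: " + jdk7Resizes(12, threshold, false));
        System.out.println("1.7, size 12, occupied bucket: " + jdk7Resizes(12, threshold, true));
    }
}
```

Under the 1.7 rule a map can therefore hold more entries than threshold for a while, as long as each new entry falls into an empty bucket and causes no collision.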

JDK 1.6: the table grows as soon as the count exceeds capacity * load factor.

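JDK 1.7: the constructor stores the requested capacity in threshold without applying the load factor; once the table has been sized, it grows when the count reaches capacity * load factor and the target bucket is non-empty.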

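JDK 1.8: the capacity chosen by the first call to resize, which allocates the table, does not depend on the load factor; the second and later calls do. The detailed calculation needs a walkthrough of its own.

Note: none of the above considers the case where the capacity is at its maximum.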
