您的位置：首页 > 其它

HashMap 1.8 部分源码解读

2019-04-22 16:35 169 查看

HashMap 1.8 相比于 1.7 及之前，将集合的结构变为：数组+链表+红黑树。引入红黑树自然是为了提高查询效率。这里摘录部分源码进行分析：

开头注释
几个类常量属性
构造方法
put, hash, get, remove方法
为什么capacity为2的次幂

1. 开头注释部分
我觉得还是可以读一下的，这一部分大致总结了一下HashMap

/**
* Hash table based implementation of the <tt>Map</tt> interface.  This
* implementation provides all of the optional map operations, and permits
* <tt>null</tt> values and the <tt>null</tt> key.  (The <tt>HashMap</tt>
* class is roughly equivalent to <tt>Hashtable</tt>, except that it is
* unsynchronized and permits nulls.)  This class makes no guarantees as to
* the order of the map; in particular, it does not guarantee that the order
* will remain constant over time.
*

这一段提出 HashMap 实现了 Map 接口，提供了所有可选择的map操作，
并且允许values（多个）值为null，允许存在一个为null的key，
其与Hashtable相似，差别于：

HashMap可以存放null值，而Hashtable不行；
同时，HashMap非线程同步，即非线程安全的，而Hashtable是线程安全的。

* <p>This implementation provides constant-time performance for the basic
* operations (<tt>get</tt> and <tt>put</tt>), assuming the hash function
* disperses the elements properly among the buckets.  Iteration over
* collection views requires time proportional to the "capacity" of the
* <tt>HashMap</tt> instance (the number of buckets) plus its size (the number
* of key-value mappings).  Thus, it's very important not to set the initial
* capacity too high (or the load factor too low) if iteration performance is
* important.

当hash方法可以使使元素均匀分布在buckets（我认为指的是数组中的每一个位置）中时，HashMap的get和put操作，是常量时间的（时间复杂度O(1)）。
对于HashMap集合的遍历,需要的时间和buckets数量（数组大小，capacity）以及键值对数量之和成比例。
因此，如果对遍历的性能很看重的话，最好不要将初始的capacity设置的过大（或者将负载因子设置的过小）
负载因子用于决定何时需要对数组扩容，默认值为0.75

<p>An instance of <tt>HashMap</tt> has two parameters that affect its
* performance: <i>initial capacity</i> and <i>load factor</i>.  The
* <i>capacity</i> is the number of buckets in the hash table, and the initial
* capacity is simply the capacity at the time the hash table is created.  The
* <i>load factor</i> is a measure of how full the hash table is allowed to
* get before its capacity is automatically increased.  When the number of
* entries in the hash table exceeds the product of the load factor and the
* current capacity, the hash table is <i>rehashed</i> (that is, internal data
* structures are rebuilt) so that the hash table has approximately twice the
* number of buckets.

影响HashMap性能的两个实例变量分别为：initial capacity(初始容量) 以及 load factor(负载因子)
初始容量是指hash表创建时的容量。负载因子是指：当hash表的使用程度为多少时，对其容量进行扩充。
当entries(键值对）数量超过 load factor * capacity 时，对hash表进行扩容，也就是rehashed，将其扩容至二倍。

<p>As a general rule, the default load factor (.75) offers a good
* tradeoff between time and space costs.  Higher values decrease the
* space overhead but increase the lookup cost (reflected in most of
* the operations of the <tt>HashMap</tt> class, including
* <tt>get</tt> and <tt>put</tt>).  The expected number of entries in
* the map and its load factor should be taken into account when
* setting its initial capacity, so as to minimize the number of
* rehash operations.  If the initial capacity is greater than the
* maximum number of entries divided by the load factor, no rehash
* operations will ever occur.

load factor 设置为默认值0.75时对时间和空间消耗达到一个平衡。较大的load factor会减少空间消耗但是会增加查找消耗。
当设置初始capacity时需要考虑entries数量以及load factor，以最小化rehash操作的次数。
当初始 capacity > entries数量/load factor,扩容就不会发生。

<p>If many mappings are to be stored in a <tt>HashMap</tt>
* instance, creating it with a sufficiently large capacity will allow
* the mappings to be stored more efficiently than letting it perform
* automatic rehashing as needed to grow the table.  Note that using
* many keys with the same {@code hashCode()} is a sure way to slow
* down performance of any hash table. To ameliorate impact, when keys
* are {@link Comparable}, this class may use comparison order among
* keys to help break ties.

如果需要将很多键值对存进HashMap，建议创建一个capacity足够大的map,而不是让其通过rehash来扩容。

注意：使用很多具有相同hashCode值的keys必然会降低hash表的性能。当keys实现了Comparable接口时，可以使用keys之间的比较顺序来减小影响（这句不太明白）

/* <p><strong>Note that this implementation is not synchronized.</strong>
* If multiple threads access a hash map concurrently, and at least one of
* the threads modifies the map structurally, it <i>must</i> be
* synchronized externally.  (A structural modification is any operation
* that adds or deletes one or more mappings; merely changing the value
* associated with a key that an instance already contains is not a
* structural modification.)  This is typically accomplished by
* synchronizing on some object that naturally encapsulates the map.

*If no such object exists, the map should be "wrapped" using the
* {@link Collections#synchronizedMap Collections.synchronizedMap}
* method.  This is best done at creation time, to prevent accidental
* unsynchronized access to the map:<pre>
*   Map m = Collections.synchronizedMap(new HashMap(...));</pre>*/

HashMap是非线程同步的。若多个线程同时访问一个HashMap对象，且至少有一个线程修改了map结构（结构改变是指添加或删除一个或多个键值对，而不是仅仅改变了已经存在的key与value的关系），就必须要进行外部同步，一般通过对一些封装map的对象进行同步来实现(我认为就是使用一些线程安全之类的手段如Sychronized关键字之类的)
若没有这种对象存在，map应该使用Collections.sychronizedMap()方法进行包装。这一过程最好在创建时完成，以避免对map的突然非同步访问。使用示例：
Map m = Collections.synchronizedMap(new HashMap(…));

* <p>The iterators returned by all of this class's "collection view methods"
* are <i>fail-fast</i>: if the map is structurally modified at any time after
* the iterator is created, in any way except through the iterator's own
* <tt>remove</tt> method, the iterator will throw a
* {@link ConcurrentModificationException}.  Thus, i
4000
n the face of concurrent
* modification, the iterator fails quickly and cleanly, rather than risking
* arbitrary, non-deterministic behavior at an undetermined time in the
* future.

HashMap的迭代器是具有 fail-fast 特性的：若迭代器创建后的任意时刻，map结构被修改，除了迭代器自身的remove方法，均会抛出ConcurrentModificationException异常。因此，在面临并发修改时，iterator会立马失败，以避免冒任何风险。

<p>Note that the fail-fast behavior of an iterator cannot be guaranteed
* as it is, generally speaking, impossible to make any hard guarantees in the
* presence of unsynchronized concurrent modification.  Fail-fast iterators
* throw <tt>ConcurrentModificationException</tt> on a best-effort basis.
* Therefore, it would be wrong to write a program that depended on this
* exception for its correctness: <i>the fail-fast behavior of iterators
* should be used only to detect bugs.</i>

注：iterator的fail-fast 行为是不能被保证的，因为，通常，在发生非同步并发修改时做出任何硬性保障是不可能的。Fail-fast 迭代器只能尽最大的努力抛出ConcurrentModificationException异常。
因此，依赖该异常编写程序确定其正确性是不对的：fail-fast行为应被用于检测bugs。

下面开始代码部分
2. 几个类常量属性：

DEFAULT_INITIAL_CAPACITY: 默认初始容量，值为2的4次方，必须为2的幂次方（最后解释）。

MAXIMUM_CAPACITY：最大容量， 2的30次方，同时也要求是2的次幂。

DEFAULT_LOAD_FACTOR：默认负载因子，为0.75

TREEIFY_THRESHOLD：将链表转化为树的阈值，当链表的长度达到8时转换为红黑树。

UNTREEIFY_THRESHOLD：当树中节点数小于等于6时，将其转换为链表。（之前看到有人说这个和TREEIFY_THRESHOLD之间差2的原因是：防止树形化和链表化的操作太频繁，我觉有点道理）

MIN_TREEIFY_CAPACITY：capacity 大于等于这个值（64）时，才可以进行树形化，小于该值时，可能出现一个bucket 里的节点过多，此时应该将capacity重新设置，而不是进行树形化。

一个静态内部类，用于链表及红黑树。

3. HashMap的构造方法：有4个

含有初始容量initialCapacity和负载因子loadFactor两个参数的构造方法，这里就不解释代码了
只有初始容量一个参数的构造方法，这里调用了上一个构造方法，负载因子设置为默认值0.75
无参构造方法（常用）
含有一个Map类型参数的构造方法，将一个map复制给新的hash map。

4. put, hash, get, remove 方法
put()方法：向HashMap对象中添加键值对，可以看到其调用了hash()方法计算key的hash值，并调用了putVal()方法。

hash()方法，将key.hashCode(）结果的高16位及低16为进行了异或。

注释中给出了原因，最后使用hash值确定该key的key-value对应的数组位置时，会使用 hash&(n-1)。 如果直接使用两个key的hashCode()进行操作，很容易发生hash冲突，将高低位异或会减少这种事情的发生。

如：假设两个

key: key1，key2

，其hashCode()值如下

虽然

key1 != kye2

, 但是

key1& (n-1) == key2 & (n-1)

但是hash操作后：

hash(key1)

变为：

hash(key2)

hash(key1) & (n-1)

hash(key2) & (n-1)

明显结果不同了，这样就减少了hash冲突。

/**
* Implements Map.put and related methods
*
* @param hash hash for key
* @param key the key
* @param value the value to put
* @param onlyIfAbsent if true, don't change existing value
* @param evict if false, the table is in creation mode.
* @return previous value, or null if none
*/
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
boolean evict) {
Node<K,V>[] tab; Node<K,V> p; int n, i;
if ((tab = table) == null || (n = tab.length) == 0)
n = (tab = resize()).length;
if ((p = tab[i = (n - 1) & hash]) == null)
tab[i] = newNode(hash, key, value, null);
else {
Node<K,V> e; K k;
if (p.hash == hash &&
((k = p.key) == key || (key != null && key.equals(k))))
e = p；
else if (p instanceof TreeNode)
e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
else {
for (int binCount = 0; ; ++binCount) {
if ((e = p.next) == null) {
p.next = newNode(hash, key, value, null);
if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
treeifyBin(tab, hash);
break;
}
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
break;
p = e;
}
}
if (e != null) { // existing mapping for key
V oldValue = e.value;
if (!onlyIfAbsent || oldValue == null)
e.value = value;
afterNodeAccess(e);
return oldValue;
}
}
++modCount;
if (++size > threshold)
resize();
afterNodeInsertion(evict);
return null;
}

put大致步骤如下：

首先需要确保数组是存在的
之后，确定数组索引hash & (n-1)位置上是否为null，即判断是否要放入的key的位置上有hash冲突。
若数组该位置为空，生成一个含有key-value信息的Node，放入数组该位置。
不为空：
若该头节点key与要插入的key相同，则不用新生成Node，直接对该节点进行修改即可
若不同：
若头节点为树节点，则调用putTreeVal方法进行操作
若头节点为链表节点，遍历链表；
若链表中无该key,则生成一个Node
若有，直接对该节点进行修改即可
完成上述操作后：
++modCount(修改次数)
同时需要判断是否需要扩容。

get方法：

其调用
```
getNode()
```
方法：

final Node<K,V> getNode(int hash, Object key) {
Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
if ((tab = table) != null && (n = tab.length) > 0 &&
(first = tab[(n - 1) & hash]) != null) {
if (first.hash == hash && // always check first node
((k = first.key) == key || (key != null && key.equals(k))))
return first;
if ((e = first.next) != null) {
if (first instanceof TreeNode)
return ((TreeNode<K,V>)first).getTreeNode(hash, key);
do {
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
return e;
} while ((e = e.next) != null);
}
}
return null;
}

get 方法大致步骤：

需要确定数组存在，不为空且key对应的索引位中不为空
若数组中的头节点Node的key与key相同，直接返回该节点即可
头节点key与key不同：
若头节点为树节点，则调用getTreeNode方法，在树中查询
头节点为链表节点，遍历链表,存在key的节点则返回该节点
完成上述操作，若没有key的节点，则返回null.

remove 方法

调用了removeNode方法

final Node<K,V> removeNode(int hash, Object key, Object value,
boolean matchValue, boolean movable) {
Node<K,V>[] tab; Node<K,V> p; int n, index;
if ((tab = table) != null && (n = tab.length) > 0 &&
(p = tab[index = (n - 1) & hash]) != null) {
Node<K,V> node = null, e; K k; V v;
if (p.hash == hash &&
((k = p.key) == key || (key != null && key.equals(k))))
node = p;
else if ((e = p.next) != null) {
if (p instanceof TreeNode)
node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
else {
do {
if (e.hash == hash &&
((k = e.key) == key ||
(key != null && key.equals(k)))) {
node = e;
break;
}
p = e;
} while ((e = e.next) != null);
}
}
if (node != null && (!matchValue || (v = node.value) == value ||
(value != null && value.equals(v)))) {
if (node instanceof TreeNode)
((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
else if (node == p)
tab[index] = node.next;
else
p.next = node.next;
++modCount;
--size;
afterNodeRemoval(node);
return node;
}
}
return null;
}

removeNode方法步骤：

确定数组存在不为空且(hash & n-1)处有Node：
首先要找到key的节点：
若头节点key为key：node = p
否则：
若头节点为树节点：调用getTreeNode方法
头节点为链表节点：遍历链表找key的Node
找到节点，若matchValue为true需要确定value值也相同(但是matchValue默认值为false，因此这里不用考虑value值)
若节点为树节点：调用removeTreeNode方法
为头节点：删除头节点
为链表节点：删除该节点

最后说一下为什么capacity,即数组容量为2的次幂？
原因很简单：保证hash & (capacity-1)可以落在 0~capacity-1上。

内容来自用户分享和网络整理，不保证内容的准确性，如有侵权内容，可联系管理员处理

标签：

相关文章推荐

新的分享

章节导航