
[Original] How different hashCode and equals implementations affect the HashMap and HashSet collection classes

2015-01-28 14:45

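This article runs a nice experiment: http://blog.csdn.net/afgasdg/article/details/6889383. It examines how a HashSet (which does not allow duplicate elements) decides whether an object being added is the same as one already in the set. It approaches the question purely through test cases, though, so even after reading all of it the picture may still be hazy and not really stick.

So after reading that post I went and looked at the HashSet source. Adding a new element calls this method: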

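    public boolean add(E e) {
        return map.put(e, PRESENT)==null;
    }

This shows that HashSet ultimately stores its elements in a HashMap, so the question of how two objects are judged to be the same comes down to HashMap. Here is HashMap's put method: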

    public V put(K key, V value) {
        if (table == EMPTY_TABLE) {
            inflateTable(threshold);
        }
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key);
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }
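The two return paths of put (the old value for an existing key, null for a new one) are what make the one-line add work. The following is a minimal sketch of that relationship; the class name `AddReturnDemo` and the helper `addLikeHashSet` are made up for illustration and are not part of the JDK.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class AddReturnDemo {
    // Stand-in for HashSet's shared dummy value (plays the role of PRESENT).
    private static final Object PRESENT = new Object();

    // Mirrors the one-line HashSet.add: true only when the key was absent.
    static boolean addLikeHashSet(Map<String, Object> map, String e) {
        return map.put(e, PRESENT) == null;
    }

    public static void main(String[] args) {
        Map<String, Object> map = new HashMap<>();
        System.out.println(addLikeHashSet(map, "a")); // true: no previous mapping
        System.out.println(addLikeHashSet(map, "a")); // false: put returned the old PRESENT

        Set<String> set = new HashSet<>();
        System.out.println(set.add("a")); // true
        System.out.println(set.add("a")); // false: "a" is already in the set
    }
}
```

So a false result from add means that put found an existing key it considered the same, which is exactly the comparison examined next.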

The most important lines are the following. First:

int hash = hash(key);

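This obtains the hash value of the key passed in. The key here is, of course, the object we want to store in the HashSet.

So how is that hash value computed? Here is the hash method: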

    final int hash(Object k) {
        int h = hashSeed;
        if (0 != h && k instanceof String) {
            return sun.misc.Hashing.stringHash32((String) k);
        }

        h ^= k.hashCode();

        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

Unless the key is a String and hashSeed is non-zero, this calls the key's own hashCode method and then applies a few shift and XOR operations. The mixing reduces collisions, so entries spread more evenly across the HashMap and lookups stay efficient.
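To see what the extra shifts and XORs buy, here is a small sketch that copies the mixing steps from hash and compares bucket positions with and without them. It assumes hashSeed is 0, and it assumes the bucket is chosen by masking the low bits of the hash for a power-of-two table length; the names `SpreadDemo`, `spread` and `bucket` are made up for this example.

```java
class SpreadDemo {
    // Same bit-mixing steps as hash(Object), with hashSeed assumed to be 0.
    static int spread(int hashCode) {
        int h = hashCode;
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Assumed bucket selection for a power-of-two table: keep only the low bits.
    static int bucket(int hash, int tableLength) {
        return hash & (tableLength - 1);
    }

    public static void main(String[] args) {
        int a = 0x10000, b = 0x20000; // differ only in bits above the low four

        // Without mixing, both land in bucket 0 of a 16-slot table.
        System.out.println(bucket(a, 16) + " " + bucket(b, 16));                 // 0 0

        // With mixing, the high bits influence the low bits.
        System.out.println(bucket(spread(a), 16) + " " + bucket(spread(b), 16)); // 1 2
    }
}
```

Hash codes that differ only in their high bits would otherwise pile up in one bucket; after mixing they end up in separate ones.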

Back in put, the second key line is:

int i = indexFor(hash, table.length);

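This line maps the hash value to an index in the table. Once it has the index, put walks the linked list at that position looking for the key. (A quick note on HashMap's storage structure: it is an array combined with linked lists. The hash code selects a slot in the array, and the entry is stored in the linked list that slot points to. The article at http://www.cnblogs.com/highriver/archive/2011/08/15/2139462.html explains this clearly and is worth reading.) The code that searches the list is: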

    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

Now for the important part. Look at this line:

if (e.hash == hash && ((k = e.key) == key || key.equals(k)))

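From this we can draw the conclusion. When an object is added to a HashSet, or a key is put into a HashMap, the incoming object (for HashMap, the key) is compared with the existing ones as follows. The hash values are compared first: if they differ, the two are definitely different keys and the new one can be added (for a HashSet, both objects are kept in the set). Only if the hash values match does it go on to check whether the two references point to the same object or whether the two objects are equal according to equals; if either check is true, they are treated as the same key.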
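The order of those checks can be observed from the outside. The sketch below uses a made-up key class, `ProbeKey`, whose hash code is supplied by the caller and whose equals counts how often it is consulted; `LookupOrderDemo` and its helpers are likewise hypothetical names. It only adds two keys per set, so it exercises the plain linked-list path discussed here.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical key: the hash code is chosen by the caller, equality is by id.
class ProbeKey {
    static int equalsCalls = 0; // how many times equals has been invoked

    final int id;
    final int hash;

    ProbeKey(int id, int hash) {
        this.id = id;
        this.hash = hash;
    }

    @Override
    public int hashCode() {
        return hash;
    }

    @Override
    public boolean equals(Object o) {
        equalsCalls++;
        return o instanceof ProbeKey && ((ProbeKey) o).id == id;
    }
}

class LookupOrderDemo {
    // Adds both keys to a fresh set; returns {resulting size, equals calls made}.
    static int[] addBoth(ProbeKey first, ProbeKey second) {
        ProbeKey.equalsCalls = 0;
        Set<ProbeKey> set = new HashSet<>();
        set.add(first);
        set.add(second);
        return new int[] { set.size(), ProbeKey.equalsCalls };
    }

    static void report(String label, int[] r) {
        System.out.println(label + ": size=" + r[0] + ", equals consulted=" + (r[1] > 0));
    }

    public static void main(String[] args) {
        ProbeKey same = new ProbeKey(1, 7);
        report("different hash, same id", addBoth(new ProbeKey(1, 100), new ProbeKey(1, 200)));
        report("same hash, different id", addBoth(new ProbeKey(1, 7), new ProbeKey(2, 7)));
        report("same hash, same id     ", addBoth(new ProbeKey(1, 7), new ProbeKey(1, 7)));
        report("same instance twice    ", addBoth(same, same));
    }
}
```

In the first case both keys are stored and equals is never asked, even though it would have returned true: differing hash values end the comparison early. In the last case equals is skipped as well, because the reference check already succeeds. Only the two middle cases reach equals, and its answer decides whether the set ends up with one element or two.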

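If the key is the same, the value is overwritten. We need not care about that here, because a HashSet has no values of its own, or rather every value is PRESENT, which is just a placeholder object: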

private static final Object PRESENT = new Object();

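At this point it is fairly clear how HashSet decides whether an object it is given is already present. That in turn is a good reason to rethink how our own hashCode and equals overrides are written.

For HashSet and HashMap to treat logically equal objects as the same, a class must override both hashCode and equals. Otherwise the default implementations in Object are used, and those look like this:

The default equals: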
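As a minimal sketch of what is at stake, compare two made-up classes with the same fields: `PlainPoint` overrides nothing, while `ValuePoint` derives both equals and hashCode from its fields. All class and method names here are hypothetical, chosen only for this illustration.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical class that relies on the methods inherited from Object.
class PlainPoint {
    final int x, y;

    PlainPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

// Same fields, but equals and hashCode are both derived from x and y.
class ValuePoint {
    final int x, y;

    ValuePoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ValuePoint)) return false;
        ValuePoint p = (ValuePoint) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}

class OverrideDemo {
    // Two distinct PlainPoint instances with equal fields.
    static int plainSize() {
        Set<PlainPoint> set = new HashSet<>();
        set.add(new PlainPoint(1, 2));
        set.add(new PlainPoint(1, 2));
        return set.size();
    }

    // Two distinct ValuePoint instances with equal fields.
    static int valueSize() {
        Set<ValuePoint> set = new HashSet<>();
        set.add(new ValuePoint(1, 2));
        set.add(new ValuePoint(1, 2));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(plainSize()); // 2: the set sees two unrelated objects
        System.out.println(valueSize()); // 1: the second add is recognised as a duplicate
    }
}
```

Overriding only equals would not be enough: the two instances would most likely carry different hash values, and the comparison would stop before equals is ever reached.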

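    public boolean equals(Object obj) {
        return (this == obj);
    }

The default compares object references, that is, whether both sides are the very same object. To change that notion of equality you have to override the method yourself. Now look at the part after the && in HashMap's check for an existing key: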

(k = e.key) == key || key.equals(k)

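Either condition is enough. With the default equals the second condition just repeats the first, so it adds nothing here. What the expression really says is this: once you override Object's equals, your own equals alone decides whether two keys are equal (because two references to the same object can never normally fail equals).

Now for the implementation of hashCode: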

/**
* Returns a hash code value for the object. This method is
* supported for the benefit of hash tables such as those provided by
* {@link java.util.HashMap}.
* <p>
* The general contract of {@code hashCode} is:
* <ul>
* <li>Whenever it is invoked on the same object more than once during
*     an execution of a Java application, the {@code hashCode} method
*     must consistently return the same integer, provided no information
*     used in {@code equals} comparisons on the object is modified.
*     This integer need not remain consistent from one execution of an
*     application to another execution of the same application.
* <li>If two objects are equal according to the {@code equals(Object)}
*     method, then calling the {@code hashCode} method on each of
*     the two objects must produce the same integer result.
* <li>It is <em>not</em> required that if two objects are unequal
*     according to the {@link java.lang.Object#equals(java.lang.Object)}
*     method, then calling the {@code hashCode} method on each of the
*     two objects must produce distinct integer results.  However, the
*     programmer should be aware that producing distinct integer results
*     for unequal objects may improve the performance of hash tables.
* </ul>
* <p>
* As much as is reasonably practical, the hashCode method defined by
* class {@code Object} does return distinct integers for distinct
* objects. (This is typically implemented by converting the internal
* address of the object into an integer, but this implementation
* technique is not required by the
* Java<font size="-2"><sup>TM</sup></font> programming language.)
*
* @return  a hash code value for this object.
* @see     java.lang.Object#equals(java.lang.Object)
* @see     java.lang.System#identityHashCode
*/
public native int hashCode();

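You may find it odd that there is no implementation here, in the root class of everything. That is exactly why I pasted the comment.

A method declared with the native keyword has its implementation outside the Java source; it is the mechanism Java defines for working together with code written in other languages.

The implementation is platform-level native code, typically written in C/C++ and compiled into a native library (a DLL on Windows, for example) that Java calls into. Java is a cross-platform language, so it gives up direct control over these low-level details and delegates to whatever implementation each platform provides.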
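Although the native implementation cannot be read from the Java source, its behaviour can be probed through System.identityHashCode, which the Javadoc above points to. The sketch below is only an illustration; the class name `IdentityHashDemo` and its helper are made up.

```java
class IdentityHashDemo {
    // True when the object's hashCode is still the default one inherited from Object.
    static boolean defaultMatchesIdentity(Object o) {
        return o.hashCode() == System.identityHashCode(o);
    }

    public static void main(String[] args) {
        Object plain = new Object();
        System.out.println(defaultMatchesIdentity(plain)); // true: no override in play

        // String overrides hashCode, so equal content gives equal hash codes
        // even for two distinct instances.
        String s1 = new String("abc");
        String s2 = new String("abc");
        System.out.println(s1 != s2);                       // true: different objects
        System.out.println(s1.hashCode() == s2.hashCode()); // true: same content
        System.out.println(s1.hashCode());                  // 96354
    }
}
```

The identity-based value is what a HashSet or HashMap ends up using for any class that does not override hashCode, which is why two separately created but logically equal objects normally fail the first check in put.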
