Java基础知识_HashMap

一、HashMap剖析

首先看看HashMap的顶部注释说了些什么：

/**
 * Hash table based implementation of the <tt>Map</tt> interface.  This
 * implementation provides all of the optional map operations, and permits
 * <tt>null</tt> values and the <tt>null</tt> key.  (The <tt>HashMap</tt>　　　　　　　　　　　　　　允许为null，不保证有序

 * class is roughly equivalent to <tt>Hashtable</tt>, except that it is
 * unsynchronized and permits nulls.)  This class makes no guarantees as to
 * the order of the map; in particular, it does not guarantee that the order　　　　　　　　　　　　初始容量太高和装载因子太低对遍历都不好
 * will remain constant over time.
 *
 * <p>This implementation provides constant-time performance for the basic
 * operations (<tt>get</tt> and <tt>put</tt>), assuming the hash function
 * disperses the elements properly among the buckets.  Iteration over
 * collection views requires time proportional to the "capacity" of the
 * <tt>HashMap</tt> instance (the number of buckets) plus its size (the number
 * of key-value mappings).  Thus, it's very important not to set the initial
 * capacity too high (or the load factor too low) if iteration performance is
 * important.
 *
 * <p>An instance of <tt>HashMap</tt> has two parameters that affect its
 * performance: <i>initial capacity</i> and <i>load factor</i>.  The
 * <i>capacity</i> is the number of buckets in the hash table, and the initial
 * capacity is simply the capacity at the time the hash table is created.  The
 * <i>load factor</i> is a measure of how full the hash table is allowed to
 * get before its capacity is automatically increased.  When the number of
 * entries in the hash table exceeds the product of the load factor and the
 * current capacity, the hash table is <i>rehashed</i> (that is, internal data
 * structures are rebuilt) so that the hash table has approximately twice the　　　　　　　　　　　　当初始容量*装载因子小于哈希表的容量时，哈希表
 * number of buckets.　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　会再散列，桶数会*2
 *
 * <p>As a general rule, the default load factor (.75) offers a good
 * tradeoff between time and space costs.  Higher values decrease the　　　　　　　　　　　　　　　装载因子默认是0.75，设置高会减少空间，但是会增加
 * space overhead but increase the lookup cost (reflected in most of　　　　　　　　　　　　　　遍历开销
 * the operations of the <tt>HashMap</tt> class, including
 * <tt>get</tt> and <tt>put</tt>).  The expected number of entries in
 * the map and its load factor should be taken into account when
 * setting its initial capacity, so as to minimize the number of
 * rehash operations.  If the initial capacity is greater than the
 * maximum number of entries divided by the load factor, no rehash
 * operations will ever occur.
 *
 * <p>If many mappings are to be stored in a <tt>HashMap</tt>　　　　　　　　　　如果知道有足够多的数据存到HashMap，最好设置初始容量，比自然散列
 * instance, creating it with a sufficiently large capacity will allow　　　要好很多。
 * the mappings to be stored more efficiently than letting it perform
 * automatic rehashing as needed to grow the table.  Note that using
 * many keys with the same {@code hashCode()} is a sure way to slow
 * down performance of any hash table. To ameliorate impact, when keys
 * are {@link Comparable}, this class may use comparison order among
 * keys to help break ties.
 *
 * <p><strong>Note that this implementation is not synchronized.</strong>　　　　　　不同步但可以用Collection工具类；
 * If multiple threads access a hash map concurrently, and at least one of
 * the threads modifies the map structurally, it <i>must</i> be
 * synchronized externally.  (A structural modification is any operation
 * that adds or deletes one or more mappings; merely changing the value
 * associated with a key that an instance already contains is not a
 * structural modification.)  This is typically accomplished by
 * synchronizing on some object that naturally encapsulates the map.
 *
 * If no such object exists, the map should be "wrapped" using the
 * {@link Collections#synchronizedMap Collections.synchronizedMap}
 * method.  This is best done at creation time, to prevent accidental
 * unsynchronized access to the map:<pre>
 *   Map m = Collections.synchronizedMap(new HashMap(...));</pre>
 *
 * <p>The iterators returned by all of this class's "collection view methods"　　　　　　　　迭代器相关
 * are <i>fail-fast</i>: if the map is structurally modified at any time after
 * the iterator is created, in any way except through the iterator's own
 * <tt>remove</tt> method, the iterator will throw a
 * {@link ConcurrentModificationException}.  Thus, in the face of concurrent
 * modification, the iterator fails quickly and cleanly, rather than risking
 * arbitrary, non-deterministic behavior at an undetermined time in the
 * future.
 *
 * <p>Note that the fail-fast behavior of an iterator cannot be guaranteed
 * as it is, generally speaking, impossible to make any hard guarantees in the
 * presence of unsynchronized concurrent modification.  Fail-fast iterators
 * throw <tt>ConcurrentModificationException</tt> on a best-effort basis.
 * Therefore, it would be wrong to write a program that depended on this
 * exception for its correctness: <i>the fail-fast behavior of iterators
 * should be used only to detect bugs.</i>

再来看看HashMap的类继承图：

下面我们来看一下HashMap的属性

 　　static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16　　　　　　初始容量是16

    /**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;　　　　　　　　　　　　　　　　　最大容量是2的31次方

    /**
     * The load factor used when none specified in constructor.
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;　　　　　　　　　　　　　　 默认装载因子

    /**
     * The bin count threshold for using a tree rather than list for a
     * bin.  Bins are converted to trees when adding an element to a
     * bin with at least this many nodes. The value must be greater
     * than 2 and should be at least 8 to mesh with assumptions in
     * tree removal about conversion back to plain bins upon
     * shrinkage.
     */
    static final int TREEIFY_THRESHOLD = 8;　　　　　　　　　　　　　　　　桶超过8个节点，转化成树形结构

    /**
     * The bin count threshold for untreeifying a (split) bin during a
     * resize operation. Should be less than TREEIFY_THRESHOLD, and at
     * most 6 to mesh with shrinkage detection under removal.
     */
    static final int UNTREEIFY_THRESHOLD = 6;　　　　　　　　　　　　　　同上，不过是把树形转换成链表

    /**
     * The smallest table capacity for which bins may be treeified.
     * (Otherwise the table is resized if too many nodes in a bin.)
     * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
     * between resizing and treeification thresholds.
     */
    static final int MIN_TREEIFY_CAPACITY = 64;　　　　　　　　　　　　　桶可能转化成树形结构的最小容量

成员属性有这么几个

再来看一下，hashmap的一个内部类Node

    static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;　　　　链表结构，存储下一个元素

        Node(int hash, K key, V value, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }

        public final K getKey()        { return key; }
        public final V getValue()      { return value; }
        public final String toString() { return key + "=" + value; }

        public final int hashCode() {
            return Objects.hashCode(key) ^ Objects.hashCode(value);
        }

        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }

        public final boolean equals(Object o) {
            if (o == this)
                return true;
            if (o instanceof Map.Entry) {
                Map.Entry<?,?> e = (Map.Entry<?,?>)o;
                if (Objects.equals(key, e.getKey()) &&
                    Objects.equals(value, e.getValue()))
                    return true;
            }
            return false;
        }
    }

我们知道Hash的底层是散列表，而在Java中散列表的实现是通过数组+链表的

再来简单看看put方法就可以印证我们的说法了：数组+链表=散列表

我们可以简单总结出HashMap

　　无序，允许为null，非同步

　　底层由散列表实现

　　初始容量和装在因子对HashMap影响挺大的，设置小了不好，设置大了也不好

1.1 HashMap构造方法

HashMap的构造方法有4个

    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        this.loadFactor = loadFactor;
        this.threshold = tableSizeFor(initialCapacity);
    }

　　判断初始化大小是否合理，如果超过就赋值最大值，初始化装载因子

在上面的构造方法的最后一行，我们会发现调用tableSizeFor()，我们进去看看

    static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

返回的是一个大于输入参数且最近的2的整数次幂的数

为什么是2的整数次幂呢？hash%length=hash&(length-1)，但前提是length是2的n次方，并且采用&运算比%运算效率高

这里是一个初始化，在创建哈希表的时候，它会重新赋值（capacity*loadfactor）

其他的构造方法就不多说了。

1.2 put方法

put方法可以说是HashMap的核心，我们来看看

1     public V put(K key, V value) {
2         return putVal(hash(key), key, value, false, true);
3     }

调用了putVal方法，以key计算哈希值，传入key和value，还有两个参数

我们来看看是怎么计算哈希值的

1     static final int hash(Object key) {
2         int h;
3         return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
4     }

得到key的HashCode，与KeyHashCode的高16位做异或运算

为什么要这么干呢？？我们一般来说直接将key作为哈希值不就好了，做异或运算是干嘛用的？？

我们看下来

我们是根据key的哈希值来保存在散列表中的，我们表默认的初始容量是16，要放到散列表中也就是0-15的位置

也就是tab[n-1&hash]，我们可以发现的是仅仅后四位有效，那如果我们key的哈希值高位变化很大，地位变化很小。直接拿过去&运算，这就会导致计算出来的Hash值相同的很多。

而设计者将key的哈希值的高位做了运算（与高16位做异或运算，使得在做&的时候，此时的低位实际上是高位和地位的结合），这就增加了随机性，减少了碰撞冲突的可能性。

下面我们再来看看流程是怎么样的

1.3 get方法

1     public V get(Object key) {
2         Node<K,V> e;
3         return (e = getNode(hash(key), key)) == null ? null : e.value;
4     }

计算key的哈希值，调用getNode获取相对应的value

接下来我们看看getNode()是怎么实现的：

 1     final Node<K,V> getNode(int hash, Object key) {
 2         Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
 3         if ((tab = table) != null && (n = tab.length) > 0 &&
 4             (first = tab[(n - 1) & hash]) != null) {
 5             if (first.hash == hash && // always check first node
 6                 ((k = first.key) == key || (key != null && key.equals(k))))
 7                 return first;
 8             if ((e = first.next) != null) {
 9                 if (first instanceof TreeNode)
10                     return ((TreeNode<K,V>)first).getTreeNode(hash, key);
11                 do {
12                     if (e.hash == hash &&
13                         ((k = e.key) == key || (key != null && key.equals(k))))
14                         return e;
15                 } while ((e = e.next) != null);
16             }
17         }
18         return null;
19     }

计算出来的哈希值是在哈希表上的，如果在桶的首位上就可以找到，那么就直接返回，否则就在红黑树或者链表中寻找。

1.4 remove方法

1    public V remove(Object key) {
2         Node<K,V> e;
3         return (e = removeNode(hash(key), key, null, false, true)) == null ?
4             null : e.value;
5     }

也是计算key的哈希值来计算来删除value

再来看看removeNode()的实现：

二、HashMap和HashTable对比

从存储结构和实现来讲基本上都是相同的。它和HashMap的最大的不同是它是线程安全的，另外它不允许key和value为null。Hashtable是个过时的集合类，不建议在新代码中使用，不需要线程安全的场合可以用HashMap替换，需要线程安全的场合可以用ConcurrentHashMap替换

四、总结

在JDK8中HashMap的底层是数组+链表（散列表）+红黑树

在散列表中有装载因子这么一个属性，当装载因子*初始容量小于散列表元素时，该散列表会再扩散，扩容2倍

装载因子的默认值是0.75，无论初始大了还是初始小了，都对Hash Map的性能都不好

　　装载因子初始值大了，可以减少散列表的扩容次数，但同时会导致散列冲突的可能性变大（散列冲突也是耗性能的一个操作，要得操作链表红黑树）

　　装载因子初始值小了，可以减少冲突得可能性，但同时扩容得次数会变多

初始容量默认值是16，也一样，无论是大了还说小了，对我们得HashMap都是有影响的：

　　初始容量过大，那么我们遍历的速度就会受到影响

　　初始容量过小，那么再散列（扩容的次数）可能就变得多了，扩容也是一件非常耗性能的事情

从源码上我们可以发现：HashMap并不是直接拿Key的哈希值来用的，它会将key的哈希值的最高位16位进行异或操作，使得我们将元素放入哈希表的时候增加了一定的随机性。

还要值得注意的是：并不是桶子上有8位元素的时候它就能变成红黑树的，得同时满足散列表得容量大于64才行的

posted @ 2019-09-09 15:15 chyblogs 阅读(230) 评论(0) 收藏举报

刷新页面返回顶部

chyblogs

Java基础知识_HashMap

公告