Java Collections Explained: HashSet

2022-07-22 18:00:47

1. Set Interface Methods

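A Set stores no duplicate elements. A HashSet is also unordered: elements are generally not iterated in the order they were added. The order is not random, though. It is determined by each element's hash and the current table capacity, so it stays the same for the same contents and capacity, but it can change when the table is resized.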

Code example:

import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

/**
 * Set interface methods
 */
public class SetTest {
    @SuppressWarnings({"all"})
    public static void main(String[] args) {
        Set set = new HashSet();
        // Add elements: the second 521 is a duplicate and is ignored; one null is allowed
        set.add("dahe");
        set.add("wangwei");
        set.add(521);
        set.add(521);
        set.add(null);
        System.out.println(set);
        // Traverse the Set
        // 1) With an iterator
        Iterator iterator = set.iterator();
        while (iterator.hasNext()) {
            Object obj = iterator.next();
            System.out.println(obj);
        }
        // 2) With an enhanced for loop
        for (Object o : set) {
            System.out.println(o);
        }
    }
}
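To check the claim that HashSet order is deterministic yet can change on a resize, here is a small sketch. It assumes OpenJDK's HashMap behavior: an Integer's hash code is its own value, the default capacity is 16, and the table doubles once a 13th element is added. The class ResizeOrderDemo and its helper sixteenBeforeOne are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Shows that HashSet iteration order can change after a resize.
// The orders noted in the comments assume OpenJDK's HashMap.
class ResizeOrderDemo {
    // Returns true if 16 is iterated before 1
    static boolean sixteenBeforeOne(Set<Integer> set) {
        List<Integer> order = new ArrayList<>(set);
        return order.indexOf(16) < order.indexOf(1);
    }

    public static void main(String[] args) {
        Set<Integer> set = new HashSet<>();
        set.add(1);
        set.add(16);
        // Capacity 16: 16 & 15 == 0 and 1 & 15 == 1, so 16 comes first
        System.out.println(set + " " + sixteenBeforeOne(set)); // [16, 1] true
        for (int i = 2; i <= 12; i++) {
            set.add(i); // the 13th element exceeds the threshold of 12
        }
        // Capacity 32: 16 & 31 == 16, so 1 now comes first
        System.out.println(sixteenBeforeOne(set)); // false
    }
}
```

The order is fully determined by the bucket indices, which is why it repeats from run to run but shifts once the capacity changes.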

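2. HashSet

Under the hood, a HashSet is really a HashMap, as the no-argument constructor shows. The HashMap maintains an array plus singly linked lists (since JDK 8, a list that grows long enough may be converted into a red-black tree):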

public HashSet() {
    map = new HashMap<>();
}

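HashSet does not guarantee that elements come out in the order they were added; the order depends on the bucket index computed from each element's hash.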

Code example:

import java.util.HashSet;
import java.util.Set;

/**
 * HashSet
 */
public class HashSetText {
    @SuppressWarnings({"all"})
    public static void main(String[] args) {
        Set hashSet = new HashSet();
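        // Add an element
        hashSet.add("dahe");
        // add returns true on success and false if the element is already present
        System.out.println(hashSet.add("qian")); // true
        System.out.println(hashSet.add("qian")); // false
        System.out.println(hashSet);
        // Add objects: these are two distinct objects, and DDD does not override
        // equals/hashCode, so both are added
        hashSet.add(new DDD("aaa"));
        hashSet.add(new DDD("aaa"));
        System.out.println(hashSet);
        // Classic interview question: only one of these two can be added,
        // because String overrides equals and hashCode
        hashSet.add(new String("hsp"));
        hashSet.add(new String("hsp"));
        System.out.println(hashSet);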
    }
}

class DDD {
    private String name;

    public DDD(String name) {
        this.name = name;
    }

    @Override
    public String toString() {
        return "DDD{" +
                "name='" + name + '\'' +
                '}';
    }
}
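Both DDD objects end up in the set because DDD inherits equals and hashCode from Object, so two instances are equal only if they are the same object. A minimal sketch of the alternative, assuming we want instances with the same name to count as duplicates: the hypothetical class NamedItem overrides both methods with java.util.Objects, and the hypothetical EqualsHashCodeDemo adds two equal instances.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical variant of DDD whose instances are equal when their names are equal
class NamedItem {
    private final String name;

    NamedItem(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;                  // same object
        if (!(o instanceof NamedItem)) return false; // different type or null
        return Objects.equals(name, ((NamedItem) o).name);
    }

    @Override
    public int hashCode() {
        // Equal objects must produce equal hash codes
        return Objects.hashCode(name);
    }
}

class EqualsHashCodeDemo {
    static int countDistinct() {
        Set<NamedItem> set = new HashSet<>();
        set.add(new NamedItem("aaa"));
        set.add(new NamedItem("aaa")); // rejected: same hash and equals() is true
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(countDistinct()); // 1
    }
}
```

Overriding only one of the two methods is not enough: without hashCode the two instances would usually land in different buckets and never be compared with equals.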

3. HashSet Resizing Mechanism - First Insertion

The resizing mechanism is analyzed using the following code:

hashSet.add("dahe");
System.out.println(hashSet.add("qian"));
System.out.println(hashSet.add("qian"));

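Calling add passes the element e to map.put as the key and PRESENT as the value. PRESENT is a single shared object that only acts as a placeholder: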

private static final Object PRESENT = new Object();

public boolean add(E e) {
    return map.put(e, PRESENT)==null;
}
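Because put returns the previous value for a key, or null if the key was absent, `map.put(e, PRESENT) == null` is exactly "the element was new". A minimal sketch of that idea, assuming nothing beyond java.util.HashMap: the class MapBackedSet is hypothetical and far smaller than the real HashSet, but it stores elements the same way, as keys with a shared dummy value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical set backed by a HashMap, mirroring how HashSet stores elements
class MapBackedSet<E> {
    // Shared dummy value, playing the role of HashSet's PRESENT
    private static final Object PRESENT = new Object();
    private final Map<E, Object> map = new HashMap<>();

    // put returns null only if the key was absent, so this is true for new elements
    boolean add(E e) {
        return map.put(e, PRESENT) == null;
    }

    boolean contains(Object o) {
        return map.containsKey(o);
    }

    int size() {
        return map.size();
    }

    public static void main(String[] args) {
        MapBackedSet<String> set = new MapBackedSet<>();
        System.out.println(set.add("qian")); // true
        System.out.println(set.add("qian")); // false
        System.out.println(set.size());      // 1
    }
}
```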

Stepping further in:

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

Before stepping into putVal, let's look at the hash function:

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

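If the key is null, hash returns 0. Otherwise it takes key.hashCode() and XORs its high 16 bits into its low 16 bits. Since putVal uses only the low bits of the hash to pick a bucket, this lets the high bits influence the index as well, which reduces (but does not eliminate) collisions.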
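A small sketch of the effect, operating directly on int hash codes instead of objects. The values 0x10000 and 0x20000 are chosen only for illustration because they differ solely in their high 16 bits; HashSpreadDemo, spread and indexFor are hypothetical names, although spread uses the same expression as HashMap.hash and indexFor the same expression as putVal.

```java
// Compares bucket indices with and without the high-bit spreading step
class HashSpreadDemo {
    // Same expression as HashMap.hash, applied to a hash code h
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Index calculation from putVal: (n - 1) & hash, with n a power of two
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int a = 0x10000, b = 0x20000; // differ only in the high 16 bits
        int n = 16;
        // Without spreading, both hash codes land in bucket 0
        System.out.println(indexFor(a, n) + " " + indexFor(b, n)); // 0 0
        // With spreading, the high bits reach the index
        System.out.println(indexFor(spread(a), n) + " " + indexFor(spread(b), n)); // 1 2
    }
}
```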

Now for putVal. This method is important (and complex)!

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

Don't panic; let's go through it step by step.

First, take a look at this declaration:

Node<K,V>[] tab;

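This is the array of Node entries that backs the map (the table). If you know the adjacency-list representation from data structures, it will look familiar: each slot tab[i] holds the first node of a bucket, and the remaining nodes of that bucket hang off it through their next references.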

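When the table is null or its length is 0, which is the case on the first insertion because the table is created lazily, the following runs: resize() creates the table, and n receives the length of the new array: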

if ((tab = table) == null || (n = tab.length) == 0)
    n = (tab = resize()).length;

So what exactly does resize do? Let's step in and see:

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

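Since tab is initially null (and no initial capacity was given), resize ends up in the branch below, which computes the new table's capacity and threshold: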

newCap = DEFAULT_INITIAL_CAPACITY;
newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);

DEFAULT_INITIAL_CAPACITY is defined as follows, so the default table size is 16:

static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

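This is where the JDK's design is clever: the table is not grown only once it is full. Instead, resize also computes a threshold, newThr = load factor * capacity (0.75 * 16 = 12 by default). Once the number of stored entries exceeds this threshold, the table is resized to twice its capacity. Growing early keeps the buckets short, so lookups and insertions stay fast.

Note: the count compared against the threshold is the total number of nodes (size), not the number of occupied slots in tab.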

newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
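The growth policy can be checked with a short simulation. This is a simplified model, assuming the no-argument constructor, insertions of distinct keys only, no removals, and ignoring MAXIMUM_CAPACITY; it also treats the capacity as 16 from the start, whereas the real table is allocated on the first put. ThresholdDemo and capacityAfter are hypothetical names.

```java
// Simplified model of HashMap's default growth: capacity 16, load factor 0.75,
// and a doubling resize whenever size exceeds the threshold
class ThresholdDemo {
    static final int DEFAULT_INITIAL_CAPACITY = 16;
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    // Table capacity after inserting `entries` distinct keys
    static int capacityAfter(int entries) {
        int cap = DEFAULT_INITIAL_CAPACITY;
        int threshold = (int) (DEFAULT_LOAD_FACTOR * cap); // 12
        for (int size = 1; size <= entries; size++) {
            // Mirrors "if (++size > threshold) resize();" in putVal
            if (size > threshold) {
                cap <<= 1;       // capacity doubles
                threshold <<= 1; // threshold doubles as well
            }
        }
        return cap;
    }

    public static void main(String[] args) {
        for (int n : new int[] {12, 13, 24, 25}) {
            System.out.println(n + " entries -> capacity " + capacityAfter(n));
        }
        // 12 -> 16, 13 -> 32, 24 -> 32, 25 -> 64
    }
}
```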

With everything ready, the new table is allocated (its initial capacity here is 16):

Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
table = newTab;

Back in putVal, let's see what interesting things happen next:

if ((p = tab[i = (n - 1) & hash]) == null)
    tab[i] = newNode(hash, key, value, null);

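The bucket index i is computed from the key's hash as (n - 1) & hash, and the node currently at that index is assigned to p. If p is null, nothing has been stored at that index yet, so a new Node is created in tab[i]. After this, tab[i] holds a single node for the new key.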
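Why (n - 1) & hash rather than a modulo? HashMap keeps the table length a power of two, and for such an n, masking with n - 1 keeps the low bits of the hash. That matches a non-negative modulo without a division, and it never yields a negative index (Java's % would give -1 for -1 % 16). The sketch below checks this against Math.floorMod; MaskIndexDemo and its methods are hypothetical names.

```java
// For a power-of-two n, (n - 1) & h equals Math.floorMod(h, n), even for negative h
class MaskIndexDemo {
    static int maskIndex(int h, int n) {
        return (n - 1) & h;
    }

    static boolean sameAsFloorMod(int h, int n) {
        return maskIndex(h, n) == Math.floorMod(h, n);
    }

    public static void main(String[] args) {
        int n = 16;
        int[] hashes = {0, 5, 17, -1, -17, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int h : hashes) {
            // Prints the bucket index and whether it agrees with floorMod
            System.out.println(h + " -> " + maskIndex(h, n) + " " + sameAsFloorMod(h, n));
        }
    }
}
```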

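Continuing, modCount is incremented, and the number of entries (size) is checked against the threshold. If it now exceeds the threshold, the table is resized: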

++modCount;
if (++size > threshold)
    resize();

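Finally, putVal returns null, which signals that no previous mapping existed for this key, so HashSet.add returns true.

That completes the first insertion!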

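4. HashSet Resizing Mechanism - Adding More Data

The first insertion is fairly simple; things get more involved on later insertions.

Stepping in again, we arrive back at putVal:

Unlike the first insertion, the table now exists, so execution skips the following statement and continues below: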

if ((tab = table) == null || (n = tab.length) == 0)
    n = (tab = resize()).length;

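putVal checks directly whether the computed index in tab already holds a node. If it does not (which is the case for the value in this experiment), a new node is created: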

if ((p = tab[i = (n - 1) & hash]) == null)
    tab[i] = newNode(hash, key, value, null);

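After this insertion, tab contains two nodes, each in its own bucket.

5. HashSet Resizing Mechanism - Adding a Duplicate Element

Now the key being added equals a key that is already stored. Since equal objects must have equal hash codes, both keys map to the same index, so the slot checked below is not null and this branch is skipped: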

if ((p = tab[i = (n - 1) & hash]) == null)
    tab[i] = newNode(hash, key, value, null);

Execution continues into the else branch. First, the if statement inside it:

if (p.hash == hash &&
    ((k = p.key) == key || (key != null && key.equals(k))))
    e = p;

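This condition holds when the first node in the bucket has the same hash as the key being added, and in addition one of the following is true (comparing first by reference, then by value):

The key being added and the key of the node p points to are the same object.
They are different objects, but equals reports them as equal.

In that case the key cannot be added again, and e = p is executed. Because e is not null, putVal then stores the new value (PRESENT) over the old one and returns the old value instead of null, so HashSet.add returns false.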
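The interview question from section 2, two separately constructed "hsp" strings, can be traced through this check. A minimal sketch using only standard String and HashSet behavior; DuplicateCheckDemo and buildSet are hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;

// Traces the duplicate check for two equal but distinct String objects
class DuplicateCheckDemo {
    // Adds two separately constructed "hsp" strings and returns the set
    static Set<String> buildSet() {
        Set<String> set = new HashSet<>();
        set.add(new String("hsp"));
        set.add(new String("hsp")); // same hash, equals() is true, so rejected
        return set;
    }

    public static void main(String[] args) {
        String a = new String("hsp");
        String b = new String("hsp");
        System.out.println(a == b);                       // false: two distinct objects
        System.out.println(a.hashCode() == b.hashCode()); // true: equal hash codes
        System.out.println(a.equals(b));                  // true: same characters
        System.out.println(buildSet().size());            // 1
    }
}
```

The reference comparison fails, but the equals fallback succeeds, which is exactly the second case in the condition above.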

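Next, the else if branch.

It checks whether p is a red-black tree node (TreeNode); if so, the key is compared and inserted using the red-black tree's own logic: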

else if (p instanceof TreeNode)
    e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);

Finally, the else branch:

for (int binCount = 0; ; ++binCount) {
    if ((e = p.next) == null) {
        p.next = newNode(hash, key, value, null);
        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
            treeifyBin(tab, hash);
        break;
    }
    if (e.hash == hash &&
        ((k = e.key) == key || (key != null && key.equals(k))))
        break;
    p = e;
}

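Here the bucket already holds a linked list. The key is compared with each node in turn: if a duplicate is found, the loop breaks; otherwise a new node is appended to the tail.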
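The list branch can be sketched as a tiny chained hash set. This is a simplified, hypothetical class (ChainedSet), not HashMap's code: it keeps a fixed table of 16 buckets and never resizes or treeifies, but it computes the index, walks the bucket, stops on a duplicate, and appends at the tail the same way. The Integer keys 1, 17 and 33 are chosen because they all map to bucket 1 in a 16-slot table.

```java
// Simplified chained hash set mimicking the list branch of putVal
class ChainedSet<E> {
    // Singly linked node, like HashMap.Node without the value field
    private static final class Node<T> {
        final int hash;
        final T key;
        Node<T> next;

        Node(int hash, T key) {
            this.hash = hash;
            this.key = key;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<E>[] table = (Node<E>[]) new Node[16];

    // Same spreading function as HashMap.hash
    private static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Returns true if the key was added, false if an equal key was already present
    boolean add(E key) {
        int hash = hash(key);
        int i = (table.length - 1) & hash;
        Node<E> p = table[i];
        if (p == null) { // empty bucket: the new node becomes the head
            table[i] = new Node<>(hash, key);
            return true;
        }
        while (true) {
            // Same hash and (same reference or equals) means a duplicate
            if (p.hash == hash && (p.key == key || (key != null && key.equals(p.key)))) {
                return false;
            }
            if (p.next == null) { // reached the tail: append
                p.next = new Node<>(hash, key);
                return true;
            }
            p = p.next;
        }
    }

    // Number of nodes in bucket i, to make the chaining visible
    int bucketLength(int i) {
        int len = 0;
        for (Node<E> p = table[i]; p != null; p = p.next) len++;
        return len;
    }

    public static void main(String[] args) {
        ChainedSet<Integer> set = new ChainedSet<>();
        set.add(1);
        set.add(17);
        set.add(33);                             // all three map to bucket 1
        System.out.println(set.add(17));         // false: found while walking the list
        System.out.println(set.bucketLength(1)); // 3
    }
}
```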

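Note: after appending the new node, putVal checks the length of the list. The condition binCount >= TREEIFY_THRESHOLD - 1 (TREEIFY_THRESHOLD is 8) holds once the bucket already contained 8 nodes before this append; in that case treeifyBin is called to try to convert the list into a red-black tree.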

Interestingly, treeifyBin itself contains these two lines:

if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
    resize();

MIN_TREEIFY_CAPACITY is defined as follows:

static final int MIN_TREEIFY_CAPACITY = 64;

In other words, if the table length is less than 64, the bucket is not converted into a tree right away; the table is resized first!
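Putting the two checks together, the decision made when a node is appended to a list bucket can be modeled in a few lines. This is a hypothetical model (TreeifyDecision, onAppend and the Action enum are illustrative names) based only on the constants and conditions quoted above; it decides the action but does not build a tree. It relies on the fact that, in putVal's loop, binCount equals the number of nodes already in the bucket minus one at the moment of appending.

```java
// Models the list-append path of putVal plus the size check in treeifyBin
class TreeifyDecision {
    static final int TREEIFY_THRESHOLD = 8;
    static final int MIN_TREEIFY_CAPACITY = 64;

    enum Action { APPEND_ONLY, RESIZE, TREEIFY }

    // nodesBefore: nodes already in the bucket (at least 1) before the append
    static Action onAppend(int nodesBefore, int tableLength) {
        int binCount = nodesBefore - 1; // value of binCount when the new node is appended
        if (binCount < TREEIFY_THRESHOLD - 1) {
            return Action.APPEND_ONLY; // list is still short
        }
        // treeifyBin: small tables are resized instead of treeified
        return tableLength < MIN_TREEIFY_CAPACITY ? Action.RESIZE : Action.TREEIFY;
    }

    public static void main(String[] args) {
        System.out.println(onAppend(7, 16)); // APPEND_ONLY: bucket grows to 8 nodes
        System.out.println(onAppend(8, 16)); // RESIZE: table shorter than 64
        System.out.println(onAppend(8, 64)); // TREEIFY
    }
}
```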
