PostgreSQL中hash_search_with_hash_value函數有什么作用

發布時間：2021-11-09 15:51:40 來源：億速云閱讀：163 作者：iii 欄目：關系型數據庫

本篇內容主要講解“PostgreSQL中hash_search_with_hash_value函數有什么作用”，感興趣的朋友不妨來看看。本文介紹的方法操作簡單快捷，實用性強。下面就讓小編來帶大家學習“PostgreSQL中hash_search_with_hash_value函數有什么作用”吧!

一、數據結構

BufferDesc
共享緩沖區的共享描述符(狀態)數據

/*
 * Flags for buffer descriptors
 * buffer描述器標記
 *
 * Note: TAG_VALID essentially means that there is a buffer hashtable
 * entry associated with the buffer's tag.
 * 注意:TAG_VALID本質上意味著有一個與緩沖區的標記相關聯的緩沖區散列表條目。
 */
//buffer header鎖定
#define BM_LOCKED               (1U << 22)  /* buffer header is locked */
//數據需要寫入(標記為DIRTY)
#define BM_DIRTY                (1U << 23)  /* data needs writing */
//數據是有效的
#define BM_VALID                (1U << 24)  /* data is valid */
//已分配buffer tag
#define BM_TAG_VALID            (1U << 25)  /* tag is assigned */
//正在R/W
#define BM_IO_IN_PROGRESS       (1U << 26)  /* read or write in progress */
//上一個I/O出現錯誤
#define BM_IO_ERROR             (1U << 27)  /* previous I/O failed */
//開始寫則變DIRTY
#define BM_JUST_DIRTIED         (1U << 28)  /* dirtied since write started */
//存在等待sole pin的其他進程
#define BM_PIN_COUNT_WAITER     (1U << 29)  /* have waiter for sole pin */
//checkpoint發生,必須刷到磁盤上
#define BM_CHECKPOINT_NEEDED    (1U << 30)  /* must write for checkpoint */
//持久化buffer(不是unlogged或者初始化fork)
#define BM_PERMANENT            (1U << 31)  /* permanent buffer (not unlogged,
                                             * or init fork) */
/*
 *  BufferDesc -- shared descriptor/state data for a single shared buffer.
 *  BufferDesc -- 共享緩沖區的共享描述符(狀態)數據
 *
 * Note: Buffer header lock (BM_LOCKED flag) must be held to examine or change
 * the tag, state or wait_backend_pid fields.  In general, buffer header lock
 * is a spinlock which is combined with flags, refcount and usagecount into
 * single atomic variable.  This layout allow us to do some operations in a
 * single atomic operation, without actually acquiring and releasing spinlock;
 * for instance, increase or decrease refcount.  buf_id field never changes
 * after initialization, so does not need locking.  freeNext is protected by
 * the buffer_strategy_lock not buffer header lock.  The LWLock can take care
 * of itself.  The buffer header lock is *not* used to control access to the
 * data in the buffer!
 * 注意:必須持有Buffer header鎖(BM_LOCKED標記)才能檢查或修改tag/state/wait_backend_pid字段.
 * 通常來說,buffer header lock是spinlock,它與標記位/參考計數/使用計數組合到單個原子變量中.
 * 這個布局設計允許我們執行原子操作,而不需要實際獲得或者釋放spinlock(比如,增加或者減少參考計數).
 * buf_id字段在初始化后不會出現變化,因此不需要鎖定.
 * freeNext通過buffer_strategy_lock鎖而不是buffer header lock保護.
 * LWLock可以很好的處理自己的狀態.
 * 務請注意的是:buffer header lock不用于控制buffer中的數據訪問!
 *
 * It's assumed that nobody changes the state field while buffer header lock
 * is held.  Thus buffer header lock holder can do complex updates of the
 * state variable in single write, simultaneously with lock release (cleaning
 * BM_LOCKED flag).  On the other hand, updating of state without holding
 * buffer header lock is restricted to CAS, which insure that BM_LOCKED flag
 * is not set.  Atomic increment/decrement, OR/AND etc. are not allowed.
 * 假定在持有buffer header lock的情況下,沒有人改變狀態字段.
 * 持有buffer header lock的進程可以執行在單個寫操作中執行復雜的狀態變量更新,
 *   同步的釋放鎖(清除BM_LOCKED標記).
 * 換句話說,如果沒有持有buffer header lock的狀態更新,會受限于CAS,
 *   這種情況下確保BM_LOCKED沒有被設置.
 * 比如原子的增加/減少(AND/OR)等操作是不允許的.
 *
 * An exception is that if we have the buffer pinned, its tag can't change
 * underneath us, so we can examine the tag without locking the buffer header.
 * Also, in places we do one-time reads of the flags without bothering to
 * lock the buffer header; this is generally for situations where we don't
 * expect the flag bit being tested to be changing.
 * 一種例外情況是如果我們已有buffer pinned,該buffer的tag不能改變(在本進程之下),
 *   因此不需要鎖定buffer header就可以檢查tag了.
 * 同時,在執行一次性的flags讀取時不需要鎖定buffer header.
 * 這種情況通常用于我們不希望正在測試的flag bit將被改變.
 *
 * We can't physically remove items from a disk page if another backend has
 * the buffer pinned.  Hence, a backend may need to wait for all other pins
 * to go away.  This is signaled by storing its own PID into
 * wait_backend_pid and setting flag bit BM_PIN_COUNT_WAITER.  At present,
 * there can be only one such waiter per buffer.
 * 如果其他進程有buffer pinned,那么進程不能物理的從磁盤頁面中刪除items.
 * 因此,后臺進程需要等待其他pins清除.這可以通過存儲它自己的PID到wait_backend_pid中,
 *   并設置標記位BM_PIN_COUNT_WAITER.
 * 目前,每個緩沖區只能由一個等待進程.
 *
 * We use this same struct for local buffer headers, but the locks are not
 * used and not all of the flag bits are useful either. To avoid unnecessary
 * overhead, manipulations of the state field should be done without actual
 * atomic operations (i.e. only pg_atomic_read_u32() and
 * pg_atomic_unlocked_write_u32()).
 * 本地緩沖頭部使用同樣的結構,但并不需要使用locks,而且并不是所有的標記位都使用.
 * 為了避免不必要的負載,狀態域的維護不需要實際的原子操作
 * (比如只有pg_atomic_read_u32() and pg_atomic_unlocked_write_u32())
 *
 * Be careful to avoid increasing the size of the struct when adding or
 * reordering members.  Keeping it below 64 bytes (the most common CPU
 * cache line size) is fairly important for performance.
 * 在增加或者記錄成員變量時,小心避免增加結構體的大小.
 * 保持結構體大小在64字節內(通常的CPU緩存線大小)對于性能是非常重要的.
 */
typedef struct BufferDesc
{
    //buffer tag
    BufferTag   tag;            /* ID of page contained in buffer */
    //buffer索引編號(0開始),指向相應的buffer pool slot
    int         buf_id;         /* buffer's index number (from 0) */
    /* state of the tag, containing flags, refcount and usagecount */
    //tag狀態,包括flags/refcount和usagecount
    pg_atomic_uint32 state;
    //pin-count等待進程ID
    int         wait_backend_pid;   /* backend PID of pin-count waiter */
    //空閑鏈表鏈中下一個空閑的buffer
    int         freeNext;       /* link in freelist chain */
    //緩沖區內容鎖
    LWLock      content_lock;   /* to lock access to buffer contents */
} BufferDesc;

BufferTag
Buffer tag標記了buffer存儲的是磁盤中哪個block

/*
 * Buffer tag identifies which disk block the buffer contains.
 * Buffer tag標記了buffer存儲的是磁盤中哪個block
 *
 * Note: the BufferTag data must be sufficient to determine where to write the
 * block, without reference to pg_class or pg_tablespace entries.  It's
 * possible that the backend flushing the buffer doesn't even believe the
 * relation is visible yet (its xact may have started before the xact that
 * created the rel).  The storage manager must be able to cope anyway.
 * 注意:BufferTag必須足以確定如何寫block而不需要參照pg_class或者pg_tablespace數據字典信息.
 * 有可能后臺進程在刷新緩沖區的時候深圳不相信關系是可見的(事務可能在創建rel的事務之前).
 * 存儲管理器必須可以處理這些事情.
 *
 * Note: if there's any pad bytes in the struct, INIT_BUFFERTAG will have
 * to be fixed to zero them, since this struct is used as a hash key.
 * 注意:如果在結構體中有填充的字節,INIT_BUFFERTAG必須將它們固定為零，因為這個結構體用作散列鍵.
 */
typedef struct buftag
{
    //物理relation標識符
    RelFileNode rnode;          /* physical relation identifier */
    ForkNumber  forkNum;
    //相對于relation起始的塊號
    BlockNumber blockNum;       /* blknum relative to begin of reln */
} BufferTag;

HTAB
哈希表的頂層控制結構.

/*
 * Top control structure for a hashtable --- in a shared table, each backend
 * has its own copy (OK since no fields change at runtime)
 * 哈希表的頂層控制結構.
 * 在這個共享哈希表中,每一個后臺進程都有自己的拷貝
 * (之所以沒有問題是因為fork出來后,在運行期沒有字段會變化)
 */
struct HTAB
{
    //指向共享的控制信息
    HASHHDR    *hctl;           /* => shared control information */
    //段開始目錄
    HASHSEGMENT *dir;           /* directory of segment starts */
    //哈希函數
    HashValueFunc hash;         /* hash function */
    //哈希鍵比較函數
    HashCompareFunc match;      /* key comparison function */
    //哈希鍵拷貝函數
    HashCopyFunc keycopy;       /* key copying function */
    //內存分配器
    HashAllocFunc alloc;        /* memory allocator */
    //內存上下文
    MemoryContext hcxt;         /* memory context if default allocator used */
    //表名(用于錯誤信息)
    char       *tabname;        /* table name (for error messages) */
    //如在共享內存中,則為T
    bool        isshared;       /* true if table is in shared memory */
    //如為T,則固定大小不能擴展
    bool        isfixed;        /* if true, don't enlarge */
    /* freezing a shared table isn't allowed, so we can keep state here */
    //不允許凍結共享表,因此這里會保存相關狀態
    bool        frozen;         /* true = no more inserts allowed */
    /* We keep local copies of these fixed values to reduce contention */
    //保存這些固定值的本地拷貝,以減少沖突
    //哈希鍵長度(以字節為單位)
    Size        keysize;        /* hash key length in bytes */
    //段大小,必須為2的冪
    long        ssize;          /* segment size --- must be power of 2 */
    //段偏移,ssize的對數
    int         sshift;         /* segment shift = log2(ssize) */
};
/*
 * Header structure for a hash table --- contains all changeable info
 * 哈希表的頭部結構 -- 存儲所有可變信息
 *
 * In a shared-memory hash table, the HASHHDR is in shared memory, while
 * each backend has a local HTAB struct.  For a non-shared table, there isn't
 * any functional difference between HASHHDR and HTAB, but we separate them
 * anyway to share code between shared and non-shared tables.
 * 在共享內存哈希表中,HASHHDR位于共享內存中,每一個后臺進程都有一個本地HTAB結構.
 * 對于非共享哈希表,HASHHDR和HTAB沒有任何功能性的不同,
 * 但無論如何,我們還是把它們區分為共享和非共享表.
 */
struct HASHHDR
{
    /*
     * The freelist can become a point of contention in high-concurrency hash
     * tables, so we use an array of freelists, each with its own mutex and
     * nentries count, instead of just a single one.  Although the freelists
     * normally operate independently, we will scavenge entries from freelists
     * other than a hashcode's default freelist when necessary.
     * 在高并發的哈希表中,空閑鏈表會成為競爭熱點,因此我們使用空閑鏈表數組,
     *   數組中的每一個元素都有自己的mutex和條目統計,而不是使用一個.
     *
     * If the hash table is not partitioned, only freeList[0] is used and its
     * spinlock is not used at all; callers' locking is assumed sufficient.
     * 如果哈希表沒有分區,那么只有freelist[0]元素是有用的,自旋鎖沒有任何用處;
     * 調用者鎖定被認為已足夠OK.
     */
    FreeListData freeList[NUM_FREELISTS];
    /* These fields can change, but not in a partitioned table */
    //這些域字段可以改變,但不適用于分區表
    /* Also, dsize can't change in a shared table, even if unpartitioned */
    //同時,就算是非分區表,共享表的dsize也不能改變
    //目錄大小
    long        dsize;          /* directory size */
    //已分配的段大小(<= dbsize)
    long        nsegs;          /* number of allocated segments (<= dsize) */
    //正在使用的最大桶ID
    uint32      max_bucket;     /* ID of maximum bucket in use */
    //進入整個哈希表的模掩碼
    uint32      high_mask;      /* mask to modulo into entire table */
    //進入低于半個哈希表的模掩碼
    uint32      low_mask;       /* mask to modulo into lower half of table */
    /* These fields are fixed at hashtable creation */
    //下面這些字段在哈希表創建時已固定
    //哈希鍵大小(以字節為單位)
    Size        keysize;        /* hash key length in bytes */
    //所有用戶元素大小(以字節為單位)
    Size        entrysize;      /* total user element size in bytes */
    //分區個數(2的冪),或者為0
    long        num_partitions; /* # partitions (must be power of 2), or 0 */
    //目標的填充因子
    long        ffactor;        /* target fill factor */
    //如目錄是固定大小,則該值為dsize的上限值
    long        max_dsize;      /* 'dsize' limit if directory is fixed size */
    //段大小,必須是2的冪
    long        ssize;          /* segment size --- must be power of 2 */
    //端偏移,ssize的對數
    int         sshift;         /* segment shift = log2(ssize) */
    //一次性分配的條目個數
    int         nelem_alloc;    /* number of entries to allocate at once */
#ifdef HASH_STATISTICS
    /*
     * Count statistics here.  NB: stats code doesn't bother with mutex, so
     * counts could be corrupted a bit in a partitioned table.
     * 統計信息.
     * 注意:統計相關的代碼不會影響mutex,因此對于分區表,統計可能有一點點問題
     */
    long        accesses;
    long        collisions;
#endif
};
/*
 * HASHELEMENT is the private part of a hashtable entry.  The caller's data
 * follows the HASHELEMENT structure (on a MAXALIGN'd boundary).  The hash key
 * is expected to be at the start of the caller's hash entry data structure.
 * HASHELEMENT是哈希表條目的私有部分.
 * 調用者的數據按照HASHELEMENT結構組織(位于MAXALIGN的邊界).
 * 哈希鍵應位于調用者hash條目數據結構的開始位置.
 */
typedef struct HASHELEMENT
{
    //鏈接到相同桶中的下一個條目
    struct HASHELEMENT *link;   /* link to next entry in same bucket */
    //該條目的哈希函數結果
    uint32      hashvalue;      /* hash function result for this entry */
} HASHELEMENT;
/* Hash table header struct is an opaque type known only within dynahash.c */
//哈希表頭部結構,非透明類型,用于dynahash.c
typedef struct HASHHDR HASHHDR;
/* Hash table control struct is an opaque type known only within dynahash.c */
//哈希表控制結構,非透明類型,用于dynahash.c
typedef struct HTAB HTAB;
/* Parameter data structure for hash_create */
//hash_create使用的參數數據結構
/* Only those fields indicated by hash_flags need be set */
//根據hash_flags標記設置相應的字段
typedef struct HASHCTL
{
    //分區個數(必須是2的冪)
    long        num_partitions; /* # partitions (must be power of 2) */
    //段大小
    long        ssize;          /* segment size */
    //初始化目錄大小
    long        dsize;          /* (initial) directory size */
    //dsize上限
    long        max_dsize;      /* limit to dsize if dir size is limited */
    //填充因子
    long        ffactor;        /* fill factor */
    //哈希鍵大小(字節為單位)
    Size        keysize;        /* hash key length in bytes */
    //參見上述數據結構注釋
    Size        entrysize;      /* total user element size in bytes */
    //
    HashValueFunc hash;         /* hash function */
    HashCompareFunc match;      /* key comparison function */
    HashCopyFunc keycopy;       /* key copying function */
    HashAllocFunc alloc;        /* memory allocator */
    MemoryContext hcxt;         /* memory context to use for allocations */
    //共享內存中的哈希頭部結構地址
    HASHHDR    *hctl;           /* location of header in shared mem */
} HASHCTL;
/* A hash bucket is a linked list of HASHELEMENTs */
//哈希桶是HASHELEMENTs鏈表
typedef HASHELEMENT *HASHBUCKET;
/* A hash segment is an array of bucket headers */
//hash segment是桶數組
typedef HASHBUCKET *HASHSEGMENT;
/*
 * Hash functions must have this signature.
 * Hash函數必須有它自己的標識
 */
typedef uint32 (*HashValueFunc) (const void *key, Size keysize);
 /*
 * Key comparison functions must have this signature.  Comparison functions
 * return zero for match, nonzero for no match.  (The comparison function
 * definition is designed to allow memcmp() and strncmp() to be used directly
 * as key comparison functions.)
 * 哈希鍵對比函數必須有自己的標識.
 * 如匹配則對比函數返回0,不匹配返回非0.
 * (對比函數定義被設計為允許在對比鍵值時可直接使用memcmp()和strncmp())
 */
typedef int (*HashCompareFunc) (const void *key1, const void *key2,
 Size keysize);
 /*
 * Key copying functions must have this signature.  The return value is not
 * used.  (The definition is set up to allow memcpy() and strlcpy() to be
 * used directly.)
 * 鍵拷貝函數必須有自己的標識.
 * 返回值無用.
 */
typedef void *(*HashCopyFunc) (void *dest, const void *src, Size keysize);
/*
 * Space allocation function for a hashtable --- designed to match malloc().
 * Note: there is no free function API; can't destroy a hashtable unless you
 * use the default allocator.
 * 哈希表的恐懼分配函數 -- 被設計為與malloc()函數匹配.
 * 注意:這里沒有釋放函數API;不能銷毀哈希表,除非使用默認的分配器.
 */
typedef void *(*HashAllocFunc) (Size request);

FreeListData
在一個分區哈希表中,每一個空閑鏈表與特定的hashcodes集合相關,通過下面的FREELIST_IDX()宏進行定義.
nentries跟蹤有這些hashcodes的仍存活的hashtable條目個數.

/*
 * Per-freelist data.
 * 空閑鏈表數據.
 *
 * In a partitioned hash table, each freelist is associated with a specific
 * set of hashcodes, as determined by the FREELIST_IDX() macro below.
 * nentries tracks the number of live hashtable entries having those hashcodes
 * (NOT the number of entries in the freelist, as you might expect).
 * 在一個分區哈希表中,每一個空閑鏈表與特定的hashcodes集合相關,通過下面的FREELIST_IDX()宏進行定義.
 * nentries跟蹤有這些hashcodes的仍存活的hashtable條目個數.
 * (注意不要搞錯,不是空閑的條目個數)
 *
 * The coverage of a freelist might be more or less than one partition, so it
 * needs its own lock rather than relying on caller locking.  Relying on that
 * wouldn't work even if the coverage was the same, because of the occasional
 * need to "borrow" entries from another freelist; see get_hash_entry().
 * 空閑鏈表的覆蓋范圍可能比一個分區多或少,因此需要自己的鎖而不能僅僅依賴調用者的鎖.
 * 依賴調用者鎖在覆蓋面一樣的情況下也不會起效,因為偶爾需要從另一個自由列表“借用”條目,詳細參見get_hash_entry()
 *
 * Using an array of FreeListData instead of separate arrays of mutexes,
 * nentries and freeLists helps to reduce sharing of cache lines between
 * different mutexes.
 * 使用FreeListData數組而不是一個獨立的mutexes,nentries和freelists數組有助于減少不同mutexes之間的緩存線共享.
 */
typedef struct
{
    //該空閑鏈表的自旋鎖
    slock_t     mutex;          /* spinlock for this freelist */
    //相關桶中的條目個數
    long        nentries;       /* number of entries in associated buckets */
    //空閑元素鏈
    HASHELEMENT *freeList;      /* chain of free elements */
} FreeListData;

BufferLookupEnt

/* entry for buffer lookup hashtable */
//檢索hash表的條目
typedef struct
{
    //磁盤page的tag
    BufferTag   key;            /* Tag of a disk page */
    //相關聯的buffer ID
    int         id;             /* Associated buffer ID */
} BufferLookupEnt;

二、源碼解讀

hash_search_with_hash_value函數,根據給定tag和buffer ID,插入到哈希表中,如該tag相應的條目已存在,則不處理.
其主要實現邏輯如下:
1.初始化相關變量,如根據hash值獲取idx等
2.如action為插入,檢查是否需要分裂哈希桶
3.執行相關初始化檢索,計算桶號/段號/段內編號等
4.沿著哈希鍵沖突鏈搜索匹配鍵,根據搜索結果給foundPtr賦值
5.根據輸入action執行相應的邏輯
5.1HASH_FIND,搜索
5.2HASH_REMOVE,移除
5.3HASH_ENTER_NULL,驗證分配器,轉至HASH_ENTER
5.4HASH_ENTER,如找到,則返回現存的元素,否則創建一個
6.找不到,則報錯,返回NULL

void *
hash_search_with_hash_value(HTAB *hashp,
                            const void *keyPtr,
                            uint32 hashvalue,
                            HASHACTION action,
                            bool *foundPtr)
{
    HASHHDR    *hctl = hashp->hctl;//獲取HASHHDR
    int         freelist_idx = FREELIST_IDX(hctl, hashvalue);//根據hash值獲取idx
    Size        keysize;//鍵值大小
    uint32      bucket;//桶號
    long        segment_num;//段號
    long        segment_ndx;//
    HASHSEGMENT segp;//段
    HASHBUCKET  currBucket;//當前桶號
    HASHBUCKET *prevBucketPtr;//上一個桶號
    HashCompareFunc match;//是否match?
#if HASH_STATISTICS//統計信息
    hash_accesses++;
    hctl->accesses++;
#endif
    /*
     * If inserting, check if it is time to split a bucket.
     * 如正插入,檢查是否需要分裂哈希桶
     *
     * NOTE: failure to expand table is not a fatal error, it just means we
     * have to run at higher fill factor than we wanted.  However, if we're
     * using the palloc allocator then it will throw error anyway on
     * out-of-memory, so we must do this before modifying the table.
     * 注意:擴展哈希表出現問題不是致命錯誤,只是意味著我們不得不執行比我們期望更高更高的填充因子.
     * 但是,如果我們正在使用palloc分配器,那么只要出現內存溢出則會拋出錯誤,
     *   因此我們不需在更新表前完成這個事情.
     */
    if (action == HASH_ENTER || action == HASH_ENTER_NULL)
    {
        /*
         * Can't split if running in partitioned mode, nor if frozen, nor if
         * table is the subject of any active hash_seq_search scans.  Strange
         * order of these tests is to try to check cheaper conditions first.
         * 如在分區模式/凍結/處于其他活動hash_seq_search掃描期間,則不能進行分裂.
         * 奇怪的是,這些測試的順序是先嘗試檢查成本更低的條件.
         */
        if (!IS_PARTITIONED(hctl) && !hashp->frozen &&
            hctl->freeList[0].nentries / (long) (hctl->max_bucket + 1) >= hctl->ffactor &&
            !has_seq_scans(hashp))
            (void) expand_table(hashp);
    }
    /*
     * Do the initial lookup
     * 執行初始化檢索
     */
    //計算桶號
    bucket = calc_bucket(hctl, hashvalue);
    //計算段號和段內編號
    segment_num = bucket >> hashp->sshift;
    segment_ndx = MOD(bucket, hashp->ssize);
    //獲取directory
    segp = hashp->dir[segment_num];
    if (segp == NULL)
        hash_corrupted(hashp);
    //記錄桶號
    prevBucketPtr = &segp[segment_ndx];
    currBucket = *prevBucketPtr;
    /*
     * Follow collision chain looking for matching key
     * 沿著哈希鍵沖突鏈搜索匹配鍵
     */
    //匹配函數
    match = hashp->match;       /* save one fetch in inner loop */
    //鍵大小
    keysize = hashp->keysize;   /* ditto */
    while (currBucket != NULL)
    {
        if (currBucket->hashvalue == hashvalue &&
            match(ELEMENTKEY(currBucket), keyPtr, keysize) == 0)
            break;
        prevBucketPtr = &(currBucket->link);
        currBucket = *prevBucketPtr;
#if HASH_STATISTICS
        hash_collisions++;
        hctl->collisions++;
#endif
    }
    //結果賦值
    if (foundPtr)
        *foundPtr = (bool) (currBucket != NULL);
    /*
     * OK, now what?
     * 根據action執行相關操作
     */
    switch (action)
    {
        case HASH_FIND:
            //搜索
            if (currBucket != NULL)
                return (void *) ELEMENTKEY(currBucket);
            return NULL;
        case HASH_REMOVE:
            //移除
            if (currBucket != NULL)
            {
                /* if partitioned, must lock to touch nentries and freeList */
                //如分區,在訪問條目入口和空閑鏈表時必須先請求鎖
                if (IS_PARTITIONED(hctl))
                    SpinLockAcquire(&(hctl->freeList[freelist_idx].mutex));
                /* delete the record from the appropriate nentries counter. */
                //修改nentries計數器
                Assert(hctl->freeList[freelist_idx].nentries > 0);
                hctl->freeList[freelist_idx].nentries--;
                /* remove record from hash bucket's chain. */
                //在哈希桶中鏈中刪除記錄
                *prevBucketPtr = currBucket->link;
                /* add the record to the appropriate freelist. */
                //添加記錄到正確的空閑鏈表上
                currBucket->link = hctl->freeList[freelist_idx].freeList;
                hctl->freeList[freelist_idx].freeList = currBucket;
                if (IS_PARTITIONED(hctl))
                    //釋放鎖
                    SpinLockRelease(&hctl->freeList[freelist_idx].mutex);
                /*
                 * better hope the caller is synchronizing access to this
                 * element, because someone else is going to reuse it the next
                 * time something is added to the table
                 * 調用者最好是同步訪問元素,因為其他進程在下一次添加到哈希表可以復用.
                 */
                return (void *) ELEMENTKEY(currBucket);
            }
            return NULL;
        case HASH_ENTER_NULL:
            /* ENTER_NULL does not work with palloc-based allocator */
            //驗證分配器
            Assert(hashp->alloc != DynaHashAlloc);
            /* FALL THRU */
            //繼續往下執行
        case HASH_ENTER:
            /* Return existing element if found, else create one */
            //如找到,則返回現存的元素,否則創建一個
            if (currBucket != NULL)
                return (void *) ELEMENTKEY(currBucket);
            /* disallow inserts if frozen */
            //如凍結,則不允許插入,報錯
            if (hashp->frozen)
                elog(ERROR, "cannot insert into frozen hashtable \"%s\"",
                     hashp->tabname);
            //獲取當前桶
            currBucket = get_hash_entry(hashp, freelist_idx);
            if (currBucket == NULL)
            {
                //如為NULL
                /* out of memory */
                //內存溢出
                if (action == HASH_ENTER_NULL)
                    return NULL;
                /* report a generic message */
                //報錯
                if (hashp->isshared)
                    ereport(ERROR,
                            (errcode(ERRCODE_OUT_OF_MEMORY),
                             errmsg("out of shared memory")));
                else
                    ereport(ERROR,
                            (errcode(ERRCODE_OUT_OF_MEMORY),
                             errmsg("out of memory")));
            }
            //正常
            /* link into hashbucket chain */
            //連接到哈希桶鏈中
            *prevBucketPtr = currBucket;
            currBucket->link = NULL;
            /* copy key into record */
            //拷貝鍵到記錄中
            currBucket->hashvalue = hashvalue;
            hashp->keycopy(ELEMENTKEY(currBucket), keyPtr, keysize);
            /*
             * Caller is expected to fill the data field on return.  DO NOT
             * insert any code that could possibly throw error here, as doing
             * so would leave the table entry incomplete and hence corrupt the
             * caller's data structure.
             * 調用者期望在返回時已填充了數據.
             * 不要插入有可能拋出異常的代碼,因為這樣做可能會導致哈希表條目不完整并因此破壞調用者的數據結構
             */
            return (void *) ELEMENTKEY(currBucket);
    }
    //如執行到這里,那程序就有問題了.
    elog(ERROR, "unrecognized hash action code: %d", (int) action);
    //返回NULL,讓編譯器shut up
    return NULL;                /* keep compiler quiet */
}
/* Convert a hash value to a bucket number */
//轉換hash值為桶號
static inline uint32
calc_bucket(HASHHDR *hctl, uint32 hash_val)
{
    uint32      bucket;//桶號
    bucket = hash_val & hctl->high_mask;//執行&操作
    if (bucket > hctl->max_bucket)//大于最大桶號,則返回low_mask
        bucket = bucket & hctl->low_mask;
    return bucket;
}
/*
 * Allocate a new hashtable entry if possible; return NULL if out of memory.
 * (Or, if the underlying space allocator throws error for out-of-memory,
 * we won't return at all.)
 * 如可能,分配一個新的哈希表條目.如內存溢出則返回NULL.
 * (或者,如果依賴的空間分配器因為內存溢出拋出錯誤,則不會返回任何信息)
 */
static HASHBUCKET
get_hash_entry(HTAB *hashp, int freelist_idx)
{
    HASHHDR    *hctl = hashp->hctl;
    HASHBUCKET  newElement;
    for (;;)
    {
        //循環
        /* if partitioned, must lock to touch nentries and freeList */
        //如為分區哈希表,在訪問條目和空閑鏈表時,必須鎖定
        if (IS_PARTITIONED(hctl))
            SpinLockAcquire(&hctl->freeList[freelist_idx].mutex);
        /* try to get an entry from the freelist */
        //從空閑鏈表中嘗試獲取一個條目
        newElement = hctl->freeList[freelist_idx].freeList;
        if (newElement != NULL)
            break;
        if (IS_PARTITIONED(hctl))
            SpinLockRelease(&hctl->freeList[freelist_idx].mutex);
        /*
         * No free elements in this freelist.  In a partitioned table, there
         * might be entries in other freelists, but to reduce contention we
         * prefer to first try to get another chunk of buckets from the main
         * shmem allocator.  If that fails, though, we *MUST* root through all
         * the other freelists before giving up.  There are multiple callers
         * that assume that they can allocate every element in the initially
         * requested table size, or that deleting an element guarantees they
         * can insert a new element, even if shared memory is entirely full.
         * Failing because the needed element is in a different freelist is
         * not acceptable.
         * 在空閑鏈表中沒有空閑條目.在分區哈希表中,在其他空閑鏈表中可能存在條目,
         *   但為了減少爭用,我們期望首先嘗試從主shmem分配器中獲取桶中的其他chunk.
         * 如果失敗,我們必須在放棄之前從根節點開始遍歷所有其他空閑鏈表.
         * 存在多個調用者假定它們可以在初始的請求哈希表大小內分配每一個元素,
         *   或者甚至在共享內存全滿的情況下刪除元素可以保證它們可以插入一個新元素.
         * 之所以失敗是因為所需要的元素在不同的空閑鏈表中是不可接受的.
         */
        if (!element_alloc(hashp, hctl->nelem_alloc, freelist_idx))
        {
            //本空閑鏈表不能分配內存
            int         borrow_from_idx;
            if (!IS_PARTITIONED(hctl))
                //非分區哈希表,返回NULL,意味著內存溢出了.
                return NULL;    /* out of memory */
            /* try to borrow element from another freelist */
            //嘗試從其他空閑鏈表瀏覽元素
            borrow_from_idx = freelist_idx;
            for (;;)
            {
                //------- 開始遍歷其他空閑鏈表
                borrow_from_idx = (borrow_from_idx + 1) % NUM_FREELISTS;
                if (borrow_from_idx == freelist_idx)
                    //已經完成整個空閑鏈表的遍歷,退出
                    break;      /* examined all freelists, fail */
                //獲取自旋鎖
                SpinLockAcquire(&(hctl->freeList[borrow_from_idx].mutex));
                newElement = hctl->freeList[borrow_from_idx].freeList;
                if (newElement != NULL)
                {
                    hctl->freeList[borrow_from_idx].freeList = newElement->link;
                    SpinLockRelease(&(hctl->freeList[borrow_from_idx].mutex));
                    /* careful: count the new element in its proper freelist */
                    //小心:在合適的空閑鏈表上統計新的元素
                    SpinLockAcquire(&hctl->freeList[freelist_idx].mutex);
                    hctl->freeList[freelist_idx].nentries++;
                    SpinLockRelease(&hctl->freeList[freelist_idx].mutex);
                    return newElement;
                }
                SpinLockRelease(&(hctl->freeList[borrow_from_idx].mutex));
            }
            /* no elements available to borrow either, so out of memory */
            //已無元素,內存溢出
            return NULL;
        }
    }
    /* remove entry from freelist, bump nentries */
    //從空閑鏈表中移除條目,bump nentries
    hctl->freeList[freelist_idx].freeList = newElement->link;
    hctl->freeList[freelist_idx].nentries++;
    if (IS_PARTITIONED(hctl))
        SpinLockRelease(&hctl->freeList[freelist_idx].mutex);
    return newElement;
}

三、跟蹤分析

測試腳本

10:44:12 (xdb@[local]:5432)testdb=# select * from t1 limit 10;

設置斷點,跟蹤

(gdb) b hash_search_with_hash_value
Breakpoint 1 at 0xa3a4d7: file dynahash.c, line 925.
(gdb) c
Continuing.
Breakpoint 2, hash_search_with_hash_value (hashp=0x13af8f8, keyPtr=0x7ffdb7d63e40, hashvalue=3920871586, action=HASH_ENTER, 
    foundPtr=0x7ffdb7d63e3f) at dynahash.c:925
925     HASHHDR    *hctl = hashp->hctl;
(gdb)

輸入參數

(gdb) p *hashp
$3 = {hctl = 0x13af990, dir = 0x13afda8, hash = 0xa3bf74 <tag_hash>, match = 0x4791a0 <memcmp@plt>, 
  keycopy = 0x479690 <memcpy@plt>, alloc = 0xa39589 <DynaHashAlloc>, hcxt = 0x13af7e0, 
  tabname = 0x13af958 "LOCALLOCK hash", isshared = false, isfixed = false, frozen = false, keysize = 20, ssize = 256, 
  sshift = 8}
(gdb) p *hashp->hctl
$4 = {freeList = {{mutex = 0 '\000', nentries = 0, freeList = 0x13b1378}, {mutex = 0 '\000', nentries = 0, 
      freeList = 0x0} <repeats 31 times>}, dsize = 256, nsegs = 1, max_bucket = 15, high_mask = 31, low_mask = 15, 
  keysize = 20, entrysize = 80, num_partitions = 0, ffactor = 1, max_dsize = -1, ssize = 256, sshift = 8, nelem_alloc = 42}
(gdb) p *(int *)keyPtr
$5 = 16402

1.初始化相關變量,如根據hash值獲取idx等

(gdb) n
926     int         freelist_idx = FREELIST_IDX(hctl, hashvalue);
(gdb) 
949     if (action == HASH_ENTER || action == HASH_ENTER_NULL)
(gdb) p freelist_idx
$6 = 0
(gdb) p *hctl->freeList[0].freeList
$7 = {link = 0x13b1318, hashvalue = 3920871586}
(gdb) 
(gdb) p hctl->freeList[0]
$8 = {mutex = 0 '\000', nentries = 0, freeList = 0x13b1378}

2.如action為插入,檢查是否需要分裂哈希桶

(gdb) n
956         if (!IS_PARTITIONED(hctl) && !hashp->frozen &&
(gdb) 
957             hctl->freeList[0].nentries / (long) (hctl->max_bucket + 1) >= hctl->ffactor &&
(gdb) 
956         if (!IS_PARTITIONED(hctl) && !hashp->frozen &&
(gdb)

3.執行相關初始化檢索,計算桶號/段號/段內編號等

(gdb) p bucket
$9 = 2
(gdb) p segment_num
$10 = 0
(gdb) p segment_ndx
$11 = 2
(gdb) p segp
$12 = (HASHSEGMENT) 0x13b05c0
(gdb) p *segp
$13 = (HASHBUCKET) 0x0
(gdb) 
(gdb) n
975     prevBucketPtr = &segp[segment_ndx];
(gdb) 
976     currBucket = *prevBucketPtr;
(gdb) p
$14 = (HASHBUCKET) 0x0
(gdb) n
981     match = hashp->match;       /* save one fetch in inner loop */
(gdb) p currBucket
$15 = (HASHBUCKET) 0x0
(gdb)

4.沿著哈希鍵沖突鏈搜索匹配鍵,根據搜索結果給foundPtr賦值

(gdb) n
982     keysize = hashp->keysize;   /* ditto */
(gdb) 
984     while (currBucket != NULL)
(gdb) p keysize
$16 = 20
(gdb) p match
$17 = (HashCompareFunc) 0x4791a0 <memcmp@plt>
(gdb) p currBucket
$18 = (HASHBUCKET) 0x0
(gdb) n
997     if (foundPtr)
(gdb) 
998         *foundPtr = (bool) (currBucket != NULL);
(gdb)

5.根據輸入action執行相應的邏輯
5.1HASH_FIND,搜索
5.2HASH_REMOVE,移除
5.3HASH_ENTER_NULL,驗證分配器,轉至HASH_ENTER
5.4HASH_ENTER,如找到,則返回現存的元素,否則創建一個

(gdb) 
1003        switch (action)
(gdb) p action
$19 = HASH_ENTER
(gdb) n
1047                if (currBucket != NULL)
(gdb) 
1051                if (hashp->frozen)
(gdb) 
1055                currBucket = get_hash_entry(hashp, freelist_idx);
(gdb)

進入get_hash_entry,如可能,分配一個新的哈希表條目.如內存溢出則返回NULL.

1055                currBucket = get_hash_entry(hashp, freelist_idx);
(gdb) step
get_hash_entry (hashp=0x13af8f8, freelist_idx=0) at dynahash.c:1252
1252        HASHHDR    *hctl = hashp->hctl;
(gdb) 
(gdb) n
1258            if (IS_PARTITIONED(hctl))
(gdb) p *hctl
$20 = {freeList = {{mutex = 0 '\000', nentries = 0, freeList = 0x13b1378}, {mutex = 0 '\000', nentries = 0, 
      freeList = 0x0} <repeats 31 times>}, dsize = 256, nsegs = 1, max_bucket = 15, high_mask = 31, low_mask = 15, 
  keysize = 20, entrysize = 80, num_partitions = 0, ffactor = 1, max_dsize = -1, ssize = 256, sshift = 8, nelem_alloc = 42}
(gdb) n
1262            newElement = hctl->freeList[freelist_idx].freeList;
(gdb) 
1264            if (newElement != NULL)
(gdb) p newElement
$21 = (HASHBUCKET) 0x13b1378
(gdb) p *newElement
$22 = {link = 0x13b1318, hashvalue = 3920871586}
(gdb) n
1265                break;
(gdb) 
1322        hctl->freeList[freelist_idx].freeList = newElement->link;
(gdb) 
1323        hctl->freeList[freelist_idx].nentries++;
(gdb) 
1325        if (IS_PARTITIONED(hctl))
(gdb) p *newElement->link
$23 = {link = 0x13b12b8, hashvalue = 2593617408}
(gdb) n
1328        return newElement;
(gdb) 
1329    }
(gdb)

回到hash_search_with_hash_value

hash_search_with_hash_value (hashp=0x13af8f8, keyPtr=0x7ffdb7d63e40, hashvalue=3920871586, action=HASH_ENTER, 
    foundPtr=0x7ffdb7d63e3f) at dynahash.c:1056
1056                if (currBucket == NULL)
(gdb) n
1073                *prevBucketPtr = currBucket;
(gdb) p *currBucket
$24 = {link = 0x13b1318, hashvalue = 3920871586}
(gdb) n
1074                currBucket->link = NULL;
(gdb) 
1077                currBucket->hashvalue = hashvalue;
(gdb) 
1078                hashp->keycopy(ELEMENTKEY(currBucket), keyPtr, keysize);
(gdb) p *currBucket
$25 = {link = 0x0, hashvalue = 3920871586}
(gdb) p *prevBucketPtr
$26 = (HASHBUCKET) 0x13b1378
(gdb) p **prevBucketPtr
$27 = {link = 0x0, hashvalue = 3920871586}
(gdb) n
1087                return (void *) ELEMENTKEY(currBucket);
(gdb) p (void *) ELEMENTKEY(currBucket)
$28 = (void *) 0x13b1388

找到entry,返回

(gdb) n
1093    }
(gdb) 
hash_search (hashp=0x13af8f8, keyPtr=0x7ffdb7d63e40, action=HASH_ENTER, foundPtr=0x7ffdb7d63e3f) at dynahash.c:916
916 }
(gdb)

到此，相信大家對“PostgreSQL中hash_search_with_hash_value函數有什么作用”有了更深的了解，不妨來實際操作一番吧！這里是億速云網站，更多相關內容可以進入相關頻道進行查詢，關注我們，繼續學習！

向AI問一下細節

91超碰碰碰碰久久久久久综合_超碰av人澡人澡人澡人澡人掠_国产黄大片在线观看画质优化_txt小说免费全本

PostgreSQL中hash_search_with_hash_value函數有什么作用

一、數據結構

二、源碼解讀

三、跟蹤分析

猜你喜歡

91超碰碰碰碰久久久久久综合_超碰av人澡人澡人澡人澡人掠_国产黄大片在线观看画质优化_txt小说免费全本

PostgreSQL中hash_search_with_hash_value函數有什么作用

一、數據結構

二、源碼解讀

三、跟蹤分析

猜你喜歡

最新資訊

相關推薦

相關標簽