This article looks at what the mdread function does in PostgreSQL.
In PostgreSQL's storage management layer, mdread is the function of the magnetic disk (md) storage manager that reads a block from disk.
smgrsw
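The f_smgr struct of function pointers defines the API between smgr.c and any individual storage manager module.
md is short for "magnetic disk".
Besides md, PostgreSQL used to support two other storage managers: the Sony WORM optical disk jukebox and persistent main memory.
Both were later dropped, leaving magnetic disk as the only one.
The name "magnetic disk" is itself misleading: md works with any kind of device for which the operating system provides standard filesystem operations.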
    /*
     * This struct of function pointers defines the API between smgr.c and
     * any individual storage manager module.  Note that smgr subfunctions are
     * generally expected to report problems via elog(ERROR).  An exception is
     * that smgr_unlink should use elog(WARNING), rather than erroring out,
     * because we normally unlink relations during post-commit/abort cleanup,
     * and so it's too late to raise an error.  Also, various conditions that
     * would normally be errors should be allowed during bootstrap and/or WAL
     * recovery --- see comments in md.c for details.
     */
    typedef struct f_smgr
    {
        void        (*smgr_init) (void);        /* may be NULL */
        void        (*smgr_shutdown) (void);    /* may be NULL */
        void        (*smgr_close) (SMgrRelation reln, ForkNumber forknum);
        void        (*smgr_create) (SMgrRelation reln, ForkNumber forknum,
                                    bool isRedo);
        bool        (*smgr_exists) (SMgrRelation reln, ForkNumber forknum);
        void        (*smgr_unlink) (RelFileNodeBackend rnode, ForkNumber forknum,
                                    bool isRedo);
        void        (*smgr_extend) (SMgrRelation reln, ForkNumber forknum,
                                    BlockNumber blocknum, char *buffer, bool skipFsync);
        void        (*smgr_prefetch) (SMgrRelation reln, ForkNumber forknum,
                                      BlockNumber blocknum);
        void        (*smgr_read) (SMgrRelation reln, ForkNumber forknum,
                                  BlockNumber blocknum, char *buffer);
        void        (*smgr_write) (SMgrRelation reln, ForkNumber forknum,
                                   BlockNumber blocknum, char *buffer, bool skipFsync);
        void        (*smgr_writeback) (SMgrRelation reln, ForkNumber forknum,
                                       BlockNumber blocknum, BlockNumber nblocks);
        BlockNumber (*smgr_nblocks) (SMgrRelation reln, ForkNumber forknum);
        void        (*smgr_truncate) (SMgrRelation reln, ForkNumber forknum,
                                      BlockNumber nblocks);
        void        (*smgr_immedsync) (SMgrRelation reln, ForkNumber forknum);
        void        (*smgr_pre_ckpt) (void);    /* may be NULL */
        void        (*smgr_sync) (void);        /* may be NULL */
        void        (*smgr_post_ckpt) (void);   /* may be NULL */
    } f_smgr;

    static const f_smgr smgrsw[] = {
        /* magnetic disk */
        {
            .smgr_init = mdinit,
            .smgr_shutdown = NULL,
            .smgr_close = mdclose,
            .smgr_create = mdcreate,
            .smgr_exists = mdexists,
            .smgr_unlink = mdunlink,
            .smgr_extend = mdextend,
            .smgr_prefetch = mdprefetch,
            .smgr_read = mdread,
            .smgr_write = mdwrite,
            .smgr_writeback = mdwriteback,
            .smgr_nblocks = mdnblocks,
            .smgr_truncate = mdtruncate,
            .smgr_immedsync = mdimmedsync,
            .smgr_pre_ckpt = mdpreckpt,
            .smgr_sync = mdsync,
            .smgr_post_ckpt = mdpostckpt
        }
    };
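To make the role of the switch table concrete, here is a minimal, self-contained sketch of the same pattern: a struct of function pointers, a one-entry table, and a generic entry point that dispatches through it. Every name in it (toy_smgr, toy_smgrsw, toy_mem_read, toy_smgrread, TOY_BLCKSZ) is hypothetical and chosen for illustration; it serves blocks from an in-memory array and is not PostgreSQL code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_BLCKSZ 16           /* hypothetical block size, not BLCKSZ */
#define TOY_NBLOCKS 4           /* hypothetical number of stored blocks */

/* Hypothetical, cut-down analogue of f_smgr: one slot per operation. */
typedef struct toy_smgr
{
    void (*toy_init) (void);    /* may be NULL, like smgr_init */
    void (*toy_read) (unsigned blocknum, char *buffer);
} toy_smgr;

/* Backing store for the stand-in "storage manager". */
static char toy_store[TOY_NBLOCKS][TOY_BLCKSZ];

/* Implementation behind the read slot: copy one block out of memory. */
static void
toy_mem_read(unsigned blocknum, char *buffer)
{
    assert(blocknum < TOY_NBLOCKS);
    memcpy(buffer, toy_store[blocknum], TOY_BLCKSZ);
}

/* The switch table: a single entry, in the spirit of smgrsw[]. */
static const toy_smgr toy_smgrsw[] = {
    {.toy_init = NULL, .toy_read = toy_mem_read}
};

/* Optional hooks are called only when the slot is set. */
void
toy_smgrinit(int which)
{
    if (toy_smgrsw[which].toy_init != NULL)
        toy_smgrsw[which].toy_init();
}

/* Generic entry point: callers never name the implementation directly. */
void
toy_smgrread(int which, unsigned blocknum, char *buffer)
{
    toy_smgrsw[which].toy_read(blocknum, buffer);
}
```

A caller would invoke toy_smgrread(0, blocknum, buf) without knowing that toy_mem_read sits behind the slot, which is the same indirection that lets smgrread reach mdread in the call stack shown later.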
MdfdVec
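The magnetic disk storage manager keeps track of open file descriptors in its own descriptor pool.
It does this to make it easier to support relations that are larger than the operating system's file size limit (often 2 GB).
To that end, a relation is broken up into multiple "segment" files, each smaller than the OS file size limit.
The segment size is set by the RELSEG_SIZE configuration constant in pg_config.h.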
    /*
     * The magnetic disk storage manager keeps track of open file
     * descriptors in its own descriptor pool.  This is done to make it
     * easier to support relations that are larger than the operating
     * system's file size limit (often 2GBytes).  In order to do that,
     * we break relations up into "segment" files that are each shorter than
     * the OS file size limit.  The segment size is set by the RELSEG_SIZE
     * configuration constant in pg_config.h.
     *
     * On disk, a relation must consist of consecutively numbered segment
     * files in the pattern
     *  -- Zero or more full segments of exactly RELSEG_SIZE blocks each
     *  -- Exactly one partial segment of size 0 <= size < RELSEG_SIZE blocks
     *  -- Optionally, any number of inactive segments of size 0 blocks.
     * The full and partial segments are collectively the "active" segments.
     * Inactive segments are those that once contained data but are currently
     * not needed because of an mdtruncate() operation.  The reason for leaving
     * them present at size zero, rather than unlinking them, is that other
     * backends and/or the checkpointer might be holding open file references to
     * such segments.  If the relation expands again after mdtruncate(), such
     * that a deactivated segment becomes active again, it is important that
     * such file references still be valid --- else data might get written
     * out to an unlinked old copy of a segment file that will eventually
     * disappear.
     *
     * File descriptors are stored in the per-fork md_seg_fds arrays inside
     * SMgrRelation.  The length of these arrays is stored in md_num_open_segs.
     * Note that a fork's md_num_open_segs having a specific value does not
     * necessarily mean the relation doesn't have additional segments; we may
     * just not have opened the next segment yet.  (We could not have "all
     * segments are in the array" as an invariant anyway, since another backend
     * could extend the relation while we aren't looking.)  We do not have
     * entries for inactive segments, however; as soon as we find a partial
     * segment, we assume that any subsequent segments are inactive.
     *
     * The entire MdfdVec array is palloc'd in the MdCxt memory context.
     */
    typedef struct _MdfdVec
    {
        File        mdfd_vfd;       /* fd number in fd.c's pool */
        BlockNumber mdfd_segno;     /* segment number, from 0 */
    } MdfdVec;
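The segment scheme described above can be sketched as two small helpers: one that maps a block number to the segment that holds it, and one that builds the segment's file name. On disk the first segment carries the bare file name and later segments get a ".N" suffix. The names sketch_segno and sketch_segpath, the constant SKETCH_RELSEG_SIZE, and the base name "16384" used in the usage note are hypothetical; the value 131072 is the RELSEG_SIZE printed in the gdb session later in this article, and other builds may differ.

```c
#include <assert.h>
#include <stdio.h>

/* Blocks per segment; assumed to match the RELSEG_SIZE seen in the gdb session. */
#define SKETCH_RELSEG_SIZE 131072u

/* Segment that holds a given block: integer division by the segment size. */
unsigned
sketch_segno(unsigned blocknum)
{
    return blocknum / SKETCH_RELSEG_SIZE;
}

/*
 * Build the file name of a segment.  Segment 0 uses the bare base name;
 * every later segment appends ".N".  Returns the snprintf() result, i.e.
 * the length the name needs (excluding the terminating NUL).
 */
int
sketch_segpath(char *out, size_t outlen, const char *base, unsigned segno)
{
    assert(out != NULL && base != NULL);
    if (segno == 0)
        return snprintf(out, outlen, "%s", base);
    return snprintf(out, outlen, "%s.%u", base, segno);
}
```

For example, with a hypothetical base name of "16384", block 50 falls in segment 0 (file "16384"), while block 131072 is the first block of segment 1 (file "16384.1").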
mdread() reads the specified block from a relation.
The source is fairly simple: it mainly calls FileRead to perform the actual read.
    /*
     *	mdread() -- Read the specified block from a relation.
     */
    void
    mdread(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
           char *buffer)
    {
        off_t       seekpos;    // seek position within the segment file
        int         nbytes;     // number of bytes actually read
        MdfdVec    *v;          // descriptor entry for the segment holding the block

        TRACE_POSTGRESQL_SMGR_MD_READ_START(forknum, blocknum,
                                            reln->smgr_rnode.node.spcNode,
                                            reln->smgr_rnode.node.dbNode,
                                            reln->smgr_rnode.node.relNode,
                                            reln->smgr_rnode.backend);

        // locate (and open if necessary) the segment that contains the block
        v = _mdfd_getseg(reln, forknum, blocknum, false,
                         EXTENSION_FAIL | EXTENSION_CREATE_RECOVERY);

        // byte offset of the block within that segment
        seekpos = (off_t) BLCKSZ * (blocknum % ((BlockNumber) RELSEG_SIZE));

        // sanity check: the offset must lie inside one segment
        Assert(seekpos < (off_t) BLCKSZ * RELSEG_SIZE);

        // read one block into buffer; returns the number of bytes read
        nbytes = FileRead(v->mdfd_vfd, buffer, BLCKSZ, seekpos, WAIT_EVENT_DATA_FILE_READ);

        // trace probe
        TRACE_POSTGRESQL_SMGR_MD_READ_DONE(forknum, blocknum,
                                           reln->smgr_rnode.node.spcNode,
                                           reln->smgr_rnode.node.dbNode,
                                           reln->smgr_rnode.node.relNode,
                                           reln->smgr_rnode.backend,
                                           nbytes,
                                           BLCKSZ);

        if (nbytes != BLCKSZ)
        {
            // a negative count is an I/O error; report it
            if (nbytes < 0)
                ereport(ERROR,
                        (errcode_for_file_access(),
                         errmsg("could not read block %u in file \"%s\": %m",
                                blocknum, FilePathName(v->mdfd_vfd))));

            /*
             * Short read: we are at or past EOF, or we read a partial block at
             * EOF.  Normally this is an error; upper levels should never try to
             * read a nonexistent block.  However, if zero_damaged_pages is ON or
             * we are InRecovery, we should instead return zeroes without
             * complaining.  This allows, for example, the case of trying to
             * update a block that was later truncated away.
             */
            if (zero_damaged_pages || InRecovery)
                MemSet(buffer, 0, BLCKSZ);
            else
                ereport(ERROR,
                        (errcode(ERRCODE_DATA_CORRUPTED),
                         errmsg("could not read block %u in file \"%s\": read only %d of %d bytes",
                                blocknum, FilePathName(v->mdfd_vfd),
                                nbytes, BLCKSZ)));
        }
    }
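The control flow of mdread (read one block at a computed offset, then decide what to do with a short read) can be approximated outside PostgreSQL. The sketch below is an illustration under stated assumptions, not the real implementation: it uses standard C stdio (fseek and fread) in place of PostgreSQL's FileRead and virtual file descriptors, returns status codes instead of calling ereport, and folds zero_damaged_pages and InRecovery into a single hypothetical flag named zero_on_short. The names sketch_read_block and SKETCH_BLCKSZ are likewise made up for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_BLCKSZ 8192      /* assumed block size, as printed by gdb below */

/*
 * Read block number blockinseg (counted from the start of the file) into
 * buffer.  Returns 0 on a full read, 1 if a short read was replaced by
 * zeroes, and -1 on an I/O error or a short read that is not tolerated.
 */
int
sketch_read_block(FILE *fp, unsigned blockinseg, char *buffer, int zero_on_short)
{
    long    seekpos = (long) SKETCH_BLCKSZ * (long) blockinseg;
    size_t  nbytes;

    assert(fp != NULL && buffer != NULL);

    if (fseek(fp, seekpos, SEEK_SET) != 0)
        return -1;              /* could not position the stream */

    nbytes = fread(buffer, 1, SKETCH_BLCKSZ, fp);
    if (nbytes == SKETCH_BLCKSZ)
        return 0;               /* the normal case: one whole block */

    if (ferror(fp))
        return -1;              /* genuine I/O error */

    /* Short read: at or past EOF, or a partial block at EOF. */
    if (!zero_on_short)
        return -1;              /* treated as corruption by the caller */

    memset(buffer, 0, SKETCH_BLCKSZ);   /* hand back an all-zero page */
    return 1;
}
```

The design point carried over from mdread is that a short read is not automatically fatal: whether it becomes an error or a zeroed page depends on state outside the read call itself.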
Test script
11:15:11 (xdb@[local]:5432)testdb=# insert into t1(id) select generate_series(100,500);
Start gdb and trace the backend.
Inspect the call stack
    (gdb) b mdread
    Breakpoint 3 at 0x8b669b: file md.c, line 738.
    (gdb) c
    Continuing.

    Breakpoint 3, mdread (reln=0x2d09be0, forknum=MAIN_FORKNUM, blocknum=50, buffer=0x7f3823369c00 "") at md.c:738
    738		TRACE_POSTGRESQL_SMGR_MD_READ_START(forknum, blocknum,
    (gdb) bt
    #0  mdread (reln=0x2d09be0, forknum=MAIN_FORKNUM, blocknum=50, buffer=0x7f3823369c00 "") at md.c:738
    #1  0x00000000008b92d5 in smgrread (reln=0x2d09be0, forknum=MAIN_FORKNUM, blocknum=50, buffer=0x7f3823369c00 "") at smgr.c:628
    #2  0x00000000008793f9 in ReadBuffer_common (smgr=0x2d09be0, relpersistence=112 'p', forkNum=MAIN_FORKNUM, blockNum=50,
        mode=RBM_NORMAL, strategy=0x0, hit=0x7ffd5fb2948b) at bufmgr.c:890
    #3  0x0000000000878cd4 in ReadBufferExtended (reln=0x7f3836e1e788, forkNum=MAIN_FORKNUM, blockNum=50, mode=RBM_NORMAL,
        strategy=0x0) at bufmgr.c:664
    #4  0x0000000000878bb1 in ReadBuffer (reln=0x7f3836e1e788, blockNum=50) at bufmgr.c:596
    #5  0x00000000004eeb96 in ReadBufferBI (relation=0x7f3836e1e788, targetBlock=50, bistate=0x0) at hio.c:87
    #6  0x00000000004ef387 in RelationGetBufferForTuple (relation=0x7f3836e1e788, len=32, otherBuffer=0, options=0,
        bistate=0x0, vmbuffer=0x7ffd5fb295ec, vmbuffer_other=0x0) at hio.c:415
    #7  0x00000000004df1f8 in heap_insert (relation=0x7f3836e1e788, tup=0x2ca6770, cid=0, options=0, bistate=0x0)
        at heapam.c:2468
    #8  0x0000000000709dda in ExecInsert (mtstate=0x2ca4c40, slot=0x2ca3418, planSlot=0x2ca3418, estate=0x2ca48d8,
        canSetTag=true) at nodeModifyTable.c:529
    #9  0x000000000070c475 in ExecModifyTable (pstate=0x2ca4c40) at nodeModifyTable.c:2159
    #10 0x00000000006e05cb in ExecProcNodeFirst (node=0x2ca4c40) at execProcnode.c:445
    #11 0x00000000006d552e in ExecProcNode (node=0x2ca4c40) at ../../../src/include/executor/executor.h:247
    #12 0x00000000006d7d66 in ExecutePlan (estate=0x2ca48d8, planstate=0x2ca4c40, use_parallel_mode=false,
        operation=CMD_INSERT, sendTuples=false, numberTuples=0, direction=ForwardScanDirection, dest=0x2d41a30,
        execute_once=true) at execMain.c:1723
    #13 0x00000000006d5af8 in standard_ExecutorRun (queryDesc=0x2ca24b8, direction=ForwardScanDirection, count=0,
        execute_once=true) at execMain.c:364
    #14 0x00000000006d5920 in ExecutorRun (queryDesc=0x2ca24b8, direction=ForwardScanDirection, count=0,
        execute_once=true) at execMain.c:307
    #15 0x00000000008c1092 in ProcessQuery (plan=0x2d418b8,
        sourceText=0x2c7eec8 "insert into t1(id) select generate_series(100,500);", params=0x0, queryEnv=0x0,
        dest=0x2d41a30, completionTag=0x7ffd5fb29b80 "") at pquery.c:161
    #16 0x00000000008c29a1 in PortalRunMulti (portal=0x2ce4488, isTopLevel=true, setHoldSnapshot=false, dest=0x2d41a30,
        altdest=0x2d41a30, completionTag=0x7ffd5fb29b80 "") at pquery.c:1286
    #17 0x00000000008c1f7a in PortalRun (portal=0x2ce4488, count=9223372036854775807, isTopLevel=true, run_once=true,
        dest=0x2d41a30, altdest=0x2d41a30, completionTag=0x7ffd5fb29b80 "") at pquery.c:799
    #18 0x00000000008bbf16 in exec_simple_query (
        query_string=0x2c7eec8 "insert into t1(id) select generate_series(100,500);") at postgres.c:1145
    #19 0x00000000008c01a1 in PostgresMain (argc=1, argv=0x2ca8af8, dbname=0x2ca8960 "testdb", username=0x2c7bba8 "xdb")
        at postgres.c:4182
    #20 0x000000000081e07c in BackendRun (port=0x2ca0940) at postmaster.c:4361
    #21 0x000000000081d7ef in BackendStartup (port=0x2ca0940) at postmaster.c:4033
    #22 0x0000000000819be9 in ServerLoop () at postmaster.c:1706
    #23 0x000000000081949f in PostmasterMain (argc=1, argv=0x2c79b60) at postmaster.c:1379
    #24 0x0000000000742941 in main (argc=1, argv=0x2c79b60) at main.c:228
    (gdb)
Compute the read offset
    (gdb) n
    744		v = _mdfd_getseg(reln, forknum, blocknum, false,
    (gdb)
    747		seekpos = (off_t) BLCKSZ * (blocknum % ((BlockNumber) RELSEG_SIZE));
    (gdb) p *v
    $1 = {mdfd_vfd = 26, mdfd_segno = 0}
    (gdb) p BLCKSZ
    $2 = 8192
    (gdb) p blocknum
    $3 = 50
    (gdb) p RELSEG_SIZE
    $4 = 131072
    (gdb) n
    749		Assert(seekpos < (off_t) BLCKSZ * RELSEG_SIZE);
    (gdb) p seekpos
    $5 = 409600
    (gdb)
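The value gdb prints for seekpos follows directly from the formula in mdread: 8192 * (50 % 131072) = 8192 * 50 = 409600. As a small check, the same arithmetic can be written as a standalone helper. The name sketch_seekpos and the two SKETCH_ constants are hypothetical; the constants are hard-coded to the values gdb printed above and would differ on a build with another block or segment size.

```c
#include <assert.h>

#define SKETCH_BLCKSZ      8192LL   /* BLCKSZ as printed by gdb */
#define SKETCH_RELSEG_SIZE 131072u  /* RELSEG_SIZE as printed by gdb */

/*
 * Byte offset of a block inside its segment file: the block's position
 * within the segment (blocknum modulo blocks-per-segment) times the
 * block size.
 */
long long
sketch_seekpos(unsigned blocknum)
{
    long long pos = SKETCH_BLCKSZ * (long long) (blocknum % SKETCH_RELSEG_SIZE);

    /* Same sanity check as mdread: the offset stays inside one segment. */
    assert(pos < SKETCH_BLCKSZ * (long long) SKETCH_RELSEG_SIZE);
    return pos;
}
```

Because of the modulo, block 50 and block 131072 + 50 map to the same offset, 409600; they differ only in which segment file they live in.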
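Perform the read. Note that the traced binary calls FileSeek (md.c:751) before a four-argument FileRead, whereas the listing above passes seekpos directly to FileRead, so the session was apparently recorded against a different source version than the listing.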
    (gdb) n
    751		if (FileSeek(v->mdfd_vfd, seekpos, SEEK_SET) != seekpos)
    (gdb)
    757		nbytes = FileRead(v->mdfd_vfd, buffer, BLCKSZ, WAIT_EVENT_DATA_FILE_READ);
    (gdb)
    759		TRACE_POSTGRESQL_SMGR_MD_READ_DONE(forknum, blocknum,
    (gdb) p nbytes
    $6 = 8192
    (gdb) p *buffer
    $7 = 1 '\001'
    (gdb) n
    767		if (nbytes != BLCKSZ)
    (gdb)
    792	}
    (gdb)
    smgrread (reln=0x2d09be0, forknum=MAIN_FORKNUM, blocknum=50, buffer=0x7f3823369c00 "\001") at smgr.c:629
    629	}
    (gdb)