您好,登錄后才能下訂單哦!
這篇文章主要講解了“PostgreSQL中sort_inner_and_outer函數分析”,文中的講解內容簡單清晰,易于學習與理解,下面請大家跟著小編的思路慢慢深入,一起來研究和學習“PostgreSQL中sort_inner_and_outer函數分析”吧!
merge join的算法實現偽代碼如下:
READ data_set_1 SORT BY JOIN KEY TO temp_ds1
READ data_set_2 SORT BY JOIN KEY TO temp_ds2
READ ds1_row FROM temp_ds1
READ ds2_row FROM temp_ds2
WHILE NOT eof ON temp_ds1,temp_ds2 LOOP
IF ( temp_ds1.key = temp_ds2.key ) OUTPUT JOIN ds1_row,ds2_row
ELSIF ( temp_ds1.key <= temp_ds2.key ) READ ds1_row FROM temp_ds1
ELSIF ( temp_ds1.key => temp_ds2.key ) READ ds2_row FROM temp_ds2
END LOOP
Cost相關
注意:實際使用的參數值通過系統配置文件定義,而不是這里的常量定義!
typedef double Cost; /* execution cost (in page-access units) */ /* defaults for costsize.c's Cost parameters */ /* NB: cost-estimation code should use the variables, not these constants! */ /* 注意:實際值通過系統配置文件定義,而不是這里的常量定義! */ /* If you change these, update backend/utils/misc/postgresql.sample.conf */ #define DEFAULT_SEQ_PAGE_COST 1.0 //順序掃描page的成本 #define DEFAULT_RANDOM_PAGE_COST 4.0 //隨機掃描page的成本 #define DEFAULT_CPU_TUPLE_COST 0.01 //處理一個元組的CPU成本 #define DEFAULT_CPU_INDEX_TUPLE_COST 0.005 //處理一個索引元組的CPU成本 #define DEFAULT_CPU_OPERATOR_COST 0.0025 //執行一次操作或函數的CPU成本 #define DEFAULT_PARALLEL_TUPLE_COST 0.1 //并行執行,從一個worker傳輸一個元組到另一個worker的成本 #define DEFAULT_PARALLEL_SETUP_COST 1000.0 //構建并行執行環境的成本 #define DEFAULT_EFFECTIVE_CACHE_SIZE 524288 /*先前已有介紹, measured in pages */ double seq_page_cost = DEFAULT_SEQ_PAGE_COST; double random_page_cost = DEFAULT_RANDOM_PAGE_COST; double cpu_tuple_cost = DEFAULT_CPU_TUPLE_COST; double cpu_index_tuple_cost = DEFAULT_CPU_INDEX_TUPLE_COST; double cpu_operator_cost = DEFAULT_CPU_OPERATOR_COST; double parallel_tuple_cost = DEFAULT_PARALLEL_TUPLE_COST; double parallel_setup_cost = DEFAULT_PARALLEL_SETUP_COST; int effective_cache_size = DEFAULT_EFFECTIVE_CACHE_SIZE; Cost disable_cost = 1.0e10;//1后面10個0,通過設置一個巨大的成本,讓優化器自動放棄此路徑 int max_parallel_workers_per_gather = 2;//每次gather使用的worker數
sort_inner_and_outer函數嘗試構造merge join訪問路徑.
構造過程中的成本估算實現函數initial_cost_mergejoin和final_cost_mergejoin在下一節介紹.
//------------------------------------------------ sort_inner_and_outer /* * sort_inner_and_outer * Create mergejoin join paths by explicitly sorting both the outer and * inner join relations on each available merge ordering. * 顯式排序outer和inner表,創建mergejoin連接路徑. * * 'joinrel' is the join relation * joinrel-連接后新生成的relation * 'outerrel' is the outer join relation * outerrel-參與連接的outer relation(俗稱外表,驅動表) * 'innerrel' is the inner join relation * innerrel-參與連接的inner relation(俗稱內表) * 'jointype' is the type of join to do * jointype-連接類型 * 'extra' contains additional input values * extra-額外的輸入參數 */ static void sort_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel, RelOptInfo *outerrel, RelOptInfo *innerrel, JoinType jointype, JoinPathExtraData *extra) { JoinType save_jointype = jointype; Path *outer_path; Path *inner_path; Path *cheapest_partial_outer = NULL; Path *cheapest_safe_inner = NULL; List *all_pathkeys; ListCell *l; /* * We only consider the cheapest-total-cost input paths, since we are * assuming here that a sort is required. We will consider * cheapest-startup-cost input paths later, and only if they don't need a * sort. * 由于我們假定排序是必須的,只考慮總成本最低的路徑。 * 后續會嘗試啟動成本最低的路徑,并且只在它們不需要排序的情況下。 * * This function intentionally does not consider parameterized input * paths, except when the cheapest-total is parameterized. If we did so, * we'd have a combinatorial explosion of mergejoin paths of dubious * value. This interacts with decisions elsewhere that also discriminate * against mergejoins with parameterized inputs; see comments in * src/backend/optimizer/README. * 該函數特意不考慮參數化輸入路徑,除非成本最低路徑是參數化的。 * 如果考慮了參數化輸入路徑,mergejoin將有一個數量巨大的組合,成本上劃不來。 * 而且這將與其他地方的決策相互作用,這些決策同樣會"歧視"使用參數化輸入的mergejoin; * 請參閱src/backend/optimizer/README中的注釋。 */ outer_path = outerrel->cheapest_total_path; inner_path = innerrel->cheapest_total_path; /* * If either cheapest-total path is parameterized by the other rel, we * can't use a mergejoin. (There's no use looking for alternative input * paths, since these should already be the least-parameterized available * paths.) * 如果任意一個總成本最低的路徑使用了參數化訪問路徑,那么不能使用合并連接。 * (沒有必要尋找替代的輸入路徑,因為這些路徑已經是參數化最少的可用路徑了。) */ if (PATH_PARAM_BY_REL(outer_path, innerrel) || PATH_PARAM_BY_REL(inner_path, outerrel)) return; /* * If unique-ification is requested, do it and then handle as a plain * inner join. * 如果需要保證唯一,創建唯一性訪問路徑并設置為JOIN_INNER連接類型 */ if (jointype == JOIN_UNIQUE_OUTER) { outer_path = (Path *) create_unique_path(root, outerrel, outer_path, extra->sjinfo); Assert(outer_path); jointype = JOIN_INNER; } else if (jointype == JOIN_UNIQUE_INNER) { inner_path = (Path *) create_unique_path(root, innerrel, inner_path, extra->sjinfo); Assert(inner_path); jointype = JOIN_INNER; } /* * If the joinrel is parallel-safe, we may be able to consider a partial * merge join. However, we can't handle JOIN_UNIQUE_OUTER, because the * outer path will be partial, and therefore we won't be able to properly * guarantee uniqueness. Similarly, we can't handle JOIN_FULL and * JOIN_RIGHT, because they can produce false null extended rows. Also, * the resulting path must not be parameterized. * 如果連接可以并行執行,那么可以考慮并行合并連接。 * 但是,PG不能處理JOIN_UNIQUE_OUTER,因為外部路徑是部分的,因此不能正確地保證惟一性。 * 類似地,PG不能處理JOIN_FULL和JOIN_RIGHT,因為它們會產生假空擴展行。 * 此外,生成的路徑不能被參數化。 */ if (joinrel->consider_parallel && save_jointype != JOIN_UNIQUE_OUTER && save_jointype != JOIN_FULL && save_jointype != JOIN_RIGHT && outerrel->partial_pathlist != NIL && bms_is_empty(joinrel->lateral_relids)) { cheapest_partial_outer = (Path *) linitial(outerrel->partial_pathlist); if (inner_path->parallel_safe) cheapest_safe_inner = inner_path; else if (save_jointype != JOIN_UNIQUE_INNER) cheapest_safe_inner = get_cheapest_parallel_safe_total_inner(innerrel->pathlist); } /* * Each possible ordering of the available mergejoin clauses will generate * a differently-sorted result path at essentially the same cost. We have * no basis for choosing one over another at this level of joining, but * some sort orders may be more useful than others for higher-level * mergejoins, so it's worth considering multiple orderings. * 可用的mergejoin子句的每一種可能的排序都將以基本相同的代價生成不同的排序結果路徑。 * 在這種連接級別上,我們沒有選擇一個排序的基礎,但是對于更高層的合并, * 某些排序順序可能比其他更有用,因此值得考慮多種排序的順序。 * * Actually, it's not quite true that every mergeclause ordering will * generate a different path order, because some of the clauses may be * partially redundant (refer to the same EquivalenceClasses). Therefore, * what we do is convert the mergeclause list to a list of canonical * pathkeys, and then consider different orderings of the pathkeys. * 實際上,并不是每個mergeclause都會生成不同的路徑順序,因為有些子句可能是部分冗余的(請參閱等價類)。 * 因此,要做的是將mergeclause列表轉換為一個規范路徑鍵列表,然后考慮路徑鍵的不同順序。 * * Generating a path for *every* permutation of the pathkeys doesn't seem * like a winning strategy; the cost in planning time is too high. For * now, we generate one path for each pathkey, listing that pathkey first * and the rest in random order. This should allow at least a one-clause * mergejoin without re-sorting against any other possible mergejoin * partner path. But if we've not guessed the right ordering of secondary * keys, we may end up evaluating clauses as qpquals when they could have * been done as mergeclauses. (In practice, it's rare that there's more * than two or three mergeclauses, so expending a huge amount of thought * on that is probably not worth it.) * 為每個路徑鍵的排列生成路徑似乎不是一個好的策略,因為計劃時間成本太高了。 * 現在,我們為每個pathkey生成一個路徑,首先列出這個pathkey,然后按隨機順序列出其余的路徑。 * 這應該至少允許一個單子句的mergejoin,而不需要對任何其他可能的mergejoin路徑進行重新排序。 * 但如果沒有正確判斷二級鍵的順序,那么可能會以qpquals的形式計算條件子句,而這些子句本可以作為mergeclauses完成。 * (實際上,很少有兩個或三個以上的mergeclauses,所以花大量時間在這上面可能不值得)。 * * The pathkey order returned by select_outer_pathkeys_for_merge() has * some heuristics behind it (see that function), so be sure to try it * exactly as-is as well as making variants. * 通過調用select_outer_pathkeys_for_merge函數返回的排序鍵順序pathkey * 背后有一些啟發式函數(請參閱該函數),所以一定要按原樣嘗試它,并創建變體。 */ all_pathkeys = select_outer_pathkeys_for_merge(root, extra->mergeclause_list, joinrel); foreach(l, all_pathkeys)//遍歷所有可用的排序鍵 { List *front_pathkey = (List *) lfirst(l); List *cur_mergeclauses; List *outerkeys; List *innerkeys; List *merge_pathkeys; /* Make a pathkey list with this guy first */ if (l != list_head(all_pathkeys)) outerkeys = lcons(front_pathkey, list_delete_ptr(list_copy(all_pathkeys), front_pathkey)); else outerkeys = all_pathkeys; /* no work at first one... */ /* Sort the mergeclauses into the corresponding ordering */ cur_mergeclauses = find_mergeclauses_for_outer_pathkeys(root, outerkeys, extra->mergeclause_list); /* Should have used them all... */ Assert(list_length(cur_mergeclauses) == list_length(extra->mergeclause_list)); /* Build sort pathkeys for the inner side */ innerkeys = make_inner_pathkeys_for_merge(root, cur_mergeclauses, outerkeys); /* Build pathkeys representing output sort order */ merge_pathkeys = build_join_pathkeys(root, joinrel, jointype, outerkeys); /* * And now we can make the path. * * Note: it's possible that the cheapest paths will already be sorted * properly. try_mergejoin_path will detect that case and suppress an * explicit sort step, so we needn't do so here. * 注意:成本最低的路徑可能已被正確排序。 * try_mergejoin_path將檢測這種情況并禁止顯式排序步驟,因此在這里不需要這樣做。 */ try_mergejoin_path(root, joinrel, outer_path, inner_path, merge_pathkeys, cur_mergeclauses, outerkeys, innerkeys, jointype, extra, false); /* * If we have partial outer and parallel safe inner path then try * partial mergejoin path. * 并行處理 */ if (cheapest_partial_outer && cheapest_safe_inner) try_partial_mergejoin_path(root, joinrel, cheapest_partial_outer, cheapest_safe_inner, merge_pathkeys, cur_mergeclauses, outerkeys, innerkeys, jointype, extra); } } //----------------------------------- try_mergejoin_path /* * try_mergejoin_path * Consider a merge join path; if it appears useful, push it into * the joinrel's pathlist via add_path(). * 嘗試merge join path,如可用,則把該路徑通過add_path函數放在joinrel的pathlist鏈表中 */ static void try_mergejoin_path(PlannerInfo *root, RelOptInfo *joinrel, Path *outer_path, Path *inner_path, List *pathkeys, List *mergeclauses, List *outersortkeys, List *innersortkeys, JoinType jointype, JoinPathExtraData *extra, bool is_partial) { Relids required_outer; JoinCostWorkspace workspace; if (is_partial) { try_partial_mergejoin_path(root, joinrel, outer_path, inner_path, pathkeys, mergeclauses, outersortkeys, innersortkeys, jointype, extra);//并行執行 return; } /* * Check to see if proposed path is still parameterized, and reject if the * parameterization wouldn't be sensible. * 檢查建議的路徑是否仍然參數化,如果參數化不合理,則舍棄。 */ required_outer = calc_non_nestloop_required_outer(outer_path, inner_path); if (required_outer && !bms_overlap(required_outer, extra->param_source_rels)) { /* Waste no memory when we reject a path here */ bms_free(required_outer); return; } /* * If the given paths are already well enough ordered, we can skip doing * an explicit sort. * 如果給定的路徑已完成排序,則跳過顯式排序. */ if (outersortkeys && pathkeys_contained_in(outersortkeys, outer_path->pathkeys)) outersortkeys = NIL; if (innersortkeys && pathkeys_contained_in(innersortkeys, inner_path->pathkeys)) innersortkeys = NIL; /* * See comments in try_nestloop_path(). */ initial_cost_mergejoin(root, &workspace, jointype, mergeclauses, outer_path, inner_path, outersortkeys, innersortkeys, extra);//初始化mergejoin if (add_path_precheck(joinrel, workspace.startup_cost, workspace.total_cost, pathkeys, required_outer))//執行前置檢查 { add_path(joinrel, (Path *) create_mergejoin_path(root, joinrel, jointype, &workspace, extra, outer_path, inner_path, extra->restrictlist, pathkeys, required_outer, mergeclauses, outersortkeys, innersortkeys));//創建并添加路徑 } else { /* Waste no memory when we reject a path here */ bms_free(required_outer); } } //----------------------- create_mergejoin_path /* * create_mergejoin_path * Creates a pathnode corresponding to a mergejoin join between * two relations * 創建merge join訪問路徑 * * 'joinrel' is the join relation * joinrel-連接后新生成的relation * 'jointype' is the type of join required * jointype-連接類型 * 'workspace' is the result from initial_cost_mergejoin * workspace-通過函數initial_cost_mergejoin返回的結果 * 'extra' contains various information about the join * extra-額外的輸入參數 * 'outer_path' is the outer path * outer_path-outer relation訪問路徑 * 'inner_path' is the inner path * inner_path-inner relation訪問路徑 * 'restrict_clauses' are the RestrictInfo nodes to apply at the join * restrict_clauses-約束條件子句 * 'pathkeys' are the path keys of the new join path * pathkeys-排序鍵 * 'required_outer' is the set of required outer rels * required_outer-如需要outer rels,則在此保存relids * 'mergeclauses' are the RestrictInfo nodes to use as merge clauses * (this should be a subset of the restrict_clauses list) * mergeclauses-合并連接條件子句 * 'outersortkeys' are the sort varkeys for the outer relation * outersortkeys-outer relation的排序鍵 * 'innersortkeys' are the sort varkeys for the inner relation * innersortkeys-inner relation的排序鍵 */ MergePath * create_mergejoin_path(PlannerInfo *root, RelOptInfo *joinrel, JoinType jointype, JoinCostWorkspace *workspace, JoinPathExtraData *extra, Path *outer_path, Path *inner_path, List *restrict_clauses, List *pathkeys, Relids required_outer, List *mergeclauses, List *outersortkeys, List *innersortkeys) { MergePath *pathnode = makeNode(MergePath); pathnode->jpath.path.pathtype = T_MergeJoin; pathnode->jpath.path.parent = joinrel; pathnode->jpath.path.pathtarget = joinrel->reltarget; pathnode->jpath.path.param_info = get_joinrel_parampathinfo(root, joinrel, outer_path, inner_path, extra->sjinfo, required_outer, &restrict_clauses); pathnode->jpath.path.parallel_aware = false; pathnode->jpath.path.parallel_safe = joinrel->consider_parallel && outer_path->parallel_safe && inner_path->parallel_safe; /* This is a foolish way to estimate parallel_workers, but for now... */ pathnode->jpath.path.parallel_workers = outer_path->parallel_workers; pathnode->jpath.path.pathkeys = pathkeys; pathnode->jpath.jointype = jointype; pathnode->jpath.inner_unique = extra->inner_unique; pathnode->jpath.outerjoinpath = outer_path; pathnode->jpath.innerjoinpath = inner_path; pathnode->jpath.joinrestrictinfo = restrict_clauses; pathnode->path_mergeclauses = mergeclauses; pathnode->outersortkeys = outersortkeys; pathnode->innersortkeys = innersortkeys; /* pathnode->skip_mark_restore will be set by final_cost_mergejoin */ /* pathnode->materialize_inner will be set by final_cost_mergejoin */ final_cost_mergejoin(root, pathnode, workspace, extra);//估算成本 return pathnode; }
測試腳本如下
select a.*,b.grbh,b.je from t_dwxx a, lateral (select t1.dwbh,t1.grbh,t2.je from t_grxx t1 inner join t_jfxx t2 on t1.dwbh = a.dwbh and t1.grbh = t2.grbh) b order by b.dwbh;
啟動gdb,設置斷點
(gdb) b sort_inner_and_outer Breakpoint 1 at 0x7af63a: file joinpath.c, line 888. (gdb) c Continuing. Breakpoint 1, sort_inner_and_outer (root=0x1a4a278, joinrel=0x1aa7180, outerrel=0x1a55700, innerrel=0x1a56c30, jointype=JOIN_INNER, extra=0x7ffca933f880) at joinpath.c:888 888 JoinType save_jointype = jointype; (gdb)
新生成的joinrel是1號和3號RTE的連接,類型為JOIN_INNER
(gdb) p *joinrel->relids->words $1 = 10 (gdb) p jointype $2 = JOIN_INNER
獲取排序鍵,PathKey中的等價類EC,成員為t_grxx.dwbh和t_dwxx.dwbh
... (gdb) 993 all_pathkeys = select_outer_pathkeys_for_merge(root, (gdb) n 997 foreach(l, all_pathkeys) (gdb) p *all_pathkeys $3 = {type = T_List, length = 1, head = 0x1a69490, tail = 0x1a69490} (gdb) p *(PathKey *)all_pathkeys->head->data.ptr_value $5 = {type = T_PathKey, pk_eclass = 0x1a60e08, pk_opfamily = 1994, pk_strategy = 1, pk_nulls_first = false} ... (gdb) set $rt=(RelabelType *)((EquivalenceMember *)$ec->ec_members->head->data.ptr_value)->em_expr (gdb) p *$rt->arg $14 = {type = T_Var} (gdb) p *(Var *)$rt->arg $15 = {xpr = {type = T_Var}, varno = 3, varattno = 1, vartype = 1043, vartypmod = 14, varcollid = 100, varlevelsup = 0, varnoold = 3, varoattno = 1, location = 208} (gdb) set $rt2=(RelabelType *)((EquivalenceMember *)$ec->ec_members->head->next->data.ptr_value)->em_expr (gdb) p *(Var *)$rt2->arg $16 = {xpr = {type = T_Var}, varno = 1, varattno = 2, vartype = 1043, vartypmod = 24, varcollid = 100, varlevelsup = 0, varnoold = 1, varoattno = 2, location = 218}
開始遍歷all_pathkeys
(gdb) n 999 List *front_pathkey = (List *) lfirst(l);
獲取連接條件子句,t_dwxx.dwbh=t_grxx.dwbh
(gdb) p *cur_mergeclauses $17 = {type = T_List, length = 1, head = 0x1a694f0, tail = 0x1a694f0}
構建outer和inner relation的排序鍵
(gdb) p *(PathKey *)innerkeys->head->data.ptr_value $22 = {type = T_PathKey, pk_eclass = 0x1a60e08, pk_opfamily = 1994, pk_strategy = 1, pk_nulls_first = false} (gdb) p *(PathKey *)merge_pathkeys->head->data.ptr_value $25 = {type = T_PathKey, pk_eclass = 0x1a60e08, pk_opfamily = 1994, pk_strategy = 1, pk_nulls_first = false}
嘗試merge join,進入函數try_mergejoin_path
(gdb) 1038 try_mergejoin_path(root, (gdb) step try_mergejoin_path (root=0x1a4dcc0, joinrel=0x1a68e20, outer_path=0x1a62288, inner_path=0x1a62320, pathkeys=0x1a694b8, mergeclauses=0x1a69518, outersortkeys=0x1a694b8, innersortkeys=0x1a69578, jointype=JOIN_INNER, extra=0x7ffca933f880, is_partial=false) at joinpath.c:572 572 if (is_partial)
初始merge join成本
... (gdb) 615 initial_cost_mergejoin(root, &workspace, jointype, mergeclauses, (gdb) p workspace $26 = {startup_cost = 10861.483356195882, total_cost = 11134.203356195881, run_cost = 24.997499999999999, inner_run_cost = 247.72250000000003, inner_rescan_run_cost = 1.3627136827435593e-316, outer_rows = 9999, inner_rows = 100000, outer_skip_rows = 0, inner_skip_rows = 911, numbuckets = 27665584, numbatches = 0, inner_rows_total = 1.3681950446447804e-316}
構造merge join
... (gdb) n 625 create_mergejoin_path(root, (gdb) 624 add_path(joinrel, (Path *) (gdb) 644 } (gdb) p *joinrel->pathlist $28 = {type = T_List, length = 1, head = 0x1a6a180, tail = 0x1a6a180} (gdb) p *(Node *)joinrel->pathlist->head->data.ptr_value $29 = {type = T_MergePath} (gdb) p *(MergePath *)joinrel->pathlist->head->data.ptr_value $30 = {jpath = {path = {type = T_MergePath, pathtype = T_MergeJoin, parent = 0x1a68e20, pathtarget = 0x1a69058, param_info = 0x0, parallel_aware = false, parallel_safe = true, parallel_workers = 0, rows = 100000, startup_cost = 10863.760856195882, total_cost = 12409.200856195883, pathkeys = 0x1a694b8}, jointype = JOIN_INNER, inner_unique = false, outerjoinpath = 0x1a62288, innerjoinpath = 0x1a62320, joinrestrictinfo = 0x1a692f8}, path_mergeclauses = 0x1a69518, outersortkeys = 0x1a694b8, innersortkeys = 0x1a69578, skip_mark_restore = false, materialize_inner = false}
完成調用
(gdb) n sort_inner_and_outer (root=0x1a4dcc0, joinrel=0x1a68e20, outerrel=0x1a4d700, innerrel=0x1a4d918, jointype=JOIN_INNER, extra=0x7ffca933f880) at joinpath.c:1054 1054 if (cheapest_partial_outer && cheapest_safe_inner) (gdb) 997 foreach(l, all_pathkeys) (gdb) 1066 } (gdb) n add_paths_to_joinrel (root=0x1a4dcc0, joinrel=0x1a68e20, outerrel=0x1a4d700, innerrel=0x1a4d918, jointype=JOIN_INNER, sjinfo=0x7ffca933f970, restrictlist=0x1a692f8) at joinpath.c:279 279 if (mergejoin_allowed) (gdb) 280 match_unsorted_outer(root, joinrel, outerrel, innerrel, ...
感謝各位的閱讀,以上就是“PostgreSQL中sort_inner_and_outer函數分析”的內容了,經過本文的學習后,相信大家對PostgreSQL中sort_inner_and_outer函數分析這一問題有了更深刻的體會,具體使用情況還需要大家實踐驗證。這里是億速云,小編將為大家推送更多相關知識點的文章,歡迎關注!
免責聲明:本站發布的內容(圖片、視頻和文字)以原創、轉載和分享為主,文章觀點不代表本網站立場,如果涉及侵權請聯系站長郵箱:is@yisu.com進行舉報,并提供相關證據,一經查實,將立刻刪除涉嫌侵權內容。