共计 6025 个字符,预计需要花费 16 分钟才能阅读完成。
本篇内容介绍了“PostgreSQL 查询优化中对消除外连接的处理过程是什么”的有关知识,在实际案例的操作过程中,不少人都会遇到这样的困境,接下来就让丸趣 TV 小编带领大家学习一下如何处理这些情况吧!希望大家仔细阅读,能够学有所成!
使用的测试脚本:
drop table if exists t_null1;
create table t_null1(c1 int);
insert into t_null1 values(1);
insert into t_null1 values(2);
insert into t_null1 values(null);
drop table if exists t_null2;
create table t_null2(c1 int);
insert into t_null2 values(1);
insert into t_null2 values(null);
一、基本概念
消除外连接的代码注释说明如下:
/*
* reduce_outer_joins
* Attempt to reduce outer joins to plain inner joins.
*
* The idea here is that given a query like
* SELECT ... FROM a LEFT JOIN b ON (...) WHERE b.y = 42;
* we can reduce the LEFT JOIN to a plain JOIN if the = operator in WHERE
* is strict. The strict operator will always return NULL, causing the outer
* WHERE to fail, on any row where the LEFT JOIN filled in NULLs for b s
* columns. Therefore, there s no need for the join to produce null-extended
* rows in the first place --- which makes it a plain join not an outer join.
* (This scenario may not be very likely in a query written out by hand, but
* it s reasonably likely when pushing quals down into complex views.)
*
* More generally, an outer join can be reduced in strength if there is a
* strict qual above it in the qual tree that constrains a Var from the
* nullable side of the join to be non-null. (For FULL joins this applies
* to each side separately.)
*
* Another transformation we apply here is to recognize cases like
* SELECT ... FROM a LEFT JOIN b ON (a.x = b.y) WHERE b.y IS NULL;
* If the join clause is strict for b.y, then only null-extended rows could
* pass the upper WHERE, and we can conclude that what the query is really
* specifying is an anti-semijoin. We change the join type from JOIN_LEFT
* to JOIN_ANTI. The IS NULL clause then becomes redundant, and must be
* removed to prevent bogus selectivity calculations, but we leave it to
* distribute_qual_to_rels to get rid of such clauses.
*
* Also, we get rid of JOIN_RIGHT cases by flipping them around to become
* JOIN_LEFT. This saves some code here and in some later planner routines,
* but the main reason to do it is to not need to invent a JOIN_REVERSE_ANTI
* join type.
*
* To ease recognition of strict qual clauses, we require this routine to be
* run after expression preprocessing (i.e., qual canonicalization and JOIN
* alias-var expansion).
*/
有两种类型的外连接可以被消除, 第一种是形如以下形式的语句:
SELECT … FROM a LEFT JOIN b ON (…) WHERE b.y = 42;
这种语句如满足条件可变换为内连接 (INNER_JOIN).
之所以可以变换为内连接, 那是因为这样的语句与内连接处理的结果是一样的, 原因是在 Nullable-Side 端 (需要填充 NULL 值的一端), 存在过滤条件保证这一端不可能是 NULL 值, 比如 IS NOT NULL/y = 42 这类强(strict) 过滤条件.
testdb=# explain verbose select * from t_null1 a left join t_null2 b on a.c1 = b.c1;
QUERY PLAN
--------------------------------------------------------------------------------
Merge Left Join (cost=359.57..860.00 rows=32512 width=8) -- 外连接
Output: a.c1, b.c1
Merge Cond: (a.c1 = b.c1)
- Sort (cost=179.78..186.16 rows=2550 width=4)
Output: a.c1
Sort Key: a.c1
- Seq Scan on public.t_null1 a (cost=0.00..35.50 rows=2550 width=4)
Output: a.c1
- Sort (cost=179.78..186.16 rows=2550 width=4)
Output: b.c1
Sort Key: b.c1
- Seq Scan on public.t_null2 b (cost=0.00..35.50 rows=2550 width=4)
Output: b.c1
(13 rows)
testdb=# explain verbose select * from t_null1 a left join t_null2 b on a.c1 = b.c1 where b.c1 = 1;
QUERY PLAN
------------------------------------------------------------------------------
Nested Loop (cost=0.00..85.89 rows=169 width=8) -- 外连接 (Left 关键字) 已被消除
Output: a.c1, b.c1
- Seq Scan on public.t_null1 a (cost=0.00..41.88 rows=13 width=4)
Output: a.c1
Filter: (a.c1 = 1)
- Materialize (cost=0.00..41.94 rows=13 width=4)
Output: b.c1
- Seq Scan on public.t_null2 b (cost=0.00..41.88 rows=13 width=4)
Output: b.c1
Filter: (b.c1 = 1)
(10 rows)
第二种形如:
SELECT … FROM a LEFT JOIN b ON (a.x = b.y) WHERE b.y IS NULL;
这种语句如满足条件可以变换为反半连接 (ANTI-SEMIJOIN).
过滤条件已明确要求 Nullable-Side 端 y IS NULL, 如果连接条件是 a.x = b.y 这类严格 (strict) 的条件, 那么这样的外连接与反半连接的结果是一样的.
testdb=# explain verbose select * from t_null1 a left join t_null2 b on a.c1 = b.c1 where b.c1 is null;
QUERY PLAN
--------------------------------------------------------------------------------
Hash Anti Join (cost=67.38..152.44 rows=1275 width=8) -- 变换为反连接
Output: a.c1, b.c1
Hash Cond: (a.c1 = b.c1)
- Seq Scan on public.t_null1 a (cost=0.00..35.50 rows=2550 width=4)
Output: a.c1
- Hash (cost=35.50..35.50 rows=2550 width=4)
Output: b.c1
- Seq Scan on public.t_null2 b (cost=0.00..35.50 rows=2550 width=4)
Output: b.c1
(9 rows)
值得一提的是, 在 PG 中, 形如 SELECT … FROM a LEFT JOIN b ON (…) WHERE b.y = 42; 这样的 SQL 语句,WHERE b.y = 42 这类条件可以视为连接的上层过滤条件, 在查询树中,Jointree- fromlist(元素类型为 JoinExpr)与 Jointree- quals 处于同一层次, 由于 JoinExpr 中的 quals 为同层条件, 因此其上层即为 Jointree- quals. 有兴趣的可以查看日志输出查看 Query 查询树结构.
二、源码解读
消除外连接的代码在主函数 subquery_planner 中, 通过调用 reduce_outer_joins 函数实现, 代码片段如下:
/*
* If we have any outer joins, try to reduce them to plain inner joins.
* This step is most easily done after we ve done expression
* preprocessing.
*/
if (hasOuterJoins)
reduce_outer_joins(root);
reduce_outer_joins
相关的数据结构和依赖的子函数:
reduce_outer_joins_state
typedef struct reduce_outer_joins_state
{
Relids relids; /* base relids within this subtree */
bool contains_outer; /* does subtree contain outer join(s)? */
List *sub_states; /* List of states for subtree components */
} reduce_outer_joins_state;
BitmapXX
typedef struct Bitmapset
{
int nwords; /* number of words in array */
bitmapword words[FLEXIBLE_ARRAY_MEMBER]; /* really [nwords] */
} Bitmapset;
#define WORDNUM(x) ((x) / BITS_PER_BITMAPWORD)
#define BITNUM(x) ((x) % BITS_PER_BITMAPWORD)
/* The unit size can be adjusted by changing these three declarations: */
#define BITS_PER_BITMAPWORD 32
typedef uint32 bitmapword; /* must be an unsigned type */
/*
* bms_make_singleton - build a bitmapset containing a single member
*/
Bitmapset *
bms_make_singleton(int x)
{
Bitmapset *result;
int wordnum,
bitnum;
if (x 0)
elog(ERROR, negative bitmapset member not allowed
wordnum = WORDNUM(x);
bitnum = BITNUM(x);
result = (Bitmapset *) palloc0(BITMAPSET_SIZE(wordnum + 1));
result- nwords = wordnum + 1;
result- words[wordnum] = ((bitmapword) 1 bitnum);
return result;
}