您的位置：首页 > 数据库

PostgreSQL查询优化简介

2017-05-27 15:57 155 查看

简介

PostgreSQL查询优化器执行过程

语法分析：生成查询树
语义检查：对SQL表达的语义进行检查
查询优化视图重写
逻辑优化：子查询优化，条件化简，等价谓词重写，连接消除，得到逻辑计划
物理优化：基于代价优化，得到物理计划。PostgreSQL主要采用动态规划和遗传算法
非SPJ优化：主要针对分组，排序，去重等操作

查询计划执行

在PostgreSQL中，语法树并不是一棵树状结构的，把关系平面化到一个链表里面。因为，PostgreSQL认为，在这个阶段不清楚表之间如何链接。

重要数据结构

查询语法树

typedef struct Query
{
//上面还有节点类型，是否存在相关子句
List       *cteList;        /* WITH 子句 */
List       *rtable;         /* list of range table entries */
FromExpr   *jointree;       /* table join tree (FROM and WHERE clauses) */
List       *targetList;     /* target list (of TargetEntry) */
List       *returningList;  /* return-values list (of TargetEntry) */
List       *groupClause;    /* a list of SortGroupClause's */
Node       *havingQual;     /* qualifications applied to groups */
List       *windowClause;   /* 窗口函数子句链表 */
List       *distinctClause; /* a list of SortGroupClause's */
List       *sortClause;     /* a list of SortGroupClause's */

Node       *limitOffset;    /* limit的offset子句 */
Node       *limitCount;     /* limit的个数*/
Node       *setOperations;  /* 是否为多个SQL UNION/INTERSECT/EXCEPT query */
} Query;

范围表(优化前)

表示被查询的对象，可以是一张表，一个From子句中的子查询，一个连接操作的结果

typedef struct RangeTblEntry
{
//普通表
Oid         relid;          /* OID of the relation */
char        relkind;        /* relation kind (see pg_class.relkind) */
struct TableSampleClause *tablesample;      /* sampling info, or NULL */

//子查询
Query      *subquery;       /* the sub-query */
bool        security_barrier;       /* is from security_barrier view?如果是视图展开的子查询，PostgreSQL不做优化 */

//连接类型
JoinType    jointype;       /* type of join */
List       *joinaliasvars;  /* list of alias-var expansions */
} RangeTblEntry;

关系优化信息(优化过程中)

对应PlannerInfo结构体的两个成员(simple_rel_array和join_rel_list)，是优化阶段的操作对象，具有查询优化的相关信息

typedef struct RelOptInfo
{
/* all relations included in this RelOptInfo */
Relids      relids;         /* set of base relids (rangetable indexes) */

/* 估算结果的行数 */
double      rows;

/* materialization information */
List       *pathlist;       /* 存放所有可能的路径 */
List       *ppilist;        /* ParamPathInfos used in pathlist */
List       *partial_pathlist;       /* partial Paths */

/* 局部最优不一定后面最优，上一层的3个可能的最优结果 */
struct Path *cheapest_startup_path;
struct Path *cheapest_total_path;
struct Path *cheapest_unique_path;
List       *cheapest_parameterized_paths;

//本关系为单表或者join

/* used by various scans and joins: */
List       *baserestrictinfo;       /* RestrictInfo structures (if base
* rel) */
QualCost    baserestrictcost;       /* cost of evaluating the above */
List       *joininfo;       /* RestrictInfo structures for join clauses
* involving this rel */
bool        has_eclass_joins;       /* T means joininfo is incomplete */
} RelOptInfo;

计划节点信息

全局查询优化计划的相关信息，存放在PlannerInfo结构体

typedef struct PlannerInfo
{
Query      *parse;          /* 开始时的查询计划树 */
PlannerGlobal *glob;        /* global info for current planner run */
Index       query_level;    /* 本计划所处的层数*/

struct PlannerInfo *parent_root;    /* NULL at outermost Query */

struct RelOptInfo **simple_rel_array;       /* 所有基本表信息 */
int         simple_rel_array_size;  /* allocated size of array */
RangeTblEntry **simple_rte_array;   /* rangetable as an array */

//考虑过连接后生成的新关系
List       *join_rel_list;  /* list of join-relation RelOptInfos */
struct HTAB *join_rel_hash; /* optional hashtable for join relations */

List      **join_rel_level; /*结果关系*/
int         join_cur_level; /* index of list being extended */

} PlannerInfo;

计划节点

代表根据最有路径，生成的物理计划(Plan)

typedef struct Plan
{
/*
* estimated execution costs for plan (see costsize.c for more info)
*/
Cost        startup_cost;   /* cost expended before fetching any tuples */
Cost        total_cost;     /* total cost (assuming all tuples fetched) */

/*
* 估计的元组数和元组宽度
*/
double      plan_rows;
int         plan_width;

/*
* Common structural data for all Plan types.
*/
int         plan_node_id;   /* unique across entire final plan tree */
List       *targetlist;     /* target list to be computed at this node */
List       *qual;           /* implicitly-ANDed qual conditions */
struct Plan *lefttree;      /* input plan tree(s) */
struct Plan *righttree;
List       *initPlan;       /* Init Plan nodes (un-correlated expr
* subselects) */
} Plan;

PlannedStmt

优化器结果，保存查询执行计划，范围表相关信息

SelectStmt

语法分析结果

typedef struct SelectStmt
{
List       *distinctClause; /* distinct子句*/
IntoClause *intoClause;     /* target for SELECT INTO */
List       *targetList;     /* 投影列子句 */
List       *fromClause;     /* From子句，包括join */
Node       *whereClause;    /* where子句 */
List       *groupClause;    /* GROUP BY clauses */
Node       *havingClause;   /* HAVING conditional-expression */
List       *windowClause;   /* WINDOW window_name AS (...), ... */

List       *sortClause;     /* sort clause (a list of SortBy's) */
Node       *limitOffset;    /* # of result tuples to skip */
Node       *limitCount;     /* # of result tuples to return */
List       *lockingClause;  /* FOR UPDATE (list of LockingClause's) */
} SelectStmt;

结构体关系

PlannerInfo是逻辑优化的主要产物，拥有查询树(Query)，关系优化信息，约束条件
路径是物理优化阶段的主要产物，拥有排序键和连接节点
PlannerInfo和路径混杂在一起
查询执行计划Plan是所有路径的最小代价生成的

代码入口

planner：主入口函数

PlannedStmt *
planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
{
PlannedStmt *result;

if (planner_hook)
result = (*planner_hook) (parse, cursorOptions, boundParams);
else
result = standard_planner(parse, cursorOptions, boundParams);
return result;
}

被主函数调用

standard_planner——标准的查询优化器入口

standard_planner只是查询优化器的外壳，通过调用subquery_planner完成查询优化，通过调用set_plan_references完成清理辅助工作

PlannedStmt *
standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
{
PlannedStmt *result;//结果
PlannerGlobal *glob;//查询优化一些所有子查询需要的公共信息
double      tuple_fraction;//
PlannerInfo *root;
RelOptInfo *final_rel;
Path       *best_path;
Plan       *top_plan;
ListCell   *lp,
*lr;

···

/* primary planning entry point (may recurse for subqueries) */
root = subquery_planner(glob, parse, NULL,
false, tuple_fraction);

/* Select best Path and turn it into a Plan */
final_rel = fetch_upper_rel(root, UPPERREL_FINAL, NULL);
best_path = get_cheapest_fractional_path(final_rel, tuple_fraction);

top_plan = create_plan(root, best_path);

···

/* final cleanup of the plan */
Assert(glob->finalrtable == NIL);
Assert(glob->finalrowmarks == NIL);
Assert(glob->resultRelations == NIL);
top_plan = set_plan_references(root, top_plan);
/* ... and the subplans (both regular subplans and initplans) */
Assert(list_length(glob->subplans) == list_length(glob->subroots));
forboth(lp, glob->subplans, lr, glob->subroots)
{
Plan       *subplan = (Plan *) lfirst(lp);
PlannerInfo *subroot = (PlannerInfo *) lfirst(lr);

lfirst(lp) = set_plan_references(subroot, subplan);
}

···
/* build the PlannedStmt result */
result = makeNode(PlannedStmt);

result->commandType = parse->commandType;
result->queryId = parse->queryId;   //parse结果
result->planTree = top_plan;    //查询计划
result->rtable = glob->finalrtable; //范围表
result->resultRelations = glob->resultRelations;    //

return result;
}

subquery_planner返回逻辑优化和物理优化结果root(PlannerInfo *)
create_plan根据最优路径，和PlannerInfo生成物理执行计划Plan
set_plan_references对执行计划部分调整和清理

subquery_planner生成(子)查询执行计划的函数

subquery_planner分为两步。第一步是逻辑优化；第二步是物理优化

//传入glob，parse,typle_fraction，parent_root最开始为NULL
PlannerInfo *
subquery_planner(PlannerGlobal *glob, Query *parse,
PlannerInfo *parent_root,
bool hasRecursion, double tuple_fraction)
{
PlannerInfo *root;
List       *newWithCheckOptions;
List       *newHaving;
bool        hasOuterJoins;
RelOptInfo *final_rel;
ListCell   *l;

/* 为当前子查询创建PlannerInfo */
root = makeNode(PlannerInfo);
root->parse = parse;
root->glob = glob;
root->query_level = parent_root ? parent_root->query_level + 1 : 1;
root->parent_root = parent_root;
root->hasRecursion = hasRecursion;
if (hasRecursion)
root->wt_param_id = SS_assign_special_param(root);
else
root->wt_param_id = -1;
root->non_recursive_path = NULL;

/*
* Look for ANY and EXISTS SubLinks in WHERE and JOIN/ON clauses, and try
* to transform them into joins.  Note that this step does not descend
* into subqueries; if we pull up any subqueries below, their SubLinks are
* processed just before pulling them up.
*子连接：在where和join子句含有ANY和EXISTS
*/
if (parse->hasSubLinks)
pull_up_sublinks(root);

//上拉子查询
pull_up_subqueries(root);

//子查询合并
if (parse->setOperations)
flatten_simple_union_all(root);

//上拉子查询后处理继承关系
preprocess_rowmarks(root);
expand_inherited_tables(root);

//条件化简
preprocess_expression

//合并having子句到where子句，如果having里面含有聚集函数，易失函数，子查询不能合并

//消除外连接
if (hasOuterJoins)
reduce_outer_joins(root);

/*
* Do the main planning.  If we have an inherited target relation, that
* needs special processing, else go straight to grouping_planner.
*/
if (parse->resultRelation &&
rt_fetch(parse->resultRelation, parse->rtable)->inh)
//含有继承关系的物理优化
inheritance_planner(root);
else
//物理优化
grouping_planner(root, false, tuple_fraction);

set_cheapest(final_rel);

return root;
}

逻辑优化处理CTE表达式（ss_process_ctes）
上拉子连接
上拉子查询
Union all处理：flatten_simple_union_all
处理for update(row lock)：preprocess_rowmark
继承表处理(expand_inherited_tables)
处理目标列(prepocess_expression)
处理withCheckOptions：prepocess_expression
处理return 表达式，window子句，limit off子句：prepocess_expression
合并having到where子句
消除外连接

物理优化：生成本成查询PlanInfo的三条最优路径，返回给上层

整体流程

subquery_planner 处理CTE表达式
上拉子链接(去除in, some ,exist)
上拉子查询
预处理表达式(and/or，计算明显的结果，处理不能上拉的子连接)
消除外连接
grouping_planer 处理集合(生成子查询)
处理非集合，调整order, group, target list之句中的顺序
query_planner 构建基本表的RelOptInfo
选择下推
投影下推
推导隐含表达式
生成pathkey
make_one_rel(通过数据直方图计算选择率) 处理单表最优的查询方式
处理两表join最优的方式(判定是否special join)
动态规划或者遗传算法构建多表Join

获取cheatest_path，再加上其他子句

内容来自用户分享和网络整理，不保证内容的准确性，如有侵权内容，可联系管理员处理

标签：

相关文章推荐

新的分享

章节导航