Join Reorder优化 - 论文摘要 - fxjwind

Join Reorder优化 - 论文摘要

Query Simplification: Graceful Degradation for Join-Order Optimization

这篇的related work可以参考，列的比较全面，

Query图分为下面几种，

Graph Simplification算法，

Heuristic

Optimization of Large Join Queries: Combining Heuristic and Combinatorial Techniques

这篇文章的主要观点，结合Combinatorial和Heuristic

Combinatorial意思是组合

组合优化问题就是，在状态空间中寻找一个最优的状态，状态的cost由cost function来决定

Combinatorial优化算法主要分为两种，

Iterative算法，这里主要是指repeat，不断随机重试，以找到更优解

退火算法，

Heuristic算法，

首先，Augmentation Heuristic

初始只有第一张关系表，

然后一张张往上加，加哪一个取决于chooseNext函数

下面给出一些可以用作chooseNext的指标，论文说实验结果是3的效果最好

KBZ Heuristic

算法分为三个部分，

R，给定一个rooted tree，给出optimal join ordering

T，给定一个join tree，遍历所有的root，用R找出每个rooted tree的optimal

G，给定一个join graph，可能cyclic，找出一个spanning tree(生成树)，调用T

Local Improvement

分而治之，表数太多的时候，穷举的代价很高，但是切分成小的cluster，就会简单许多

同样这样也无法得到最优解，cluster可以重合

最后如果把两个技术结合起来？

II和SA就是两种基本的Combinatorial方法，

SAA，SAK分别把augmentation和KBZ两种Heuristic方法用于SA，用于产生一个较优的initial state

IAI，IKI，用Heuristic的方法产生每一轮迭代的initial state

IAL，加入local improvement

AGI，KBI，先用Heuristic产生state，再用Iterative去优化

A New Heuristic for Optimizing Large Queries

查询优化的目的是避免worst plans，而不是找到best plan，在这样的假设下，启发式算法可能会达到比较好的效果

当前基于combinatorial优化技术(比如iterative或退火)的cost-based searching，已经取得了一定的效果，但是当前的方法并没有利用queries中inherent的semantic information

所以基本的思路就是，在当前cost-based searching的基础上利用semantic information，从而提出Goo算法，Greedy Operator Ordering

这是一种，Greedy的bottom up算法

Node关键属性是Size，Edge关键属性是Selectivity

Goo的目的是逐渐合并各个node，

合并的标准是，每次都是找产生中间结果最小的edge进行合并

很明显，Goo产生的肯定不是最优解

一般的思路都是，基于启发式的结果，进行进一步的调整和优化，找到更优解，

比如，增加一组rules，bottom up的试图apply这些rules得到更好的结果

Polynomial Heuristics for Query Optimization

One line of work adapts randomized techniques and combinatorial heuristics to address this problem.
These techniques consider the space of plans as points in a high-dimensional space, that can be “traversed” via transformations (e.g., join commutativity and
associativity).
Reference [13] surveys different such strategies, including iterative improvement, simulated annealing, and genetic algorithms.
These techniques can be seen as heuristic variations of transformation-based exhaustive enumeration algorithms.
Another line of work implements heuristic variations of dynamic programming. These approaches include reference [14] (which performs dynamic programming for a
subset of tables, picks the best k-table join, replaces it with a new “virtual” table, and repeats the procedure until all tables are part of the final plan),
reference [15] (which simplifies an initial join graph by disallowing non-promising join edges and then exhaustively searches the resulting, simpler problem using [8]), and references [16], [17] (which greedily build join trees one table at a time).

本文首先给出一个分类，比较新颖，

启发式是优化的基本技术，分为对于Transformation-based技术的启发式优化，和动态规划的启发式优化

其中Heuristic DP算法都是基于graph的，可以采用iterative的方式，根据cost等信息降低搜索空间等，或者用Greedy算法

但是文中说除了greedy的方案，其他的性能都太差

所以文中给出一个通用的Greedy算法框架，ERM