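Redis 6.0.5 zset reading notes 2: skip list (zskiplist) paper translation, part 3: related work and conclusions (this part does not dig deeply into the related literature; it is only a plain literal translation)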
ALTERNATIVE DATA STRUCTURES
Balanced trees (e.g., AVL trees [Knu73] [Wir76]) and self-adjusting trees [ST85] can be used for the same problems as skip lists.
All three techniques have performance bounds of the same order.
A choice among these schemes involves several factors: the difficulty of implementing the algorithms, constant factors,
type of bound (amortized, probabilistic or worst-case) and performance on a non-uniform distribution of queries.
Implementation difficulty
For most applications,
implementers generally agree skip lists are significantly easier to implement than either balanced tree algorithms or self-adjusting tree algorithms.
Constant factors
Constant factors can make a significant difference in the practical application of an algorithm.
This is particularly true for sub-linear algorithms. For example, assume that algorithms A and B both require O(log n) time to process a query,
but that B is twice as fast as A: in the time algorithm A takes to process a query on a data set of size n,
algorithm B can process a query on a data set of size n^2.
(Translator's note: sub-linear means growth of order lower than 1, i.e. slower than n, for example n^0.5.)
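The n^2 claim follows from log(n^2) = 2 log n. A minimal sketch of that arithmetic is given here, assuming a toy cost model in which a query costs a fixed amount per step of log2(n); the function name `query_time` and the cost values are illustrative, not from the paper.

```python
import math


def query_time(n, cost_per_step):
    """Toy cost model: a query on n elements takes cost_per_step * log2(n) time units."""
    return cost_per_step * math.log2(n)


n = 1024
# Algorithm A: 2 time units per step, data set of size n.
time_a = query_time(n, cost_per_step=2.0)
# Algorithm B: twice as fast (1 time unit per step), data set of size n squared.
time_b = query_time(n ** 2, cost_per_step=1.0)

print(time_a, time_b)
```

Under this model both calls report the same time, which is why a factor of two matters so much more for a logarithmic algorithm than for a linear one.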
There are two important but qualitatively different contributions to the constant factors of an algorithm.
First, the inherent complexity of the algorithm places a lower bound on any implementation.
Self-adjusting trees are continuously rearranged as searches are performed;
this imposes a significant overhead on any implementation of self-adjusting trees.
Skip list algorithms seem to have very low inherent constant-factor overheads:
the inner loop of the deletion algorithm for skip lists compiles to just six instructions on the 68020.
Second, if the algorithm is complex, programmers are deterred from implementing optimizations.
For example, balanced tree algorithms are normally described using recursive insert and delete procedures,
since that is the simplest and most intuitive method of describing the algorithms.
A recursive insert or delete procedure incurs a procedure call overhead. By using non-recursive insert and delete procedures,
some of this overhead can be eliminated. However, the complexity of non-recursive algorithms for insertion and deletion in a balanced
tree is intimidating and this complexity deters most programmers from eliminating recursion in these routines.
Skip list algorithms are already non-recursive and they are simple enough that programmers are not deterred from performing optimizations.
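To make the "already non-recursive" point concrete, here is a minimal sketch of skip list search, insertion and deletion written with plain loops. It is an illustration under stated assumptions, not the paper's or Redis's code: the names `SkipList`, `Node`, `MAX_LEVEL` and `P`, the fixed maximum level of 16, and the use of Python's `random.Random` are all choices made for this example.

```python
import random

MAX_LEVEL = 16  # assumed cap on node height for this sketch
P = 0.5         # assumed probability of promoting a node one level


class Node:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level  # forward[i] is the next node on level i


class SkipList:
    def __init__(self, seed=None):
        self.head = Node(None, MAX_LEVEL)
        self.level = 1  # number of levels currently in use
        self.rng = random.Random(seed)

    def _random_level(self):
        level = 1
        while level < MAX_LEVEL and self.rng.random() < P:
            level += 1
        return level

    def _find_predecessors(self, key):
        """For each level, find the last node whose key is smaller than key."""
        update = [self.head] * MAX_LEVEL
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        return update

    def contains(self, key):
        candidate = self._find_predecessors(key)[0].forward[0]
        return candidate is not None and candidate.key == key

    def insert(self, key):
        update = self._find_predecessors(key)
        candidate = update[0].forward[0]
        if candidate is not None and candidate.key == key:
            return False  # key already present
        level = self._random_level()
        self.level = max(self.level, level)
        node = Node(key, level)
        for i in range(level):  # splice the new node in on each of its levels
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node
        return True

    def delete(self, key):
        update = self._find_predecessors(key)
        target = update[0].forward[0]
        if target is None or target.key != key:
            return False
        for i in range(len(target.forward)):  # unlink on each level the node occupies
            update[i].forward[i] = target.forward[i]
        while self.level > 1 and self.head.forward[self.level - 1] is None:
            self.level -= 1  # drop levels that became empty
        return True
```

Every operation is a pair of nested loops over levels and forward pointers; there is no rebalancing step and no recursion to unwind.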
Table 2 compares the performance of implementations of skip lists and four other techniques.
All implementations were optimized for efficiency.
The AVL tree algorithms were written by James Macropol of Contel and based on those in [Wir76].
The 2–3 tree algorithms are based on those presented in [AHU83].
Several other existing balanced tree packages were timed and found to be much slower than the results presented below.
The self-adjusting tree algorithms are based on those presented in [ST85]. The times in this table reflect the
CPU time on a Sun-3/60 to perform an operation in a data structure containing 2^16 elements with integer keys.
The values in parentheses show the results relative to the skip list time. The times for insertion and deletion do not include the time
for memory management (e.g., in C programs, calls to malloc and free).
Table 2 - Timings of implementations of different algorithms
-------------------------------------------------------------------------------------------
Implementation              Search Time          Insertion Time       Deletion Time
-------------------------------------------------------------------------------------------
Skip lists                  0.051 msec (1.0)     0.065 msec (1.0)     0.059 msec (1.0)
non-recursive AVL trees     0.046 msec (0.91)    0.10 msec (1.55)     0.085 msec (1.46)
recursive 2–3 trees         0.054 msec (1.05)    0.21 msec (3.2)      0.21 msec (3.65)
Self-adjusting trees:
  top-down splaying         0.15 msec (3.0)      0.16 msec (2.5)      0.18 msec (3.1)
  bottom-up splaying        0.49 msec (9.6)      0.51 msec (7.8)      0.53 msec (9.0)
-------------------------------------------------------------------------------------------
Note that skip lists perform more comparisons than other methods
(the skip list algorithms presented here require an average of L(n)/p + 1/(1-p) + 1 comparisons).
(Translator's note: per the earlier text, the additional 1 is what gives the number of comparisons.)
For tests using real numbers as keys, skip lists were slightly slower than the non-recursive AVL tree algorithms
and search in a skip list was slightly slower than search in a 2–3 tree
(insertion and deletion using the skip list algorithms was still faster than using the recursive 2–3 tree algorithms).
If comparisons are very expensive,
it is possible to change the algorithms so that we never compare the search key against the key of a node more than once during a search.
For p = 1/2,
this produces an upper bound on the expected number of comparisons of 7/2 + (3/2) log2 n. This modification is discussed in [Pug89b].
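The two comparison-count formulas can be evaluated side by side. The sketch below assumes L(n) = log base 1/p of n, which is how an earlier section of the paper defines L(n); the function names are made up for this example.

```python
import math


def expected_comparisons(n, p):
    """Average comparisons for the standard search: L(n)/p + 1/(1-p) + 1."""
    levels = math.log2(n) / math.log2(1 / p)  # L(n) = log base 1/p of n
    return levels / p + 1 / (1 - p) + 1


def single_comparison_bound(n):
    """Bound for the modified search with p = 1/2: 7/2 + (3/2) * log2(n)."""
    return 7 / 2 + (3 / 2) * math.log2(n)


n = 2 ** 16  # the data set size used for Table 2
print(expected_comparisons(n, 0.5))
print(single_comparison_bound(n))
```

For the Table 2 data set size the first formula gives 35 and the second 27.5, so the modification is only worth its extra bookkeeping when a key comparison is costly.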
Type of performance bound
These three classes of algorithm have different kinds of performance bounds. Balanced trees have worst-case time bounds,
self-adjusting trees have amortized time bounds and skip lists have probabilistic time bounds. With self-adjusting trees,
an individual operation can take O(n) time, but the time bound always holds over a long sequence of operations.
For skip lists, any operation or sequence of operations can take longer than expected,
although the probability of any operation taking significantly longer than expected is negligible.
In certain real-time applications, we must be assured that an operation will complete within a certain time bound.
For such applications, self-adjusting trees may be undesirable,
since they can take significantly longer on an individual operation than expected
(e.g., an individual search can take O(n) time instead of O(log n) time). For real-time systems,
skip lists may be usable if an adequate safety margin is provided:
the chance that a search in a skip list containing 1000 elements takes more than 5 times the expected time is about 1 in 10^18.
Non-uniform query distribution
Self-adjusting trees have the property that they adjust to non-uniform query distributions.
Since skip lists are faster than self-adjusting trees by a significant constant factor when a uniform query distribution is encountered,
self-adjusting trees are faster than skip lists only for highly skewed distributions.
We could attempt to devise self-adjusting skip lists.
However, there seems little practical motivation to tamper with the simplicity and fast performance of skip lists;
in an application where highly skewed distributions are expected,
either self-adjusting trees or a skip list augmented by a cache may be preferable [Pug90].
ADDITIONAL WORK ON SKIP LISTS
I have described a set of algorithms that allow multiple processors to concurrently update a skip list in shared memory [Pug89a].
These algorithms are much simpler than concurrent balanced tree algorithms.
They allow an unlimited number of readers and n busy writers in a skip list of n elements with very little lock contention.
Using skip lists, it is easy to do most (all?) of the sorts of operations you might wish to do with a balanced tree, such as use search fingers,
merge skip lists and allow ranking operations (e.g., determine the kth element of a skip list) [Pug89b].
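One common way to support the "kth element" operation is to store, next to every forward pointer, the number of bottom-level links it skips. Redis's zskiplist keeps a span field per level for a similar purpose. The sketch below is a simplified illustration and not the algorithm from [Pug89b] or the Redis source: it builds a static skip list from already sorted keys, and the names `build`, `kth`, `span`, `max_level` and `seed` are assumptions of this example.

```python
import random


class Node:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level
        self.span = [0] * level  # span[i]: bottom-level links crossed by forward[i]


def build(sorted_keys, max_level=8, p=0.5, seed=0):
    """Build a static skip list with spans from keys that are already sorted."""
    rng = random.Random(seed)
    head = Node(None, max_level)
    last = [head] * max_level    # rightmost node seen so far on each level
    last_rank = [0] * max_level  # rank of that node (the head has rank 0)
    for rank, key in enumerate(sorted_keys, start=1):
        level = 1
        while level < max_level and rng.random() < p:
            level += 1
        node = Node(key, level)
        for i in range(level):
            last[i].forward[i] = node
            last[i].span[i] = rank - last_rank[i]
            last[i], last_rank[i] = node, rank
    return head


def kth(head, k):
    """Return the k-th smallest key (1-based), or None if k is out of range."""
    if k < 1:
        return None
    node, traversed = head, 0
    for i in range(len(head.forward) - 1, -1, -1):
        # Follow level-i pointers while they do not overshoot rank k.
        while node.forward[i] is not None and traversed + node.span[i] <= k:
            traversed += node.span[i]
            node = node.forward[i]
        if traversed == k:
            return node.key
    return None


head = build(list(range(10, 210, 10)))  # 20 keys: 10, 20, ..., 200
print(kth(head, 1), kth(head, 20))
```

The rank walk has the same shape as an ordinary search, only it compares accumulated spans against k instead of comparing keys. A dynamic version would also have to update spans on insert and delete, which this sketch leaves out.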
Tom Papadakis, Ian Munro and Patricio Poblette [PMP90] have done an exact analysis of the expected search time in a skip list.
The upper bound described in this paper is close to their exact bound;
the techniques they needed to use to derive an exact analysis are very complicated and sophisticated.
Their exact analysis shows that for p = 1/2 and p = 1/4,
the upper bound given in this paper on the expected cost of a search is not more than 2 comparisons more than the exact expected cost.
I have adapted the idea of probabilistic balancing to some other problems arising both in data structures and in incremental computation [PT88].
We can generate the level of a node based on the result of applying a hash function to the element
(as opposed to using a random number generator).
This results in a scheme where for any set S,
there is a unique data structure that represents S and with high probability the data structure is approximately balanced.
If we combine this idea with an applicative (i.e., persistent) probabilistically balanced data structure and a scheme such as hashed-consing
[All78] which allows constant-time structural equality tests of applicative data structures,
we get a number of interesting properties, such as constant-time equality tests for the representations of sequences.
This scheme also has a number of applications for incremental computation.
Since skip lists are somewhat awkward to make applicative, a probabilistically balanced tree scheme is used.
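The hash-based level idea can be sketched in a few lines. This is an illustration only: the choice of SHA-256, hashing the `repr` of the key, and the names `hashed_level` and `level_assignment` are assumptions of this example and are not taken from [PT88].

```python
import hashlib


def hashed_level(key, max_level=16):
    """Derive a node level from the key itself instead of from a random number generator."""
    digest = hashlib.sha256(repr(key).encode("utf-8")).digest()
    bits = int.from_bytes(digest, "big")
    level = 1
    # Each set low bit plays the role of one successful coin flip with p = 1/2.
    while level < max_level and bits & 1:
        level += 1
        bits >>= 1
    return level


def level_assignment(keys):
    """Levels for a set of keys; the result does not depend on insertion order."""
    return {key: hashed_level(key) for key in sorted(keys)}


print(level_assignment([3, 1, 2]) == level_assignment([2, 3, 1]))
```

Because the level is a pure function of the key, two structures built from the same set in different orders end up with the same shape, which is the uniqueness property the paragraph above describes.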
RELATED WORK
James Driscoll pointed out that R. Sprugnoli suggested a method of randomly balancing search trees in 1981 [Spr81].
With Sprugnoli's approach, the state of the data structure is not independent of the sequence of operations which built it.
This makes it much harder or impossible to formally analyze his algorithms.
Sprugnoli gives empirical evidence that his algorithm has good expected performance, but no theoretical results.
A randomized data structure for ordered sets is described in [BLLSS86].
However, a search using that data structure requires O(n^(1/2)) expected time.
Cecilia Aragon and Raimund Seidel describe a probabilistically balanced search tree scheme [AC89].
They discuss how to adapt their data structure to non-uniform query distributions.
SOURCE CODE AVAILABILITY
Skip list source code libraries for both C and Pascal are available for anonymous ftp from ftp.cs.umd.edu.
CONCLUSIONS
From a theoretical point of view, there is no need for skip lists.
Balanced trees can do everything that can be done with skip lists and have good worst-case time bounds (unlike skip lists).
However, implementing balanced trees is an exacting task
and as a result balanced tree algorithms are rarely implemented except as part of a programming assignment in a data structures class.
Skip lists are a simple data structure that can be used in place of balanced trees for most applications.
Skip list algorithms are very easy to implement, extend and modify.
Skip lists are about as fast as highly optimized balanced tree algorithms
and are substantially faster than casually implemented balanced tree algorithms.
ACKNOWLEDGEMENTS
(Omitted; see the original paper.)
REFERENCES
(Omitted; see the original paper.)