Pentium.Labs


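Introduction to AutoTiKV

AutoTiKV is a tool for automatically tuning the TiKV database. Its design is inspired by a SIGMOD 2017 paper, "Automatic Database Management System Tuning Through Large-scale Machine Learning", which uses machine learning models to tune database parameters automatically.

Project repository: https://github.com/pentium3/AutoTiKV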

 

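Design goals

The overall tuning process is roughly as shown in the figure below:

The whole process loops for 200 rounds (user-configurable); alternatively, it can be set to run until the results converge.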

 

AutoTiKV can restart TiKV after changing parameters (or skip the restart if it is not needed). The knobs to tune and the metrics to collect are declared in controller.py.

Below is a template for declaring a knob:

"rocksdb.defaultcf.write-buffer-size":          # name of the knob; dots separate the section names from the knob name
    {
        "changebyyml": True,                    # True: the knob is changed by editing tidb-ansible/conf/tikv.yml
        "set_func": None,                       # if changebyyml == False, the name of the function that sets the knob (also defined in controller.py; typically it calls the tikv-ctl command line)
        "minval": 64,                           # if type != enum, the minimum allowed value
        "maxval": 1024,                         # if type != enum, the maximum allowed value
        "enumval": [],                          # if type == enum, the list of all valid values
        "type": "int",                          # int / enum / real
        "default": 64                           # default value
    },
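
As a rough illustration of how such a declaration could drive the random rounds mentioned later, the sketch below draws a valid value from a knob spec. The helper name random_knob_value is hypothetical and is not taken from the AutoTiKV code; only the field names come from the template above.

```python
import random

# Knob declaration copied from the template above (values only).
knob_spec = {
    "changebyyml": True,
    "set_func": None,
    "minval": 64,
    "maxval": 1024,
    "enumval": [],
    "type": "int",
    "default": 64,
}

def random_knob_value(spec, rng=random):
    """Draw a random valid value for one knob (hypothetical helper)."""
    if spec["type"] == "enum":
        # Enum knobs: pick one of the listed values.
        return rng.choice(spec["enumval"])
    if spec["type"] == "int":
        # randint is inclusive on both ends, matching [minval, maxval].
        return rng.randint(spec["minval"], spec["maxval"])
    if spec["type"] == "real":
        return rng.uniform(spec["minval"], spec["maxval"])
    raise ValueError("unknown knob type: %r" % spec["type"])

value = random_knob_value(knob_spec, random.Random(0))
print(value)
```

Passing a seeded random.Random instance makes a run reproducible, which can help when comparing tuning sessions.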

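Below is a template for declaring a metric:

"write_latency":
    {
     "read_func": read_write_latency,      # the function that reads this metric (also defined in controller.py)
     "lessisbetter": 1,                    # whether a smaller value of this metric is better (1: yes)
     "calc": "ins",                        # "ins": the value is simply what is read after the benchmark; "inc": the metric is incremental, so the result is the value after the benchmark minus the value before it
    },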
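
The difference between the two "calc" modes can be sketched as follows. The function metric_value and the sample readings are hypothetical and only illustrate the rule described in the comment above; they are not the actual controller.py code.

```python
def metric_value(spec, before, after):
    """Combine the readings taken before and after a benchmark run.

    "ins": use the reading taken after the benchmark as-is.
    "inc": the metric is cumulative, so use the difference.
    """
    if spec["calc"] == "ins":
        return after
    if spec["calc"] == "inc":
        return after - before
    raise ValueError("unknown calc mode: %r" % spec["calc"])

# Made-up readings, purely for illustration.
latency = metric_value({"calc": "ins"}, before=0.0, after=41092.0)
cpu_used = metric_value({"calc": "inc"}, before=100.0, after=145.5)
print(latency, cpu_used)
```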

The first 10 rounds (the number is configurable) benchmark randomly generated knobs; all later rounds benchmark the knobs recommended by the ML model.
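
The round structure described above can be sketched as a small loop. Everything here is a stand-in: tune, sample_random, recommend and benchmark are hypothetical names, the benchmark is a toy function, and the recommender is a greedy placeholder rather than the GP model described in the next section.

```python
import random

def tune(rounds, warmup, sample_random, recommend, benchmark):
    """Run `rounds` benchmark rounds; the first `warmup` use random knobs."""
    history = []  # list of (knobs, metric) pairs
    for i in range(rounds):
        if i < warmup:
            knobs = sample_random()
        else:
            knobs = recommend(history)
        metric = benchmark(knobs)
        history.append((knobs, metric))
    return history

rng = random.Random(0)

def sample_random():
    # Toy knob: an index into a five-value enum.
    return rng.randint(0, 4)

def recommend(history):
    # Placeholder recommender: repeat the best knob seen so far
    # (lower metric is better in this toy example).
    return min(history, key=lambda pair: pair[1])[0]

def benchmark(knob):
    # Toy stand-in for a real benchmark run.
    return (knob - 2) ** 2 + 1.0

history = tune(rounds=20, warmup=10, sample_random=sample_random,
               recommend=recommend, benchmark=benchmark)
print(history[-1])
```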

 

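ML model

AutoTiKV uses the same Gaussian Process Regression (GP for short below) as OtterTune to recommend new knobs. GP is a non-parametric model based on the Gaussian distribution. Its advantages are:

  1. Compared with methods such as neural networks, GP is a non-parametric model with relatively low computational cost, and it performs better than an NN when there are very few training samples.
  2. It can estimate the distribution at a sample, i.e. the mean m(X) and the standard deviation s(X) at X. If there is little data around X, the estimated standard deviation s(X) will be large (indicating that X differs considerably from the other data points). Intuitively, little data means high uncertainty, which shows up as a large standard deviation; conversely, with enough data the uncertainty shrinks and the standard deviation becomes small. This property is used later.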
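
For reference, the mean and standard deviation mentioned in item 2 have a standard closed form in the textbook GP regression setting. The post does not say which kernel or noise model AutoTiKV uses, so the kernel k and the noise variance \sigma_n^2 below are generic placeholders, and a zero prior mean is assumed. Given observed knob vectors x_1, ..., x_n with metric values y, the prediction at a new point x is:

```latex
\begin{aligned}
m(x) &= \mathbf{k}_*^{\top} \left( K + \sigma_n^2 I \right)^{-1} \mathbf{y}, \\
s(x)^2 &= k(x, x) - \mathbf{k}_*^{\top} \left( K + \sigma_n^2 I \right)^{-1} \mathbf{k}_*, \\
K_{ij} &= k(x_i, x_j), \qquad (\mathbf{k}_*)_i = k(x_i, x).
\end{aligned}
```

When x is far from every observed x_i, the entries of \mathbf{k}_* are close to zero for typical distance-based kernels, so s(x)^2 stays close to k(x, x). This is the "little data nearby means large standard deviation" behaviour described above.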

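GP by itself, however, only estimates the distribution at a sample. To obtain the final prediction, we apply it within Bayesian optimization, which consists of roughly two steps:

  1. Use GP to estimate the distribution of the function.
  2. Use an acquisition function to guide the next sample (that is, to produce the recommendation).

The role of the acquisition function is to balance two properties, exploration and exploitation, when looking for a new recommendation:

  • exploration: probe new points in unknown regions where little data is available so far.
  • exploitation: in known regions with enough data, use that data to train the model and estimate where the optimum is.

Recommendation has to balance these two. Too much exploitation traps the result in a local optimum (the best known point is recommended repeatedly while better points may remain undiscovered), whereas too much exploration makes the search inefficient (new regions keep being explored without digging deeper into the currently promising ones). The core idea for balancing them: where there is enough data, recommend based on the existing data; where data is lacking, explore the region with the fewest points, since exploring the least-known region yields the most information.

The second step of Bayesian optimization implements this idea. As mentioned above, GP gives us the mean m(X) and the standard deviation s(X) at X; the mean m(X) serves as the indicator for exploitation and the standard deviation s(X) as the indicator for exploration. The problem can then be solved with Bayesian optimization.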

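We use the Upper Confidence Bound (UCB) as the acquisition function. Suppose we want to find the X that makes Y as large as possible; then U(X) = m(X) + k*s(X), where k > 0 is a tunable coefficient. We only need to find the X that maximizes U(X).

  • If U(X) is large, then either m(X) is large or s(X) is large.
  • If s(X) is large, there is little data around X, so new points in the unknown region need to be explored.
  • If m(X) is large, the estimated mean of Y is large, so the known data should be exploited to find a well-performing point.
  • The coefficient k controls the ratio between exploration and exploitation: the larger k is, the more exploration of new regions is encouraged.

In the actual implementation, a number of candidate knobs are first generated at random, their U(X) values are computed with the model above, and the candidate with the largest U(X) is returned as the recommendation for this round.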
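
The candidate-scoring step can be sketched in plain Python as below. This is a minimal illustration under several assumptions that are not stated in the post: a squared-exponential (RBF) kernel with unit length scale, a small fixed noise term, a constant prior mean equal to the average of the observations, and a one-dimensional toy knob. It is not the AutoTiKV or OtterTune implementation.

```python
import math
import random

def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two knob vectors (assumed kernel)."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2.0 * length ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def gp_predict(X, y, x_new, noise=1e-3):
    """Return the GP posterior mean m(x_new) and standard deviation s(x_new)."""
    n = len(X)
    K = [[rbf(X[i], X[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    k_star = [rbf(xi, x_new) for xi in X]
    mean_y = sum(y) / n  # constant prior mean (assumption)
    alpha = solve(K, [v - mean_y for v in y])
    m = mean_y + sum(k * a for k, a in zip(k_star, alpha))
    w = solve(K, k_star)
    var = rbf(x_new, x_new) - sum(k * v for k, v in zip(k_star, w))
    return m, math.sqrt(max(var, 0.0))

def recommend(X, y, candidates, k=2.0):
    """Pick the candidate with the largest U(X) = m(X) + k * s(X)."""
    def ucb(c):
        m, s = gp_predict(X, y, c)
        return m + k * s
    return max(candidates, key=ucb)

# Toy data: one knob, higher metric is better.
X = [[0.0], [1.0], [4.0]]
y = [1.0, 3.0, 0.5]
rng = random.Random(0)
candidates = [[rng.uniform(0.0, 4.0)] for _ in range(50)]
best = recommend(X, y, candidates, k=2.0)
print(best)
```

For a metric where smaller is better (such as latency), one option is to negate y before fitting so that maximizing U(X) still applies; whether AutoTiKV handles it this way is not stated in the post.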

Ref: https://mp.weixin.qq.com/s/y8VIieK0LO37SjRRyPhtrw

 

Database parameters

workload

We defined four workloads: writeheavy, longscan, shortscan, and point-lookup. The database size is 80GB in all cases.

 

knobs

We experimented with the following parameters:

Options | Expected behavior | Valid range / value set | How to set / view knob
write-buffer-size | point-lookup, range-scan: larger is better | [64MB, 1GB] | tidb-ansible/conf/tikv.yml
max-bytes-for-level-base | point-lookup, range-scan: larger is better | [512MB, 4GB] | tidb-ansible/conf/tikv.yml
target-file-size-base | point-lookup, range-scan: larger is better | {8M, 16M, 32M, 64M, 128M} | tidb-ansible/conf/tikv.yml
disable-auto-compactions | write-heavy: 1 is better; point-lookup, range-scan: 0 is better | {1, 0} | tidb-ansible/conf/tikv.yml or tikv-ctl
block-size | point-lookup: smaller is better; range-scan: larger is better | {4k, 8k, 16k, 32k, 64k} | tidb-ansible/conf/tikv.yml
bloom-filter-bits-per-key | point-lookup, range-scan: larger is better | {5, 10, 15, 20} | tidb-ansible/conf/tikv.yml
optimize-filters-for-hits | point-lookup, range-scan: 0 is better | {1, 0} | tidb-ansible/conf/tikv.yml

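The meaning of these parameters is as follows:

  • block-size: RocksDB stores data in data blocks, and block-size sets the size of these blocks. To access a key, RocksDB must read the entire block that contains it. For point lookups, larger blocks increase read amplification and hurt performance; for range scans, larger blocks use disk bandwidth more efficiently.
  • disable-auto-compactions: whether to turn off compaction. Compaction consumes disk bandwidth and slows down writes. But if the LSM tree is not compacted, level-0 files accumulate and hurt read performance. Compaction itself is also an interesting direction for auto-tuning.
  • write-buffer-size: the size limit (maximum) of a single memtable. In theory a larger memtable increases the cost of the binary search for the insert position, but earlier preliminary experiments found that this option has no obvious effect on writeheavy.
  • max-bytes-for-level-base: the total size of level 1 in the LSM tree. For a fixed amount of data, a larger value means fewer LSM levels, which benefits reads.
  • target-file-size-base: assuming target-file-size-multiplier=1, this option sets the size of each SST file. A smaller value means more SST files, which hurts read performance.
  • bloom-filter-bits-per-key: the number of bits per key used by the Bloom filter. For reads, larger is better.
  • optimize-filters-for-hits: True disables the bloom filter for the bottom level of the LSM tree. The option exists mainly because the bottom-level bloom filters are large in total and take up a lot of block cache space. If the queried keys are known to exist in the database, the bottom-level bloom filter serves no purpose.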
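
To give a feel for why more bits per key helps reads, the sketch below evaluates the textbook false-positive rate of a Bloom filter that uses the optimal number of hash functions, exp(-b * ln(2)^2) for b bits per key, at the four values tried in the experiments. This is the idealized formula only; the filter that RocksDB actually builds may behave somewhat differently.

```python
import math

def bloom_false_positive_rate(bits_per_key):
    """Theoretical false-positive rate with the optimal number of hash functions."""
    return math.exp(-bits_per_key * math.log(2) ** 2)

# The value set tried for bloom-filter-bits-per-key.
for bits in (5, 10, 15, 20):
    print(bits, "%.6f" % bloom_false_positive_rate(bits))
```

Each additional bit per key multiplies the idealized rate by roughly 0.62, so the gain from going from 15 to 20 bits is much smaller in absolute terms than from 5 to 10.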

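A few parameters that we tried but eventually dropped:

  • block_cache_size: the size of the RocksDB block cache, which caches the decompressed data blocks mentioned above. In theory the block cache should generally not take up all of system memory; some should be left for the OS buffer cache to hold compressed data blocks. In our preliminary experiments, however, the optimal block_cache_size always hit the maximum. Auto-tuning strategies for the block cache have also been studied quite a lot, for example using reinforcement learning to choose the replacement algorithm, or SimulatedCache.
  • delayed_write_rate: when flush or compaction cannot keep up with foreground writes, RocksDB forcibly limits the write rate to delayed_write_rate to avoid read-performance degradation. We originally hoped to see whether this value could be auto-tuned, but once a write stall occurs TiKV returns timeout errors, which disrupts the tuning process, so we had to drop this parameter.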

 

metrics

We chose the following metrics as optimization targets.

  • throughput: depending on the workload, this is write throughput, get throughput, or scan throughput
  • latency: depending on the workload, this is write latency, get latency, or scan latency
  • store_size:
  • compaction_cpu:

throughput and latency are taken from the go-ycsb output, while store_size and compaction_cpu are obtained through tikv-ctl.

 

Ref:

https://rdrr.io/github/richfitz/rleveldb/man/leveldb_open.html

http://mysql.taobao.org/monthly/2016/08/03/

https://www.jianshu.com/p/8e0018b6a8b6

https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide

 

Experimental results

Test platform

AMD Ryzen5-2600 (6C12T), 32GB RAM, 512GB NVMe SSD, Ubuntu 18.04, master branch of tidb-ansible

 

workload=writeheavy    knobs={disable-auto-compactions, block-size}    metric=write_latency

# Copyright (c) 2010 Yahoo! Inc. All rights reserved.                                                                                                                             
#                                                                                                                                                                                 
# Licensed under the Apache License, Version 2.0 (the "License"); you                                                                                                             
# may not use this file except in compliance with the License. You                                                                                                                
# may obtain a copy of the License at                                                                                                                                             
#                                                                                                                                                                                 
# http://www.apache.org/licenses/LICENSE-2.0                                                                                                                                      
#                                                                                                                                                                                 
# Unless required by applicable law or agreed to in writing, software                                                                                                             
# distributed under the License is distributed on an "AS IS" BASIS,                                                                                                               
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or                                                                                                                 
# implied. See the License for the specific language governing                                                                                                                    
# permissions and limitations under the License. See accompanying                                                                                                                 
# LICENSE file.                                                                                                                                                                   


# Yahoo! Cloud System Benchmark
# Workload A: Update heavy workload
#   Application example: Session store recording recent actions
#                        
#   Read/update ratio: 0/100
#   Default data size: 1 KB records (10 fields, 100 bytes each, plus key)
#   Request distribution: zipfian

# 80GB

recordcount=80000000
operationcount=5000000
# fieldlength=10
workload=core

readallfields=true

readproportion=0
updateproportion=1
scanproportion=0
insertproportion=0

requestdistribution=zipfian
YCSB workload definition
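
As a quick sanity check on the "# 80GB" comment in the workload file, the raw data volume follows from recordcount and the default record layout quoted in the header comment (10 fields of 100 bytes each). Keys and storage overhead are ignored here, so this is only an approximation.

```python
record_count = 80_000_000   # recordcount from the workload file
fields_per_record = 10      # default noted in the workload header comment
bytes_per_field = 100       # default noted in the workload header comment

raw_bytes = record_count * fields_per_record * bytes_per_field
print(raw_bytes / 1e9, "GB")
```

This is in the same range as the store_size values (around 8.9e10) reported in the logs below, assuming that metric is in bytes.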

The results are as follows:

################## data ##################
------------------------------previous:------------------------------
knobs:
[[0. 4.]
 [1. 3.]
 [0. 0.]
 [0. 3.]
 [0. 1.]
 [1. 4.]
 [0. 1.]
 [1. 0.]
 [1. 4.]
 [0. 4.]
 [0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]]
metrics:
[[1.01428000e+04 5.04230000e+04 8.86174709e+10 1.84750000e+02]
 [1.01703000e+04 5.02510000e+04 8.98934985e+10 2.50000000e+00]
 [1.24102000e+04 4.10920000e+04 8.95223916e+10 2.18850000e+02]
 [1.09910000e+04 4.64880000e+04 8.86518967e+10 1.89610000e+02]
 [1.20731000e+04 4.21960000e+04 8.90833010e+10 1.88950000e+02]
 [9.42460000e+03 5.42690000e+04 8.98143324e+10 3.32000000e+00]
 [1.19275000e+04 4.28240000e+04 8.90753594e+10 1.94820000e+02]
 [1.18271000e+04 4.32470000e+04 9.11159380e+10 3.08000000e+00]
 [9.34830000e+03 5.47160000e+04 8.98211663e+10 3.27000000e+00]
 [1.02665000e+04 4.97860000e+04 8.86331145e+10 1.87730000e+02]
 [1.25193000e+04 4.08050000e+04 8.94974748e+10 2.19960000e+02]
 [1.24805000e+04 4.07670000e+04 8.95419805e+10 2.20190000e+02]
 [1.24086000e+04 4.11510000e+04 8.94650026e+10 2.24280000e+02]
 [1.21789000e+04 4.18830000e+04 8.95860725e+10 2.18360000e+02]
 [1.21835000e+04 4.19280000e+04 8.95094852e+10 2.25200000e+02]
 [1.21365000e+04 4.20690000e+04 8.94701087e+10 2.18990000e+02]]
rowlabels: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
num: 16
------------------------------new:------------------------------
knobs: [[0. 0.]]
metrics: [[1.23137000e+04 4.14700000e+04 8.95614611e+10 2.17990000e+02]]
rowlabels: [1]
------------------------------TARGET:------------------------------
knob: ['disable-auto-compactions' 'block-size']
metric: write_latency
metric_lessisbetter: 1
------------------------------------------------------------
num of knobs == 2
knobs: ['disable-auto-compactions' 'block-size']
num of metrics == 4
metrics: ['write_throughput' 'write_latency' 'store_size' 'compaction_cpu']
------------------------------------------------------------
################## data ##################

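In this experiment the recommended result is to enable compaction and set the block size to 4KB.

This was rather surprising at first (after all, disabling compaction during writes should in principle improve performance). Our later analysis: TiKV uses Percolator for distributed transactions, so the write path also involves reads (write-conflict detection), which is why disabling compaction also degrades write performance. Likewise, a smaller block size improves point-lookup performance, which also benefits TiKV's write path.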
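
One way to read the log above is to group the 16 "previous" rows by knob configuration and average the write_latency column (the second metric). The sketch below does that with the values copied from the log; the grouping code itself is only an illustration and is not part of AutoTiKV.

```python
from collections import defaultdict

# (disable-auto-compactions, block-size) index pairs and the write_latency
# column, copied row by row from the "previous" section of the log above.
samples = [
    ((0, 4), 50423.0), ((1, 3), 50251.0), ((0, 0), 41092.0), ((0, 3), 46488.0),
    ((0, 1), 42196.0), ((1, 4), 54269.0), ((0, 1), 42824.0), ((1, 0), 43247.0),
    ((1, 4), 54716.0), ((0, 4), 49786.0), ((0, 0), 40805.0), ((0, 0), 40767.0),
    ((0, 0), 41151.0), ((0, 0), 41883.0), ((0, 0), 41928.0), ((0, 0), 42069.0),
]

by_config = defaultdict(list)
for knobs, latency in samples:
    by_config[knobs].append(latency)

# Average latency per configuration; lower is better for this metric.
mean_latency = {k: sum(v) / len(v) for k, v in by_config.items()}
best = min(mean_latency, key=mean_latency.get)

for knobs in sorted(mean_latency, key=mean_latency.get):
    print(knobs, round(mean_latency[knobs], 1))
```

The configuration with the lowest average latency is index pair (0, 0), which matches the recommendation discussed above: compaction left enabled and the first block-size value.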

 

To rule out this interference, we next ran the experiment with point lookup, a read-only workload:

workload=pntlookup80    knobs={'bloom-filter-bits-per-key', 'optimize-filters-for-hits', 'block-size', 'disable-auto-compactions'}    metric=get_latency

# Yahoo! Cloud System Benchmark
# Workload C: Read only
#   Application example: user profile cache, where profiles are constructed elsewhere (e.g., Hadoop)
#                        
#   Read/update ratio: 100/0
#   Default data size: 1 KB records (10 fields, 100 bytes each, plus key)
#   Request distribution: zipfian

# 80GB
# 2min each run

recordcount=80000000
operationcount=4000000
workload=core

readallfields=true

readproportion=1
updateproportion=0
scanproportion=0
insertproportion=0

requestdistribution=zipfian
YCSB workload definition

The results are as follows:

------------------------------previous:------------------------------
rowlabels, finish_time, knobs, metrics
1 , 2019-08-15 20:12:21 , [2. 0. 2. 0.] , [3.66446000e+04 1.39670000e+04 8.62385543e+10 2.36200000e+01]
2 , 2019-08-15 21:01:30 , [2. 0. 2. 1.] , [2.00085000e+04 2.55740000e+04 8.65226052e+10 0.00000000e+00]
3 , 2019-08-15 22:06:48 , [3. 1. 0. 0.] , [4.18042000e+04 1.22580000e+04 8.68646096e+10 4.99000000e+01]
4 , 2019-08-15 23:12:15 , [0. 1. 1. 0.] , [3.97759000e+04 1.28700000e+04 8.64727843e+10 4.36500000e+01]
5 , 2019-08-16 00:18:39 , [3. 1. 1. 0.] , [4.0698500e+04 1.2577000e+04 8.6412687e+10 4.2540000e+01]
6 , 2019-08-16 01:08:15 , [3. 0. 4. 1.] , [1.75872000e+04 2.90890000e+04 8.63167881e+10 1.80000000e-01]
7 , 2019-08-16 02:13:59 , [2. 1. 0. 0.] , [4.14569000e+04 1.23490000e+04 8.68367156e+10 4.94200000e+01]
8 , 2019-08-16 03:20:14 , [0. 1. 3. 0.] , [3.2892000e+04 1.5563000e+04 8.6045883e+10 4.1360000e+01]
9 , 2019-08-16 04:26:29 , [2. 1. 2. 0.] , [3.56923000e+04 1.43400000e+04 8.61031652e+10 3.95600000e+01]
10 , 2019-08-16 05:32:04 , [1. 0. 0. 0.] , [4.09599000e+04 1.25000000e+04 8.69347684e+10 4.80500000e+01]
11 , 2019-08-16 06:38:25 , [3. 0. 0. 0.] , [4.11105000e+04 1.24550000e+04 8.70293207e+10 4.88900000e+01]
12 , 2019-08-16 07:44:29 , [1. 1. 0. 0.] , [4.18002000e+04 1.22470000e+04 8.68315762e+10 4.95400000e+01]
13 , 2019-08-16 08:50:32 , [2. 0. 0. 0.] , [4.21299000e+04 1.21530000e+04 8.69322719e+10 3.92500000e+01]
14 , 2019-08-16 09:56:32 , [0. 0. 0. 0.] , [3.96365000e+04 1.29120000e+04 8.68696194e+10 5.50400000e+01]
15 , 2019-08-16 11:02:19 , [2. 1. 0. 0.] , [4.13551000e+04 1.23780000e+04 8.68479242e+10 5.01600000e+01]
16 , 2019-08-16 12:08:19 , [0. 1. 0. 0.] , [3.98915000e+04 1.28310000e+04 8.68413685e+10 4.53700000e+01]
17 , 2019-08-16 13:14:13 , [2. 1. 0. 0.] , [4.1778800e+04 1.2253000e+04 8.6845963e+10 4.8780000e+01]
18 , 2019-08-16 14:05:52 , [0. 1. 0. 1.] , [1.37462000e+04 3.72160000e+04 8.74961963e+10 0.00000000e+00]
19 , 2019-08-16 15:11:48 , [2. 1. 1. 0.] , [4.03858000e+04 1.26740000e+04 8.64025255e+10 3.95100000e+01]
20 , 2019-08-16 16:18:06 , [0. 0. 2. 0.] , [3.49978000e+04 1.46240000e+04 8.61336679e+10 2.37300000e+01]
21 , 2019-08-16 17:24:02 , [2. 0. 1. 0.] , [4.13509000e+04 1.23770000e+04 8.65494483e+10 2.70600000e+01]
22 , 2019-08-16 18:29:36 , [3. 1. 0. 0.] , [4.18111000e+04 1.22440000e+04 8.68484968e+10 4.96900000e+01]
23 , 2019-08-16 19:36:16 , [1. 0. 1. 0.] , [4.03078000e+04 1.27000000e+04 8.64872698e+10 3.91300000e+01]
24 , 2019-08-16 20:41:55 , [3. 1. 0. 0.] , [4.26687000e+04 1.19980000e+04 8.68488277e+10 3.38800000e+01]
25 , 2019-08-16 21:47:55 , [2. 0. 0. 0.] , [4.19810000e+04 1.21900000e+04 8.69691844e+10 4.00500000e+01]
26 , 2019-08-16 22:54:13 , [3. 1. 0. 0.] , [4.18609000e+04 1.22290000e+04 8.68388398e+10 5.11200000e+01]
27 , 2019-08-17 00:01:29 , [3. 1. 4. 0.] , [2.9123000e+04 1.7575000e+04 8.6027491e+10 4.3860000e+01]
28 , 2019-08-17 01:07:53 , [2. 0. 0. 0.] , [4.12169000e+04 1.24210000e+04 8.69920328e+10 4.67300000e+01]
29 , 2019-08-17 02:13:38 , [3. 1. 0. 0.] , [4.18402000e+04 1.22350000e+04 8.68513516e+10 4.57200000e+01]
30 , 2019-08-17 03:19:31 , [2. 0. 0. 0.] , [4.20812000e+04 1.21640000e+04 8.69824656e+10 4.01500000e+01]
31 , 2019-08-17 04:25:12 , [3. 1. 0. 0.] , [4.16913000e+04 1.22760000e+04 8.68498155e+10 4.98100000e+01]
32 , 2019-08-17 05:31:00 , [3. 0. 0. 0.] , [4.15515000e+04 1.23180000e+04 8.70275493e+10 4.94400000e+01]
33 , 2019-08-17 06:37:15 , [3. 1. 0. 0.] , [4.16460000e+04 1.22920000e+04 8.68442154e+10 4.66100000e+01]
34 , 2019-08-17 07:43:24 , [3. 0. 0. 0.] , [4.22696000e+04 1.21100000e+04 8.70264613e+10 3.65300000e+01]
35 , 2019-08-17 08:49:24 , [3. 1. 0. 0.] , [4.18575000e+04 1.22280000e+04 8.68419002e+10 4.99000000e+01]
36 , 2019-08-17 09:55:36 , [3. 0. 0. 0.] , [4.07931000e+04 1.25500000e+04 8.70300743e+10 4.98500000e+01]
37 , 2019-08-17 11:00:54 , [3. 1. 0. 0.] , [4.19244000e+04 1.22080000e+04 8.68508093e+10 4.98500000e+01]
38 , 2019-08-17 12:06:37 , [3. 0. 0. 0.] , [4.1197800e+04 1.2425000e+04 8.7020173e+10 4.6780000e+01]
39 , 2019-08-17 13:12:35 , [3. 1. 0. 0.] , [4.19859000e+04 1.21920000e+04 8.68462752e+10 4.20200000e+01]
40 , 2019-08-17 14:18:12 , [3. 0. 0. 0.] , [4.09505000e+04 1.25020000e+04 8.70206609e+10 5.18800000e+01]
41 , 2019-08-17 15:23:32 , [3. 1. 0. 0.] , [4.19558000e+04 1.22030000e+04 8.68409963e+10 4.25600000e+01]
42 , 2019-08-17 16:29:22 , [3. 0. 0. 0.] , [4.15804000e+04 1.23100000e+04 8.70172108e+10 4.56500000e+01]
43 , 2019-08-17 17:35:13 , [3. 1. 0. 0.] , [4.16524000e+04 1.22890000e+04 8.68602952e+10 4.62100000e+01]
44 , 2019-08-17 18:41:04 , [3. 0. 0. 0.] , [4.09697000e+04 1.24950000e+04 8.70105798e+10 4.56000000e+01]
45 , 2019-08-17 19:46:55 , [3. 1. 0. 0.] , [4.16999000e+04 1.22770000e+04 8.68411373e+10 4.83400000e+01]
46 , 2019-08-17 20:52:48 , [3. 0. 0. 0.] , [4.11311000e+04 1.24450000e+04 8.70303738e+10 4.90000000e+01]
47 , 2019-08-17 21:58:48 , [3. 1. 0. 0.] , [4.23772000e+04 1.20780000e+04 8.68478265e+10 3.74500000e+01]
48 , 2019-08-17 23:04:49 , [3. 0. 0. 0.] , [4.12347000e+04 1.24120000e+04 8.70284529e+10 3.89000000e+01]
49 , 2019-08-18 00:10:42 , [3. 1. 0. 0.] , [4.29264000e+04 1.19250000e+04 8.68530475e+10 3.23300000e+01]
50 , 2019-08-18 01:16:15 , [3. 0. 0. 0.] , [4.15186000e+04 1.23290000e+04 8.70386584e+10 3.65400000e+01]
51 , 2019-08-18 02:21:36 , [3. 1. 0. 0.] , [4.26975000e+04 1.19900000e+04 8.68521299e+10 4.03900000e+01]
52 , 2019-08-18 03:27:19 , [3. 0. 0. 0.] , [4.08752000e+04 1.25230000e+04 8.70437235e+10 4.79600000e+01]
------------------------------new:------------------------------
knobs:   [[3. 1. 0. 0.]]
metrics:   [[4.21738000e+04 1.21390000e+04 8.68461987e+10 4.58900000e+01]]
rowlabels:   [1]
timestamp:   2019-08-18 04:33:07
------------------------------TARGET:------------------------------
knob:   ['bloom-filter-bits-per-key' 'optimize-filters-for-hits' 'block-size' 'disable-auto-compactions']
metric:   get_latency
metric_lessisbetter:   1
------------------------------------------------------------
num of knobs ==  4
knobs:   ['bloom-filter-bits-per-key' 'optimize-filters-for-hits' 'block-size' 'disable-auto-compactions']
num of metrics ==  4
metrics:   ['get_throughput' 'get_latency' 'store_size' 'compaction_cpu']
------------------------------------------------------------

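The recommended result is: bloom-filter-bits-per-key=20, block-size=4K, and auto compaction not disabled. Whether optimize-filters-for-hits is enabled makes little difference (which is why the recommendation for this knob keeps oscillating).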
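
The knob vectors in the log are indices into each knob's list of allowed values. The sketch below decodes such a vector. The value lists come from the knob table earlier in the post, but their ordering (and the use of False/True for the two boolean knobs) is inferred from the recommendations spelled out in this post, so treat the ENUMS mapping as an assumption rather than the exact declaration in controller.py.

```python
# Allowed values per knob; ordering is inferred from the post (assumption).
ENUMS = {
    "bloom-filter-bits-per-key": [5, 10, 15, 20],
    "optimize-filters-for-hits": [False, True],
    "block-size": ["4k", "8k", "16k", "32k", "64k"],
    "disable-auto-compactions": [False, True],
}

# Knob order as printed in the TARGET section of the log above.
KNOB_ORDER = [
    "bloom-filter-bits-per-key",
    "optimize-filters-for-hits",
    "block-size",
    "disable-auto-compactions",
]

def decode(indices):
    """Map a vector of enum indices from the log to concrete knob values."""
    return {name: ENUMS[name][int(i)] for name, i in zip(KNOB_ORDER, indices)}

# The final recommendation printed under "new:" in the log.
print(decode([3.0, 1.0, 0.0, 0.0]))
```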

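The recommendations all match expectations quite well. As for optimize-filters-for-hits, the block cache in this experiment was probably large enough that the bloom filter size had little effect on cache performance. Moreover, we set the option on the default CF, and in TiKV a lookup in the default CF only happens after we have already determined that the key exists, so whether the filter is present makes no difference. In the following experiments we set optimize-filters-for-hits in the write CF (for the default CF this option already defaults to 0), and we set bloom-filter-bits-per-key separately for the default CF and the write CF, treating them as two knobs.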

 

workload=pntlookup80    knobs={rocksdb.writecf.bloom-filter-bits-per-key,  rocksdb.defaultcf.bloom-filter-bits-per-key,  rocksdb.writecf.optimize-filters-for-hits,  rocksdb.defaultcf.block-size,  rocksdb.defaultcf.disable-auto-compactions}    metric=get_throughput

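To measure the effect of the bloom filter as clearly as possible, besides the changes above we also modified the workload: the recordcount of the run phase is set to twice that of the load phase, which forces half of the lookups to target keys that do not exist. With this, the experiment should show that optimize-filters-for-hits for the write CF must be turned off. The modified workload is as follows: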

# Yahoo! Cloud System Benchmark
# Workload C: Read only
#   Application example: user profile cache, where profiles are constructed elsewhere (e.g., Hadoop)
#                        
#   Read/update ratio: 100/0
#   Default data size: 1 KB records (10 fields, 100 bytes each, plus key)
#   Request distribution: zipfian

# 80GB
# 2min each run

recordcount=80000000
operationcount=5000000
workload=core

readallfields=true

readproportion=1
updateproportion=0
scanproportion=0
insertproportion=0

requestdistribution=zipfian
Load phase, database size 80GB

# Yahoo! Cloud System Benchmark
# Workload C: Read only
#   Application example: user profile cache, where profiles are constructed elsewhere (e.g., Hadoop)
#                        
#   Read/update ratio: 100/0
#   Default data size: 1 KB records (10 fields, 100 bytes each, plus key)
#   Request distribution: zipfian

# 80GB
# 2min each run

recordcount=160000000
operationcount=5000000
workload=core

readallfields=true

readproportion=1
updateproportion=0
scanproportion=0
insertproportion=0

requestdistribution=zipfian
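Run phase (recordcount doubled, i.e. a key space corresponding to a 160GB database)

The results of this run are as follows (with a rather unexpected finding):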

################## data ##################
------------------------------previous:------------------------------
rowlabels, finish_time, knobs, metrics
1 , 2019-08-21 20:11:14 , [3. 0. 0. 1. 0.] , [4.17141000e+04 1.22700000e+04 8.63951981e+10 3.22600000e+01]
2 , 2019-08-21 21:01:50 , [1. 2. 1. 0. 1.] , [1.84393000e+04 2.77530000e+04 8.76023557e+10 0.00000000e+00]
3 , 2019-08-21 21:52:36 , [3. 2. 1. 2. 1.] , [1.7525800e+04 2.9193000e+04 8.6489484e+10 0.0000000e+00]
4 , 2019-08-21 22:57:46 , [3. 2. 1. 4. 0.] , [3.1377400e+04 1.6311000e+04 8.6011209e+10 3.3480000e+01]
5 , 2019-08-22 00:03:06 , [2. 2. 1. 3. 0.] , [3.57383000e+04 1.43210000e+04 8.60386882e+10 3.97000000e+01]
6 , 2019-08-22 00:53:29 , [2. 3. 1. 3. 1.] , [1.78001000e+04 2.87420000e+04 8.64059038e+10 0.00000000e+00]
7 , 2019-08-22 01:58:09 , [2. 3. 1. 1. 0.] , [4.29348000e+04 1.19210000e+04 8.64341341e+10 3.46900000e+01]
8 , 2019-08-22 03:03:35 , [1. 1. 1. 2. 0.] , [3.66121000e+04 1.39810000e+04 8.61325773e+10 4.39700000e+01]
9 , 2019-08-22 04:09:05 , [2. 2. 0. 3. 0.] , [3.59254000e+04 1.42450000e+04 8.60441991e+10 4.09100000e+01]
10 , 2019-08-22 05:14:04 , [2. 2. 1. 0. 0.] , [4.3455900e+04 1.1779000e+04 8.6827261e+10 4.7180000e+01]
11 , 2019-08-22 06:19:28 , [2. 1. 0. 1. 0.] , [4.32743000e+04 1.18270000e+04 8.64087651e+10 2.76900000e+01]
12 , 2019-08-22 07:25:29 , [2. 2. 1. 0. 0.] , [4.37505000e+04 1.16980000e+04 8.68377817e+10 3.66300000e+01]
13 , 2019-08-22 08:30:47 , [2. 0. 0. 1. 0.] , [4.09163000e+04 1.25110000e+04 8.63941229e+10 4.32500000e+01]
14 , 2019-08-22 09:36:14 , [2. 2. 0. 0. 0.] , [4.36414000e+04 1.17270000e+04 8.68246281e+10 4.70600000e+01]
15 , 2019-08-22 10:41:49 , [2. 1. 1. 0. 0.] , [4.29599000e+04 1.19140000e+04 8.68228424e+10 4.91600000e+01]
16 , 2019-08-22 11:47:07 , [2. 1. 0. 0. 0.] , [4.28121000e+04 1.19560000e+04 8.68404965e+10 4.76900000e+01]
17 , 2019-08-22 12:53:03 , [2. 2. 1. 0. 0.] , [4.33225000e+04 1.18150000e+04 8.68531672e+10 4.67000000e+01]
18 , 2019-08-22 13:58:13 , [3. 1. 0. 0. 0.] , [4.42762000e+04 1.15600000e+04 8.68428438e+10 3.25600000e+01]
19 , 2019-08-22 15:03:52 , [2. 2. 1. 0. 0.] , [4.43796000e+04 1.15330000e+04 8.68332426e+10 3.37300000e+01]
20 , 2019-08-22 16:09:26 , [3. 1. 0. 0. 0.] , [4.24397000e+04 1.20590000e+04 8.68403016e+10 5.07300000e+01]
21 , 2019-08-22 17:14:34 , [2. 2. 1. 0. 0.] , [4.35737000e+04 1.17460000e+04 8.68471932e+10 4.73400000e+01]
22 , 2019-08-22 18:19:47 , [3. 1. 0. 0. 0.] , [4.28986000e+04 1.19310000e+04 8.68300705e+10 4.80600000e+01]
23 , 2019-08-22 19:25:22 , [2. 2. 1. 0. 0.] , [4.34617000e+04 1.17780000e+04 8.68395239e+10 4.80400000e+01]
24 , 2019-08-22 20:31:11 , [2. 1. 0. 0. 0.] , [4.32535000e+04 1.18330000e+04 8.68426298e+10 4.46100000e+01]
25 , 2019-08-22 21:36:29 , [3. 2. 1. 0. 0.] , [4.30494000e+04 1.18900000e+04 8.68364294e+10 4.78600000e+01]
26 , 2019-08-22 22:42:20 , [2. 1. 0. 0. 0.] , [4.27872000e+04 1.19630000e+04 8.68309331e+10 4.76100000e+01]
27 , 2019-08-22 23:47:42 , [3. 2. 0. 0. 0.] , [4.32865000e+04 1.18250000e+04 8.68361102e+10 4.83400000e+01]
28 , 2019-08-23 00:53:08 , [2. 1. 1. 0. 0.] , [4.29929000e+04 1.19080000e+04 8.68338814e+10 5.06200000e+01]
29 , 2019-08-23 01:58:37 , [2. 2. 0. 0. 0.] , [4.36637000e+04 1.17220000e+04 8.67981041e+10 4.49300000e+01]
30 , 2019-08-23 03:03:42 , [3. 1. 1. 0. 0.] , [4.30542000e+04 1.18890000e+04 8.68628124e+10 5.10200000e+01]
31 , 2019-08-23 04:09:01 , [2. 2. 0. 0. 0.] , [4.31552000e+04 1.18600000e+04 8.68568929e+10 5.26200000e+01]
32 , 2019-08-23 05:13:59 , [3. 1. 1. 0. 0.] , [4.29512000e+04 1.19180000e+04 8.68360587e+10 5.17800000e+01]
33 , 2019-08-23 06:19:15 , [2. 2. 0. 0. 0.] , [4.34998000e+04 1.17670000e+04 8.68505644e+10 4.75000000e+01]
34 , 2019-08-23 07:24:36 , [3. 1. 1. 0. 0.] , [4.29066000e+04 1.19310000e+04 8.68417278e+10 4.94600000e+01]
35 , 2019-08-23 08:30:13 , [2. 2. 0. 0. 0.] , [4.37385000e+04 1.17030000e+04 8.68307716e+10 4.26100000e+01]
36 , 2019-08-23 09:34:58 , [3. 1. 1. 0. 0.] , [4.29117000e+04 1.19300000e+04 8.68479672e+10 4.71600000e+01]
37 , 2019-08-23 10:40:21 , [2. 2. 0. 0. 0.] , [4.30777000e+04 1.18810000e+04 8.68356132e+10 4.95800000e+01]
38 , 2019-08-23 11:45:43 , [3. 1. 1. 0. 0.] , [4.36291000e+04 1.17310000e+04 8.68428416e+10 4.08700000e+01]
39 , 2019-08-23 12:51:25 , [2. 2. 0. 0. 0.] , [4.36237000e+04 1.17360000e+04 8.68353864e+10 4.00500000e+01]
40 , 2019-08-23 13:57:10 , [3. 1. 1. 0. 0.] , [4.39189000e+04 1.16570000e+04 8.68385229e+10 3.60400000e+01]
------------------------------new:------------------------------
knobs:   [[2. 2. 0. 0. 0.]]
metrics:   [[4.36609000e+04 1.17230000e+04 8.68364011e+10 4.77100000e+01]]
rowlabels:   [1]
timestamp:   2019-08-23 15:02:11
------------------------------TARGET:------------------------------
knob:   ['rocksdb.writecf.bloom-filter-bits-per-key'
 'rocksdb.defaultcf.bloom-filter-bits-per-key'
 'rocksdb.writecf.optimize-filters-for-hits'
 'rocksdb.defaultcf.block-size'
 'rocksdb.defaultcf.disable-auto-compactions']
metric:   get_throughput
metric_lessisbetter:   0
------------------------------------------------------------
num of knobs ==  5
knobs:   ['rocksdb.writecf.bloom-filter-bits-per-key'
 'rocksdb.defaultcf.bloom-filter-bits-per-key'
 'rocksdb.writecf.optimize-filters-for-hits'
 'rocksdb.defaultcf.block-size'
 'rocksdb.defaultcf.disable-auto-compactions']
num of metrics ==  4
metrics:   ['get_throughput' 'get_latency' 'store_size' 'compaction_cpu']
------------------------------------------------------------
################## data ##################

The measurements show that the recommended configurations mostly converge on the following two:

  • {3,1,1,0,0}
    • rocksdb.writecf.bloom-filter-bits-per-key ['rocksdb', 'writecf'] bloom-filter-bits-per-key 20
      rocksdb.defaultcf.bloom-filter-bits-per-key ['rocksdb', 'defaultcf'] bloom-filter-bits-per-key 10
      rocksdb.writecf.optimize-filters-for-hits ['rocksdb', 'writecf'] optimize-filters-for-hits True
      rocksdb.defaultcf.block-size ['rocksdb', 'defaultcf'] block-size 4KB
      rocksdb.defaultcf.disable-auto-compactions ['rocksdb', 'defaultcf'] disable-auto-compactions False
  • {2,2,0,0,0}
    • rocksdb.writecf.bloom-filter-bits-per-key ['rocksdb', 'writecf'] bloom-filter-bits-per-key 15
      rocksdb.defaultcf.bloom-filter-bits-per-key ['rocksdb', 'defaultcf'] bloom-filter-bits-per-key 15
      rocksdb.writecf.optimize-filters-for-hits ['rocksdb', 'writecf'] optimize-filters-for-hits False
      rocksdb.defaultcf.block-size ['rocksdb', 'defaultcf'] block-size 4KB
      rocksdb.defaultcf.disable-auto-compactions ['rocksdb', 'defaultcf'] disable-auto-compactions False
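The index vectors in the log (such as `[3. 1. 1. 0. 0.]`) are positions into each knob's list of enum values. Below is a minimal sketch of that decoding. `KNOB_ENUMS` and `decode` are hypothetical names, and the tables are only partly confirmed by the log: the bloom filter values 10, 15, 20, the 4KB block size and the boolean mapping appear in the output above, while the remaining entries (5 bits, 8KB, 16KB, and the exact position of 32KB and 64KB) are assumptions made for illustration.

```python
# Knob order as printed in the TARGET section of the log.
KNOB_NAMES = [
    "rocksdb.writecf.bloom-filter-bits-per-key",
    "rocksdb.defaultcf.bloom-filter-bits-per-key",
    "rocksdb.writecf.optimize-filters-for-hits",
    "rocksdb.defaultcf.block-size",
    "rocksdb.defaultcf.disable-auto-compactions",
]

# Hypothetical enum tables. Entries not visible in the log are assumptions.
KNOB_ENUMS = {
    "rocksdb.writecf.bloom-filter-bits-per-key": [5, 10, 15, 20],    # 5 is assumed
    "rocksdb.defaultcf.bloom-filter-bits-per-key": [5, 10, 15, 20],  # 5 is assumed
    "rocksdb.writecf.optimize-filters-for-hits": [False, True],
    "rocksdb.defaultcf.block-size": ["4KB", "8KB", "16KB", "32KB", "64KB"],  # 8KB/16KB assumed
    "rocksdb.defaultcf.disable-auto-compactions": [False, True],
}


def decode(index_vector, knob_names=KNOB_NAMES, enums=KNOB_ENUMS):
    """Map a vector of enum indexes to the actual knob values."""
    return {
        name: enums[name][int(index)]
        for name, index in zip(knob_names, index_vector)
    }


if __name__ == "__main__":
    for vector in ([3, 1, 1, 0, 0], [2, 2, 0, 0, 0]):
        print(vector, decode(vector))
```

Storing enum knobs as indexes keeps the model input purely numeric, which is why the log shows floats like `3.` rather than the values themselves.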

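Our analysis suggests this is because the write CF is relatively small, so when the block cache size is large enough, the benefit of the bloom filter may no longer be very noticeable.

Looking at the results more closely and comparing the following two samples reveals something surprising: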

30 , 2019-08-23 03:03:42 , [3. 1. 1. 0. 0.] , [4.30542000e+04 1.18890000e+04 8.68628124e+10 5.10200000e+01]
20 , 2019-08-22 16:09:26 , [3. 1. 0. 0. 0.] , [4.24397000e+04 1.20590000e+04 8.68403016e+10 5.07300000e+01]

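The only difference between their knobs is that sample 30 disables the bottom-level bloom filter (optimize-filters-for-hits==True), while sample 20 enables it (optimize-filters-for-hits==False). Yet the throughput of sample 20 is slightly lower than that of sample 30, which is the opposite of what we expected. So we opened grafana and captured the charts for the periods in which these two samples ran:

(In both cases the block-cache-size during the run phase was 12.8GB; to save space, that part of the charts is not shown.)

In the charts, everything left of the pink vertical line is the load phase and everything right of it is the run phase. The cache hit rate is almost the same in both cases, and for sample 20 it is even slightly lower. The reason is that the bloom filter itself takes up space: if the block cache size would otherwise be sufficient but the bloom filter is fairly large, it hurts cache hit. We did not anticipate this at first. It is actually good news, since it shows that the ML model can help us find things we would not think of manually.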
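To put the gap between samples 30 and 20 in numbers, here is a small sketch that parses two lines in the log format shown above and compares them. `parse_sample` is a hypothetical helper written for this post's log layout; it is not part of AutoTiKV.

```python
import re


def parse_sample(line):
    """Parse 'label , timestamp , [knobs] , [metrics]' into a dict."""
    label, timestamp, rest = [part.strip() for part in line.split(" , ", 2)]
    knobs_text, metrics_text = re.findall(r"\[([^\]]*)\]", rest)
    return {
        "label": int(label),
        "timestamp": timestamp,
        "knobs": [int(float(x)) for x in knobs_text.split()],
        "metrics": [float(x) for x in metrics_text.split()],
    }


LINE_30 = ("30 , 2019-08-23 03:03:42 , [3. 1. 1. 0. 0.] , "
           "[4.30542000e+04 1.18890000e+04 8.68628124e+10 5.10200000e+01]")
LINE_20 = ("20 , 2019-08-22 16:09:26 , [3. 1. 0. 0. 0.] , "
           "[4.24397000e+04 1.20590000e+04 8.68403016e+10 5.07300000e+01]")

s30, s20 = parse_sample(LINE_30), parse_sample(LINE_20)

# Positions at which the two knob vectors differ.
diff_positions = [i for i, (a, b) in enumerate(zip(s30["knobs"], s20["knobs"])) if a != b]

# The first metric is get_throughput (see the metric list in the log).
relative_gap = (s30["metrics"][0] - s20["metrics"][0]) / s20["metrics"][0]

if __name__ == "__main__":
    print("differing knob positions:", diff_positions)
    print("throughput gap of 30 over 20: {:.2%}".format(relative_gap))
```

The gap is small, so it is best read together with the grafana charts rather than as proof on its own; a single pair of runs says little about variance.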

 

Next we try short range scans. This time the metric to optimize is scan latency.

workload=shortscan    knobs={'bloom-filter-bits-per-key', 'optimize-filters-for-hits', 'block-size', 'disable-auto-compactions'}    metric=scan_latency

# Copyright (c) 2010 Yahoo! Inc. All rights reserved.                                                                                                                             
#                                                                                                                                                                                 
# Licensed under the Apache License, Version 2.0 (the "License"); you                                                                                                             
# may not use this file except in compliance with the License. You                                                                                                                
# may obtain a copy of the License at                                                                                                                                             
#                                                                                                                                                                                 
# http://www.apache.org/licenses/LICENSE-2.0                                                                                                                                      
#                                                                                                                                                                                 
# Unless required by applicable law or agreed to in writing, software                                                                                                             
# distributed under the License is distributed on an "AS IS" BASIS,                                                                                                               
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or                                                                                                                 
# implied. See the License for the specific language governing                                                                                                                    
# permissions and limitations under the License. See accompanying                                                                                                                 
# LICENSE file.                                                                                                                                                                   

# Yahoo! Cloud System Benchmark
# Workload E: Short ranges
#   Application example: threaded conversations, where each scan is for the posts in a given thread (assumed to be clustered by thread id)
#                        
#   Scan/insert ratio: 95/5
#   Default data size: 1 KB records (10 fields, 100 bytes each, plus key)
#   Request distribution: zipfian

# The insert order is hashed, not ordered. Although the scans are ordered, it does not necessarily
# follow that the data is inserted in order. For example, posts for thread 342 may not be inserted contiguously, but
# instead interspersed with posts from lots of other threads. The way the YCSB client works is that it will pick a start
# key, and then request a number of records; this works fine even for hashed insertion.

recordcount=80000000
operationcount=200000
workload=core

readallfields=true

readproportion=0
updateproportion=0
scanproportion=1
insertproportion=0

requestdistribution=uniform

minscanlength=100
maxscanlength=100

# scanlengthdistribution=uniform
Definition of the shortscan workload

The experiment results are as follows:

################## data ##################
------------------------------previous:------------------------------
rowlabels, finish_time, knobs, metrics
1 , 2019-08-24 18:29:05 , [1. 1. 1. 2. 1.] , [6.72800000e+02 7.53744000e+05 8.64420017e+10 2.40000000e-01]
2 , 2019-08-24 19:20:03 , [0. 3. 1. 3. 1.] , [6.1490000e+02 8.2401700e+05 8.6410917e+10 0.0000000e+00]
3 , 2019-08-24 20:10:54 , [3. 0. 0. 0. 1.] , [6.64200000e+02 7.62370000e+05 8.74716093e+10 0.00000000e+00]
4 , 2019-08-24 21:14:30 , [0. 1. 0. 1. 0.] , [4.05440000e+03 1.25855000e+05 8.64184132e+10 2.80100000e+01]
5 , 2019-08-24 22:18:31 , [2. 1. 0. 3. 0.] , [4.23970000e+03 1.20196000e+05 8.60256954e+10 3.74100000e+01]
6 , 2019-08-24 23:08:55 , [0. 0. 0. 1. 1.] , [7.07000000e+02 7.16597000e+05 8.68539722e+10 0.00000000e+00]
7 , 2019-08-25 00:12:24 , [2. 1. 0. 3. 0.] , [4.5478000e+03 1.1218900e+05 8.6033236e+10 2.5120000e+01]
8 , 2019-08-25 01:04:55 , [1. 3. 1. 4. 1.] , [4.96200000e+02 1.02048400e+06 8.63227618e+10 0.00000000e+00]
9 , 2019-08-25 01:56:06 , [3. 3. 1. 0. 1.] , [6.6310000e+02 7.6451400e+05 8.7654137e+10 0.0000000e+00]
10 , 2019-08-25 02:47:01 , [3. 3. 1. 2. 1.] , [6.66900000e+02 7.60646000e+05 8.65341307e+10 0.00000000e+00]
11 , 2019-08-25 03:51:18 , [1. 1. 0. 2. 0.] , [4.19610000e+03 1.21614000e+05 8.60931486e+10 2.51200000e+01]
12 , 2019-08-25 04:55:47 , [2. 0. 0. 3. 0.] , [4.3978000e+03 1.1592900e+05 8.6036505e+10 3.6290000e+01]
13 , 2019-08-25 05:59:51 , [1. 1. 0. 3. 0.] , [4.35150000e+03 1.17180000e+05 8.60368063e+10 3.63800000e+01]
14 , 2019-08-25 07:03:58 , [2. 0. 0. 2. 0.] , [3.77810000e+03 1.35018000e+05 8.60859856e+10 3.57900000e+01]
15 , 2019-08-25 08:07:51 , [1. 0. 0. 3. 0.] , [4.66590000e+03 1.09339000e+05 8.60241768e+10 2.76200000e+01]
16 , 2019-08-25 09:11:58 , [2. 1. 0. 2. 0.] , [4.09160000e+03 1.24662000e+05 8.60801061e+10 2.85700000e+01]
17 , 2019-08-25 10:16:10 , [0. 0. 0. 2. 0.] , [4.05350000e+03 1.25774000e+05 8.60802488e+10 2.62900000e+01]
18 , 2019-08-25 11:20:09 , [1. 0. 0. 3. 0.] , [4.68850000e+03 1.08877000e+05 8.59966196e+10 2.37400000e+01]
19 , 2019-08-25 12:24:28 , [0. 2. 0. 2. 0.] , [4.25840000e+03 1.19757000e+05 8.60873241e+10 2.42100000e+01]
20 , 2019-08-25 13:29:06 , [1. 0. 0. 2. 0.] , [3.77300000e+03 1.35303000e+05 8.60943509e+10 3.77800000e+01]
21 , 2019-08-25 14:33:43 , [0. 1. 0. 3. 0.] , [4.67830000e+03 1.09096000e+05 8.60373353e+10 2.58500000e+01]
22 , 2019-08-25 15:37:49 , [1. 0. 0. 3. 0.] , [4.72760000e+03 1.07929000e+05 8.60229122e+10 2.41700000e+01]
23 , 2019-08-25 16:42:13 , [0. 1. 0. 2. 0.] , [3.83190000e+03 1.33200000e+05 8.61015852e+10 3.75200000e+01]
24 , 2019-08-25 17:46:31 , [0. 0. 0. 4. 0.] , [4.80830000e+03 1.06059000e+05 8.59515848e+10 3.18500000e+01]
25 , 2019-08-25 18:50:39 , [1. 0. 0. 3. 0.] , [4.51200000e+03 1.13177000e+05 8.60177759e+10 3.22500000e+01]
26 , 2019-08-25 19:54:26 , [0. 2. 0. 4. 0.] , [4.86770000e+03 1.04802000e+05 8.59837067e+10 3.25800000e+01]
27 , 2019-08-25 20:58:22 , [1. 0. 0. 4. 0.] , [4.9614000e+03 1.0285500e+05 8.5950186e+10 3.1870000e+01]
28 , 2019-08-25 22:02:31 , [0. 0. 0. 3. 0.] , [4.37540000e+03 1.16648000e+05 8.60301063e+10 3.36500000e+01]
29 , 2019-08-25 23:06:31 , [1. 2. 0. 4. 0.] , [4.95800000e+03 1.03017000e+05 8.60147679e+10 3.06400000e+01]
30 , 2019-08-26 00:10:15 , [1. 0. 0. 4. 0.] , [5.20820000e+03 9.80490000e+04 8.59992036e+10 3.10200000e+01]
31 , 2019-08-26 01:09:36 , [1. 3. 0. 3. 0.] , [4.63750000e+03 1.10141000e+05 8.60371023e+10 3.01500000e+01]
32 , 2019-08-26 02:10:54 , [1. 1. 0. 4. 0.] , [4.89860000e+03 1.04158000e+05 8.59848252e+10 3.12700000e+01]
33 , 2019-08-26 03:12:48 , [1. 0. 0. 3. 0.] , [4.54700000e+03 1.12233000e+05 8.60197859e+10 3.15300000e+01]
34 , 2019-08-26 04:15:28 , [2. 2. 0. 4. 0.] , [4.95670000e+03 1.02892000e+05 8.60205523e+10 3.21900000e+01]
35 , 2019-08-26 05:18:03 , [1. 0. 0. 4. 0.] , [4.82490000e+03 1.05684000e+05 8.59840325e+10 3.27900000e+01]
36 , 2019-08-26 06:20:38 , [3. 1. 0. 4. 0.] , [4.98140000e+03 1.02350000e+05 8.59992772e+10 3.16700000e+01]
37 , 2019-08-26 07:23:21 , [1. 0. 0. 4. 0.] , [4.97320000e+03 1.02554000e+05 8.59940724e+10 3.17100000e+01]
38 , 2019-08-26 08:26:04 , [3. 3. 0. 3. 0.] , [4.59460000e+03 1.11100000e+05 8.60488145e+10 2.85000000e+01]
39 , 2019-08-26 09:28:30 , [2. 0. 0. 4. 0.] , [4.85840000e+03 1.05104000e+05 8.59982211e+10 3.17800000e+01]
40 , 2019-08-26 10:31:31 , [2. 3. 0. 2. 0.] , [4.13200000e+03 1.23462000e+05 8.61029034e+10 2.78400000e+01]
41 , 2019-08-26 11:35:06 , [1. 0. 0. 4. 0.] , [5.00720000e+03 1.01956000e+05 8.60064623e+10 3.17800000e+01]
42 , 2019-08-26 12:38:18 , [3. 0. 0. 4. 0.] , [4.87100000e+03 1.04930000e+05 8.59962461e+10 3.14800000e+01]
43 , 2019-08-26 13:41:29 , [1. 0. 0. 4. 0.] , [4.9381000e+03 1.0334100e+05 8.6066299e+10 3.2380000e+01]
44 , 2019-08-26 14:44:25 , [2. 1. 0. 4. 0.] , [5.01210000e+03 1.01852000e+05 8.59967147e+10 3.18600000e+01]
45 , 2019-08-26 15:47:21 , [1. 0. 0. 4. 0.] , [4.86200000e+03 1.04912000e+05 8.60001832e+10 3.25000000e+01]
------------------------------new:------------------------------
knobs:   [[3. 0. 1. 4. 0.]]
metrics:   [[5.02470000e+03 1.01642000e+05 8.59832276e+10 3.08800000e+01]]
rowlabels:   [1]
timestamp:   2019-08-26 16:50:32
------------------------------TARGET:------------------------------
knob:   ['rocksdb.writecf.bloom-filter-bits-per-key'
 'rocksdb.defaultcf.bloom-filter-bits-per-key'
 'rocksdb.writecf.optimize-filters-for-hits'
 'rocksdb.defaultcf.block-size'
 'rocksdb.defaultcf.disable-auto-compactions']
metric:   scan_latency
metric_lessisbetter:   1
------------------------------------------------------------
num of knobs ==  5
knobs:   ['rocksdb.writecf.bloom-filter-bits-per-key'
 'rocksdb.defaultcf.bloom-filter-bits-per-key'
 'rocksdb.writecf.optimize-filters-for-hits'
 'rocksdb.defaultcf.block-size'
 'rocksdb.defaultcf.disable-auto-compactions']
num of metrics ==  4
metrics:   ['scan_throughput' 'scan_latency' 'store_size' 'compaction_cpu']
------------------------------------------------------------
################## data ##################

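Since time was limited, we look at the results of the first 45 rounds for now. The recommendation has not fully converged yet, but it basically satisfies optimize-filters-for-hits==False, block-size==32KB or 64KB, and disable-auto-compactions==False; these three are also the parameters with the most visible impact on the result. According to Intel's SSD white paper, an SSD's random read performance is roughly the same for 32KB and 64KB reads. The number of bloom filter bits also has little effect on scan operations. So this result matches expectations.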
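One rough way to check the block-size observation is to group rounds by the block-size index and average scan_latency. The sketch below uses six rows copied from the log above (rounds 14, 16, 15, 18, 27 and 30), all with disable-auto-compactions index 0. The helper name `mean_latency_by_knob` is hypothetical, and such a small subset is only illustrative, not a statistical analysis.

```python
from collections import defaultdict
from statistics import mean

# (knob index vector, scan_latency) copied from rounds 14, 16, 15, 18, 27, 30.
ROWS = [
    ([2, 0, 0, 2, 0], 135018.0),
    ([2, 1, 0, 2, 0], 124662.0),
    ([1, 0, 0, 3, 0], 109339.0),
    ([1, 0, 0, 3, 0], 108877.0),
    ([1, 0, 0, 4, 0], 102855.0),
    ([1, 0, 0, 4, 0], 98049.0),
]

# rocksdb.defaultcf.block-size is the fourth knob in the log's knob list.
BLOCK_SIZE_POS = 3


def mean_latency_by_knob(rows, pos):
    """Average the latency of all rows sharing the same index at position pos."""
    groups = defaultdict(list)
    for knobs, latency in rows:
        groups[knobs[pos]].append(latency)
    return {index: mean(values) for index, values in sorted(groups.items())}


if __name__ == "__main__":
    for index, latency in mean_latency_by_knob(ROWS, BLOCK_SIZE_POS).items():
        print("block-size index {}: mean scan_latency {:.0f}".format(index, latency))
```

On these rows the mean latency drops as the block-size index grows, in line with the recommendation settling on the two largest block sizes.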

 

We will also test long scans later.

 

Ref:

 

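Differences from OtterTune

Our experimental setup differs from OtterTune's in several ways, mainly the following:

  • AutoTikv runs directly on the same machine as the DB instead of using a centralized training server as OtterTune does. This does not actually consume many resources, and it avoids inconsistent data caused by machines with different configurations.
  • Workload mapping is omitted (OtterTune adds this step to pick, from its repository, the training samples most similar to the current workload, whereas we currently assume there is only one type of workload).
  • There are fewer knobs to tune, so the step of identifying important knobs is omitted (OtterTune uses Lasso Regression to select the 10 most important knobs for tuning).
  • We also restructured OtterTune's architecture to reduce the coupling to a specific database system. This makes it easier to port the whole model and pipeline to other systems (only the statements in controller.py that operate on the database system need to change; nothing else does), and it is a better fit for KV databases, which are more lightweight than SQL databases.
  • Finally, we fixed along the way the problem that OtterTune can only tune global knobs and cannot tune knobs with the same name in different sessions.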
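The last point relies on dot-separated knob names: the log earlier prints, for example, `rocksdb.writecf.bloom-filter-bits-per-key ['rocksdb', 'writecf'] bloom-filter-bits-per-key 20`, i.e. the session path and the knob name are split apart. Below is a minimal sketch of that idea on a nested dict such as one loaded from a YAML config. `split_knob` and `set_knob` are hypothetical names, not the functions in controller.py.

```python
def split_knob(full_name):
    """Split 'a.b.knob' into (['a', 'b'], 'knob')."""
    *sessions, knob = full_name.split(".")
    return sessions, knob


def set_knob(config, full_name, value):
    """Set a knob inside a nested dict, creating session levels as needed."""
    sessions, knob = split_knob(full_name)
    node = config
    for session in sessions:
        node = node.setdefault(session, {})
    node[knob] = value
    return config


if __name__ == "__main__":
    config = {}
    # Same knob name, two different sessions, two different values.
    set_knob(config, "rocksdb.writecf.bloom-filter-bits-per-key", 20)
    set_knob(config, "rocksdb.defaultcf.bloom-filter-bits-per-key", 10)
    print(config)
```

Because the session path is part of the key, two knobs with the same name never collide.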

 

Ref: a detailed analysis of OtterTune, written before coding started

 

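Some ideas for extension

Since time was limited, only some very basic functionality is implemented here. There is still a lot of room to extend this model. A few ideas are recorded below:

  • Q: How can the tuner adapt to a workload that keeps changing (for example, reads for a while, then writes)?
  • A: Set a threshold on the timestamp of the training samples and discard the ones that are too old.
  • Q: Sometimes the ML model may get stuck in a local optimum (it does not try enough knob combinations and keeps recommending from a handful of knobs that currently perform reasonably well).
  • A: As mentioned earlier, Bayesian optimization has a trade-off between exploration and exploitation, controlled by a coefficient k. We will try tuning this coefficient later.
  • Currently, enum-type knobs are stored in the ML model as discretized numeric values (for example 0, 1, 2, 3). If enum-type knobs without a clear ordering show up later, they will need to be changed to a one-hot encoding.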
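The three ideas above can be sketched as small helpers. None of this is implemented in AutoTiKV as described here: the function names, the sample layout (a dict with a `finish_time` field) and the exact acquisition form `mean + k * std` are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def drop_stale(samples, now, max_age):
    """Keep only the samples whose finish_time is within max_age of now."""
    return [s for s in samples if now - s["finish_time"] <= max_age]


def ucb(mean, std, k):
    """Upper-confidence-bound score: a larger k favors exploration
    (uncertain points), a smaller k favors exploitation (high mean)."""
    return mean + k * std


def one_hot(index, num_values):
    """Encode an enum index as a one-hot vector, so no ordering is implied."""
    vector = [0.0] * num_values
    vector[index] = 1.0
    return vector


if __name__ == "__main__":
    now = datetime(2019, 8, 26, 16, 0, 0)
    samples = [
        {"label": 1, "finish_time": datetime(2019, 8, 24, 18, 29, 5)},
        {"label": 45, "finish_time": datetime(2019, 8, 26, 15, 47, 21)},
    ]
    print(drop_stale(samples, now, timedelta(hours=24)))

    # Candidate A: well explored, good mean. Candidate B: uncertain, lower mean.
    for k in (0.5, 3.0):
        print(k, ucb(10.0, 0.5, k), ucb(9.0, 2.0, k))

    print(one_hot(2, 4))
```

With k=0.5 the well-explored candidate scores higher, while with k=3.0 the uncertain one does, which is the lever mentioned in the answer about local optima. For a metric where lower is better, the sign of the score would need to be flipped.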

 

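Summary

A complex system needs trade-offs and balance across many components for the overall result to be optimal, and this requires a deep understanding of every part of the system. While debugging AutoTikv we also ran into many cases where a parameter setting did not produce the expected result. Only after carefully analyzing the charts in grafana did we figure out what was going on:

  • Some parameters do not affect the result much. For example, the scenario in which the parameter matters is never triggered, or the hardware related to it is not a performance bottleneck.
  • Some parameters have no effect when adjusted dynamically, or need a workload that runs long enough before the effect shows. For example, right after block cache size is changed from small to large, the cache cannot be full yet; the improvement in overall cache hit from the larger cache only becomes visible once the workload has filled it.
  • Some parameters have the opposite effect from what was expected. Analysis shows that such a parameter has side effects and does not work well in certain scenarios (for example the bloom filter case above).
  • Some workloads are not purely reads or purely writes but mix in other operations, and this is easy to overlook when judging the expected effect by hand (for example the writeheavy case above). In real production environments in particular, a DBA cannot know in advance what kind of workload will show up. This is probably where automatic tuning is useful.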

 

 


Ref:

Bayesian optimization

https://blog.csdn.net/Leon_winter/article/details/86604553

https://blog.csdn.net/a769096214/article/details/80920304

 

 

https://docs.google.com/document/d/1raibF5LLmmYvfYo8rMK_TP4EJPDj2RzlSZFp1a3ligU/edit?ts=5ce5c60a#heading=h.losu3j60zo6r

 

posted on 2019-08-19 22:17  Pentium.Labs



Pentium.Lab Since 1998