Pentium.Labs

System全家桶:https://zhuanlan.zhihu.com/c_1238468913098731520

导航

OtterTune源码解析

为了方便后面对ottertune进行魔(hu)改(gao),需要先搞清楚它的源码结构和pipeline

 

OtterTune分为两大部分:

  server side: 包括一个MySQL数据库(用于存储调优数据,供ml model用),Django(FrontEnd User Interface),Celery(用于调度ML task);

  client side: Target_DBMS(存储用户的业务数据用,支持多种DBMS),Controller(用于控制target DBMS),Driver(用于调用controller,入口是fabfile.py)

 

各个操作的代号(S1-S4, C1-C6)含义可参考:https://www.cnblogs.com/pdev/p/10903628.html

 

                                                                             Client Side                                                                             

1. Driver

Driver是用户运行client的入口。用户并不直接执行controller的命令,而是通过driver来控制它。

Driver使用了Python的fabric库编写。作者在里面预置了很多常用的命令(例如开关target DBMS, run oltpbench等等)。

fabfile.py

这是Driver的核心。在Client侧最后运行操作(C6)就需要调用这个文件。

在C6操作中,需要用fab loop和fab run_loops来让client周期性在target DBMS上采集knob/metric sample(In each loop, it collects target DBMS info, uploads to the server, gets new recommended configuration, installs the config and restarts DBMS. Users can continue to run loops until they are satisfied with the recommended configuration)。

  fab loop               runs one single loop.

 1 @task
 2 def loop():
 3     # free cache, clean Linux PageCache 
 4     free_cache()
 5 
 6     # remove oltpbench log and controller log
 7     clean_logs()
 8 
 9     # restart database, shell "sudo service postgresql restart"
10     restart_database()
11 
12     # check whether there are enough free space on disk
13     if check_disk_usage() > MAX_DISK_USAGE:
14         LOG.WARN('Exceeds max disk usage %s', MAX_DISK_USAGE)
15 
16     # run controller as another process. Run the following command line in "../controller" folder: 
17     # sudo gradle run -PappArgs="-c CONF_controller_config -d output/" --no-daemon > CONF_controller_log
18     p = Process(target=run_controller, args=())
19     p.start()
20     LOG.info('Run the controller')
21 
22     # check whether the controller is ready(has created the log files)
23     while not _ready_to_start_oltpbench():
24         pass
25     # run oltpbench as a background job. Run the following command line in CONF_oltpbench_home folder: 
26     # ./oltpbenchmark -b CONF_oltpbench_workload -c CONF_oltpbench_config --execute=true -s 5 -o outputfile > CONF_oltpbench_log 2>&1 &
27     run_oltpbench_bg()
28     LOG.info('Run OLTP-Bench')
29 
30     # the controller starts the first collection
31 
32     # check whether 'Warmup complete, starting measurements' is in CONF_oltpbench_log file
33     while not _ready_to_start_controller():
34         pass
35     # shell 'sudo kill -2 CTL_PID'
36     # send a signal to the process CTL_PID, where CTL_PID is the content of '../controller/pid.txt' (pid of java controller)
37     signal_controller()
38     LOG.info('Start the first collection')
39 
40     # stop the experiment    
41 
42     # check whether 'Output Raw data into file' is in CONF_oltpbench_log file
43     while not _ready_to_shut_down_controller():
44         pass
45     # shell 'sudo kill -2 CTL_PID'
46     # send a signal to the process CTL_PID, where CTL_PID is the content of '../controller/pid.txt' (pid of java controller)
47     signal_controller()
48     LOG.info('Start the second collection, shut down the controller')
49 
50     # wait until controller exited
51     p.join()
52 
53     # add user defined target objective
54     # add_udf()
55 
56     # save result file: 'knobs.json', 'metrics_after.json', 'metrics_before.json', 'summary.json'
57     save_dbms_result()
58 
59     # upload result to Django web interface
60     upload_result()
61 
62     # get result
63     # shell 'python3 ../../script/query_and_get.py CONF_upload_url CONF_upload_code 5'
64     get_result()
65 
66     # change target DBMS config
67     # shell 'sudo python3 PostgresConf.py next_config CONF_database_conf'
68     change_conf()
View Code

 

 fab run_loops:max_iter=10    runs 10 loops. You can set max_iter to change the maximum iterations.

 1 # intervals of restoring the databse
 2 RELOAD_INTERVAL = 10
 3 
 4 @task
 5 def run_loops(max_iter=1):
 6     # dump database if it's not done before.
 7     # shell 'PGPASSWORD=CONF_password pg_dump -U CONF_username -F c -d CONF_database_name > CONF_database_save_path/CONF_database_name.dump'
 8     dump = dump_database()
 9 
10     for i in range(int(max_iter)):
11         # restore database every RELOAD_INTERVAL
12         # shell these operations:
13         #     PGPASSWORD=CONF_password dropdb -e --if-exists CONF_database_name -U CONF_username
14         #     PGPASSWORD=CONF_password createdb -e CONF_database_name -U CONF_username
15         #     PGPASSWORD=CONF_password pg_restore -U CONF_username -j 8 -F c -d CONF_database_name CONF_database_save_path/CONF_database_name.dump
16         if RELOAD_INTERVAL > 0:
17             if i % RELOAD_INTERVAL == 0:
18                 if i == 0 and dump is False:
19                     restore_database()
20                 elif i > 0:
21                     restore_database()
22 
23         LOG.info('The %s-th Loop Starts / Total Loops %s', i + 1, max_iter)
24         loop()
25         LOG.info('The %s-th Loop Ends / Total Loops %s', i + 1, max_iter)
View Code

 

上面有些以CONF_开头的变量,其实都是从driver_config.json中读到的。在C1操作时我们曾经配置过这个文件。

 1 {
 2   "database_type" : "postgres",
 3   "database_name" : "tpcc",
 4   "database_disk": "/dev/sda1",
 5   "database_conf": "/etc/postgresql/9.6/main/postgresql.conf",
 6   "database_save_path": "/home/tidb/ottertune",
 7   "username" : "postgres",
 8   "password" : "asdfgh",
 9   "oltpbench_home": "/home/tidb/oltpbench",
10   "oltpbench_config": "/home/tidb/oltpbench/config/tpcc_config_postgres.xml",
11   "oltpbench_workload": "tpcc",
12   "oltpbench_log" : "/home/tidb/ottertune/client/driver/oltp.log",
13   "controller_config": "/home/tidb/ottertune/client/controller/config/sample_postgres_config.json",
14   "controller_log" : "/home/tidb/ottertune/client/driver/controller.log",
15   "save_path": "/home/tidb/results",
16   "upload_url" : "http://127.0.0.1:8000",
17   "upload_code" : "I5I10PXK3PK27FM86YYS",
18   "lhs_knob_path" : "/home/tidb/ottertune/client/driver/knobs/postgres-96.json",
19   "lhs_save_path" : "/home/tidb/ottertune/client/driver/configs"
20 }
View Code

 

 2. Controller

在Driver中会通过gradle来启动Controller。命令行参数为"-c CONF_controller_config -d output/"。

其中CONF_controller_config表示Controller的配置文件,这里是sample_postgres_config.json。里面会记录一些关于target DBMS的信息

1 {
2   "database_type" : "postgres",
3   "database_url" : "jdbc:postgresql://localhost:5432/postgres",
4   "username" : "postgres",
5   "password" : "asdfgh",
6   "upload_code" : "DEPRECATED",
7   "upload_url" : "DEPRECATED",
8   "workload_name" : "tpcc"
9 }
View Code

 

Controller的总体结构如下:

 1 Controller
 2 │  README.md
 3 │  log4j.properties
 4 │  build.gradle
 5 ├─config
 6 │    sample_saphana_config.json
 7 │    sample_postgres_config.json
 8 │    sample_mysql_config.json
 9 ├─gradle
10 │  └─wrapper
11 │         gradle-wrapper.properties
12 ├─sample_output
13 │  ├─mysql
14 │  │      summary.json
15 │  │      metrics_before.json
16 │  │      metrics_after.json
17 │  │      knobs.json
18 │  ├─postgres
19 │  │      metrics_after.json
20 │  │      summary.json
21 │  │      metrics_before.json
22 │  │      knobs.json
23 │  └─saphana
24 │         knobs.json
25 │         metrics_after.json
26 │         metrics_before.json
27 │         summary.json
28 └─src
29     ├─main
30     │  └─java
31     │      └─com
32     │          └─controller
33     │              │ ResultUploader.java                upload result
34     │              │ Main.java                          main函数入口
35     │              │ ControllerConfiguration.java       ControllerConfiguration类,用于存储target DBMS的相关信息(from sample_postgres_config.json)
36     │              ├─types
37     │              │    JSONSchemaType.java             json读写库,用于判断输入的json是否合法
38     │              │    DatabaseType.java               预定义可支持的target DBMS的种类, 配合ControllerConfiguration用
39     │              ├─util
40     │              │  │ ClassUtil.java                  常用函数库
41     │              │  │ CollectionUtil.java             常用函数库
42     │              │  │ FileUtil.java                   文件操作库
43     │              │  │ JSONSerializable.java           json读写库
44     │              │  │ JSONUtil.java                   json读写库
45     │              │  │ ValidationUtils.java            json读写库
46     │              │  └─json                            java的json读写库
47     │              │       JSONStringer.java
48     │              │       Test.java
49     │              │       JSONWriter.java
50     │              │       JSONTokener.java
51     │              │       JSONObject.java
52     │              │       JSONString.java
53     │              │       JSONArray.java
54     │              │       JSONException.java
55     │              ├─collectors
56     │              │    MySQLCollector.java             extends DBCollector, 用于收集MySQL的knob/metric。
57     │              │    PostgresCollector.java          extends DBCollector, 用于收集postgres的knob/metric。包括使用SHOW ALL命令收集到的parameters,还有使用SELECT查到的internal metrics。
58     │              │    SAPHanaCollector.java           extends DBCollector, 
59     │              │    DBParameterCollector.java       DBCollector的interface
60     │              │    DBCollector.java                DBCollector类,用于存储一个knob/metric sample。包括dbParameters, dbMetrics两个HashMap。另外也包含一些连接db需要的信息(如url, 用户名密码等)
61     │              └─json_validation_schema             JSONSchemaType用,,用于判断输入的json是否合法
62     │                   summary_schema.json
63     │                   schema.json
64     │                   config_schema.json
65     └─test
66         └─java
67             └─com
68                 └─controller
69                     └─collectors
70                            TestMySQLJSON.java
71                            TestPostgresJSON.java
72                            AbstractJSONValidationTestCase.java
73                            TestInvalidJSON.java
View Code

 

controller\src\main\java\com\controller\Main.java

 在fab loop中,调用Controller的命令行格式为:sudo gradle run -PappArgs="-c CONF_controller_config -d output/" --no-daemon > CONF_controller_log

 这里用到了两个命令行参数:-c用来读配置文件,-d用来指定输出目录

这个java文件配合前面的fabfile.py工作(fabfile.py会通过signal来控制main.java的执行进度)。整个client的pipeline大致如下:

 

这是一份result的示例,包括一次fab loop出来metric_before, metric_after, knobs, summary。

从中我们可以看到metrics分为global和local两类,分别对应整个DBMS的和每个table的metrics(注意原paper的4.1节提到了This works because OtterTune currently only considers global knobs. We defer the problem of tuning table- or component-specific knobs as future work.)。

另外还有一些xls文件,这些是从server的ottertune数据库内dump出来的表。我们重点关注以下两个表:

website_metricdata  (对应每次采集到的metrics,表格各列分别是id, name, creation_time, data[表示Numeric DBMS metrics], metrics[表示All DBMS metrics], dbms_id, session_id。)

website_knobdata  (对应每次采集到的knobs,表格各列分别是id, name, creation_time, data[表示Tunable DBMS parameters], knobs[表示All DBMS parameters], dbms_id, session_id。)

假设我们以Throughput作为总体要优化的metric(即DBMS independent external metric),可以看到在metricdata表的data列中有一项是""throughput_txn_per_sec"": 653.8888888888889,这个和网页上显示出的Throughput是一致的。也就是说,这两列就是server用来训练ml model的Data Repository(但注意metrics这列并没有这一项)。

另外注意,knobdata表的knobs项和knobs.json文件是一致的;而metricdata表的两项数据都和metrics_before.json/metrics_after.json的数据不一致,个人猜测metricdata表的数据是对原始数据进行了降维和cluster的结果。

由此可见,client提交给server的数据包括DBMS internal metrics(这里就是postgres的metrics),DBMS external metrics(这里是throughput_txn_per_sec),knobs

 

controller\src\main\java\com\controller\collectors\PostgresCollector.java

里面定义了获取postgres的parameters和metrics需要的查询语句

parameter查询示例:在psql命令行下执行SHOW ALL;

metric查询示例:在psql命令行下运行select * from pg_stat_database;

 

  

                                                                             Server Side                                                                             

server端对于接收到的(client上传的)result file运行ML model,并推荐一个新的DBMS configuration。其中还涉及到很多web后端的部分(例如区分不同的session来支持多个client等),为简单起见我们忽略这些部分,只关心ML model的实现。

1. website

  根据paper的描述,对于client送来的数据,server会按如下pipeline来推荐新的DBMS configuration:1)Workload characterization, 2)Identify important knobs, 3)Automated Tuning

  pipeline前两步的实现在periodic_tasks.py的run_background_tasks()函数中。这个函数会每隔一段时间自动运行一次(即使client没有上传sample),相当于在后台自动进行数据预处理。

  首先它会从client接收新上传的knob/metrics sample(若client没动作则忽略),并将其和之前上传的相同workload的sample合并起来(knob_data和metric_data)。下面是一个client运行过24次fab loop之后,server侧的knob_data和metric_data的实例(即包含24个sample):下载  

[2019-06-05 02:31:28,692: INFO/MainProcess] Received task: run_background_tasks[bfa9548c-b286-4675-aaa0-7ae6312d757e]
[2019-06-05 02:31:28,693: INFO/MainProcess] run_background_tasks[bfa9548c-b286-4675-aaa0-7ae6312d757e]: LEN of workloads
[2019-06-05 02:31:28,695: INFO/MainProcess] run_background_tasks[bfa9548c-b286-4675-aaa0-7ae6312d757e]: 1
[2019-06-05 02:31:28,696: INFO/MainProcess] run_background_tasks[bfa9548c-b286-4675-aaa0-7ae6312d757e]: ====in unique_workloads====
[2019-06-05 02:31:28,697: INFO/MainProcess] run_background_tasks[bfa9548c-b286-4675-aaa0-7ae6312d757e]: tpcc

[2019-06-05 02:31:28,697: INFO/MainProcess] run_background_tasks[bfa9548c-b286-4675-aaa0-7ae6312d757e]: <QuerySet [<Result: 1>, <Result: 2>, <Result: 3>, <Result: 4>, <Result: 5>, <Result: 6>, <Result: 7>, <Result: 8>, <Result: 9>, <Result: 10>, <Result: 11>, <Result: 12>, <Result: 13>, <Result: 14>, <Result: 15>, <Result: 16>, <Result: 17>, <Result: 18>, <Result: 19>, <Result: 20>, '...(remaining elements truncated)...']>

[2019-06-05 02:31:28,698: INFO/MainProcess] run_background_tasks[bfa9548c-b286-4675-aaa0-7ae6312d757e]: ====in unique_workloads====



[2019-06-05 02:31:28,748: INFO/MainProcess] run_background_tasks[bfa9548c-b286-4675-aaa0-7ae6312d757e]:     #knob_data
{
'data': array([
[ 0.00000,  200.00000,  524288.00000,  100.00000,  2.00000, 0.50000,  262144.00000,  300000.00000,  0.00000,  5.00000, 1000.00000,  100.00000,  4294967296.00000,  1.00000,  8.00000, 8.00000,  67108864.00000,  0.00000,  1073741824.00000,  8.00000, 8388608.00000,  83886080.00000,  4.00000,  1.00000, 134217728.00000,  8388608.00000,  4194304.00000,  1.00000, 200.00000,  1048576.00000,  4194304.00000],
[ 0.00000,  200.00000,  524288.00000,  100.00000,  2.00000, 0.50000,  262144.00000,  300000.00000,  0.00000,  5.00000, 1000.00000,  100.00000,  4294967296.00000,  1.00000,  8.00000, 8.00000,  67108864.00000,  0.00000,  1073741824.00000,  8.00000, 8388608.00000,  83886080.00000,  4.00000,  1.00000, 134217728.00000,  8388608.00000,  4194304.00000,  1.00000, 200.00000,  1048576.00000,  4194304.00000],
[ 0.00000,  200.00000,  524288.00000,  100.00000,  2.00000, 0.50000,  262144.00000,  300000.00000,  0.00000,  5.00000, 1000.00000,  100.00000,  4294967296.00000,  1.00000,  8.00000, 8.00000,  67108864.00000,  0.00000,  1073741824.00000,  8.00000, 8388608.00000,  83886080.00000,  4.00000,  1.00000, 134217728.00000,  8388608.00000,  4194304.00000,  1.00000, 200.00000,  1048576.00000,  4194304.00000],
[ 0.00000,  200.00000,  524288.00000,  100.00000,  2.00000, 0.50000,  262144.00000,  300000.00000,  0.00000,  5.00000, 1000.00000,  100.00000,  4294967296.00000,  1.00000,  8.00000, 8.00000,  67108864.00000,  0.00000,  1073741824.00000,  8.00000, 8388608.00000,  83886080.00000,  4.00000,  1.00000, 134217728.00000,  8388608.00000,  4194304.00000,  1.00000, 200.00000,  1048576.00000,  4194304.00000],
[ 0.00000,  200.00000,  524288.00000,  100.00000,  2.00000, 0.50000,  262144.00000,  300000.00000,  0.00000,  5.00000, 1000.00000,  100.00000,  4294967296.00000,  1.00000,  8.00000, 8.00000,  67108864.00000,  0.00000,  1073741824.00000,  8.00000, 8388608.00000,  83886080.00000,  4.00000,  1.00000, 134217728.00000,  8388608.00000,  4194304.00000,  1.00000, 200.00000,  1048576.00000,  4194304.00000],
[ 0.00000,  200.00000,  524288.00000,  100.00000,  2.00000, 0.50000,  262144.00000,  300000.00000,  0.00000,  5.00000, 1000.00000,  100.00000,  4294967296.00000,  1.00000,  8.00000, 8.00000,  67108864.00000,  0.00000,  1073741824.00000,  8.00000, 8388608.00000,  83886080.00000,  4.00000,  1.00000, 134217728.00000,  8388608.00000,  4194304.00000,  1.00000, 200.00000,  1048576.00000,  4194304.00000],
[ 0.00000,  200.00000,  524288.00000,  100.00000,  2.00000, 0.50000,  262144.00000,  300000.00000,  0.00000,  5.00000, 1000.00000,  100.00000,  4294967296.00000,  1.00000,  8.00000, 8.00000,  67108864.00000,  0.00000,  1073741824.00000,  8.00000, 8388608.00000,  83886080.00000,  4.00000,  1.00000, 134217728.00000,  8388608.00000,  4194304.00000,  1.00000, 200.00000,  1048576.00000,  4194304.00000],
[ 0.00000,  200.00000,  524288.00000,  100.00000,  2.00000, 0.50000,  262144.00000,  300000.00000,  0.00000,  5.00000, 1000.00000,  100.00000,  4294967296.00000,  1.00000,  8.00000, 8.00000,  67108864.00000,  0.00000,  1073741824.00000,  8.00000, 8388608.00000,  83886080.00000,  4.00000,  1.00000, 134217728.00000,  8388608.00000,  4194304.00000,  1.00000, 200.00000,  1048576.00000,  4194304.00000],
[ 0.00000,  200.00000,  524288.00000,  100.00000,  2.00000, 0.50000,  262144.00000,  300000.00000,  0.00000,  5.00000, 1000.00000,  100.00000,  4294967296.00000,  1.00000,  8.00000, 8.00000,  67108864.00000,  0.00000,  1073741824.00000,  8.00000, 8388608.00000,  83886080.00000,  4.00000,  1.00000, 134217728.00000,  8388608.00000,  4194304.00000,  1.00000, 200.00000,  1048576.00000,  4194304.00000],
[ 0.00000,  200.00000,  524288.00000,  100.00000,  2.00000, 0.50000,  262144.00000,  300000.00000,  0.00000,  5.00000, 1000.00000,  100.00000,  4294967296.00000,  1.00000,  8.00000, 8.00000,  67108864.00000,  0.00000,  1073741824.00000,  8.00000, 8388608.00000,  83886080.00000,  4.00000,  1.00000, 134217728.00000,  8388608.00000,  4194304.00000,  1.00000, 200.00000,  1048576.00000,  4194304.00000],
[ 0.00000,  200.00000,  524288.00000,  100.00000,  2.00000, 0.50000,  262144.00000,  300000.00000,  0.00000,  5.00000, 1000.00000,  100.00000,  4294967296.00000,  1.00000,  8.00000, 8.00000,  67108864.00000,  0.00000,  1073741824.00000,  8.00000, 8388608.00000,  83886080.00000,  4.00000,  1.00000, 134217728.00000,  8388608.00000,  4194304.00000,  1.00000, 200.00000,  1048576.00000,  4194304.00000],
[ 0.00000,  200.00000,  524288.00000,  100.00000,  2.00000, 0.50000,  262144.00000,  300000.00000,  0.00000,  5.00000, 1000.00000,  100.00000,  4294967296.00000,  1.00000,  8.00000, 8.00000,  67108864.00000,  0.00000,  1073741824.00000,  8.00000, 8388608.00000,  83886080.00000,  4.00000,  1.00000, 134217728.00000,  8388608.00000,  4194304.00000,  1.00000, 200.00000,  1048576.00000,  4194304.00000],
[ 0.00000,  200.00000,  524288.00000,  100.00000,  2.00000, 0.50000,  262144.00000,  300000.00000,  0.00000,  5.00000, 1000.00000,  100.00000,  4294967296.00000,  1.00000,  8.00000, 8.00000,  67108864.00000,  0.00000,  1073741824.00000,  8.00000, 8388608.00000,  83886080.00000,  4.00000,  1.00000, 134217728.00000,  8388608.00000,  4194304.00000,  1.00000, 200.00000,  1048576.00000,  4194304.00000],
[ 0.00000,  200.00000,  524288.00000,  100.00000,  2.00000, 0.50000,  262144.00000,  300000.00000,  0.00000,  5.00000, 1000.00000,  100.00000,  4294967296.00000,  1.00000,  8.00000, 8.00000,  67108864.00000,  0.00000,  1073741824.00000,  8.00000, 8388608.00000,  83886080.00000,  4.00000,  1.00000, 134217728.00000,  8388608.00000,  4194304.00000,  1.00000, 200.00000,  1048576.00000,  4194304.00000],
[ 0.00000,  200.00000,  524288.00000,  100.00000,  2.00000, 0.50000,  262144.00000,  300000.00000,  0.00000,  5.00000, 1000.00000,  100.00000,  4294967296.00000,  1.00000,  8.00000, 8.00000,  67108864.00000,  0.00000,  1073741824.00000,  8.00000, 8388608.00000,  83886080.00000,  4.00000,  1.00000, 134217728.00000,  8388608.00000,  4194304.00000,  1.00000, 200.00000,  1048576.00000,  4194304.00000],
[ 0.00000,  200.00000,  524288.00000,  100.00000,  2.00000, 0.50000,  262144.00000,  300000.00000,  0.00000,  5.00000, 1000.00000,  100.00000,  4294967296.00000,  1.00000,  8.00000, 8.00000,  67108864.00000,  0.00000,  1073741824.00000,  8.00000, 8388608.00000,  83886080.00000,  4.00000,  1.00000, 134217728.00000,  8388608.00000,  4194304.00000,  1.00000, 200.00000,  1048576.00000,  4194304.00000],
[ 0.00000,  200.00000,  524288.00000,  100.00000,  2.00000, 0.50000,  262144.00000,  300000.00000,  0.00000,  5.00000, 1000.00000,  100.00000,  4294967296.00000,  1.00000,  8.00000, 8.00000,  67108864.00000,  0.00000,  1073741824.00000,  8.00000, 8388608.00000,  83886080.00000,  4.00000,  1.00000, 134217728.00000,  8388608.00000,  4194304.00000,  1.00000, 200.00000,  1048576.00000,  4194304.00000],
[ 0.00000,  200.00000,  524288.00000,  100.00000,  2.00000, 0.50000,  262144.00000,  300000.00000,  0.00000,  5.00000, 1000.00000,  100.00000,  4294967296.00000,  1.00000,  8.00000, 8.00000,  67108864.00000,  0.00000,  1073741824.00000,  8.00000, 8388608.00000,  83886080.00000,  4.00000,  1.00000, 134217728.00000,  8388608.00000,  4194304.00000,  1.00000, 200.00000,  1048576.00000,  4194304.00000],
[ 0.00000,  200.00000,  524288.00000,  100.00000,  2.00000, 0.50000,  262144.00000,  300000.00000,  0.00000,  5.00000, 1000.00000,  100.00000,  4294967296.00000,  1.00000,  8.00000, 8.00000,  67108864.00000,  0.00000,  1073741824.00000,  8.00000, 8388608.00000,  83886080.00000,  4.00000,  1.00000, 134217728.00000,  8388608.00000,  4194304.00000,  1.00000, 200.00000,  1048576.00000,  4194304.00000],
[ 0.00000,  200.00000,  524288.00000,  100.00000,  2.00000, 0.50000,  262144.00000,  300000.00000,  0.00000,  5.00000, 1000.00000,  100.00000,  4294967296.00000,  1.00000,  8.00000, 8.00000,  67108864.00000,  0.00000,  1073741824.00000,  8.00000, 8388608.00000,  83886080.00000,  4.00000,  1.00000, 134217728.00000,  8388608.00000,  4194304.00000,  1.00000, 200.00000,  1048576.00000,  4194304.00000],
[ 0.00000,  200.00000,  524288.00000,  100.00000,  2.00000, 0.50000,  262144.00000,  300000.00000,  0.00000,  5.00000, 1000.00000,  100.00000,  4294967296.00000,  1.00000,  8.00000, 8.00000,  67108864.00000,  0.00000,  1073741824.00000,  8.00000, 8388608.00000,  83886080.00000,  4.00000,  1.00000, 134217728.00000,  8388608.00000,  4194304.00000,  1.00000, 200.00000,  1048576.00000,  4194304.00000],
[ 0.00000,  200.00000,  524288.00000,  100.00000,  2.00000, 0.50000,  262144.00000,  300000.00000,  0.00000,  5.00000, 1000.00000,  100.00000,  4294967296.00000,  1.00000,  8.00000, 8.00000,  67108864.00000,  0.00000,  1073741824.00000,  8.00000, 8388608.00000,  83886080.00000,  4.00000,  1.00000, 134217728.00000,  8388608.00000,  4194304.00000,  1.00000, 200.00000,  1048576.00000,  4194304.00000],
[ 0.00000,  200.00000,  524288.00000,  100.00000,  2.00000, 0.50000,  262144.00000,  300000.00000,  0.00000,  5.00000, 1000.00000,  100.00000,  4294967296.00000,  1.00000,  8.00000, 8.00000,  67108864.00000,  0.00000,  1073741824.00000,  8.00000, 8388608.00000,  83886080.00000,  4.00000,  1.00000, 134217728.00000,  8388608.00000,  4194304.00000,  1.00000, 200.00000,  1048576.00000,  4194304.00000],
[ 0.00000,  200.00000,  524288.00000,  100.00000,  2.00000, 0.50000,  262144.00000,  300000.00000,  0.00000,  5.00000, 1000.00000,  100.00000,  4294967296.00000,  1.00000,  8.00000, 8.00000,  67108864.00000,  0.00000,  1073741824.00000,  8.00000, 8388608.00000,  83886080.00000,  4.00000,  1.00000, 134217728.00000,  8388608.00000,  4194304.00000,  1.00000, 200.00000,  1048576.00000,  4194304.00000]
]), 
'rowlabels': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24], 
'columnlabels': ['global.backend_flush_after', 'global.bgwriter_delay', 'global.bgwriter_flush_after', 'global.bgwriter_lru_maxpages', 'global.bgwriter_lru_multiplier', 'global.checkpoint_completion_target', 'global.checkpoint_flush_after', 'global.checkpoint_timeout', 'global.commit_delay', 'global.commit_siblings', 'global.deadlock_timeout', 'global.default_statistics_target', 'global.effective_cache_size', 'global.effective_io_concurrency', 'global.from_collapse_limit', 'global.join_collapse_limit', 'global.maintenance_work_mem', 'global.max_parallel_workers_per_gather', 'global.max_wal_size', 'global.max_worker_processes', 'global.min_parallel_relation_size', 'global.min_wal_size', 'global.random_page_cost', 'global.seq_page_cost', 'global.shared_buffers', 'global.temp_buffers', 'global.wal_buffers', 'global.wal_sync_method', 'global.wal_writer_delay', 'global.wal_writer_flush_after', 'global.work_mem']
}

[2019-06-05 02:31:28,750: INFO/MainProcess] run_background_tasks[bfa9548c-b286-4675-aaa0-7ae6312d757e]:     #metric_data
{
'data': array([
[ 0.00000,  0.00000,  1722.66667,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  136682.66667,  1726.44444,  0.00000,  0.00000, 0.00000,  0.00000,  240.00000,  30344.66667,  3250.00000, 55494.11111,  6327.88889,  547.44444,  2.22222,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  547.44444],
[ 0.00000,  0.00000,  1573.88889,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  104311.11111,  1573.88889,  0.00000,  0.00000, 0.00000,  0.00000,  188.88889,  22995.55556,  2520.11111, 42250.11111,  4916.77778,  420.00000,  1.88889,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  420.00000],
[ 0.00000,  0.00000,  1849.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  127318.37500,  1849.00000,  0.00000,  0.00000, 0.00000,  0.00000,  203.75000,  28032.75000,  3082.12500, 53200.37500,  5787.87500,  542.50000,  3.37500,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  542.50000],
[ 0.00000,  0.00000,  1601.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  107102.11111,  1603.00000,  0.00000,  0.00000, 0.00000,  0.00000,  162.22222,  23739.44444,  2603.77778, 44657.00000,  4714.44444,  455.44444,  1.66667,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  455.44444],
[ 0.00000,  0.00000,  1825.25000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  125129.37500,  1826.50000,  0.00000,  0.00000, 0.00000,  0.00000,  210.00000,  27711.12500,  3052.00000, 51322.00000,  5771.50000,  514.87500,  2.75000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  514.87500],
[ 0.00000,  0.00000,  1859.87500,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  128574.37500,  1859.87500,  0.00000,  0.00000, 0.00000,  0.00000,  222.50000,  28343.62500,  3116.87500, 53665.87500,  6024.37500,  548.62500,  1.87500,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  548.62500],
[ 0.00000,  0.00000,  1713.33333,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  128270.77778,  1713.55556,  0.00000,  0.00000, 0.00000,  0.00000,  208.88889,  28480.11111,  3070.22222, 53108.88889,  5797.77778,  534.11111,  2.33333,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  534.11111],
[ 0.00000,  0.00000,  1708.55556,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  126679.66667,  1708.55556,  0.00000,  0.00000, 0.00000,  0.00000,  196.66667,  27874.77778,  3090.66667, 51776.66667,  5643.66667,  518.88889,  3.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  518.88889],
[ 0.00000,  0.00000,  1745.11111,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  131574.44444,  1749.66667,  0.00000,  0.00000, 0.00000,  0.00000,  217.77778,  28679.00000,  3190.00000, 54254.77778,  6039.11111,  554.22222,  2.33333,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  554.22222],
[ 0.00000,  0.00000,  2122.28571,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  146507.57143,  2122.42857,  0.00000,  0.00000, 0.00000,  0.00000,  230.00000,  32624.71429,  3510.71429, 60823.28571,  6565.14286,  612.14286,  1.85714,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  612.14286],
[ 0.00000,  0.00000,  1632.22222,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  106987.55556,  1632.33333,  0.00000,  0.00000, 0.00000,  0.00000,  203.33333,  23917.22222,  2536.00000, 44102.55556,  5144.11111,  437.77778,  2.33333,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  437.77778],
[ 0.00000,  0.00000,  1726.33333,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  126287.00000,  1726.33333,  0.00000,  0.00000, 0.00000,  0.00000,  186.66667,  27283.55556,  3141.33333, 51592.33333,  5575.77778,  525.66667,  2.66667,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  525.66667],
[ 0.00000,  0.00000,  1704.88889,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  117649.44444,  1708.22222,  0.00000,  0.00000, 0.00000,  0.00000,  212.22222,  25811.11111,  2843.44444, 48149.00000,  5573.44444,  485.55556,  1.66667,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  485.55556],
[ 0.00000,  0.00000,  1749.55556,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  127589.88889,  1752.33333,  0.00000,  0.00000, 0.00000,  0.00000,  216.66667,  27837.77778,  3115.88889, 52392.66667,  5933.22222,  532.88889,  3.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  532.88889],
[ 0.00000,  0.00000,  2029.25000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  160316.37500,  2029.25000,  0.00000,  0.00000, 0.00000,  0.00000,  273.75000,  35874.25000,  3793.75000, 66179.12500,  7341.75000,  653.87500,  2.62500,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  653.87500],
[ 0.00000,  0.00000,  2005.62500,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  148021.12500,  2010.12500,  0.00000,  0.00000, 0.00000,  0.00000,  240.00000,  32267.00000,  3617.00000, 61340.25000,  6775.37500,  629.37500,  3.37500,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  629.37500],
[ 0.00000,  0.00000,  1737.88889,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  126782.00000,  1742.22222,  0.00000,  0.00000, 0.00000,  0.00000,  226.66667,  27851.22222,  3051.00000, 52269.55556,  6005.88889,  527.55556,  1.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  527.55556],
[ 0.00000,  0.00000,  1769.88889,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  127593.44444,  1769.88889,  0.00000,  0.00000, 0.00000,  0.00000,  213.33333,  28690.66667,  3023.88889, 52830.22222,  5797.66667,  526.33333,  2.22222,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  526.33333],
[ 0.00000,  0.00000,  1816.00000,  0.00000,  0.00000,  0.00000, 11.11111,  0.00000,  0.00000,  0.00000,  0.00000,  0.11111, 0.00000,  0.00000,  135974.55556,  1820.11111,  0.00000,  0.00000, 0.00000,  0.00000,  274.44444,  30169.66667,  3218.88889, 55979.66667,  6727.55556,  562.11111,  2.44444,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  562.11111],
[ 0.00000,  0.00000,  1772.88889,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  127041.77778,  1779.33333,  0.00000,  0.00000, 0.00000,  0.00000,  226.66667,  27772.44444,  3067.88889, 52149.66667,  5978.66667,  527.33333,  2.55556,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  527.33333],
[ 0.00000,  0.00000,  1733.22222,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  122698.11111,  1737.11111,  0.00000,  0.00000, 0.00000,  0.00000,  194.44444,  27286.33333,  2970.00000, 50195.22222,  5494.22222,  498.88889,  2.11111,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  498.88889],
[ 0.00000,  0.00000,  1816.33333,  0.00000,  0.00000,  0.00000, 11.11111,  0.00000,  0.00000,  0.00000,  0.00000,  0.11111, 0.00000,  0.00000,  130279.33333,  1816.55556,  0.00000,  0.00000, 0.00000,  0.00000,  223.33333,  28512.55556,  3171.66667, 53534.00000,  6055.66667,  544.11111,  1.77778,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  544.11111],
[ 0.00000,  0.00000,  1816.00000,  0.00000,  0.00000,  0.00000, 15.55556,  0.00000,  0.00000,  0.00000,  0.00000,  0.11111, 0.00000,  0.00000,  135903.00000,  1816.00000,  0.00000,  0.00000, 0.00000,  0.00000,  255.55556,  30097.44444,  3215.44444, 56213.44444,  6563.22222,  560.88889,  2.77778,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  560.88889],
[ 0.00000,  0.00000,  1755.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  121303.22222,  1759.55556,  0.00000,  0.00000, 0.00000,  0.00000,  192.22222,  26446.66667,  2979.33333, 50221.44444,  5474.77778,  513.22222,  2.33333,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  0.00000,  0.00000,  0.00000,  0.00000, 0.00000,  0.00000,  513.22222]
]), 
'rowlabels': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24], 
'columnlabels': ['pg_stat_archiver.archived_count', 'pg_stat_archiver.failed_count', 'pg_stat_bgwriter.buffers_alloc', 'pg_stat_bgwriter.buffers_backend', 'pg_stat_bgwriter.buffers_backend_fsync', 'pg_stat_bgwriter.buffers_checkpoint', 'pg_stat_bgwriter.buffers_clean', 'pg_stat_bgwriter.checkpoint_sync_time', 'pg_stat_bgwriter.checkpoint_write_time', 'pg_stat_bgwriter.checkpoints_req', 'pg_stat_bgwriter.checkpoints_timed', 'pg_stat_bgwriter.maxwritten_clean', 'pg_stat_database.blk_read_time', 'pg_stat_database.blk_write_time', 'pg_stat_database.blks_hit', 'pg_stat_database.blks_read', 'pg_stat_database.conflicts', 'pg_stat_database.deadlocks', 'pg_stat_database.temp_bytes', 'pg_stat_database.temp_files', 'pg_stat_database.tup_deleted', 'pg_stat_database.tup_fetched', 'pg_stat_database.tup_inserted', 'pg_stat_database.tup_returned', 'pg_stat_database.tup_updated', 'pg_stat_database.xact_commit', 'pg_stat_database.xact_rollback', 'pg_stat_database_conflicts.confl_bufferpin', 'pg_stat_database_conflicts.confl_deadlock', 'pg_stat_database_conflicts.confl_lock', 'pg_stat_database_conflicts.confl_snapshot', 'pg_stat_database_conflicts.confl_tablespace', 'pg_stat_user_indexes.idx_scan', 'pg_stat_user_indexes.idx_tup_fetch', 'pg_stat_user_indexes.idx_tup_read', 'pg_stat_user_tables.analyze_count', 'pg_stat_user_tables.autoanalyze_count', 'pg_stat_user_tables.autovacuum_count', 'pg_stat_user_tables.idx_scan', 'pg_stat_user_tables.idx_tup_fetch', 'pg_stat_user_tables.n_dead_tup', 'pg_stat_user_tables.n_live_tup', 'pg_stat_user_tables.n_mod_since_analyze', 'pg_stat_user_tables.n_tup_del', 'pg_stat_user_tables.n_tup_hot_upd', 'pg_stat_user_tables.n_tup_ins', 'pg_stat_user_tables.n_tup_upd', 'pg_stat_user_tables.seq_scan', 'pg_stat_user_tables.seq_tup_read', 'pg_stat_user_tables.vacuum_count', 'pg_statio_user_indexes.idx_blks_hit', 'pg_statio_user_indexes.idx_blks_read', 'pg_statio_user_tables.heap_blks_hit', 'pg_statio_user_tables.heap_blks_read', 'pg_statio_user_tables.idx_blks_hit', 'pg_statio_user_tables.idx_blks_read', 'pg_statio_user_tables.tidx_blks_hit', 'pg_statio_user_tables.tidx_blks_read', 'pg_statio_user_tables.toast_blks_hit', 'pg_statio_user_tables.toast_blks_read', 'throughput_txn_per_sec']
}



[2019-06-05 02:31:36,672: ERROR/MainProcess] Task run_background_tasks[bfa9548c-b286-4675-aaa0-7ae6312d757e] raised unexpected: AssertionError('Need more data to train the model',)
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/celery/app/trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/celery/app/trace.py", line 438, in __protected_call__
    return self.run(*args, **kwargs)
  File "/home/tidb/ottertune/server/website/website/tasks/periodic_tasks.py", line 111, in run_background_tasks
    dbms=workload.dbms)
  File "/home/tidb/ottertune/server/website/website/tasks/periodic_tasks.py", line 257, in run_knob_identification
    assert len(nonconst_knob_matrix) > 0, "Need more data to train the model"
AssertionError: Need more data to train the model
View Code

 

  假设N是sample数量,P是每个sample内的knob数量(只包括Tunable knobs,与前面website_knobdata表的data项对应。这里是31个),Q是每个sample内的metric数量(这里是61个)。

    knob_data可以看作一个hash table,包括data(N*P的list,存储所有的Tunable knobs数据),rowlabels(长度N的list,存储1-N的数字),columnlabels(长度P的list,存储每个knob的名字)。

    metric_data也可以看作一个hash table,包括data(N*Q的list,存储所有的metric数据),rowlabels(长度N的list,存储1-N的数字),columnlabels(长度Q的list,存储每个metric的名字)

 

  之后会调用run_workload_characterization()函数,对metric_data进行降维和聚类操作,并存储到pruned_metrics里(pruned_metrics包括这些保留下的metrics的名字)。pruned_metrics里包含的metrics很少,这里只有2个。如下图

[2019-06-07 01:52:31,364: INFO/MainProcess] run_background_tasks[085bccf9-8188-4842-baee-c4223a78679f]: ----------pruned_metrics----------
[2019-06-07 01:52:31,364: INFO/MainProcess] run_background_tasks[085bccf9-8188-4842-baee-c4223a78679f]: ['pg_stat_bgwriter.buffers_alloc', 'pg_stat_database.tup_returned']
[2019-06-07 01:52:31,364: INFO/MainProcess] run_background_tasks[085bccf9-8188-4842-baee-c4223a78679f]: ----------pruned_metrics----------

    对于精简后的metrics data,会被保存到pruned_metric_data中。设当前有36个sample,那么pruned_metrics_data如下所示:

[2019-06-07 01:52:31,367: INFO/MainProcess] run_background_tasks[085bccf9-8188-4842-baee-c4223a78679f]: ----------pruned_metric_data----------
[2019-06-07 01:52:31,367: INFO/MainProcess] run_background_tasks[085bccf9-8188-4842-baee-c4223a78679f]: {'data': array([[ 1845.111,  73727.111], [ 1861.667,  72836.111], [ 1865.111,  71055.222], [ 1712.889,  54248.889], [ 1836.444,  70974.556], [ 1835.222,  68532.556], [ 1941.889,  70615.778], [ 1826.667,  66300.889], [ 1913.333,  68670.444], [ 1823.889,  60201.778], [ 1901.222,  68815.222], [ 1662.111,  45577.444], [ 1835.111,  67402.778], [ 1896.778,  65052.333], [ 1740.333,  48970.889], [ 2088.500,  68338.500], [ 1762.100,  65846.700], [ 1835.667,  63783.111], [ 1524.600,  40146.100], [ 1734.333,  49186.000], [ 1985.667,  71331.111], [ 1758.889,  49650.222], [ 1988.333,  71295.222], [ 1838.000,  55996.778], [ 2008.222,  71495.222], [ 1439.444,  28370.444], [ 1729.667,  45568.444], [ 1938.111,  65747.333], [ 1910.222,  51863.222], [ 2069.333,  58888.556], [ 2328.333,  66940.778], [ 2408.222,  66183.000], [ 1832.889,  64115.667], [ 1989.000,  63310.778], [ 1861.889,  57102.667], [ 1844.333,  68077.111]]), 'rowlabels': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36], 'columnlabels': ['pg_stat_bgwriter.buffers_alloc', 'pg_stat_database.tup_returned']}
[2019-06-07 01:52:31,367: INFO/MainProcess] run_background_tasks[085bccf9-8188-4842-baee-c4223a78679f]: ----------pruned_metric_data----------

    再之后调用run_knob_identification()函数,输入knob_data和pruned_metrics,检测出重要的knobs,并返回ranked_knobs。如下图

[2019-06-05 18:33:16,661: INFO/MainProcess] run_background_tasks[82b6937f-ca01-4534-87d1-a1c0a98351c2]: ----------ranked_knobs----------
[2019-06-05 18:33:16,661: INFO/MainProcess] run_background_tasks[82b6937f-ca01-4534-87d1-a1c0a98351c2]: ['global.wal_sync_method', 'global.wal_writer_delay', 'global.wal_buffers', 'global.temp_buffers', 'global.shared_buffers', 'global.seq_page_cost', 'global.random_page_cost', 'global.min_wal_size', 'global.min_parallel_relation_size', 'global.max_worker_processes', 'global.max_wal_size', 'global.max_parallel_workers_per_gather', 'global.maintenance_work_mem', 'global.join_collapse_limit', 'global.from_collapse_limit', 'global.effective_io_concurrency', 'global.effective_cache_size', 'global.default_statistics_target', 'global.backend_flush_after', 'global.bgwriter_delay', 'global.bgwriter_flush_after', 'global.wal_writer_flush_after', 'global.bgwriter_lru_maxpages', 'global.checkpoint_completion_target', 'global.checkpoint_flush_after', 'global.checkpoint_timeout', 'global.commit_delay', 'global.commit_siblings', 'global.deadlock_timeout', 'global.bgwriter_lru_multiplier', 'global.work_mem']
[2019-06-05 18:33:16,661: INFO/MainProcess] run_background_tasks[82b6937f-ca01-4534-87d1-a1c0a98351c2]: ----------ranked_knobs----------

  

下面分别介绍这三个步骤:

1. Workload characterization:use DBMS internal metrics to characterize workload

这步的目的是对原始的metrics进行降维和聚类,将作用相似(相关度很高,values are strongly correlated)的metrics合并。

它一共分为两步:首先是降维技术,称为因子分析,将高维DBMS指标转换为低维数据。 然后我们使用第二种技术,称为k-means ,用于聚类这种低维数据,成为有意义的群体。 使用降维技术是许多聚类算法的预处理步骤,因为它们减少了数据中“噪音”的数量。 这个提高了聚类分析的稳健性和质量。(原文:Using a dimensionality reduction technique is a preprocessing step for many clustering algorithms because they reduce the amount of “noise” in the data [31, 30]. This improves the robustness and the quality of the cluster analysis)

这部分的代码实现位于periodic_tasks.py的run_workload_characterization()函数中。该函数的输入包括(metric_data[原始的metrics sample])。一开始它会对原始数据进行检测,筛选出metrics互不相同的metric sample(因为如果knob全都一样,就无法通过ML model来学习了),存到nonconst_matrix中。

首先对于原始的knob-metric samples,使用Factor Analysis对其进行降维。然后对于降维后的low-dimension data,使用k-means对作用相似的metrics进行聚类。

 

2. Identify important knobs: identify which knobs have strongest impact on DBMS target metric(eg, throughput)

OtterTune uses the Lasso path algorithm [29] to determine the order of importance of the DBMS’s knobs.

OtterTune constructs a set of independent variables (X) and one or more dependent variables (y) from the data in its repository. The independent variables(X) are the DBMS’s knobs (or functions of these knobs) and the dependent variables(y) are the metrics that OtterTune collects during an observation period from the DBMS. OtterTune uses the Lasso path algorithm [29] to determine the order of importance of the DBMS’s knobs. 

在运行Lasso之前,还需要对原始数据进行Normalization(原文sec 5.1最后一段)。另外需要考虑不同knob之间可能存在的dependency(原文sec 5.2)

最后,OtterTune 必须决定在做出配置建议时使用多少个旋钮,旋钮用的太多会明显增加 OtterTune 的调优时间,而旋钮用的太少则难以找到最好的配置。OtterTune 用了一个增量方法来自动化这个过程,在一次调优过程中,逐步增加使用的旋钮。这个方法让 OtterTune 可以先用少量最重要的旋钮来探索并调优配置,然后再扩大范围考虑其他旋钮。

这部分的代码实现位于periodic_tasks.py的run_knob_identification()函数中。

该函数的输入包括(knob_data=knob_data[knobs sample], metric_data=pruned_metric_data[精简后的metrics sample])。首先它会对原始数据进行检测,筛选出knob互不相同的knob/metric sample(因为如果knob全都一样,就无法通过ML model来学习了),存到nonconst_metric_matrix中。之后对其再进行一些预处理(比如识别出可以调节的knob、standardize to N(0,1)等),然后送入lasso path algorithm。最后会输出一个list ranked_knobs,即按重要性排序的knobs名称。

 

3. Automated Tuning:

经过前面两步之后,我们已经有了这些数据:(1) the set of non-redundant metrics  (2) the set of most impactful configuration knobs  (3) the data from previous tuning sessions stored in its data repository

Automated Tuning这部分又可以分为两步操作:1) Identify which workload from a previous tuning session is most emblematic of the target workload. It does this by comparing the session’s metrics with those from the previously seen workloads to see which ones react similarly to different knob settings.   2) Once OtterTune has matched the target workload to the most similar one in 1014its repository, it then starts the second step of the analysis where it chooses a configuration that is explicitly selected to maximize the target objective.

注意在Tuning时,每次只调整最impactful的knobs。

1) Workload Mapping

这一步的目的是对于当前待优化的workload,在data repository中找到一个(之前跑过的)和它最相似的workload。

首先用一个matrix把data repository里的数据都存起来。设S[m][i][j]表示当使用configuration #j来运行workload #i时,metric #m的值。

记m表示任意一个metric。对于当前待优化的workload #w,它和另一个workload #i的差异被定义为向量S[m][w]向量S[m][i]之间的Euclidean distance。记为ED[m][i]

之后对于所有的metrics #m,都计算出它的ED[m][i]。最终workload #i的score定义为所有ED[m][i]的平均值,即Score[i]=Average(ED[m][i] for each m)

最终取出Score最小的workload即可。这个就是和当前待优化的workload最相似的workload。

 

其实这步就是通过计算不同workloads之间的metrics数据的欧氏距离,来确定最接近的一个workload。但要注意这篇paper假设所有workload所运行在的DBMS的硬件配置都是一致的(原文Sec3.1:We assume that this hardware profile is a single identifier from a list of pre-defined types (e.g., an instance type on Amazon EC2). We defer the problem of automatically determining the hardware capabilities of a DBMS deployment to future work.  /  In the first step, OtterTune tries to “understand” the target workload and map it to a workload for the same DBMS and hardware profile that it has seen (and tuned) in the past.),因此对于更一般的情况,即使是同一个workload,在不同硬件配置下跑出的metrics也会有很大差距,这种方法可能就需要进行修改了。

另外在计算之前还要进行一些normalization之类的预处理(Before computing the score, it is critical that all metrics are of the same order of magnitude.)。

 

这部分的代码位于async_tasks.py文件的map_workload()函数中。该函数的输入target_data包括X_matrix(knobs数据集)、y_matrix(metrics数据集)、rowlabels(每个sample的序号,这里有25个sample)、X_columnlabels(knobs的编号,这里有31个)、y_columnlabels(metrics的编号,这里有61个)、newest_result_id(其中最新的,也就是client刚提交的sample序号)。注意target_data中只包括当前待优化的workload种类的sample。如下图:

[2019-06-06 22:08:14,963: INFO/MainProcess] map_workload[1767d38c-4734-494a-b193-4bf8f9669473]: ##############map_workload called##############
[2019-06-06 22:08:14,965: INFO/MainProcess] Task aggregate_target_results[b7847e24-173a-4c72-9db3-d3e23367ae36] succeeded in 0.09672254799988877s: {'X_matrix': array([[ 0.000,  200.000,  524288.000,  100.000,  2.000,  0.500,
         262144.000,  300000.000,  0.000,  5.000,...
[2019-06-06 22:08:14,965: INFO/MainProcess] map_workload[1767d38c-4734-494a-b193-4bf8f9669473]: X_matrix
[2019-06-06 22:08:14,966: INFO/MainProcess] map_workload[1767d38c-4734-494a-b193-4bf8f9669473]: (25, 31)
[2019-06-06 22:08:14,966: INFO/MainProcess] map_workload[1767d38c-4734-494a-b193-4bf8f9669473]: [ 0.000  200.000  524288.000  100.000  2.000  0.500  262144.000  300000.000
  0.000  5.000  1000.000  100.000  4294967296.000  1.000  8.000  8.000
  67108864.000  0.000  1073741824.000  8.000  8388608.000  83886080.000
  4.000  1.000  134217728.000  8388608.000  4194304.000  1.000  200.000
  1048576.000  4194304.000]
[2019-06-06 22:08:14,966: INFO/MainProcess] map_workload[1767d38c-4734-494a-b193-4bf8f9669473]: -----
[2019-06-06 22:08:14,966: INFO/MainProcess] map_workload[1767d38c-4734-494a-b193-4bf8f9669473]: y_matrix
[2019-06-06 22:08:14,966: INFO/MainProcess] map_workload[1767d38c-4734-494a-b193-4bf8f9669473]: (25, 61)
[2019-06-06 22:08:14,966: INFO/MainProcess] map_workload[1767d38c-4734-494a-b193-4bf8f9669473]: [ 0.000  0.000  1845.111  0.000  0.000  0.000  33.333  0.000  0.000  0.000
  0.000  0.333  0.000  0.000  168455.333  1871.000  0.000  0.000  0.000
  0.000  291.111  39427.778  4287.222  73727.111  8055.444  740.778  3.444
  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
  0.000  0.000  0.000  740.778]
[2019-06-06 22:08:14,966: INFO/MainProcess] map_workload[1767d38c-4734-494a-b193-4bf8f9669473]: -----
[2019-06-06 22:08:14,966: INFO/MainProcess] map_workload[1767d38c-4734-494a-b193-4bf8f9669473]: rowlabels
[2019-06-06 22:08:14,967: INFO/MainProcess] map_workload[1767d38c-4734-494a-b193-4bf8f9669473]: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
[2019-06-06 22:08:14,967: INFO/MainProcess] map_workload[1767d38c-4734-494a-b193-4bf8f9669473]: -----
[2019-06-06 22:08:14,967: INFO/MainProcess] map_workload[1767d38c-4734-494a-b193-4bf8f9669473]: X_columnlabels
[2019-06-06 22:08:14,967: INFO/MainProcess] map_workload[1767d38c-4734-494a-b193-4bf8f9669473]: ['global.backend_flush_after', 'global.bgwriter_delay', 'global.bgwriter_flush_after', 'global.bgwriter_lru_maxpages', 'global.bgwriter_lru_multiplier', 'global.checkpoint_completion_target', 'global.checkpoint_flush_after', 'global.checkpoint_timeout', 'global.commit_delay', 'global.commit_siblings', 'global.deadlock_timeout', 'global.default_statistics_target', 'global.effective_cache_size', 'global.effective_io_concurrency', 'global.from_collapse_limit', 'global.join_collapse_limit', 'global.maintenance_work_mem', 'global.max_parallel_workers_per_gather', 'global.max_wal_size', 'global.max_worker_processes', 'global.min_parallel_relation_size', 'global.min_wal_size', 'global.random_page_cost', 'global.seq_page_cost', 'global.shared_buffers', 'global.temp_buffers', 'global.wal_buffers', 'global.wal_sync_method', 'global.wal_writer_delay', 'global.wal_writer_flush_after', 'global.work_mem']
[2019-06-06 22:08:14,967: INFO/MainProcess] map_workload[1767d38c-4734-494a-b193-4bf8f9669473]: -----
[2019-06-06 22:08:14,967: INFO/MainProcess] map_workload[1767d38c-4734-494a-b193-4bf8f9669473]: y_columnlabels
[2019-06-06 22:08:14,967: INFO/MainProcess] map_workload[1767d38c-4734-494a-b193-4bf8f9669473]: ['pg_stat_archiver.archived_count', 'pg_stat_archiver.failed_count', 'pg_stat_bgwriter.buffers_alloc', 'pg_stat_bgwriter.buffers_backend', 'pg_stat_bgwriter.buffers_backend_fsync', 'pg_stat_bgwriter.buffers_checkpoint', 'pg_stat_bgwriter.buffers_clean', 'pg_stat_bgwriter.checkpoint_sync_time', 'pg_stat_bgwriter.checkpoint_write_time', 'pg_stat_bgwriter.checkpoints_req', 'pg_stat_bgwriter.checkpoints_timed', 'pg_stat_bgwriter.maxwritten_clean', 'pg_stat_database.blk_read_time', 'pg_stat_database.blk_write_time', 'pg_stat_database.blks_hit', 'pg_stat_database.blks_read', 'pg_stat_database.conflicts', 'pg_stat_database.deadlocks', 'pg_stat_database.temp_bytes', 'pg_stat_database.temp_files', 'pg_stat_database.tup_deleted', 'pg_stat_database.tup_fetched', 'pg_stat_database.tup_inserted', 'pg_stat_database.tup_returned', 'pg_stat_database.tup_updated', 'pg_stat_database.xact_commit', 'pg_stat_database.xact_rollback', 'pg_stat_database_conflicts.confl_bufferpin', 'pg_stat_database_conflicts.confl_deadlock', 'pg_stat_database_conflicts.confl_lock', 'pg_stat_database_conflicts.confl_snapshot', 'pg_stat_database_conflicts.confl_tablespace', 'pg_stat_user_indexes.idx_scan', 'pg_stat_user_indexes.idx_tup_fetch', 'pg_stat_user_indexes.idx_tup_read', 'pg_stat_user_tables.analyze_count', 'pg_stat_user_tables.autoanalyze_count', 'pg_stat_user_tables.autovacuum_count', 'pg_stat_user_tables.idx_scan', 'pg_stat_user_tables.idx_tup_fetch', 'pg_stat_user_tables.n_dead_tup', 'pg_stat_user_tables.n_live_tup', 'pg_stat_user_tables.n_mod_since_analyze', 'pg_stat_user_tables.n_tup_del', 'pg_stat_user_tables.n_tup_hot_upd', 'pg_stat_user_tables.n_tup_ins', 'pg_stat_user_tables.n_tup_upd', 'pg_stat_user_tables.seq_scan', 'pg_stat_user_tables.seq_tup_read', 'pg_stat_user_tables.vacuum_count', 'pg_statio_user_indexes.idx_blks_hit', 'pg_statio_user_indexes.idx_blks_read', 'pg_statio_user_tables.heap_blks_hit', 'pg_statio_user_tables.heap_blks_read', 'pg_statio_user_tables.idx_blks_hit', 'pg_statio_user_tables.idx_blks_read', 'pg_statio_user_tables.tidx_blks_hit', 'pg_statio_user_tables.tidx_blks_read', 'pg_statio_user_tables.toast_blks_hit', 'pg_statio_user_tables.toast_blks_read', 'throughput_txn_per_sec']
[2019-06-06 22:08:14,967: INFO/MainProcess] map_workload[1767d38c-4734-494a-b193-4bf8f9669473]: -----
[2019-06-06 22:08:14,967: INFO/MainProcess] map_workload[1767d38c-4734-494a-b193-4bf8f9669473]: newest_result_id
[2019-06-06 22:08:14,968: INFO/MainProcess] map_workload[1767d38c-4734-494a-b193-4bf8f9669473]: 25
[2019-06-06 22:08:14,968: INFO/MainProcess] map_workload[1767d38c-4734-494a-b193-4bf8f9669473]: -----
[2019-06-06 22:08:14,968: INFO/MainProcess] map_workload[1767d38c-4734-494a-b193-4bf8f9669473]: bad
[2019-06-06 22:08:14,968: INFO/MainProcess] map_workload[1767d38c-4734-494a-b193-4bf8f9669473]: False
[2019-06-06 22:08:14,969: INFO/MainProcess] map_workload[1767d38c-4734-494a-b193-4bf8f9669473]: -----
[2019-06-06 22:08:14,969: INFO/MainProcess] map_workload[1767d38c-4734-494a-b193-4bf8f9669473]: --------------map_workload called--------------
View Code

首先它会对不同种类workload的X_matrix、y_matrix进行预处理。对于X_matrix中的每个sample,按照ranked_knobs中的顺序挑出前10项重要的knobs,并删掉剩下的knobs。对于y_matrix中的每个sample,只保留pruned_metrics中选出的几个重要的metrics的值。然后还会进行一些预处理操作(如StandardScaler)。最终放入X_target、y_target数组中。

之后开始准备计算欧氏距离。注意代码中的操作和paper有不一致:对于每种workload,用这种workload自己的knob/metric samples(即代码中的X_scaled;里面不包括client刚交上来的那一个newest_result)训练一个Gaussian Process的model,然后再放在包含所有sample(代码中的X_target;包括了所有workload各自遇到过的knobs的并集)的数据集上进行predict。最后用这个predict出来的结果与原始的y_target计算欧氏距离:对于每个sample #i,计算dists[i]=sqrt(sum{(predictions[i][m]-y_target[i][m])^2 for all metrics #m}),然后将所有sample的dists[i]值取平均数,即为这个workload的score。

(注意对于y_target中,相同knob在不同workload上的情况,If we have multiple observations from running workload i with configuration j, then entry Xm;i;j is the median of all observed values of metric m.)

最后,取score最小的workload种类,即为返回的结果。

 

2) Configuration Recommendation

这部分的代码位于async_tasks.py文件的configuration_recommendation()函数中

这步使用了Gaussian Process regression,这也是一种ML model。而和神经网络之类的方法相比,高斯过程模型属于无参数模型,相对解决的问题复杂度及与其它算法比较减少了算法计算量;而且在训练样本很少的情况下表现比NN更好。

在这一步中,a Gaussian Process Regression model is trained to estimate the value of the tuning target objective metric under different knob configurations.  最终选出value最好的一个knob configuration方案作为推荐结果。以下就是configuration_recommendation()函数的输入target_data:

 1  ##############configuration_recommendation called##############
 2  X_matrix
 3  (25, 31)
 4  [ 0.000  200.000  524288.000  100.000  2.000  0.500  262144.000  300000.000
 5   0.000  5.000  1000.000  100.000  4294967296.000  1.000  8.000  8.000
 6   67108864.000  0.000  1073741824.000  8.000  8388608.000  83886080.000
 7   4.000  1.000  134217728.000  8388608.000  4194304.000  1.000  200.000
 8   1048576.000  4194304.000]
 9  -----
10  y_matrix
11  (25, 61)
12  [ 0.000  0.000  1845.111  0.000  0.000  0.000  33.333  0.000  0.000  0.000
13   0.000  0.333  0.000  0.000  168455.333  1871.000  0.000  0.000  0.000
14   0.000  291.111  39427.778  4287.222  73727.111  8055.444  740.778  3.444
15   0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
16   0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
17   0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
18   0.000  0.000  0.000  740.778]
19  -----
20  rowlabels
21  [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
22  -----
23  X_columnlabels
24  ['global.backend_flush_after', 'global.bgwriter_delay', 'global.bgwriter_flush_after', 'global.bgwriter_lru_maxpages', 'global.bgwriter_lru_multiplier', 'global.checkpoint_completion_target', 'global.checkpoint_flush_after', 'global.checkpoint_timeout', 'global.commit_delay', 'global.commit_siblings', 'global.deadlock_timeout', 'global.default_statistics_target', 'global.effective_cache_size', 'global.effective_io_concurrency', 'global.from_collapse_limit', 'global.join_collapse_limit', 'global.maintenance_work_mem', 'global.max_parallel_workers_per_gather', 'global.max_wal_size', 'global.max_worker_processes', 'global.min_parallel_relation_size', 'global.min_wal_size', 'global.random_page_cost', 'global.seq_page_cost', 'global.shared_buffers', 'global.temp_buffers', 'global.wal_buffers', 'global.wal_sync_method', 'global.wal_writer_delay', 'global.wal_writer_flush_after', 'global.work_mem']
25  -----
26  y_columnlabels
27  ['pg_stat_archiver.archived_count', 'pg_stat_archiver.failed_count', 'pg_stat_bgwriter.buffers_alloc', 'pg_stat_bgwriter.buffers_backend', 'pg_stat_bgwriter.buffers_backend_fsync', 'pg_stat_bgwriter.buffers_checkpoint', 'pg_stat_bgwriter.buffers_clean', 'pg_stat_bgwriter.checkpoint_sync_time', 'pg_stat_bgwriter.checkpoint_write_time', 'pg_stat_bgwriter.checkpoints_req', 'pg_stat_bgwriter.checkpoints_timed', 'pg_stat_bgwriter.maxwritten_clean', 'pg_stat_database.blk_read_time', 'pg_stat_database.blk_write_time', 'pg_stat_database.blks_hit', 'pg_stat_database.blks_read', 'pg_stat_database.conflicts', 'pg_stat_database.deadlocks', 'pg_stat_database.temp_bytes', 'pg_stat_database.temp_files', 'pg_stat_database.tup_deleted', 'pg_stat_database.tup_fetched', 'pg_stat_database.tup_inserted', 'pg_stat_database.tup_returned', 'pg_stat_database.tup_updated', 'pg_stat_database.xact_commit', 'pg_stat_database.xact_rollback', 'pg_stat_database_conflicts.confl_bufferpin', 'pg_stat_database_conflicts.confl_deadlock', 'pg_stat_database_conflicts.confl_lock', 'pg_stat_database_conflicts.confl_snapshot', 'pg_stat_database_conflicts.confl_tablespace', 'pg_stat_user_indexes.idx_scan', 'pg_stat_user_indexes.idx_tup_fetch', 'pg_stat_user_indexes.idx_tup_read', 'pg_stat_user_tables.analyze_count', 'pg_stat_user_tables.autoanalyze_count', 'pg_stat_user_tables.autovacuum_count', 'pg_stat_user_tables.idx_scan', 'pg_stat_user_tables.idx_tup_fetch', 'pg_stat_user_tables.n_dead_tup', 'pg_stat_user_tables.n_live_tup', 'pg_stat_user_tables.n_mod_since_analyze', 'pg_stat_user_tables.n_tup_del', 'pg_stat_user_tables.n_tup_hot_upd', 'pg_stat_user_tables.n_tup_ins', 'pg_stat_user_tables.n_tup_upd', 'pg_stat_user_tables.seq_scan', 'pg_stat_user_tables.seq_tup_read', 'pg_stat_user_tables.vacuum_count', 'pg_statio_user_indexes.idx_blks_hit', 'pg_statio_user_indexes.idx_blks_read', 'pg_statio_user_tables.heap_blks_hit', 'pg_statio_user_tables.heap_blks_read', 'pg_statio_user_tables.idx_blks_hit', 'pg_statio_user_tables.idx_blks_read', 'pg_statio_user_tables.tidx_blks_hit', 'pg_statio_user_tables.tidx_blks_read', 'pg_statio_user_tables.toast_blks_hit', 'pg_statio_user_tables.toast_blks_read', 'throughput_txn_per_sec']
28  -----
29  newest_result_id
30  25
31  -----
32  bad
33  False
34  -----
35  mapped_workload
36  (1, 'tpcc', 11.449912651941599)
37  -----
38  scores
39  {1: ('tpcc', 11.449912651941599)}
40  -----
41  --------------configuration_recommendation called--------------
View Code

这是另一个例子: 

 1 ---------------------------------
 2 X_matrix    (1, 31)
 3 [[0.00000000e+00 2.00000000e+02 5.24288000e+05 1.00000000e+02
 4   2.00000000e+00 5.00000000e-01 2.62144000e+05 3.00000000e+05
 5   0.00000000e+00 5.00000000e+00 1.00000000e+03 1.00000000e+02
 6   4.29496730e+09 1.00000000e+00 8.00000000e+00 8.00000000e+00
 7   6.71088640e+07 0.00000000e+00 1.07374182e+09 8.00000000e+00
 8   8.38860800e+06 8.38860800e+07 4.00000000e+00 1.00000000e+00
 9   1.34217728e+08 8.38860800e+06 4.19430400e+06 1.00000000e+00
10   2.00000000e+02 1.04857600e+06 4.19430400e+06]]
11 ---------------------------------
12 y_matrix    (1, 61)
13 [[0.00000000e+00 0.00000000e+00 2.00057143e+03 0.00000000e+00
14   0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
15   0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
16   0.00000000e+00 0.00000000e+00 1.11175000e+05 2.00085714e+03
17   0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
18   1.97142857e+02 2.62194286e+04 2.82971429e+03 4.85960000e+04
19   5.40785714e+03 4.85285714e+02 7.14285714e-01 0.00000000e+00
20   0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
21   0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
22   0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
23   0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
24   0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
25   0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
26   0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
27   0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
28   4.85285714e+02]]
29 ---------------------------------
30 rowlabels
31 [19]
32 ---------------------------------
33 X_columnlabels
34 ['global.backend_flush_after', 'global.bgwriter_delay', 'global.bgwriter_flush_after', 'global.bgwriter_lru_maxpages', 'global.bgwriter_lru_multiplier', 'global.checkpoint_completion_target', 'global.checkpoint_flush_after', 'global.checkpoint_timeout', 'global.commit_delay', 'global.commit_siblings', 'global.deadlock_timeout', 'global.default_statistics_target', 'global.effective_cache_size', 'global.effective_io_concurrency', 'global.from_collapse_limit', 'global.join_collapse_limit', 'global.maintenance_work_mem', 'global.max_parallel_workers_per_gather', 'global.max_wal_size', 'global.max_worker_processes', 'global.min_parallel_relation_size', 'global.min_wal_size', 'global.random_page_cost', 'global.seq_page_cost', 'global.shared_buffers', 'global.temp_buffers', 'global.wal_buffers','global.wal_sync_method', 'global.wal_writer_delay', 'global.wal_writer_flush_after', 'global.work_mem']
35 ---------------------------------
36 y_columnlabels
37 ['pg_stat_archiver.archived_count', 'pg_stat_archiver.failed_count', 'pg_stat_bgwriter.buffers_alloc', 'pg_stat_bgwriter.buffers_backend', 'pg_stat_bgwriter.buffers_backend_fsync', 'pg_stat_bgwriter.buffers_checkpoint', 'pg_stat_bgwriter.buffers_clean', 'pg_stat_bgwriter.checkpoint_sync_time', 'pg_stat_bgwriter.checkpoint_write_time', 'pg_stat_bgwriter.checkpoints_req', 'pg_stat_bgwriter.checkpoints_timed', 'pg_stat_bgwriter.maxwritten_clean', 'pg_stat_database.blk_read_time', 'pg_stat_database.blk_write_time', 'pg_stat_database.blks_hit', 'pg_stat_database.blks_read', 'pg_stat_database.conflicts', 'pg_stat_database.deadlocks', 'pg_stat_database.temp_bytes', 'pg_stat_database.temp_files', 'pg_stat_database.tup_deleted', 'pg_stat_database.tup_fetched', 'pg_stat_database.tup_inserted', 'pg_stat_database.tup_returned', 'pg_stat_database.tup_updated', 'pg_stat_database.xact_commit', 'pg_stat_database.xact_rollback', 'pg_stat_database_conflicts.confl_bufferpin', 'pg_stat_database_conflicts.confl_deadlock', 'pg_stat_database_conflicts.confl_lock', 'pg_stat_database_conflicts.confl_snapshot', 'pg_stat_database_conflicts.confl_tablespace', 'pg_stat_user_indexes.idx_scan', 'pg_stat_user_indexes.idx_tup_fetch', 'pg_stat_user_indexes.idx_tup_read', 'pg_stat_user_tables.analyze_count', 'pg_stat_user_tables.autoanalyze_count', 'pg_stat_user_tables.autovacuum_count', 'pg_stat_user_tables.idx_scan', 'pg_stat_user_tables.idx_tup_fetch', 'pg_stat_user_tables.n_dead_tup', 'pg_stat_user_tables.n_live_tup', 'pg_stat_user_tables.n_mod_since_analyze', 'pg_stat_user_tables.n_tup_del', 'pg_stat_user_tables.n_tup_hot_upd', 'pg_stat_user_tables.n_tup_ins', 'pg_stat_user_tables.n_tup_upd', 'pg_stat_user_tables.seq_scan', 'pg_stat_user_tables.seq_tup_read','pg_stat_user_tables.vacuum_count', 'pg_statio_user_indexes.idx_blks_hit', 'pg_statio_user_indexes.idx_blks_read', 'pg_statio_user_tables.heap_blks_hit', 'pg_statio_user_tables.heap_blks_read', 'pg_statio_user_tables.idx_blks_hit', 'pg_statio_user_tables.idx_blks_read', 'pg_statio_user_tables.tidx_blks_hit', 'pg_statio_user_tables.tidx_blks_read', 'pg_statio_user_tables.toast_blks_hit', 'pg_statio_user_tables.toast_blks_read', 'throughput_txn_per_sec']
38 ---------------------------------
39 newest_result_id
40 19
41 ---------------------------------
42 bad
43 True
44 ---------------------------------
45 config_recommend
46 {'global.backend_flush_after': 1522010, 'global.bgwriter_delay': 9149, 'global.bgwriter_flush_after': 471075, 'global.bgwriter_lru_maxpages': 430, 'global.bgwriter_lru_multiplier': 3.0098488567060078, 'global.checkpoint_completion_target':0.08955746130258635, 'global.checkpoint_flush_after': 455731, 'global.checkpoint_timeout': 19236398, 'global.commit_delay': 77545, 'global.commit_siblings': 464, 'global.deadlock_timeout': 1955746883, 'global.default_statistics_target': 3571, 'global.effective_cache_size': 6287603076, 'global.effective_io_concurrency': 617, 'global.from_collapse_limit': 835373562, 'global.join_collapse_limit': 2113163502, 'global.maintenance_work_mem': 956493954, 'global.max_parallel_workers_per_gather': 774, 'global.max_wal_size': 11811263118911574, 'global.max_worker_processes': 179262, 'global.min_parallel_relation_size': 3563800869080, 'global.min_wal_size': 18601145769953536, 'global.random_page_cost': 9.19520558840551e+307, 'global.seq_page_cost': 1.38643328389727e+308, 'global.shared_buffers': 181678, 'global.temp_buffers': 864188196, 'global.wal_buffers': 431497396, 'global.wal_sync_method': 1, 'global.wal_writer_delay': 1250, 'global.wal_writer_flush_after': 7482882703966, 'global.work_mem': 5870930202}
View Code

 target_data是一个dict,包含以下几个字段:X_matrix(knobs数据集,数据来源包括所有之前的workload【?】+当前的target workload,整体作为训练集)、y_matrix(metrics数据集,数据来源包括所有之前的workload【?】+当前的target workload,整体作为训练集)、rowlabels(每个sample的序号,这里有25个sample,那就是1排到25。其实没啥意义。。。)、X_columnlabels(每个knobs的名称,这里有31个)、y_columnlabels(每个metrics的名称,这里有61个)、newest_result_id(其中最新的,也就是client刚提交的target workload的序号。一般是最大的那个)、bad(True表示当前训练数据太少,是randomly generated的推荐参数。False表示是真的通过ML model推荐出来的)、mapped_workload(mapped workload的序号)、score(我也不知道是个啥qwq...但是用不到啦)、config_recommend(只有bad==True时才出现。当map_workload阶段发现训练数据太少,并randomly generate一个推荐参数时,该字段用来存储生成的参数,configuration_recommendation函数就直接返回这个参数,不通过ML学习了。)。

以下是 X_matrix中knob数值的存储形式 和 postgresql.conf中保存的原始形式的对应表。可以看出基本是去掉单位后转换成一致的量纲即可:表格下载

 

 

 

因为OtterTune bug的原因(详见https://www.cnblogs.com/pdev/p/10903628.html),无法通过运行来调试这部分的代码。我通过阅读configuration_recommendation()函数的源代码整理出了大致的逻辑:

def configuration_recommendation(target_data[input=map_workload的返回结果]):
    X_workload, X_columnlabels = knob data of matched workload
    y_workload, y_columnlabels = metric data of matched workload
    newest_result, X_target, y_target = knob/metric data of target workload

    X_matrix = np.vstack([X_target, X_workload])
    X_scaled = transform(X_matrix)        # Scale to N(0,1)
    y_matrix = np.vstack([y_target, y_workload])
    y_scaled = transform(y_matrix)

    X_samples = randomly generated knobs #随机采样
    X_min, X_max = min/max value constrains on knobs

    model = GPRGD()
    model.fit(X_scaled, y_scaled)
    res = model.predict(X_samples)
    ans = res.find_the_one_with_best_metrics
    return(ans.knobs)

这里省略了对knob进行normalization的一些细节。可以看出:

  model的训练数据来自match到的workload待优化的目标workload
  训练好的model将会在随机采样生成的knobs上预测该配置下metrics的数值,并从中选出最优的,返回它的knob。

 

 

 2. analysis

该目录下的文件主要用于实现Machine Learning model

 

ottertune\server\analysis\
 │ base.py
 │ cluster.py
 │ constraints.py
 │ factor_analysis.py
 │ gp.py      class GPRNP, numpy version of Gaussian Process Regression. map_workload中预测未知workload/knob上的metric value用
 │ gp_tf.py       class GPRGD, tensorflow version of Gaussian Process Regression. configuration_recommendation中推荐configuration用
 │ lasso.py
 │ preprocessing.py
 │ util.py
 │ 
 └─tests
    │ test_cluster.py
    │ test_constraints.py
    │ test_gpr.py
    └─test_preprocessing.py

 

posted on 2019-05-31 07:11  Pentium.Labs  阅读(2375)  评论(0编辑  收藏  举报



Pentium.Lab Since 1998