This post follows the TPC-H benchmark guide on the StarRocks official website.

Preparation

Hardware

Item      Details
Machines  3 Huawei Cloud servers
CPU       16 cores
Memory    64 GB
Network   1 Gbit/s
Disk      200 GB high-efficiency cloud disk

Software

Kernel version: Linux 3.10.0-1160.59.1.el7.x86_64

OS version: CentOS Linux release 7.9.2009 (Core)

StarRocks version: 2.5.1

  • Note: 1 FE + 3 BE (co-located: the FE runs on one of the three BE machines)

Deploy StarRocks

The detailed deployment steps are omitted here.

Architecture

1 FE + 3 BE

Run the DDL

Execute the SQL file sql/tpch/ddl_100/tpch_create.sql to create the TPC-H tables.
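The DDL step can also be scripted. This is a minimal sketch that assumes the statements are sent one by one through a MySQL-protocol client connected to the FE query port (9030 by default). Piping the file into the mysql CLI works as well. The helper names split_sql_statements and load_ddl are hypothetical. The splitter is naive: it drops whole-line "--" comments and assumes no semicolon appears inside a string literal or block comment.

```python
from pathlib import Path


def split_sql_statements(text: str) -> list[str]:
    """Naively split a SQL script into individual statements.

    Hypothetical helper: whole-line '--' comments are dropped, then the text
    is split on ';'. Semicolons inside literals or /* */ comments are NOT handled.
    """
    lines = [ln for ln in text.splitlines() if not ln.strip().startswith("--")]
    body = "\n".join(lines)
    return [stmt.strip() for stmt in body.split(";") if stmt.strip()]


def load_ddl(path: str) -> list[str]:
    """Read a DDL file (e.g. sql/tpch/ddl_100/tpch_create.sql) and split it."""
    return split_sql_statements(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    sample = """
    -- create two small tables
    CREATE TABLE region (r_regionkey INT, r_name VARCHAR(25))
    DISTRIBUTED BY HASH(r_regionkey) BUCKETS 1;
    CREATE TABLE nation (n_nationkey INT) DISTRIBUTED BY HASH(n_nationkey) BUCKETS 1;
    """
    # Print the first line of each statement that would be sent to the FE.
    for stmt in split_sql_statements(sample):
        print(stmt.splitlines()[0])
```

Each returned statement would then be executed in order with whatever client is at hand. Keeping the splitter free of any database driver keeps the sketch dependency-free.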

Test data

Download the StarRocks TPC-H benchmark toolkit from the StarRocks website:

wget https://starrocks-public.oss-cn-zhangjiakou.aliyuncs.com/tpch-poc-0.1.2.zip
unzip tpch-poc-0.1.2.zip
cd tpch-poc-0.1.2
cd benchmark 

Generate data

  • This is just a trial run, so I generated 10 GB of data (scale factor 10) first.

[root@ecs-0003 benchmark]# ./bin/gen_data/gen-tpch.sh 10 data_100
[INFO] gen 10GB data under /root/tpch-poc-0.1.2/benchmark/data_100
[INFO] generate data...
[INFO] gen data of table: customer
TPC-H Population Generator (Version 2.14.0)
Copyright Transaction Processing Performance Council 1994 - 2010
[INFO] gen data of table: lineitem
[INFO] gen <1>th part data of table: lineitem
TPC-H Population Generator (Version 2.14.0)
Copyright Transaction Processing Performance Council 1994 - 2010
[INFO] gen <2>th part data of table: lineitem
TPC-H Population Generator (Version 2.14.0)
Copyright Transaction Processing Performance Council 1994 - 2010
[INFO] gen data of table: nation
TPC-H Population Generator (Version 2.14.0)
Copyright Transaction Processing Performance Council 1994 - 2010
[INFO] gen data of table: orders
TPC-H Population Generator (Version 2.14.0)
Copyright Transaction Processing Performance Council 1994 - 2010
[INFO] gen data of table: parts
TPC-H Population Generator (Version 2.14.0)
Copyright Transaction Processing Performance Council 1994 - 2010
[INFO] gen data of table: partsupp
TPC-H Population Generator (Version 2.14.0)
Copyright Transaction Processing Performance Council 1994 - 2010
[INFO] gen data of table: region
TPC-H Population Generator (Version 2.14.0)
Copyright Transaction Processing Performance Council 1994 - 2010
[INFO] gen data of table: suppliers
TPC-H Population Generator (Version 2.14.0)
Copyright Transaction Processing Performance Council 1994 - 2010
[INFO] refine the data in /root/tpch-poc-0.1.2/benchmark/data_100
[INFO] sed file:/root/tpch-poc-0.1.2/benchmark/data_100/customer.tbl
[INFO] sed file:/root/tpch-poc-0.1.2/benchmark/data_100/lineitem.tbl.1
[INFO] sed file:/root/tpch-poc-0.1.2/benchmark/data_100/lineitem.tbl.2
[INFO] sed file:/root/tpch-poc-0.1.2/benchmark/data_100/nation.tbl
[INFO] sed file:/root/tpch-poc-0.1.2/benchmark/data_100/orders.tbl
[INFO] sed file:/root/tpch-poc-0.1.2/benchmark/data_100/partsupp.tbl
[INFO] sed file:/root/tpch-poc-0.1.2/benchmark/data_100/part.tbl
[INFO] sed file:/root/tpch-poc-0.1.2/benchmark/data_100/region.tbl
[INFO] sed file:/root/tpch-poc-0.1.2/benchmark/data_100/supplier.tbl
233M	/root/tpch-poc-0.1.2/benchmark/data_100/customer.tbl
3.6G	/root/tpch-poc-0.1.2/benchmark/data_100/lineitem.tbl.1
3.6G	/root/tpch-poc-0.1.2/benchmark/data_100/lineitem.tbl.2
4.0K	/root/tpch-poc-0.1.2/benchmark/data_100/nation.tbl
1.7G	/root/tpch-poc-0.1.2/benchmark/data_100/orders.tbl
1.2G	/root/tpch-poc-0.1.2/benchmark/data_100/partsupp.tbl
231M	/root/tpch-poc-0.1.2/benchmark/data_100/part.tbl
4.0K	/root/tpch-poc-0.1.2/benchmark/data_100/region.tbl
14M	/root/tpch-poc-0.1.2/benchmark/data_100/supplier.tbl
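As a sanity check on the scale factor, the du sizes printed above can be added up. The sketch assumes GNU du -h output, which uses 1024-based units and rounds each figure, so the total is only approximate. The sizes come to roughly 10.6 GiB of raw .tbl files, which is in line with scale factor 10.

```python
# Sizes as printed by du -h at the end of gen-tpch.sh (directory prefix dropped).
DU_OUTPUT = """\
233M  customer.tbl
3.6G  lineitem.tbl.1
3.6G  lineitem.tbl.2
4.0K  nation.tbl
1.7G  orders.tbl
1.2G  partsupp.tbl
231M  part.tbl
4.0K  region.tbl
14M   supplier.tbl
"""

# du -h uses binary (1024-based) units.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(size: str) -> float:
    """Convert a du -h size such as '3.6G' into bytes."""
    return float(size[:-1]) * UNITS[size[-1]]


def total_bytes(du_text: str) -> float:
    """Sum the first column of du -h style output."""
    return sum(parse_size(line.split()[0]) for line in du_text.splitlines() if line.strip())


if __name__ == "__main__":
    print(f"total: {total_bytes(DU_OUTPUT) / 1024**3:.2f} GiB")  # total: 10.57 GiB
```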

Load data

[root@ecs-0003 benchmark]# python3 src/db_table_operation.py stream_load data_100
[INFO] 2023-02-16 14:54:16 db_table_operation.py[101] stream load from dir:/root/tpch-poc-0.1.2/benchmark/data_100
[INFO] 2023-02-16 14:54:16 config_util.py[43] concurrency load number for table: customer is not set, use 1 by default.
[INFO] 2023-02-16 14:54:16 db_table_operation.py[25] stream load start. table: customer, path: /root/tpch-poc-0.1.2/benchmark/data_100/customer.tbl
[INFO] 2023-02-16 14:54:16 db_table_operation.py[27] stream load command: {curl --location-trusted -u root:123456 -T /root/tpch-poc-0.1.2/benchmark/data_100/customer.tbl -H "column_separator:|"  -H "columns:C_CUSTKEY,C_NAME,C_ADDRESS,C_NATIONKEY,C_PHONE,C_ACCTBAL,C_MKTSEGMENT,C_COMMENT"  http://10.201.0.198:18030/api/tpch/customer/_stream_load}
[INFO] 2023-02-16 14:54:18 db_table_operation.py[41] stream load success. table: customer, path: /root/tpch-poc-0.1.2/benchmark/data_100/customer.tbl
[INFO] 2023-02-16 14:54:18 config_util.py[41] concurrency load number for table: lineitem is 10.
[INFO] 2023-02-16 14:54:18 db_table_operation.py[25] stream load start. table: lineitem, path: /root/tpch-poc-0.1.2/benchmark/data_100/lineitem.tbl.1
[INFO] 2023-02-16 14:54:18 db_table_operation.py[27] stream load command: {curl --location-trusted -u root:123456 -T /root/tpch-poc-0.1.2/benchmark/data_100/lineitem.tbl.1 -H "column_separator:|"  -H "columns:l_orderkey,l_partkey,l_suppkey,l_linenumber,l_quantity,l_extendedprice,l_discount,l_tax,l_returnflag,l_linestatus,l_shipdate,l_commitdate,l_receiptdate,l_shipinstruct,l_shipmode,l_comment"  http://10.201.0.198:18030/api/tpch/lineitem/_stream_load}
[INFO] 2023-02-16 14:54:18 db_table_operation.py[25] stream load start. table: lineitem, path: /root/tpch-poc-0.1.2/benchmark/data_100/lineitem.tbl.2
[INFO] 2023-02-16 14:54:18 db_table_operation.py[27] stream load command: {curl --location-trusted -u root:123456 -T /root/tpch-poc-0.1.2/benchmark/data_100/lineitem.tbl.2 -H "column_separator:|"  -H "columns:l_orderkey,l_partkey,l_suppkey,l_linenumber,l_quantity,l_extendedprice,l_discount,l_tax,l_returnflag,l_linestatus,l_shipdate,l_commitdate,l_receiptdate,l_shipinstruct,l_shipmode,l_comment"  http://10.201.0.198:18030/api/tpch/lineitem/_stream_load}
[INFO] 2023-02-16 14:54:52 db_table_operation.py[41] stream load success. table: lineitem, path: /root/tpch-poc-0.1.2/benchmark/data_100/lineitem.tbl.1
[INFO] 2023-02-16 14:54:53 db_table_operation.py[41] stream load success. table: lineitem, path: /root/tpch-poc-0.1.2/benchmark/data_100/lineitem.tbl.2
[INFO] 2023-02-16 14:54:53 config_util.py[41] concurrency load number for table: orders is 5.
[INFO] 2023-02-16 14:54:53 db_table_operation.py[25] stream load start. table: orders, path: /root/tpch-poc-0.1.2/benchmark/data_100/orders.tbl
[INFO] 2023-02-16 14:54:53 db_table_operation.py[27] stream load command: {curl --location-trusted -u root:123456 -T /root/tpch-poc-0.1.2/benchmark/data_100/orders.tbl -H "column_separator:|"  -H "columns:o_orderkey,o_custkey,o_orderstatus,o_totalprice,o_orderdate,o_orderpriority,o_clerk,o_shippriority,o_comment"  http://10.201.0.198:18030/api/tpch/orders/_stream_load}
[INFO] 2023-02-16 14:55:07 db_table_operation.py[41] stream load success. table: orders, path: /root/tpch-poc-0.1.2/benchmark/data_100/orders.tbl
[INFO] 2023-02-16 14:55:07 config_util.py[43] concurrency load number for table: part is not set, use 1 by default.
[INFO] 2023-02-16 14:55:07 db_table_operation.py[25] stream load start. table: part, path: /root/tpch-poc-0.1.2/benchmark/data_100/part.tbl
[INFO] 2023-02-16 14:55:07 db_table_operation.py[27] stream load command: {curl --location-trusted -u root:123456 -T /root/tpch-poc-0.1.2/benchmark/data_100/part.tbl -H "column_separator:|"  -H "columns:p_partkey,p_name,p_mfgr,p_brand,p_type,p_size,p_container,p_retailprice,p_comment"  http://10.201.0.198:18030/api/tpch/part/_stream_load}
[INFO] 2023-02-16 14:55:09 db_table_operation.py[41] stream load success. table: part, path: /root/tpch-poc-0.1.2/benchmark/data_100/part.tbl
[INFO] 2023-02-16 14:55:09 config_util.py[43] concurrency load number for table: region is not set, use 1 by default.
[INFO] 2023-02-16 14:55:09 db_table_operation.py[25] stream load start. table: region, path: /root/tpch-poc-0.1.2/benchmark/data_100/region.tbl
[INFO] 2023-02-16 14:55:09 db_table_operation.py[27] stream load command: {curl --location-trusted -u root:123456 -T /root/tpch-poc-0.1.2/benchmark/data_100/region.tbl -H "column_separator:|"  http://10.201.0.198:18030/api/tpch/region/_stream_load}
[INFO] 2023-02-16 14:55:09 db_table_operation.py[41] stream load success. table: region, path: /root/tpch-poc-0.1.2/benchmark/data_100/region.tbl
[INFO] 2023-02-16 14:55:09 config_util.py[43] concurrency load number for table: nation is not set, use 1 by default.
[INFO] 2023-02-16 14:55:09 db_table_operation.py[25] stream load start. table: nation, path: /root/tpch-poc-0.1.2/benchmark/data_100/nation.tbl
[INFO] 2023-02-16 14:55:09 db_table_operation.py[27] stream load command: {curl --location-trusted -u root:123456 -T /root/tpch-poc-0.1.2/benchmark/data_100/nation.tbl -H "column_separator:|"  http://10.201.0.198:18030/api/tpch/nation/_stream_load}
[INFO] 2023-02-16 14:55:09 db_table_operation.py[41] stream load success. table: nation, path: /root/tpch-poc-0.1.2/benchmark/data_100/nation.tbl
[INFO] 2023-02-16 14:55:09 config_util.py[43] concurrency load number for table: partsupp is not set, use 1 by default.
[INFO] 2023-02-16 14:55:09 db_table_operation.py[25] stream load start. table: partsupp, path: /root/tpch-poc-0.1.2/benchmark/data_100/partsupp.tbl
[INFO] 2023-02-16 14:55:09 db_table_operation.py[27] stream load command: {curl --location-trusted -u root:123456 -T /root/tpch-poc-0.1.2/benchmark/data_100/partsupp.tbl -H "column_separator:|"  -H "columns:ps_partkey,ps_suppkey,ps_availqty,ps_supplycost,ps_comment"  http://10.201.0.198:18030/api/tpch/partsupp/_stream_load}
[INFO] 2023-02-16 14:55:16 db_table_operation.py[41] stream load success. table: partsupp, path: /root/tpch-poc-0.1.2/benchmark/data_100/partsupp.tbl
[INFO] 2023-02-16 14:55:16 config_util.py[43] concurrency load number for table: supplier is not set, use 1 by default.
[INFO] 2023-02-16 14:55:16 db_table_operation.py[25] stream load start. table: supplier, path: /root/tpch-poc-0.1.2/benchmark/data_100/supplier.tbl
[INFO] 2023-02-16 14:55:16 db_table_operation.py[27] stream load command: {curl --location-trusted -u root:123456 -T /root/tpch-poc-0.1.2/benchmark/data_100/supplier.tbl -H "column_separator:|"  -H "columns:S_SUPPKEY,S_NAME,S_ADDRESS,S_NATIONKEY,S_PHONE,S_ACCTBAL,S_COMMENT"  http://10.201.0.198:18030/api/tpch/supplier/_stream_load}
[INFO] 2023-02-16 14:55:16 db_table_operation.py[41] stream load success. table: supplier, path: /root/tpch-poc-0.1.2/benchmark/data_100/supplier.tbl
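Each "stream load command" line above shows the curl call used for one file. The hypothetical helper below rebuilds the same argv in Python, which is handy for reloading a single table by hand (for example via subprocess.run). It is not the toolkit's own db_table_operation.py code. Following the log, it assumes that 18030 is this cluster's FE HTTP port; the StarRocks default is 8030. The FE answers the upload with a redirect to a BE, and --location-trusted makes curl resend the credentials after that redirect.

```python
import shlex
from typing import Optional, Sequence


def build_stream_load_cmd(host: str, http_port: int, db: str, table: str, path: str,
                          user: str, password: str,
                          columns: Optional[Sequence[str]] = None,
                          column_separator: str = "|") -> list[str]:
    """Return a curl argv for a StarRocks Stream Load of one file (hypothetical helper)."""
    cmd = [
        "curl", "--location-trusted",         # follow FE -> BE redirect, resending credentials
        "-u", f"{user}:{password}",
        "-T", path,                            # upload the local file as the request body
        "-H", f"column_separator:{column_separator}",
    ]
    if columns:                                # optional explicit column list, as in the log
        cmd += ["-H", "columns:" + ",".join(columns)]
    cmd.append(f"http://{host}:{http_port}/api/{db}/{table}/_stream_load")
    return cmd


if __name__ == "__main__":
    argv = build_stream_load_cmd("10.201.0.198", 18030, "tpch", "region",
                                 "data_100/region.tbl", "root", "123456")
    # Print a shell-quoted version; subprocess.run(argv, check=True) would submit the load.
    print(shlex.join(argv))
```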

Check the load results
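One way to check the load is to run SELECT COUNT(*) on each table and compare the results with the cardinalities defined by the TPC-H specification. This is a minimal sketch that assumes the counts have already been fetched into a dict. The helper names are hypothetical.

Most tables scale exactly with the scale factor, and nation and region are fixed. lineitem is only approximately 6,000,000 rows per unit of SF, because each order gets between 1 and 7 line items, so lineitem is compared within a tolerance.

```python
# TPC-H cardinalities per unit of scale factor (SF); nation/region are fixed.
ROWS_PER_SF = {
    "supplier": 10_000,
    "part": 200_000,
    "partsupp": 800_000,
    "customer": 150_000,
    "orders": 1_500_000,
    "lineitem": 6_000_000,  # approximate: 1 to 7 line items per order
}
FIXED_ROWS = {"nation": 25, "region": 5}


def expected_rows(sf: int) -> dict[str, int]:
    """Expected row count of every TPC-H table at scale factor sf."""
    rows = {table: n * sf for table, n in ROWS_PER_SF.items()}
    rows.update(FIXED_ROWS)
    return rows


def mismatched_tables(actual: dict[str, int], sf: int, tolerance: float = 0.01) -> list[str]:
    """Return (sorted) tables whose COUNT(*) disagrees with the expectation.

    lineitem is accepted within a relative tolerance; other tables must match exactly.
    Tables missing from `actual` are reported as mismatches.
    """
    bad = []
    for table, want in expected_rows(sf).items():
        got = actual.get(table)
        if got is None:
            bad.append(table)
        elif table == "lineitem":
            if abs(got - want) > tolerance * want:
                bad.append(table)
        elif got != want:
            bad.append(table)
    return sorted(bad)
```

For the 10 GB run, mismatched_tables(counts, 10) should return an empty list if every table was loaded completely.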

Run the benchmark

10 GB of data

[root@ecs-0003 benchmark]# python3 src/benchmark.py -p -d tpch
[INFO] 2023-02-16 15:48:09 benchmark.py[217] benchmark args:Namespace(check_result=False, dataset='tpch', log_quiet=False, log_verbose=False, performance=True, scale=100, sql_file='')
[INFO] 2023-02-16 15:48:09 benchmark.py[97] test sql in dirs:[tpch]
[INFO] 2023-02-16 15:48:09 starrocks_lib.py[266] get sql info from sql_dir:/root/tpch-poc-0.1.2/benchmark/sql/tpch/query/tpch
------ dataset: tpch, concurrency: 1 ------
sql\time(ms)\parallel_num	8
q1	378.0
q2	157.0
q3	95.0
q4	77.0
q5	186.0
q6	41.0
q7	192.0
q8	195.0
q9	336.0
q10	96.0
q11	81.0
q12	84.0
q13	208.0
q14	56.0
q15	89.0
q16	69.0
q17	84.0
q18	369.0
q19	77.0
q20	84.0
q21	342.0
q22	110.0

Total time: 3.4 s (the sum of the 22 query latencies above, 3406 ms)
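The 3.4 s figure is simply the sum of the 22 single-concurrency latencies. The short check below uses the numbers from the output above and also lists the slowest queries. The helper name summarize is hypothetical.

```python
# Per-query latencies (ms) from the benchmark output above, concurrency 1.
LATENCY_MS = {
    "q1": 378.0, "q2": 157.0, "q3": 95.0, "q4": 77.0, "q5": 186.0, "q6": 41.0,
    "q7": 192.0, "q8": 195.0, "q9": 336.0, "q10": 96.0, "q11": 81.0, "q12": 84.0,
    "q13": 208.0, "q14": 56.0, "q15": 89.0, "q16": 69.0, "q17": 84.0, "q18": 369.0,
    "q19": 77.0, "q20": 84.0, "q21": 342.0, "q22": 110.0,
}


def summarize(latency_ms: dict[str, float], top: int = 3) -> tuple[float, list[str]]:
    """Return the total latency and the names of the `top` slowest queries."""
    total = sum(latency_ms.values())
    slowest = sorted(latency_ms, key=latency_ms.get, reverse=True)[:top]
    return total, slowest


if __name__ == "__main__":
    total, slowest = summarize(LATENCY_MS)
    print(f"total: {total:.0f} ms (~{total / 1000:.1f} s), slowest: {slowest}")
    # total: 3406 ms (~3.4 s), slowest: ['q1', 'q18', 'q21']
```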

  • I also ran the TPC-H queries against Hive external tables: they took 24 s in total, which is still far faster than native Hive.


Posted on 2023-02-20 15:55 by Flink菜鸟