clickhouse 19.14.m.n简单测试(更新到25.3)

  ClickHouse is a column-oriented database management system (DBMS) for online analytical processing of queries (OLAP). 是Yandex开发的。ClickHouse的优势是使用了面向列的DBMS,比MySQL等面向行的数据库有单表统计查询优势,同时又比同样属于面向列DBMS的更有效的节省空间(不适用固定大小存储数据),使用数据压缩后,可以传输更多,延迟小等。如下:

tpch和即席查询

虽说clickhouse支持多表,但是官网宣传的通常是最强项。所以clickhouse的多表和ELT能力可以忽略不计,最多算是能力补上而已。

在clickhouse自身的说明https://clickhouse.com/docs/zh/academic_overview#6-2-benchmarks中,TPCH查询一共22个,有11个不支持,在其余的11个查询中,5个(6个)查询在ClickHouse中执行得更快(Snowfake)。所以clickhouse并不是适合作为tpch和即席查询,但是作为单表统计是可以的。

 

官网:https://clickhouse.yandex/

  安装比较简单,可以使用yum安装,也可以下载rpm(从github https://github.com/ClickHouse/ClickHouse下载),然后离线安装(LZ使用的是离线安装)。

rpm -ivh clickhouse-common-static-22.3.12.19.x86_64.rpm
rpm -ivh clickhouse-client-22.3.12.19.x86_64.rpm
rpm -ivh clickhouse-server-22.3.12.19.x86_64.rpm

新版本会在安装期间提示密码,如下:

[root@hs193 ~]# rpm -ivh clickhouse-server-25.3.5.42.x86_64.rpm 
Preparing...                          ################################# [100%]
Updating / installing...
   1:clickhouse-server-0:25.3.5.42-1  ################################# [100%]
ClickHouse binary is already located at /usr/bin/clickhouse
Symlink /usr/bin/clickhouse-server already exists but it points to /clickhouse. Will replace the old symlink to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-server to /usr/bin/clickhouse.
Symlink /usr/bin/clickhouse-client already exists but it points to /clickhouse. Will replace the old symlink to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-client to /usr/bin/clickhouse.
Symlink /usr/bin/clickhouse-local already exists but it points to /clickhouse. Will replace the old symlink to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-local to /usr/bin/clickhouse.
Symlink /usr/bin/clickhouse-benchmark already exists but it points to /clickhouse. Will replace the old symlink to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-benchmark to /usr/bin/clickhouse.
Symlink /usr/bin/clickhouse-obfuscator already exists but it points to /clickhouse. Will replace the old symlink to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-obfuscator to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-git-import to /usr/bin/clickhouse.
Symlink /usr/bin/clickhouse-compressor already exists but it points to /clickhouse. Will replace the old symlink to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-compressor to /usr/bin/clickhouse.
Symlink /usr/bin/clickhouse-format already exists but it points to /clickhouse. Will replace the old symlink to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-format to /usr/bin/clickhouse.
Symlink /usr/bin/clickhouse-extract-from-config already exists but it points to /clickhouse. Will replace the old symlink to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-extract-from-config to /usr/bin/clickhouse.
Symlink /usr/bin/clickhouse-keeper already exists but it points to /clickhouse. Will replace the old symlink to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-keeper to /usr/bin/clickhouse.
Symlink /usr/bin/clickhouse-keeper-converter already exists but it points to /clickhouse. Will replace the old symlink to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-keeper-converter to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-disks to /usr/bin/clickhouse.
Symlink /usr/bin/ch already exists. Will keep it.
Symlink /usr/bin/chl already exists. Will keep it.
Symlink /usr/bin/chc already exists. Will keep it.
Creating clickhouse group if it does not exist.
 groupadd -r clickhouse
Creating clickhouse user if it does not exist.
 useradd -r --shell /bin/false --home-dir /nonexistent -g clickhouse clickhouse
Will set ulimits for clickhouse user in /etc/security/limits.d/clickhouse.conf.
Creating config directory /etc/clickhouse-server/config.d that is used for tweaks of main server configuration.
Creating config directory /etc/clickhouse-server/users.d that is used for tweaks of users configuration.
Config file /etc/clickhouse-server/config.xml already exists, will keep it and extract path info from it.
/etc/clickhouse-server/config.xml has /var/lib/clickhouse/ as data path.
/etc/clickhouse-server/config.xml has /var/log/clickhouse-server/ as log path.
Users config file /etc/clickhouse-server/users.xml already exists, will keep it and extract users info from it.
Creating log directory /var/log/clickhouse-server/.
Creating data directory /var/lib/clickhouse/.
Creating pid directory /var/run/clickhouse-server.
 chown -R clickhouse:clickhouse '/var/log/clickhouse-server/'
 chown -R clickhouse:clickhouse '/var/run/clickhouse-server'
 chown  clickhouse:clickhouse '/var/lib/clickhouse/'
Set up the password for the default user: 这里设置
Password for the default user is saved in file /etc/clickhouse-server/users.d/default-password.xml.
Setting capabilities for clickhouse binary. This is optional.
Cannot set 'net_admin' or 'ipc_lock' or 'sys_nice' or 'net_bind_service' capability for clickhouse binary. This is optional. Taskstats accounting will be disabled. To enable taskstats accounting you may add the required capability later manually.
 chown -R clickhouse:clickhouse '/etc/clickhouse-server'

ClickHouse has been successfully installed.

Start clickhouse-server with:
 sudo clickhouse start

Start clickhouse-client with:
 clickhouse-client --password

Created symlink from /etc/systemd/system/multi-user.target.wants/clickhouse-server.service to /usr/lib/systemd/system/clickhouse-server.service.

 

推荐使用22.10+版本(该版本增加了各种randXXX函数用于支持快速生成测试数据 https://blog.csdn.net/neweastsun/article/details/130140681)。

配置远程访问

vi /etc/clickhouse-server/config.xml

 设置默认用户default的密码

vi /etc/clickhouse-server/users.xml

启停

#启动命令
/etc/init.d/clickhouse-server start
#停止
/etc/init.d/clickhouse-server stop
#重启
/etc/init.d/clickhouse-server restart

客户端测试

[root@oracle19c-asm ~]# clickhouse-client --password '123456'
ClickHouse client version 22.3.10.22 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 22.3.10 revision 54455.

oracle19c-asm.localdomain :) 
oracle19c-asm.localdomain :) 
oracle19c-asm.localdomain :) show users;

SHOW USERS

Query id: 35784160-967d-46b2-bfa4-31e59ac4387f


0 rows in set. Elapsed: 0.004 sec. 

Received exception from server (version 22.3.10):
Code: 497. DB::Exception: Received from localhost:9000. DB::Exception: default: Not enough privileges. To execute this query it's necessary to have grant SHOW USERS ON *.*. (ACCESS_DENIED)

oracle19c-asm.localdomain :) select current_user;

SELECT current_user

Query id: f8f9e07d-a3ac-456c-b8b9-42b8bdb684e7


0 rows in set. Elapsed: 0.002 sec. 

Received exception from server (version 22.3.10):
Code: 47. DB::Exception: Received from localhost:9000. DB::Exception: Missing columns: 'current_user' while processing query: 'SELECT current_user', required columns: 'current_user'. (UNKNOWN_IDENTIFIER)

oracle19c-asm.localdomain :) select user();

SELECT user()

Query id: d28a8548-ce70-4aac-bcf3-d3effa6fa8b9

┌─currentUser()─┐
│ default       │
└───────────────┘

1 rows in set. Elapsed: 0.002 sec. 

oracle19c-asm.localdomain :) exit
Bye.

 造数据(模拟行情)

-- 创建表
CREATE TABLE market_data (
    symbol String,
    trade_date Date,
    open_time DateTime,
    open_price Float64,
    high_price Float64,
    low_price Float64,
    close_price Float64,
    volume UInt64,
    turnover Float64
) ENGINE = MergeTree()
ORDER BY (symbol, trade_date, open_time);

-- 使用generateRandom插入1亿条数据
INSERT INTO market_data
SELECT 
    concat('SYMBOL_', toString(rand() % 1000)) AS symbol,
    toDate('2020-01-01') + (rand() % 365) AS trade_date,
    toDateTime('2020-01-01 09:30:00') + (rand() % 23400) AS open_time,
    100 + (randNormal(0, 5)) AS open_price,
    open_price + abs(randNormal(0, 2)) AS high_price,
    open_price - abs(randNormal(0, 2)) AS low_price,
    open_price + randNormal(0, 1)) AS close_price,
    toUInt64(rand() % 1000000) AS volume,
    (rand() % 1000000) * close_price AS turnover
FROM generateRandom('dummy UInt8', 100000000, 10);

 

导入mysql中的表定义和数据

CREATE TABLE TA_TSHARECURRENTS_INNODB ENGINE = MergeTree order by c_cserialno AS
SELECT * FROM mysql('192.168.XX.XX:3306', 'ta_cs', 'TA_TSHARECURRENTS_INNODB', 'root', '123456');
其中该表有300多万条记录。速度还是很快的。
c_cserialno必须在mysql那边定义为not null,而不管数据是否本身有为null,否则执行时会报"DB::Exception: Sorting key cannot contain nullable columns"。

DBeaver客户端访问

数据类型

clickhouse支持的数据类型和一般的RDBMS以及列式数据库不同,这个比较坑爹。

UInt8, UInt16, UInt32, UInt64, Int8, Int16, Int32, Int64
Float32, Float64
Boolean(不存在,使用 UInt8的0和1代替)
String(任意长度,代替VARCHAR, BLOB, CLOB,或者其他DBMS来的字符数据类型)
FixedString(N) (N bytes长度的字符串)
Date (日期,例如:1970-01-01 )
DateTime (日期时间,例如:1970-01-01 00:00:00)
Enum8, Enum16 (Enum8、Enum16 定义为键值对 ‘String’ = Int8 和 ‘String’ = Int16,例如:x Enum8(‘hello’ = 1, ‘world’ = 2))
Array(T) (数组,复合数据类型,例如:select [1, 2] AS x, toTypeName(x) ;)
Tuple(T1, T2, …) (元组,复合数据类型,例如:SELECT tuple(1,’a’) AS x, toTypeName(x) ;)
Nullable(TypeName) (没有值的时候填充null,TypeName不可复合数据类型,例如:CREATE TABLE t_null(x Int8,y Nullable(Int8)))

SQL语法以及内置函数

除此之外,clickhouse很多的内置函数也都不同于主流的RDBMS,sql语法差异也比较大,在LZ测试的比较复杂的SQL中,mariadb columnstore/greenplum稍微调整下就能支持,到了clickhouse,半天没改出来,报错信息也各种迷糊,虽然文档比较全了,但是感觉到真正广泛使用还是友好性太差。

clickhouse的优势

分页查询很快,这是其相比其它列式数据库如columnstore/greenplum最大的优势。

-- 500毫秒
select * from TA_TCUSTOMERINFO_INNODB x order by x.c_fundacco limit 100;
-- 1秒内
select * from TA_TCUSTOMERINFO_INNODB x order by x.c_fundacco limit 100000,100;
--- 7.5秒
select * from TA_TCUSTOMERINFO_INNODB x order by x.c_fundacco limit 2000000,100;

性能监控(等待事件)

现在,clickhouse也支持等待事件。可以参考https://dev.to/shiviyer/tips-and-tricks-for-troubleshooting-clickhouse-wait-events-lc1,https://chistadata.com/how-locks-and-wait-events-are-implemented-in-clickhouse/

常见FAQ

Q1:无法在本地连接ClickHouse,报错:Connecton reject。 A1:clickhouse默认不放开远程访问,可以通过修改配置文件的方式放开。
修改服务器的配置文件/etc/clickhouse-server/config.xml,第65行,放开注释即可。
修改之后,重启服务:
service clickhouse-server stop
service clickhouse-server start

Q2:修改metrika.xml里面的shard或者replica参数之后,数据库无法启动;
A2:如果之前已经建了Replica表,那么必须在修改上述两个参数之前把已经创建好的Replica表删掉,才能去修改,否则会无法启动。报错日志放在/var/log/clickhouse-server/下。 

 clickhouse postgresql_fdw的性能测试如下:

https://www.percona.com/blog/2019/05/09/improving-olap-workload-performance-for-postgresql-with-clickhouse-database/

https://www.percona.com/blog/2019/05/01/benchmark-clickhouse-database-and-clickhousedb_fdw/

参考:

https://blog.csdn.net/yangwei234/article/details/124111943 clickhouse入门

clickhouse可视化运维工具ckman https://blog.csdn.net/u010280075/article/details/135716052 

posted @ 2019-12-15 11:36  zhjh256  阅读(1155)  评论(0)    收藏  举报