ClickHouse sharded deployment (3 shards, single replica, configured via metrika.xml)
Environment
OS: CentOS 7
ClickHouse: 24.8.11.5
Topology: 3 shards, single replica
Node IP          Role
192.168.1.102    shard 1
192.168.1.105    shard 2, single-node ZooKeeper
192.168.1.106    shard 3
1. Install ZooKeeper (standalone)
See the following deployment guide:
https://www.cnblogs.com/hxlasky/p/18683624
#####################ClickHouse deployment (install on every node)############################
1. Prepare the installation packages
Package download locations:
https://packages.clickhouse.com/rpm/stable/
or
https://packagecloud.io/Altinity/clickhouse
I downloaded the following three packages:
clickhouse-client-24.8.11.5.x86_64.rpm
clickhouse-common-static-24.8.11.5.x86_64.rpm
clickhouse-server-24.8.11.5.x86_64.rpm
[root@node1 soft]# mkdir /soft/clickhouse
[root@node1 soft]# cd /soft/clickhouse
wget https://packages.clickhouse.com/rpm/stable/clickhouse-client-24.8.11.5.x86_64.rpm
wget https://packages.clickhouse.com/rpm/stable/clickhouse-common-static-24.8.11.5.x86_64.rpm
wget https://packages.clickhouse.com/rpm/stable/clickhouse-server-24.8.11.5.x86_64.rpm
2. Install (install clickhouse-common-static first; the client and server packages depend on it)
[root@localhost clickhouse]# rpm -ivh clickhouse-common-static-24.8.11.5.x86_64.rpm
[root@localhost clickhouse]# rpm -ivh clickhouse-client-24.8.11.5.x86_64.rpm
[root@localhost clickhouse]# rpm -ivh clickhouse-server-24.8.11.5.x86_64.rpm
3. Change the data and log directories
Create the data and log directories:
[root@localhost ~]# mkdir -p /home/middle/clickhouse/data
[root@localhost ~]# mkdir -p /home/middle/clickhouse/log
[root@localhost ~]# chown -R clickhouse:clickhouse /home/middle/clickhouse/
Change the data directory:
vi /etc/clickhouse-server/config.xml
Before:
<!-- Path to data directory, with trailing slash. -->
<path>/var/lib/clickhouse/</path>
After:
<!-- Path to data directory, with trailing slash. -->
<path>/home/middle/clickhouse/data/</path>
Change the log directory.
Before:
<level>trace</level>
<log>/var/log/clickhouse-server/clickhouse-server.log</log>
<errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog>
After:
<level>trace</level>
<log>/home/middle/clickhouse/log/clickhouse-server.log</log>
<errorlog>/home/middle/clickhouse/log/clickhouse-server.err.log</errorlog>
4. Enable remote connections
[root@localhost clickhouse]# vi /etc/clickhouse-server/config.xml
Uncomment the two listen_host lines below (they are commented out by default) and change 127.0.0.1 to 0.0.0.0 so the server accepts remote connections.
Before:
<!-- Default values - try listen localhost on IPv4 and IPv6. -->
<!--
<listen_host>::1</listen_host>
<listen_host>127.0.0.1</listen_host>
-->
After:
<!-- Default values - try listen localhost on IPv4 and IPv6. -->
<listen_host>::1</listen_host>
<listen_host>0.0.0.0</listen_host>
5. Allow dropping tables and partitions of any size: uncomment the following two lines so they take effect
[root@localhost clickhouse]# vi /etc/clickhouse-server/config.xml
Before:
<!-- <max_table_size_to_drop>0</max_table_size_to_drop> -->
<!-- <max_partition_size_to_drop>0</max_partition_size_to_drop> -->
After:
<max_table_size_to_drop>0</max_table_size_to_drop>
<max_partition_size_to_drop>0</max_partition_size_to_drop>
6. Copy the config file to the other nodes
scp /etc/clickhouse-server/config.xml root@192.168.1.105:/etc/clickhouse-server/
scp /etc/clickhouse-server/config.xml root@192.168.1.106:/etc/clickhouse-server/
7. Start clickhouse-server
Try starting it on every node.
On CentOS 7 it must be managed with systemctl:
systemctl status clickhouse-server.service
systemctl start clickhouse-server.service
systemctl stop clickhouse-server.service
ClickHouse uses port 9000 by default; take care to avoid port conflicts:
<tcp_port>9000</tcp_port>
8. Try logging in
Try logging in on every node; at this point no password is required:
clickhouse-client --host localhost
#####################Cluster deployment##########################
1. Edit config.xml and remove the default cluster's predefined shard
Find the <remote_servers> section of the config file and edit the configuration inside <default>.
Comment out the existing <shard> block there (or simply delete it); the goal is to define the default cluster ourselves, so that tables can be created directly with on cluster default.

2. Copy the config file to the other nodes
scp /etc/clickhouse-server/config.xml root@192.168.1.105:/etc/clickhouse-server/
scp /etc/clickhouse-server/config.xml root@192.168.1.106:/etc/clickhouse-server/
3. Prepare the cluster config file (/etc/clickhouse-server/config.d/metrika.xml)
vi /etc/clickhouse-server/config.d/metrika.xml
<yandex>
    <remote_servers>
        <default>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>192.168.1.102</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>192.168.1.105</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>192.168.1.106</host>
                    <port>9000</port>
                </replica>
            </shard>
        </default>
    </remote_servers>
    <zookeeper>
        <node index="1">
            <host>192.168.1.105</host>
            <port>2181</port>
        </node>
    </zookeeper>
    <macros>
        <shard>01</shard>
        <replica>192.168.1.102</replica>
    </macros>
    <networks>
        <ip>::/0</ip>
    </networks>
    <clickhouse_compression>
        <case>
            <min_part_size>10000000000</min_part_size>
            <min_part_size_ratio>0.01</min_part_size_ratio>
            <method>lz4</method>
        </case>
    </clickhouse_compression>
</yandex>
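Before distributing this file, the shard layout can be sanity-checked by parsing it. Below is a minimal sketch; `shard_hosts` and the inline copy of the XML are illustrative helpers, not part of ClickHouse:

```python
import xml.etree.ElementTree as ET

# Inline copy of the cluster layout from metrika.xml (abridged to the
# <remote_servers> section, which is all this check needs).
METRIKA = """
<yandex>
  <remote_servers>
    <default>
      <shard><internal_replication>true</internal_replication>
        <replica><host>192.168.1.102</host><port>9000</port></replica></shard>
      <shard><internal_replication>true</internal_replication>
        <replica><host>192.168.1.105</host><port>9000</port></replica></shard>
      <shard><internal_replication>true</internal_replication>
        <replica><host>192.168.1.106</host><port>9000</port></replica></shard>
    </default>
  </remote_servers>
</yandex>
"""

def shard_hosts(xml_text, cluster="default"):
    """Return, per shard, the list of (host, port) replicas for a cluster."""
    root = ET.fromstring(xml_text)
    return [
        [(r.findtext("host"), int(r.findtext("port"))) for r in shard.findall("replica")]
        for shard in root.findall(f"remote_servers/{cluster}/shard")
    ]

layout = shard_hosts(METRIKA)
print(len(layout))  # 3 shards, one replica each
```

In practice you would read the real file from /etc/clickhouse-server/config.d/metrika.xml and verify it lists the expected three shards before copying it to the other nodes.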
Notes:
A. Adjust the IP addresses and macro values to match your actual shard layout.
B. I named the cluster default here; you can of course choose a different name.
4. Copy the config file to the other nodes
scp /etc/clickhouse-server/config.d/metrika.xml root@192.168.1.105:/etc/clickhouse-server/config.d/
scp /etc/clickhouse-server/config.d/metrika.xml root@192.168.1.106:/etc/clickhouse-server/config.d/
After copying the file, the <macros></macros> section must be modified on each node:
Shard 1:
<macros>
<shard>01</shard>
<replica>192.168.1.102</replica>
</macros>
Shard 2:
<macros>
<shard>02</shard>
<replica>192.168.1.105</replica>
</macros>
Shard 3:
<macros>
<shard>03</shard>
<replica>192.168.1.106</replica>
</macros>
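Since the three stanzas above differ only in their values, they can also be rendered from the topology instead of hand-edited on each node. A small sketch; `macros_block` is a hypothetical helper, not part of any ClickHouse tooling:

```python
# The per-node <macros> stanzas follow a fixed pattern, so render them
# from the (shard id, replica host) topology of this cluster.
NODES = [("01", "192.168.1.102"), ("02", "192.168.1.105"), ("03", "192.168.1.106")]

def macros_block(shard: str, replica: str) -> str:
    """Render one node's <macros> stanza as it appears in metrika.xml."""
    return (
        "<macros>\n"
        f"    <shard>{shard}</shard>\n"
        f"    <replica>{replica}</replica>\n"
        "</macros>"
    )

for shard_id, host in NODES:
    print(macros_block(shard_id, host))
```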
5. Change the ownership of the files to clickhouse
chown -R clickhouse:clickhouse /etc/clickhouse-server/config.d
6. Restart every node
systemctl restart clickhouse-server.service
7. Verify
[root@master clickhouse-server]# clickhouse-client --host 192.168.1.102 --port=9000
select cluster,shard_num,shard_weight,internal_replication,replica_num,host_name,host_address,port,is_local from system.clusters;

Create a database
CREATE DATABASE db_test ON CLUSTER default;

ON CLUSTER default must be included here so the database is created on every shard; otherwise it is only created on the local shard.
Create a distributed table
The local table must be created first:
use db_test;
CREATE TABLE tb_test_local01 ON CLUSTER default
(
    id Int32,
    name String,
    timestamp DateTime DEFAULT now(),
    day Date DEFAULT now()
)
ENGINE = MergeTree
PARTITION BY day
ORDER BY id
SETTINGS index_granularity = 8192;
Then create the distributed table:
CREATE TABLE tb_test_all01 ON CLUSTER default
(
    id Int32,
    name String,
    timestamp DateTime DEFAULT now(),
    day Date DEFAULT now()
) ENGINE = Distributed(default, db_test, tb_test_local01, id);
Insert data
insert into tb_test_all01(id,name) values(1,'name1');
insert into tb_test_all01(id,name) values(2,'name2');
insert into tb_test_all01(id,name) values(3,'name3');
insert into tb_test_all01(id,name) values(4,'name4');
insert into tb_test_all01(id,name) values(5,'name5');
insert into tb_test_all01(id,name) values(6,'name6');
insert into tb_test_all01(id,name) values(7,'name7');
insert into tb_test_all01(id,name) values(8,'name8');
insert into tb_test_all01(id,name) values(9,'name9');
insert into tb_test_all01(id,name) values(10,'name10');
Query the distributed table

All of the data can be queried back. Our default account currently has no password, and no credentials are specified in the <shard></shard> sections. If a password is configured for the default account, each replica in the <shard></shard> sections must specify:
<user>default</user>
<password>plaintext password</password>
Adding and changing columns on distributed and local tables
See the following link:
https://www.cnblogs.com/hxlasky/p/18689278
8. Set a password for the default account
Edit the users.xml config file.
Generate the hashed password. My plaintext password here is clickhouse123; hashing it produces the following string:
[root@master clickhouse-server]# echo -n clickhouse123 | openssl dgst -sha256
(stdin)= 2cca9d8714615f4132390a3db9296d39ec051b3faff87be7ea5f7fe0e2de14c9
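The same digest can be produced without openssl, for example from Python's standard library (a minimal sketch hashing the same plaintext as above):

```python
import hashlib

def sha256_hex(plaintext: str) -> str:
    # users.xml's password_sha256_hex expects the hex-encoded SHA-256 of the password.
    return hashlib.sha256(plaintext.encode()).hexdigest()

print(sha256_hex("clickhouse123"))
```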
Edit the config file:
vi /etc/clickhouse-server/users.xml
Inside
<users><default></default></users>
add the following entry:
<password_sha256_hex>2cca9d8714615f4132390a3db9296d39ec051b3faff87be7ea5f7fe0e2de14c9</password_sha256_hex>
At the same time, comment out the original <password> element.

Also make sure access_management is set to 1; in older versions it defaults to 0.

Copy this node's users.xml to all the other nodes:
scp /etc/clickhouse-server/users.xml root@192.168.1.105:/etc/clickhouse-server/
scp /etc/clickhouse-server/users.xml root@192.168.1.106:/etc/clickhouse-server/
Restart every node:
systemctl restart clickhouse-server.service
From now on, a password is required to log in:
clickhouse-client --host 192.168.1.102 --port=9000 --password=clickhouse123
clickhouse-client --host 192.168.1.105 --port=9000 --password=clickhouse123
clickhouse-client --host 192.168.1.106 --port=9000 --password=clickhouse123
9. Update the /etc/clickhouse-server/config.d/metrika.xml config file
If the default account has been given a password and <user></user><password></password> have not been added under <shard></shard>, queries against distributed tables fail with an authentication error.

Edit the cluster config file:
vi /etc/clickhouse-server/config.d/metrika.xml
<shard>
    <internal_replication>true</internal_replication>
    <replica>
        <host>192.168.1.102</host>
        <port>9000</port>
        <user>default</user>
        <password>clickhouse123</password>
    </replica>
</shard>
This modification must be made on every node individually; the file cannot simply be copied with scp, because the <macros></macros> section differs per node.
Note:
A hashed password cannot be used here; the following configuration causes an error:
<password_sha256_hex>2cca9d8714615f4132390a3db9296d39ec051b3faff87be7ea5f7fe0e2de14c9</password_sha256_hex>
10. Restart the cluster
systemctl restart clickhouse-server.service
11. Create an account and grant privileges
This is done via config file: edit users.xml and add the following entry under the <users></users> section:
<hxl>
    <password_sha256_hex>2cca9d8714615f4132390a3db9296d39ec051b3faff87be7ea5f7fe0e2de14c9</password_sha256_hex>
    <access_management>0</access_management>
    <networks incl="networks" replace="replace">
        <ip>::/0</ip>
    </networks>
    <profile>default</profile>
    <quota>default</quota>
    <allow_databases>
        <database>db_hxl</database>
    </allow_databases>
</hxl>
The password_sha256_hex value here was generated with:
[root@master clickhouse-server]# echo -n clickhouse123 | openssl dgst -sha256
(stdin)= 2cca9d8714615f4132390a3db9296d39ec051b3faff87be7ea5f7fe0e2de14c9
A plaintext password can also be used:
<password>clickhouse</password>
12. Creating users in SQL mode
Note: accounts created on the command line are not reflected in the /etc/clickhouse-server/users.xml file.
Execute on one node only.
Log in with the default account to create the new account; note that every statement must include on cluster default:
clickhouse-client -h 192.168.1.102 -m -u default --password clickhouse123
create user hxlsql on cluster default identified with plaintext_password by 'hxlsql123';
grant on cluster default all on db_hxl.* to hxlsql;
grant on cluster default cluster on *.* to hxlsql;
grant on cluster default remote on *.* to hxlsql;
Log in with the new account:
clickhouse-client -h 192.168.1.102 -m -u hxlsql --password hxlsql123
use db_hxl;
CREATE TABLE tb_test_local04 ON CLUSTER default
(
    id Int32,
    name String,
    timestamp DateTime DEFAULT now(),
    day Date DEFAULT now()
)
ENGINE = MergeTree
PARTITION BY day
ORDER BY id
SETTINGS index_granularity = 8192;
CREATE TABLE tb_test_all04 ON CLUSTER default
(
    id Int32,
    name String,
    timestamp DateTime DEFAULT now(),
    day Date DEFAULT now()
) ENGINE = Distributed(default, db_hxl, tb_test_local04, id);
insert into tb_test_all04(id,name) values(1,'name1');
insert into tb_test_all04(id,name) values(2,'name2');
insert into tb_test_all04(id,name) values(3,'name3');
insert into tb_test_all04(id,name) values(4,'name4');
insert into tb_test_all04(id,name) values(5,'name5');
insert into tb_test_all04(id,name) values(6,'name6');
insert into tb_test_all04(id,name) values(7,'name7');
insert into tb_test_all04(id,name) values(8,'name8');
insert into tb_test_all04(id,name) values(9,'name9');
insert into tb_test_all04(id,name) values(10,'name10');
13. How to tell whether an account is defined in the local XML config or was created in SQL mode
[root@master ~]# clickhouse-client -h 192.168.1.102 -m -u default --password clickhouse123
select name,storage,auth_type from system.users;
┌─name────┬─storage─────────┬─auth_type──────────┐
1. │ default │ users_xml │ sha256_password │
2. │ hxl01 │ users_xml │ plaintext_password │
3. │ hxl │ users_xml │ sha256_password │
4. │ hxlsql │ local_directory │ plaintext_password │
└─────────┴─────────────────┴────────────────────┘
14. Notes on local and distributed tables
Dropping a distributed table does not drop the local tables:
drop table tb_test_all on cluster default;
If the local table on any one shard is dropped, the whole distributed table becomes unusable.
Sharding rules
The sharding key (sharding_key) must evaluate to an integer (any of the Int or UInt types). It can take several forms:
a. An integer column, referenced directly
ENGINE = Distributed(default, db_test, tb_test_local,id);
b. The rand() function
(in my tests the resulting distribution was not very even)
ENGINE = Distributed(default, db_test, tb_test_local,rand());
c. The hash of an integer column
ENGINE = Distributed(default, db_test, tb_test_local,intHash64(id));
d. The hiveHash function
ENGINE = Distributed(default, db_test, tb_test_local,hiveHash(id));
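All four variants reduce to the same routing rule: the Distributed engine takes the sharding expression modulo the total shard weight and sends the row to the shard whose weight range covers the remainder. A simplified model of that documented rule (a sketch, not ClickHouse's actual code; weights default to 1 per shard, matching this cluster):

```python
def pick_shard(key: int, weights=(1, 1, 1)) -> int:
    """Return the 0-based shard index for an integer sharding-key value."""
    slot = key % sum(weights)      # remainder picks a position among the weight ranges
    for i, w in enumerate(weights):
        if slot < w:
            return i               # this shard's weight range covers the slot
        slot -= w
    raise AssertionError("unreachable")

# ids 1..10 from the insert example above, spread over 3 equal-weight shards
placement = {i: pick_shard(i) for i in range(1, 11)}
print(placement)
```

With three equal weights this degenerates to key % 3, which is why a plain integer column like id gives a cyclic spread, while intHash64(id) or hiveHash(id) first scrambles the key.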
Data written directly into a local table is also visible through the distributed table.
A distributed table can be created even when the local tables do not exist, but querying it will then fail.
