ClickHouse sharded deployment (3 shards, single replica, configured via metrika.xml)

Environment
OS: CentOS 7
ClickHouse: 24.8.11.5
Topology: 3 shards, 1 replica each
Node IP          Role
192.168.1.102    Shard 1
192.168.1.105    Shard 2, single-node ZooKeeper
192.168.1.106    Shard 3

 

1. Install ZooKeeper (standalone)
See the following installation guide:
https://www.cnblogs.com/hxlasky/p/18683624

 

#####################ClickHouse deployment (install on every node)############################

1. Prepare the installation packages
Download locations:
https://packages.clickhouse.com/rpm/stable/
or
https://packagecloud.io/Altinity/clickhouse

I downloaded the following three packages:
clickhouse-client-24.8.11.5.x86_64.rpm
clickhouse-common-static-24.8.11.5.x86_64.rpm
clickhouse-server-24.8.11.5.x86_64.rpm

[root@node1 soft]# mkdir /soft/clickhouse
[root@node1 soft]# cd /soft/clickhouse

wget https://packages.clickhouse.com/rpm/stable/clickhouse-client-24.8.11.5.x86_64.rpm
wget https://packages.clickhouse.com/rpm/stable/clickhouse-common-static-24.8.11.5.x86_64.rpm
wget https://packages.clickhouse.com/rpm/stable/clickhouse-server-24.8.11.5.x86_64.rpm

 

2. Install the packages (common-static first, since the client and server packages depend on it)
[root@localhost clickhouse]#rpm -ivh clickhouse-common-static-24.8.11.5.x86_64.rpm
[root@localhost clickhouse]#rpm -ivh clickhouse-client-24.8.11.5.x86_64.rpm
[root@localhost clickhouse]#rpm -ivh clickhouse-server-24.8.11.5.x86_64.rpm

 

3. Relocate the data and log directories

Create the new data and log directories:
[root@localhost ~]# mkdir -p /home/middle/clickhouse/data
[root@localhost ~]# mkdir -p /home/middle/clickhouse/log
[root@localhost ~]# chown -R clickhouse:clickhouse /home/middle/clickhouse/

Change the data directory:

vi /etc/clickhouse-server/config.xml

Original:
    <!-- Path to data directory, with trailing slash. -->
    <path>/var/lib/clickhouse/</path>
Change to:
    <!-- Path to data directory, with trailing slash. -->
    <path>/home/middle/clickhouse/data/</path>

 

Change the log directory:

Original:
        <level>trace</level>
        <log>/var/log/clickhouse-server/clickhouse-server.log</log>
        <errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog>
Change to:
        <level>trace</level>
        <log>/home/middle/clickhouse/log/clickhouse-server.log</log>
        <errorlog>/home/middle/clickhouse/log/clickhouse-server.err.log</errorlog>

 

4. Enable remote connections

[root@localhost clickhouse]#vi /etc/clickhouse-server/config.xml
Uncomment the following lines (they ship commented out) and change 127.0.0.1 to 0.0.0.0 so the server listens on all interfaces.

Original:
    <!-- Default values - try listen localhost on IPv4 and IPv6. -->
    <!--
    <listen_host>::1</listen_host>
    <listen_host>127.0.0.1</listen_host>
    -->
Change to:
    <!-- Default values - try listen localhost on IPv4 and IPv6. -->
    <listen_host>::1</listen_host>
    <listen_host>0.0.0.0</listen_host>

 

5. Allow dropping any table or table partition: uncomment the following two lines so they take effect (a value of 0 removes the size limit on DROP operations)

[root@localhost clickhouse]#vi /etc/clickhouse-server/config.xml

Original:
    <!-- <max_table_size_to_drop>0</max_table_size_to_drop> -->
    <!-- <max_partition_size_to_drop>0</max_partition_size_to_drop> -->
Change to:
    <max_table_size_to_drop>0</max_table_size_to_drop> 
    <max_partition_size_to_drop>0</max_partition_size_to_drop>

 

6. Copy the config file to the other nodes

scp /etc/clickhouse-server/config.xml root@192.168.1.105:/etc/clickhouse-server/
scp /etc/clickhouse-server/config.xml root@192.168.1.106:/etc/clickhouse-server/

 

7. Start clickhouse-server

Start the service on every node.
On CentOS 7, manage it with systemctl:
systemctl status clickhouse-server.service
systemctl start clickhouse-server.service
systemctl stop clickhouse-server.service

ClickHouse listens on port 9000 by default for the native protocol; make sure nothing else is using it:
<tcp_port>9000</tcp_port>

 

8. Try logging in

Try logging in on every node; at this stage no password is required:
clickhouse-client --host localhost

 

#####################Cluster setup##########################

1. Edit config.xml to remove the default shard definition

Find the <remote_servers> section in the config file and edit the configuration under <default>:
comment out (or simply delete) the existing <shard> block. The goal is to define the shards ourselves, so that tables can be created with ON CLUSTER default.

 

2. Copy the config file to the other machines
scp /etc/clickhouse-server/config.xml root@192.168.1.105:/etc/clickhouse-server/
scp /etc/clickhouse-server/config.xml root@192.168.1.106:/etc/clickhouse-server/

 

3. Prepare the cluster config file (/etc/clickhouse-server/config.d/metrika.xml)

vi /etc/clickhouse-server/config.d/metrika.xml

<yandex>
    <remote_servers>
        <default>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>192.168.1.102</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>192.168.1.105</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>192.168.1.106</host>
                    <port>9000</port>
                </replica>
            </shard>
        </default>
    </remote_servers>
   
    <zookeeper>
        <node index="1">
            <host>192.168.1.105</host>
            <port>2181</port>
        </node>
    </zookeeper>
    <macros>
        <shard>01</shard> 
        <replica>192.168.1.102</replica>
    </macros>
    <networks>
        <ip>::/0</ip>
    </networks>
    <clickhouse_compression>
        <case>
            <min_part_size>10000000000</min_part_size>
            <min_part_size_ratio>0.01</min_part_size_ratio>
            <method>lz4</method>
        </case>
    </clickhouse_compression>
</yandex>
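Before distributing the file, it can help to sanity-check its structure. A minimal Python sketch using the standard-library XML parser; the file content from above is inlined here for illustration (on a real node you would read /etc/clickhouse-server/config.d/metrika.xml instead):

```python
import xml.etree.ElementTree as ET

# The metrika.xml content from above, inlined for illustration.
metrika = """
<yandex>
    <remote_servers>
        <default>
            <shard><internal_replication>true</internal_replication>
                <replica><host>192.168.1.102</host><port>9000</port></replica></shard>
            <shard><internal_replication>true</internal_replication>
                <replica><host>192.168.1.105</host><port>9000</port></replica></shard>
            <shard><internal_replication>true</internal_replication>
                <replica><host>192.168.1.106</host><port>9000</port></replica></shard>
        </default>
    </remote_servers>
    <zookeeper><node index="1"><host>192.168.1.105</host><port>2181</port></node></zookeeper>
</yandex>
"""

root = ET.fromstring(metrika)
shards = root.findall("remote_servers/default/shard")
print(f"cluster 'default': {len(shards)} shard(s)")
for i, shard in enumerate(shards, 1):
    # Each shard here carries exactly one replica.
    for replica in shard.findall("replica"):
        print(f"  shard {i}: {replica.findtext('host')}:{replica.findtext('port')}")
```

If the parse fails or the shard count is wrong, fix the XML before restarting the servers.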

 

Notes:
A. Adjust the IP addresses and the <macros> values to match your own shards (the file above is for shard 1 on 192.168.1.102).
B. I named the cluster default here; you can of course pick any other name.

 

4. Copy the config file to the other nodes

scp /etc/clickhouse-server/config.d/metrika.xml root@192.168.1.105:/etc/clickhouse-server/config.d/
scp /etc/clickhouse-server/config.d/metrika.xml root@192.168.1.106:/etc/clickhouse-server/config.d/

 

After copying the file over, the <macros> section must be edited on each node:

Shard 1:
    <macros>
        <shard>01</shard>
        <replica>192.168.1.102</replica>
    </macros>
Shard 2:
    <macros>
        <shard>02</shard> 
        <replica>192.168.1.105</replica>
    </macros>
Shard 3:
    <macros>
        <shard>03</shard> 
        <replica>192.168.1.106</replica>
    </macros>
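Since the file differs between nodes only in the <macros> block, a small helper can render the per-node variants instead of hand-editing after each scp. This is a hypothetical convenience script of my own, not part of ClickHouse:

```python
# (shard, replica) pairs mirroring the three nodes above.
NODES = [("01", "192.168.1.102"), ("02", "192.168.1.105"), ("03", "192.168.1.106")]

def render_macros(shard: str, replica: str) -> str:
    """Return the <macros> XML fragment for one node."""
    return (
        "    <macros>\n"
        f"        <shard>{shard}</shard>\n"
        f"        <replica>{replica}</replica>\n"
        "    </macros>"
    )

for shard, replica in NODES:
    print(render_macros(shard, replica))
```

You would splice each fragment into that node's copy of metrika.xml (or template the whole file the same way).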

 

5. Change the ownership of the config files to clickhouse
chown -R clickhouse:clickhouse /etc/clickhouse-server/config.d

 

6. Restart every node

systemctl restart clickhouse-server.service

 

7. Verify

[root@master clickhouse-server]# clickhouse-client --host 192.168.1.102 --port=9000

select cluster,shard_num,shard_weight,internal_replication,replica_num,host_name,host_address,port,is_local from system.clusters;

 

 

 

Create a database

CREATE DATABASE db_test ON CLUSTER default;

ON CLUSTER default is required here so that the database is created on every shard; without it, it is only created on the local shard.

 

Create a distributed table

First create the local table:
use db_test;
CREATE TABLE tb_test_local01 ON CLUSTER default
(
   id Int32,
   name String,
   timestamp DateTime DEFAULT now(),
   day date DEFAULT now()
)
engine = MergeTree
partition by day
order by id
settings index_granularity = 8192;


Then create the distributed table:
CREATE TABLE tb_test_all01 ON CLUSTER default(
   id Int32,
   name String,
   timestamp DateTime DEFAULT now(),
   day date DEFAULT now()
) ENGINE = Distributed(default, db_test, tb_test_local01, id);

Insert data:
insert into tb_test_all01(id,name) values(1,'name1');
insert into tb_test_all01(id,name) values(2,'name2');
insert into tb_test_all01(id,name) values(3,'name3');
insert into tb_test_all01(id,name) values(4,'name4');
insert into tb_test_all01(id,name) values(5,'name5');
insert into tb_test_all01(id,name) values(6,'name6');
insert into tb_test_all01(id,name) values(7,'name7');
insert into tb_test_all01(id,name) values(8,'name8');
insert into tb_test_all01(id,name) values(9,'name9');
insert into tb_test_all01(id,name) values(10,'name10');

 

Query the distributed table

All of the inserted rows are visible. The default account currently has no password, and no credentials are specified in the <replica> blocks. If you configure a password for default, each <replica> block needs the following entries:
<user>default</user>
<password>plaintext password</password>

 

Adding and modifying columns on distributed and local tables

See the following link:

https://www.cnblogs.com/hxlasky/p/18689278

 

8. Set a password for the default account

Edit the users.xml config file.

Generate the hashed password. My plaintext password is clickhouse123; hashing it produces the string below:
[root@master clickhouse-server]# echo -n clickhouse123 | openssl dgst -sha256
(stdin)= 2cca9d8714615f4132390a3db9296d39ec051b3faff87be7ea5f7fe0e2de14c9
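The same hash can be produced without openssl; here is a Python equivalent of the command above (hashlib is in the standard library):

```python
import hashlib

# SHA-256 of the plaintext password, hex-encoded -- the same digest that
# `echo -n clickhouse123 | openssl dgst -sha256` prints.
password = "clickhouse123"
digest = hashlib.sha256(password.encode()).hexdigest()
print(digest)
```

Either way, the resulting 64-character hex string goes into password_sha256_hex below.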

Edit the config file:
vi /etc/clickhouse-server/users.xml

Inside the <users><default></default> section, add the following element:
<password_sha256_hex>2cca9d8714615f4132390a3db9296d39ec051b3faff87be7ea5f7fe0e2de14c9</password_sha256_hex>

At the same time, comment out the existing <password></password> element.

Also make sure access_management is set to 1 (older versions default to 0); this enables SQL-driven access management.

 

Copy this node's users.xml to the other nodes:
scp /etc/clickhouse-server/users.xml root@192.168.1.105:/etc/clickhouse-server/
scp /etc/clickhouse-server/users.xml root@192.168.1.106:/etc/clickhouse-server/

Restart every node:
systemctl restart clickhouse-server.service

From now on a password is required to log in:
clickhouse-client --host 192.168.1.102 --port=9000 --password=clickhouse123
clickhouse-client --host 192.168.1.105 --port=9000 --password=clickhouse123
clickhouse-client --host 192.168.1.106 --port=9000 --password=clickhouse123

 

9. Update the /etc/clickhouse-server/config.d/metrika.xml config file

If you changed the default account's password without adding <user></user><password></password> to each <replica> block, queries against distributed tables will fail with an authentication error.

Edit the cluster config file:

vi /etc/clickhouse-server/config.d/metrika.xml

            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>192.168.1.102</host>
                    <port>9000</port>
                    <user>default</user>
                    <password>clickhouse123</password>
                </replica>
            </shard>

Every node must be edited individually; you cannot just scp the file, because the <macros> section differs from node to node.
Note:
A hashed password cannot be used here; the following setting causes an error:
<password_sha256_hex>2cca9d8714615f4132390a3db9296d39ec051b3faff87be7ea5f7fe0e2de14c9</password_sha256_hex>

 

10. Restart the cluster

systemctl restart clickhouse-server.service

 

11. Create an account and grant privileges via the config file

Edit the users.xml file and add the following entry under the <users> section:

      <hxl>
        <password_sha256_hex>2cca9d8714615f4132390a3db9296d39ec051b3faff87be7ea5f7fe0e2de14c9</password_sha256_hex>
        <access_management>0</access_management>
        <networks incl="networks" replace="replace">
                <ip>::/0</ip>
        </networks>
        <profile>default</profile>
        <quota>default</quota>
        <allow_databases>
                <database>db_hxl</database>
        </allow_databases>
      </hxl>

The password_sha256_hex value was generated as follows:
[root@master clickhouse-server]# echo -n clickhouse123 | openssl dgst -sha256
(stdin)= 2cca9d8714615f4132390a3db9296d39ec051b3faff87be7ea5f7fe0e2de14c9

A plaintext password can also be used:
<password>clickhouse</password>

 

12. Creating users in SQL mode

Note: accounts created on the command line do not show up in /etc/clickhouse-server/users.xml.

Run the following on one node only. Log in with the default account to create the new one; note that every statement must include ON CLUSTER default:

clickhouse-client -h 192.168.1.102 -m -u default --password clickhouse123

CREATE USER hxlsql ON CLUSTER default IDENTIFIED WITH plaintext_password BY 'hxlsql123';
GRANT ON CLUSTER default ALL ON db_hxl.* TO hxlsql;
GRANT ON CLUSTER default CLUSTER ON *.* TO hxlsql;
GRANT ON CLUSTER default REMOTE ON *.* TO hxlsql;

Log in with the new account:
clickhouse-client -h 192.168.1.102 -m -u hxlsql --password hxlsql123

use db_hxl;
CREATE TABLE tb_test_local04 ON CLUSTER default
(
   id Int32,
   name String,
   timestamp DateTime DEFAULT now(),
   day date DEFAULT now()
)
engine = MergeTree
partition by day
order by id
settings index_granularity = 8192;


CREATE TABLE tb_test_all04 ON CLUSTER default(
   id Int32,
   name String,
   timestamp DateTime DEFAULT now(),
   day date DEFAULT now()
) ENGINE = Distributed(default, db_hxl, tb_test_local04, id);

insert into tb_test_all04(id,name) values(1,'name1');
insert into tb_test_all04(id,name) values(2,'name2');
insert into tb_test_all04(id,name) values(3,'name3');
insert into tb_test_all04(id,name) values(4,'name4');
insert into tb_test_all04(id,name) values(5,'name5');
insert into tb_test_all04(id,name) values(6,'name6');
insert into tb_test_all04(id,name) values(7,'name7');
insert into tb_test_all04(id,name) values(8,'name8');
insert into tb_test_all04(id,name) values(9,'name9');
insert into tb_test_all04(id,name) values(10,'name10');

 

13. How to tell whether an account is defined in the local XML config or was created in SQL mode

[root@master ~]# clickhouse-client -h 192.168.1.102 -m -u default --password clickhouse123


select name,storage,auth_type from system.users;

   ┌─name────┬─storage─────────┬─auth_type──────────┐
1. │ default │ users_xml       │ sha256_password    │
2. │ hxl01   │ users_xml       │ plaintext_password │
3. │ hxl     │ users_xml       │ sha256_password    │
4. │ hxlsql  │ local_directory │ plaintext_password │
   └─────────┴─────────────────┴────────────────────┘

 

14. Notes on local and distributed tables

Dropping a distributed table does not drop the local tables:
drop table tb_test_all on cluster default;

If the local table is dropped on even one shard, the whole distributed table becomes unusable.

 

Sharding rules

The sharding key (sharding_key) must evaluate to an integer (Int or UInt family). It can be any of the following:

a. An integer column, used directly:
ENGINE = Distributed(default, db_test, tb_test_local, id);

b. The rand() function (in my tests the distribution was not very even):
ENGINE = Distributed(default, db_test, tb_test_local, rand());

c. A hash of an integer column:
ENGINE = Distributed(default, db_test, tb_test_local, intHash64(id));

d. hiveHash:
ENGINE = Distributed(default, db_test, tb_test_local, hiveHash(id));
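With equal shard weights, the Distributed engine routes each row by taking the sharding-key value modulo the total weight, so with three weight-1 shards a row lands on shard `key % 3`. The sketch below is a simplified Python model of that routing rule for the cluster above, not ClickHouse's actual implementation:

```python
from collections import Counter

# Three shards with weight 1 each, as configured in metrika.xml above.
SHARDS = ["192.168.1.102", "192.168.1.105", "192.168.1.106"]

def route(sharding_key_value: int, num_shards: int = len(SHARDS)) -> int:
    """Return the 0-based shard index for an integer sharding-key value."""
    return sharding_key_value % num_shards

# ids 1..10, as in the inserts above, spread across the shards:
placement = Counter(route(i) for i in range(1, 11))
print(dict(placement))  # {1: 4, 2: 3, 0: 3}
```

This is why a monotonically increasing integer key like id spreads rows roughly evenly, while a skewed key (e.g. a constant) would pile everything onto one shard.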

 

Data written directly into a local table is also visible through the distributed table.

 

A distributed table can be created even when the local tables do not exist, but querying it will fail.

 

 

posted @ 2025-01-24 14:33  slnngk