Getting Started with Kafka, from a PHP Perspective
-
Preparation
Before reading on, you should ideally be comfortable with Docker.
My environment is as follows:
CentOS Linux release 8.2.2004 (Core)
Docker version 19.03.13
docker-compose version 1.27.4
Zookeeper version 3.6.2
Kafka version 2.13-2.6.0
PHP version 7.4.12
To make it easier to follow along (and for my own later review), I'll record some of the commands below.

Install a specific version of Docker:

```shell
yum install -y docker-ce-19.03.14 docker-ce-cli-19.03.14 containerd.io
```

Change Docker's default storage path:

```shell
vim /usr/lib/systemd/system/docker.service
# ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock --graph /data/docker
# Append --graph /data/docker to the end of the ExecStart line; here I changed the default storage path to /data/docker.
# Reload the systemd configuration
systemctl daemon-reload
# Start docker
systemctl start docker
```

Install a specific version of docker-compose:

```shell
curl -L https://get.daocloud.io/docker/compose/releases/download/1.27.4/docker-compose-`uname -s`-`uname -m` > /usr/local/bin/docker-compose
chmod 777 /usr/local/bin/docker-compose
```
-
Getting Started
Kafka is a distributed messaging engine whose main job is to provide a complete publish/subscribe solution. In Kafka, the unit of publishing and subscribing is the topic (Topic); you can create a dedicated topic for each business, each application, or even each category of data.
Kafka depends on Zookeeper. Zookeeper is a distributed coordination framework responsible for coordinating, managing, and storing all of the Kafka cluster's metadata: which Brokers are running, which Topics have been created, how many partitions (Partition) each Topic has, and which machines hold each partition's leader replica, among other things.
The main Zookeeper-related settings in Kafka's configuration are
the zookeeper.connect and zookeeper.connection.timeout.ms parameters: the connection string and the connection timeout, respectively. Kafka's server side consists of service processes called Brokers. A Kafka cluster is made up of multiple Brokers, which receive and process client requests and persist messages. Although multiple Broker processes can run on the same machine, it is more common to spread them across different machines, so that if one machine in the cluster goes down, taking all of its Broker processes with it, the Brokers on other machines can still serve clients. This is one of the ways Kafka achieves high availability.
The other mechanism for high availability is replication (Replication). The idea is simple: copy the same data to multiple machines; in Kafka these identical copies are called replicas (Replica). The number of replicas is configurable; they hold the same data but play different roles. Kafka defines two kinds of replica: the leader replica (Leader Replica) and the follower replica (Follower Replica). The former serves the outside world, meaning it interacts with client programs; the latter merely follows the leader passively and never interacts with clients. Replicas are defined at the partition level: each partition can be configured with several replicas, of which exactly one is the leader and the remaining N-1 are followers. Producers write messages to partitions, and each message's position within its partition is identified by a number called the offset (Offset). Partition offsets always start at 0: if a producer writes 10 messages to an empty partition, their offsets are 0, 1, 2, ..., 9.
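The offset behavior just described can be sketched with a toy in-memory model (purely illustrative; this class is made up for this article and is not part of Kafka's API):

```php
<?php
// Toy model of a single partition: an append-only log where each
// message's offset is simply its position in the log.
final class ToyPartition
{
    /** @var string[] */
    private array $log = [];

    // Append a message and return the offset it was assigned.
    public function produce(string $message): int
    {
        $this->log[] = $message;
        return count($this->log) - 1;
    }

    // Read the message stored at a given offset.
    public function consume(int $offset): string
    {
        return $this->log[$offset];
    }
}

$p = new ToyPartition();
$offsets = [];
for ($i = 0; $i < 10; $i++) {
    $offsets[] = $p->produce("Message $i");
}
echo implode(',', $offsets), "\n"; // 0,1,2,3,4,5,6,7,8,9
```

Ten messages into an empty partition get offsets 0 through 9, exactly as described above.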
In short, Kafka's architecture looks like this:
Broker => Topic => Partition => Replication. Producers write messages to a partition, which is then replicated; consumers then consume messages from the partition's leader replica. -
Standalone
Standalone mode is fairly simple and generally only suitable for development, so I won't spend much ink on it. The important parameter settings and caveats are covered in the cluster section.
Directory structure:

```
├── docker-compose
│   └── docker-compose.yml
├── kafka
│   ├── conf
│   │   └── server.properties
│   ├── data
│   ├── log              # chmod 777 log
│   └── start-kafka.sh   # chmod 777 start-kafka.sh
└── zookeeper
    ├── conf
    │   └── zoo.cfg
    ├── data
    │   └── myid
    └── log              # chmod 777 log
```

The zoo.cfg configuration file:
```properties
tickTime=2000   # heartbeat interval between client and server, in milliseconds
clientPort=2181 # port
dataDir=/data   # data path
dataLogDir=/log # log path
```

The server.properties configuration file:
```properties
# Broker
broker.id=1                          # machine ID
log.dirs=/kafka/data                 # data directory
log.retention.hours=168              # how long message data is retained
log.retention.bytes=-1               # total disk space reserved for messages; -1 means unlimited
message.max.bytes=2097152            # largest message the broker will accept
auto.create.topics.enable=false      # do not auto-create Topics
auto.leader.rebalance.enable=false   # do not automatically switch Leaders
unclean.leader.election.enable=false # do not let out-of-sync replicas run for Leader
offsets.topic.replication.factor=1   # offsets topic replicas <= number of machines
replication.factor=1                 # Topic replicas <= number of machines
min.insync.replicas=1                # minimum replicas a message must be written to; default 1
# Connect
listeners=PLAINTEXT://:9091                       # internal port
advertised.listeners=PLAINTEXT://××.××.××.××:9091 # external port
zookeeper.connect=zookeeper:2181                  # Zookeeper address
zookeeper.connection.timeout.ms=18000             # Zookeeper connection timeout
# Producer
acks=all  # a send counts as successful only after all replicas are written
retries=1 # retries > 0 enables automatic retry of failed sends
# Consumer
enable.auto.commit=false # no automatic offset commits
```

The start-kafka.sh script:
```shell
#!/bin/bash -e

exec "$KAFKA_HOME/bin/kafka-server-start.sh" "$KAFKA_HOME/config/server.properties"
```

The docker-compose configuration file:
```yaml
services:
  zookeeper:
    image: zookeeper:3.6.2
    container_name: zookeeper
    restart: always
    ports:
      - 2181:2181
    volumes:
      - /work/docker/zookeeper/conf/zoo.cfg:/conf/zoo.cfg
      - /work/docker/zookeeper/data:/data
      - /work/docker/zookeeper/log:/log
    networks:
      - kafka-net
  kafka:
    image: wurstmeister/kafka:2.13-2.6.0
    container_name: kafka
    restart: always
    ports:
      - 9091:9091
    volumes:
      - /work/docker/kafka/conf/server.properties:/opt/kafka/config/server.properties
      - /work/docker/kafka/data:/kafka/data
      - /work/docker/kafka/log:/opt/kafka/logs
      - /work/docker/kafka/start-kafka.sh:/usr/bin/start-kafka.sh
    networks:
      - kafka-net
networks:
  kafka-net:
    driver: bridge
```

Next comes the interaction between the client and Kafka. I'm using PHP, which requires the RdKafka extension.
RdKafka documentation: https://arnaud.le-blanc.net/php-rdkafka-doc/phpdoc/book.rdkafka.html
Producer:
```php
<?php
$conf = new RdKafka\Conf();
$conf->set('metadata.broker.list', '××.××.××.××:9091');
$producer = new RdKafka\Producer($conf);
$topic = $producer->newTopic("test");

for ($i = 0; $i < 10; $i++) {
    $topic->produce(RD_KAFKA_PARTITION_UA, 0, "Message $i");
    $producer->poll(0);
}

for ($flushRetries = 0; $flushRetries < 10; $flushRetries++) {
    $result = $producer->flush(10000);
    if (RD_KAFKA_RESP_ERR_NO_ERROR === $result) {
        break;
    }
}

if (RD_KAFKA_RESP_ERR_NO_ERROR !== $result) {
    throw new \RuntimeException('Was unable to flush, messages might be lost!');
}
```

Consumer:
```php
<?php
$conf = new \RdKafka\Conf();
$rk = new RdKafka\Consumer($conf);
$rk->addBrokers("××.××.××.××:9091");
$topicConf = new RdKafka\TopicConf();
$topic = $rk->newTopic('test', $topicConf);

// Start consuming partition 0
// Start consuming offset 0, or you can control the offset by yourself
$topic->consumeStart(0, 0);

while (true) {
    $message = $topic->consume(0, 120 * 10000);
    switch ($message->err) {
        case RD_KAFKA_RESP_ERR_NO_ERROR:
            var_dump($message);
            break;
        case RD_KAFKA_RESP_ERR__PARTITION_EOF:
            echo "No more messages; will wait for more\n";
            break;
        case RD_KAFKA_RESP_ERR__TIMED_OUT:
            echo "Timed out\n";
            break;
        default:
            throw new \Exception($message->errstr(), $message->err);
    }
}
```

Note: since I set auto.create.topics.enable=false, the test Topic in this example has to be created manually in advance. In development I usually give it 1 partition and 1 replica (in standalone mode a Topic can only have one replica anyway). Also, in $topic->consumeStart(0, 0), the first 0 is the partition number (I only have one partition here), and the second 0 is the starting offset (I start from 0). One way to maintain offsets manually: in MySQL, create an offset field for each Topic, and increment it on every successfully consumed message. For performance, do the increment in Redis and write it back to the database once every 1000 successful consumes. That way you never have to worry about whether the offset was committed after consumption. If you would rather let Kafka maintain the offset itself, change the second argument to RD_KAFKA_OFFSET_STORED and configure group.id and the related parameters. See: https://arnaud.le-blanc.net/php-rdkafka-doc/phpdoc/rdkafka.examples-low-level-consumer-basic.html -
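The manual offset bookkeeping described above (a fast counter in Redis, flushed to MySQL every 1000 consumes) could be sketched roughly like this. This is a made-up illustration: OffsetTracker and its method names are invented, and plain arrays stand in for Redis and MySQL so the logic is visible on its own.

```php
<?php
// Sketch of the manual-offset scheme: bump a fast counter on every
// successful consume, and persist it to durable storage once every
// $batchSize messages. Arrays stand in for Redis and MySQL here.
final class OffsetTracker
{
    private array $redis = []; // stand-in for Redis: topic => offset
    private array $mysql = []; // stand-in for MySQL: topic => offset
    private int $batchSize;

    public function __construct(int $batchSize = 1000)
    {
        $this->batchSize = $batchSize;
    }

    // Call after each successfully consumed message.
    public function onConsumed(string $topic): void
    {
        $this->redis[$topic] = ($this->redis[$topic] ?? 0) + 1;
        if ($this->redis[$topic] % $this->batchSize === 0) {
            // Flush the counter to the durable store every $batchSize messages.
            $this->mysql[$topic] = $this->redis[$topic];
        }
    }

    public function fastOffset(string $topic): int
    {
        return $this->redis[$topic] ?? 0;
    }

    public function durableOffset(string $topic): int
    {
        return $this->mysql[$topic] ?? 0;
    }
}

$t = new OffsetTracker(1000);
for ($i = 0; $i < 2500; $i++) {
    $t->onConsumed('test');
}
echo $t->fastOffset('test'), ' ', $t->durableOffset('test'), "\n"; // 2500 2000
```

On a crash you would restart from the durable offset, so up to $batchSize messages may be re-consumed; your message handling should tolerate that.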
Cluster
In production, you should absolutely run a distributed, highly available Kafka cluster. And since Kafka depends on Zookeeper, Zookeeper must also be clustered for the whole setup to be highly available.
The directory structure for Zookeeper and Kafka stays the same; we just go from one machine to three. I don't have three machines here, and I'd rather not set up three virtual machines, so I'll keep simulating with Docker.
-
Zookeeper Cluster
The zoo.cfg configuration file:

```properties
tickTime=2000
initLimit=10  # time limit for initial cluster sync (in heartbeat intervals)
syncLimit=5   # time limit for follower sync (in heartbeat intervals)
clientPort=2181
dataDir=/data
dataLogDir=/log
server.1=0.0.0.0:2888:3888 # use 0.0.0.0 in place of the local machine's IP
server.2=zookeeper2:2888:3888
server.3=zookeeper3:2888:3888
```

The docker-compose configuration file:
```yaml
services:
  zookeeper1:
    image: zookeeper:3.6.2
    container_name: zookeeper1
    restart: always
    ports:
      - 2181:2181
    volumes:
      - /work/docker/zookeeper1/conf/zoo.cfg:/conf/zoo.cfg
      - /work/docker/zookeeper1/data:/data
      - /work/docker/zookeeper1/log:/log
    networks:
      - kafka-net
  zookeeper2:
    image: zookeeper:3.6.2
    container_name: zookeeper2
    restart: always
    ports:
      - 2182:2181
    volumes:
      - /work/docker/zookeeper2/conf/zoo.cfg:/conf/zoo.cfg
      - /work/docker/zookeeper2/data:/data
      - /work/docker/zookeeper2/log:/log
    networks:
      - kafka-net
  zookeeper3:
    image: zookeeper:3.6.2
    container_name: zookeeper3
    restart: always
    ports:
      - 2183:2181
    volumes:
      - /work/docker/zookeeper3/conf/zoo.cfg:/conf/zoo.cfg
      - /work/docker/zookeeper3/data:/data
      - /work/docker/zookeeper3/log:/log
    networks:
      - kafka-net
networks:
  kafka-net:
    driver: bridge
```

As you can see below, the Zookeeper cluster is up (zookeeper3 happens to be the leader here):
```shell
[root@iZuf6c82diwquwsq69eqejZ docker-compose]# docker exec -it zookeeper1 bash -c './bin/zkServer.sh status'
ZooKeeper JMX enabled by default
Using config: /conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: follower
[root@iZuf6c82diwquwsq69eqejZ docker-compose]# docker exec -it zookeeper2 bash -c './bin/zkServer.sh status'
ZooKeeper JMX enabled by default
Using config: /conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: follower
[root@iZuf6c82diwquwsq69eqejZ docker-compose]# docker exec -it zookeeper3 bash -c './bin/zkServer.sh status'
ZooKeeper JMX enabled by default
Using config: /conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: leader
```

Note: the myid file under each Zookeeper data directory must be set to a different number; I used 1, 2, and 3 respectively. If you deploy on three real machines, each needs to open the three ports 2181, 2888, and 3888: 2181 for client connections, 2888 for data synchronization, and 3888 for leader election. -
Kafka Cluster
The server.properties configuration file:

```properties
# Broker
broker.id=1
log.dirs=/kafka/data
log.retention.hours=168
log.retention.bytes=-1
message.max.bytes=2097152
auto.create.topics.enable=false
auto.leader.rebalance.enable=false
unclean.leader.election.enable=false
offsets.topic.replication.factor=3 # offsets topic replicas <= number of machines
replication.factor=3               # Topic replicas <= number of machines
min.insync.replicas=2              # minimum replicas a message must be written to
# Connect
listeners=PLAINTEXT://:9091
advertised.listeners=PLAINTEXT://××.××.××.××:9091
zookeeper.connect=zookeeper1:2181,zookeeper2:2182,zookeeper3:2183
zookeeper.connection.timeout.ms=18000
# Producer
acks=all # a send counts as successful only after all replicas are written
retries=1
# Consumer
enable.auto.commit=false
```

A few of these parameters deserve a closer look:
broker.id is the Broker's ID and must be unique; I recommend keeping it consistent with Zookeeper's myid.
zookeeper.connect is a CSV-style parameter: for a single node it is zk1:2181, for a cluster it is zk1:2181,zk2:2181,zk3:2181. Self-explanatory.
replication.factor is the number of replicas per Topic; in a cluster it is best set to >= 3 to guard against message loss.
min.insync.replicas is a Broker-side parameter that controls how many replicas a message must be written to before it counts as "committed". Setting it above 1 improves message durability; never use the default of 1 in a real production environment. Also make sure replication.factor > min.insync.replicas: if the two are equal, losing a single replica makes the whole partition unwritable. We want to improve message durability and prevent data loss without reducing availability, so the recommendation is replication.factor = min.insync.replicas + 1.
acks is a Producer-side parameter that looks like it conflicts with min.insync.replicas. In fact, min.insync.replicas sets a floor for acks=all. Say there are three replicas and one machine dies, leaving two. acks=all would still nominally mean three, but the Producer clearly can't write to three replicas any more. This is where min.insync.replicas comes into play: as long as the in-sync replica count is >= 2, writes can continue.
offsets.topic.replication.factor defaults to 3; just set it to the same value as replication.factor. The "offsets Topic" is the internal offsets topic, created automatically when the cluster comes up, which stores Kafka consumers' offset information. It has 50 partitions by default.
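The availability arithmetic behind these recommendations can be captured in a tiny helper. This is purely illustrative (the function name is made up): with acks=all, a partition stays writable as long as at least min.insync.replicas replicas are alive.

```php
<?php
// With replication.factor replicas and min.insync.replicas required for a
// write to be "committed" under acks=all, a partition remains writable as
// long as at most (replication.factor - min.insync.replicas) replicas are down.
function tolerableFailures(int $replicationFactor, int $minInsyncReplicas): int
{
    return $replicationFactor - $minInsyncReplicas;
}

// Recommended setting: replication.factor = min.insync.replicas + 1
echo tolerableFailures(3, 2), "\n"; // 1: survives one broker failure
echo tolerableFailures(3, 3), "\n"; // 0: any single failure blocks writes
```

This is exactly why replication.factor = min.insync.replicas leaves no room for failure: the first dead replica stalls all producers on that partition.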
The docker-compose configuration file:

```yaml
  kafka1:
    image: wurstmeister/kafka:2.13-2.6.0
    container_name: kafka1
    restart: always
    ports:
      - 9091:9091
    volumes:
      - /work/docker/kafka1/conf/server.properties:/opt/kafka/config/server.properties
      - /work/docker/kafka1/data:/kafka/data
      - /work/docker/kafka1/log:/opt/kafka/logs
      - /work/docker/kafka1/start-kafka.sh:/usr/bin/start-kafka.sh
    networks:
      - kafka-net
  kafka2:
    image: wurstmeister/kafka:2.13-2.6.0
    container_name: kafka2
    restart: always
    ports:
      - 9092:9092
    volumes:
      - /work/docker/kafka2/conf/server.properties:/opt/kafka/config/server.properties
      - /work/docker/kafka2/data:/kafka/data
      - /work/docker/kafka2/log:/opt/kafka/logs
      - /work/docker/kafka2/start-kafka.sh:/usr/bin/start-kafka.sh
    networks:
      - kafka-net
  kafka3:
    image: wurstmeister/kafka:2.13-2.6.0
    container_name: kafka3
    restart: always
    ports:
      - 9093:9093
    volumes:
      - /work/docker/kafka3/conf/server.properties:/opt/kafka/config/server.properties
      - /work/docker/kafka3/data:/kafka/data
      - /work/docker/kafka3/log:/opt/kafka/logs
      - /work/docker/kafka3/start-kafka.sh:/usr/bin/start-kafka.sh
    networks:
      - kafka-net
```

Once the cluster is up, listing the Topics shows that a __consumer_offsets offsets topic has appeared:

```shell
[root@iZuf6c82diwquwsq69eqejZ docker]# docker exec -it kafka1 bash -c './opt/kafka/bin/kafka-topics.sh --zookeeper zookeeper1:2181,zookeeper2:2182,zookeeper3:2183 --describe'
Topic: __consumer_offsets  PartitionCount: 50  ReplicationFactor: 3  Configs: compression.type=producer,cleanup.policy=compact,segment.bytes=104857600
    Topic: __consumer_offsets  Partition: 0  Leader: 3  Replicas: 3,2,1  Isr: 3,2,1
    Topic: __consumer_offsets  Partition: 1  Leader: 1  Replicas: 1,3,2  Isr: 1,3,2
    Topic: __consumer_offsets  Partition: 2  Leader: 2  Replicas: 2,1,3  Isr: 2,1,3
    Topic: __consumer_offsets  Partition: 3  Leader: 3  Replicas: 3,1,2  Isr: 3,1,2
    Topic: __consumer_offsets  Partition: 4  Leader: 1  Replicas: 1,2,3  Isr: 1,3,2
    Topic: __consumer_offsets  Partition: 5  Leader: 2  Replicas: 2,3,1  Isr: 2,3,1
    ... (partitions 6 through 49 omitted; leadership keeps rotating across brokers 1, 2, and 3 in the same pattern)
```

Note: once the cluster is up, I recommend using kafka-manager to manage Clusters and Topics:

```shell
docker run -d -p 8080:9000 -e ZK_HOSTS="zookeeper1:2181,zookeeper2:2182,zookeeper3:2183" --name=kafka-manager --net=docker-compose_kafka-net sheepkiller/kafka-manager
```
On the client side, just change calls like
$rk->addBrokers("××.××.××.××:9091") to $rk->addBrokers("××.××.××.××:9091,××.××.××.××:9092,××.××.××.××:9093") and you're done.
-
Hands-On
In real production there are many scenarios for Kafka: asynchronous logging, post-order business processing, post-login business processing, and so on. For high-concurrency workloads, a single partition may not be enough. In the example below, I created a userLogin topic with 4 partitions and 3 replicas.

```shell
[root@iZuf6c82diwquwsq69eqejZ docker]# docker exec -it kafka1 bash -c './opt/kafka/bin/kafka-topics.sh --zookeeper zookeeper1:2181,zookeeper2:2182,zookeeper3:2183 --describe --topic=userLogin'
Topic: userLogin  PartitionCount: 4  ReplicationFactor: 3  Configs:
    Topic: userLogin  Partition: 0  Leader: 1  Replicas: 1,3,2  Isr: 1,3,2
    Topic: userLogin  Partition: 1  Leader: 2  Replicas: 2,1,3  Isr: 2,1,3
    Topic: userLogin  Partition: 2  Leader: 3  Replicas: 3,2,1  Isr: 3,2,1
    Topic: userLogin  Partition: 3  Leader: 1  Replicas: 1,2,3  Isr: 1,2,3
```

Producer:
```php
<?php

class UserLoginProducer
{
    public function login()
    {
        try {
            $params = [
                'user_id'    => 74,
                'login_area' => 'Shanghai',
                'login_time' => time()
            ];
            $this->handleAfterLogin($params);
            return json_encode(['code' => 200]);
        } catch (\Exception $e) {
            return json_encode([
                'code' => 400,
                'msg'  => $e->getMessage()
            ]);
        }
    }

    protected function handleAfterLogin(array $params)
    {
        $conf = new RdKafka\Conf();
        $conf->set('metadata.broker.list', '××.××.××.××:9091,××.××.××.××:9092,××.××.××.××:9093');
        $conf->setDrMsgCb(function ($kafka, $message) {
            if ($message->err) {
                throw new \Exception('message permanently failed to be delivered');
            } else {
                // message successfully delivered
            }
        });
        $producer = new RdKafka\Producer($conf);
        $topic = $producer->newTopic("userLogin");

        // The first argument is the partition. RD_KAFKA_PARTITION_UA stands for
        // unassigned, and lets librdkafka choose the partition.
        // The second argument are message flags and should be either 0
        // or RD_KAFKA_MSG_F_BLOCK to block produce on full queue.
        // The message payload can be anything.
        $topic->produce(RD_KAFKA_PARTITION_UA, 0, json_encode($params));

        // Polls the producer for events and calls the corresponding callbacks (if registered).
        $producer->poll(0);

        // This should be done prior to destroying a producer instance
        // to make sure all queued and in-flight produce requests are completed
        // before terminating. Use a reasonable value for $timeout_ms.
        $result = $producer->flush(10000);
        if (RD_KAFKA_RESP_ERR_NO_ERROR !== $result) {
            throw new \RuntimeException('Was unable to flush, messages might be lost!');
        }
    }
}

$user = new UserLoginProducer();
$res = $user->login();
print_r($res); // {"code":200}
```

Note: for a producer, the single most important thing is making sure the message was actually sent out. Here "sent out" means not only that the Producer sent it successfully, but also that the Broker received it. To know whether the Broker received it, we have to register a callback.
$conf->setDrMsgCb together with $producer->poll(0) implements that callback. See: https://arnaud.le-blanc.net/php-rdkafka-doc/phpdoc/rdkafka-conf.setdrmsgcb.html Also, developers should know exactly which partition they are sending each message to.
In $topic->produce(RD_KAFKA_PARTITION_UA, 0, json_encode($params)), the first argument is the partition. RD_KAFKA_PARTITION_UA means "unassigned": if you don't pick a partition, rdkafka picks one for you. With a single partition that's easy, since RD_KAFKA_PARTITION_UA effectively means 0. But I have four partitions here, and since I don't know rdkafka's partition-selection logic, I prefer to control it myself. So replacing RD_KAFKA_PARTITION_UA with random_int(0, 3) (send to one of the four partitions at random) is more appropriate here. One more thing: within a single partition, messages are ordered; across different partitions, they may arrive out of order. So in scenarios where ordering matters, you must control the partition yourself and put order-sensitive messages on the same partition. Consumer:
```php
<?php
$conf = new RdKafka\Conf();

// Set a rebalance callback to log partition assignments (optional)
$conf->setRebalanceCb(function (RdKafka\KafkaConsumer $kafka, $err, array $partitions = null) {
    switch ($err) {
        case RD_KAFKA_RESP_ERR__ASSIGN_PARTITIONS:
            echo "Assign: ";
            var_dump($partitions);
            $kafka->assign($partitions);
            break;
        case RD_KAFKA_RESP_ERR__REVOKE_PARTITIONS:
            echo "Revoke: ";
            var_dump($partitions);
            $kafka->assign(NULL);
            break;
        default:
            throw new \Exception($err);
    }
});

// Configure the group.id. All consumers with the same group.id will consume
// different partitions.
$conf->set('group.id', 'userLoginConsumerGroup');

// Initial list of Kafka brokers
$conf->set('metadata.broker.list', '××.××.××.××:9091,××.××.××.××:9092,××.××.××.××:9093');

// Set where to start consuming messages when there is no initial offset in
// offset store or the desired offset is out of range.
// 'earliest': start from the beginning
$conf->set('auto.offset.reset', 'earliest');
$conf->set('enable.auto.commit', 'false');

$consumer = new RdKafka\KafkaConsumer($conf);

// Subscribe to topic 'userLogin'
$consumer->subscribe(['userLogin']);

echo "Waiting for partition assignment... (make take some time when quickly re-joining the group after leaving it.)\n";

while (true) {
    $message = $consumer->consume(120 * 1000);
    switch ($message->err) {
        case RD_KAFKA_RESP_ERR_NO_ERROR:
            // var_dump($message);
            if (handleAfterLogin($message->payload)) {
                $consumer->commit($message);
                echo '[ ' . date('Y-m-d H:i:s') . ' ] ' . $message->payload . ' consume successful' . "\n";
            }
            break;
        case RD_KAFKA_RESP_ERR__PARTITION_EOF:
            echo "No more messages; will wait for more\n";
            break;
        case RD_KAFKA_RESP_ERR__TIMED_OUT:
            echo "Timed out\n";
            break;
        default:
            throw new \Exception($message->errstr(), $message->err);
    }
}

function handleAfterLogin($params)
{
    $data = json_decode($params);
    if ($data->user_id == 74) {
        return true;
    }
    return false;
}
```

Note: for a consumer, the single most important thing is making sure messages are consumed correctly: commit the offset when consumption succeeds, change nothing when it fails, and avoid duplicate consumption.
$conf->set('enable.auto.commit', 'false') plus calling $consumer->commit($message) after a successful consume guarantees this. In a production environment with multiple partitions, it's best to use a consumer group. I have four partitions here and started two consumers in the same group, so each consumer consumes two partitions. Ideally, run as many consumers as there are partitions. Also, try not to add or remove consumers within a group: that triggers a rebalance, which hurts performance badly. Run output:
```shell
# consumer1
$ php ./UserLoginHighConsumer.php
Waiting for partition assignment... (make take some time when quickly re-joining the group after leaving it.)
Assign: array(2) {
  [0]=> object(RdKafka\TopicPartition)#4 (3) {
    ["topic"]=> string(9) "userLogin"
    ["partition"]=> int(2)
    ["offset"]=> int(-1001)
  }
  [1]=> object(RdKafka\TopicPartition)#5 (3) {
    ["topic"]=> string(9) "userLogin"
    ["partition"]=> int(3)
    ["offset"]=> int(-1001)
  }
}
[ 2021-01-07 16:05:45 ] {"user_id":74,"login_area":"Shanghai","login_time":1610006745} consume successful
[ 2021-01-07 16:05:47 ] {"user_id":74,"login_area":"Shanghai","login_time":1610006747} consume successful

# consumer2
$ php ./UserLoginHighConsumer.php
Waiting for partition assignment... (make take some time when quickly re-joining the group after leaving it.)
Assign: array(2) {
  [0]=> object(RdKafka\TopicPartition)#4 (3) {
    ["topic"]=> string(9) "userLogin"
    ["partition"]=> int(0)
    ["offset"]=> int(-1001)
  }
  [1]=> object(RdKafka\TopicPartition)#5 (3) {
    ["topic"]=> string(9) "userLogin"
    ["partition"]=> int(1)
    ["offset"]=> int(-1001)
  }
}
[ 2021-01-07 16:05:46 ] {"user_id":74,"login_area":"Shanghai","login_time":1610006746} consume successful
[ 2021-01-07 16:05:48 ] {"user_id":74,"login_area":"Shanghai","login_time":1610006748} consume successful
[ 2021-01-07 16:05:49 ] {"user_id":74,"login_area":"Shanghai","login_time":1610006749} consume successful
```
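As noted in the producer section, order-sensitive messages should always be routed to a fixed partition. One common way to do that is to hash a key such as the user ID. This is a sketch of the idea only: partitionFor is a made-up helper, and librdkafka's built-in partitioners use their own hashing, so this simple crc32 version will not match their placement.

```php
<?php
// Deterministically map a message key to one of $partitionCount partitions,
// so that all messages for the same key land on the same partition and
// therefore keep their relative order.
function partitionFor(string $key, int $partitionCount): int
{
    return crc32($key) % $partitionCount;
}

// All events for user 74 go to the same partition. Pass this value as the
// first argument of $topic->produce() instead of RD_KAFKA_PARTITION_UA.
$partition = partitionFor('74', 4);
```

Unlike random_int(0, 3), this still spreads load across partitions but guarantees per-user ordering, since each user's messages always land on one partition.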
Reference: 《Kafka核心技术与实战》 (Kafka Core Technology and Practice)
