Kafka Tutorial Lecture Notes

Course link: 分布式订阅消息系统Kafka架构 - 网易云课堂 (163.com)

Code and documents: 分布式发布订阅消息系统 Kafka架构设计, link: https://pan.baidu.com/s/1VyLF1OP9r2-_rWlM7WzQXQ

Extraction code: pdsx

Notes: https://files.cnblogs.com/files/henuliulei/Kafka%E8%AF%BE%E7%A8%8B%E8%AE%B2%E4%B9%89.zip

Chapter 1 Getting to Know Kafka

 

Tips: after this chapter you will understand Kafka's basic principles,

know the key terms and concepts well enough to build messaging systems with Kafka,

and be able to send and receive Kafka messages from Java.

    Kafka was originally developed at LinkedIn in Scala as a multi-partition, multi-replica distributed messaging system coordinated by ZooKeeper; it has since been donated to the Apache Software Foundation. Today Kafka positions itself as a distributed streaming platform and is widely used for its high throughput, durable storage, horizontal scalability, and stream-processing support.
    Apache Kafka is a distributed publish-subscribe messaging system that can carry massive volumes of data and is widely used in both offline and real-time message-processing systems. Kafka persists messages to disk and replicates them so that data is not lost, keeping processing fast while also guaranteeing low latency and zero data loss.

 

Features

1) High throughput and low latency: Kafka can handle hundreds of thousands of messages per second with latencies as low as a few milliseconds; each topic can be split into multiple partitions, and a consumer group consumes the partitions in parallel.
2) Scalability: a Kafka cluster can be scaled out while it is running.
3) Durability and reliability: messages are persisted to local disk, and replication protects against data loss.
4) Fault tolerance: the cluster tolerates node failures (with a replication factor of n, up to n-1 nodes may fail).
5) High concurrency: thousands of clients can read and write at the same time.
Use cases
1) Log collection: a company can use Kafka to collect the logs of all kinds of services and expose them through a single, uniform interface to various consumers such as Hadoop, HBase, or Solr.
2) Messaging: decoupling producers from consumers, buffering messages, and so on.
3) User-activity tracking: Kafka is often used to record the activities of web or app users, such as page views, searches, and clicks. Servers publish these events to Kafka topics, and subscribers consume the topics for real-time monitoring and analysis, or load them into Hadoop or a data warehouse for offline analysis and mining.
4) Operational metrics: Kafka is also used for operational monitoring data, collecting metrics from distributed applications and producing centralized feeds of operational data such as alerts and reports.
5) Stream processing: for example with Spark Streaming or Storm.

Technical advantages

Scalability: two properties of Kafka account for its scalability.
1. A Kafka cluster can be expanded or shrunk while it is running (brokers can be added or removed) without downtime.
2. A Kafka topic can be expanded to contain more partitions. Because a single partition cannot span multiple brokers, its capacity is limited by the disk space of one broker; being able to add partitions and brokers means there is effectively no limit to how much data a single topic can store.
Fault tolerance and reliability: Kafka is designed so that the failure of one broker is detected by the other brokers in the cluster. Because every topic can be replicated on multiple brokers, the cluster can recover from such failures and keep running without interrupting service.
Throughput: brokers store and retrieve data extremely efficiently and at very high speed.
Target audience
This tutorial is aimed at professionals who want to build their career around the Apache Kafka messaging system or big-data analytics; it gives you enough understanding to work with a Kafka cluster.
Course highlights
- broad coverage of the subject;
- in-depth coverage;
- an easy-to-hard teaching progression;
- comprehensive case studies;
- suitable for anyone who wants to learn Kafka, whatever their background.

Apache official site: http://apache.org
Kafka official site: http://kafka.apache.org

1.1 Concepts in Detail

Kafka consumers pull messages. The drawback of pulling is that the consumer must keep polling the queue to see whether new messages have arrived, which wastes some resources; the advantage is that every consumer fetches at a rate that matches its own processing capacity, so that capacity can be used to the full. Having the queue push messages to consumers avoids polling, but a consumer whose capacity is too low can be overwhelmed and crash, while a consumer whose capacity exceeds the push rate is under-utilized and overall performance drops. Kafka chooses the pull model. A consumer group is most efficient when the number of consumers equals the number of partitions, and the number of partitions of a topic should be kept no larger than the number of brokers, so that each partition can sit on a different broker.

Automatic offset commits can cause duplicate consumption, because offsets are only committed at a fixed interval and a failure between two commits replays messages. Synchronous commits block the consumer and therefore cost time, but a failed commit can be retried, which makes them safer. Asynchronous commits have no retry mechanism: retrying a failed commit could let an older offset overwrite a newer one committed by a neighbouring asynchronous commit, which again causes duplicate consumption. Asynchronous commits therefore suit high-volume scenarios.
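For reference, here is a minimal sketch of the consumer settings this trade-off involves (the property names are Kafka's own; the values are only examples):

Properties props = new Properties();
// Automatic commit (the default): offsets are committed in the background every
// auto.commit.interval.ms, so a crash between two commits can replay messages.
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, true);
props.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, 5000);
// Manual commit: disable auto-commit, then call consumer.commitSync() (blocking,
// retryable, safer) or consumer.commitAsync() (non-blocking, no retries)
// after the polled records have been processed.
// props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);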

 


 

 

 

Producer
The producer is the publisher of data: it publishes messages to a Kafka topic. When a broker receives a message from a producer, it appends the message to the segment file currently used for appending data. A message is stored in a single partition, and the producer may also specify which partition the data should be stored in.

Consumer
Consumers read data from brokers. A consumer can consume data from multiple topics.
Topic
Kafka uses a category attribute to classify data; that category is called a topic. If you think of Kafka as a database, a topic is a table and the topic name is the table name.

Partition
The data of a topic is split into one or more partitions, and every topic has at least one partition. The data of each partition is stored in multiple segment files. Data is ordered within a partition, but ordering is lost between partitions, so if a topic has multiple partitions the overall consumption order cannot be guaranteed. When a strict consumption order is required, set the number of partitions to 1.
Partition offset
Every message carries a 64-bit offset that is unique within its partition and marks where the message starts.
Replicas of partition
A replica is a backup of a partition. Replicas are never consumed; they exist only to prevent data loss. Consumers do not read from follower partitions, only from the leader partition. Replicas form a one-leader, many-followers relationship.
Broker

A Kafka cluster contains one or more servers; each server node is called a broker. Brokers store the topic data. If a topic has N partitions and the cluster has N brokers, each broker stores one partition of that topic. If a topic has N partitions and the cluster has N+M brokers, then N brokers each store one partition and the remaining M brokers store no partitions of that topic. If a topic has N partitions but the cluster has fewer than N brokers, some brokers store more than one partition of that topic. This last situation should be avoided in production, because it easily leads to an unbalanced Kafka cluster.
Leader
Every partition has multiple replicas, of which exactly one is the leader. The leader is the partition replica that currently handles all reads and writes.
Follower
Followers follow the leader: all write requests are routed through the leader, every change is broadcast to the followers, and the followers keep their data in sync with the leader. If the leader fails, a new leader is elected from among the followers. When a follower crashes, gets stuck, or falls too far behind, the leader removes it from the "in sync replicas" (ISR) list and a new follower is created.
Zookeeper
ZooKeeper maintains and coordinates the brokers. When a broker is added to the Kafka cluster or an existing broker fails, ZooKeeper notifies the producers and consumers, which then use the broker state kept in ZooKeeper to coordinate publishing and subscribing.
AR (Assigned Replicas)
All replicas of a partition are collectively called the AR.
ISR (In-Sync Replicas)
All replicas that keep up with the leader to a sufficient degree (including the leader replica itself) form the ISR.
OSR (Out-of-Sync Replicas)
Replicas that lag too far behind the leader replica.
HW (High Watermark)
The high watermark marks a specific offset; consumers can only pull messages up to, but not including, this offset.
LEO (Log End Offset)
The log end offset records the offset of the next message to be written to the replica's underlying log. Note that it is the next message: if LEO = 10, the replica holds 10 messages with offsets in the range [0, 9].
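To tie these terms together, this is roughly how a topic with several partitions and replicas is created and inspected with the scripts shipped with Kafka 2.0 (host, partition and replica counts are placeholders):

# Create a topic named "heima" with 2 partitions and 1 replica per partition
bin/kafka-topics.sh --zookeeper localhost:2181 --create \
    --topic heima --partitions 2 --replication-factor 1

# Describe it: the output lists each partition's leader, replicas (AR) and ISR
bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic heima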

1.2 Installation and Configuration

1.2.1 Java environment

The demonstrations use a Linux system; either CentOS in a virtual machine or an Ubuntu system mounted under Windows 10 works.
First install a Java environment and configure the environment variables, as follows:
Download the JDK from the official site:
https://www.oracle.com/technetwork/java/javase/downloads/jdk12-downloads-5295953.html

Unpack it into a directory of your choice.
Configure the environment variables:
add the following variables to /etc/profile.
Test the JDK.
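A typical set of entries, assuming the JDK was unpacked under /usr/local (the exact path is a placeholder):

# Append to /etc/profile, then run: source /etc/profile
export JAVA_HOME=/usr/local/jdk-12.0.1     # wherever the JDK was unpacked
export PATH=$PATH:$JAVA_HOME/bin

# Test the JDK
java -version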

1.2.2 Installing ZooKeeper
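A minimal standalone installation sketch (version numbers and paths are placeholders):

# Download and unpack ZooKeeper
tar -zxvf zookeeper-3.4.14.tar.gz -C /usr/local
cd /usr/local/zookeeper-3.4.14

# Create a configuration from the sample and adjust dataDir/clientPort as needed
cp conf/zoo_sample.cfg conf/zoo.cfg

# Start the server and check its status
bin/zkServer.sh start
bin/zkServer.sh status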

 

 

 

1.2.3 Installing and Configuring Kafka
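A minimal single-broker installation sketch (version and paths are placeholders; the configuration entries are covered in section 1.4):

# Download and unpack Kafka (Scala 2.11 build of Kafka 2.0.0, matching the Maven dependencies)
tar -zxvf kafka_2.11-2.0.0.tgz -C /usr/local
cd /usr/local/kafka_2.11-2.0.0

# Edit config/server.properties: at a minimum broker.id, listeners/advertised.listeners,
# log.dirs and zookeeper.connect

# Start the broker (add -daemon to run it in the background)
bin/kafka-server-start.sh config/server.properties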

 

 

 


1.2.4 Testing Message Production and Consumption
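With ZooKeeper and the broker running, the bundled console tools can verify the installation; a sketch (topic name and addresses are placeholders that match the Java examples below):

# Create the test topic used by the Java examples
bin/kafka-topics.sh --zookeeper localhost:2181 --create \
    --topic heima --partitions 1 --replication-factor 1

# Terminal 1: console consumer, prints every message it receives
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic heima --from-beginning

# Terminal 2: console producer, every line typed is sent as one message
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic heima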

 

 

 1.3 A First Java Program

This section demonstrates sending and receiving Kafka messages from a Java program.
1.3.1 Preparation

We use the Java client that ships with Kafka to demonstrate sending and receiving messages. The Maven dependencies related to Kafka's Java client are:

<properties>
<scala.version>2.11</scala.version>
<slf4j.version>1.7.21</slf4j.version>
<kafka.version>2.0.0</kafka.version>
<lombok.version>1.18.8</lombok.version>
<junit.version>4.11</junit.version>
<gson.version>2.2.4</gson.version>
<protobuff.version>1.5.4</protobuff.version>
<spark.version>2.3.1</spark.version>
</properties>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>${kafka.version}</version>
</dependency>

The complete set of dependencies used by all the code in this course:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.heima.kafka</groupId>
    <artifactId>kafka_book_demo</artifactId>
    <version>1.0-SNAPSHOT</version>

    <description>
        Kafka学习
    </description>

    <properties>
        <scala.version>2.11</scala.version>
        <slf4j.version>1.7.21</slf4j.version>
        <kafka.version>2.0.0</kafka.version>
        <lombok.version>1.18.8</lombok.version>
        <junit.version>4.11</junit.version>
        <gson.version>2.2.4</gson.version>
        <protobuff.version>1.5.4</protobuff.version>
        <spark.version>2.3.1</spark.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka-clients</artifactId>
            <version>${kafka.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka_${scala.version}</artifactId>
            <version>${kafka.version}</version>
            <exclusions>
                <exclusion>
                    <groupId>org.apache.zookeeper</groupId>
                    <artifactId>zookeeper</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-log4j12</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>log4j</groupId>
                    <artifactId>log4j</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
            <version>${slf4j.version}</version>
        </dependency>

        <dependency>
            <groupId>com.google.code.gson</groupId>
            <artifactId>gson</artifactId>
            <version>${gson.version}</version>
        </dependency>

        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>${junit.version}</version>
            <scope>test</scope>
        </dependency>

        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <version>${lombok.version}</version>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>io.protostuff</groupId>
            <artifactId>protostuff-core</artifactId>
            <version>${protobuff.version}</version>
        </dependency>

        <dependency>
            <groupId>io.protostuff</groupId>
            <artifactId>protostuff-runtime</artifactId>
            <version>${protobuff.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka-streams</artifactId>
            <version>${kafka.version}</version>
            <exclusions>
                <exclusion>
                    <groupId>org.apache.zookeeper</groupId>
                    <artifactId>zookeeper</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-log4j12</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>log4j</groupId>
                    <artifactId>log4j</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-core</artifactId>
            <version>2.9.4</version>
        </dependency>

        <dependency>
            <groupId>com.fasterxml.jackson.module</groupId>
            <artifactId>jackson-module-scala_2.11</artifactId>
            <version>2.9.5</version>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.version}</artifactId>
            <version>${spark.version}</version>
            <exclusions>
                <exclusion>
                    <groupId>org.apache.zookeeper</groupId>
                    <artifactId>zookeeper</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-log4j12</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>log4j</groupId>
                    <artifactId>log4j</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-kafka-0-10_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql-kafka-0-10_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>

        <!-- https://mvnrepository.com/artifact/com.alibaba/fastjson -->
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.58</version>
        </dependency>
        <dependency>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.8.1</version>
        </dependency>


<!--        <dependency>-->
<!--            <groupId>org.apache.storm</groupId>-->
<!--            <artifactId>storm-core</artifactId>-->
<!--            <version>1.1.3</version>-->
<!--        </dependency>-->
<!--        <dependency>-->
<!--            <groupId>org.apache.storm</groupId>-->
<!--            <artifactId>storm-kafka</artifactId>-->
<!--            <version>1.1.3</version>-->
<!--        </dependency>-->

    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
        </plugins>
    </build>

</project>

 

1.3.2 Creating a producer

See the code repo: com.heima.kafka.chapter1.ProducerFastStart

/**
 * Kafka message producer
 */
public class ProducerFastStart {
    // Kafka cluster address
    private static final String brokerList = "localhost:9092";
    // Topic name (created beforehand)
    private static final String topic = "heima";

    public static void main(String[] args) {
        Properties properties = new Properties();
        // Key serializer
        properties.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // Alternative form:
        // properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Number of retries
        properties.put(ProducerConfig.RETRIES_CONFIG, 10);
        // Value serializer
        properties.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // Cluster address
        properties.put("bootstrap.servers", brokerList);
        // KafkaProducer is thread-safe
        KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
        ProducerRecord<String, String> record = new ProducerRecord<>(topic,
                "Kafka-demo-001", "hello, Kafka!");
        try {
            producer.send(record);
        } catch (Exception e) {
            e.printStackTrace();
        }
        producer.close();
    }
}

1.3.3 Creating a consumer

See the code repo: com.heima.kafka.chapter1.ConsumerFastStart

/**
 * Kafka message consumer
 */
public class ConsumerFastStart {
    // Kafka cluster address
    private static final String brokerList = "127.0.0.1:9092";
    // Topic name (created beforehand)
    private static final String topic = "heima";
    // Consumer group
    private static final String groupId = "group.demo";

    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        properties.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        properties.put("bootstrap.servers", brokerList);
        properties.put("group.id", groupId);
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties);
        consumer.subscribe(Collections.singletonList(topic));
        while (true) {
            ConsumerRecords<String, String> records =
                    consumer.poll(Duration.ofMillis(1000));
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(record.value());
            }
        }
    }
}

1.3.4 Results

Warning: when connecting from Java to a Kafka cluster running on Linux, you need to add a hosts binding for the broker.
Start the consumer first, then start the producer to send messages.
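The hosts binding means mapping the host name the broker advertises to its IP address on the machine that runs the Java client, for example (IP and host name are placeholders):

# /etc/hosts on Linux/macOS, or C:\Windows\System32\drivers\etc\hosts on Windows
192.168.1.100   kafka-node1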

 

 1.4 Common Server-Side Configuration Parameters

Configuration file: config/server.properties
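The most commonly adjusted entries look roughly like this (the values are illustrative defaults, not recommendations):

# Unique id of this broker within the cluster
broker.id=0
# Address the broker listens on, and the address advertised to clients
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://<public-ip>:9092
# Directory where the log segments (the actual message data) are stored
log.dirs=/tmp/kafka-logs
# Default partition count for automatically created topics
num.partitions=1
# How long messages are retained before deletion
log.retention.hours=168
# ZooKeeper connection string
zookeeper.connect=localhost:2181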

ConsumerFastStart

package com.heima.kafka.chapter1;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

/**
 * Kafka 消息消费者
 */
public class ConsumerFastStart {
    // Kafka集群地址
    private static final String brokerList = "8.131.224.65:9092";   //kafka集群对外开放端口,服务器保证该端口外部可访问,同时配置文件要指定当前节点的外网ip/port
    // 主题名称-之前已经创建
    private static final String topic = "heima";
    // 消费组
    private static
    final String groupId = "group.demo";  //随便取

    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        properties.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        properties.put("bootstrap.servers", brokerList);
        properties.put("group.id", groupId);

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties);
        consumer.subscribe(Collections.singletonList(topic));

        while (true) {
            ConsumerRecords<String, String> records =
                    consumer.poll(Duration.ofMillis(1000));
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(record.value());
            }
        }
    }
}

ProducerFastStart

package com.heima.kafka.chapter1;

import org.apache.kafka.clients.producer.*;

import java.util.Properties;

/**
 * Kafka 消息生产者
 */
public class ProducerFastStart {
    // Kafka集群地址
    private static final String brokerList = "8.131.224.65:9092";
    // 主题名称-之前已经创建
    private static final String topic = "heima";

    public static void main(String[] args) {
        Properties properties = new Properties();
        // 设置key序列化器
        properties.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        //另外一种写法,抛弃了
        //properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // 设置重试次数
        properties.put(ProducerConfig.RETRIES_CONFIG, 10);
        // 设置值序列化器
        properties.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // 设置集群地址
        properties.put("bootstrap.servers", brokerList);
        // KafkaProducer 线程安全
        KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
        ProducerRecord<String, String> record = new ProducerRecord<>(topic, "Kafka-demo-001", "hello, Kafka!");
        try {
            producer.send(record, new Callback() {
                @Override
                public void onCompletion(RecordMetadata recordMetadata, Exception e) {
                    if(e==null){
                        System.out.println(recordMetadata.partition() + ":" + recordMetadata.offset());
                    }
                }
            });
            //RecordMetadata recordMetadata = producer.send(record).get();
            //System.out.println("part:" + recordMetadata.partition() + ";topic:" + recordMetadata.topic());
        } catch (Exception e) {
            e.printStackTrace();
        }
        producer.close();
    }
}

 

 

 Chapter 2 Producers in Detail

2.1 Sending Messages

2.1.1 How the Kafka Java client produces data

 

① First build a ProducerRecord object. It can carry the topic, partition, key, and value; the topic and value are mandatory, while the partition and key are optional.
② Call the send() method to send the message.
③ Because the message has to travel over the network, it must be serialized; the serializer converts the record's key and value objects into byte arrays.
④ The data then reaches the partitioner. If the ProducerRecord already specifies a partition, the partitioner does nothing and simply returns that partition; otherwise the partitioner chooses a partition, normally based on the key. Once the partition is chosen, the producer knows which topic and partition the record should be sent to.
⑤ The record is then added to a record batch; all messages in a batch go to the same topic and partition. A separate thread sends these record batches to the corresponding brokers.
⑥ If the broker receives the message successfully, the send has succeeded and the broker returns the record's metadata (topic, partition, and the offset of the record within the partition). If the send fails, the producer can retry or simply raise the exception.
The client dependency used here is <kafka.version>2.0.0</kafka.version>.
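To make the flow concrete, here is a minimal sketch of the three usual ways to call send(), assuming a producer and topic configured as in ProducerFastStart (section 1.3); it belongs inside that main method's try block:

ProducerRecord<String, String> record = new ProducerRecord<>(topic, "demo-key", "demo-value");

// 1. Fire-and-forget: send and ignore the result.
producer.send(record);

// 2. Synchronous: block on the returned Future until the broker responds.
RecordMetadata metadata = producer.send(record).get();
System.out.println(metadata.topic() + "-" + metadata.partition() + ":" + metadata.offset());

// 3. Asynchronous: register a Callback that runs when the response (or an error) arrives.
producer.send(record, (recordMetadata, exception) -> {
    if (exception != null) {
        exception.printStackTrace();
    } else {
        System.out.println(recordMetadata.partition() + ":" + recordMetadata.offset());
    }
});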

 

 

 

2.1.4 Serializers

Messages must be serialized before they can travel over the network; that is exactly what serializers do.
Kafka provides a default string serializer (org.apache.kafka.common.serialization.StringSerializer), as well as integer (IntegerSerializer) and byte-array (BytesSerializer) serializers. They all implement the org.apache.kafka.common.serialization.Serializer interface and cover most common use cases.

2.1.5 Custom Serializers

See the code repo: com.heima.kafka.chapter2.CompanySerializer. A producer that uses this custom serializer is in com.heima.kafka.chapter2.ProducerDefineSerializer.

/**
 * Custom serializer
 */
public class CompanySerializer implements Serializer<Company> {
    @Override
    public void configure(Map configs, boolean isKey) {
    }

    @Override
    public byte[] serialize(String topic, Company data) {
        if (data == null) {
            return null;
        }
        byte[] name, address;
        try {
            if (data.getName() != null) {
                name = data.getName().getBytes("UTF-8");
            } else {
                name = new byte[0];
            }
            if (data.getAddress() != null) {
                address = data.getAddress().getBytes("UTF-8");
            } else {
                address = new byte[0];
            }
            ByteBuffer buffer = ByteBuffer.allocate(4 + 4 + name.length + address.length);
            buffer.putInt(name.length);
            buffer.put(name);
            buffer.putInt(address.length);
            buffer.put(address);
            return buffer.array();
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
        return new byte[0];
    }

    @Override
    public void close() {
    }
}

2.1.6 Partitioners

Kafka has its own partitioning strategy. If no partition is specified, the default strategy is used: the partition is derived from the message key as hash(key) % numPartitions, so records with the same key always go to the same partition.
Analysis of the source of org.apache.kafka.clients.producer.internals.DefaultPartitioner:


public int partition(String topic, Object key, byte[] keyBytes, Object value,
                     byte[] valueBytes, Cluster cluster) {
    List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
    int numPartitions = partitions.size();
    if (keyBytes == null) {
        int nextValue = this.nextValue(topic);
        List<PartitionInfo> availablePartitions = cluster.availablePartitionsForTopic(topic);
        if (availablePartitions.size() > 0) {
            int part = Utils.toPositive(nextValue) % availablePartitions.size();
            return ((PartitionInfo) availablePartitions.get(part)).partition();
        } else {
            return Utils.toPositive(nextValue) % numPartitions;
        }
    } else {
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }
}
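If the default strategy does not fit, a custom partitioner can be plugged in by implementing the org.apache.kafka.clients.producer.Partitioner interface and registering it with the partitioner.class property. A minimal sketch (the class name DemoPartitioner and its round-robin fallback are only illustrative):

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.utils.Utils;

import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class DemoPartitioner implements Partitioner {
    private final AtomicInteger counter = new AtomicInteger(0);

    @Override
    public int partition(String topic, Object key, byte[] keyBytes, Object value,
                         byte[] valueBytes, Cluster cluster) {
        List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
        int numPartitions = partitions.size();
        if (keyBytes == null) {
            // No key: simple round-robin over all partitions.
            return Utils.toPositive(counter.getAndIncrement()) % numPartitions;
        }
        // With a key: the same murmur2 hashing the default partitioner uses.
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }

    @Override
    public void close() {
    }

    @Override
    public void configure(Map<String, ?> configs) {
    }
}

// Register it on the producer:
// properties.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, DemoPartitioner.class.getName());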

 

 

 

 

 

 

2.2 How Sending Works

Two threads cooperate while a message is being sent. The main thread wraps the business data in a ProducerRecord object and calls send(), which places the message into the RecordAccumulator (a message collector that acts as a buffer between the main thread and the Sender thread). The Sender thread builds requests from the accumulated messages and performs the actual network I/O: it takes messages out of the RecordAccumulator and sends them in batches. Note that KafkaProducer is thread-safe, so multiple threads can share a single KafkaProducer instance.

2.3 Other Producer Parameters
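Beyond the required settings, commonly tuned producer parameters include acks, retries, batch.size, linger.ms, buffer.memory, and max.request.size. A sketch of setting them on the Properties object (the values are only examples, not recommendations):

Properties properties = new Properties();
// acks: "0" (no acknowledgement), "1" (leader only, the default) or "all"/"-1" (all in-sync replicas)
properties.put(ProducerConfig.ACKS_CONFIG, "all");
// Retries for transient send failures
properties.put(ProducerConfig.RETRIES_CONFIG, 10);
// Maximum size in bytes of one record batch (per partition)
properties.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);
// How long to wait for more records before sending a not-yet-full batch
properties.put(ProducerConfig.LINGER_MS_CONFIG, 10);
// Total memory available to the RecordAccumulator buffer
properties.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 33554432);
// Upper bound on the size of a single request sent by the producer
properties.put(ProducerConfig.MAX_REQUEST_SIZE_CONFIG, 1048576);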

 

 

 

Summary

This chapter covered how to use the producer client and its overall architecture, including the configuration parameters, the ways of sending messages, serializers, partitioners, and interceptors. In practice Kafka already provides solid Java client support, which greatly improves development efficiency.

 

Chapter 3 Consumers in Detail

3.1 Basic Concepts

3.2 Receiving Messages

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 CheckOffsetAndCommit.java

package com.heima.kafka.chapter3;

import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.util.Arrays;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * 位移提交
 */
public class CheckOffsetAndCommit {
    public static final String brokerList = "localhost:9092";
    public static final String topic = "heima";
    public static final String groupId = "group.heima";
    private static AtomicBoolean running = new AtomicBoolean(true);

    public static Properties initConfig() {
        Properties props = new Properties();
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                StringDeserializer.class.getName());
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, brokerList);
        props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        // 手动提交开启
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
        return props;
    }

    public static void main(String[] args) {
        Properties props = initConfig();
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);


        TopicPartition tp = new TopicPartition(topic, 0);
        consumer.assign(Arrays.asList(tp));
        long lastConsumedOffset = -1;
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(1000);
            if (records.isEmpty()) {
                break;
            }
            List<ConsumerRecord<String, String>> partitionRecords = records.records(tp);
            lastConsumedOffset = partitionRecords.get(partitionRecords.size() - 1).offset();
            consumer.commitSync();//同步提交消费位移
        }
        System.out.println("comsumed offset is " + lastConsumedOffset);
        OffsetAndMetadata offsetAndMetadata = consumer.committed(tp);
        System.out.println("commited offset is " + offsetAndMetadata.offset());
        long posititon = consumer.position(tp);
        System.out.println("the offset of the next record is " + posititon);
    }
}

CommitSyncInRebalance

package com.heima.kafka.chapter3;

import com.heima.kafka.ConsumerClientConfig;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.TopicPartition;

import java.time.Duration;
import java.util.*;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * 再均衡监听器
 */
public class CommitSyncInRebalance extends ConsumerClientConfig {
    public static final AtomicBoolean isRunning = new AtomicBoolean(true);


    public static void main(String[] args) {
        Properties props = initConfig();
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);

        Map<TopicPartition, OffsetAndMetadata> currentOffsets = new HashMap<>();
        consumer.subscribe(Arrays.asList(topic), new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                // Commit synchronously before the rebalance to minimize duplicate consumption
                consumer.commitSync(currentOffsets);
            }

            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                //do nothing.
            }
        });

        try {
            while (isRunning.get()) {
                ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofMillis(1000));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println(record.offset() + ":" + record.value());
                    // 异步提交消费位移,在发生再均衡动作之前可以通过再均衡监听器的onPartitionsRevoked回调执行commitSync方法同步提交位移。
                    currentOffsets.put(new TopicPartition(record.topic(), record.partition()),
                            new OffsetAndMetadata(record.offset() + 1));
                }
                consumer.commitAsync(currentOffsets, null);
            }
        } finally {
            consumer.close();
        }

    }
}

CompanyDeserailizer

package com.heima.kafka.chapter3;

import com.heima.kafka.chapter2.Company;
import org.apache.kafka.common.errors.SerializationException;
import org.apache.kafka.common.serialization.Deserializer;

import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;
import java.util.Map;

public class CompanyDeserailizer implements Deserializer<Company> {
    public void configure(Map<String, ?> configs, boolean isKey) {
    }

    public Company deserialize(String topic, byte[] data) {
        if (data == null) {
            return null;
        }
        if (data.length < 8) {
            throw new SerializationException("Size of data received " +
                    "by DemoDeserializer is shorter than expected!");
        }
        ByteBuffer buffer = ByteBuffer.wrap(data);
        int nameLen, addressLen;
        String name, address;

        nameLen = buffer.getInt();
        byte[] nameBytes = new byte[nameLen];
        buffer.get(nameBytes);
        addressLen = buffer.getInt();
        byte[] addressBytes = new byte[addressLen];
        buffer.get(addressBytes);

        try {
            name = new String(nameBytes, "UTF-8");
            address = new String(addressBytes, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new SerializationException("Error occur when deserializing!");
        }

        return Company.builder().name(name).address(address).build();
    }

    public void close() {
    }
}

 

ConsumerInterceptorTTL

package com.heima.kafka.chapter3;

import org.apache.kafka.clients.consumer.ConsumerInterceptor;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * 消费者拦截器
 */
public class ConsumerInterceptorTTL implements ConsumerInterceptor<String, String> {
    private static final long EXPIRE_INTERVAL = 10 * 1000;

    @Override
    public ConsumerRecords<String, String> onConsume(ConsumerRecords<String, String> records) {
        System.out.println("before:" + records);
        long now = System.currentTimeMillis();
        Map<TopicPartition, List<ConsumerRecord<String, String>>> newRecords
                = new HashMap<>();
        for (TopicPartition tp : records.partitions()) {
            List<ConsumerRecord<String, String>> tpRecords = records.records(tp);
            List<ConsumerRecord<String, String>> newTpRecords = new ArrayList<>();
            for (ConsumerRecord<String, String> record : tpRecords) {
                if (now - record.timestamp() < EXPIRE_INTERVAL) {
                    newTpRecords.add(record);
                }
            }
            if (!newTpRecords.isEmpty()) {
                newRecords.put(tp, newTpRecords);
            }
        }
        return new ConsumerRecords<>(newRecords);
    }

    @Override
    public void onCommit(Map<TopicPartition, OffsetAndMetadata> offsets) {
        offsets.forEach((tp, offset) ->
                System.out.println(tp + ":" + offset.offset()));
    }

    @Override
    public void close() {
    }

    @Override
    public void configure(Map<String, ?> configs) {
    }
}

 

FirstMultiConsumerThreadDemo

package com.heima.kafka.chapter3;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;


public class FirstMultiConsumerThreadDemo {
    public static final String brokerList = "localhost:9092";
    public static final String topic = "topic-demo";
    public static final String groupId = "group.demo";

    public static Properties initConfig() {
        Properties props = new Properties();
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                StringDeserializer.class.getName());
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, brokerList);
        props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, true);
        return props;
    }

    public static void main(String[] args) {
        Properties props = initConfig();
        int consumerThreadNum = 4;
        for (int i = 0; i < consumerThreadNum; i++) {
            new KafkaConsumerThread(props, topic).start();
        }
    }

    public static class KafkaConsumerThread extends Thread {
        private KafkaConsumer<String, String> kafkaConsumer;

        public KafkaConsumerThread(Properties props, String topic) {
            this.kafkaConsumer = new KafkaConsumer<>(props);
            this.kafkaConsumer.subscribe(Arrays.asList(topic));
        }

        @Override
        public void run() {
            try {
                while (true) {
                    ConsumerRecords<String, String> records =
                            kafkaConsumer.poll(Duration.ofMillis(100));
                    for (ConsumerRecord<String, String> record : records) {
                        //process record.
                        System.out.println(record.value());
                    }
                }
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                kafkaConsumer.close();
            }
        }
    }
}

 

KafkaConsumerAnalysis

package com.heima.kafka.chapter3;

import lombok.extern.slf4j.Slf4j;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Kafka 消费者分析
 */
@Slf4j
public class KafkaConsumerAnalysis {
    public static final String brokerList = "8.131.224.65:9092";
    public static final String topic = "heima";
    public static final String groupId = "group.heima";
    public static final AtomicBoolean isRunning = new AtomicBoolean(true);

    public static Properties initConfig() {
        Properties props = new Properties();
        // 与KafkaProducer中设置保持一致
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        // 必填参数,该参数和KafkaProducer中的相同,制定连接Kafka集群所需的broker地址清单,可以设置一个或者多个
        props.put("bootstrap.servers", brokerList);
        // 消费者隶属于的消费组,默认为空,如果设置为空,则会抛出异常,这个参数要设置成具有一定业务含义的名称
        props.put("group.id", groupId);
        // 指定KafkaConsumer对应的客户端ID,默认为空,如果不设置KafkaConsumer会自动生成一个非空字符串
        props.put("client.id", "consumer.client.id.demo");

        // 指定消费者拦截器
        props.put(ConsumerConfig.INTERCEPTOR_CLASSES_CONFIG,ConsumerInterceptorTTL.class.getName());
        return props;
    }

    public static void main(String[] args) {
        Properties props = initConfig();
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList(topic));

        // 正则订阅主题
        //consumer.subscribe(Pattern.compile("heima*"));

        // 指定订阅的分区
        //consumer.assign(Arrays.asList(new TopicPartition("heima", 0)));

        try {
            while (isRunning.get()) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println("topic = " + record.topic() + ", partition = " + record.partition() + ", offset = " + record.offset());
                    System.out.println("key = " + record.key() + ", value = " + record.value());
                    //do something to process record.
                }
            }
        } catch (Exception e) {
            log.error("occur exception ", e);
        } finally {
            consumer.close();
        }
    }
}

 

 OffsetCommitAsyncCallback

package com.heima.kafka.chapter3;

import com.heima.kafka.ConsumerClientConfig;
import lombok.extern.slf4j.Slf4j;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.TopicPartition;

import java.util.Arrays;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * 异步提交
 */
@Slf4j
public class OffsetCommitAsyncCallback extends ConsumerClientConfig {

    private static AtomicBoolean running = new AtomicBoolean(true);

    public static void main(String[] args) {
        Properties props = initConfig();
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList(topic));

        try {
            while (running.get()) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    //do some logical processing.
                }
                // 异步回调
                consumer.commitAsync(new OffsetCommitCallback() {
                    @Override
                    public void onComplete(Map<TopicPartition, OffsetAndMetadata> offsets,
                                           Exception exception) {
                        if (exception == null) {
                            System.out.println(offsets);
                        } else {
                            log.error("fail to commit offsets {}", offsets, exception);
                        }
                    }
                });
            }
        } finally {
            consumer.close();
        }

        try {
            while (running.get()) {
                consumer.commitAsync();
            }
        } finally {
            try {
                consumer.commitSync();
            } finally {
                consumer.close();
            }
        }
    }
}

 

OffsetCommitSync

package com.heima.kafka.chapter3;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.util.Arrays;
import java.util.Properties;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * 同步提交
 */
public class OffsetCommitSync {
    public static final String brokerList = "localhost:9092";
    public static final String topic = "heima";
    public static final String groupId = "group.heima";
    private static AtomicBoolean running = new AtomicBoolean(true);

    public static Properties initConfig() {
        Properties props = new Properties();
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                StringDeserializer.class.getName());
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, brokerList);
        props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
        return props;
    }

    public static void main(String[] args) {
        Properties props = initConfig();
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList(topic));

        try {
            while (running.get()) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    //do some logical processing.
                }
                consumer.commitSync();
            }
        } finally {
            consumer.close();
        }
    }
}

 

OffsetCommitSyncBatch

package com.heima.kafka.chapter3;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.atomic.AtomicBoolean;


public class OffsetCommitSyncBatch {
    public static final String brokerList = "localhost:9092";
    public static final String topic = "heima";
    public static final String groupId = "group.heima";
    private static AtomicBoolean running = new AtomicBoolean(true);

    public static Properties initConfig() {
        Properties props = new Properties();
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                StringDeserializer.class.getName());
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, brokerList);
        props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
        return props;
    }

    public static void main(String[] args) {
        Properties props = initConfig();
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList(topic));

        final int minBatchSize = 200;
        List<ConsumerRecord> buffer = new ArrayList<>();
        while (running.get()) {
            ConsumerRecords<String, String> records = consumer.poll(1000);
            for (ConsumerRecord<String, String> record : records) {
                buffer.add(record);
            }
            if (buffer.size() >= minBatchSize) {
                //do some logical processing with buffer.
                consumer.commitSync();
                buffer.clear();
            }
        }
    }
}

 

OffsetCommitSyncPartition

package com.heima.kafka.chapter3;

import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.atomic.AtomicBoolean;


public class OffsetCommitSyncPartition {
    public static final String brokerList = "localhost:9092";
    public static final String topic = "heima";
    public static final String groupId = "group.heima";
    private static AtomicBoolean running = new AtomicBoolean(true);

    public static Properties initConfig() {
        Properties props = new Properties();
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                StringDeserializer.class.getName());
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, brokerList);
        props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
        return props;
    }

    public static void main(String[] args) {
        Properties props = initConfig();
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList(topic));

        try {
            while (running.get()) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (TopicPartition partition : records.partitions()) {
                    List<ConsumerRecord<String, String>> partitionRecords =
                            records.records(partition);
                    for (ConsumerRecord<String, String> record : partitionRecords) {
                        //do some logical processing.
                    }
                    long lastConsumedOffset = partitionRecords
                            .get(partitionRecords.size() - 1).offset();
                    consumer.commitSync(Collections.singletonMap(partition,
                            new OffsetAndMetadata(lastConsumedOffset + 1)));
                }
            }
        } finally {
            consumer.close();
        }
    }
}

 

OffsetCommitSyncSingle

package com.heima.kafka.chapter3;

import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.util.Arrays;
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * 带参数的同步位移提交
 */
public class OffsetCommitSyncSingle {
    public static final String brokerList = "localhost:9092";
    public static final String topic = "heima";
    public static final String groupId = "group.heima";
    private static AtomicBoolean running = new AtomicBoolean(true);

    public static Properties initConfig() {
        Properties props = new Properties();
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                StringDeserializer.class.getName());
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, brokerList);
        props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
        return props;
    }

    public static void main(String[] args) {
        Properties props = initConfig();
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList(topic));

        int i = 0;
        try {
            while (running.get()) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    //do some logical processing.
                    long offset = record.offset();
                    TopicPartition partition = new TopicPartition(record.topic(), record.partition());
                    consumer.commitSync(Collections.singletonMap(partition, new OffsetAndMetadata(offset + 1)));
                }
            }


//            TopicPartition tp1 = new TopicPartition(topic, 0);
//            TopicPartition tp2 = new TopicPartition(topic, 1);
//            TopicPartition tp3 = new TopicPartition(topic, 2);
//            TopicPartition tp4 = new TopicPartition(topic, 3);
//            System.out.println(consumer.committed(tp1) + " : " + consumer.position(tp1));
//            System.out.println(consumer.committed(tp2) + " : " + consumer.position(tp2));
//            System.out.println(consumer.committed(tp3) + " : " + consumer.position(tp3));
//            System.out.println(consumer.committed(tp4) + " : " + consumer.position(tp4));
        } finally {
            consumer.close();
        }
    }
}

 

ProducerFastStart

package com.heima.kafka.chapter3;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

/**
 * Kafka 消息生产者
 */
public class ProducerFastStart {
    // Kafka集群地址
    private static final String brokerList = "8.131.224.65:9092";
    // 主题名称-之前已经创建
    private static final String topic = "heima";

    public static void main(String[] args) {
        Properties properties = new Properties();
        // 设置key序列化器
        properties.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        //另外一种写法
        //properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // 设置重试次数
        properties.put(ProducerConfig.RETRIES_CONFIG, 10);
        // 设置值序列化器
        properties.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // 设置集群地址
        properties.put("bootstrap.servers", brokerList);
        // KafkaProducer 线程安全
        KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
        ProducerRecord<String, String> record = new ProducerRecord<>(topic, "Kafka-demo-001", "hello, Kafka!没超时");
        ProducerRecord<String, String> record2 = new ProducerRecord<>(topic, 0, System.currentTimeMillis() - 10 * 1000, "Kafka-demo-001", "hello, Kafka!->超时");

        try {
            producer.send(record);
            producer.send(record2);
            //RecordMetadata recordMetadata = producer.send(record).get();
            //System.out.println("part:" + recordMetadata.partition() + ";topic:" + recordMetadata.topic());
        } catch (Exception e) {
            e.printStackTrace();
        }
        producer.close();
    }
}

 

ProtostuffDeserializer

package com.heima.kafka.chapter3;

import com.heima.kafka.chapter2.Company;
import io.protostuff.ProtostuffIOUtil;
import io.protostuff.Schema;
import io.protostuff.runtime.RuntimeSchema;
import org.apache.kafka.common.serialization.Deserializer;

import java.util.Map;


public class ProtostuffDeserializer implements Deserializer<Company> {
    public void configure(Map<String, ?> configs, boolean isKey) {

    }

    public Company deserialize(String topic, byte[] data) {
        if (data == null) {
            return null;
        }
        Schema schema = RuntimeSchema.getSchema(Company.class);
        Company ans = new Company();
        ProtostuffIOUtil.mergeFrom(data, ans, schema);
        return ans;
    }

    public void close() {

    }
}

 

ProtostuffSerializer

package com.heima.kafka.chapter3;

import com.heima.kafka.chapter2.Company;
import io.protostuff.LinkedBuffer;
import io.protostuff.ProtostuffIOUtil;
import io.protostuff.Schema;
import io.protostuff.runtime.RuntimeSchema;
import org.apache.kafka.common.serialization.Serializer;

import java.util.Map;


public class ProtostuffSerializer implements Serializer<Company> {
    public void configure(Map<String, ?> configs, boolean isKey) {
    }

    public byte[] serialize(String topic, Company data) {
        if (data == null) {
            return null;
        }
        Schema schema = (Schema) RuntimeSchema.getSchema(data.getClass());
        LinkedBuffer buffer = LinkedBuffer.allocate(LinkedBuffer.DEFAULT_BUFFER_SIZE);
        byte[] protostuff = null;
        try {
            protostuff = ProtostuffIOUtil.toByteArray(data, schema, buffer);
        } catch (Exception e) {
            throw new IllegalStateException(e.getMessage(), e);
        } finally {
            buffer.clear();
        }
        return protostuff;
    }

    public void close() {
    }
}

 

SeekDemo

package com.heima.kafka.chapter3;

import com.heima.kafka.ConsumerClientConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;
import java.util.Set;

/**
 * 指定位移消费
 */
public class SeekDemo extends ConsumerClientConfig {


    public static void main(String[] args) {
        Properties props = initConfig();
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList(topic));
        // timeout参数设置多少合适?太短会使分区分配失败,太长又有可能造成一些不必要的等待
        consumer.poll(Duration.ofMillis(2000));
        // 获取消费者所分配到的分区
        Set<TopicPartition> assignment = consumer.assignment();
        System.out.println(assignment);
        for (TopicPartition tp : assignment) {
            // 参数partition表示分区,offset表示指定从分区的哪个位置开始消费
            consumer.seek(tp, 10);
        }
//        consumer.seek(new TopicPartition(topic,0),10);
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
            //consume the record.
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(record.offset() + ":" + record.value());
            }
        }
    }

}

 

SeekDemoAssignment

package com.heima.kafka.chapter3;

import com.heima.kafka.ConsumerClientConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

import java.time.Duration;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;

/**
 * 指定位移消费
 */
public class SeekDemoAssignment extends ConsumerClientConfig {


    public static void main(String[] args) {
        Properties props = initConfig();
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList(topic));
        long start = System.currentTimeMillis();
        Set<TopicPartition> assignment = new HashSet<>();
        while (assignment.size() == 0) {
            consumer.poll(Duration.ofMillis(100));
            assignment = consumer.assignment();
        }
        long end = System.currentTimeMillis();
        System.out.println(end - start);
        System.out.println(assignment);
        for (TopicPartition tp : assignment) {
            consumer.seek(tp, 10);
        }
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
            //consume the record.
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(record.offset() + ":" + record.value());
            }
        }
    }
}

 

SeekToEnd

package com.heima.kafka.chapter3;

import com.heima.kafka.ConsumerClientConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

import java.time.Duration;
import java.util.*;

/**
 * 指定位移开始消费  末尾
 */
public class SeekToEnd extends ConsumerClientConfig {


    public static void main(String[] args) {
        Properties props = initConfig();
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList(topic));
        Set<TopicPartition> assignment = new HashSet<>();
        while (assignment.size() == 0) {
            consumer.poll(Duration.ofMillis(100));
            assignment = consumer.assignment();
        }
        // 指定从分区末尾开始消费
        Map<TopicPartition, Long> offsets = consumer.endOffsets(assignment);
        for (TopicPartition tp : assignment) {
            //consumer.seek(tp, offsets.get(tp));
            consumer.seek(tp, offsets.get(tp) + 1);
        }
        System.out.println(assignment);
        System.out.println(offsets);

        while (true) {
            ConsumerRecords<String, String> records =
                    consumer.poll(Duration.ofMillis(1000));
            //consume the record.
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(record.offset() + ":" + record.value());
            }
        }

    }
}

 

StringDeserializer

package com.heima.kafka.chapter3;

import org.apache.kafka.common.errors.SerializationException;
import org.apache.kafka.common.serialization.Deserializer;

import java.io.UnsupportedEncodingException;
import java.util.Map;


public class StringDeserializer implements Deserializer<String> {
    private String encoding = "UTF8";

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        String propertyName = isKey ? "key.deserializer.encoding" : "value.deserializer.encoding";
        Object encodingValue = configs.get(propertyName);
        if (encodingValue == null)
            encodingValue = configs.get("deserializer.encoding");
        if (encodingValue instanceof String)
            encoding = (String) encodingValue;
    }

    @Override
    public String deserialize(String topic, byte[] data) {
        try {
            if (data == null)
                return null;
            else
                return new String(data, encoding);
        } catch (UnsupportedEncodingException e) {
            throw new SerializationException("Error when deserializing byte[] to string due to unsupported encoding " + encoding);
        }
    }

    @Override
    public void close() {
        // nothing to do
    }
}

ThirdMultiConsumerThreadDemo

package com.heima.kafka.chapter3;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;


public class ThirdMultiConsumerThreadDemo {
    public static final String brokerList = "localhost:9092";
    public static final String topic = "topic-demo";
    public static final String groupId = "group.demo";

    public static Properties initConfig() {
        Properties props = new Properties();
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                StringDeserializer.class.getName());
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, brokerList);
        props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, true);
        return props;
    }

    public static void main(String[] args) {
        Properties props = initConfig();
        KafkaConsumerThread consumerThread = new KafkaConsumerThread(props, topic,
                Runtime.getRuntime().availableProcessors());
        consumerThread.start();
    }

    public static class KafkaConsumerThread extends Thread {
        private KafkaConsumer<String, String> kafkaConsumer;
        private ExecutorService executorService;
        private int threadNumber;

        public KafkaConsumerThread(Properties props, String topic, int threadNumber) {
            kafkaConsumer = new KafkaConsumer<>(props);
            kafkaConsumer.subscribe(Collections.singletonList(topic));
            this.threadNumber = threadNumber;
            executorService = new ThreadPoolExecutor(threadNumber, threadNumber,
                    0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(1000),
                    new ThreadPoolExecutor.CallerRunsPolicy());
        }

        @Override
        public void run() {
            try {
                while (true) {
                    ConsumerRecords<String, String> records =
                            kafkaConsumer.poll(Duration.ofMillis(100));
                    if (!records.isEmpty()) {
                        executorService.submit(new RecordsHandler(records));
                    }
                }
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                kafkaConsumer.close();
            }
        }

    }

    public static class RecordsHandler implements Runnable {
        public final ConsumerRecords<String, String> records;

        public RecordsHandler(ConsumerRecords<String, String> records) {
            this.records = records;
        }

        @Override
        public void run() {
            // process the records
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(record.value());
            }
        }
    }
}

 

Summary

 

 

 

 

 

Avoiding duplicate consumption

 

 

 

 

 

 

 

Chapter 4: Topics

4.1 Topic management

 

 

 

 

4.2 Adding partitions

 

 

 

4.3 Partition replica assignment (overview only)

 

4.5 Using KafkaAdminClient

 

 

package com.heima.kafka.chapter4;

import org.apache.kafka.clients.admin.*;
import org.apache.kafka.common.config.ConfigResource;

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ExecutionException;

/**
 * Using KafkaAdminClient
 */
public class KafkaAdminConfigOperation {

    public static void main(String[] args) throws ExecutionException, InterruptedException {
        describeTopicConfig();
//        alterTopicConfig();
//        addTopicPartitions();
    }

    //Config(entries=[ConfigEntry(name=compression.type, value=producer, source=DEFAULT_CONFIG, isSensitive=false, isReadOnly=false, synonyms=[]), ConfigEntry(name=leader.replication.throttled.replicas, value=, source=DEFAULT_CONFIG, isSensitive=false, isReadOnly=false, synonyms=[]), ConfigEntry(name=message.downconversion.enable, value=true, source=DEFAULT_CONFIG, isSensitive=false, isReadOnly=false, synonyms=[]), ConfigEntry(name=min.insync.replicas, value=1, source=DEFAULT_CONFIG, isSensitive=false, isReadOnly=false, synonyms=[]), ConfigEntry(name=segment.jitter.ms, value=0, source=DEFAULT_CONFIG, isSensitive=false, isReadOnly=false, synonyms=[]), ConfigEntry(name=cleanup.policy, value=delete, source=DEFAULT_CONFIG, isSensitive=false, isReadOnly=false, synonyms=[]), ConfigEntry(name=flush.ms, value=9223372036854775807, source=DEFAULT_CONFIG, isSensitive=false, isReadOnly=false, synonyms=[]), ConfigEntry(name=follower.replication.throttled.replicas, value=, source=DEFAULT_CONFIG, isSensitive=false, isReadOnly=false, synonyms=[]), ConfigEntry(name=segment.bytes, value=1073741824, source=STATIC_BROKER_CONFIG, isSensitive=false, isReadOnly=false, synonyms=[]), ConfigEntry(name=retention.ms, value=604800000, source=DEFAULT_CONFIG, isSensitive=false, isReadOnly=false, synonyms=[]), ConfigEntry(name=flush.messages, value=9223372036854775807, source=DEFAULT_CONFIG, isSensitive=false, isReadOnly=false, synonyms=[]), ConfigEntry(name=message.format.version, value=2.0-IV1, source=DEFAULT_CONFIG, isSensitive=false, isReadOnly=false, synonyms=[]), ConfigEntry(name=file.delete.delay.ms, value=60000, source=DEFAULT_CONFIG, isSensitive=false, isReadOnly=false, synonyms=[]), ConfigEntry(name=max.message.bytes, value=1000012, source=DEFAULT_CONFIG, isSensitive=false, isReadOnly=false, synonyms=[]), ConfigEntry(name=min.compaction.lag.ms, value=0, source=DEFAULT_CONFIG, isSensitive=false, isReadOnly=false, synonyms=[]), ConfigEntry(name=message.timestamp.type, value=CreateTime, source=DEFAULT_CONFIG, isSensitive=false, isReadOnly=false, synonyms=[]), ConfigEntry(name=preallocate, value=false, source=DEFAULT_CONFIG, isSensitive=false, isReadOnly=false, synonyms=[]), ConfigEntry(name=min.cleanable.dirty.ratio, value=0.5, source=DEFAULT_CONFIG, isSensitive=false, isReadOnly=false, synonyms=[]), ConfigEntry(name=index.interval.bytes, value=4096, source=DEFAULT_CONFIG, isSensitive=false, isReadOnly=false, synonyms=[]), ConfigEntry(name=unclean.leader.election.enable, value=false, source=DEFAULT_CONFIG, isSensitive=false, isReadOnly=false, synonyms=[]), ConfigEntry(name=retention.bytes, value=-1, source=DEFAULT_CONFIG, isSensitive=false, isReadOnly=false, synonyms=[]), ConfigEntry(name=delete.retention.ms, value=86400000, source=DEFAULT_CONFIG, isSensitive=false, isReadOnly=false, synonyms=[]), ConfigEntry(name=segment.ms, value=604800000, source=DEFAULT_CONFIG, isSensitive=false, isReadOnly=false, synonyms=[]), ConfigEntry(name=message.timestamp.difference.max.ms, value=9223372036854775807, source=DEFAULT_CONFIG, isSensitive=false, isReadOnly=false, synonyms=[]), ConfigEntry(name=segment.index.bytes, value=10485760, source=DEFAULT_CONFIG, isSensitive=false, isReadOnly=false, synonyms=[])])
    public static void describeTopicConfig() throws ExecutionException,
            InterruptedException {
        String brokerList =  "8.131.224.65:9092";
        String topic = "heima";

        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, brokerList);
        props.put(AdminClientConfig.REQUEST_TIMEOUT_MS_CONFIG, 30000);
        AdminClient client = AdminClient.create(props);

        ConfigResource resource =
                new ConfigResource(ConfigResource.Type.TOPIC, topic);
        DescribeConfigsResult result =
                client.describeConfigs(Collections.singleton(resource));
        Config config = result.all().get().get(resource);
        System.out.println(config);
        client.close();
    }

    public static void alterTopicConfig() throws ExecutionException, InterruptedException {
        String brokerList =  "localhost:9092";
        String topic = "heima";

        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, brokerList);
        props.put(AdminClientConfig.REQUEST_TIMEOUT_MS_CONFIG, 30000);
        AdminClient client = AdminClient.create(props);

        ConfigResource resource =
                new ConfigResource(ConfigResource.Type.TOPIC, topic);
        ConfigEntry entry = new ConfigEntry("cleanup.policy", "compact");
        Config config = new Config(Collections.singleton(entry));
        Map<ConfigResource, Config> configs = new HashMap<>();
        configs.put(resource, config);
        AlterConfigsResult result = client.alterConfigs(configs);
        result.all().get();

        client.close();
    }

    public static void addTopicPartitions() throws ExecutionException, InterruptedException {
        String brokerList =  "localhost:9092";
        String topic = "heima";

        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, brokerList);
        props.put(AdminClientConfig.REQUEST_TIMEOUT_MS_CONFIG, 30000);
        AdminClient client = AdminClient.create(props);

        NewPartitions newPartitions = NewPartitions.increaseTo(5);
        Map<String, NewPartitions> newPartitionsMap = new HashMap<>();
        newPartitionsMap.put(topic, newPartitions);
        CreatePartitionsResult result = client.createPartitions(newPartitionsMap);
        result.all().get();

        client.close();
    }
}

 

Chapter 5: Partitions

5.1 The replica mechanism

 

 

 

 

 


5.2 Partition leader election

 

 

5.3 Partition reassignment

 

 

We now simulate three nodes on a single server; the three nodes differ only in their port numbers. The steps are as follows.

Copy the installation into three instances.

Modify the configuration of each instance:

broker.id must be globally unique, so the three nodes get different ids.

There is only one ZooKeeper here; if there were several, they would be listed in zookeeper.connect separated by commas.

Change the log location (log.dirs); each node needs its own directory.

Because the directories were copied as-is, delete the old logs folder first.

Change the listener's local IP to localhost and make sure the three nodes use different ports.

Now start the three nodes,

and the simulated cluster is up.

Open another terminal and create a topic with three partitions on kafka1, then raise it to four partitions; note that Kafka can only increase the partition count.
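The three configuration files end up looking roughly like the sketch below (this is only a sketch: the broker ids, ports and log paths are example values, adjust them to your own environment).

# kafka1/config/server.properties
broker.id=0
listeners=PLAINTEXT://localhost:9092
log.dirs=/tmp/kafka-logs-0
zookeeper.connect=localhost:2181

# kafka2/config/server.properties
broker.id=1
listeners=PLAINTEXT://localhost:9093
log.dirs=/tmp/kafka-logs-1
zookeeper.connect=localhost:2181

# kafka3/config/server.properties
broker.id=2
listeners=PLAINTEXT://localhost:9094
log.dirs=/tmp/kafka-logs-2
zookeeper.connect=localhost:2181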

 

 

 

You can see that node 2 is under more pressure because it holds two partitions. We now add one more node and would like it to take over a partition; however, a freshly created node receives no data by itself, so it has to be set up as follows.

Create the new node and configure it, following the same steps as above:

change the broker id, port and log location, clean the old logs, set the listener address, and start it.

The newly added node does not take on any work yet.

Reassign the partitions.

Create a reassign.json file.

Generate the reassignment plan:

bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --topics-to-move-json-file reassign.json --broker-list "0,1,2,3" --generate

Put the generated plan into result.json.

Execute the reassignment plan.

The reassignment is now complete.
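For reference, a minimal sketch of the files and the remaining commands (the topic name heima is an assumption; the broker list 0,1,2,3 is taken from the generate command above):

# reassign.json - the topics-to-move file fed to --generate
{"version":1,"topics":[{"topic":"heima"}]}

# copy the "Proposed partition reassignment configuration" printed by --generate into result.json, then:
bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file result.json --execute

# check whether the reassignment has finished
bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file result.json --verify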

 

 

 

5.4 Changing the replication factor

 

 

 

5.5 Partition assignment strategies

 

 Situations that trigger a rebalance:

A consumer is added to or removed from the consumer group, so the partitions it consumed have to be redistributed to the other consumers in the group;
The topics the consumer subscribes to change, for example when subscribing with a regular expression such as test-*: if a new topic test-user is created, all of its partitions are automatically assigned to the current consumers, which triggers a rebalance;
Partitions are added to a subscribed topic; the new partitions are assigned to the current consumers, which also triggers a rebalance.
        Kafka provides three main assignment strategies: Round Robin, Range and Sticky; Range is the default. The main differences are:

Round Robin: hands out all partitions to all consumers one by one, in a round-robin fashion;
Range: first works out how many partitions each consumer should get, then assigns consecutive ranges of partitions to the consumers in order;
Sticky: the newest of the three strategies, with two goals:
distribute partitions as evenly as possible, because both Round Robin and Range can leave a few consumers carrying too many partitions and therefore an uneven load;
when a rebalance does happen, additionally keep as many partitions as possible on the consumers that already owned them (for the consumers that are still alive).

The strategy is chosen on the consumer through partition.assignment.strategy; a minimal configuration sketch follows below.

Reference: Kafka再平衡机制详解 (爱宝贝丶, CSDN blog)
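A minimal sketch of switching a consumer from the default range assignment to round-robin (the broker address, group id and topic are placeholder values):

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.RoundRobinAssignor;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.util.Collections;
import java.util.Properties;

public class AssignorConfigDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "group.demo");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // the default is RangeAssignor; StickyAssignor can be plugged in the same way
        props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG, RoundRobinAssignor.class.getName());

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("heima"));
        consumer.close();
    }
}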

 

 

 

 KafkaAdminTopicOperation

package com.heima.kafka.chapter5;

import org.apache.kafka.clients.admin.*;

import java.util.*;
import java.util.concurrent.ExecutionException;

public class KafkaAdminTopicOperation {
    public static final String brokerList = "localhost:9092";
    public static final String topic = "topic-admin";

    public static void describeTopic(){
        String brokerList =  "localhost:9092";
        String topic = "heima";

        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, brokerList);
        props.put(AdminClientConfig.REQUEST_TIMEOUT_MS_CONFIG, 30000);
        AdminClient client = AdminClient.create(props);

        DescribeTopicsResult result = client.describeTopics(Collections.singleton(topic));
        try {
            Map<String, TopicDescription> descriptionMap =  result.all().get();
            System.out.println(descriptionMap.get(topic));
        } catch (InterruptedException | ExecutionException e) {
            e.printStackTrace();
        }
        client.close();
    }

    public static void createTopic() {
        String brokerList =  "localhost:9092";
        String topic = "topic-admin";

        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, brokerList);
        props.put(AdminClientConfig.REQUEST_TIMEOUT_MS_CONFIG, 30000);
        AdminClient client = AdminClient.create(props);

//        NewTopic newTopic = new NewTopic(topic, 4, (short) 1);

//        Map<String, String> configs = new HashMap<>();
//        configs.put("cleanup.policy", "compact");
//        newTopic.configs(configs);

        Map<Integer, List<Integer>> replicasAssignments = new HashMap<>();
        replicasAssignments.put(0, Arrays.asList(0));
        replicasAssignments.put(1, Arrays.asList(0));
        replicasAssignments.put(2, Arrays.asList(0));
        replicasAssignments.put(3, Arrays.asList(0));

        NewTopic newTopic = new NewTopic(topic, replicasAssignments);

        // Code listing 4-4: you can step into createTopics() from here
        CreateTopicsResult result = client.
                createTopics(Collections.singleton(newTopic));
        try {
            result.all().get();
        } catch (InterruptedException | ExecutionException e) {
            e.printStackTrace();
        }
        client.close();
    }

    public static void deleteTopic(){
        String brokerList =  "localhost:9092";
        String topic = "topic-admin";

        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, brokerList);
        props.put(AdminClientConfig.REQUEST_TIMEOUT_MS_CONFIG, 30000);
        AdminClient client = AdminClient.create(props);

        try {
            client.deleteTopics(Collections.singleton(topic)).all().get();
        } catch (InterruptedException | ExecutionException e) {
            e.printStackTrace();
        }
        client.close();
    }

    public static void main(String[] args) throws ExecutionException, InterruptedException {
//        createTopic();
        describeTopic();
//        deleteTopic();

//        Properties props = new Properties();
//        props.put(CommonClientConfigs.BOOTSTRAP_SERVERS_CONFIG, brokerList);
//        props.put(CommonClientConfigs.REQUEST_TIMEOUT_MS_CONFIG, 30000);
//        AdminClient client = AdminClient.create(props);
////        createTopic(client);
////        deleteTopic(client);
//
//        NewTopic newTopic = new NewTopic(topic, 4, (short) 3);
//        Map<String, String> configs = new HashMap<>();
//        configs.put("cleanup.policy", "compact");
//        newTopic.configs(configs);
//        createTopic(client,newTopic);
    }

//    public static void deleteTopic(AdminClient client) throws ExecutionException, InterruptedException {
//        DeleteTopicsResult result = client.deleteTopics(Arrays.asList(topic));
//        result.all().get();
//    }
//
//    public static void createTopic(AdminClient client, NewTopic newTopic) throws ExecutionException,
//            InterruptedException {
//        CreateTopicsResult result = client.createTopics(Collections.singleton(newTopic));
//        result.all().get();
//    }
//
//    public static void createTopic(AdminClient client) throws ExecutionException, InterruptedException {
//        // create a topic named topic-demo with 4 partitions and replication factor 1
//        NewTopic newTopic = new NewTopic(topic, 4, (short) 1);
//        CreateTopicsResult result = client.createTopics(Collections.singleton(newTopic));
//        result.all().get();
//    }
}

 

 

6Kafka存储

6.1 存储结构概述

6.2 日志索引

One way Kafka keeps lookups efficient is to split the data file into segments. For example, with 100 messages whose offsets run from 0 to 99, the data file could be split into 5 segments: the first covering offsets 0-19, the second 20-39, and so on, with each segment stored in its own data file named after the smallest offset it contains.
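Finding the segment that holds a given offset is then just a matter of picking the largest base offset that is not greater than the target. The little sketch below illustrates the idea with a TreeMap; it is an illustration of the lookup rule only, not Kafka's actual internals.

import java.util.TreeMap;

public class SegmentLookupDemo {
    public static void main(String[] args) {
        // base offset of each segment -> segment file name
        TreeMap<Long, String> segments = new TreeMap<>();
        segments.put(0L, "00000000000000000000.log");
        segments.put(20L, "00000000000000000020.log");
        segments.put(40L, "00000000000000000040.log");
        segments.put(60L, "00000000000000000060.log");
        segments.put(80L, "00000000000000000080.log");

        long targetOffset = 57L;
        // floorEntry returns the segment whose base offset is <= the target offset
        System.out.println("offset " + targetOffset + " is stored in "
                + segments.floorEntry(targetOffset).getValue());
    }
}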

 

 

 

6.3 Log cleanup

 


6.4 Advantages of disk-based storage

 

 

 

 

 

 

Chapter 7: Stability

7.1 Idempotence

 

7.2 Transactions

 

 

 

 

 

7.3 The controller

 

 

7.4 Reliability guarantees

 

 

 

7.5 Consistency guarantees

 

 

 

 

 

 

 

 

7.6 Scenarios that cause duplicate messages and their solutions

7.7 __consumer_offsets

 ConsumerOffsetsAnalysis

package com.heima.kafka.chapter7;

import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.*;

/**
 * @Author dayuan
 */
public class ConsumerOffsetsAnalysis {
    // Kafka cluster address
    private static final String brokerList = "127.0.0.1:9092";
    // topic name (created earlier)
    private static final String topic = "heima";
    // consumer group
    private static final String groupId = "group.demo";

    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.put("enable.auto.commit", "false");
        properties.put("auto.offset.reset", "latest");
        properties.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        properties.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        properties.put("bootstrap.servers", brokerList);
        properties.put("group.id", groupId);

        Consumer<byte[], byte[]> consumer = new KafkaConsumer<byte[], byte[]>(properties);
        consumer.subscribe(Collections.singletonList("__consumer_offsets"));

        while (true) {
            ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofMillis(100));
            Iterator<ConsumerRecord<byte[], byte[]>> iterator = records.iterator();
            Map<String, Integer> map = new HashMap<>();
            while (iterator.hasNext()) {
                ConsumerRecord<byte[], byte[]> record = iterator.next();
                if (record.key() == null) {
                    continue;
                }
                System.out.println("topic:" + record.topic() + ",partition:" + record.partition() + ",offset:" + record.offset());
            }
        }
    }
}

 

ProducerTransactionSend

package com.heima.kafka.chapter7;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.ProducerFencedException;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

/**
 * Using Kafka producer transactions
 */
public class ProducerTransactionSend {
    public static final String topic = "heima";
    public static final String brokerList = "localhost:9092";
    public static final String transactionId = "transactionId";

    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                StringSerializer.class.getName());
        properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                StringSerializer.class.getName());
        properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, brokerList);
        properties.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, transactionId);

        KafkaProducer<String, String> producer = new KafkaProducer<>(properties);

        producer.initTransactions();
        producer.beginTransaction();

        try {
            // business logic, then build the ProducerRecord
            ProducerRecord<String, String> record1 = new ProducerRecord<>(topic, "msg1");
            producer.send(record1);

            // simulate a failure so the transaction gets rolled back
            System.out.println(1/0);

            ProducerRecord<String, String> record2 = new ProducerRecord<>(topic, "msg2");
            producer.send(record2);
            ProducerRecord<String, String> record3 = new ProducerRecord<>(topic, "msg3");
            producer.send(record3);
            // some other processing logic
            producer.commitTransaction();
        } catch (ProducerFencedException e) {
            // a fenced producer can no longer abort the transaction; just report it
            e.printStackTrace();
        } catch (Exception e) {
            // any other failure (such as the 1/0 above) rolls the transaction back
            producer.abortTransaction();
        }
        producer.close();
    }

}

 

 

 

Chapter 8: Advanced Applications

8.1 Command-line tools

 

 

 

 

 

8.2 Data pipelines with Kafka Connect

 

 

 

 

 

 

 

 

8.4 Stream processing with Spark

 

 

 

 

 

8.5 Spring Boot and Kafka

 

 

 

package com.heima.kafka.chapter8;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

/**
 * Kafka message consumer
 */
public class ConsumerFastStart {
    // Kafka cluster address
    private static final String brokerList = "127.0.0.1:9092";
    // topic name (created earlier)
    private static final String topic = "heima";
    // consumer group
    private static final String groupId = "group.demo";

    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        properties.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        properties.put("bootstrap.servers", brokerList);
        properties.put("group.id", groupId);

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties);
        consumer.subscribe(Collections.singletonList(topic));

        while (true) {
            ConsumerRecords<String, String> records =
                    consumer.poll(Duration.ofMillis(1000));
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(record.value());
            }
        }
    }
}

 KafkaApplication

package com.heima.kafka;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.support.SendResult;
import org.springframework.transaction.annotation.Transactional;
import org.springframework.util.concurrent.ListenableFuture;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import java.util.concurrent.ExecutionException;

@RestController
@SpringBootApplication
@RequestMapping
class KafkaApplication {
    private static final Logger logger = LoggerFactory.getLogger(KafkaApplication.class);

    public static void main(String[] args) {
        SpringApplication.run(KafkaApplication.class, args);
    }

    @RequestMapping("/index")
    public String index() {


        return "Hello,Kafka!";
    }

    @Autowired
    private KafkaTemplate template;
    private static final String topic = "topic0703";

    /**
     * Send a message, with transaction support
     */
    @GetMapping("/send/{input}")
    public String sendToKafka(@PathVariable String input) throws ExecutionException, InterruptedException {
        ListenableFuture<SendResult<Object, Object>> send = this.template.send(topic, input);
        //return "send success";

        // variant that waits for the send result
        //SendResult<Object, Object> result = send.get();
        //return result.toString();

        // transactional send
        template.executeInTransaction(t -> {
            t.send(topic, input);
            if ("error".equals(input)) {
                throw new RuntimeException("input is error");
            }
            t.send(topic, input + " another");
            return true;
        });
        return "send success";
    }

    @GetMapping("/sendt/{input}")
    @Transactional(rollbackFor = RuntimeException.class)
    public String sendToKafka2(@PathVariable String input) throws ExecutionException, InterruptedException {
        template.send(topic, input);
        if ("error".equals(input)) {
            throw new RuntimeException("input is error");
        }
        template.send(topic, input + " another");
        return "send success";
    }

    /**
     * Receive messages
     */
    @KafkaListener(id = "", topics = topic, groupId = "group.demo")
    public void listener(String input) {
        logger.info("input value:{}", input);
    }
}

 

ReplyingKafkaTemplateTest

package com.heima.kafka;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.core.ProducerFactory;
import org.springframework.kafka.listener.ConcurrentMessageListenerContainer;
import org.springframework.kafka.requestreply.ReplyingKafkaTemplate;
import org.springframework.kafka.requestreply.RequestReplyFuture;
import org.springframework.transaction.annotation.Transactional;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import java.util.concurrent.ExecutionException;

/**
 * @Author dayuan
 */
@RestController
@SpringBootApplication
@RequestMapping
public class ReplyingKafkaTemplateTest {
    Logger logger = LoggerFactory.getLogger(ReplyingKafkaTemplateTest.class);

    private static final String topic = "topic0703";

    @Bean
    public ConcurrentMessageListenerContainer<String, String> repliesContainer(ConcurrentKafkaListenerContainerFactory<String, String> containerFactory) {
        ConcurrentMessageListenerContainer<String, String> repliesContainer = containerFactory.createContainer("replies");
        repliesContainer.getContainerProperties().setGroupId("repliesGroup");
        repliesContainer.setAutoStartup(false);
        return repliesContainer;
    }

    @Bean
    public ReplyingKafkaTemplate<String, String, String> replyingTemplate(ProducerFactory<String, String> pf, ConcurrentMessageListenerContainer<String, String> repliesContainer) {
        return new ReplyingKafkaTemplate(pf, repliesContainer);
    }

    @Bean
    public KafkaTemplate kafkaTemplate(ProducerFactory<String, String> pf) {
        return new KafkaTemplate(pf);
    }

    @Autowired
    private ReplyingKafkaTemplate template;


    @GetMapping("/rsend/{input}")
    @Transactional(rollbackFor = RuntimeException.class)
    public String send(@PathVariable String input) throws ExecutionException, InterruptedException {
        ProducerRecord<String, String> record = new ProducerRecord<>(topic, input);
        RequestReplyFuture<String, String, String> requestReplyFuture = template.sendAndReceive(record);
        ConsumerRecord<String, String> stringStringConsumerRecord = requestReplyFuture.get();
        logger.info(stringStringConsumerRecord.value());
        return stringStringConsumerRecord.value();
    }

}

 

application.properties

# Kafka config
spring.kafka.producer.bootstrap-servers=127.0.0.1:9092
spring.kafka.consumer.bootstrap-servers=127.0.0.1:9092
# transaction support
spring.kafka.producer.transaction-id-prefix=kafka_tx.

 

pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.1.6.RELEASE</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>
    <groupId>com.dayuan</groupId>
    <artifactId>kafka</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>kafka</name>
    <description>Demo project for Spring Boot</description>

    <properties>
        <java.version>12</java.version>
    </properties>

    <dependencies>
        <!-- SpringBoot -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.springframework.kafka/spring-kafka -->
        <dependency>
            <groupId>org.springframework.kafka</groupId>
            <artifactId>spring-kafka</artifactId>
            <version>2.2.6.RELEASE</version>
        </dependency>

    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>

</project>

 

 

 

Chapter 9: Cluster Management

 

 

9.1 Cluster use cases

 

 

 

9.2 Building a cluster

 

 

 

 

 

 

 

9.3 Multi-cluster synchronization

 

 

 

Chapter 10: Monitoring

10.1 Monitoring metrics

 

 

 

 

10.2 Broker metrics

 

 

 

10.3 Topic and partition monitoring

 

 

10.4 Producer metrics

 

 

 

10.5 Consumer metrics

 

 

10.6 Kafka Eagle

 

 JmxConnectionDemo

package com.heima.kafka.chapter10;

import javax.management.*;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;
import java.io.IOException;

/**
 * JMX Connection
 */
public class JmxConnectionDemo {
    private MBeanServerConnection conn;
    private String jmxURL;
    private String ipAndPort;

    public JmxConnectionDemo(String ipAndPort) {
        this.ipAndPort = ipAndPort;
    }

    public boolean init(){
        jmxURL = "service:jmx:rmi:///jndi/rmi://" + ipAndPort + "/jmxrmi";
        try {
            JMXServiceURL serviceURL = new JMXServiceURL(jmxURL);
            JMXConnector connector = JMXConnectorFactory
                    .connect(serviceURL, null);
            conn = connector.getMBeanServerConnection();
            if (conn == null) {
                return false;
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return true;
    }



    public double getMsgInPerSec() {
        String objectName = "kafka.server:type=BrokerTopicMetrics," +
                "name=MessagesInPerSec";
        Object val = getAttribute(objectName, "OneMinuteRate");
        if (val != null) {
            return (double) (Double) val;
        }
        return 0.0;
    }

    private Object getAttribute(String objName, String objAttr) {
        ObjectName objectName;
        try {
            objectName = new ObjectName(objName);
            return conn.getAttribute(objectName, objAttr);
        } catch (MalformedObjectNameException | IOException |
                ReflectionException | InstanceNotFoundException |
                AttributeNotFoundException | MBeanException e) {
            e.printStackTrace();
        }
        return null;
    }

    public static void main(String[] args) {
        JmxConnectionDemo jmxConnectionDemo =
                new JmxConnectionDemo("localhost:9999");
        jmxConnectionDemo.init();
        System.out.println(jmxConnectionDemo.getMsgInPerSec());
    }
}

 

KafkaConsumerGroupAnother.java

 

package com.heima.kafka.chapter10;

import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.JavaType;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.module.scala.DefaultScalaModule;
import kafka.admin.ConsumerGroupCommand;
import lombok.Builder;
import lombok.Data;

import java.io.IOException;
import java.util.List;
import java.util.Optional;


/**
 * Code listing 10-3
 * Created by 朱小厮 on 2018/10/22.
 */
public class KafkaConsumerGroupAnother {

    public static void main(String[] args) throws IOException {
        String[] agrs = {"--describe", "--bootstrap-server",
                "localhost:9092", "--group", "groupIdMonitor"};
        ConsumerGroupCommand.ConsumerGroupCommandOptions options =
                new ConsumerGroupCommand.ConsumerGroupCommandOptions(agrs);
        ConsumerGroupCommand.ConsumerGroupService kafkaConsumerGroupService =
                new ConsumerGroupCommand.ConsumerGroupService(options);

        ObjectMapper mapper = new ObjectMapper();
        //1. use the jackson-module-scala_2.11 module
        mapper.registerModule(new DefaultScalaModule());
        //2. ignore unknown properties when deserializing
        mapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES,
                false);
        //3. serialize the Scala object into a JSON string
        //(this works around the access restrictions on the Scala types)
        String source = mapper.writeValueAsString(kafkaConsumerGroupService.
                collectGroupOffsets()._2.get());
        //4. deserialize the JSON string into Java objects
        List<PartitionAssignmentStateAnother> target = mapper.readValue(source,
                getCollectionType(mapper,List.class,
                        PartitionAssignmentStateAnother.class));
        //5. sort by partition
        target.sort((o1, o2) -> o1.getPartition() - o2.getPartition());
        //6. print
        printPasList(target);
    }

    public static JavaType getCollectionType(ObjectMapper mapper,
                                             Class<?> collectionClass,
                                             Class<?>... elementClasses) {
        return mapper.getTypeFactory()
                .constructParametricType(collectionClass, elementClasses);
    }

    public static void printPasList(List<PartitionAssignmentStateAnother> list) {
        System.out.println(String.format("%-40s %-10s %-15s %-15s %-10s %-50s%-30s %s",
                "TOPIC", "PARTITION", "CURRENT-OFFSET", "LOG-END-OFFSET", "LAG", "CONSUMER-ID", "HOST", "CLIENT-ID"));
        list.forEach(item -> System.out.println(String.format("%-40s %-10s %-15s %-15s %-10s %-50s%-30s %s",
                item.getTopic(), item.getPartition(), item.getOffset(), item.getLogSize(), item.getLag(),
                Optional.ofNullable(item.getConsumerId()).orElse("-"),
                Optional.ofNullable(item.getHost()).orElse("-"),
                Optional.ofNullable(item.getClientId()).orElse("-"))));
    }
}

@Data
@Builder
class PartitionAssignmentStateAnother {
    private String group;
    private Node coordinator;
    private String topic;
    private int partition;
    private long offset;
    private long lag;
    private String consumerId;
    private String host;
    private String clientId;
    private long logSize;

    @Data
    public static class Node{
        public int id;
        public String idString;
        public String host;
        public int port;
        public String rack;
    }
}

 

KafkaConsumerGroupService.java

 

package com.heima.kafka.chapter10;

import lombok.Builder;
import lombok.Data;
import lombok.extern.slf4j.Slf4j;
import org.apache.kafka.clients.CommonClientConfigs;
import org.apache.kafka.clients.admin.*;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.Node;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.util.*;
import java.util.concurrent.ExecutionException;

import static java.util.Comparator.comparing;
import static java.util.stream.Collectors.toList;

@Slf4j
public class KafkaConsumerGroupService {
    private String brokerList;
    private AdminClient adminClient;
    private KafkaConsumer<String, String> kafkaConsumer;

    public KafkaConsumerGroupService(String brokerList) {
        this.brokerList = brokerList;
    }

    public void init(){
        Properties props = new Properties();
        props.put(CommonClientConfigs.BOOTSTRAP_SERVERS_CONFIG, brokerList);
        adminClient = AdminClient.create(props);
        kafkaConsumer = ConsumerGroupUtils.createNewConsumer(brokerList,
                "kafkaAdminClientDemoGroupId");
    }

    public void close(){
        if (adminClient != null) {
            adminClient.close();
        }
        if (kafkaConsumer != null) {
            kafkaConsumer.close();
        }
    }

    public List<PartitionAssignmentState> collectGroupAssignment(
            String group) throws ExecutionException, InterruptedException {
        DescribeConsumerGroupsResult groupResult = adminClient
                .describeConsumerGroups(Collections.singleton(group));
        ConsumerGroupDescription description =
                groupResult.all().get().get(group);

        List<TopicPartition> assignedTps = new ArrayList<>();
        List<PartitionAssignmentState> rowsWithConsumer = new ArrayList<>();
        Collection<MemberDescription> members = description.members();
        if (members != null) {
            ListConsumerGroupOffsetsResult offsetResult = adminClient
                    .listConsumerGroupOffsets(group);
            Map<TopicPartition, OffsetAndMetadata> offsets = offsetResult
                    .partitionsToOffsetAndMetadata().get();
            if (offsets != null && !offsets.isEmpty()) {
                String state = description.state().toString();
                if (state.equals("Stable")) {
                    rowsWithConsumer = getRowsWithConsumer(description, offsets,
                            members, assignedTps, group);
                }
            }
            List<PartitionAssignmentState> rowsWithoutConsumer =
                    getRowsWithoutConsumer(description, offsets,
                            assignedTps, group);
            rowsWithConsumer.addAll(rowsWithoutConsumer);
        }
        return rowsWithConsumer;
    }

    private List<PartitionAssignmentState> getRowsWithConsumer(
            ConsumerGroupDescription description,
            Map<TopicPartition, OffsetAndMetadata> offsets,
            Collection<MemberDescription> members,
            List<TopicPartition> assignedTps, String group) {
        List<PartitionAssignmentState> rowsWithConsumer = new ArrayList<>();
        for (MemberDescription member : members) {
            MemberAssignment assignment = member.assignment();
            if (assignment == null) {
                continue;
            }
            Set<TopicPartition> tpSet = assignment.topicPartitions();
            if (tpSet.isEmpty()) {
                rowsWithConsumer.add(PartitionAssignmentState.builder()
                        .group(group).coordinator(description.coordinator())
                        .consumerId(member.consumerId()).host(member.host())
                        .clientId(member.clientId()).build());

            } else {
                Map<TopicPartition, Long> logSizes =
                        kafkaConsumer.endOffsets(tpSet);
                assignedTps.addAll(tpSet);
                List<PartitionAssignmentState> tempList = tpSet.stream()
                        .sorted(comparing(TopicPartition::partition))
                        .map(tp -> getPasWithConsumer(logSizes, offsets, tp,
                                group, member, description)).collect(toList());
                rowsWithConsumer.addAll(tempList);
            }
        }
        return rowsWithConsumer;
    }

    private PartitionAssignmentState getPasWithConsumer(
            Map<TopicPartition, Long> logSizes,
            Map<TopicPartition, OffsetAndMetadata> offsets,
            TopicPartition tp, String group,
            MemberDescription member,
            ConsumerGroupDescription description) {
        long logSize = logSizes.get(tp);
        if (offsets.containsKey(tp)) {
            long offset = offsets.get(tp).offset();
            long lag = getLag(offset, logSize);
            return PartitionAssignmentState.builder().group(group)
                    .coordinator(description.coordinator()).lag(lag)
                    .topic(tp.topic()).partition(tp.partition())
                    .offset(offset).consumerId(member.consumerId())
                    .host(member.host()).clientId(member.clientId())
                    .logSize(logSize).build();
        }else {
            return PartitionAssignmentState.builder()
                    .group(group).coordinator(description.coordinator())
                    .topic(tp.topic()).partition(tp.partition())
                    .consumerId(member.consumerId()).host(member.host())
                    .clientId(member.clientId()).logSize(logSize).build();
        }
    }

    private static long getLag(long offset, long logSize) {
        long lag = logSize - offset;
        return lag < 0 ? 0 : lag;
    }

    private List<PartitionAssignmentState> getRowsWithoutConsumer(
            ConsumerGroupDescription description,
            Map<TopicPartition, OffsetAndMetadata> offsets,
            List<TopicPartition> assignedTps, String group) {
        Set<TopicPartition> tpSet = offsets.keySet();

        return tpSet.stream()
                .filter(tp -> !assignedTps.contains(tp))
                .map(tp -> {
                    long logSize = 0;
                    Long endOffset = kafkaConsumer.
                            endOffsets(Collections.singleton(tp)).get(tp);
                    if (endOffset != null) {
                        logSize = endOffset;
                    }
                    long offset = offsets.get(tp).offset();
                    return PartitionAssignmentState.builder().group(group)
                            .coordinator(description.coordinator())
                            .topic(tp.topic()).partition(tp.partition())
                            .logSize(logSize).lag(getLag(offset, logSize))
                            .offset(offset).build();
                }).sorted(comparing(PartitionAssignmentState::getPartition))
                .collect(toList());
    }

    public static void main(String[] args) throws ExecutionException,
            InterruptedException {
        KafkaConsumerGroupService service =
                new KafkaConsumerGroupService("localhost:9092");
        service.init();
        List<PartitionAssignmentState> list =
                service.collectGroupAssignment("groupIdMonitor");
        ConsumerGroupUtils.printPasList(list);
    }
}

class ConsumerGroupUtils{
    public static KafkaConsumer<String, String> createNewConsumer(
            String brokerUrl, String groupId) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, brokerUrl);
        props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                StringDeserializer.class.getName());
        return new KafkaConsumer<>(props);
    }

    public static void printPasList(List<PartitionAssignmentState> list) {
        System.out.println(String.format("%-40s %-10s %-15s %-15s %-10s" +
                        " %-50s%-30s %s", "TOPIC", "PARTITION",
                "CURRENT-OFFSET", "LOG-END-OFFSET", "LAG",
                "CONSUMER-ID", "HOST", "CLIENT-ID"));

        list.forEach(item ->
                System.out.println(String.format("%-40s %-10s %-15s " +
                                "%-15s %-10s %-50s%-30s %s",
                        item.getTopic(), item.getPartition(), item.getOffset(),
                        item.getLogSize(), item.getLag(),
                        Optional.ofNullable(item.getConsumerId()).orElse("-"),
                        Optional.ofNullable(item.getHost()).orElse("-"),
                        Optional.ofNullable(item.getClientId()).orElse("-"))));
    }
}

@Data
@Builder
class PartitionAssignmentState {
    private String group;
    private Node coordinator;
    private String topic;
    private int partition;
    private long offset;
    private long lag;
    private String consumerId;
    private String host;
    private String clientId;
    private long logSize;
}

TopicInfo

package com.heima.kafka;

import com.alibaba.fastjson.JSONObject;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * @Author dayuan
 * @Date 2019/7/11 11:28
 */
public class TopicInfo {

    public static final String brokerList = "localhost:9092";
    public static final String topic = "heima";
    public static final String groupId = "group.heima";
    private static AtomicBoolean running = new AtomicBoolean(true);

    public static Properties initConfig() {
        Properties props = new Properties();
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                StringDeserializer.class.getName());
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, brokerList);
        props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
        return props;
    }

    public static void main(String[] args) {
        Properties props = initConfig();
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList(topic));
        Map<String, List<PartitionInfo>> stringListMap = consumer.listTopics();
        System.out.println(JSONObject.toJSONString(stringListMap));
        //if(MapUtils.isNotEmpty(stringListMap)){
        //    Set<String> keys = stringListMap.keySet();
        //    for(String key:keys){
        //        List<PartitionInfo> partitionInfos = stringListMap.get(key);
        //        if(!partitionInfos.isEmpty()){
        //            for(PartitionInfo partitionInfo:partitionInfos){
        //                System.out.println("topic:"+partitionInfo.topic());
        //                System.out.println("partition:"+partitionInfo.partition());
        //            }
        //        }
        //    }
        //}

        TopicPartition tp = new TopicPartition(topic,0);
        long position = consumer.position(tp);
        System.out.println("the offset of the next record is " + position);
        System.out.println();
    }
}

 

KafkaContext

package com.heima.kafka;

/**
 * @Author dayuan
 * @Date 2019/7/11 14:40
 */
public abstract class KafkaContext {
    public static String brokerList = "localhost:9092";
    public static String topic = "heima";
    public static String groupId = "group.heima";
}

 

ConsumerClientConfig

package com.heima.kafka;

import org.apache.kafka.common.serialization.StringDeserializer;

import java.util.Properties;

/**
 * @Author dayuan
 * @Date 2019/7/11 14:46
 */
public class ConsumerClientConfig extends KafkaContext {

    public static Properties initConfig() {
        Properties props = new Properties();
        // key deserializer
        props.put(org.apache.kafka.clients.consumer.ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // value deserializer
        props.put(org.apache.kafka.clients.consumer.ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // list of Kafka cluster addresses
        props.put(org.apache.kafka.clients.consumer.ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, brokerList);
        // consumer group
        props.put(org.apache.kafka.clients.consumer.ConsumerConfig.GROUP_ID_CONFIG, groupId);
        // where to start consuming when no committed offset is found; default latest (the end), earliest means from the beginning
        // props.put(org.apache.kafka.clients.consumer.ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
        // whether to enable automatic offset commits
        props.put(org.apache.kafka.clients.consumer.ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, true);
        return props;
    }

}

 

SparkStreamingFromkafka

package com.heima.kafka.spark;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.api.java.function.VoidFunction;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;
import scala.Tuple2;

import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

/**
 * @Author dayuan
 * Kafka Spark Streaming integration
 */
public class SparkStreamingFromkafka {
    public static void main(String[] args) throws Exception {
        SparkConf sparkConf = new SparkConf().setMaster("local[*]").setAppName("SparkStreamingFromkafka");
        JavaStreamingContext streamingContext = new JavaStreamingContext(sparkConf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "localhost:9092");//多个可用ip可用","隔开
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "sparkStreaming");
        Collection<String> topics = Arrays.asList("topic0703");// topics to subscribe to; more than one can be listed

        JavaInputDStream<ConsumerRecord<String, String>> javaInputDStream = KafkaUtils.createDirectStream(
                streamingContext,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.Subscribe(topics, kafkaParams));

        JavaPairDStream<String, String> javaPairDStream = javaInputDStream.mapToPair(new PairFunction<ConsumerRecord<String, String>, String, String>() {
            private static final long serialVersionUID = 1L;

            @Override
            public Tuple2<String, String> call(ConsumerRecord<String, String> consumerRecord) throws Exception {
                return new Tuple2<>(consumerRecord.key(), consumerRecord.value());
            }
        });
        javaPairDStream.foreachRDD(new VoidFunction<JavaPairRDD<String, String>>() {
            @Override
            public void call(JavaPairRDD<String, String> javaPairRDD) throws Exception {
                javaPairRDD.foreach(new VoidFunction<Tuple2<String, String>>() {
                    @Override
                    public void call(Tuple2<String, String> tuple2)
                            throws Exception {
                        System.out.println(tuple2._2);
                    }
                });
            }
        });
        streamingContext.start();
        streamingContext.awaitTermination();
    }

}

log4j.properties

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
log4j.rootLogger=INFO, stdout

log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=[%d] %p %m (%c:%L)%n

#log4j.logger.org.apache.kafka=ERROR

 

Appendix

MQ comparison

See the accompanying document for details.

Kafka tuning

 

 

Why Kafka depends on ZooKeeper: the relationship between ZooKeeper and Kafka

The relationship between ZooKeeper and Kafka:
1. Kafka's design uses ZooKeeper to manage all brokers. ZooKeeper keeps a dedicated node that records the list of broker servers, at the path /brokers/ids.

Every broker registers itself with ZooKeeper at startup by creating a node /brokers/ids/[0-N] and writing its IP, port and other information into it. Brokers create ephemeral nodes, so as soon as a broker goes offline its node disappears; we can therefore track broker availability dynamically by watching the broker nodes in ZooKeeper. Kafka topics are handled in a similar way. (The registration under /brokers/ids can be inspected directly; see the sketch below.)

2. Producer load balancing
Producers need to spread messages sensibly across the distributed brokers, which raises the question of producer load balancing.
Kafka supports both traditional layer-4 load balancing and a ZooKeeper-based approach for producers.
(1) Traditional layer-4 load balancing
A producer is tied to a broker based on its IP address and port; usually one producer maps to a single broker and maintains a single TCP connection. This has many drawbacks: in practice each producer generates a different volume of messages and each broker stores a different amount of data, so brokers receive very uneven loads, and producers cannot detect brokers being added or removed.
(2) ZooKeeper-based load balancing
Quite simply, producers watch the broker and topic nodes in ZooKeeper to pick up state changes and rebalance dynamically; Kafka already implements this mechanism on top of ZooKeeper.

3. Consumer load balancing works analogously to producer load balancing.

4. The mapping between message partitions and consumers is recorded by creating and updating the corresponding ZooKeeper nodes.

5. Consumption progress (the offsets) is likewise recorded by creating and updating the corresponding ZooKeeper nodes.
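A quick way to see this registration in practice is the zookeeper-shell tool that ships with Kafka; the commands below are only a sketch (broker id 0 is just an example):

bin/zookeeper-shell.sh localhost:2181 ls /brokers/ids
# expected output is the list of live broker ids, e.g. [0, 1, 2]

bin/zookeeper-shell.sh localhost:2181 get /brokers/ids/0
# prints the host, port and endpoints that broker 0 registered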
How Kafka keeps messages from being lost or consumed more than once

Before tackling this question, let us first go over how Kafka sends and consumes messages.

The sending mechanism
Kafka's (older client's) sending mechanism can be synchronous or asynchronous, configured through the producer.type property. In synchronous mode there are three acknowledgement levels that protect message production, configured through request.required.acks:

0: no acknowledgement of whether the message was received;
1: acknowledge as soon as the leader has received the message;
-1: acknowledge only once both the leader and the followers have received the message.
With acks = 0 the producer gets no confirmation from the cluster, so messages can be lost when the network misbehaves or the buffer fills up. With acks = 1 only the leader write is guaranteed; if the leader partition dies, data can be lost. There is also the asynchronous case: when the buffer is full and acks = 0, no confirmation is required, so the buffered messages are simply cleared out.

In synchronous mode it is enough to set the acknowledgement level to -1, so that messages are written to the leader and all replicas, to guarantee safe production.

In asynchronous mode, set the blocking timeout in the configuration to unlimited, so the producer keeps blocking instead of dropping data; this prevents loss.

We need to set block.on.buffer.full = true so that the producer waits until the buffer becomes available again, i.e. it blocks when the buffer is full.

acks = all: the message is only considered committed once all followers have acknowledged it.

retries = MAX: retry indefinitely.

max.in.flight.requests.per.connection = 1 limits the number of unacknowledged requests the client may have on a single connection; setting it to 1 means the client cannot send another request to the broker until the previous one has been answered, which preserves message ordering.
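With the current Java producer client, the same "no loss" settings look roughly like the sketch below (producer.type and block.on.buffer.full belong to the old Scala client and have no direct equivalent here; the broker address and topic are placeholders):

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class ReliableProducerConfig {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // wait for the leader and all in-sync replicas to acknowledge every write
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        // retry (effectively) forever instead of dropping the record
        props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);
        // at most one in-flight request per connection keeps ordering intact across retries
        props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 1);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("heima", "key", "no-loss demo"));
        }
    }
}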

The receiving mechanism
On the consuming side, preventing loss is comparatively simple. Kafka's consumer commits offsets automatically by default, so we only need to make sure in our own logic that a message has been processed before its offset is committed. We can turn off automatic commits by setting enable.auto.commit to false and commit the offset manually after handling the message.

How to deal with duplicate consumption
Duplicate consumption has to be addressed partly by the message middleware and partly by making our own processing logic idempotent. It is quite possible for the code to consume a message and then for the server to crash before the offset is committed, so the consuming code itself should be idempotent, for example by relying on Redis's atomic operations or on a unique id in the database. A sketch of the process-first, commit-afterwards pattern follows below.
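A minimal sketch of that pattern with manual offset commits (the topic and group names are placeholders; the processing itself must be idempotent, since a crash between processing and the commit still replays the batch):

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class ManualCommitConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "group.demo");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // turn off automatic commits so the offset only advances after processing
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("heima"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println(record.value()); // process the message first
                }
                consumer.commitSync(); // commit only after the whole batch has been handled
            }
        }
    }
}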

The partitioning strategy for data written to Kafka

Kafka has partitions, so when a producer writes data, which partition does it end up in?
The Kafka API provides a default partitioner whose strategy is:
1. If the producer specifies a partition when writing, that partition is used.
2. If the producer specifies no partition but provides a key, the partitioner hashes the key and takes it modulo the number of partitions to pick one.
3. If the producer specifies neither a partition nor a key, partitions are chosen in a round-robin fashion.
A small sketch of this decision logic follows below.
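This sketch only illustrates the three rules above; it is not the actual DefaultPartitioner source (which, among other things, uses the murmur2 hash):

import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

public class PartitionChooser {
    private static final AtomicInteger counter = new AtomicInteger(0);

    public static int choosePartition(Integer explicitPartition, byte[] keyBytes, int numPartitions) {
        if (explicitPartition != null) {
            return explicitPartition;                                         // rule 1: caller picked the partition
        }
        if (keyBytes != null) {
            return (Arrays.hashCode(keyBytes) & 0x7fffffff) % numPartitions;  // rule 2: hash of the key, modulo partition count
        }
        return (counter.getAndIncrement() & 0x7fffffff) % numPartitions;      // rule 3: no key, round-robin
    }

    public static void main(String[] args) {
        System.out.println(choosePartition(null, "user-42".getBytes(), 4));
        System.out.println(choosePartition(null, null, 4));
        System.out.println(choosePartition(2, null, 4));
    }
}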

Question bank
1. What do ISR (In-Sync Replicas), OSR (Out-of-Sync Replicas) and AR (Assigned Replicas) mean in Kafka?
ISR: the set of followers whose replication lag behind the leader stays within the threshold (about 10 seconds by default)
OSR: followers whose lag behind the leader exceeds that threshold
AR: all replicas of a partition (the leader plus its followers)

2. What do HW and LEO stand for in Kafka?

Reference: Kafka中的HW、LEO、LSO等分别代表什么? - gaoyanliang - 博客园 (cnblogs.com)

HW: the high watermark, determined by the lowest LEO among the replicas of the same partition
LEO: the log end offset of each replica, i.e. the offset of the next message to be written

3. What is Kafka used for? What are the usage scenarios?
1. User activity tracking: user actions on the web or in apps are recorded into topics; consumers subscribe to them for real-time analysis, or the data is written to HDFS for offline analysis and data mining
2. Log collection: logs from the various services are collected through Kafka and exposed to the different consumers
3. Messaging system: buffering messages
4. Operational metrics: recording monitoring data and collecting centralized feedback on operations, such as alerts and reports

4. How does Kafka guarantee message ordering?
Within each partition every message has an offset, so ordering is only guaranteed inside a single partition; ordering across different partitions cannot be guaranteed.

5. "If a consumer group has more consumers than the topic has partitions, some consumers will not receive any data": is this statement correct?
Yes; consumers beyond the partition count receive no data.

6. Which situations cause duplicate consumption, and which cause data loss?
Processing first and committing the offset afterwards can cause duplicate consumption.
Committing the offset first and processing afterwards can cause data loss.

7. What is the purpose of Kafka partitions?
For the cluster, partitions provide load balancing; for consumers, they increase parallelism and read throughput.

8. How does Kafka achieve high reliability?
Kafka uses the publish-subscribe model together with the ISR and acks mechanisms.
A follower stays in the ISR only if its lag behind the leader stays within roughly 10 seconds.
With acks = 0, the producer does not wait for the broker's acknowledgement; whether or not the write succeeded, the data is not sent again.
With acks = 1, the broker acknowledges once the leader has written the data, without waiting for the followers to replicate it; if the leader then dies, the producer sends new data to the new leader, and any data not yet replicated from the old leader is lost.
With acks = -1 (or all), the broker acknowledges only after the leader and all followers in the ISR have replicated the data, which can result in duplicated data.

9. Can the number of partitions of a topic be increased? If so, how? If not, why?
Yes, it can be increased:

bin/kafka-topics.sh --zookeeper localhost:2181/kafka --alter --topic topic-config --partitions 3

10. Can the number of partitions of a topic be decreased? If so, how? If not, why?
No; the data already in the existing partitions would be hard to deal with.

11. Briefly describe Kafka's log directory structure.
Each partition corresponds to a directory, named like topic-0, topic-1; each directory contains .index and .log files.

12. How do you deal with consumers that are too slow?
Increase the number of partitions and the number of consumers.

13. Which design choices give Kafka such high performance?
1. Kafka is a distributed message queue.
2. The log file is split into segments, and indexes are built for the segments.
3. (On a single node) sequential reads and writes are used, reaching speeds of around 600 MB/s.
4. Zero-copy is used, so reads and writes are completed inside the OS.

14. Why might Kafka fail to start?
If ZooKeeper is shut down before Kafka, the next Kafka startup reports that the node already exists.
Deleting the zkdata/version-2 directory in ZooKeeper fixes it.

15. What does the Kafka controller do?
It handles brokers joining and leaving the cluster, replica and partition assignment for all topics, and leader election.

16. Where does Kafka need elections, and what are the election strategies?
Leader election happens within the ISR; the strategy is essentially first come, first served.

17. What is a failed replica, and how is it handled?
A failed replica is a follower whose lag behind the leader exceeds roughly 10 seconds.
The failed follower is first removed from the ISR,
and is added back once its lag is within 10 seconds of the leader again.

18. Does Kafka use a pull model or a push model for messages?
On the producer side, messages are pushed to the broker.
On the consumer side, messages are pulled from the broker.
With the pull model, consumers can fetch data at their own pace, which keeps slow consumers from being overwhelmed.
The downside is that consumers have to keep asking the broker whether new data has arrived, which can lead to busy polling and memory pressure.

19. How does Kafka place partitions on brokers when creating a topic?
First, the replication factor cannot exceed the number of brokers.
The first partition is placed on a randomly chosen broker, and the remaining partitions are shifted one broker further along relative to partition 0.
The starting position is determined by nextReplicaShift, which is itself generated randomly.

20. How are transactions implemented in Kafka? ☆☆☆☆☆
Kafka has two kinds of transactions:
producer transactions and consumer transactions.
Producer transactions solve the problem of spanning partitions and sessions.
The main reason Kafka cannot natively span partitions and sessions is that the producer's PID is assigned randomly by the system on every start.
To solve this,
we manually give the producer a globally unique id, the transactional id, or TID for short.
The TID is bound to the PID: the first time the producer registers with the broker carrying its TID and PID, the broker records the TID and creates a new internal component, __transaction_state, to store the transaction state for that TID.
When the producer restarts, it contacts the broker with the same TID and a new PID; once the broker finds a matching TID,
the producer recovers its previous PID (replacing the new one) and the previous transaction state, and can continue where it left off.
Consumer transactions are weaker than producer transactions: you must first ensure that consumption and offset commits happen consistently and transactionally, otherwise data can be lost or duplicated. A sketch of the matching consumer-side setting follows below.
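On the consuming side, the usual companion setting is the isolation level, so that records from aborted transactions are never handed to the application. A minimal sketch (broker address, group id and topic are placeholders):

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.util.Collections;
import java.util.Properties;

public class ReadCommittedConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "group.demo");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // only read messages from committed transactions (the default is read_uncommitted)
        props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("heima"));
        consumer.close();
    }
}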

21. Do you know about Kafka's partitioners, serializers and interceptors? In what order are they applied?
Interceptor > serializer > partitioner

22. What does the overall structure of the Kafka producer client look like? How many threads does it use, and what are they?

It uses two threads:
the main thread and the sender thread.
The main thread runs each record through the interceptors, the serializer and the partitioner and puts it into the RecordAccumulator (a buffer shared between the threads);
the sender thread then pulls batches out of the RecordAccumulator and sends them to the Kafka brokers.
Related parameters:
batch.size: the sender only sends a batch once this much data has accumulated.
linger.ms: if the data does not reach batch.size in time, the sender sends it anyway after waiting linger.ms.

23. When a consumer commits its consumed offset, does it commit the offset of the latest message it has consumed, or offset + 1?
offset + 1
Illustration:

The producer writes data starting at offset 0;

the consumer resumes consuming from the committed position, i.e. offset + 1.

The number of consumers in a group should not exceed the number of partitions of a topic; any extra consumers sit idle and consume nothing.
 
Kafka partition selection strategy
When a Kafka producer sends a message it can specify a key; this key is used to choose the partition the message is stored in.

When a non-null key is specified, Kafka decides the target partition from the hash of the key modulo the number of partitions.

When key = null, Kafka first looks up a cached partition number; if the cache is not empty, the message goes to that partition, otherwise the target partition is recomputed and the new number is cached for next time.

Kafka defines a global variable whose value comes from the topic.metadata.refresh.interval.ms configuration parameter: within that interval, all key = null messages are written to the cached partition, and once the cache expires a new partition is computed and cached. In other words, with a null key Kafka does not pick a random partition for every message; it only picks a new random partition once every topic.metadata.refresh.interval.ms.
 
batch.size
When several messages are sent to the same partition, the producer packs them together into a batch to reduce the number of requests, instead of sending them one by one.
The batch size is controlled by the batch.size parameter; the default is 16 KB.
A small batch size can reduce throughput (a batch size of 0 disables batching entirely).
A very large batch size can waste memory, because the buffer is allocated up front.
Example
Say messages are produced at about 300 per second; would raising batch.size to 32 KB or 64 KB improve the overall send throughput?

In theory, a larger batch lets more data be buffered, so each request carries more data and throughput may improve.

But it cannot grow without bound: if it is too large and data keeps sitting in the batch without being sent, the send latency becomes very high.

For example, if a message enters a batch but has to wait 5 seconds for the batch to fill up to 64 KB, that message's latency is 5 seconds.

So you should test different batch sizes against your production message rate and pick the value that gives the best balance of throughput and latency.

linger.ms
Suppose batch.size is 32 KB but at some moments traffic is light and even after, say, 5 minutes the batch has not reached 32 KB; the latency would be huge. So there is a second parameter, a time limit after which the batch is sent even if it has not reached 32 KB. For example, with linger.ms set to 5 ms, the batch is sent after 5 ms even if it is smaller than 32 KB.
Summary
Set batch.size and linger.ms together: whichever condition is met first triggers the send (see the sketch below).
Kafka has to balance throughput against latency.
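A minimal sketch of setting the two parameters together on the Java producer (broker address, topic and the 32 KB / 5 ms values are just examples to be tuned against your own traffic):

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class BatchingProducerConfig {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 32 * 1024); // send when a partition's batch reaches 32 KB...
        props.put(ProducerConfig.LINGER_MS_CONFIG, 5);          // ...or after 5 ms, whichever comes first

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 1000; i++) {
                producer.send(new ProducerRecord<>("heima", "key-" + i, "value-" + i));
            }
        }
    }
}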
 

The relationship between Kafka and ZooKeeper: a typical Kafka cluster contains a number of producers, a number of brokers (generally, the more brokers, the higher the cluster throughput), a number of consumer groups, and a ZooKeeper ensemble. Kafka uses ZooKeeper to manage cluster configuration, elect the leader, and rebalance when a consumer group changes. Producers push messages to the brokers, and consumers pull messages from the brokers; ZooKeeper also helps keep the load balanced between producers and consumers.


 
