Kafka Official Documentation: 1.2 Use Cases

1.2 Use Cases

Here is a description of a few of the popular use cases for Apache Kafka®. For an overview of a number of these areas in action, see this blog post.

Messaging

Kafka works well as a replacement for a more traditional message broker. Message brokers are used for a variety of reasons (to decouple processing from data producers, to buffer unprocessed messages, etc.). In comparison to most messaging systems, Kafka has better throughput, built-in partitioning, replication, and fault tolerance, which makes it a good solution for large-scale message processing applications.

In our experience, messaging uses are often comparatively low-throughput, but may require low end-to-end latency and often depend on the strong durability guarantees Kafka provides.

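A minimal sketch of a producer tuned for those durability guarantees, assuming a broker at localhost:9092 and a pre-created orders topic (both hypothetical):

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class DurableProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // hypothetical broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Favor durability: wait for all in-sync replicas to acknowledge each write,
        // and enable idempotence so retries cannot introduce duplicates.
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Block on the returned future so a failed write surfaces here as an exception.
            producer.send(new ProducerRecord<>("orders", "order-42", "created")).get();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```

Blocking on every send trades throughput for certainty, which matches the low-throughput, high-durability profile described above.
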
In this domain Kafka is comparable to traditional messaging systems such as ActiveMQ or RabbitMQ.

Website Activity Tracking

The original use case for Kafka was to be able to rebuild a user activity tracking pipeline as a set of real-time publish-subscribe feeds. This means site activity (page views, searches, or other actions users may take) is published to central topics with one topic per activity type. These feeds are available for subscription for a range of use cases including real-time processing, real-time monitoring, and loading into Hadoop or offline data warehousing systems for offline processing and reporting.

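As an illustrative sketch of the one-topic-per-activity-type pattern, the wrapper below publishes events to hypothetical page-views and searches topics, keyed by user id so each user's events stay ordered within a partition:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ActivityTracker {
    private final KafkaProducer<String, String> producer;

    public ActivityTracker(String bootstrapServers) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        this.producer = new KafkaProducer<>(props);
    }

    // One central topic per activity type.
    public void pageView(String userId, String url) {
        producer.send(new ProducerRecord<>("page-views", userId, url));
    }

    public void search(String userId, String query) {
        producer.send(new ProducerRecord<>("searches", userId, query));
    }

    public void close() {
        producer.close(); // flushes any buffered events before shutdown
    }

    public static void main(String[] args) {
        ActivityTracker tracker = new ActivityTracker("localhost:9092"); // hypothetical broker
        tracker.pageView("user-17", "/index.html");
        tracker.search("user-17", "kafka use cases");
        tracker.close();
    }
}
```
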
Activity tracking is often very high volume, as many activity messages are generated for each user page view.

Metrics

Kafka is often used for operational monitoring data. This involves aggregating statistics from distributed applications to produce centralized feeds of operational data.

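On the consuming side, a minimal sketch of such a centralized feed: a consumer tallies samples from a hypothetical metrics topic whose keys name the reporting service:

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class MetricsAggregator {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // hypothetical broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "metrics-aggregator");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"); // read retained history on first run
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        Map<String, Long> samplesByService = new HashMap<>();
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("metrics")); // hypothetical topic fed by many applications
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // Key = reporting service, value = one statistic sample.
                    samplesByService.merge(record.key(), 1L, Long::sum);
                }
            }
        }
    }
}
```
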
Log Aggregation

Many people use Kafka as a replacement for a log aggregation solution. Log aggregation typically collects physical log files off servers and puts them in a central place (a file server or HDFS perhaps) for processing. Kafka abstracts away the details of files and gives a cleaner abstraction of log or event data as a stream of messages. This allows for lower-latency processing and easier support for multiple data sources and distributed data consumption. In comparison to log-centric systems like Scribe or Flume, Kafka offers equally good performance, stronger durability guarantees due to replication, and much lower end-to-end latency.

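As a sketch, such a log topic might be provisioned with the AdminClient; the topic name, partition count, and seven-day retention below are illustrative assumptions:

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CreateLogTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // hypothetical broker

        try (AdminClient admin = AdminClient.create(props)) {
            // Many partitions spread log traffic across consumers;
            // replication factor 3 gives the durability noted above.
            NewTopic logs = new NewTopic("app-logs", 12, (short) 3)
                    .configs(Map.of(TopicConfig.RETENTION_MS_CONFIG, "604800000")); // keep 7 days
            admin.createTopics(List.of(logs)).all().get(); // wait for the broker to confirm
        }
    }
}
```

Producers on each server then write log events to app-logs as messages instead of shipping physical files.
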
Stream Processing

Many users of Kafka process data in processing pipelines consisting of multiple stages, where raw input data is consumed from Kafka topics and then aggregated, enriched, or otherwise transformed into new topics for further consumption or follow-up processing. For example, a processing pipeline for recommending news articles might crawl article content from RSS feeds and publish it to an "articles" topic; further processing might normalize or deduplicate this content and publish the cleansed article content to a new topic; a final processing stage might attempt to recommend this content to users. Such processing pipelines create graphs of real-time data flows based on the individual topics. Starting in 0.10.0.0, a lightweight but powerful stream processing library called Kafka Streams is available in Apache Kafka to perform such data processing as described above. Apart from Kafka Streams, alternative open source stream processing tools include Apache Storm and Apache Samza.

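A condensed sketch of the normalization stage of that pipeline with Kafka Streams; the topic names follow the example above, and the cleansing logic is a hypothetical stand-in:

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class ArticlePipeline {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "article-pipeline");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // hypothetical broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Consume raw crawled articles, cleanse them, and publish the result
        // to a new topic for the next stage (e.g. recommendation).
        KStream<String, String> articles = builder.stream("articles");
        articles
                .filter((id, body) -> body != null && !body.isBlank())
                .mapValues(body -> body.trim().toLowerCase()) // stand-in for real normalization
                .to("articles-cleansed");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        streams.start();
    }
}
```

Each stage runs as an independent application, so the topics between stages form the edges of the real-time data-flow graph the text describes.
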
Event Sourcing

Event sourcing is a style of application design where state changes are logged as a time-ordered sequence of records. Kafka's support for very large stored log data makes it an excellent backend for an application built in this style.

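A minimal sketch of the replay half of event sourcing: a consumer reads a hypothetical account-events topic from the beginning and folds every state change into an in-memory view:

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class AccountStateRebuilder {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // hypothetical broker
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        Map<String, Long> balances = new HashMap<>(); // current state, derived purely from events
        TopicPartition partition = new TopicPartition("account-events", 0); // hypothetical topic
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.assign(List.of(partition));
            consumer.seekToBeginning(List.of(partition)); // replay the full history
            ConsumerRecords<String, String> records;
            // Stop once a poll comes back empty (a simplification for this sketch).
            while (!(records = consumer.poll(Duration.ofSeconds(1))).isEmpty()) {
                for (ConsumerRecord<String, String> record : records) {
                    // One record per state change: key = account id, value = signed delta.
                    balances.merge(record.key(), Long.parseLong(record.value()), Long::sum);
                }
            }
        }
        System.out.println("Rebuilt state: " + balances);
    }
}
```
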
Commit Log

Kafka can serve as a kind of external commit log for a distributed system. The log helps replicate data between nodes and acts as a re-syncing mechanism for failed nodes to restore their data. The log compaction feature in Kafka helps support this usage. In this usage Kafka is similar to the Apache BookKeeper project.

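As a sketch, a compacted topic for this usage might be created as below; the node-state topic name and sizing are illustrative assumptions:

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CreateCommitLogTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // hypothetical broker

        try (AdminClient admin = AdminClient.create(props)) {
            // Compaction retains at least the latest record per key, so a failed
            // node can replay the topic to restore its current state.
            NewTopic commitLog = new NewTopic("node-state", 1, (short) 3)
                    .configs(Map.of(TopicConfig.CLEANUP_POLICY_CONFIG,
                                    TopicConfig.CLEANUP_POLICY_COMPACT));
            admin.createTopics(List.of(commitLog)).all().get();
        }
    }
}
```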