ZhangZhihui's Blog  

The commit() method on a Kafka consumer is about tracking which messages you have successfully processed. Let’s go step by step.


1️⃣ Concept: Offsets

  • Each Kafka partition maintains an offset for each message.

  • The offset is a number representing the position of a message in a partition.

  • Consumers use offsets to track which messages have been consumed.


2️⃣ Consumer commit()

  • commit() tells Kafka:

“I have successfully processed messages up to this offset; you can consider them consumed.”

  • There are two main ways to commit offsets:

a) Automatic commit (enable.auto.commit=True)

  • The consumer commits offsets periodically (default every 5 seconds).

  • Kafka keeps track of which messages have been read.

  • Risk: messages may be lost or reprocessed if the consumer crashes between commits.

b) Manual commit (enable.auto.commit=False)

  • You explicitly call consumer.commit() after processing messages.

  • Gives fine-grained control to ensure messages are only committed after successful processing.


3️⃣ Types of commit

a) Synchronous commit

consumer.commit()  # blocks until offsets are committed
  • Guarantees the offset is committed before moving on.

  • Safe, but may slow down processing if you commit too often.

b) Asynchronous commit

consumer.commit_async()
consumer.commit(asynchronous=True)
  • Returns immediately, commits in the background.

  • Faster, but there’s a small chance of losing the commit if the consumer crashes.


4️⃣ Offset storage

  • Offsets are stored in a special Kafka topic: __consumer_offsets.

  • This is how Kafka remembers each consumer group’s progress.


5️⃣ Example: Manual commit

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    'my-topic',
    bootstrap_servers='localhost:9092',
    enable_auto_commit=False,
    group_id='my-group'
)

for message in consumer:
    process(message)
    consumer.commit()  # commit after processing each message
  • Only after commit() does Kafka know that this message has been successfully processed.


✅ Key Takeaways

  1. commit() records the last processed offset for the consumer group.

  2. Ensures no message is lost or reprocessed unnecessarily.

  3. Manual commits give more control, especially in critical processing pipelines.

  4. Automatic commits are easier but risk message duplication after crashes.

 

posted on 2025-12-02 17:59  ZhangZhihuiAAA  阅读(2)  评论(0)    收藏  举报