Apache Kafka has become the de facto standard for building real-time data pipelines and streaming applications. Designed as a distributed commit log, Kafka decouples producers and consumers through a durable, high-throughput, fault-tolerant architecture. In this comprehensive blog post, we’ll explore Kafka’s core building blocks—Topics, Partitions, Offsets, Consumer Groups—and delve into the granular configuration options for Producers and Consumers that ensure performance, reliability, and exactly-once semantics.


[Interactive demo: Apache Kafka flow visualization showing real-time message streaming from a producer through the Kafka cluster to consumers]

1. Topics: The Logical Stream of Events

A Topic in Kafka is a named feed to which records are published. Think of it as a category or log file name. Topics provide a namespace for producers to write messages and for consumers to subscribe.

  • Key characteristics:
    • Append-only: Once written, records are immutable and stored in log segments on disk.
    • Retention policies: Configurable by size or time (retention.ms, retention.bytes).
    • Cleanup policies: delete (default) or compact, useful for changelog topics in stream processing.
# Create a topic named "orders" with 3 partitions and a replication factor of 2
kafka-topics --create \
  --topic orders \
  --partitions 3 \
  --replication-factor 2 \
  --bootstrap-server broker1:9092
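
Retention and cleanup settings can also be changed after a topic exists. The sketch below uses the Java Admin client against the orders topic and broker from the command above; the 7-day retention value is purely illustrative, and imports from org.apache.kafka.clients.admin are assumed.
// Adjust retention and cleanup policy on an existing topic
Properties adminProps = new Properties();
adminProps.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");

try (Admin admin = Admin.create(adminProps)) {
  ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "orders");
  // Keep records for 7 days; switch cleanup.policy to "compact" for changelog-style topics
  Collection<AlterConfigOp> ops = List.of(
      new AlterConfigOp(new ConfigEntry("retention.ms", "604800000"), AlterConfigOp.OpType.SET),
      new AlterConfigOp(new ConfigEntry("cleanup.policy", "delete"), AlterConfigOp.OpType.SET));
  admin.incrementalAlterConfigs(Map.of(topic, ops)).all().get();
}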

2. Partitions: Parallelism and Ordering

Each topic is divided into Partitions, which are the unit of parallelism and ordering in Kafka.

  • Ordering guarantee: Within a single partition, Kafka guarantees strict order by offset.
  • Scalability: More partitions → higher throughput and more consumer parallelism.
  • Leader/Follower: One broker acts as the leader for a partition; others are followers replicating the log.

Producers can assign messages to partitions via:

  1. Key-based partitioning (deterministic): partition = hash(key) % num_partitions.
  2. Round-robin (no key): Balances load evenly across partitions.
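
For example, key-based routing is a one-liner with the Java client. The sketch below assumes a KafkaProducer<String, String> named producer (configured with StringSerializer for keys and values) and the 3-partition orders topic from Section 1; the customer key and JSON payload are illustrative.
// Records with the same key always hash to the same partition, preserving their relative order
ProducerRecord<String, String> keyed =
    new ProducerRecord<>("orders", "customer-42", "{\"orderId\": 1001, \"amount\": 99.95}");
producer.send(keyed);

// Without a key, the producer spreads records across partitions to balance load
producer.send(new ProducerRecord<>("orders", null, "{\"orderId\": 1002}"));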

3. Offsets: Bookmarking Your Place

Within a partition, each record has a monotonically increasing Offset (0, 1, 2, …). Offsets serve as bookmarks for consumers:

  • Consumer offset: The next record to read. Stored either in Kafka’s __consumer_offsets topic or externally.
  • Auto vs. manual commit:
    • enable.auto.commit=true: Kafka will commit offsets every auto.commit.interval.ms.
    • enable.auto.commit=false: You control when to commit via the consumer API, enabling finer control.
// Manual commit example: process the batch, then commit its offsets
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
// ... process records ...
consumer.commitSync(); // commits the offsets returned by the last poll()

4. Consumer Groups: Scaling and Fault Tolerance

A set of consumers identified by the same group.id form a Consumer Group. Kafka ensures that each partition of a subscribed topic is consumed by exactly one consumer in the group, providing:

  • Load balancing: Distributes partitions across consumers.
  • Fault tolerance: If a consumer crashes, its partitions are reassigned.
  • Rebalancing: Triggered when consumers join/leave the group or subscriptions change.
# Sample consumer properties
group.id=order-processors
enable.auto.commit=false
auto.offset.reset=earliest
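
As a minimal sketch, assuming these properties (plus bootstrap.servers and the String deserializers) have been loaded into a java.util.Properties object named props, every process running the following code joins the same group and shares the topic's partitions:
// All instances started with group.id=order-processors form one consumer group
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(List.of("orders"));
// With 3 partitions: one instance reads all 3, two instances split them 2/1,
// and a 4th instance sits idle because there are only 3 partitions to assign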

5. Producers: Publishing Data with Precision

5.1 Essential Producer Configurations

  • bootstrap.servers: Comma-separated list of broker addresses. (no default)
  • key.serializer: Class to serialize message keys (e.g., StringSerializer). (no default)
  • value.serializer: Class to serialize message values (e.g., StringSerializer). (no default)
  • acks: Number of acknowledgments the leader must receive (0, 1, all). Default: 1
  • retries: Number of retry attempts on transient failures. Default: 2147483647
  • linger.ms: Time to wait for additional messages before sending a batch. Default: 0
  • batch.size: Maximum size (in bytes) of each batch. Default: 16384 (16 KB)
  • enable.idempotence: Ensure exactly-once delivery per producer session. Default: false
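
A minimal sketch wiring several of these settings together, reusing the broker address and topic from the earlier examples; the linger.ms and batch.size values are illustrative, not recommendations.
// Build a producer from explicit configs and send one keyed record
Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.ACKS_CONFIG, "all");      // wait for all in-sync replicas
props.put(ProducerConfig.LINGER_MS_CONFIG, 5);      // trade a little latency for larger batches
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 32768); // 32 KB batches

KafkaProducer<String, String> producer = new KafkaProducer<>(props);
producer.send(new ProducerRecord<>("orders", "customer-42", "order-payload"),
    (metadata, exception) -> {
      if (exception != null) {
        exception.printStackTrace(); // e.g., retries exhausted or broker unavailable
      }
    });
producer.close(); // flushes any outstanding batches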

5.2 Enabling Exactly-Once Semantics

enable.idempotence=true
acks=all
retries=2147483647
max.in.flight.requests.per.connection=1

With idempotence, Kafka assigns a producer ID and sequence numbers to detect duplicates on retries.


6. Consumers: Fetching and Processing Messages

6.1 Key Consumer Configurations

  • bootstrap.servers: Comma-separated list of broker addresses. (no default)
  • group.id: Identifier for the consumer group. (no default)
  • key.deserializer: Class to deserialize message keys (e.g., StringDeserializer). (no default)
  • value.deserializer: Class to deserialize message values (e.g., StringDeserializer). (no default)
  • auto.offset.reset: Action when no committed offset is found (earliest, latest, none). Default: latest
  • enable.auto.commit: Whether to auto-commit offsets. Default: true
  • fetch.min.bytes: Minimum bytes to fetch in a request before responding. Default: 1
  • max.poll.records: Maximum number of records returned in a single poll(). Default: 500

6.2 Poll Loop Example

while (running) {
  ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
  for (ConsumerRecord<String, String> record : records) {
    // process record.key(), record.value()
  }
  consumer.commitSync();
}

7. Under-the-Hood: Replication and Fault Tolerance

  • Replication factor: Number of copies per partition. Ensures data durability.
  • In-Sync Replicas (ISR): Followers fully caught up with the leader.
  • Unclean leader election: Avoided by default to prevent data loss.
min.insync.replicas=2

Sets the minimum number of in-sync replicas that must acknowledge a write for acks=all to succeed.
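
The same durability settings can be applied at topic-creation time. Below is a sketch with the Java Admin client; the payments topic name is a placeholder, and the broker address reuses the earlier example.
// Create a topic with 3 partitions, replication factor 2, and min.insync.replicas=2,
// so acks=all writes fail fast if a replica falls out of the ISR
Properties adminProps = new Properties();
adminProps.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");

try (Admin admin = Admin.create(adminProps)) {
  NewTopic payments = new NewTopic("payments", 3, (short) 2)
      .configs(Map.of("min.insync.replicas", "2"));
  admin.createTopics(List.of(payments)).all().get();
}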


8. Delivery Guarantees

  • At most once: Messages may be lost but never redelivered (acks=0).
  • At least once: Messages may be redelivered, duplicates possible (default acks=1, retries > 0).
  • Exactly once: No duplicates even on retries (idempotent producer + transactional API).
// Transactional producer (requires transactional.id to be set in the producer config)
producer.initTransactions();
try {
  producer.beginTransaction();
  // send messages within the transaction
  producer.commitTransaction();
} catch (ProducerFencedException | OutOfOrderSequenceException | AuthorizationException e) {
  producer.close();            // fatal: another producer took over, or the config is invalid
} catch (KafkaException e) {
  producer.abortTransaction(); // transient: abort this transaction and retry
}

9. Conclusion

Apache Kafka’s design—distributed, partitioned, replicated—provides the backbone for resilient, high-throughput data streams. By tuning producer and consumer configurations, you can adapt Kafka for use cases ranging from simple pub/sub to mission-critical exactly-once stream processing. We’ve covered the fundamental concepts and configuration knobs; your next step is hands‑on experimentation with the Kafka console tools, client libraries, and stream processing APIs. Happy streaming!