Apache Kafka is one of the most popular open-source data streaming platforms, adopted by companies such as Netflix, Spotify, LinkedIn, and Uber to build mission-critical applications. In this guide, we take an in-depth look at deploying a Kafka cluster with Docker Compose for development and testing.

Overview:

  • Introduction to Apache Kafka
  • Deploying Kafka Cluster with Docker Compose
  • Working with Kafka Cluster
  • Kafka Cluster Administration
  • Comparison of Deployment Options
  • Kafka Security Considerations
  • Conclusion

Let’s get started.

Introduction to Apache Kafka

Apache Kafka is an open-source, highly scalable, low-latency platform for storing and processing streams of data in a fault-tolerant way. Before we deploy Kafka, let's review its architecture and core concepts.

Kafka Architecture

The following diagram shows the high-level architecture of Apache Kafka:

[Figure: Kafka architecture diagram]

A Kafka cluster primarily consists of the following components:

Brokers: Kafka brokers are server nodes that receive messages from producers and store them on disk. Data is replicated across brokers for fault tolerance using a replication factor.

Topics and Partitions: Kafka topics are logical streams of data. A topic is split into ordered partitions which contain messages in an immutable sequence. Partitions allow parallelism by distributing data across brokers.

Producers: Producers are applications that publish data to Kafka topics. The producer decides which partition each message goes to, for example by hashing the message key.

Consumers: Consumers subscribe to topics and process messages published by producers. Consumers track their read progress within each partition.
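The read-progress tracking described above can be sketched in a toy model (our own illustration, not Kafka's actual implementation): each consumer group keeps an independent position per topic partition, so committing in one group never affects another.

```python
from collections import defaultdict

class OffsetTracker:
    """Toy model of consumer offsets: each (group, topic, partition)
    keeps its own read position. Illustration only, not Kafka internals."""

    def __init__(self):
        self.offsets = defaultdict(int)  # next offset to read

    def poll(self, group, topic, partition, log):
        """Return all records this group has not read yet in the partition."""
        return log[self.offsets[(group, topic, partition)]:]

    def commit(self, group, topic, partition, count):
        """Advance the group's position after processing `count` records."""
        self.offsets[(group, topic, partition)] += count

log = ["m0", "m1", "m2"]          # one partition's message log
tracker = OffsetTracker()
assert tracker.poll("payments", "test", 0, log) == ["m0", "m1", "m2"]
tracker.commit("payments", "test", 0, 3)
assert tracker.poll("payments", "test", 0, log) == []
# A second group reads the same partition independently, from the beginning.
assert tracker.poll("fraud", "test", 0, log) == ["m0", "m1", "m2"]
```

This independence is what lets several applications consume the same stream in parallel, each at its own pace.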

ZooKeeper: ZooKeeper coordinates the brokers by electing the cluster controller and tracking cluster metadata such as broker registrations and topic configuration.

This architecture allows Kafka to be massively scalable and achieve very high throughput for reading and writing streams of data.

Key Concepts of Kafka

Some key concepts related to Kafka streams and partitions:

  • Retention: Kafka retains streams of data durably according to a per-topic retention period. For example, a retention of 7 days lets consumers replay data up to 7 days old.

  • Multiple Subscribers: A topic can have many consumer groups subscribing in parallel, with each group tracking its own position in the stream. For example, a payments group and a fraud-analysis group can consume the same stream independently.

  • Ordering Guarantees: Consumers read records in order within a partition, avoiding out-of-order issues in stream processing.

  • Horizontal Scalability: Partitions distribute a topic's data across many brokers, enabling horizontal scale.

  • High Availability: Data is replicated across brokers according to the replication factor, preventing data loss when a broker fails.
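The ordering and scalability points above both hinge on how records map to partitions. Here is a minimal sketch of keyed partition assignment; note that real Kafka clients use a murmur2 hash, and CRC32 stands in here only for simplicity:

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    # Records with the same key always map to the same partition,
    # which is what gives per-key ordering. Kafka's default partitioner
    # uses murmur2; CRC32 is a stand-in for illustration.
    return zlib.crc32(key) % num_partitions

# All events for one order land in one partition, so they stay in order.
p = partition_for(b"order-42", 3)
assert partition_for(b"order-42", 3) == p   # deterministic
assert 0 <= p < 3
```

Because the mapping is deterministic, a stream keyed by, say, order ID keeps each order's events in sequence while still spreading load across brokers.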

Now that we understand Kafka architecture and core concepts, let's deploy a Kafka cluster using Docker.

Deploying Kafka Cluster with Docker Compose

For development and testing, Kafka can be conveniently deployed on a single host using Docker containers. We will use Docker Compose to deploy a 3-broker Kafka cluster along with ZooKeeper.

Docker Compose File for Kafka Cluster

Here is the docker-compose.yml file that will start a Kafka cluster:

version: '3'
services:

  zookeeper:
    image: confluentinc/cp-zookeeper:7.3.0
    container_name: zookeeper
    ports:
      - "2181:2181"
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000

  kafka-1:
    image: confluentinc/cp-kafka:7.3.0
    container_name: kafka-1
    ports:
      - "9091:29091"
    depends_on:
      - zookeeper
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: 'zookeeper:2181'
      KAFKA_LISTENERS: LISTENER_DOCKER://0.0.0.0:9091,LISTENER_HOST://0.0.0.0:29091
      KAFKA_ADVERTISED_LISTENERS: LISTENER_DOCKER://kafka-1:9091,LISTENER_HOST://localhost:9091
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: LISTENER_DOCKER:PLAINTEXT,LISTENER_HOST:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: LISTENER_DOCKER

  kafka-2:
    image: confluentinc/cp-kafka:7.3.0
    container_name: kafka-2
    ports:
      - "9092:29092"
    depends_on:
      - zookeeper
    environment:
      KAFKA_BROKER_ID: 2
      KAFKA_ZOOKEEPER_CONNECT: 'zookeeper:2181'
      KAFKA_LISTENERS: LISTENER_DOCKER://0.0.0.0:9092,LISTENER_HOST://0.0.0.0:29092
      KAFKA_ADVERTISED_LISTENERS: LISTENER_DOCKER://kafka-2:9092,LISTENER_HOST://localhost:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: LISTENER_DOCKER:PLAINTEXT,LISTENER_HOST:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: LISTENER_DOCKER

  kafka-3:
    image: confluentinc/cp-kafka:7.3.0
    container_name: kafka-3
    ports:
      - "9093:29093"
    depends_on:
      - zookeeper
    environment:
      KAFKA_BROKER_ID: 3
      KAFKA_ZOOKEEPER_CONNECT: 'zookeeper:2181'
      KAFKA_LISTENERS: LISTENER_DOCKER://0.0.0.0:9093,LISTENER_HOST://0.0.0.0:29093
      KAFKA_ADVERTISED_LISTENERS: LISTENER_DOCKER://kafka-3:9093,LISTENER_HOST://localhost:9093
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: LISTENER_DOCKER:PLAINTEXT,LISTENER_HOST:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: LISTENER_DOCKER

Let's analyze the docker-compose file:

  • We use the official Confluent Kafka images tagged 7.3.0
  • The cluster consists of one ZooKeeper instance and three Kafka brokers
  • ZooKeeper listens on port 2181, and the brokers are reachable from the host on ports 9091, 9092, and 9093
  • Broker ids 1 to 3 are assigned via the KAFKA_BROKER_ID environment variable
  • Brokers connect to ZooKeeper through the container name zookeeper on port 2181
  • KAFKA_ADVERTISED_LISTENERS controls the addresses brokers hand out to clients, with the LISTENER_DOCKER listener carrying inter-broker traffic inside the Docker network

With this configuration, we are ready to start our containers.

Starting Kafka Cluster Containers

Launch the containers in detached mode using docker-compose:

docker-compose up -d

Verify that the containers are running using docker ps:

CONTAINER ID   IMAGE                                     STATUS          PORTS                      NAMES
d8e2f611a263   confluentinc/cp-kafka:7.3.0               Up 5 seconds    0.0.0.0:9093->9093/tcp     kafka-3
475e75f658d5   confluentinc/cp-kafka:7.3.0               Up 5 seconds    0.0.0.0:9092->9092/tcp     kafka-2
cbe4146ef60d   confluentinc/cp-kafka:7.3.0               Up 6 seconds    0.0.0.0:9091->9091/tcp     kafka-1 
412aaa80172b   confluentinc/cp-zookeeper:7.3.0           Up 6 seconds    2181/tcp, 2888/tcp, 0.0.0.0:2181->2181/tcp   zookeeper

Our 3 broker Kafka cluster is now ready!

Let's test it out by producing and consuming messages.

Working with Kafka Cluster

We can connect to Kafka brokers from the host machine to publish and consume messages.

Produce Messages to Kafka Topic

Let's produce a few messages to the test topic (if the Kafka CLI tools are not installed on your host, you can run the same command inside a broker container via docker exec):

kafka-console-producer --topic test --bootstrap-server localhost:9091

Then in the console, type some messages and hit enter:

Hello Kafka!  
This is my first Kafka message
Learning Kafka with Docker

This publishes the messages to the test topic on broker at port 9091.

Consume Published Messages

In another terminal, consume the published messages starting from beginning:

kafka-console-consumer --topic test --from-beginning --bootstrap-server localhost:9091

You should see the messages consumed:

Hello Kafka!
This is my first Kafka message  
Learning Kafka with Docker

Likewise, you can connect producers and consumers to brokers running at ports 9092 and 9093 as well.
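The same round trip can be done programmatically. The sketch below assumes the third-party kafka-python package (pip install kafka-python) and the running cluster from this guide; the package choice and function names are ours, not part of the console tooling:

```python
def bootstrap_servers(ports):
    """Build a bootstrap string from the host-mapped broker ports."""
    return ",".join(f"localhost:{p}" for p in ports)

def produce_and_consume(topic="test"):
    # Imported lazily so the helper above works without the dependency.
    from kafka import KafkaProducer, KafkaConsumer

    servers = bootstrap_servers([9091, 9092, 9093])

    producer = KafkaProducer(bootstrap_servers=servers)
    producer.send(topic, b"Hello Kafka!")
    producer.flush()  # block until the message is acknowledged

    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=servers,
        auto_offset_reset="earliest",  # same effect as --from-beginning
        consumer_timeout_ms=5000,      # stop iterating once the topic is idle
    )
    return [msg.value for msg in consumer]
```

Calling produce_and_consume() while the cluster is up should return the published messages as bytes.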

Our Kafka cluster works correctly! Let's look at how to manage and monitor it.

Kafka Cluster Administration with Control Center

Confluent Control Center provides a graphical interface for managing and monitoring Kafka clusters, including broker monitoring, topic management, and schema registry integration.

Integrate Control Center

We update the docker-compose.yml to add a Control Center service:

services:

  # ... existing zookeeper and kafka services ...

  control-center:
    image: confluentinc/cp-enterprise-control-center:7.3.0
    hostname: control-center
    container_name: control-center
    depends_on:
      - kafka-1
      - kafka-2
      - kafka-3
    ports:
      - "9021:9021"
    environment:
      CONTROL_CENTER_BOOTSTRAP_SERVERS: 'kafka-1:9091,kafka-2:9092,kafka-3:9093'
      CONTROL_CENTER_REPLICATION_FACTOR: 3

We connected Control Center to the 3 Kafka brokers.

Access Control Center GUI

Restart the stack with docker-compose up -d and access Control Center at http://localhost:9021 in a browser.

You will see the Overview dashboard with live metrics for brokers, topics, partitions and consumers:

[Figure: Control Center Overview dashboard]

Similarly, explore topic management, schema registry, and the other administration features Control Center provides.

Next, let's compare this single-host Docker Compose deployment option for Kafka with other setups.

Comparison of Kafka Deployment Options

For development/testing purposes, Kafka can be conveniently deployed on a single host using Docker Compose. This keeps resource usage minimal.

However, for production usage there are other recommended deployment options:

| Deployment Method | Description | Benefits | Use Case |
| --- | --- | --- | --- |
| Docker Compose | Multiple containers on a single Docker host | Simplified configuration, local development | Prototyping, development |
| Docker Swarm | Containers distributed across multiple Docker hosts | High availability, horizontal scale | Small-scale production |
| Kubernetes | Containers managed by a Kubernetes cluster | Flexibility, reliability, self-healing | Enterprise-grade production |
| Confluent Cloud | Fully managed Apache Kafka clusters | No ops, cloud integration capabilities | Public-cloud data streaming |

Table 1: Comparison of popular Kafka deployment options

As shown above, Docker Compose is great for local development but not adequate for large scale production scenarios.

Managed offerings like Confluent Cloud provide fully managed, auto-scaled Kafka clusters on public clouds along with capabilities like:

  • Automatic provisioning, scaling and healing
  • Out-of-the-box monitoring, security, and role-based access controls
  • Over 100 cloud service integrations (Google Cloud, AWS, Azure)
  • Developer self-service access to clusters and topics

For enterprise grade production use cases, managed services eliminate operational complexity.

Now let's discuss some Kafka security considerations.

Kafka Security Considerations

Though we haven't configured security for simplicity, here are some guidelines for properly securing Kafka clusters:

  • Use SSL for encryption between Kafka clients and brokers
  • Enable SASL/SCRAM authentication for client connections
  • Restrict access with ACLs between clients and allowed topics
  • Encrypt inter-broker communication using SSL or IPSec
  • Integrate with enterprise authentication systems
  • Enable schema validation on producers and consumers

Additional measures such as encryption at rest, dedicated VPC networks, and firewall policies should also be considered.
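As a rough sketch, broker-side TLS plus SASL/SCRAM settings might look like the following server.properties fragment. The paths, passwords, hostname, and the 9094 port are placeholders, and a full setup also requires keystores, truststores, and SCRAM credentials to be created beforehand:

```properties
# TLS + SASL/SCRAM listener for clients (placeholder values)
listeners=SASL_SSL://0.0.0.0:9094
advertised.listeners=SASL_SSL://kafka-1.example.com:9094
ssl.keystore.location=/etc/kafka/secrets/kafka.keystore.jks
ssl.keystore.password=changeit
ssl.truststore.location=/etc/kafka/secrets/kafka.truststore.jks
ssl.truststore.password=changeit
sasl.enabled.mechanisms=SCRAM-SHA-512
# Encrypt inter-broker traffic too
security.inter.broker.protocol=SSL
# Enforce topic-level ACLs; deny access unless explicitly allowed
authorizer.class.name=kafka.security.authorizer.AclAuthorizer
allow.everyone.if.no.acl.found=false
```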

For multi-datacenter clusters, also secure the cross-datacenter replication links, for example by running replication tools such as MirrorMaker over TLS.

Finally, let's recap what we learned.

Conclusion

In this comprehensive guide, we covered the following:

  • Overview of Kafka architecture and core concepts
  • Step-by-step instructions to deploy a 3 node Kafka cluster using Docker Compose
  • Examples to produce and consume messages with Kafka brokers
  • Integrating Confluent Control Center for Kafka cluster monitoring
  • Comparison between Kafka deployment methods
  • Security considerations for Kafka in production

Apache Kafka provides a high performance, resilient platform for building streaming data pipelines and applications. For local development purposes, Docker Compose provides a simplified way to run Kafka clusters.

Additionally, cloud-native deployment options make running large scale production Kafka clusters convenient by eliminating operational complexity.

I hope you found this guide useful. Feel free to reach out with any questions!
