As a full-stack developer and distributed systems engineer, I help companies build large-scale streaming data architectures using technologies like Apache Kafka.
In this comprehensive 3200+ word guide, you'll learn how to build a high-performance Kafka consumer in Go that can process streams of data at scale.
Overview of Apache Kafka
Apache Kafka is a distributed, partitioned, replicated commit log service. It was open-sourced by LinkedIn in 2011 and became a top-level Apache project in 2012.
What does this actually mean, though? At a high level, Kafka provides a durable, fault-tolerant, append-only messaging system. Here's a quick rundown of Kafka terminology and concepts:
Topics and Logs
Kafka organizes data streams into topics. Under the covers, each topic is broken into one or more partitions, which reside on disk as logs:
Topic 1
|
|-- Partition 0 (Log)
|-- Partition 1 (Log)
|-- Partition 2 (Log)
New records get appended to partitions in an immutable fashion. Kafka retains data for a configurable period, defaulting to 7 days.
Producers and Consumers
Clients communicate with Kafka through TCP connections to the cluster. Producers publish messages to topics while consumers subscribe to topic data feeds.
Messages
Messages consist of a key, value and metadata like timestamps. Kafka treats both the key and value as binary blobs – they have no schema enforced on the broker. Hence producers and consumers must agree on serialization formats.
Consumer Groups
When multiple consumers connect to a topic, they can form consumer groups. Group members divide topic partitions between themselves so each partition is consumed by exactly one member. This allows scaling out consumers horizontally.
Kafka Architecture Overview
Now that we understand the basic terminology, let's look at Kafka's architecture:

Brokers
At the heart of a Kafka deployment are brokers – Kafka cluster nodes that manage data persistence and replication. Brokers expose a TCP interface for producers/consumers as well as inter-broker communication.
Typical broker configurations use direct-attached storage like SSDs for performance, with SAS disks as secondary storage tiers. Brokers persist topic data locally but rely on ZooKeeper for cluster membership and coordination.
ZooKeeper
Apache ZooKeeper coordinates the Kafka cluster, storing configuration, broker metadata, consumer offsets (in older releases) and more. Through ZooKeeper, one broker is elected controller; it handles cluster leadership duties like partition leader election and broker failover.
Producers and Consumers
Clients communicate with brokers directly to publish and subscribe to topic data feeds. The brokers themselves are agnostic to the message payload format – it's up to clients to agree on serialization formats like JSON, Protobuf or Avro.
Now that we've covered Kafka basics, let's focus on the consumer clients.
Kafka Consumer Client Capabilities
Some key capabilities Kafka consumer clients provide:
- Subscribe to topics and partition feeds
- Track offset position in partitions
- Control polling duration with timeouts
- Batch fetch records for efficiency
- Commit offsets manually or automatically
- Rebalance partitions on consumer group changes
- Handle broker and network failures
- Coordinate with other members of a consumer group
Now let's go through how to leverage these capabilities within a Go application.
Golang Setup
For starters, make sure Go 1.11+ is installed (modules arrived in Go 1.11). Next, let's initialize a module:
mkdir consumer-app-go && cd consumer-app-go
go mod init github.com/myorg/consumer-app-go
This enables Go module support, so dependencies are tracked in go.mod and fetched into the local module cache.
For interacting with Kafka, we'll use the Confluent Kafka Golang client:
go get github.com/confluentinc/confluent-kafka-go/kafka
Let's now create a file consumer.go to hold our application code.
Kafka Consumer Configuration
Inside consumer.go, the first step is to configure our consumer. This sets up the initial bootstrap brokers as well as the consumer group this process joins:
package main

import (
	"fmt"

	"github.com/confluentinc/confluent-kafka-go/kafka"
)

func main() {
	// Consumer configuration
	c, err := kafka.NewConsumer(&kafka.ConfigMap{
		"bootstrap.servers": "localhost:9092",
		"group.id":          "my-consumer",
	})
	if err != nil {
		fmt.Printf("Failed to create consumer: %v\n", err)
		return
	}
	defer c.Close()
}
In the config map, we primarily need to specify:
bootstrap.servers
This provides the Kafka brokers used to bootstrap initial connections and metadata. The client then fetches full cluster information from these servers.
group.id
Uniquely identifies the consumer group this process is joining. Running multiple processes with the same group.id lets them divide the partitions between themselves, enabling scale-out.
Some other useful configs include:
- auto.offset.reset: where to start consuming if no offsets have been committed yet
- enable.auto.commit: whether to commit offsets automatically in the background
- session.timeout.ms: how long the group waits for heartbeats before evicting a member
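Putting these settings together, an expanded configuration might look like the following (a sketch; the property names are real Confluent client/librdkafka settings, but the values are illustrative, not recommendations):

```go
c, err := kafka.NewConsumer(&kafka.ConfigMap{
	"bootstrap.servers":  "localhost:9092",
	"group.id":           "my-consumer",
	"auto.offset.reset":  "earliest", // start from the oldest record if no offset is committed
	"enable.auto.commit": false,      // we will commit offsets explicitly after processing
	"session.timeout.ms": 10000,      // evict a member after 10s without heartbeats
})
if err != nil {
	fmt.Printf("Failed to create consumer: %v\n", err)
	return
}
```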
Now we're ready to start consuming.
Kafka Consumer Topic Subscription
The consumer process must subscribe to one or more topics it will consume. This handles coordinating which topic partitions are assigned to the consumer process:
func main() {
	// ... consumer initialized as above ...

	topics := []string{"trades", "users"}
	if err := c.SubscribeTopics(topics, nil); err != nil {
		fmt.Printf("Failed to subscribe: %v\n", err)
		return
	}
}
By subscribing to topics, the consumer joins the group coordinating those topic feeds. Topic partitions are then dynamically assigned across group members to balance load.
Now that we've subscribed, messages will start flowing based on partition assignments.
Consuming Messages & Events
With subscriptions initialized, we can start the main consumer loop:
for {
	msg, err := c.ReadMessage(-1)
	if err == nil {
		fmt.Printf("Message: %s\n", string(msg.Value))
	} else {
		fmt.Printf("Error: %v\n", err)
	}
}
A few pointers on the message consumption process:
- ReadMessage fetches the next message from assigned partitions
- A timeout of -1 blocks indefinitely until a new message arrives
- The msg contains the key, value, timestamp and other metadata
- We print just the value payload for simplicity
This main loop allows the app to process a continual stream of messages off Kafka topics by:
- Pulling data over TCP sockets for assigned partitions
- Deserializing message batches
- Iterating over messages and processing
Other configurations like batch sizes and polling intervals allow tuning throughput.
Kafka Consumer Flow Control
An important capability provided by Kafka is consumer flow control. Buffering and backpressure give the pipeline a way to absorb bursts without overwhelming slow consumers:

Some key parameters that control message flow:
Fetch Batch Sizes
Consumers pull messages in batches for efficiency – a larger batch means less overhead per message. However, setting this too high can overwhelm consumers. 50-500 messages per request is common.
Prefetch Buffer
The consumer library buffers a configurable number of batches ahead of the application. This lets the consumer keep fetching while the application is momentarily slow, preventing backpressure from propagating upstream through the pipeline.
Poll Duration
When no new messages arrive, this timeout controls how long ReadMessage blocks, balancing latency against polling overhead.
Tuning for sustained high throughput requires balancing these configurations.
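In the Confluent Go client, these knobs surface as librdkafka configuration properties. A throughput-oriented sketch (the property names are genuine librdkafka settings; the values are illustrative starting points, not recommendations):

```go
c, err := kafka.NewConsumer(&kafka.ConfigMap{
	"bootstrap.servers":         "localhost:9092",
	"group.id":                  "my-consumer",
	"fetch.min.bytes":           1024,    // let the broker accumulate at least 1 KB per fetch
	"fetch.wait.max.ms":         500,     // ...but respond within 500 ms regardless
	"max.partition.fetch.bytes": 1048576, // cap each partition's batch at 1 MB
	"queued.min.messages":       10000,   // prefetch buffer the client keeps ahead of the app
})
```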
Now let's take a deeper look at delivery semantics.
Kafka Consumer Delivery Semantics
Kafka consumers commonly implement one of two delivery semantics:
At most once
Easy to implement – commit the offset as soon as a message is fetched, then process it. No duplication, but messages can be lost if the process crashes mid-way.
At least once
Ensures every message gets processed, but duplicates may occur. Requires committing only after successful processing, plus idempotent handling downstream.
The difference boils down to whether offset commits happen before or after the processing step.
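To make that ordering concrete, here is a minimal at-most-once loop: the offset is committed before processing, so a crash mid-processing drops the in-flight message rather than redelivering it (a sketch; process is a hypothetical handler and error handling is deliberately thin):

```go
for {
	msg, err := c.ReadMessage(-1)
	if err != nil {
		continue
	}

	// Commit first: once this succeeds the message will never be
	// redelivered, even if process() fails or the app crashes.
	if _, err := c.CommitMessage(msg); err != nil {
		continue
	}

	process(msg) // a failure here means the message is lost
}
```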
For stronger delivery guarantees, we'll look at implementing an at-least-once approach.
Implementing At Least Once Delivery
A consumer tracks two offsets per partition – its current read position and the last committed offset. The committed offset trails the position, marking the point up to which processing is guaranteed to have completed, even across crashes.

To implement at least once semantics:
- Fetch message
- Process message
- Commit message offset
With this ordering, messages since the last commit are redelivered after a crash, so duplicates may occur, but everything gets processed eventually. Combining this with idempotent processing provides end-to-end guarantees.
Let's look at an example in Go:
for {
	msg, err := consumer.ReadMessage(-1)
	if err != nil {
		continue
	}

	// Process msg
	process(msg)

	// Mark offset as successfully processed
	if _, err := consumer.CommitMessage(msg); err != nil {
		fmt.Printf("Commit failed: %v\n", err)
	}
}
This style commits only after successful processing, ensuring at-least-once delivery: the committed offset always trails the read position, so nothing is marked done until it has actually been handled.
Now that we understand the fundamentals of building a consumer, let's explore some more advanced capabilities.
Managing Consumer Group Rebalancing
As consumers join or leave a group, Kafka handles ownership transfers of partitions through rebalancing. Assignments get shifted between members leading to a graceful handoff:

Rebalancing enables:
- Adding consumers for scale
- Removing unhealthy consumers
- Redistributing partitions evenly across members
Graceful shutdowns can notify the group ahead of departure, and crashes are detected via missed heartbeats, so data keeps flowing either way.
The consumer app can supply a RebalanceCb callback to commit offsets and reposition its data streams as partitions are revoked and assigned. This makes the coordination seamless behind the scenes.
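With the Confluent Go client, the rebalance callback is passed as the second argument to SubscribeTopics. A minimal sketch, assuming a consumer c created as shown earlier, that logs transitions and commits outstanding offsets before giving up ownership:

```go
rebalanceCb := func(c *kafka.Consumer, ev kafka.Event) error {
	switch e := ev.(type) {
	case kafka.AssignedPartitions:
		log.Printf("partitions assigned: %v", e.Partitions)
		return c.Assign(e.Partitions)
	case kafka.RevokedPartitions:
		// Flush offsets for completed work before losing the partitions.
		if _, err := c.Commit(); err != nil {
			log.Printf("commit during rebalance: %v", err)
		}
		return c.Unassign()
	}
	return nil
}

err = c.SubscribeTopics([]string{"trades"}, rebalanceCb)
```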
Consumer Monitoring & Metrics
Observability into consumer group metrics provides invaluable insight:
Throughput
Number of messages passing through lets you size systems and identify bottlenecks.
End to End Lag
The delay between producer send time and processing time indicates how "caught up" a partition is. Rising lag indicates problems.
Consumer Heartbeats
Metrics on heartbeat send/fail rates show consumer health.
Here's an example dashboard:

Prometheus is a popular choice, storing numeric time series metrics generated from Kafka client instrumentation and logs. Grafana visualizes these metrics through dashboards.
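Lag can also be measured from inside the application by comparing the group's last committed offset against the broker's high watermark. A sketch using the Confluent Go client (Committed and QueryWatermarkOffsets are real client calls; the 5-second timeouts are arbitrary):

```go
func partitionLag(c *kafka.Consumer, topic string, partition int32) (int64, error) {
	tp := kafka.TopicPartition{Topic: &topic, Partition: partition}

	// Last offset this group committed for the partition.
	committed, err := c.Committed([]kafka.TopicPartition{tp}, 5000)
	if err != nil {
		return 0, err
	}

	// Highest offset currently available on the broker.
	_, high, err := c.QueryWatermarkOffsets(topic, partition, 5000)
	if err != nil {
		return 0, err
	}

	return high - int64(committed[0].Offset), nil
}
```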
Comparing Kafka Go Client Libraries
There are two popular Kafka client libraries for Go – Sarama and Confluent. Let's compare some key differences:
| Feature | Sarama | Confluent |
|---|---|---|
| Community Support | MIT licensed, community driven | Open source, commercially supported |
| Client Capabilities | Consumer groups + producers | Adds admin client, schemas, serializers |
| Error Handling | Returns errors | Handles retries automatically |
| Compatibility | Apache Kafka 0.8+ | Confluent Platform 5.0+ |
| Async Support | Async producers + consumers | Sync clients, async processing |
| Maturity | Since 2013, stable API | Since 2016, evolving API |
Generally, Confluent's Go client wraps the battle-tested C librdkafka library, focusing on usability and adding value like schema registry integration and admin capabilities.
Sarama provides a lower-level, pure-Go interface that talks to Kafka brokers directly. This makes it ideal when you want full control or need to avoid cgo.
Implementing Kafka Streams in Go
Let's explore a more advanced example that builds a stream processing application. We'll leverage a worker pool to distribute processing across goroutines:

package main

import (
	"log"

	"github.com/confluentinc/confluent-kafka-go/kafka"
	"golang.org/x/sync/errgroup"
)

type Message struct {
	Key   string
	Value string
}

func main() {
	topic := "trades"

	consumer := CreateConsumer(topic)
	defer consumer.Close()

	workers := CreateWorkerPool(10)

	g := new(errgroup.Group)

	// Start consumer loop
	g.Go(func() error {
		for {
			msg, err := consumer.ReadMessage(-1)
			if err != nil {
				return err
			}
			workers.Process(&Message{
				Key:   string(msg.Key),
				Value: string(msg.Value),
			})
		}
	})

	// Block until all goroutines are done
	if err := g.Wait(); err != nil {
		log.Fatal(err)
	}
}

func CreateConsumer(topic string) *kafka.Consumer {
	// Create a consumer and subscribe to the topic (elided)
	return nil
}

func CreateWorkerPool(size int) *WorkerPool {
	// Start the worker goroutines (elided)
	return nil
}
By leveraging goroutines for concurrent processing behind the scenes, we build a scalable stream processing application. Additional workers allow handling higher throughput. Batching also improves efficiency greatly.
Kafka enables several streaming paradigms:
KSQL
Streaming SQL for data pipelines and materialized views.
Kafka Streams
Distributed, scalable library for building streaming apps.
Flink
Unified stream & batch processing at scale.
There are several ways to build on the foundations Kafka provides.
Kafka Consumer Performance & Sizing
When planning out hardware capacity, data ingest rates dictate storage and consumer sizing requirements.
Here are some ballpark Kafka benchmarks on modern cloud hardware profiles:
| Workload | Throughput | Drives | Instance Type |
|---|---|---|---|
| Non-Replicated Topics | 1.5 Gbps | 1 EBS gp3 | m5.8xlarge |
| 3x Replication Factor | 500 Mbps | 1 EBS gp3 | m5.8xlarge |
Some useful metrics for forecasts:
- 1 MB/s → ~86 GB/day
- 3 MB/s → ~259 GB/day
Analyzing maximum ingest rates and retention policies allows projections of storage capacity over time.
Generally, CPU is the initial bottleneck; only once additional processing like compression is added does IO bandwidth come under strain. Also consider downstream consumers when sizing – they are often the slowest link in the pipeline.
Overprovisioning brokers allows headroom for spikes or maintenance. Scale-out architectures easily add capacity as needed.
Conclusion
Kafka provides a battle-tested foundation for large-scale streaming data architectures, and Go is well suited to building high-performance applications like the consumer systems shown here.
In this guide we covered Kafka architecture basics, implementing a consumer group, delivery semantics, monitoring, stream processing examples and sizing considerations.
Getting the foundations of ingest pipelines right prevents major issues down the road once the business relies on streams of data. Investment in scalable, observable consumer apps pays dividends over time as complexity grows.
To learn more, check out the Kafka site and Confluent docs. The complete Go consumer app code is available on GitHub.


