NATS JetStream in Production: When Kafka Is Too Much
Kafka is the default answer for message queuing at scale. But for teams running fewer than a million messages per day, NATS JetStream offers persistence, delivery guarantees, and a dramatically simpler operational footprint.
Anurag Verma
Every time a team needs reliable message delivery between services, someone suggests Kafka. Kafka is a reasonable choice at genuine Kafka scale: millions of messages per second, dozens of consumer groups, regulatory retention requirements, the works. For most teams, it’s substantial operational overhead in exchange for capabilities they won’t use.
NATS JetStream is the alternative worth knowing. NATS has been around since 2011 as a lightweight pub/sub system. JetStream, added in 2021, layers persistence, delivery guarantees, and stream replay on top of the core system. The result is a message broker that handles most production messaging use cases with a fraction of the complexity.
What JetStream Adds to Core NATS
Core NATS is fire-and-forget pub/sub. If a subscriber is offline when a message is published, it misses the message. That’s fine for use cases like real-time metrics or cache invalidation signals where missed messages are acceptable. It’s not fine for order processing, payment events, or anything where “at least once delivery” matters.
JetStream fixes this with persistent streams:
- Streams store messages for configurable retention periods (time-based or size-based)
- Consumers track their position within a stream and receive unacknowledged messages
- Delivery guarantees: at-most-once (no acks), at-least-once (explicit ack with redelivery), and exactly-once within a deduplication window (message-ID deduplication on publish combined with a double-ack on the consumer side)
- Pull and push consumers: pull for batch processing, push for real-time workloads
The key difference from Kafka: a NATS server can run as a single binary with no external dependencies. A Kafka cluster requires Kafka brokers plus ZooKeeper (or KRaft quorum), monitoring, and enough expertise to operate the cluster safely. NATS embedded mode lets you run the server inside your application process during testing.
Setting Up a Stream
Here’s a minimal JetStream setup in Go, the language NATS tooling works best in:
package main

import (
    "context"
    "fmt"
    "log"
    "time"

    "github.com/nats-io/nats.go"
    "github.com/nats-io/nats.go/jetstream"
)

func main() {
    nc, err := nats.Connect(nats.DefaultURL)
    if err != nil {
        log.Fatal(err)
    }
    defer nc.Close()

    js, err := jetstream.New(nc)
    if err != nil {
        log.Fatal(err)
    }

    ctx := context.Background()

    // Create a stream that persists up to 7 days or 1GB
    stream, err := js.CreateOrUpdateStream(ctx, jetstream.StreamConfig{
        Name:      "ORDERS",
        Subjects:  []string{"orders.>"},
        Retention: jetstream.LimitsPolicy,
        MaxAge:    7 * 24 * time.Hour,
        MaxBytes:  1 * 1024 * 1024 * 1024,
        Storage:   jetstream.FileStorage,
    })
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("Stream created: %s\n", stream.CachedInfo().Config.Name)
}
The orders.> subject pattern matches any subject starting with orders. — so orders.created, orders.payment.confirmed, orders.shipped all land in the same stream. This hierarchical subject routing is one of NATS’s genuinely useful design decisions.
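To make the wildcard rules concrete, here is a minimal sketch of NATS-style subject matching using only the standard library. The function name subjectMatches is hypothetical and this is not how the server implements matching; it only mirrors the documented rules that * matches exactly one token and > matches one or more trailing tokens. It does not validate malformed subjects such as ones with empty tokens.

```go
package main

import (
	"fmt"
	"strings"
)

// subjectMatches reports whether a concrete subject matches a pattern.
// "*" matches exactly one token; ">" matches one or more trailing tokens.
// Hypothetical helper for illustration, not part of any NATS library.
func subjectMatches(pattern, subject string) bool {
	p := strings.Split(pattern, ".")
	s := strings.Split(subject, ".")
	for i, tok := range p {
		if tok == ">" {
			// ">" must be the last pattern token and needs at least one token left.
			return i == len(p)-1 && len(s) > i
		}
		if i >= len(s) {
			return false // subject ran out of tokens before the pattern did
		}
		if tok != "*" && tok != s[i] {
			return false // literal token mismatch
		}
	}
	// Without a trailing ">", both must have the same number of tokens.
	return len(p) == len(s)
}

func main() {
	subjects := []string{"orders.created", "orders.payment.confirmed", "orders", "invoices.created"}
	for _, subj := range subjects {
		fmt.Printf("%-26s %v\n", subj, subjectMatches("orders.>", subj))
	}
}
```

Note that the bare subject orders does not match orders.> because > requires at least one token after the prefix.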
Publishing and Consuming with Delivery Guarantees
// Publisher
func publishOrder(js jetstream.JetStream, orderID string, payload []byte) error {
    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer cancel()

    ack, err := js.Publish(ctx, "orders.created", payload,
        jetstream.WithMsgID(orderID), // Deduplication key within the stream's duplicate window
    )
    if err != nil {
        return fmt.Errorf("publish failed: %w", err)
    }
    fmt.Printf("Published seq=%d, stream=%s\n", ack.Sequence, ack.Stream)
    return nil
}

// Consumer (pull-based, good for batch processing)
// processOrder is the application's own handler and is not shown here.
func startConsumer(js jetstream.JetStream) error {
    ctx := context.Background()

    consumer, err := js.CreateOrUpdateConsumer(ctx, "ORDERS", jetstream.ConsumerConfig{
        Name:          "order-processor",
        Durable:       "order-processor",
        AckPolicy:     jetstream.AckExplicitPolicy,
        MaxDeliver:    5,
        AckWait:       30 * time.Second,
        FilterSubject: "orders.created",
    })
    if err != nil {
        return err
    }

    for {
        // Fetch returns after 10 messages or 5 seconds, whichever comes first.
        // An empty batch is not an error; the loop simply fetches again.
        msgs, err := consumer.Fetch(10, jetstream.FetchMaxWait(5*time.Second))
        if err != nil {
            return err
        }
        for msg := range msgs.Messages() {
            if err := processOrder(msg.Data()); err != nil {
                msg.Nak() // Request redelivery now instead of waiting for AckWait
                continue
            }
            msg.Ack()
        }
        // Errors that occur while the batch is being delivered surface here.
        if err := msgs.Error(); err != nil {
            return err
        }
    }
}
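MaxDeliver: 5 caps delivery attempts at five. Once a message reaches that cap, the server stops redelivering it to this consumer and publishes an advisory on $JS.EVENT.ADVISORY.CONSUMER.MAX_DELIVERIES.ORDERS.order-processor. JetStream does not move the message to a dead-letter stream on its own, so dead-letter handling has to be built from that advisory or from the delivery count in the message metadata. AckWait: 30 * time.Second means that if the consumer doesn’t ack within 30 seconds, NATS redelivers the message to whichever client next pulls from order-processor, which may be the same instance. These are sensible starting values for most workloads.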
Key-Value and Object Store
JetStream ships with two higher-level APIs built on top of streams: Key-Value and Object Store.
Key-Value is a distributed KV store with history. You can configure how many revisions to keep per key, set TTLs, and watch for changes:
kv, err := js.CreateOrUpdateKeyValue(ctx, jetstream.KeyValueConfig{
    Bucket:  "config",
    History: 10,             // keep last 10 versions of each key
    TTL:     24 * time.Hour, // entries expire 24 hours after they are written
})
if err != nil {
    log.Fatal(err)
}

// Watch for changes to any config key
watcher, err := kv.WatchAll(ctx)
if err != nil {
    log.Fatal(err)
}
defer watcher.Stop()

go func() {
    for entry := range watcher.Updates() {
        if entry == nil {
            continue // initial values delivered, now in watch mode
        }
        fmt.Printf("Config changed: %s = %s\n", entry.Key(), entry.Value())
    }
}()

// Write and read
if _, err := kv.Put(ctx, "feature.flags.new-checkout", []byte("true")); err != nil {
    log.Fatal(err)
}
entry, err := kv.Get(ctx, "feature.flags.new-checkout")
if err != nil {
    log.Fatal(err)
}
fmt.Println(string(entry.Value()))
This can replace Redis or etcd for simple configuration distribution between services, with the NATS server as the single dependency.
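What a per-key history limit means in practice can be shown with a small in-memory stand-in. This is a sketch under stated assumptions: historyKV and its methods are hypothetical names, and the type only models "keep the last N revisions per key", not TTLs, watchers, or anything the real bucket does over the network.

```go
package main

import "fmt"

// historyKV is an in-memory stand-in for a KV bucket that keeps
// at most limit revisions per key, oldest first.
// Hypothetical type for illustration only.
type historyKV struct {
	limit int
	revs  map[string][]string
}

func newHistoryKV(limit int) *historyKV {
	return &historyKV{limit: limit, revs: make(map[string][]string)}
}

// Put appends a revision and drops the oldest ones beyond the limit.
func (kv *historyKV) Put(key, value string) {
	h := append(kv.revs[key], value)
	if len(h) > kv.limit {
		h = h[len(h)-kv.limit:]
	}
	kv.revs[key] = h
}

// Get returns the latest value for key, if any.
func (kv *historyKV) Get(key string) (string, bool) {
	h := kv.revs[key]
	if len(h) == 0 {
		return "", false
	}
	return h[len(h)-1], true
}

// History returns a copy of the retained revisions, oldest first.
func (kv *historyKV) History(key string) []string {
	return append([]string(nil), kv.revs[key]...)
}

func main() {
	kv := newHistoryKV(3)
	for _, v := range []string{"v1", "v2", "v3", "v4", "v5"} {
		kv.Put("feature.flags.new-checkout", v)
	}
	latest, _ := kv.Get("feature.flags.new-checkout")
	fmt.Println("latest:", latest)
	fmt.Println("history:", kv.History("feature.flags.new-checkout"))
}
```

With a limit of 3 and five writes, only the last three revisions survive, which is the trade-off to keep in mind when choosing a History value: older revisions are gone for good.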
NATS vs Kafka: The Honest Comparison
| Factor | NATS JetStream | Apache Kafka |
|---|---|---|
| Setup complexity | Single binary, no deps | ZooKeeper or KRaft + brokers |
| Throughput ceiling | Millions of msgs/sec for core NATS; lower for persisted streams (storage-bound) | Scales with partitions and brokers to millions of msgs/sec |
| Message retention | Configurable, up to disk | Configurable, designed for long retention |
| Consumer model | Push and pull | Pull only |
| Exactly-once | Publish deduplication + double-ack | Exactly-once transactions (complex) |
| Multi-tenancy | Accounts + auth | ACLs (complex at scale) |
| Cloud-native cluster | JetStream clustering | Requires careful partition tuning |
| Learning curve | Low | High |
| Good for | <10M msgs/day, small teams | >100M msgs/day, dedicated infra team |
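The throughput figures are rarely the deciding factor, and they should be read as rough, benchmark-dependent estimates. Core NATS can push millions of small messages per second on decent hardware, and although persisted JetStream streams are slower, they still leave ample headroom for the workloads this article targets. Unless you’re running a large-scale event processing pipeline, that ceiling doesn’t matter. What matters is whether you want to maintain a Kafka cluster.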
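It helps to convert the daily volumes in the table into per-second rates before worrying about ceilings. The sketch below does only that arithmetic; avgPerSecond is a hypothetical helper, and it computes an average, so real traffic will have peaks well above these numbers.

```go
package main

import "fmt"

// secondsPerDay is the number of seconds in a 24-hour day.
const secondsPerDay = 24 * 60 * 60

// avgPerSecond converts a daily message count into an average per-second rate.
func avgPerSecond(perDay float64) float64 {
	return perDay / secondsPerDay
}

func main() {
	// 1M, 10M, and 100M messages per day.
	for _, perDay := range []float64{1e6, 1e7, 1e8} {
		fmt.Printf("%.0f msgs/day is about %.1f msgs/sec on average\n", perDay, avgPerSecond(perDay))
	}
}
```

Ten million messages per day averages out to roughly 116 messages per second, and even 100 million per day is only about 1,157 per second, which is why operational cost tends to matter more than raw throughput at these volumes.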
Running NATS in Production
NATS runs as a single binary with a simple config file:
# nats-server.conf
# server_name must be unique per node; clustered JetStream requires it
server_name: nats-1
listen: 0.0.0.0:4222
http: 0.0.0.0:8222

jetstream {
    store_dir: /data/nats
    max_memory_store: 1GB
    max_file_store: 20GB
}

cluster {
    name: production
    listen: 0.0.0.0:6222
    routes: [
        nats://nats-1:6222
        nats://nats-2:6222
        nats://nats-3:6222
    ]
}
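A three-node cluster provides high availability, but only for streams created with three replicas (Replicas: 3 in the stream config); the default is a single replica. The http line in the config exposes the monitoring endpoint on port 8222, so you can query server stats (/varz) and JetStream stream and consumer state (/jsz) with plain HTTP. No separate tools are needed for basic operational visibility.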
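As a rough illustration of querying the monitoring port from code, the sketch below fetches /jsz and decodes a handful of top-level counters. The struct jszSummary and both function names are hypothetical, the JSON field names (streams, consumers, messages, bytes) are assumed from the monitoring response and should be checked against your server version, and the URL assumes a server listening on localhost:8222.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// jszSummary holds a few top-level counters from the /jsz response.
// Field names are assumptions; verify them against your server version.
type jszSummary struct {
	Streams   int    `json:"streams"`
	Consumers int    `json:"consumers"`
	Messages  uint64 `json:"messages"`
	Bytes     uint64 `json:"bytes"`
}

// parseJSZ decodes the counters and ignores every other field in the body.
func parseJSZ(body []byte) (jszSummary, error) {
	var s jszSummary
	err := json.Unmarshal(body, &s)
	return s, err
}

// fetchJSZ performs a plain HTTP GET against the monitoring endpoint.
func fetchJSZ(baseURL string) (jszSummary, error) {
	client := &http.Client{Timeout: 3 * time.Second}
	resp, err := client.Get(baseURL + "/jsz")
	if err != nil {
		return jszSummary{}, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return jszSummary{}, fmt.Errorf("unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return jszSummary{}, err
	}
	return parseJSZ(body)
}

func main() {
	// Assumes a local server with monitoring enabled on port 8222.
	s, err := fetchJSZ("http://localhost:8222")
	if err != nil {
		fmt.Println("monitoring endpoint not reachable:", err)
		return
	}
	fmt.Printf("streams=%d consumers=%d messages=%d bytes=%d\n",
		s.Streams, s.Consumers, s.Messages, s.Bytes)
}
```

Keeping the parsing separate from the HTTP call makes the decoding easy to check against a captured response before pointing it at a live cluster.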
For Kubernetes, the NATS Helm chart handles the cluster setup:
helm repo add nats https://nats-io.github.io/k8s/helm/charts/
helm install nats nats/nats \
    --set config.jetstream.enabled=true \
    --set config.jetstream.fileStore.enabled=true \
    --set config.cluster.enabled=true \
    --set config.cluster.replicas=3
Where NATS Doesn’t Fit
NATS is not the right choice when:
- You need message replay across months or years (Kafka’s log-centric storage is designed for long retention)
- You’re processing genuinely high volumes — billions of events per day — where Kafka’s partition-based parallelism matters
- You need Kafka Connect or the Kafka ecosystem of connectors for data pipelines
- Your organization already runs Kafka well and adding another broker technology creates more cost than it saves
For fan-out patterns where hundreds of consumers subscribe to the same subject, NATS push consumers are very efficient. For high-throughput analytics pipelines that need strict per-key ordering while spreading load across many partitions, Kafka’s model fits better.
The Migration Path
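If you’re replacing an existing Kafka deployment, the nats-kafka bridge (a separate connector process maintained under the nats-io organization, not a mode of the NATS server itself) can subscribe to Kafka topics and republish to NATS subjects. It’s useful for gradual migration rather than a hard cutover.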
For most teams starting from scratch, or replacing ad hoc database polling and Redis pub/sub, NATS JetStream is the first thing to reach for. You get persistence, delivery guarantees, load-balanced consumers, the building blocks for dead-letter handling, and a key-value store with a single deployment. That’s a lot of capability for something you can start with brew install nats-server.
The inflection point where Kafka becomes necessary is later than most teams think. If you’re not sure which side of that line you’re on, start with NATS.