NATS JetStream in Production: When Kafka Is Too Much

Every time a team needs reliable message delivery between services, someone suggests Kafka. Kafka is a reasonable choice at genuine Kafka scale: millions of messages per second, dozens of consumer groups, regulatory retention requirements, the works. For most teams, it’s substantial operational overhead in exchange for capabilities they won’t use.

NATS JetStream is the alternative worth knowing. NATS has been around since 2011 as a lightweight pub/sub system. JetStream, added in 2021, layers persistence, delivery guarantees, and stream replay on top of the core system. The result is a message broker that handles most production messaging use cases with a fraction of the complexity.

What JetStream Adds to Core NATS

Core NATS is fire-and-forget pub/sub. If a subscriber is offline when a message is published, it misses the message. That’s fine for use cases like real-time metrics or cache invalidation signals where missed messages are acceptable. It’s not fine for order processing, payment events, or anything where “at least once delivery” matters.

JetStream fixes this with persistent streams:

Streams store messages for configurable retention periods (time-based or size-based)
Consumers track their position within a stream and receive unacknowledged messages
Delivery guarantees: at-most-once (no acks), at-least-once (explicit ack with redelivery), and exactly-once within a deduplication window (message IDs on publish plus double-acked consumption)
Pull and push consumers: pull for batch processing, push for real-time workloads
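
Exactly-once semantics rest partly on the server discarding duplicate publishes: JetStream remembers each message ID (the Nats-Msg-Id header) for a stream's duplicate window, two minutes by default. The sketch below models that behavior in memory. The dedupWindow type and its accept method are hypothetical names for illustration, not part of any NATS API, and the real server's bookkeeping may differ.

```go
package main

import (
    "fmt"
    "time"
)

// dedupWindow is a hypothetical in-memory model of publish deduplication:
// a message ID seen again inside the window is treated as a duplicate.
type dedupWindow struct {
    window time.Duration
    seen   map[string]time.Duration // message ID -> elapsed time when accepted
}

func newDedupWindow(window time.Duration) *dedupWindow {
    return &dedupWindow{window: window, seen: make(map[string]time.Duration)}
}

// accept reports whether a publish with this ID should be stored.
// elapsed is the time since the stream started, which keeps the model
// independent of the wall clock.
func (d *dedupWindow) accept(id string, elapsed time.Duration) bool {
    // Forget IDs that have aged out of the window.
    for k, t := range d.seen {
        if elapsed-t >= d.window {
            delete(d.seen, k)
        }
    }
    if _, dup := d.seen[id]; dup {
        return false
    }
    d.seen[id] = elapsed
    return true
}

func main() {
    d := newDedupWindow(2 * time.Minute)

    fmt.Println(d.accept("order-42", 0))              // first publish is stored
    fmt.Println(d.accept("order-42", 30*time.Second)) // retry inside the window is dropped
    fmt.Println(d.accept("order-42", 3*time.Minute))  // same ID after the window is stored again
}
```

The practical consequence: a publisher that retries after a timeout is safe only if the retry reuses the same message ID and arrives inside the window.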

The key difference from Kafka: a NATS server runs as a single binary with no external dependencies. A Kafka cluster requires brokers plus ZooKeeper (or a KRaft controller quorum on newer versions), monitoring, and enough expertise to operate the cluster safely. Because the NATS server is itself a Go library, a Go application can also embed it in-process, which is handy during testing.

Setting Up a Stream

Here’s a minimal JetStream setup in Go, the language NATS tooling works best in:

package main

import (
    "context"
    "fmt"
    "log"
    "time"

    "github.com/nats-io/nats.go"
    "github.com/nats-io/nats.go/jetstream"
)

func main() {
    nc, err := nats.Connect(nats.DefaultURL)
    if err != nil {
        log.Fatal(err)
    }
    defer nc.Close()

    js, err := jetstream.New(nc)
    if err != nil {
        log.Fatal(err)
    }

    ctx := context.Background()

    // Create a stream that persists up to 7 days or 1GB
    stream, err := js.CreateOrUpdateStream(ctx, jetstream.StreamConfig{
        Name:      "ORDERS",
        Subjects:  []string{"orders.>"},
        Retention: jetstream.LimitsPolicy,
        MaxAge:    7 * 24 * time.Hour,
        MaxBytes:  1 * 1024 * 1024 * 1024,
        Storage:   jetstream.FileStorage,
    })
    if err != nil {
        log.Fatal(err)
    }

    fmt.Printf("Stream created: %s\n", stream.CachedInfo().Config.Name)
}

The orders.> subject pattern matches any subject starting with orders. — so orders.created, orders.payment.confirmed, orders.shipped all land in the same stream. This hierarchical subject routing is one of NATS’s genuinely useful design decisions.
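
To make the wildcard rules concrete, here is a minimal sketch of subject matching, assuming the two NATS wildcards: * for exactly one token and > for one or more trailing tokens. The subjectMatches function is a hypothetical helper written for illustration; it is not the server's matcher and does not validate subjects.

```go
package main

import (
    "fmt"
    "strings"
)

// subjectMatches reports whether a dot-separated subject matches a pattern
// that may contain "*" (exactly one token) or ">" (one or more trailing
// tokens). Illustrative only; no validation of either argument is done.
func subjectMatches(pattern, subject string) bool {
    p := strings.Split(pattern, ".")
    s := strings.Split(subject, ".")
    for i, tok := range p {
        if tok == ">" {
            // ">" must be the last pattern token and needs at least one subject token.
            return i == len(p)-1 && len(s) > i
        }
        if i >= len(s) {
            return false
        }
        if tok != "*" && tok != s[i] {
            return false
        }
    }
    return len(p) == len(s)
}

func main() {
    for _, subj := range []string{"orders.created", "orders.payment.confirmed", "orders", "payments.created"} {
        fmt.Printf("%-26s orders.>=%v orders.*=%v\n",
            subj, subjectMatches("orders.>", subj), subjectMatches("orders.*", subj))
    }
}
```

Note that orders.> does not match the bare subject orders, and orders.* stops at one level, so orders.payment.confirmed only matches the > form.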

Publishing and Consuming with Delivery Guarantees

// Publisher
func publishOrder(js jetstream.JetStream, orderID string, payload []byte) error {
    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer cancel()

    ack, err := js.Publish(ctx, "orders.created", payload,
        jetstream.WithMsgID(orderID), // Deduplication key
    )
    if err != nil {
        return fmt.Errorf("publish failed: %w", err)
    }

    fmt.Printf("Published seq=%d, stream=%s\n", ack.Sequence, ack.Stream)
    return nil
}

// Consumer (pull-based, good for batch processing)
func startConsumer(js jetstream.JetStream) error {
    ctx := context.Background()

    consumer, err := js.CreateOrUpdateConsumer(ctx, "ORDERS", jetstream.ConsumerConfig{
        Name:          "order-processor",
        Durable:       "order-processor",
        AckPolicy:     jetstream.AckExplicitPolicy,
        MaxDeliver:    5,
        AckWait:       30 * time.Second,
        FilterSubject: "orders.created",
    })
    if err != nil {
        return err
    }

    for {
        // Fetch waits up to 5 seconds for a batch; an empty batch is not an error.
        msgs, err := consumer.Fetch(10, jetstream.FetchMaxWait(5*time.Second))
        if err != nil {
            return err
        }

        for msg := range msgs.Messages() {
            if err := processOrder(msg.Data()); err != nil {
                msg.Nak() // ask for redelivery now; use NakWithDelay to back off
                continue
            }
            msg.Ack()
        }

        // Errors that occurred while the batch was in flight surface here.
        if err := msgs.Error(); err != nil {
            return err
        }
    }
}

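MaxDeliver: 5 caps delivery attempts at five. After the fifth failed attempt the server stops redelivering the message to this consumer and publishes a max-deliveries advisory; the message itself stays in the stream. JetStream has no built-in dead-letter stream, so if you want one, subscribe to that advisory and copy the failed message elsewhere. AckWait: 30 * time.Second means that if a message isn’t acked within 30 seconds, the server redelivers it to whichever client next pulls from this consumer, which may be another instance. These values are reasonable starting points for most workloads.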
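
JetStream reports delivery exhaustion through an advisory published on $JS.EVENT.ADVISORY.CONSUMER.MAX_DELIVERIES.<stream>.<consumer>. A minimal sketch of decoding one follows, as a starting point for dead-letter handling. The JSON field names and the sample payload are assumptions based on the advisory schema and should be checked against your server version; maxDeliverAdvisory and parseMaxDeliverAdvisory are hypothetical names.

```go
package main

import (
    "encoding/json"
    "errors"
    "fmt"
    "log"
)

// maxDeliverAdvisory holds the fields of interest from a max-deliveries
// advisory. Field names are assumed from the advisory schema.
type maxDeliverAdvisory struct {
    Stream     string `json:"stream"`
    Consumer   string `json:"consumer"`
    StreamSeq  uint64 `json:"stream_seq"`
    Deliveries uint64 `json:"deliveries"`
}

// parseMaxDeliverAdvisory decodes an advisory payload and rejects payloads
// that do not identify a stored message.
func parseMaxDeliverAdvisory(data []byte) (maxDeliverAdvisory, error) {
    var adv maxDeliverAdvisory
    if err := json.Unmarshal(data, &adv); err != nil {
        return adv, fmt.Errorf("decode advisory: %w", err)
    }
    if adv.Stream == "" || adv.StreamSeq == 0 {
        return adv, errors.New("advisory missing stream or stream_seq")
    }
    return adv, nil
}

func main() {
    // Sample payload, hand-written for this example.
    sample := []byte(`{"stream":"ORDERS","consumer":"order-processor","stream_seq":42,"deliveries":5}`)

    adv, err := parseMaxDeliverAdvisory(sample)
    if err != nil {
        log.Printf("skipping advisory: %v", err)
        return
    }
    fmt.Printf("dead-letter candidate: stream=%s consumer=%s seq=%d after %d deliveries\n",
        adv.Stream, adv.Consumer, adv.StreamSeq, adv.Deliveries)
}
```

A handler built on this would typically load the message by its stream sequence and republish it to a subject captured by a separate stream, which then acts as the dead-letter queue.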

Key-Value and Object Store

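JetStream ships with two higher-level APIs built on top of streams: Key-Value and Object Store. Object Store splits large binary objects into chunks stored as stream messages; the example here focuses on Key-Value.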

Key-Value is a distributed KV store with history. You can configure how many revisions to keep per key, set TTLs, and watch for changes:

kv, err := js.CreateOrUpdateKeyValue(ctx, jetstream.KeyValueConfig{
    Bucket:  "config",
    History: 10,             // keep last 10 versions of each key
    TTL:     24 * time.Hour, // entries expire 24 hours after being written
})
if err != nil {
    log.Fatal(err)
}

// Watch for changes to any config key
watcher, err := kv.WatchAll(ctx)
if err != nil {
    log.Fatal(err)
}
defer watcher.Stop()

go func() {
    for entry := range watcher.Updates() {
        if entry == nil {
            continue // initial values delivered, now in watch mode
        }
        fmt.Printf("Config changed: %s = %s\n", entry.Key(), entry.Value())
    }
}()

// Write and read
if _, err := kv.Put(ctx, "feature.flags.new-checkout", []byte("true")); err != nil {
    log.Fatal(err)
}
entry, err := kv.Get(ctx, "feature.flags.new-checkout")
if err != nil {
    log.Fatal(err)
}
fmt.Println(string(entry.Value()))

This replaces Redis or etcd for simple configuration distribution between services, with the NATS server as the single dependency.

NATS vs Kafka: The Honest Comparison

Factor	NATS JetStream	Apache Kafka
Setup complexity	Single binary, no deps	ZooKeeper or KRaft + brokers
Throughput ceiling	Millions of msgs/sec for core NATS; lower with JetStream persistence and replication	Scales out with partitions and brokers
Message retention	Configurable, up to disk	Configurable, designed for long retention
Consumer model	Push and pull	Pull only
Exactly-once	Publish deduplication (message ID) + double ack	Idempotent producers + transactions (complex)
Multi-tenancy	Accounts + auth	ACLs (complex at scale)
Cloud-native cluster	JetStream clustering	Requires careful partition tuning
Learning curve	Low	High
Good for	<10M msgs/day, small teams	>100M msgs/day, dedicated infra team

The throughput numbers are rarely the deciding factor. Core NATS benchmarks reach millions of messages per second on decent hardware, and JetStream runs below that because it persists and replicates every message. Unless you’re running a large-scale event processing pipeline, neither ceiling matters. What matters is whether you want to maintain a Kafka cluster.
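
A quick back-of-the-envelope conversion shows why the ceiling rarely matters: 10 million messages per day averages out to about 116 messages per second. The sketch below does that arithmetic for a few daily volumes; the 10x peak-to-average factor is an assumption chosen for illustration, not a measured figure.

```go
package main

import "fmt"

const secondsPerDay = 24 * 60 * 60

// avgPerSecond converts a daily message volume into an average per-second rate.
func avgPerSecond(msgsPerDay float64) float64 {
    return msgsPerDay / secondsPerDay
}

// peakPerSecond scales the average rate by an assumed peak-to-average factor.
func peakPerSecond(msgsPerDay, peakFactor float64) float64 {
    return avgPerSecond(msgsPerDay) * peakFactor
}

func main() {
    const peakFactor = 10 // assumed burstiness, adjust for your traffic
    for _, daily := range []float64{10e6, 100e6, 1e9} {
        fmt.Printf("%.0f msgs/day: %.0f msgs/sec average, %.0f msgs/sec at %dx peak\n",
            daily, avgPerSecond(daily), peakPerSecond(daily, peakFactor), peakFactor)
    }
}
```

Even a billion messages per day averages under 12,000 per second, so operational cost usually decides the question long before raw throughput does.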

Running NATS in Production

NATS runs as a single binary with a simple config file:

# nats-server.conf
server_name: nats-1   # unique per node; required for clustered JetStream
listen: 0.0.0.0:4222
http: 0.0.0.0:8222

jetstream {
  store_dir: /data/nats
  max_memory_store: 1GB
  max_file_store: 20GB
}

cluster {
  name: production
  listen: 0.0.0.0:6222
  routes: [
    nats://nats-1:6222
    nats://nats-2:6222
    nats://nats-3:6222
  ]
}

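A three-node cluster provides high availability, as long as each stream is created with Replicas: 3 in its StreamConfig; the default is a single replica, which lives on one node. Each node also needs its own server_name. The http line in the config exposes a monitoring endpoint on port 8222, so you can query server stats, stream info, and consumer state with plain HTTP. No separate tools are needed for basic operational visibility.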
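
As a sketch of what querying the monitoring port looks like, the program below reads the /jsz endpoint and prints a few top-level counters. It assumes a server reachable at http://localhost:8222; the jszSummary type is a hypothetical subset of the response, and its JSON field names are assumptions to verify against your server version.

```go
package main

import (
    "encoding/json"
    "fmt"
    "io"
    "log"
    "net/http"
    "time"
)

// jszSummary holds a few top-level counters from the /jsz response.
// Field names are assumed; the real response contains many more fields.
type jszSummary struct {
    Streams   int    `json:"streams"`
    Consumers int    `json:"consumers"`
    Messages  uint64 `json:"messages"`
    Bytes     uint64 `json:"bytes"`
}

// decodeJSZ parses a /jsz response body into a summary.
func decodeJSZ(body []byte) (jszSummary, error) {
    var s jszSummary
    if err := json.Unmarshal(body, &s); err != nil {
        return s, fmt.Errorf("decode /jsz response: %w", err)
    }
    return s, nil
}

// fetchJSZ requests /jsz from the monitoring endpoint at baseURL.
func fetchJSZ(baseURL string) (jszSummary, error) {
    client := &http.Client{Timeout: 3 * time.Second}
    resp, err := client.Get(baseURL + "/jsz")
    if err != nil {
        return jszSummary{}, err
    }
    defer resp.Body.Close()

    if resp.StatusCode != http.StatusOK {
        return jszSummary{}, fmt.Errorf("unexpected status %s", resp.Status)
    }
    body, err := io.ReadAll(resp.Body)
    if err != nil {
        return jszSummary{}, err
    }
    return decodeJSZ(body)
}

func main() {
    s, err := fetchJSZ("http://localhost:8222") // assumed local server address
    if err != nil {
        log.Printf("monitoring endpoint not reachable: %v", err)
        return
    }
    fmt.Printf("streams=%d consumers=%d messages=%d bytes=%d\n",
        s.Streams, s.Consumers, s.Messages, s.Bytes)
}
```

The same endpoint accepts query parameters for per-stream and per-consumer detail, which is where pending-message counts for lag monitoring would come from.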

For Kubernetes, the NATS Helm chart handles the cluster setup:

helm repo add nats https://nats-io.github.io/k8s/helm/charts/
helm install nats nats/nats \
  --set config.jetstream.enabled=true \
  --set config.jetstream.fileStore.enabled=true \
  --set config.cluster.enabled=true \
  --set config.cluster.replicas=3

Where NATS Doesn’t Fit

NATS is not the right choice when:

You need message replay across months or years (Kafka’s log storage is designed for long retention and very large logs)
You’re processing genuinely high volumes — billions of events per day — where Kafka’s partition-based parallelism matters
You need Kafka Connect or the Kafka ecosystem of connectors for data pipelines
Your organization already runs Kafka well and adding another broker technology creates more cost than it saves

For fan-out patterns where hundreds of consumers subscribe to the same subject, NATS push consumers are very efficient. For high-throughput analytics pipelines that need per-key ordering while scaling across many partitions, Kafka’s model fits better.

The Migration Path

If you’re replacing an existing Kafka deployment, the separate nats-kafka bridge connector can subscribe to Kafka topics and republish to NATS subjects. It’s useful for gradual migration rather than a hard cutover.

For most teams starting from scratch, or replacing ad hoc database polling and Redis pub/sub, NATS JetStream is the first thing to reach for. You get persistence, delivery guarantees, load-balanced consumers, redelivery limits, and a key-value store with a single deployment. That’s a lot of capability for something you can start with brew install nats-server.

The inflection point where Kafka becomes necessary is later than most teams think. If you’re not sure which side of that line you’re on, start with NATS.
