NATS JetStream in Production: When Kafka Is Too Much

Kafka is the default answer for message queuing at scale. But for teams running fewer than a million messages per day, NATS JetStream offers persistence, delivery guarantees, and a dramatically simpler operational footprint.

Anurag Verma

7 min read

Every time a team needs reliable message delivery between services, someone suggests Kafka. Kafka is a reasonable choice at genuine Kafka scale: millions of messages per second, dozens of consumer groups, regulatory retention requirements, the works. For most teams, it’s substantial operational overhead in exchange for capabilities they won’t use.

NATS JetStream is the alternative worth knowing. NATS has been around since 2011 as a lightweight pub/sub system. JetStream, added in 2021, layers persistence, delivery guarantees, and stream replay on top of the core system. The result is a message broker that handles most production messaging use cases with a fraction of the complexity.

What JetStream Adds to Core NATS

Core NATS is fire-and-forget pub/sub. If a subscriber is offline when a message is published, it misses the message. That’s fine for use cases like real-time metrics or cache invalidation signals where missed messages are acceptable. It’s not fine for order processing, payment events, or anything where “at least once delivery” matters.

JetStream fixes this with persistent streams:

  • Streams store messages for configurable retention periods (time-based or size-based)
  • Consumers track their position within a stream and receive unacknowledged messages
  • Delivery guarantees: at-most-once (no acks), at-least-once (explicit ack with redelivery), and exactly-once within a deduplication window (publish-side message IDs plus double-acked consumption)
  • Pull and push consumers: pull for batch processing, push for real-time workloads

The key difference from Kafka: a NATS server runs as a single binary with no external dependencies. A Kafka cluster requires brokers plus ZooKeeper (or a KRaft controller quorum), monitoring, and enough expertise to operate the cluster safely. Because the NATS server is itself a Go library, Go applications can also embed it in-process, which is handy during testing.

Setting Up a Stream

Here’s a minimal JetStream setup in Go, the language NATS tooling works best in:

package main

import (
    "context"
    "fmt"
    "log"
    "time"

    "github.com/nats-io/nats.go"
    "github.com/nats-io/nats.go/jetstream"
)

func main() {
    nc, err := nats.Connect(nats.DefaultURL)
    if err != nil {
        log.Fatal(err)
    }
    defer nc.Close()

    js, err := jetstream.New(nc)
    if err != nil {
        log.Fatal(err)
    }

    ctx := context.Background()

    // Create a stream that persists up to 7 days or 1GB
    stream, err := js.CreateOrUpdateStream(ctx, jetstream.StreamConfig{
        Name:      "ORDERS",
        Subjects:  []string{"orders.>"},
        Retention: jetstream.LimitsPolicy,
        MaxAge:    7 * 24 * time.Hour,
        MaxBytes:  1 * 1024 * 1024 * 1024,
        Storage:   jetstream.FileStorage,
    })
    if err != nil {
        log.Fatal(err)
    }

    fmt.Printf("Stream created: %s\n", stream.CachedInfo().Config.Name)
}

The orders.> subject pattern matches any subject starting with orders. — so orders.created, orders.payment.confirmed, orders.shipped all land in the same stream. This hierarchical subject routing is one of NATS’s genuinely useful design decisions.
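
To make the wildcard rules concrete, here is a small stdlib-only sketch of how a pattern is compared with a subject, token by token. matchSubject is an illustrative helper, not a nats.go function, and it skips the validation a real server performs (empty tokens, for instance).

```go
package main

import (
    "fmt"
    "strings"
)

// matchSubject reports whether subject matches pattern under NATS
// wildcard rules: "*" matches exactly one token, ">" matches one or
// more trailing tokens.
func matchSubject(pattern, subject string) bool {
    p := strings.Split(pattern, ".")
    s := strings.Split(subject, ".")
    for i, tok := range p {
        if tok == ">" {
            // ">" must be the last pattern token and needs at least one token left to match.
            return i == len(p)-1 && len(s) > i
        }
        if i >= len(s) {
            return false
        }
        if tok != "*" && tok != s[i] {
            return false
        }
    }
    return len(p) == len(s)
}

func main() {
    subjects := []string{"orders.created", "orders.payment.confirmed", "orders", "invoices.created"}
    for _, subj := range subjects {
        fmt.Printf("%-26s orders.>=%v orders.*=%v\n",
            subj, matchSubject("orders.>", subj), matchSubject("orders.*", subj))
    }
}
```

Note that orders.> does not match the bare subject orders, and orders.* stops at one extra token, so orders.payment.confirmed only matches the > form.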

Publishing and Consuming with Delivery Guarantees

// Publisher
func publishOrder(js jetstream.JetStream, orderID string, payload []byte) error {
    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer cancel()

    ack, err := js.Publish(ctx, "orders.created", payload,
        jetstream.WithMsgID(orderID), // Deduplication key
    )
    if err != nil {
        return fmt.Errorf("publish failed: %w", err)
    }

    fmt.Printf("Published seq=%d, stream=%s\n", ack.Sequence, ack.Stream)
    return nil
}

// Consumer (pull-based, good for batch processing)
func startConsumer(js jetstream.JetStream) error {
    ctx := context.Background()

    consumer, err := js.CreateOrUpdateConsumer(ctx, "ORDERS", jetstream.ConsumerConfig{
        Name:          "order-processor",
        Durable:       "order-processor",
        AckPolicy:     jetstream.AckExplicitPolicy,
        MaxDeliver:    5,
        AckWait:       30 * time.Second,
        FilterSubject: "orders.created",
    })
    if err != nil {
        return err
    }

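    for {
        // Fetch waits until 10 messages arrive or the 5s wait elapses.
        msgs, err := consumer.Fetch(10, jetstream.FetchMaxWait(5*time.Second))
        if err != nil {
            return err
        }

        // processOrder is the application's own handler, defined elsewhere.
        for msg := range msgs.Messages() {
            if err := processOrder(msg.Data()); err != nil {
                msg.Nak() // ask for redelivery right away; NakWithDelay adds a pause
                continue
            }
            msg.Ack()
        }

        // An empty batch after the wait is not an error. Real failures
        // (a deleted consumer, for example) surface here once the batch is drained.
        if err := msgs.Error(); err != nil {
            return err
        }
    }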
}

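MaxDeliver: 5 caps delivery attempts at five. After the fifth failed attempt JetStream stops redelivering the message to this consumer and publishes an advisory on $JS.EVENT.ADVISORY.CONSUMER.MAX_DELIVERIES.<stream>.<consumer>. The message stays in the stream; there is no built-in dead-letter stream, so parking failed messages is something you build on top of that advisory. AckWait: 30 * time.Second means that if a message isn’t acked within 30 seconds, NATS redelivers it to whichever client bound to the consumer asks next, possibly the same one. These are sensible defaults for most workloads.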

Key-Value and Object Store

JetStream ships with two higher-level APIs built on top of streams: Key-Value and Object Store.

Key-Value is a distributed KV store with history. You can configure how many revisions to keep per key, set TTLs, and watch for changes:

kv, err := js.CreateOrUpdateKeyValue(ctx, jetstream.KeyValueConfig{
    Bucket:  "config",
    History: 10, // keep last 10 versions of each key
    TTL:     24 * time.Hour,
})
if err != nil {
    log.Fatal(err)
}

// Watch for changes to any config key
watcher, err := kv.WatchAll(ctx)
if err != nil {
    log.Fatal(err)
}
defer watcher.Stop()

go func() {
    for entry := range watcher.Updates() {
        if entry == nil {
            continue // initial values delivered, now in watch mode
        }
        fmt.Printf("Config changed: %s = %s\n", entry.Key(), entry.Value())
    }
}()

// Write and read
if _, err := kv.Put(ctx, "feature.flags.new-checkout", []byte("true")); err != nil {
    log.Fatal(err)
}
entry, err := kv.Get(ctx, "feature.flags.new-checkout")
if err != nil {
    log.Fatal(err)
}
fmt.Println(string(entry.Value()))

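This can replace Redis or etcd for simple configuration distribution between services, with the NATS server as the single dependency. Object Store applies the same idea to large blobs, splitting each object into chunks stored as stream messages.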
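
The History setting is easier to reason about with a toy model. The sketch below is a hypothetical in-memory map, not the JetStream implementation. It keeps the last N revisions of each key and discards older ones, which is the behaviour the History: 10 setting asks for. The real bucket exposes past revisions through kv.History(ctx, key).

```go
package main

import "fmt"

// historyKV is a hypothetical in-memory model of a bucket that keeps
// only the most recent `limit` revisions of each key.
type historyKV struct {
    limit int
    revs  map[string][]string // per key, oldest revision first
}

func newHistoryKV(limit int) *historyKV {
    return &historyKV{limit: limit, revs: make(map[string][]string)}
}

// put appends a new revision and trims anything beyond the limit.
func (kv *historyKV) put(key, value string) {
    h := append(kv.revs[key], value)
    if len(h) > kv.limit {
        h = h[len(h)-kv.limit:] // drop the oldest revisions
    }
    kv.revs[key] = h
}

// get returns the latest revision of a key, if any.
func (kv *historyKV) get(key string) (string, bool) {
    h := kv.revs[key]
    if len(h) == 0 {
        return "", false
    }
    return h[len(h)-1], true
}

// history returns the retained revisions, oldest first.
func (kv *historyKV) history(key string) []string {
    return kv.revs[key]
}

func main() {
    kv := newHistoryKV(2)
    kv.put("feature.flags.new-checkout", "v1")
    kv.put("feature.flags.new-checkout", "v2")
    kv.put("feature.flags.new-checkout", "v3")

    latest, _ := kv.get("feature.flags.new-checkout")
    fmt.Println(latest)                                   // v3
    fmt.Println(kv.history("feature.flags.new-checkout")) // [v2 v3]
}
```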

NATS vs Kafka: The Honest Comparison

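| Factor               | NATS JetStream                                                  | Apache Kafka                             |
|----------------------|-----------------------------------------------------------------|------------------------------------------|
| Setup complexity     | Single binary, no deps                                          | ZooKeeper or KRaft + brokers             |
| Throughput ceiling   | Millions of msgs/sec for core NATS; lower once JetStream persists | ~1M msgs/sec per broker                  |
| Message retention    | Configurable, up to disk                                        | Configurable, designed for long retention |
| Consumer model       | Push and pull                                                   | Pull only                                |
| Exactly-once         | Publish deduplication + double-ack protocol                     | Exactly-once transactions (complex)      |
| Multi-tenancy        | Accounts + auth                                                 | ACLs (complex at scale)                  |
| Cloud-native cluster | JetStream clustering                                            | Requires careful partition tuning        |
| Learning curve       | Low                                                             | High                                     |
| Good for             | <10M msgs/day, small teams                                      | >100M msgs/day, dedicated infra team     |

The throughput numbers are rarely the deciding factor. Core NATS can push millions of small messages per second on decent hardware. JetStream is slower because it persists and replicates, but it still has far more headroom than a workload of a few million messages per day needs. Unless you’re running a large-scale event processing pipeline, that ceiling doesn’t matter. What matters is whether you want to maintain a Kafka cluster.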
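
A quick bit of arithmetic shows why the ceiling rarely matters. Converting the daily volumes mentioned in this article into average per-second rates:

```go
package main

import "fmt"

const secondsPerDay = 24 * 60 * 60

// perSecond converts a daily message volume into an average rate.
func perSecond(perDay float64) float64 {
    return perDay / secondsPerDay
}

func main() {
    for _, perDay := range []float64{1e6, 1e7, 1e8} {
        fmt.Printf("%.0f msgs/day ≈ %.0f msgs/sec on average\n", perDay, perSecond(perDay))
    }
}
```

Ten million messages per day averages out to roughly 116 per second. Real traffic is bursty, so peaks will sit well above the average, but even a generous peak multiplier leaves this orders of magnitude below what either broker can sustain.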

Running NATS in Production

NATS runs as a single binary with a simple config file:

# nats-server.conf
# server_name must be unique on each node; JetStream clustering requires it
server_name: nats-1
listen: 0.0.0.0:4222
http: 0.0.0.0:8222

jetstream {
  store_dir: /data/nats
  max_memory_store: 1GB
  max_file_store: 20GB
}

cluster {
  name: production
  listen: 0.0.0.0:6222
  routes: [
    nats://nats-1:6222
    nats://nats-2:6222
    nats://nats-3:6222
  ]
}

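A three-node cluster provides high availability for any stream created with Replicas: 3 in its StreamConfig. The ORDERS example above leaves replicas at the default of one, so as written it would live on a single node. The monitoring endpoint on port 8222 (the http line in the config) serves server stats, stream info, and consumer lag as JSON over plain HTTP. No separate tools are needed for basic operational visibility.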
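
As a rough illustration of what “plain HTTP” means here, the sketch below polls the JetStream monitoring path and prints a few top-level counters. It assumes the server from the config above is reachable at localhost:8222, that the path is /jsz, and that the response carries streams, consumers, messages, and bytes fields. Treat those field names as assumptions to check against your server version. jszSummary, parseJSZ, and fetchJSZ are hypothetical helper names.

```go
package main

import (
    "encoding/json"
    "fmt"
    "io"
    "net/http"
    "time"
)

// jszSummary holds a few top-level counters from the monitoring response.
// The JSON field names are assumptions; unknown fields are ignored.
type jszSummary struct {
    Streams   int    `json:"streams"`
    Consumers int    `json:"consumers"`
    Messages  uint64 `json:"messages"`
    Bytes     uint64 `json:"bytes"`
}

// parseJSZ decodes the counters from a raw response body.
func parseJSZ(body []byte) (jszSummary, error) {
    var s jszSummary
    err := json.Unmarshal(body, &s)
    return s, err
}

// fetchJSZ requests <baseURL>/jsz and parses the result.
func fetchJSZ(baseURL string) (jszSummary, error) {
    client := &http.Client{Timeout: 3 * time.Second}
    resp, err := client.Get(baseURL + "/jsz")
    if err != nil {
        return jszSummary{}, err
    }
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusOK {
        return jszSummary{}, fmt.Errorf("unexpected status %s", resp.Status)
    }
    body, err := io.ReadAll(resp.Body)
    if err != nil {
        return jszSummary{}, err
    }
    return parseJSZ(body)
}

func main() {
    s, err := fetchJSZ("http://localhost:8222")
    if err != nil {
        fmt.Println("monitoring endpoint not reachable:", err)
        return
    }
    fmt.Printf("streams=%d consumers=%d messages=%d bytes=%d\n",
        s.Streams, s.Consumers, s.Messages, s.Bytes)
}
```

Splitting parsing from fetching keeps the decoding testable without a live server, and the same pattern extends to a small health check or a metrics exporter.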

For Kubernetes, the NATS Helm chart handles the cluster setup:

helm repo add nats https://nats-io.github.io/k8s/helm/charts/
helm install nats nats/nats \
  --set config.jetstream.enabled=true \
  --set config.jetstream.fileStore.enabled=true \
  --set config.cluster.enabled=true \
  --set config.cluster.replicas=3

Where NATS Doesn’t Fit

NATS is not the right choice when:

  • You need message replay across months or years (Kafka’s log-based storage is designed for long retention)
  • You’re processing genuinely high volumes — billions of events per day — where Kafka’s partition-based parallelism matters
  • You need Kafka Connect or the Kafka ecosystem of connectors for data pipelines
  • Your organization already runs Kafka well and adding another broker technology creates more cost than it saves

For fan-out patterns where hundreds of consumers subscribe to the same subject, NATS push consumers are very efficient. For high-throughput analytics pipelines that spread load across many partitions while keeping strict ordering within each one, Kafka’s model fits better.

The Migration Path

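If you’re replacing an existing Kafka setup, the NATS project maintains a separate bridge, nats-kafka, that can subscribe to Kafka topics and republish to NATS subjects. It runs as its own process rather than as a mode of the server, and it’s useful for gradual migration rather than a hard cutover.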

For most teams starting from scratch, or replacing ad hoc database polling and Redis pub/sub, NATS JetStream is the first thing to reach for. You get persistence, delivery guarantees, consumer groups, bounded redelivery, and a key-value store with a single deployment. That’s a lot of capability for something you can start with brew install nats-server.

The inflection point where Kafka becomes necessary is later than most teams think. If you’re not sure which side of that line you’re on, start with NATS.
