Secrets Management in Production: The Patterns That Actually Work

The most common way a production system gets compromised isn’t a sophisticated exploit. It’s a developer pushing a .env file to a public GitHub repo, or a Docker image built with environment variables baked into a layer, or a Kubernetes secret stored in plaintext YAML that ended up in version control.

Secrets management is the kind of problem that feels solved right up until it isn’t. This post is a ground-level look at what actually works at different scales, which tools fit which situations, and the specific mistakes that keep showing up in post-mortems.

The Spectrum of Bad to Acceptable

Before picking a tool, it helps to be clear about what you’re protecting against. The threat model for secrets isn’t primarily “hackers brute-forcing your password.” The actual risks are:

Secrets landing in version control (git history is forever)
Secrets baked into container images (anyone who can pull the image can inspect every layer)
Secrets visible to other processes (command-line arguments show up in ps aux, and on Linux a process's environment is readable through /proc/<pid>/environ by the same user or root)
Secrets leaking through logs (for example, logging middleware that captures request headers)
Overly broad access (one compromised service gets all the secrets)
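
The log-leak risk in particular is cheap to reduce in application code. What follows is a minimal sketch, not a complete solution: it assumes your middleware attaches request headers to log records under an attribute named headers, and the SENSITIVE_HEADERS set is a hypothetical starting list that you would extend for your own stack.

```python
import logging

# Hypothetical list of header names to treat as sensitive (assumption).
SENSITIVE_HEADERS = {"authorization", "cookie", "x-api-key"}


def redact_headers(headers: dict[str, str]) -> dict[str, str]:
    """Return a copy of headers with sensitive values masked."""
    return {
        name: "[REDACTED]" if name.lower() in SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }


class RedactHeadersFilter(logging.Filter):
    """Mask sensitive headers on any log record that carries a `headers` dict."""

    def filter(self, record: logging.LogRecord) -> bool:
        headers = getattr(record, "headers", None)
        if isinstance(headers, dict):
            record.headers = redact_headers(headers)
        return True  # never drop the record, only scrub it
```

Attaching the filter to a handler with handler.addFilter(RedactHeadersFilter()) scrubs every record that passes through it, which is safer than relying on each call site to remember.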

Most teams go through the same progression:

Stage 1: .env files, never committed. Fine for local development. Shared via Slack or 1Password. Fragile when secrets rotate.

Stage 2: CI/CD environment variables (GitHub Actions secrets, GitLab CI variables). Better than .env in repos, but there is little record of who read what, no built-in rotation, and access is scoped to the pipeline rather than to individual services.

Stage 3: A dedicated secrets manager. This is where production should be.

AWS Secrets Manager

If you’re already on AWS, Secrets Manager is the path of least resistance. It integrates with IAM for access control, which means you can grant an ECS task or Lambda function access to a specific secret without creating service account credentials.

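import json
from urllib.parse import quote

import boto3

def get_database_url() -> str:
    client = boto3.client("secretsmanager", region_name="us-east-1")
    response = client.get_secret_value(SecretId="prod/myapp/database")
    # The secret is stored as a JSON document with one key per field
    secret = json.loads(response["SecretString"])
    # Percent-encode credentials so characters like "@" or "/" don't corrupt the URL
    user = quote(secret["username"], safe="")
    password = quote(secret["password"], safe="")
    return f"postgresql://{user}:{password}@{secret['host']}/{secret['db']}"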

The IAM policy for the service consuming this secret:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "secretsmanager:GetSecretValue",
      "Resource": "arn:aws:secretsmanager:us-east-1:123456789012:secret:prod/myapp/database-*"
    }
  ]
}

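That -* at the end matters: Secrets Manager appends a hyphen and six random characters to every secret's ARN when the secret is created, so a policy that names the ARN without the suffix (or a wildcard standing in for it) never matches. The suffix stays the same across rotations, but it changes if the secret is deleted and recreated, which is another reason to prefer the wildcard over a hard-coded suffix.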
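
To see the matching behavior concretely, here is a small sketch that approximates IAM's * and ? wildcards with Python's shell-style fnmatchcase. It is an illustration only, not how IAM evaluates policies internally, and the account ID and the a1b2c3 suffix are made up.

```python
from fnmatch import fnmatchcase

BASE = "arn:aws:secretsmanager:us-east-1:123456789012:secret:prod/myapp/database"

# Made-up examples of ARNs as Secrets Manager would assign them.
actual_arn = BASE + "-a1b2c3"
lookalike_arn = BASE + "-replica-x9y8z7"  # a different secret with a similar name


def resource_matches(pattern: str, arn: str) -> bool:
    """Approximate IAM resource matching with shell-style wildcards."""
    return fnmatchcase(arn, pattern)


print(resource_matches(BASE, actual_arn))                 # False: suffix missing
print(resource_matches(BASE + "-*", actual_arn))          # True
print(resource_matches(BASE + "-*", lookalike_arn))       # True: broader than intended
print(resource_matches(BASE + "-??????", lookalike_arn))  # False: exactly six characters
```

The third line is the reason some teams write the suffix as -?????? instead of -*: it still survives a delete-and-recreate, but it no longer matches other secrets whose names merely start the same way.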

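Built-in rotation is the headline feature. For RDS databases, Secrets Manager can rotate credentials automatically, updating both the database user's password and the stored secret. Rotation runs as a sequence of steps rather than an atomic swap, so clients should be ready to re-fetch the secret if a connection is rejected mid-rotation. For other services, you write a Lambda function that handles the rotation logic.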

The downsides are real. At $0.40 per secret per month plus $0.05 per 10,000 API calls, costs add up for applications with dozens of secrets accessed at high frequency. The recommendation: cache secrets in memory for a reasonable TTL (15-60 minutes for most credentials) and only re-fetch near expiration or on authentication failure.

import json
import time

import boto3

# secret_id -> (parsed secret, monotonic timestamp of the fetch)
_secret_cache: dict[str, tuple[dict, float]] = {}
SECRET_TTL = 900  # 15 minutes

def get_secret(secret_id: str) -> dict:
    cached = _secret_cache.get(secret_id)
    if cached and time.monotonic() - cached[1] < SECRET_TTL:
        return cached[0]

    # Cache miss or stale entry: fetch and remember when we did
    client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=secret_id)
    value = json.loads(response["SecretString"])
    _secret_cache[secret_id] = (value, time.monotonic())
    return value
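
The TTL cache covers expiration but not the re-fetch-on-authentication-failure half of the recommendation. One way to sketch that half, kept independent of boto3 so it can be tested offline: load_secret and use_secret are hypothetical callables you would supply, and AuthenticationFailed is a stand-in for whatever "bad credentials" error your database driver raises.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class AuthenticationFailed(Exception):
    """Hypothetical stand-in for a driver's 'bad credentials' error."""


def with_fresh_secret_on_auth_failure(
    load_secret: Callable[[bool], dict],
    use_secret: Callable[[dict], T],
) -> T:
    """Use the cached secret first; on an auth failure, bypass the cache once.

    load_secret(force_refresh) returns the secret, skipping any cache when
    force_refresh is True. use_secret(secret) does the real work, such as
    opening a connection.
    """
    try:
        return use_secret(load_secret(False))
    except AuthenticationFailed:
        # The cached value was probably rotated out from under us.
        return use_secret(load_secret(True))
```

Retrying exactly once is deliberate: if the freshly fetched secret also fails, the problem is not a stale cache, and the second exception should propagate.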

HashiCorp Vault

Vault is the more capable option, with more operational overhead to match. It runs as a server (or cluster), handles its own authentication backends, stores secrets with fine-grained ACL policies, and provides dynamic secrets: credentials generated on-demand that expire automatically.

Dynamic database credentials are particularly useful:

# Vault generates a unique PostgreSQL username/password valid for 1 hour
vault read database/creds/my-role
# Key                Value
# ---                -----
# lease_id           database/creds/my-role/abc123
# username           v-token-myuser-abc123
# password           A1b2-C3d4-E5f6

Each service gets its own short-lived credentials. When a service is compromised, the damage is limited to what those credentials can access, and they expire automatically. This is a meaningful security improvement over long-lived static credentials.
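
On the application side, short-lived credentials mean the service has to ask for new ones before the old lease runs out. A minimal sketch of that bookkeeping follows; the issue callable is a hypothetical hook where you would call Vault through whichever client you use, and the Lease and LeasedCredentials names are illustrative rather than part of any Vault library.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Lease:
    username: str
    password: str
    expires_at: float  # seconds, on the same clock LeasedCredentials uses


class LeasedCredentials:
    """Hand out the current lease, requesting a new one shortly before expiry."""

    def __init__(
        self,
        issue: Callable[[], Lease],
        refresh_margin: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._issue = issue
        self._margin = refresh_margin
        self._clock = clock
        self._lease: Optional[Lease] = None

    def get(self) -> Lease:
        lease = self._lease
        if lease is None or self._clock() >= lease.expires_at - self._margin:
            # No lease yet, or too close to expiry: ask for fresh credentials.
            lease = self._lease = self._issue()
        return lease
```

The refresh margin exists so that a connection opened just before expiry is not handed credentials that die a few seconds later.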

The tradeoff: running Vault yourself means managing a highly available server cluster and handling unsealing, backups, and monitoring. HCP Vault (HashiCorp’s managed offering) reduces the ops burden but adds cost.

For teams that need multi-cloud secrets or have strict compliance requirements (PCI DSS, HIPAA), Vault is often the right answer despite the complexity. For AWS-only shops, Secrets Manager usually wins on simplicity.

Infisical and Doppler: Developer-Friendly Alternatives

A newer class of secrets managers targets the developer experience gap that both Vault and Secrets Manager have. The pitch: secrets that sync to your local .env, staging, and production environments from one dashboard, with an audit log and PR-style change approval.

Doppler is the most polished commercial option. The CLI syncs secrets to your local environment:

doppler run -- node server.js
# Your app sees all secrets as environment variables, same as in production

GitHub Actions integration is built-in, so secrets flow through your deployment pipeline without manual copying.

Infisical is the open-source alternative, self-hostable if you don’t want to trust a SaaS provider with your credentials. The feature parity with Doppler is high, and the self-hosted option matters for regulated industries.

These tools shine specifically at solving the “how do I give a new developer the right secrets on day one without sharing a .env via Slack” problem. They’re less suited for dynamic secrets or fine-grained service-to-service access control.
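
Because these tools deliver secrets as plain environment variables, the application can check for them once at startup instead of failing later on first use. A small sketch, assuming nothing beyond the standard library; load_required_secrets and MissingSecretsError are illustrative names, not part of either tool.

```python
import os
from typing import Mapping


class MissingSecretsError(RuntimeError):
    """Raised at startup when required secrets are absent."""


def load_required_secrets(
    names: list[str],
    environ: Mapping[str, str] = os.environ,
) -> dict[str, str]:
    """Return the named secrets, or fail fast listing every missing name."""
    missing = [name for name in names if not environ.get(name)]
    if missing:
        # Report names only; never echo secret values into logs or tracebacks.
        raise MissingSecretsError("missing required secrets: " + ", ".join(missing))
    return {name: environ[name] for name in names}
```

Calling load_required_secrets(["DATABASE_URL", "SENDGRID_API_KEY"]) as the first line of your entry point turns a misconfigured environment into one clear error message. The two variable names here are only examples.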

Kubernetes: Where Secrets Go Wrong

Kubernetes has a Secret resource type that stores base64-encoded values. Base64 is not encryption. A default Kubernetes installation stores secrets in etcd in plaintext. Getting this wrong is common.
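
If the base64 point seems abstract, two lines of Python make it concrete. The value below is a made-up example; the same decode works on the data field of any Secret manifest.

```python
import base64

# Encode a made-up password the way a Secret manifest would store it...
encoded = base64.b64encode(b"hunter2").decode("ascii")
# ...and recover it with no key at all.
decoded = base64.b64decode(encoded).decode("utf-8")

print(encoded)  # aHVudGVyMg==
print(decoded)  # hunter2
```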

The minimum viable improvement is enabling etcd encryption at rest:

# kube-apiserver configuration
--encryption-provider-config=/etc/kubernetes/encryption-config.yaml

# encryption-config.yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: <base64-encoded-32-byte-key>
      - identity: {}
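
The secret: field expects 32 random bytes, base64-encoded. One way to produce such a value, sketched with Python's standard library (the function name is illustrative):

```python
import base64
import secrets


def new_encryption_key() -> str:
    """Return 32 cryptographically random bytes, base64-encoded."""
    return base64.b64encode(secrets.token_bytes(32)).decode("ascii")
```

The resulting key is itself a secret: anyone who holds it and a copy of etcd can decrypt everything, so keep the config file readable only by the API server and out of version control.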

A better pattern: don’t keep production secret values in your Kubernetes manifests at all. Use the External Secrets Operator to pull from AWS Secrets Manager or Vault and create the Kubernetes Secret objects inside the cluster at runtime. The actual values never appear in your manifests or version control, though they do still land in etcd, so encryption at rest remains worth enabling.

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: database-credentials
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    name: database-credentials
  data:
    - secretKey: DATABASE_URL
      remoteRef:
        key: prod/myapp/database
        property: url

The Patterns That Prevent Incidents

Regardless of which tool you use, the practices that actually prevent secrets from leaking:

Never build secrets into container images. The build process should not have access to production credentials. Secrets are injected at runtime via environment variables or mounted volumes. Run docker history --no-trunc <image> to confirm that no build step, ARG, or ENV left a secret value in your layers.

Rotate regularly, automate where possible. Any secret that hasn’t been rotated in over a year is probably stale in multiple places. Start with database passwords (rotate quarterly) and API keys for external services.

Audit access logs. Every major secrets manager produces logs of who (or what service) accessed which secret when. Review these. Anomalous access patterns (a service reading credentials it never needed before) are often the first indicator of a compromise.

Least privilege per service. A background job that sends email shouldn’t have access to the database master password. Grant only the secrets each service needs, and name them to reflect the scope: prod/emailworker/sendgrid, not prod/sendgrid.

Add secret detection to your CI pipeline. Tools like gitleaks or trufflehog scan commits for patterns matching API keys, connection strings, and private keys. Run them as a pre-receive git hook or a required CI check. Finding a committed secret before it merges is dramatically cheaper than rotating it after.

# .github/workflows/security.yml snippet
- name: Scan for secrets
  uses: gitleaks/gitleaks-action@v2
  env:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
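
To make the idea of "patterns matching API keys, connection strings, and private keys" less abstract, here is a toy scanner. It is an illustration of the technique only: the three regular expressions are simplified assumptions, not the rule sets that gitleaks or trufflehog actually ship, and a real tool adds entropy checks, allowlists, and history scanning.

```python
import re

# Simplified, illustrative patterns (assumptions, not a real tool's rules).
PATTERNS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:[A-Z]+ )*PRIVATE KEY-----"),
    "url_with_password": re.compile(r"\b[a-z][a-z0-9+.-]*://[^\s:/@]+:[^\s@/]+@"),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line number, pattern name) for every suspicious line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```

Note that the findings carry line numbers and pattern names but never the matched text, so the scanner's own output cannot become another place the secret leaks.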

The investment in a proper secrets manager pays off the first time you need to rotate a compromised credential under pressure. A five-minute rotation, where everything pulls from one central location, is a very different experience from hunting through ten repos and four deployment pipelines to find every place a key was hardcoded.

Start with whatever fits your current stack. Move up the complexity ladder only when the simpler tool creates problems you can see. For most teams starting fresh in 2026: Infisical if you want open-source and self-hosted, Doppler if you want managed simplicity, AWS Secrets Manager if you’re AWS-native and need IAM integration.
