Prometheus and Grafana: Production Monitoring Without the Complexity Tax

When a production service starts behaving badly, you need to know three things: is it down, is it slow, and is it broken for specific users or everyone? Logs tell you what happened. Metrics tell you what’s happening now and whether it’s trending worse. Prometheus and Grafana are the standard open-source stack for the metrics side, and they’re worth understanding even if you eventually graduate to a managed observability product.

The problem with most Prometheus tutorials is that they start with configuration files and end before you have a dashboard that would actually help you debug a real incident. This post starts from the other direction: what do you need to see, then how do you get there.

The Four Metrics That Matter First

Before writing any instrumentation code, decide what you’re measuring. The USE method (Utilization, Saturation, Errors) and the RED method (Rate, Errors, Duration) cover most production situations for web services:

Request rate: How many requests per second are hitting each endpoint. Unusual drops signal problems upstream; unusual spikes signal load events or traffic anomalies.

Error rate: What percentage of requests are returning 5xx. A climbing error rate is almost always the first signal of a real problem.

Request duration: p50 (median), p95, and p99 latency. Median tells you the typical user experience. p99 tells you how bad it gets for the slowest requests.

Resource utilization: CPU, memory, and for services with persistent connections (databases, queues), connection pool usage.

Everything else can wait until you have these four. Dashboards that try to show everything end up showing nothing useful.
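To make the percentile vocabulary concrete, here is a small sketch using only Python's standard library. The latency values are invented for illustration (98 fast requests and 2 very slow ones), and the `percentile` helper is a hypothetical nearest-rank implementation, not something Prometheus itself runs.

```python
import statistics

# Hypothetical latencies in seconds: 98 fast requests and 2 very slow ones.
latencies = [0.05] * 98 + [4.0] * 2


def percentile(samples, p):
    """Nearest-rank percentile: the smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, -(-len(ordered) * p // 100))  # ceil(n * p / 100) using integer math
    return ordered[rank - 1]


mean = statistics.mean(latencies)
p50 = percentile(latencies, 50)
p95 = percentile(latencies, 95)
p99 = percentile(latencies, 99)

print(f"mean={mean:.3f}s p50={p50}s p95={p95}s p99={p99}s")
```

With this data, p50 and p95 both look healthy while p99 exposes the 4-second requests, and the mean describes no request that actually happened. That is the reason to plot percentiles rather than averages.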

Setting Up Prometheus

Prometheus runs as a server that periodically scrapes /metrics endpoints from your services. Your services expose metrics; Prometheus pulls them on a configured interval (15 seconds in the configuration below; the built-in default is one minute).
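The pull model can be sketched with nothing but the standard library. In the example below, a tiny HTTP server plays the role of your service and a single GET request plays the role of Prometheus. The metric text is hand-written with hypothetical values; in a real service a client library generates it for you.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hand-written sample in the Prometheus text exposition format (hypothetical values).
METRICS_TEXT = (
    "# HELP http_requests_total Total HTTP requests\n"
    "# TYPE http_requests_total counter\n"
    'http_requests_total{method="GET",endpoint="/health",status_code="200"} 42\n'
)


class MetricsHandler(BaseHTTPRequestHandler):
    """Plays the role of your service: it only answers GET /metrics."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = METRICS_TEXT.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # Silence per-request logging for the demo


def scrape_once() -> str:
    """Plays the role of Prometheus: one HTTP GET against the target."""
    server = HTTPServer(("127.0.0.1", 0), MetricsHandler)  # Port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", "/metrics")
        body = conn.getresponse().read().decode("utf-8")
        conn.close()
        return body
    finally:
        server.shutdown()
        server.server_close()


scraped = scrape_once()
print(scraped)
```

Each non-comment line in the response is one sample: a metric name, a set of labels, and the current value. Prometheus attaches the scrape timestamp and stores the result as a time series.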

The minimal Docker Compose setup for local development:

version: '3.8'
services:
  prometheus:
    image: prom/prometheus:v2.53.0
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.retention.time=15d'

  grafana:
    image: grafana/grafana:11.1.0
    ports:
      - "3000:3000"
    environment:
      GF_SECURITY_ADMIN_PASSWORD: "your-password"
    volumes:
      - grafana_data:/var/lib/grafana

volumes:
  prometheus_data:
  grafana_data:

The Prometheus configuration file tells it where to scrape:

# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'api-server'
    static_configs:
      - targets: ['api:8000']  # Your service's host:port
    metrics_path: '/metrics'

Instrumenting a Python/FastAPI Service

For Python services, the prometheus_client library does the heavy lifting. Here’s a production-ready instrumentation setup for FastAPI:

from fastapi import FastAPI, Request
from prometheus_client import (
    Counter, Histogram, Gauge, generate_latest, CONTENT_TYPE_LATEST
)
from starlette.responses import Response
import time

app = FastAPI()

# Define your metrics
REQUEST_COUNT = Counter(
    'http_requests_total',
    'Total HTTP requests',
    ['method', 'endpoint', 'status_code']
)

REQUEST_DURATION = Histogram(
    'http_request_duration_seconds',
    'HTTP request duration in seconds',
    ['method', 'endpoint'],
    buckets=[0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0]
)

ACTIVE_REQUESTS = Gauge(
    'http_active_requests',
    'Number of HTTP requests currently being processed'
)

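@app.middleware("http")
async def metrics_middleware(request: Request, call_next):
    # Skip metrics endpoint itself
    if request.url.path == "/metrics":
        return await call_next(request)

    ACTIVE_REQUESTS.inc()
    start_time = time.perf_counter()  # Monotonic clock, unaffected by system time changes
    status = "500"  # Recorded if the handler raises before a response exists

    try:
        response = await call_next(request)
        status = str(response.status_code)
        return response
    finally:
        duration = time.perf_counter() - start_time
        # Label with the matched route template (e.g. /users/{user_id}) rather than
        # the raw path, so IDs in URLs do not create one series per value.
        # Assumes FastAPI has stored the matched route in the request scope.
        route = request.scope.get("route")
        endpoint = getattr(route, "path", "unmatched")
        method = request.method

        REQUEST_COUNT.labels(method=method, endpoint=endpoint, status_code=status).inc()
        REQUEST_DURATION.labels(method=method, endpoint=endpoint).observe(duration)
        # Runs even when the handler raises, so the gauge cannot drift upward
        ACTIVE_REQUESTS.dec()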

@app.get("/metrics")
async def metrics():
    return Response(generate_latest(), media_type=CONTENT_TYPE_LATEST)

For database query metrics, add a similar pattern around your database calls:

DB_QUERY_DURATION = Histogram(
    'db_query_duration_seconds',
    'Database query duration in seconds',
    ['query_type', 'table']
)

DB_ERRORS = Counter(
    'db_errors_total',
    'Database errors',
    ['query_type', 'error_type']
)

async def execute_query(query_type: str, table: str, coro):
    with DB_QUERY_DURATION.labels(query_type=query_type, table=table).time():
        try:
            return await coro
        except Exception as e:
            DB_ERRORS.labels(
                query_type=query_type,
                error_type=type(e).__name__
            ).inc()
            raise

Instrumenting a Node.js/Express Service

The prom-client library handles Node.js instrumentation:

const client = require('prom-client');
const express = require('express');

const app = express();

// Collect default Node.js metrics (memory, event loop lag, etc.)
client.collectDefaultMetrics({ prefix: 'node_' });

const httpRequestDuration = new client.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0],
});

const httpRequestTotal = new client.Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'route', 'status_code'],
});

// Middleware
app.use((req, res, next) => {
  const end = httpRequestDuration.startTimer();
  res.on('finish', () => {
    const labels = {
      method: req.method,
      route: req.route?.path ?? req.path,
      status_code: res.statusCode,
    };
    end(labels);
    httpRequestTotal.inc(labels);
  });
  next();
});

// Metrics endpoint
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', client.register.contentType);
  res.end(await client.register.metrics());
});

The Grafana Dashboard

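Once Prometheus is collecting metrics, add it as a data source in Grafana. In recent versions the path is Connections > Data sources > Add new data source > Prometheus (older versions list Data Sources under Configuration). Set the URL to http://prometheus:9090.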

The four panels every service dashboard needs:

Request rate (requests per second):

sum by (endpoint, status_code) (rate(http_requests_total[5m]))

This splits traffic by endpoint and status code while summing over the remaining labels. A sudden drop in 200s with a rise in 500s is an incident in progress.
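If rate() feels opaque, the following sketch shows the core idea: the per-second increase of a counter over a window, with any decrease treated as a counter reset. It is a simplification (Prometheus also extrapolates toward the window boundaries), and `simple_rate` and the sample values are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def simple_rate(samples: List[Sample]) -> float:
    """Per-second increase across the samples, treating any decrease as a counter reset."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a rate")
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # After a restart the counter begins again from zero, so the whole
        # current value counts as new increase.
        increase += curr - prev if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Hypothetical scrapes 15 seconds apart; the process restarts before the fourth one.
scrapes = [(0.0, 1000.0), (15.0, 1030.0), (30.0, 1060.0), (45.0, 20.0), (60.0, 50.0)]
requests_per_second = simple_rate(scrapes)
print(f"{requests_per_second:.2f} req/s")
```

The reset handling is why counters should only ever go up in your instrumentation: a restart is recoverable, but a counter that is decremented on purpose would be misread as a restart.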

Error rate (percentage of requests that fail):

sum(rate(http_requests_total{status_code=~"5.."}[5m])) 
/ 
sum(rate(http_requests_total[5m]))

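The query returns a ratio between 0 and 1, so set the panel unit to Percent (0.0-1.0) or multiply by 100 to display a percentage. Set an alert threshold at 1-5% (0.01-0.05 as a ratio) depending on your service’s normal baseline.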
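The arithmetic behind the error-rate query is small enough to check by hand. The sketch below assumes hypothetical per-second rates keyed by status code and mirrors the `5..` matcher with a full regex match, since PromQL regex matchers are anchored at both ends. The `error_ratio` helper is illustrative only.

```python
import re
from typing import Dict


def error_ratio(rate_by_status: Dict[str, float], pattern: str = "5..") -> float:
    """Share of traffic whose status code fully matches the pattern."""
    total = sum(rate_by_status.values())
    if total == 0:
        return 0.0  # A choice made for this sketch: no traffic counts as no errors
    errors = sum(
        rate for status, rate in rate_by_status.items()
        if re.fullmatch(pattern, status)  # Anchored match, like a PromQL =~ matcher
    )
    return errors / total


# Hypothetical per-second request rates, keyed by status_code.
rates = {"200": 95.0, "404": 3.0, "500": 1.5, "503": 0.5}
ratio = error_ratio(rates)
print(f"{ratio:.2%}")
```

Note that 404s are excluded here, as in the query above. Whether client errors belong in your error rate is a judgment call; a spike in 4xx responses can still indicate a broken deploy or a misbehaving client.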

Request duration percentiles:

histogram_quantile(0.95, 
  sum(rate(http_request_duration_seconds_bucket[5m])) by (le, endpoint)
)

Plot p50, p95, and p99 together. p99 degradation often precedes p95 degradation, giving you earlier warning.
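It helps to know that histogram_quantile estimates rather than measures. Prometheus only stores cumulative bucket counts, so the quantile is interpolated linearly inside the bucket where it falls. The sketch below reimplements that idea in plain Python under simplifying assumptions (classic buckets, a lower bound of 0 for the first bucket); the bucket counts are hypothetical.

```python
import math
from typing import List, Tuple

Bucket = Tuple[float, float]  # (upper bound "le", cumulative count)


def histogram_quantile(q: float, buckets: List[Bucket]) -> float:
    """Estimate the q-quantile from cumulative buckets by linear interpolation."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("buckets must include a +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # The quantile lies beyond the largest finite bucket; the best
                # available answer is that bucket's upper bound.
                return lower_bound
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return math.nan


# Hypothetical cumulative counts for 100 observed requests.
observed = [(0.1, 60), (0.25, 90), (0.5, 97), (1.0, 99), (math.inf, 100)]
p50 = histogram_quantile(0.50, observed)
p95 = histogram_quantile(0.95, observed)
p99 = histogram_quantile(0.99, observed)
print(f"p50={p50:.3f}s p95={p95:.3f}s p99={p99:.3f}s")
```

The p95 here lands somewhere between 0.25 and 0.5 seconds because that is all the buckets can say. If you alert on a 2-second threshold, make sure a bucket boundary sits at or near 2 seconds, otherwise the estimate around that threshold will be coarse.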

Active connections / resource usage:

http_active_requests

For database connection pools: query the pool’s own metrics. Most database drivers expose pool size, in-use connections, and wait time.

Alerting

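Prometheus evaluates alerting rules itself and forwards firing alerts to Alertmanager, which handles grouping, routing, and notifications. List the rules file under rule_files in prometheus.yml and point the alerting section at your Alertmanager instance. A minimal rules file: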

# alerts.yml
groups:
  - name: api
    rules:
      - alert: HighErrorRate
        expr: |
          sum by (job) (rate(http_requests_total{status_code=~"5.."}[5m]))
          / sum by (job) (rate(http_requests_total[5m])) > 0.05
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High error rate on {{ $labels.job }}"
          description: "Error rate is {{ $value | humanizePercentage }}"

      - alert: SlowResponseTime
        expr: |
          histogram_quantile(0.95, 
            sum(rate(http_request_duration_seconds_bucket[5m])) by (le)
          ) > 2.0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "p95 latency above 2 seconds"

      - alert: ServiceDown
        expr: up{job="api-server"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Service is down"

Wire Alertmanager to PagerDuty, Slack, or email in its own config. Grafana also has its own alerting system, built on the same queries that drive your panels, which is useful if you want alerts triggered from the queries you’re already viewing.
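The for: clause in the rules above is what keeps a brief blip from paging anyone: an alert stays pending until its expression has been true at every evaluation for the full duration, and any false evaluation resets it. A minimal sketch of that state machine follows, using a hypothetical 30-second evaluation interval and made-up error ratios.

```python
from typing import Iterable, List, Optional


def alert_states(values: Iterable[float], threshold: float,
                 for_seconds: int, eval_interval: int) -> List[str]:
    """Return the alert state after each evaluation of `value > threshold`."""
    states = []
    active_for: Optional[int] = None  # Seconds the condition has held; None = inactive
    for value in values:
        if value > threshold:
            active_for = 0 if active_for is None else active_for + eval_interval
            states.append("firing" if active_for >= for_seconds else "pending")
        else:
            active_for = None  # Any false evaluation resets the timer
            states.append("inactive")
    return states


# Hypothetical error ratios, one per evaluation, checked against a 5% threshold.
error_ratios = [0.01, 0.08, 0.09, 0.02, 0.07, 0.08, 0.09, 0.10, 0.11]
states = alert_states(error_ratios, threshold=0.05, for_seconds=120, eval_interval=30)
print(states)
```

The first spike never fires because it recovers after two evaluations. The second one fires only on the fifth consecutive bad evaluation, two minutes after it first went pending. A longer for: means fewer false pages and slower detection, so tune it per alert rather than globally.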

In Kubernetes

For Kubernetes deployments, the standard approach is the kube-prometheus-stack Helm chart, which bundles Prometheus, Grafana, Alertmanager, and a set of pre-built dashboards for cluster metrics:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set grafana.adminPassword=your-password

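One caveat: kube-prometheus-stack is built on the Prometheus Operator, which discovers scrape targets through ServiceMonitor and PodMonitor resources, not pod annotations. Out of the box it ignores the prometheus.io/* annotations. Those annotations are a convention rather than a built-in feature: they only work when a scrape job relabels on them, as the default configuration of the standalone prometheus-community/prometheus chart does. With kube-prometheus-stack, either create a ServiceMonitor or PodMonitor for your service, or add an annotation-based scrape job through the chart's prometheus.prometheusSpec.additionalScrapeConfigs value. If you go the annotation route, add these to your pod template's metadata:

annotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "8000"
  prometheus.io/path: "/metrics"

With such a scrape job in place, Prometheus discovers and scrapes any pod carrying these annotations.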

What to Skip

Most observability sprawl starts with too many metrics. Things not worth tracking until you have a specific reason:

Per-user metrics at the Prometheus layer (use your application database for this)
Metrics for endpoints that receive fewer than one request per hour (the signal-to-overhead ratio is too low)
Duplicate metrics across different label dimensions (pick one canonical view)
User IDs, order IDs, or other high-cardinality values used as Prometheus labels

Prometheus stores time-series data, and cardinality matters. Each unique combination of labels is a separate series. If you put a user ID in a label, a service with 100,000 users creates at least 100,000 series for a single metric, multiplied by every combination of the other labels. At that scale, Prometheus memory use and query latency degrade badly.

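The practical baseline: rate, errors, and duration with labels for method, endpoint, and status_code. For a service with a handful of endpoints, the request counter comes to roughly 50-200 series, and the duration histogram adds one series per bucket for each method and endpoint combination. That’s manageable and informative. Add from there based on actual debugging needs, not speculation.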
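Series counts are easy to estimate before you ship a new label. The sketch below multiplies the number of distinct values per label to get an upper bound for one metric; the label value counts are hypothetical, and `series_count` is an illustrative helper that ignores any extra series a particular client library may add.

```python
from typing import Dict


def series_count(label_values: Dict[str, int], histogram_buckets: int = 0) -> int:
    """Upper bound on series for one metric: the product of distinct values per label."""
    combos = 1
    for distinct_values in label_values.values():
        combos *= distinct_values
    if histogram_buckets:
        # A histogram emits one _bucket series per configured bucket, one for
        # +Inf, plus a _sum and a _count series for each label combination.
        return combos * (histogram_buckets + 1 + 2)
    return combos


# Hypothetical service: 3 methods, 10 endpoints, 5 status codes in use.
counter_series = series_count({"method": 3, "endpoint": 10, "status_code": 5})
histogram_series = series_count({"method": 3, "endpoint": 10}, histogram_buckets=10)
with_user_id = series_count(
    {"method": 3, "endpoint": 10, "status_code": 5, "user_id": 100_000}
)

print(counter_series, histogram_series, with_user_id)
```

In practice the real number is lower than the upper bound, because not every method hits every endpoint with every status code. The user_id case shows why one careless label matters more than everything else combined: the same counter goes from 150 series to 15 million.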
