Prometheus and Grafana: Production Monitoring Without the Complexity Tax
A practical guide to setting up metrics collection with Prometheus and visualization with Grafana for backend services — what to instrument, what to skip, and what the dashboards should actually show.
Anurag Verma
When a production service starts behaving badly, you need to know three things: is it down, is it slow, and is it broken for specific users or everyone? Logs tell you what happened. Metrics tell you what’s happening now and whether it’s trending worse. Prometheus and Grafana are the standard open-source stack for the metrics side, and they’re worth understanding even if you eventually graduate to a managed observability product.
The problem with most Prometheus tutorials is that they start with configuration files and end before you have a dashboard that would actually help you debug a real incident. This post starts from the other direction: what do you need to see, then how do you get there.
The Four Metrics That Matter First
Before writing any instrumentation code, decide what you’re measuring. The USE method (Utilization, Saturation, Errors) and the RED method (Rate, Errors, Duration) cover most production situations for web services:
- Request rate: How many requests per second are hitting each endpoint. Unusual drops signal problems upstream; unusual spikes signal load events or traffic anomalies.
- Error rate: What percentage of requests are returning 5xx. A climbing error rate is almost always the first signal of a real problem.
- Request duration: p50 (median), p95, and p99 latency. Median tells you the typical user experience. p99 tells you how bad it gets for the slowest requests.
- Resource utilization: CPU, memory, and for services with persistent connections (databases, queues), connection pool usage.
Everything else can wait until you have these four. Dashboards that try to show everything end up showing nothing useful.
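To make the percentile distinction concrete, here is a small sketch using only the Python standard library. The latency samples are invented for illustration, and the nearest-rank method shown is just one of several percentile definitions; Prometheus itself estimates quantiles from histogram buckets rather than from raw samples.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of the data at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical request durations in seconds: 97 fast requests and 3 slow ones.
durations = [0.05] * 97 + [1.2, 2.5, 4.0]

p50 = percentile(durations, 50)
p95 = percentile(durations, 95)
p99 = percentile(durations, 99)
print(p50, p95, p99)
```

With this data the median and p95 are both 0.05 seconds while p99 is 2.5 seconds, which is why a dashboard that shows only the median can look healthy while a small share of users waits for seconds.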
Setting Up Prometheus
Prometheus runs as a server that periodically scrapes /metrics endpoints from your services. Your services expose metrics; Prometheus pulls them on a configured interval (the built-in default is 1 minute; the configuration in this guide sets it to 15 seconds).
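It helps to see what a scrape actually returns: a plain-text body with one sample per line. The sketch below parses a hand-written example payload using only the standard library. The payload and the `parse_samples` helper are illustrative assumptions, and the parser is deliberately simplified (it ignores timestamps and escaped characters inside label values), so treat it as a reading aid rather than a replacement for a real client library.

```python
import re

# A hand-written example of what a /metrics response body can look like.
SAMPLE_SCRAPE = """\
# HELP http_requests_total Total HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",endpoint="/orders",status_code="200"} 1027
http_requests_total{method="GET",endpoint="/orders",status_code="500"} 3
# HELP http_active_requests Number of HTTP requests currently being processed
# TYPE http_active_requests gauge
http_active_requests 4
"""

SAMPLE_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_PAIR = re.compile(r'(\w+)="([^"]*)"')

def parse_samples(text):
    """Return (metric_name, labels_dict, value) tuples, skipping comment lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # HELP/TYPE lines describe a metric; they carry no sample
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue  # this simplified parser ignores anything it does not recognize
        labels = dict(LABEL_PAIR.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples

samples = parse_samples(SAMPLE_SCRAPE)
for name, labels, value in samples:
    print(name, labels, value)
```

Each distinct combination of metric name and label values becomes its own time series on the Prometheus side, which is worth keeping in mind when choosing labels.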
The minimal Docker Compose setup for local development:
version: '3.8'
services:
  prometheus:
    image: prom/prometheus:v2.53.0
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.retention.time=15d'
  grafana:
    image: grafana/grafana:11.1.0
    ports:
      - "3000:3000"
    environment:
      GF_SECURITY_ADMIN_PASSWORD: "your-password"
    volumes:
      - grafana_data:/var/lib/grafana
volumes:
  prometheus_data:
  grafana_data:
The Prometheus configuration file tells it where to scrape:
# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'api-server'
    static_configs:
      - targets: ['api:8000']  # Your service's host:port
    metrics_path: '/metrics'
Instrumenting a Python/FastAPI Service
For Python services, the prometheus_client library does the heavy lifting. Here’s a production-ready instrumentation setup for FastAPI:
from fastapi import FastAPI, Request
from prometheus_client import (
    Counter, Histogram, Gauge, generate_latest, CONTENT_TYPE_LATEST
)
from starlette.responses import Response
import time

app = FastAPI()

# Define your metrics
REQUEST_COUNT = Counter(
    'http_requests_total',
    'Total HTTP requests',
    ['method', 'endpoint', 'status_code']
)
REQUEST_DURATION = Histogram(
    'http_request_duration_seconds',
    'HTTP request duration in seconds',
    ['method', 'endpoint'],
    buckets=[0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0]
)
ACTIVE_REQUESTS = Gauge(
    'http_active_requests',
    'Number of HTTP requests currently being processed'
)

@app.middleware("http")
async def metrics_middleware(request: Request, call_next):
    # Skip the metrics endpoint itself
    if request.url.path == "/metrics":
        return await call_next(request)

    ACTIVE_REQUESTS.inc()
    start_time = time.perf_counter()  # monotonic clock, safe for measuring durations
    status = "500"  # recorded if the handler raises before producing a response
    try:
        response = await call_next(request)
        status = str(response.status_code)
        return response
    finally:
        # Runs on success and on error, so the gauge never leaks.
        duration = time.perf_counter() - start_time
        # Label by the matched route template (e.g. /users/{user_id}) rather than
        # the raw path, so path parameters do not create one series per ID.
        route = request.scope.get("route")
        endpoint = getattr(route, "path", "unmatched")
        method = request.method
        REQUEST_COUNT.labels(method=method, endpoint=endpoint, status_code=status).inc()
        REQUEST_DURATION.labels(method=method, endpoint=endpoint).observe(duration)
        ACTIVE_REQUESTS.dec()

@app.get("/metrics")
async def metrics():
    return Response(generate_latest(), media_type=CONTENT_TYPE_LATEST)
For database query metrics, add a similar pattern around your database calls:
DB_QUERY_DURATION = Histogram(
    'db_query_duration_seconds',
    'Database query duration in seconds',
    ['query_type', 'table']
)
DB_ERRORS = Counter(
    'db_errors_total',
    'Database errors',
    ['query_type', 'error_type']
)

async def execute_query(query_type: str, table: str, coro):
    # .time() records the elapsed time when the block exits, including on error
    with DB_QUERY_DURATION.labels(query_type=query_type, table=table).time():
        try:
            return await coro
        except Exception as e:
            DB_ERRORS.labels(
                query_type=query_type,
                error_type=type(e).__name__
            ).inc()
            raise
Instrumenting a Node.js/Express Service
The prom-client library handles Node.js instrumentation:
const client = require('prom-client');
const express = require('express');

const app = express();

// Collect default Node.js metrics (memory, event loop lag, etc.)
client.collectDefaultMetrics({ prefix: 'node_' });

const httpRequestDuration = new client.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0],
});

const httpRequestTotal = new client.Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'route', 'status_code'],
});

// Middleware: register it before your routes so every request is timed
app.use((req, res, next) => {
  const end = httpRequestDuration.startTimer();
  res.on('finish', () => {
    const labels = {
      method: req.method,
      // Use the matched route pattern (e.g. /users/:id). Fall back to a fixed
      // value so unmatched paths do not each create a new series.
      route: req.route?.path ?? 'unmatched',
      status_code: res.statusCode,
    };
    end(labels);
    httpRequestTotal.inc(labels);
  });
  next();
});

// Metrics endpoint
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', client.register.contentType);
  res.end(await client.register.metrics());
});
The Grafana Dashboard
Once Prometheus is collecting metrics, add it as a data source in Grafana: Data Sources > Add data source > Prometheus, set the URL to http://prometheus:9090.
The four panels every service dashboard needs:
Request rate (requests per second):
rate(http_requests_total[5m])
Split by endpoint and status code. A sudden drop in 200s with a rise in 500s is an incident in progress.
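For intuition about what rate() does with a counter, here is a simplified Python sketch. The `simple_rate` function and the scrape values are illustrative assumptions; the real PromQL function also extrapolates to the edges of the range window, which this version skips.

```python
def simple_rate(samples):
    """Per-second increase over (timestamp_seconds, counter_value) samples.

    Simplified: divides by the time between the first and last sample and
    skips the window-boundary extrapolation that Prometheus applies.
    """
    if len(samples) < 2:
        return None  # a rate needs at least two samples
    increase = 0.0
    previous = samples[0][1]
    for _, value in samples[1:]:
        if value < previous:
            # The counter went backwards, so the process restarted and the
            # counter began again from zero: count the new value as growth.
            increase += value
        else:
            increase += value - previous
        previous = value
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None

# Hypothetical scrapes every 15 seconds; the service restarts before the fourth scrape.
scrapes = [(0, 1000), (15, 1030), (30, 1060), (45, 20), (60, 50)]
print(simple_rate(scrapes))
```

The reset handling is the reason to graph rate() of a counter rather than the raw counter value: a restart shows up as continued traffic instead of a cliff.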
Error rate (fraction of requests that fail, from 0 to 1; use a percent unit on the panel to display it as a percentage):
sum(rate(http_requests_total{status_code=~"5.."}[5m]))
/
sum(rate(http_requests_total[5m]))
Set an alert threshold at 1-5% depending on your service’s normal baseline.
Request duration percentiles:
histogram_quantile(0.95,
  sum(rate(http_request_duration_seconds_bucket[5m])) by (le, endpoint)
)
Plot p50, p95, and p99 together. p99 degradation often precedes p95 degradation, giving you earlier warning.
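Quantiles computed from a histogram are estimates, not exact values. The sketch below shows the general idea of bucket interpolation in plain Python: find the bucket that contains the target rank, then assume observations are spread evenly within it. The `estimate_quantile` function and the bucket counts are illustrative assumptions, and edge cases are handled more simply than in Prometheus itself.

```python
import math

def estimate_quantile(q, buckets):
    """Estimate a quantile from cumulative (upper_bound, count) histogram buckets."""
    buckets = sorted(buckets)
    total = buckets[-1][1]  # the +Inf bucket counts every observation
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # The rank falls beyond the largest finite bucket, so the
                # best available answer is that bucket's upper bound.
                return lower_bound
            in_bucket = count - lower_count
            if in_bucket == 0:
                return lower_bound
            # Linear interpolation inside the bucket that contains the rank.
            return lower_bound + (upper_bound - lower_bound) * (rank - lower_count) / in_bucket
        lower_bound, lower_count = upper_bound, count
    return math.nan

# Hypothetical cumulative bucket counts for 100 requests (upper bounds in seconds).
latency_buckets = [(0.1, 60), (0.25, 80), (0.5, 94), (1.0, 99), (math.inf, 100)]
print(estimate_quantile(0.95, latency_buckets))
```

Here the p95 estimate lands at roughly 0.6 seconds, but the true value could be anywhere between 0.5 and 1.0 seconds. Bucket boundaries set the resolution, so place them close to the latency thresholds you care about.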
Active connections / resource usage:
http_active_requests
For database connection pools: query the pool’s own metrics. Most database drivers expose pool size, in-use connections, and wait time.
Alerting
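Prometheus evaluates alerting rules itself (load the rules file through rule_files in prometheus.yml) and hands firing alerts to Alertmanager, which handles grouping, routing, and notification. A minimal rules file: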
# alerts.yml
groups:
  - name: api
    rules:
      - alert: HighErrorRate
        # Aggregate by job so that {{ $labels.job }} is populated in the summary
        expr: |
          sum by (job) (rate(http_requests_total{status_code=~"5.."}[5m]))
            / sum by (job) (rate(http_requests_total[5m])) > 0.05
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High error rate on {{ $labels.job }}"
          description: "Error rate is {{ $value | humanizePercentage }}"
      - alert: SlowResponseTime
        expr: |
          histogram_quantile(0.95,
            sum(rate(http_request_duration_seconds_bucket[5m])) by (le)
          ) > 2.0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "p95 latency above 2 seconds"
      - alert: ServiceDown
        expr: up{job="api-server"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Service is down"
Wire Alertmanager to PagerDuty, Slack, or email in its own config. Grafana also has its own alerting system that works from the same panels, which is useful if you want alerts triggered from the same queries you’re already viewing.
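The for: clause in an alerting rule is easy to misread: the condition must hold at every evaluation for the full duration before the alert fires, and a single healthy evaluation resets the clock. The sketch below models that as a small state machine. The `alert_states` function is an illustrative assumption, and the 15-second interval with a 60-second hold is chosen only to keep the example short.

```python
def alert_states(condition_results, evaluation_interval, for_duration):
    """Return the alert state after each rule evaluation.

    condition_results holds one boolean per evaluation, True when the rule
    expression returns a result (the threshold is exceeded).
    """
    states = []
    active_since = None
    for step, condition_met in enumerate(condition_results):
        now = step * evaluation_interval
        if not condition_met:
            active_since = None  # any gap resets the timer
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        held_for = now - active_since
        states.append("firing" if held_for >= for_duration else "pending")
    return states

# Evaluations every 15 seconds with a 60-second hold; the third evaluation is healthy.
results = [True, True, False, True, True, True, True, True]
states = alert_states(results, evaluation_interval=15, for_duration=60)
print(states)
```

In this run the alert only reaches firing on the last evaluation, because the one healthy result in the middle restarted the pending period. A longer for: value trades notification speed for fewer pages on brief spikes.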
In Kubernetes
For Kubernetes deployments, the standard approach is the kube-prometheus-stack Helm chart, which bundles Prometheus, Grafana, Alertmanager, and a set of pre-built dashboards for cluster metrics:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set grafana.adminPassword=your-password
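Be aware of how service discovery works here. kube-prometheus-stack is built on the Prometheus Operator, which finds targets through ServiceMonitor and PodMonitor resources rather than through pod annotations. The prometheus.io/* annotations below are a convention, not a built-in feature: they only take effect when a scrape config relabels on them, as the defaults in the standalone prometheus-community/prometheus chart do, or as an additional scrape config you add to the stack can. With such a config in place, add these to your pod template’s metadata:
annotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "8000"
  prometheus.io/path: "/metrics"
Prometheus will then discover and scrape any pod carrying these annotations.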
What to Skip
Most observability sprawl starts with too many metrics. Things not worth tracking until you have a specific reason:
- Per-user metrics at the Prometheus layer (use your application database for this)
- Metrics for endpoints that receive less than one request per hour (the overhead-to-signal ratio is too low)
- Duplicate metrics across different label dimensions (pick one canonical view)
- User IDs, order IDs, or other high-cardinality values used as Prometheus labels (these cause cardinality explosions)
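Prometheus stores time-series data, and cardinality matters. Each unique combination of labels is a separate series. If you put a user ID in a label, a service with 100,000 users creates at least 100,000 series for a single metric, multiplied by every other label combination and, for histograms, by the number of buckets. Series counts at that scale inflate Prometheus memory use and slow down queries.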
The practical baseline: rate, errors, and duration with labels for method, endpoint, and status_code. For most services, that’s 50-200 series. That’s manageable and informative. Add from there based on actual debugging needs, not speculation.
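The series arithmetic is simple enough to sanity-check before adding a label. The sketch below computes worst-case series counts for a hypothetical service; the label value counts are assumptions for illustration, and real numbers are usually lower because not every label combination occurs in practice.

```python
def counter_series(*label_value_counts):
    """Worst-case series for a counter: the product of distinct values per label."""
    total = 1
    for count in label_value_counts:
        total *= count
    return total

def histogram_series(explicit_buckets, *label_value_counts):
    """Worst-case series for a histogram: each label combination gets one series
    per bucket (the explicit ones plus +Inf) and one each for _sum and _count."""
    per_combination = explicit_buckets + 1 + 2
    return counter_series(*label_value_counts) * per_combination

# Hypothetical service: 2 methods, 5 endpoints, 3 status codes, 10 explicit buckets.
baseline = counter_series(2, 5, 3) + histogram_series(10, 2, 5)

# The same request counter with a user_id label covering 100,000 users.
with_user_id = counter_series(2, 5, 3, 100_000)

print(baseline, with_user_id)
```

The baseline works out to 160 series, while adding the user ID label to the counter alone allows up to 3,000,000. Histograms deserve extra care since every label combination is multiplied by the bucket count.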