ClickHouse in 2026: Analytical Queries on Billions of Rows Without the Pain

Most web applications run on Postgres or MySQL, and those databases handle transactional workloads well. User registration, order creation, session tracking: these are row-oriented operations, and row-oriented databases handle them efficiently.

Analytical queries are a different problem. “How many users from Germany made a purchase in the last 90 days, broken down by referral source?” requires scanning millions of rows to aggregate a handful of columns. Postgres will answer that query, but as your data grows past a few hundred million rows, it starts taking minutes. ClickHouse is built specifically for this class of query, and it answers them in seconds or less. Not by being clever about indexes, but by storing and processing data in a fundamentally different way.

How Columnar Storage Changes the Performance Equation

Row-oriented databases store each row’s data contiguously on disk. When you query one column from a million rows, the database reads all the row data and discards what it doesn’t need.

Columnar databases store each column’s data contiguously. A query that reads three columns out of a hundred scans only three columns’ worth of data. For analytical queries that aggregate a small number of columns across a large number of rows, this is the difference between reading 2% of your data and reading 100% of it.

ClickHouse adds compression on top of columnar storage. Because a single column contains data of the same type (all integers, all timestamps, all strings with similar patterns), it compresses extremely well. A table that takes 500 GB in Postgres might compress to 50 GB in ClickHouse, which means the same query reads far less data from disk.

The result: ClickHouse benchmarks consistently show it handling queries over billions of rows in under a second. The ClickHouse team publishes a public benchmark at benchmark.clickhouse.com that compares it against other analytical databases with real hardware and reproducible methodology.

When to Consider ClickHouse

ClickHouse is worth evaluating when any of these are true:

Your analytics queries take more than a few seconds and the data volume keeps growing
You’re running nightly batch jobs to summarize data because live queries are too slow
You’ve added Postgres read replicas or materialized views specifically to handle analytical traffic
You want to offer real-time dashboards to clients or users and Postgres is the bottleneck
You’re storing events, logs, or time-series data and querying it gets slower every month

ClickHouse is not a general-purpose database. It’s not a replacement for Postgres. You still use Postgres for your application’s transactional data. ClickHouse is where you send data that needs to be queried analytically at scale.

Setting Up ClickHouse

The fastest way to get a ClickHouse instance is ClickHouse Cloud, which offers a free trial. For local development, Docker works:

docker run -d \
  --name clickhouse \
  -p 8123:8123 \
  -p 9000:9000 \
  clickhouse/clickhouse-server:latest

Port 8123 is the HTTP interface. Port 9000 is the native binary protocol. You can query via HTTP immediately:

curl http://localhost:8123 \
  --data-binary "SELECT version()"
# 24.x.x.x

Or using the CLI client:

docker exec -it clickhouse clickhouse-client

The Table Engine Matters

ClickHouse has multiple table engines. The one you’ll use for almost everything is MergeTree and its variants.

CREATE TABLE events
(
    event_id    UUID,
    user_id     UInt64,
    event_type  LowCardinality(String),
    properties  String,
    country     LowCardinality(String),
    created_at  DateTime64(3)  -- millisecond precision
)
ENGINE = MergeTree()
ORDER BY (created_at, user_id)
PARTITION BY toYYYYMM(created_at);

A few things worth noting here:

ORDER BY defines the physical sort order on disk. ClickHouse uses this to skip data during queries. If you frequently filter by event_type, adding it to the ORDER BY (or creating a secondary index) speeds up those queries significantly.

LowCardinality(String) is for columns with limited unique values (under ~10,000). It uses dictionary encoding internally, which improves compression and query speed.

PARTITION BY splits the table into monthly partitions. Queries that filter by month only touch the relevant partition, not the entire table.

Practical Query Examples

Aggregate queries are where ClickHouse shines:

-- Daily active users over the last 30 days
SELECT
    toDate(created_at)  AS day,
    uniq(user_id)       AS dau
FROM events
WHERE created_at >= now() - INTERVAL 30 DAY
  AND event_type = 'session_start'
GROUP BY day
ORDER BY day;

-- Funnel: users who completed each step in order within 24 hours
SELECT
    countIf(has(steps, 'signup'))           AS step_1,
    countIf(has(steps, 'onboarding_done'))  AS step_2,
    countIf(has(steps, 'first_purchase'))   AS step_3
FROM (
    SELECT
        user_id,
        groupArray(event_type) AS steps
    FROM events
    WHERE created_at >= today() - 30
    GROUP BY user_id
);

-- Cohort retention: week-0 vs week-1 retention
SELECT
    cohort_week,
    countIf(activity_week = cohort_week)      AS week_0,
    countIf(activity_week = cohort_week + 1)  AS week_1,
    round(week_1 / week_0 * 100, 1)           AS retention_pct
FROM (
    SELECT
        user_id,
        toMonday(min(created_at)) AS cohort_week
    FROM events
    WHERE event_type = 'signup'
    GROUP BY user_id
) cohorts
JOIN (
    SELECT user_id, toMonday(created_at) AS activity_week
    FROM events
    GROUP BY user_id, activity_week
) activity USING (user_id)
GROUP BY cohort_week
ORDER BY cohort_week;

These queries run on billions of rows in ClickHouse in seconds. The syntax is close enough to standard SQL that most developers don’t need to learn much before writing useful queries.

Inserting Data

ClickHouse is optimized for bulk inserts, not single-row inserts. The recommended pattern is to batch inserts.

From your application, buffer events and flush in batches:

from clickhouse_driver import Client
from datetime import datetime
import uuid

client = Client(host='localhost')

# Buffer and flush
event_buffer = []

def record_event(user_id, event_type, properties, country):
    event_buffer.append({
        'event_id': str(uuid.uuid4()),
        'user_id': user_id,
        'event_type': event_type,
        'properties': json.dumps(properties),
        'country': country,
        'created_at': datetime.utcnow(),
    })
    
    if len(event_buffer) >= 1000:
        flush_events()

def flush_events():
    if not event_buffer:
        return
    client.execute(
        'INSERT INTO events VALUES',
        event_buffer,
        types_check=True
    )
    event_buffer.clear()

For higher throughput, use a queue (Kafka, SQS) and write a consumer that inserts to ClickHouse in batches. This decouples your application from the database and makes the insertion pipeline more resilient.

Directly from object storage, ClickHouse can read files from S3 or GCS:

INSERT INTO events
SELECT *
FROM s3(
    's3://your-bucket/events/2026-06-07/*.parquet',
    'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY',
    'Parquet'
);

This is useful for backfilling historical data or loading from a data lake.

Replicating from Postgres

The most common architecture: Postgres handles your OLTP workload, ClickHouse handles analytics. You keep them in sync via change data capture (CDC).

PeerDB is an open-source CDC tool built specifically for Postgres-to-ClickHouse replication. It sets up in about 15 minutes:

docker compose -f peerdb-compose.yaml up -d

# Then in the PeerDB UI or API:
# 1. Create a Postgres source peer
# 2. Create a ClickHouse destination peer
# 3. Create a mirror for the tables you want replicated

Debezium is another option with broader database support if you need to replicate from other sources too.

With CDC in place, writes to Postgres appear in ClickHouse within seconds. Your application keeps writing to Postgres normally. Your analytics dashboards query ClickHouse.

ClickHouse Cloud vs Self-Hosted

	ClickHouse Cloud	Self-hosted
Setup time	Minutes	Hours to days
Scaling	Automatic	Manual
Cost at low scale	$50-200/month for small clusters	Cheaper if you’re already running servers
Cost at high scale	Scales with usage	Fixed unless you over-provision
Operational burden	Low	High
Data residency	Choose region	Full control

For most product teams evaluating ClickHouse, start with ClickHouse Cloud. If the queries and architecture work, you have real data for a self-hosting decision later.

What ClickHouse Is Not Good For

High-frequency single-row reads. Looking up a user by ID is slower in ClickHouse than Postgres. It’s not a key-value store.
Frequent updates and deletes. ClickHouse supports mutations, but they’re expensive. It’s designed for append-heavy workloads.
ACID transactions across tables. ClickHouse is not a transactional database. Use Postgres for that.
Small datasets. If your analytics tables have millions of rows rather than billions, Postgres with a good query plan is fast enough and simpler to operate.

The decision point is roughly: if your slowest analytics queries are taking 5-10 seconds or more and the data is still growing, ClickHouse is worth the setup cost. Below that threshold, Postgres with better indexes or a materialized view is usually the simpler answer.

ClickHouse in 2026: Analytical Queries on Billions of Rows Without the Pain

How Columnar Storage Changes the Performance Equation

When to Consider ClickHouse

Setting Up ClickHouse

The Table Engine Matters

Practical Query Examples

Inserting Data

Replicating from Postgres

ClickHouse Cloud vs Self-Hosted

What ClickHouse Is Not Good For

How Agencies Build Contractor Networks: Adding Capacity Without the Headcount

dbt in 2026: SQL-First Data Transformation That Actually Scales

More from Cloud & Infrastructure

Cloudflare R2 vs AWS S3 in 2026: The Storage Decision for Developer Teams

Error Tracking in 2026: What Sentry Catches That Your Logs Don't

Vendor Lock-in in 2026: What It Actually Costs and When to Stop Worrying About It

Join the conversation.