Skip to content

Web Development · Database

SQL Window Functions: The Analytics Queries Your ORM Won't Write for You

Running totals, rankings, moving averages, lag and lead comparisons: window functions handle the analytics queries that would otherwise require application-side loops or multiple round-trips. Here's how they work.

Anurag Verma

Anurag Verma

8 min read

SQL Window Functions: The Analytics Queries Your ORM Won't Write for You

Sponsored

Share

There’s a category of analytics query that most developers solve the wrong way: fetch a bunch of rows, then loop over them in application code to compute rankings, running totals, or period-over-period comparisons. The query is simple. The application logic is messy. The performance is worse than it needs to be.

Window functions exist precisely for this case. They let you compute values across a set of rows related to the current row, without collapsing the result into a single aggregate. You get both the row’s original columns and the computed value in the same query.

What Window Functions Are (and Aren’t)

A regular aggregate function like SUM() or COUNT() collapses a group of rows into one. GROUP BY department, SUM(salary) gives you one row per department.

A window function computes across a group but keeps every row. You can see each employee’s salary alongside the sum for their department:

SELECT
    name,
    department,
    salary,
    SUM(salary) OVER (PARTITION BY department) AS dept_total
FROM employees;

Output:

name       | department | salary | dept_total
-----------|------------|--------|----------
Alice      | Eng        | 120000 | 350000
Bob        | Eng        | 115000 | 350000
Carol      | Eng        | 115000 | 350000
Dave       | Sales      | 95000  | 190000
Eve        | Sales      | 95000  | 190000

Each row keeps its own data. The dept_total is computed over the window of rows in the same department.

The OVER Clause

The OVER clause defines the window (which rows the function looks at). It has three optional parts:

function() OVER (
    PARTITION BY <columns>   -- divide rows into groups
    ORDER BY <columns>       -- order within each group
    ROWS BETWEEN ...         -- which rows in the group to include
)

An empty OVER () means the window is the entire result set.

PARTITION BY

PARTITION BY divides rows into groups, similar to GROUP BY, but without collapsing. Each partition is an independent window.

-- Rank employees by salary within each department
SELECT
    name,
    department,
    salary,
    RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees;

Without PARTITION BY, the ranking would be global across all employees. With it, each department gets its own rank sequence starting at 1.

Ranking Functions

Three ranking functions handle the same problem differently when there are ties:

SELECT
    name,
    score,
    ROW_NUMBER() OVER (ORDER BY score DESC) AS row_num,   -- unique, no gaps
    RANK()       OVER (ORDER BY score DESC) AS rank,       -- ties share rank, gap after
    DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rank  -- ties share rank, no gap
FROM leaderboard;
name   | score | row_num | rank | dense_rank
-------|-------|---------|------|----------
Alice  | 95    | 1       | 1    | 1
Bob    | 90    | 2       | 2    | 2
Carol  | 90    | 3       | 2    | 2
Dave   | 85    | 4       | 4    | 3

RANK() skips 3 after the tie at position 2. DENSE_RANK() does not. ROW_NUMBER() gives each row a unique number regardless of ties.

Use ROW_NUMBER() when you need exactly one row per position (like picking the top result per category). Use RANK() when ties should be visible and gaps are acceptable. Use DENSE_RANK() when ties should be visible but you don’t want the gaps.

Running Totals

SUM() with ORDER BY in the OVER clause computes a cumulative sum:

SELECT
    order_date,
    amount,
    SUM(amount) OVER (ORDER BY order_date) AS running_total
FROM orders
ORDER BY order_date;

By default, ORDER BY in a window function uses ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which means it sums from the first row in the partition up to and including the current row.

To compute a running total per customer:

SELECT
    customer_id,
    order_date,
    amount,
    SUM(amount) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
    ) AS customer_running_total
FROM orders;

Moving Averages

The ROWS BETWEEN clause controls which rows in the window the function sees. For a 7-day moving average:

SELECT
    date,
    daily_revenue,
    AVG(daily_revenue) OVER (
        ORDER BY date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS moving_avg_7d
FROM daily_metrics;

ROWS BETWEEN 6 PRECEDING AND CURRENT ROW includes the current row plus the 6 rows before it (a 7-row window). For dates with fewer than 7 preceding rows, PostgreSQL uses whatever rows are available, so the first few values are averages over smaller windows.

If you need a centered moving average (3 rows before and 3 after):

AVG(value) OVER (
    ORDER BY date
    ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING
)

LAG and LEAD

LAG() and LEAD() let you access values from other rows without a self-join. This is the right tool for period-over-period comparisons.

SELECT
    month,
    revenue,
    LAG(revenue, 1) OVER (ORDER BY month) AS prev_month_revenue,
    revenue - LAG(revenue, 1) OVER (ORDER BY month) AS mom_change,
    ROUND(
        100.0 * (revenue - LAG(revenue, 1) OVER (ORDER BY month))
        / NULLIF(LAG(revenue, 1) OVER (ORDER BY month), 0),
        1
    ) AS mom_pct_change
FROM monthly_revenue
ORDER BY month;

The second argument to LAG() is the offset (default 1). LAG(revenue, 3) returns the value from 3 rows earlier. LEAD() works the same way but looks forward.

NULLIF(..., 0) prevents division by zero when the previous month had zero revenue.

For per-group comparisons (e.g., month-over-month per product):

SELECT
    product_id,
    month,
    revenue,
    LAG(revenue) OVER (PARTITION BY product_id ORDER BY month) AS prev_month
FROM product_revenue;

FIRST_VALUE and LAST_VALUE

FIRST_VALUE() and LAST_VALUE() return the first or last value in the window:

SELECT
    name,
    department,
    salary,
    FIRST_VALUE(salary) OVER (
        PARTITION BY department
        ORDER BY salary DESC
    ) AS dept_max_salary,
    salary / FIRST_VALUE(salary) OVER (
        PARTITION BY department
        ORDER BY salary DESC
    )::numeric AS pct_of_max
FROM employees;

A common trap: LAST_VALUE() with the default window frame returns the current row’s value, not the last row in the partition. That’s because the default frame is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. To get the true last value:

LAST_VALUE(salary) OVER (
    PARTITION BY department
    ORDER BY salary DESC
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)

NTILE

NTILE(n) divides rows into n equal-sized buckets and assigns a bucket number to each row. Useful for percentile groupings:

SELECT
    customer_id,
    total_spend,
    NTILE(4) OVER (ORDER BY total_spend DESC) AS spend_quartile
FROM customer_totals;

Customers in quartile 1 are the top 25% spenders. This is cleaner than computing percentile boundaries manually.

Named Windows

If you’re applying multiple window functions with the same frame, name the window to avoid repetition:

SELECT
    date,
    value,
    SUM(value)    OVER w AS running_sum,
    AVG(value)    OVER w AS running_avg,
    MIN(value)    OVER w AS running_min,
    MAX(value)    OVER w AS running_max
FROM metrics
WINDOW w AS (ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
ORDER BY date;

Performance Considerations

Window functions run after WHERE, GROUP BY, and HAVING filtering. The window operates on already-filtered rows, which is often what you want.

They can be expensive because they often require sorting. If your window has an ORDER BY on a column without an index, PostgreSQL will sort the entire result in memory (or to disk if it’s large). An index on the window’s ORDER BY column can eliminate that sort.

For large tables, filtering before the window function matters:

-- Run the window on only the last 90 days, not the full table
WITH recent AS (
    SELECT * FROM events WHERE event_date >= NOW() - INTERVAL '90 days'
)
SELECT
    event_date,
    count,
    SUM(count) OVER (ORDER BY event_date) AS running_total
FROM recent;

Window functions can’t be used directly in WHERE or HAVING clauses because they run after those filters. Wrap them in a subquery or CTE:

-- Find employees ranked 1-5 in each department
WITH ranked AS (
    SELECT
        name,
        department,
        salary,
        RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
    FROM employees
)
SELECT * FROM ranked WHERE dept_rank <= 5;

What This Replaces

Before window functions became widely understood, the same logic required self-joins or application-side loops. Running total without window functions:

-- The old way: O(n²) self-join
SELECT
    a.order_date,
    a.amount,
    SUM(b.amount) AS running_total
FROM orders a
JOIN orders b ON b.order_date <= a.order_date
GROUP BY a.order_date, a.amount;

With a window function, PostgreSQL sorts once and passes through the rows linearly. The self-join approach scales quadratically with row count.

The first time you replace a 50-line Python loop computing period-over-period comparisons with a single SQL query using LAG(), the pattern becomes intuitive quickly. Window functions show up constantly in analytics work and are supported by every major database: PostgreSQL, MySQL 8+, SQLite 3.25+, BigQuery, Snowflake, and DuckDB.

Sponsored

Enjoyed it? Pass it on.

Share this article.

Sponsored

The dispatch

Working notes from
the studio.

A short letter twice a month — what we shipped, what broke, and the AI tools earning their keep.

No spam, ever. Unsubscribe anytime.

Discussion

Join the conversation.

Comments are powered by GitHub Discussions. Sign in with your GitHub account to leave a comment.

Sponsored