Understanding window functions in PostgreSQL (Part 1)

As fullstack developers, we rarely pay attention to the intricacies of databases (or relational databases, to be more precise).

It’s understandable why it’s so, because we’re crushed by the complexities of the apps we are building, and the pressures of meeting deadlines and whatnot, and there’s hardly ever time to look in this direction.

However, I believe we should make an effort to improve our knowledge of the tools we use, no matter how “useless” that knowledge may seem at the time. After all, no learning goes wasted . . . or at least, completely wasted.

With that in mind, let’s take a look at one of the very important (and very nice!) features found in PostgreSQL (in relational databases, that is): window functions.

Why care about window functions at all?

Now, I’m the type of person who doesn’t like to study something for its sake. I need to know before I start why it’s a good idea to pour time and effort into it. For instance, Domain-Driven Design (DDD) is worthwhile because it makes our app separate from the framework, and allows many advantages, the most tangible being – the ability to support as many output/API types as you want, without having to scramble all over your controllers/views land.

Similarly, we must know what we’re missing without window functions, if we’re to give two hoots about them, let alone learn them inside out.

So, let’s start with a very simple example.

(By the way, the schema and starting examples were shamelessly copied from this tutorial. Credit where it’s due!)

Let’s begin with two tables: products and product_groups, the latter being a tiny table that just separates out the group name:

CREATE TABLE product_groups (
    group_id serial PRIMARY KEY,
    group_name VARCHAR (255) NOT NULL
);

CREATE TABLE products (
    product_id serial PRIMARY KEY,
    product_name VARCHAR (255) NOT NULL,
    price DECIMAL (11, 2),
    group_id INT NOT NULL,
    FOREIGN KEY (group_id) REFERENCES product_groups (group_id)
);

Yes, perhaps the group_name was better off stuffed into the products table, but hey, let’s not debate schema design right now! 🙂

Now, I ask you a relatively simple question: can you list all the product groups that we have along with the average price for each of them?

Depending on how ORM-heavy you’ve become over time, this question will either make you laugh or make you laugh nervously. But even if you’ve lost muscle memory for joins and whatnot, you might hazily remember that aggregations (group by) might be the way to go here. And indeed, that is so:

SELECT
    group_name,
    AVG (price)
FROM
    products
INNER JOIN product_groups USING (group_id)
GROUP BY
    group_name;

The database is nice enough to perform the groupings for us, and we get the following output:

group_name avg
Smartphone 500
Tablet 350
Laptop 850

So far so good, but now we have a slightly different need. We need to send this report to business folks, so we need to make this more presentable. A very nice idea is to show a list of all products, their price, as well as which group they belong to and the average price of that group. Yes, programming-wise, there’s a lot of repetition in such data, but when opened in a spreadsheet, it will look very nice and digestible for humans.

If you didn’t know any better, you’d think just sticking the names of these columns in the previous query would do the trick:

SELECT
    product_name,
    product_price,
    group_name,
    AVG (price)
FROM
    products
INNER JOIN product_groups USING (group_id)
GROUP BY
    group_name;

Except that this gives you a nasty error: Column 'product_name' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. I won’t launch into the full explanation, but it’s plain to see that GROUP BY only knows how to “condense” the input set down to the groups we asked it to create.

So, how can we achieve the report we want? If you’ve only done joins, aggregates, filters, and other everyday stuff, you might be feeling the panic already. Yes, it doesn’t seem possible by using the stuff we find in basic tutorials.

Window functions to the rescue!

What are window functions in PostgreSQL?

After going through the last section, we’ve already gained some insight into the nature of window functions. Put simply, a window function is able to segregate the given rows into groups, but is also able to access all the rows in a group.

Look at the following query:

SELECT
    product_name,
    price,
    group_name,
    AVG (price) OVER (
       PARTITION BY group_name
    )
FROM
    products
    INNER JOIN
        product_groups USING (group_id);

The PARTITION BY clause is the same as GROUP BY, except that you can think of it as running through the rows again after having computed the groups and their average. Ignoring the JOIN for a moment (which we had to introduce simply because of how the tables are related), this version of the query reads more natural and produces straightforward results as we want:

HP Elite    1200.00 Laptop  850
Lenovo Thinkpad 700.00  Laptop  850
Sony VAIO   700.00  Laptop  850
Dell Vostro 800.00  Laptop  850
Microsoft Lumia 200.00  Smartphone  500
HTC One 400.00  Smartphone  500
Nexus   500.00  Smartphone  500
iPhone  900.00  Smartphone  500
iPad    700.00  Tablet  350
Kindle Fire 150.00  Tablet  350
Samsung Galaxy Tab  200.00  Tablet  350

There are many more interesting window functions that solve a number of different, difficult problems when it comes to querying. I highly recommend looking up the docs, but even if you don’t, just remember that something like this exists that can make weird queries very easy. This way, you can dump all the hard work onto the DB and not have to do programmatic weight-lifting!

I hope there was some motivation and usefulness for you here. If your interest in databases and what we can do with them has risen by even 1%, you’ve made my day! 🥳