Understanding window functions in PostgreSQL (Part 2 - DENSE_RANK())
We continue our study (and enjoyment!) of PostgreSQL window functions from last time. And in case you didn’t read that one or don’t wish to, here’s a quick motivation for using window functions: they help us write “difficult” queries that would otherwise be inconceivable in basic SQL features!
Find the third-most costly product in a particular category
I don’t know about you, but a requirement like the above makes my heart freeze. You can feel your brain overloaded, whirring through options in a futile attempt to reconcile its basic SQL knowledge into something that could answer the above. But once again, window functions (this time
DENSE_RANK()) make the job a whole lot easier.
Before jumping the gun, let’s make sure we understand the schema correctly:
CREATE TABLE product_groups ( group_id serial PRIMARY KEY, group_name VARCHAR (255) NOT NULL ); CREATE TABLE products ( product_id serial PRIMARY KEY, product_name VARCHAR (255) NOT NULL, price DECIMAL (11, 2), group_id INT NOT NULL, FOREIGN KEY (group_id) REFERENCES product_groups (group_id) );
We have here a typical
products table with just one tiny twist: the category (group) the product belongs to is stored in the
product_groups table. This wasn’t strictly needed; it’s just how one of the examples I came across, was arranged. That said, having the information split up like this allows us to apply the window function in the context of a
JOIN – which is good learning!
Now, before I introduce the query, I want to explain the window function we’ll be using to solve this problem –
DENSE_RANK(). If you can digest the fact that window functions can help you magically create groups (partitions) for a query like the one we’re after,
DENSE_RANK() is easy to explain: it works on the rows in a group and assigns them a number (for ranking) based on their order in that group. (Yes, yes, an example is coming; just hold on for a minute!)
A related function is
RANK() and its contrast to
DENSE_RANK() is best explained through an example – assume we have the following order of products and their prices in a particular category:
Laptop-1 2000 Phone-1 1800 Laptop-2 1800 Laptop-3 1500
RANK() would rank them as follows:
Laptop-1 2000 1 Phone-1 1800 2 Laptop-2 1800 2 Laptop-3 1500 4
DENSE_RANK() would rank them like this:
Laptop-1 2000 1 Phone-1 1800 2 Laptop-2 1800 2 Laptop-3 1500 3
See the difference? Both
DENSE_RANK() assign a rank of 2 to the second-highest price in the group, but
RANK() skips as many ranks for the next entry change as there were duplications (so, we go 1, 2, 2, 4) while
DENSE_RANK() preserves a strict numeric order (1, 2, 2, 3 – 3 comes after 2 no matter what). Both have their uses, though in our case we need
DENSE_RANK() as it fits our definition of finding the third-most costly product in each category.
DENSE_RANK() example, with a Common Table Expression (CTE) thrown in
Let’s look at the query that’ll solve the given problem: find the third-most costly product in a given category:
WITH rank_data AS ( SELECT product_name, price, group_name, DENSE_RANK() OVER ( PARTITION BY group_name ORDER BY price ) price_rank FROM products INNER JOIN product_groups USING (group_id) ) SELECT * FROM rank_data WHERE group_name = 'Laptop' AND price_rank = 3;
Now, let’s not lose our minds! The query is pretty simple; it just has a few parts to it. So, let’s look at these parts one by one:
- The final part of the query is just a regular
- The part
WITH rank_data AS ...is just called a Common Table Expression (CTE) in Postgres. For developers, it’s easier to think of this as a variable declaration. This name (
rank_data) gets used in the last part of the query (
FROM rank_data) to pull data from.
- There is a single
JOINstatement in the inner query. This is needed because of how our tables are structured.
- Now for the meat of the query:
DENSE_RANK() OVER (PARTITION BY group_name ORDER BY price) price_rank. This creates another column in the query output and fills it with an integer indicating the rank of the row. Rank based on what? Well, the query is doing a sort of
GROUP BY, except that it’s called a partition; while the partition can be thought of as a group, the windowing feature allows the query engine to access individual rows while having a “window” to the partition the row is in. We ourselves don’t need to think about how it’s done internally, just as we don’t bother how
GROUP BYis working internally. It’s enough to know that the query is able to create groups on
group_name(product category) and use the price to rank them (
ORDER BY price).
Yes, that’s all there is to it! Granted that the syntax can throw you off initially, but don’t you agree how simple and powerful these window functions are?! Notice that many of the clauses in this query have nothing to do with window functions:
WHERE, etc., but I’ve used them to create a more meaningful example.
Go ahead and practice a little by making your own similar scenarios. Or perhaps search online and look for some examples that extend the idea into different scenarios. One thing is sure: you’ll now easily understand the sorcery involved and will enjoy it as well!