Publish AI, ML & data-science insights to a global community of data professionals.

SQL Window Functions

A Love Hate Relationship

Tutorial

I love you even when I hate you – Henry Leo Soohyun

I’ve long had a love-hate relationship with SQL Window functions. I loved them when they worked, and I hated them when they didn’t. I would spend hours arguing and battling with them until either I walked away in frustration or through brute force got them to work. These efforts were inconsistent and perplexing. So much power, yet so difficult to harness. After hours of coffee and perhaps a few unkind words, I realized I misunderstood the core elements driving the SQL Window Function algorithm.

The Basics

If you’ve worked with Big Data, chances are you’ve run across window functions in Spark, Pandas, Tableau, MySQL, Postgres and Snowflake to name a few. Window functions can be categorized into one of three types as shown in the table below.

In this tutorial, I focus on the core syntax of a window function, namely the OVER(), PARTITION BY and ORDER BY clauses. These elements are the drivers behind most window functions, so if you are struggling with how to apply a window function, it likely has to do with not understanding how the widow function algorithm executes these elements. Understanding how these work will open the door to implementing all window function types. In this tutorial, I don’t cover the frame clause. It is a big topic that deserves its own tutorial. Also, its implementation varies across SQL platforms. So, I suggest reading the documentation of the platform.

The start to understanding any window function is the syntax. The image below shows the basic syntax of a window function, along with a description of each of the important elements.

You will learn more about these in the tutorial below. In the meantime, I suggest making a copy of the syntax for future reference. It can serve as quick refresher as you work with more and more window functions.

Window Function Algorithm

The algorithm steps used to process a window function vary depending on the window function, partitioning and ordering by clauses. While a deeper dive into algorithm processing is beyond this tutorial, I’ve found it helpful to think of the algorithm as starting with the OVER() clause followed by the PARTITION BY and lastly ORDER BY.

Our SQL Window Function

The tutorial explores the RANK(sales_price) OVER(PARITITION BY city ORDERY BY sold_date [frame clause]) function. The RANK() window function determines the rank of a value in a group of values. The table shows the dataset.

The rank_sales_price column is populated using sold_date, city and street_name as inputs to the RANK()window function. You may have noticed the rank_sales_price column is blank. Not to worry. We will walk through the steps needed to rank the sales price.

Our Data

The data in our example comes from data I created and populated to a SQL Server table. When learning SQL window functions, I encourage you to build your own examples in a spreadsheet and then import them into your own SQL table. The alternative is to use an existing data set and import it into a SQL table. The SQL query below shows all the columns function in my data tables along with the Window functions I created.

Besides the RANK() window function, the SELECT statement can support additional window functions. This a powerful feature for testing different PARTITION BY and ORDER BY clauses. The OVER() clause enables this feature.

OVER () Clause: The Chief Architect

The OVER() clause is like an architect who designs a home. The home design typically comprises a bedroom, bathroom, kitchen and several other rooms; see the floor plan below. Similarly, for each window the OVER() clause architects a separate execution space for the PARTITION BY and ORDER BY clauses.

Each room in a house has a different function. The bedroom is for sleep while the kitchen is for cooking. With Window Types, each function has a different purpose. The AVG window function returns the average (arithmetic mean) while the SUM window function returns the sum of the input column.

The more window functions there are in a SELECT statement, the more architects, OVER clauses, there are creating execution spaces. That is the power and beauty of the OVER() clause, it eliminates the need to create multiple queries. A single SELECT statement can return multiple results.

Design Considerations

When looking at window functions, it’s helpful to think of the PARTITION BY clause as the grouper and the ORDER BY clause as the organizer of the data in the columns. As the grouper, the PARTITION BY groups like values in the column of data. As the organizer, the ORDER BY clause determines how the aggregate ranked or analytic window function applies to row data. It is tempting to overlook the ORDER BY clause because of its optional default setting, but that would be a mistake. The ORDER BY clause is important and understanding it will save you hours of frustration. We’ll dig into it a little later on, but first let’s look at the PARTITION BY clause in more detail.

PARTITION BY Clause

When working with the PARTITION BY clause, it’s helpful to remember a couple of rules:

  • The PARTITION BY clause processes columns from left to right.
  • The PARTITION BY clause groups like values and split on differences.

I have found it useful to think of the PARTITION BY clause as scanning the column and grouping the same values before moving to the next column in the PARTITION. Let’s look at the image below for an explanation of how it works.

In step 1, the processing starts with the left-most column, City. In the city column there are three distinct values: Essex, Green Acres and Blue Ocean. The algorithm groups together rows with the same values and splits on the differences. The outcome is three results sets in Step 2. The processing continues in the next column, Street Name. The window algorithm continues grouping and splitting on the differences. In Step 2, the data with Essex as the city does not split. No difference exists the row values. Main street is in both rows. However, a difference in the Street Name causes the data to split for the cities of Green Acres and Blue Ocean. The image shows the results of the grouping and splitting Step 3.

ORDER BY Clause

I think the ORDER BY clause has an image problem when writing SELECT statement queries. You can go months without needing to use one, so that makes it easy to overlook when working with SQL Window Functions – an advanced topic. However, when writing window functions, the ORDER BY clause is a powerful tool in applying the results of the window function. It’s sorting property (ascending- descending) controls how the window function is applied to a row of data.

When working with the ORDER BY clause, it’s helpful to remember a couple of rules:

  • The ORDER BY clause processes columns from left to right.
  • In the ORDER BY clause, columns process independent of the columns in the PARTITION BY clause. Columns used in the PARTITION BY clause do not have to be used in the ORDER BY clause.
  • The sort property of the ORDER BY clause determines how the window function is applied to row data.
  • It is common to find integer and date data types columns in the ORDER BY clause. Date columns might be start date, end date, etc. While integer columns might be price, size, or distance. The window function algorithm does not limit the ORDER BY column to these values or data types. These are intended as helpful examples to understand usage.

I’ve added the Date Sold and the Rank columns to the data below. The Date Sold column contains the date data used in the ORDER BY clause. The Rank column has a blank value. The sort property of the ORDER BY clause sets the value in the Rank column.

There are two paths, ascending or descending, used to set the value in the Rank column.

  • Path 1: If I set the Date Sold column in the ORDER BY clause to ascending (asc), it assigns the RANK value of 1to the first record, 1/18/2017.
  • Path 2: If I set the Date Sold column in the ORDER BY clause to descending (desc), it assigns the RANK value of 1 to the last record, 1/20/2019.

So, the window function is applied to the record by the asc or desc sort property of the ORDER BY clause.

Conclusion

Knowing how the Window algorithm processes the OVER(), PARTITION BY and ORDER BY clauses is key to implementing SQL window functions. The OVER clause creates the execution space for the PARTITION BY and ORDER BY clauses. The PARTITION BY clause groups like values and splits on differences. The sort property of the ORDER BY clause determines how the window function is applied to the records in the partition.

Inspiration: Don’t despise the small beginnings. With each lesson shared, I remember a tiny room with my mom, a record player, and alphabet chart. She used simple tools to teach two little black boys to read, write and think. The beginnings may be humble, but the dreams can be big. Keep sharing and inspiring others to be greater than they could have imagined.


Towards Data Science is a community publication. Submit your insights to reach our global audience and earn through the TDS Author Payment Program.

Write for TDS

Related Articles