Why window functions may come in handy

Let us dive directly into a real world example. You’ve got a table with salaries of different job functions and you are asked to find out what the n-tiles (quantiles, mean etc.) are of each function within a company. So this means you’ve got many people that hold a position which you want to compare with others. You want to compare salaries of marketing professionals to their peers, but you don’t want to compare said salaries to marketing directors because they have their own salary bands. So that means you need to frame (also called partition) job titles with their own with which you calculate the n-tiles. Without window functions, this could be a quite fussy task.

In order to walk through the problem, create a new database, I’m going to name mine FoehnWind [^1]. The following script creates a new table and inserts some sample data of 20 rows into it.

Tutorial

USE [FoehnWind];
GO
CREATE TABLE dbo.Employee
(ID           TINYINT IDENTITY(1, 1) NOT NULL, 
 FirstName    VARCHAR(150) NOT NULL, 
 LastName     VARCHAR(150) NULL, 
 JobTitle     VARCHAR(100) NOT NULL, 
 AnnualSalary INT NOT NULL, 
 CONSTRAINT [PK_dbo.Employee] PRIMARY KEY CLUSTERED(ID ASC)
)
ON [PRIMARY];
GO
INSERT INTO [dbo].[Employee] 
VALUES
 ( 'Ken','Sanchez','Chief Executive Officer', 530000)
,( 'Jo','Berry','Marketing Assistant', 71875 )
,( 'Lori','Penor','Marketing Assistant', 73398 )
,( 'Pat','Coleman','Marketing Assistant', 51874 )
,( 'Stuart','Macrae','Marketing Assistant', 59108 )
,( 'Mary','Dempsey','Marketing Assistant', 57641 )
,( 'Wanida','Benshoof','Marketing Assistant', 60779 )
,( 'David','Bradley','Marketing Manager', 130000 )
,( 'Jill','Williams','Marketing Specialist', 108117 )
,( 'John','Wood','Marketing Specialist', 99407 )
,( 'Sariya','Pit','Marketing Specialist', 95048 )
,( 'Terry','Eminhizer','Marketing Specialist', 106316 )
,( 'Michael','Sullivan','Marketing Specialist', 87047 )
,( 'Ovidiu','Cracium','Marketing Specialist', 107883 )
,( 'Rob','Walters','Marketing Specialist', 109369 )
,( 'Garrett','Young','Senior Marketing Specialist', 118051 )
,( 'George','Li','Senior Marketing Specialist', 117253 )
,( 'Russell','King','Senior Marketing Specialist', 123362 )
,( 'Sandra','Alayo','Senior Marketing Specialist', 118185 )
,( 'Denise','Smith','Vice President Marketing', 250000 );
GO

Firstly, lets use a window function which calculates the averages and percentiles of salaries for each job title:

SELECT distinct e.JobTitle
        ,AVG(e.AnnualSalary) OVER (PARTITION BY e.JobTitle) AS AverageSalary
        ,PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY e.AnnualSalary) 
            OVER (PARTITION BY e.JobTitle) as MedianContinous
        ,PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY e.AnnualSalary) 
            OVER (PARTITION BY e.JobTitle) as MedianDisContinous
FROM dbo.Employee e
order by e.JobTitle

You perhaps now wondered how the functions PERCENTILE_CONT() and PERCENTILE_DISC() differ. The discontinous variant returns an actual value that exists in the calculated set and pointing out that this values is the closest one being to the calculative median. The continous percentile in contrast returns a value that is actually the calculated median. For example, Marketing Assistant Stuart Macrae has an Annual Salary of 59’108. This amount is also the median in this job role, but the calculated median is 59’943.5 so Stuart is slightly below the median but his salary is the closest one to the median.

JobTitle

EmployeeName

AnnualSalary

AverageSalary

MedianContinous

MedianDisContinous

Chief Executive Officer

Ken Sanchez

530000

530000

530000

530000

Marketing Assistant

Pat Coleman

51874

62445

59943.5

59108

Marketing Assistant

Mary Dempsey

57641

62445

59943.5

59108

Marketing Assistant

Stuart Macrae

59108

62445

59943.5

59108

Marketing Assistant

Wanida Benshoof

60779

62445

59943.5

59108

Marketing Assistant

Jo Berry

71875

62445

59943.5

59108

Marketing Assistant

Lori Penor

73398

62445

59943.5

59108

Marketing Manager

David Bradley

130000

130000

130000

130000

Marketing Specialist

Michael Sullivan

87047

101883

106316

106316

Marketing Specialist

Sariya Pit

95048

101883

106316

106316

Marketing Specialist

John Wood

99407

101883

106316

106316

Marketing Specialist

Terry Eminhizer

106316

101883

106316

106316

Marketing Specialist

Ovidiu Cracium

107883

101883

106316

106316

Marketing Specialist

Jill Williams

108117

101883

106316

106316

Marketing Specialist

Rob Walters

109369

101883

106316

106316

Senior Marketing Specialist

George Li

117253

119212

118118

118051

Senior Marketing Specialist

Garrett Young

118051

119212

118118

118051

Senior Marketing Specialist

Sandra Alayo

118185

119212

118118

118051

Senior Marketing Specialist

Russell King

123362

119212

118118

118051

Vice President Marketing

Denise Smith

250000

250000

250000

250000

Further reading

This was just an example of what window functions are handy for. More window functions are documented here. Ranked functions are used quite often to find out what happened over time, e.g. if a status of a product changed. If you want to dig even deeper into this topic, I recommend (no affiliation) the book T-SQL Window Functions: For data analysis and beyond (2nd Edition) (Developer Reference) by Itzik Ben-Gan. He's authored some excellent books with great knowledge.

[^1]: Fohen wind is a fairly common wind in the European Alps, which is a reference to the popular sample database Nordwind.