
## Prepare yourself for the Data Science interview

## Background

Window functions let you manipulate data effectively in a few lines of code, which is one reason a question on window functions comes up in almost every data science interview.

This is part 1 of the series on SQL window functions. In this blog, we will learn about the fundamentals of SQL window functions and their applications.

**Why are window functions required?**

Imagine that we have data on the salaries of employees within an organization. The table below (employee_table in the queries that follow) shows the data, with the columns EMPID, NAME, JOB, and SALARY:

Now, suppose we want to add 2 columns to this data table:

- TOTAL: This column contains the total salaries of all the employees. This is equal to the sum of the salary column.
- TOTAL_JOB: This column contains the total salary of all employees who share the job role of the row (Data Scientist, Data Analyst, or Data Engineer). For example, for rows with ‘JOB’ as Data Scientist, the TOTAL_JOB column is equal to 12,000 (the sum of the Data Scientist salaries: 3800 + 4900 + 3300).

Can we add these columns without performing GROUP BY and self-joins?

**Yes**, we can easily do this using a **window function**.

**What is a window function in SQL?**

A window function performs a calculation over a set of rows related to the current row (the window) and returns a value for every row of the table. Unlike aggregate functions used with the GROUP BY clause, where the individual rows are ‘lost’ because each group collapses into one row, window functions do not combine multiple rows into a single row, so each row retains its original identity.
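To make the contrast with GROUP BY concrete, here is a minimal sketch that runs both styles of query and compares the number of rows returned. It assumes Python's built-in `sqlite3` module with SQLite 3.25 or later (the first version with window functions). Only the three Data Scientist salaries come from the text above; the names and every other row are made up for illustration.

```python
import sqlite3

# In-memory database; window functions need SQLite 3.25 or later.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee_table (EMPID INTEGER, NAME TEXT, JOB TEXT, SALARY INTEGER)"
)
# Data Scientist salaries follow the article; names and other rows are hypothetical.
conn.executemany(
    "INSERT INTO employee_table VALUES (?, ?, ?, ?)",
    [
        (1, "A", "Data Scientist", 3800),
        (2, "B", "Data Scientist", 4900),
        (3, "C", "Data Scientist", 3300),
        (4, "D", "Data Analyst", 3000),
        (5, "E", "Data Analyst", 3500),
        (6, "F", "Data Engineer", 4000),
        (7, "G", "Data Engineer", 4200),
    ],
)

# GROUP BY: one output row per job, so the individual employees disappear.
grouped = conn.execute(
    "SELECT JOB, SUM(SALARY) FROM employee_table GROUP BY JOB ORDER BY JOB"
).fetchall()

# Window function: every employee row survives and carries its job total.
windowed = conn.execute(
    "SELECT EMPID, JOB, SALARY, SUM(SALARY) OVER (PARTITION BY JOB) "
    "FROM employee_table ORDER BY EMPID"
).fetchall()

print(len(grouped), "rows after GROUP BY")             # 3
print(len(windowed), "rows with the window function")  # 7
```

The GROUP BY query returns one row per job, while the window query returns all seven employee rows with the job total attached to each.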

## Syntax of Window Function

The following is the syntax of a window function:

    SELECT <column_1>, <column_2>,
           <window_function>(<expression>) OVER (
               PARTITION BY <partition_list>
               ORDER BY <order_list>
           )
    FROM <table_name>

Let’s understand each of the keywords in detail:

- **Window function** is the name of the window function we wish to apply, such as SUM, AVG, or ROW_NUMBER.
- **Expression** is the name of the column on which the window function should be applied. Depending on the window function we are using, this may or may not be required. For example, ROW_NUMBER does not take an expression.
- **OVER** simply indicates that the function is a window function.
- **PARTITION BY** partitions the rows of the data table, allowing us to define which rows are used to compute the window function.
- **Partition list** is the name of the column(s) by which we want to partition. This is mandatory with the PARTITION BY clause.
- **ORDER BY** is used to sort the rows within each partition. This is an optional clause.
- **Order list** is the name of the column(s) to sort by. It is mandatory with the ORDER BY clause.
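As a sketch of how PARTITION BY and ORDER BY work together, the query below numbers employees within each job from highest to lowest salary using ROW_NUMBER, which takes no expression. It assumes Python's built-in `sqlite3` module with SQLite 3.25 or later. The column alias `RANK_IN_JOB`, the names, and all rows other than the three Data Scientist salaries are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25 or later
conn.execute(
    "CREATE TABLE employee_table (EMPID INTEGER, NAME TEXT, JOB TEXT, SALARY INTEGER)"
)
# Data Scientist salaries follow the article; names and other rows are hypothetical.
conn.executemany(
    "INSERT INTO employee_table VALUES (?, ?, ?, ?)",
    [
        (1, "A", "Data Scientist", 3800),
        (2, "B", "Data Scientist", 4900),
        (3, "C", "Data Scientist", 3300),
        (4, "D", "Data Analyst", 3000),
        (5, "E", "Data Analyst", 3500),
        (6, "F", "Data Engineer", 4000),
        (7, "G", "Data Engineer", 4200),
    ],
)

# PARTITION BY restarts the numbering for each job;
# ORDER BY decides which row in the partition gets number 1.
query = """
SELECT NAME, JOB, SALARY,
       ROW_NUMBER() OVER (PARTITION BY JOB ORDER BY SALARY DESC) AS RANK_IN_JOB
FROM employee_table
ORDER BY JOB, RANK_IN_JOB
"""
ranked = conn.execute(query).fetchall()
for row in ranked:
    print(row)
```

Each job starts again at 1, so the highest-paid employee in every partition gets row number 1.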

## Some Examples

To see window functions in action, let’s look at a few examples:

**1. OVER Clause without PARTITION BY**

To add a column (TOTAL) holding the sum of the salaries of all the employees in our employee table, we will use the sum function as the window function, the salary column as the expression, and an empty OVER() clause.

As we are finding the sum of salary across all the employees (rows), we don’t need to partition our data.

    -- SQL query
    SELECT EMPID, NAME, JOB, SALARY,
           SUM(SALARY) OVER () AS TOTAL
    FROM employee_table;
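To try this query without a database server, here is a minimal sketch using Python's built-in `sqlite3` module (assuming SQLite 3.25 or later). The three Data Scientist salaries come from the earlier example; the names and the remaining rows are invented, so the printed total applies only to this sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25 or later
conn.execute(
    "CREATE TABLE employee_table (EMPID INTEGER, NAME TEXT, JOB TEXT, SALARY INTEGER)"
)
# Data Scientist salaries follow the article; names and other rows are hypothetical.
conn.executemany(
    "INSERT INTO employee_table VALUES (?, ?, ?, ?)",
    [
        (1, "A", "Data Scientist", 3800),
        (2, "B", "Data Scientist", 4900),
        (3, "C", "Data Scientist", 3300),
        (4, "D", "Data Analyst", 3000),
        (5, "E", "Data Analyst", 3500),
        (6, "F", "Data Engineer", 4000),
        (7, "G", "Data Engineer", 4200),
    ],
)

# An empty OVER () makes the whole table a single window,
# so every row receives the same grand total.
query = """
SELECT EMPID, NAME, JOB, SALARY,
       SUM(SALARY) OVER () AS TOTAL
FROM employee_table
ORDER BY EMPID
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Every row keeps its own SALARY, and the TOTAL column repeats the same grand total on each of them.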

**2. OVER Clause with PARTITION BY**

Now, to add a column holding the total salary of all the employees within the job role of each row (Data Scientist, Data Analyst, or Data Engineer), we need to partition our data by the JOB column.

To get the output, we will use the sum function as a window function, the salary column as an expression, and within the OVER() clause we will partition our data table by the JOB column.

    -- SQL query
    SELECT EMPID, NAME, JOB, SALARY,
           SUM(SALARY) OVER (PARTITION BY JOB) AS TOTAL_JOB
    FROM employee_table;
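The two columns can also be computed in a single query, since each window function carries its own OVER clause. The sketch below assumes Python's built-in `sqlite3` module with SQLite 3.25 or later. Only the Data Scientist salaries (which sum to 12,000) come from the article; the names and the other rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25 or later
conn.execute(
    "CREATE TABLE employee_table (EMPID INTEGER, NAME TEXT, JOB TEXT, SALARY INTEGER)"
)
# Data Scientist salaries follow the article; names and other rows are hypothetical.
conn.executemany(
    "INSERT INTO employee_table VALUES (?, ?, ?, ?)",
    [
        (1, "A", "Data Scientist", 3800),
        (2, "B", "Data Scientist", 4900),
        (3, "C", "Data Scientist", 3300),
        (4, "D", "Data Analyst", 3000),
        (5, "E", "Data Analyst", 3500),
        (6, "F", "Data Engineer", 4000),
        (7, "G", "Data Engineer", 4200),
    ],
)

# TOTAL uses the whole table as its window;
# TOTAL_JOB uses only the rows that share the current row's JOB.
query = """
SELECT EMPID, NAME, JOB, SALARY,
       SUM(SALARY) OVER () AS TOTAL,
       SUM(SALARY) OVER (PARTITION BY JOB) AS TOTAL_JOB
FROM employee_table
ORDER BY EMPID
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

In this sample, each Data Scientist row shows a TOTAL_JOB of 12000, matching the figure worked out earlier.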

## Conclusion

So, we looked at how we can easily add aggregated values to all the rows of a table using the window function in SQL.

We can create columns holding the total across all the rows, as well as the total within each partition of rows, without losing the original rows of the table.
