Technically, the CASE expression in SQL evaluates the conditions mentioned in the WHEN clause. If the condition is True, the code mentioned in the THEN clause is executed and if the condition is False, the code mentioned in the ELSE part is executed.
Before starting with the examples, let’s quickly refresh the expression
CASE..WHEN..THEN and ways to write it. ♻️
The most frequently used form of this statement is writing multiple
WHEN..THEN clauses under a single
For example, let’s create a new column — Sales Manager Initials — by putting conditions on the Sales Manager column as below.
WHEN Sales_Manager = 'Pablo' THEN 'P'
WHEN Sales_Manager = 'Kristen' THEN 'K'
END AS Sales_Manager_Initials
📍 Note: The ELSE clause in the above code is optional. If you don’t mention it, then all the places where conditions in both the WHEN clauses are False will be filled with
Another way, when the condition being tested is only on one column, just like in the above example, SQL offers you a simplified way of writing
CASE..WHEN expression as below.
WHEN 'Pablo' THEN 'P'
WHEN 'Kristen' THEN 'K'
END AS Sales_Manager_Initials
So, you can get exactly the same output without mentioning the column name in each
WHEN clause. Simply write the keyword
CASE followed by the column name on which you want to apply the condition.
I didn’t find any execution time or efficiency-related differences between these two methods. So, it is an individual decision on which method to follow.
Now, as you got insights into how to write CASE-WHEN expression, let’s jump into the practical use cases.
Creating new columns based on existing ones is one of the most commonly done data transformations. And mostly you need to imply conditions on the existing columns.
As you noticed in the above example, you can conditionally test the values in a single column and create a new column based on that. But, the WHEN clause gives you the flexibility to add conditions on multiple columns as well. ✅
For example, suppose for all the orders where the Sales Manager is Pablo and the Product Category is Office, you want to add 50 in the Quantity. At the same time, for all other combinations of Sales Manager and Product Category, you want to keep the Quantity column unchanged.
Ideally, you need conditions on two columns as —
Sales_Manager = ‘Pablo’,
Product_Category = ‘Office’. Moreover, these both conditions must be True to get the new values for Quantity. Hence, you need to join these conditions using logical AND.
WHEN Sales_Manager = 'Pablo'
AND Product_Category = 'Office'
END AS New_Quantity
As you can see two conditions in the WHEN clause are joined with the keyword AND. Only when both the conditions are True, the code in the THEN clause is executed i.e. Quantity+50, as you can see in the above picture.
ELSE Quantity keeps the Quantity column unchanged for all other combinations of Sales Manager and Product Category.
Moving ahead, often you need to split the data into different categories for further analysis. You can do this in the data extraction phase itself using CASE the expression in SQL.
As you will be using if-else logic to split the records into categories,
CASE..WHEN expression is the perfect tool in this application.
Data can be split into multiple categories based on other numerical as well as non-numerical columns.
As opposed to the GROUP BY clause, which groups the rows into different groups, CASE-WHEN simply creates a new column and mentions a group name to which each row of the dataset belongs. ✅
For example, you have the quantity for each order in the sales data. Suppose you want to categorize it as ‘High’, ‘Low’, or ‘Medium’ Order Volume based on column Quantity. You can use CASE-WHEN statement as below —
WHEN Quantity > 65 THEN 'High'
WHEN Quantity < 15 THEN 'Low'
END AS Order_Volume
All you need to do is mention the conditions, based on which you want to split the dataset rows into different groups.
You can also include multiple conditions in the
WHEN clause as you did in the previous example. And such a type of categorization also works, when you have multiple conditions on columns from different tables.
For example, suppose you have two tables Orders and Shipments as below.
As I didn’t have this tables ready, I used CTE to create them as below —
WITH Orders AS
You can learn more about using CTE here.
Now, suppose you want to categorize the records from the Orders table into ‘High Priority’, ‘Medium Priority’, and ‘Low Priority’ based on the values in the columns Quantity and Shipping Cost.
This might look complex, but pretty straight-forward with
CASE..WHEN expression as below.
WHEN O.Quantity > 50
OR S.Shipping_Cost > 30
THEN 'High Priority'
WHEN O.Quantity < 15
OR S.Shipping_Cost < 10
THEN 'Low Priority'
ELSE 'Medium Priority'
END AS Order_Type
FROM Orders O
LEFT JOIN Shipments S
ON O.OrderID = S.OrderID
You need to use the
JOIN clause and column aliases. The rest part is the same simple
CASE..WHEN expression. You can see in the above picture, that a new column Order_Type is created with the priority values for each OrderID.
Moving ahead, sometimes your dataset contains different values of the same meaning which makes the data inconsistent and non-uniform.
The simplest example is — the gender column in certain datasets can contain values such as ‘W’, ‘F’, ‘Woman’, and ‘Female’, and all of them indicate the gender is ‘Female’. Then why we should have different terms to indicate the same meaning?
It is always better to standardize the dataset to make it more consistent.
The next use case demonstrates this in detail.
Data standardization is used to make data more consistent i.e. ensuring that each column has the same type of content and a single term or value indicating a single meaning.
CASE-WHEN statement is very useful to transform inconsistent or non-uniform data into a more structured and uniform one. Using this SQL functionality, you can always create a new column with standardized values and use it for further analysis.
Let’s see this with an example.
Suppose, in this dataset you want to get the order status as either Completed or Not Completed i.e. standardizing the status to these two values. However, you can see the status column contains the values such as — Delivered, Not Delivered, Not Shipped, and Shipped.
The CASE WHEN statement can be used in this scenario as below
WHEN Status = 'Delivered' THEN 'Completed'
ELSE 'Not Completed'
END as Order_Status
As you see in the above picture, the Order_Status contains only two values. If the Status is Delivered, then only the Order_Status is Completed and in all other cases, it is Not Completed.
Well, I agree, this was a pretty weak example, but once you understand the use case, you can use it in complex scenarios.
Also, data analytics is incomplete without data aggregations. In my last article, 5 Pandas Group By Tricks You Should Know in Python you read how to use aggregate functions with GROUP BY in SQL.
How about using aggregate functions on conditional selection of data??
Yes, this is possible in SQL & there is always an “easy way” using CASE.
This is one of the complicated use-cases of CASE statements and using aggregate functions with it can be sometimes tricky.
Therefore, let’s try to understand it with an example.
Suppose, you want to get the total number of orders each Sales Manager handled which shows the status Delivered and Not Delivered.
Ideally, you need to count the orders which belong to a Sales Manager and a Status combination. But with a CASE statement, you don’t need to type each and every combination. You can simply use an aggregate function
COUNT as below.
COUNT(CASE WHEN Status = 'Delivered' THEN OrderID ELSE NULL END) AS Delivered_Orders,
COUNT(CASE WHEN Status = 'Not Delivered' THEN OrderID ELSE NULL END) AS NonDelivered_Orders
GROUP BY Sales_Manager;
You should mention the
CASE..WHEN..THEN within the aggregate function, as above. Here, in the column Delivered_Orders, you are counting all the orders
WHEN Status is Delivered to get the final output as below.
Simple it is!
You can also get the same results using another aggregate function —
SUM(). Only you need to change the CASE-WHEN statement slightly as below.
SUM(CASE WHEN Status = 'Delivered' THEN 1 ELSE 0 END) AS Delivered_Orders,
SUM(CASE WHEN Status = 'Not Delivered' THEN 1 ELSE 0 END) AS NonDelivered_Orders
GROUP BY Sales_Manager;
In the above CASE statement, you are actually creating an intermediate column (invisible) of
0 based on the Status and ultimately doing an addition of this column’s values to get the required output.
In a similar fashion, you can use other aggregate functions as well.
So far, you witness CASE usage within the SELECT statement. However, it can be used in the ORDER BY clause as well.
In SQL, the ORDER BY clause is used to arrange the output dataset in ascending or descending order. Using
ORDER BY offers you more flexibility and allows you to sort data other than only standard ways.
For example, suppose you want to arrange all the ‘Delivered’ orders in ascending order of unit price if they are shipped to the UK and ascending order of delivery time if they are shipped to any other country.
Using the standard ORDER BY clause, this seems an impossible task.
But using CASE with ORDER BY, your life is much easier. 💯
Let’s understand it using the below query.
WHERE Status LIKE 'Delivered'
AND Shipping_Address IN ('UK', 'India')
AND Delivery_Time <19
AND UnitPrice BETWEEN 140 AND 150
ORDER BY CASE
WHEN Shipping_Address = 'UK' THEN UnitPrice
📍 Note: the conditions mentioned in the WHERE clause are absolutely not required to get the results. I used them only to make the results more visible.
As mentioned in the above query, you can specify your conditions to sort the dataset in
ORDER BY clause using
CASE..WHEN expression to get the output as below.
So when Shipping_Address is the UK, all the orders will be arranged in the ascending order or UnitPrice. And for any other value in Shipping_Address all the orders are arranged in ascending order of Delivery_Time as seen above.
Similarly, to get all the records in descending order you can simply use the keyword — DESC — after the END keyword in the above query.
That’s it! 🏆