Pandas makes sorting simple. Let’s review the basics and 2 cool pythonic features.
Sorting values is a common and important task when working with data, whether in Excel or in Pandas. It helps us make sense of the data and visualize it.
How can we do this with Pandas dataframes?
This blog post shows different ways to do it. Once you've mastered the fundamentals, you can do cool things like sort with a custom function or combine sorting with groupby.
1. The Dataset
We will work today with the IMDB movies dataset. You can download it from Kaggle if you want to follow along.
Let’s load it and see what it looks like:
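A minimal sketch of the loading step. The file name imdb_top_1000.csv and the column names are assumptions based on the Kaggle download. To keep the example runnable without the file, it builds a tiny hypothetical DataFrame instead; all of its values are placeholders, not real IMDB data.

```python
import pandas as pd

# With the Kaggle file downloaded, loading would look like this
# (the file name is an assumption; adjust it to your download):
# df = pd.read_csv("imdb_top_1000.csv")

# Hypothetical stand-in rows so the example runs without the file.
df = pd.DataFrame({
    "Series_Title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "Released_Year": ["1994", "1920", "2019", "GP"],
    "Genre": ["Drama", "Horror", "Drama", "Comedy"],
    "IMDB_Rating": [8.5, 8.1, 8.5, 7.9],
    "Director": ["Ann Zimmer", "Bob Adams", "Carl Miller", "Bob Adams"],
    "Gross": [1000.0, None, 5000.0, 200.0],
})

print(df.head())   # first rows of the table
print(df.dtypes)   # Released_Year is text (object) because of the "GP" entry
```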
Time to sort data!
2. Sorting rows on one column and controlling the sort order
Use the sort_values method and specify the column you want to sort by. Let's start with the Released_Year column.
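A small sketch of a single-column sort, using hypothetical placeholder rows and assuming the year column is named Released_Year and stored as text, as in the Kaggle file:

```python
import pandas as pd

# Hypothetical stand-in data; years are strings, as read from the CSV.
df = pd.DataFrame({
    "Series_Title": ["Movie A", "Movie B", "Movie C"],
    "Released_Year": ["1994", "1920", "2019"],
})

# Sort rows by one column; ascending=True is the default.
by_year = df.sort_values("Released_Year")
print(by_year)
```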
The rows are now sorted by Released_Year in ascending order, so the first movie is from 1920. To sort in descending order, pass ascending=False.
Apart from the Apollo film, which has 'GP' instead of a year, the column is now sorted in descending order. The Apollo film is out of place because the column holds text, so its values are compared as strings rather than as numbers.
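The sketch below reproduces that effect on hypothetical placeholder rows. It assumes the year column is stored as strings; under that assumption, "GP" compares greater than any string starting with a digit, so it lands first in descending order.

```python
import pandas as pd

# Hypothetical stand-in data with one non-numeric "year".
df = pd.DataFrame({
    "Series_Title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "Released_Year": ["1994", "1920", "2019", "GP"],
})

# ascending=False reverses the order. Values are compared as text,
# and "G" sorts after every digit, so "GP" comes first.
by_year_desc = df.sort_values("Released_Year", ascending=False)
print(by_year_desc)
```

If a numeric sort is preferred, one option is pd.to_numeric(df["Released_Year"], errors="coerce"), which turns the text entry into NaN so that it is placed with the missing values (last by default).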
3. Sorting rows on 2 columns or more
You only need to pass the columns as a list to the sort_values method.
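A sketch of a two-column sort on hypothetical placeholder rows. The column pair (Genre, then IMDB_Rating) is an illustrative choice, not necessarily the one used in the original example.

```python
import pandas as pd

# Hypothetical stand-in data.
df = pd.DataFrame({
    "Series_Title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "Genre": ["Drama", "Horror", "Drama", "Comedy"],
    "IMDB_Rating": [8.5, 8.1, 7.9, 8.0],
})

# Sort by Genre first; rows sharing a Genre are then ordered by IMDB_Rating.
by_genre_rating = df.sort_values(["Genre", "IMDB_Rating"])
print(by_genre_rating)
```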
Keep in mind that the order of the columns in the list matters: Pandas sorts by the first column and uses the next one only to break ties. Let's see what happens if we start with 'IMDB_Rating'.
The rows are now sorted by IMDB rating first, in descending order.
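A sketch of starting with the rating instead, on hypothetical placeholder rows. It assumes ascending=False applies to both columns, which matches the descending rating described above.

```python
import pandas as pd

# Hypothetical stand-in data with tied ratings.
df = pd.DataFrame({
    "Series_Title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "Genre": ["Drama", "Horror", "Comedy", "Action"],
    "IMDB_Rating": [8.5, 8.1, 8.5, 8.1],
})

# IMDB_Rating decides the order; Genre only breaks ties between equal ratings.
# A single ascending=False applies to both columns.
by_rating_genre = df.sort_values(["IMDB_Rating", "Genre"], ascending=False)
print(by_rating_genre)
```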
By passing a list of True or False values to the ascending parameter, we can set the sort direction for each column separately.
The 'IMDB_Rating' column is sorted in ascending order, and within each rating, 'Genre' is sorted in descending order.
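A sketch of per-column directions on hypothetical placeholder rows: the ascending list is matched position by position with the list of columns.

```python
import pandas as pd

# Hypothetical stand-in data with tied ratings.
df = pd.DataFrame({
    "Series_Title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "Genre": ["Drama", "Horror", "Comedy", "Action"],
    "IMDB_Rating": [8.5, 8.1, 8.5, 8.1],
})

# True -> IMDB_Rating ascending; False -> Genre descending within each rating.
mixed = df.sort_values(["IMDB_Rating", "Genre"], ascending=[True, False])
print(mixed)
```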
4. Sorting missing values
You can use the parameter na_position='first' to place the missing values of a column at the top (by default they go last). Let's take a look at the Gross column.
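A sketch on hypothetical placeholder rows, with NaN standing in for the missing Gross values:

```python
import pandas as pd

# Hypothetical stand-in data; NaN marks a missing Gross value.
df = pd.DataFrame({
    "Series_Title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "Gross": [5000.0, float("nan"), 1000.0, float("nan")],
})

# Missing values first, then the remaining values in ascending order.
gross_na_first = df.sort_values("Gross", na_position="first")
print(gross_na_first)
```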
5. Cool Feature: add a custom sort function
One cool feature of sort_values is that we can pass it a custom function for sorting.
Assume we want to sort the dataframe in ascending order by the director's last name. How can we do that?
We must split each director's name into first and last names and then sort using only the last name. That logic can go into a function passed through the key parameter, which receives the column as a Series and must return a Series of the same length.
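A minimal sketch of that idea on hypothetical placeholder rows. The helper name last_name is hypothetical, and it assumes the last whitespace-separated word of each name is the last name, which will not hold for every director. The key parameter requires pandas 1.1 or later.

```python
import pandas as pd

# Hypothetical stand-in data.
df = pd.DataFrame({
    "Series_Title": ["Movie A", "Movie B", "Movie C"],
    "Director": ["Ann Zimmer", "Bob Adams", "Carl Miller"],
})

def last_name(names: pd.Series) -> pd.Series:
    # Hypothetical helper: keep the last word of each name as the sort key.
    return names.str.split().str[-1]

# key transforms the column before sorting; the original values are kept.
by_last_name = df.sort_values("Director", key=last_name)
print(by_last_name)
```

Sorting by first name would put "Ann Zimmer" first; with the key, the order follows Adams, Miller, Zimmer.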
6. Combine sort_values with another method like groupby
The magic happens when you combine sort_values with other methods, such as groupby.
Let's suppose we are asked to show the average IMDB rating for each director, sorted in descending order. This is a perfect opportunity to chain groupby and sort_values together.
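A sketch of that chain on hypothetical placeholder rows. groupby followed by mean returns a Series indexed by director, so sort_values is called on that Series without a column name.

```python
import pandas as pd

# Hypothetical stand-in data.
df = pd.DataFrame({
    "Director": ["Ann Zimmer", "Bob Adams", "Ann Zimmer", "Bob Adams", "Carl Miller"],
    "IMDB_Rating": [8.0, 8.5, 9.0, 7.5, 8.2],
})

avg_rating = (
    df.groupby("Director")["IMDB_Rating"]  # one group per director
    .mean()                                # average rating per director
    .sort_values(ascending=False)          # highest average first
)
print(avg_rating)
```

Appending .head(10) to the chain would keep only the top ten directors.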
The Pandas library offers many ways to sort values by rows or columns. My advice is to practice on different datasets to get familiar with them. Sorting becomes especially powerful when you combine it with other methods like groupby.