Explore the dataset
I am using the CO2 dataset from Kaggle. Data is available everywhere; just download it and get going.
This is a basic project in which you will analyze CO2 emissions around the world. You will find out which countries emit the most CO2 and which sources contribute to those emissions.
You can download the dataset from here. I am running the commands in a Jupyter notebook, but you can also run them in Google Colab or in a Kaggle notebook.
First, we will import the libraries:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
Read the data:
dataset = pd.read_csv('/Users/maryam/Downloads/owid-co2-dataa.csv')
I have given the path where my file is located. Change it to match the location of the file on your machine.
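pd.read_csv accepts a file path or any file-like object. The sketch below, which is illustrative only, reads a tiny made-up CSV from memory with io.StringIO so it runs without the download. The column names (country, year, co2, coal_co2, oil_co2, gas_co2) are assumed to match the real file, and the values are invented.

```python
import io

import pandas as pd

# A tiny, made-up CSV standing in for the real file; these values are NOT from the dataset.
csv_text = """country,year,co2,coal_co2,oil_co2,gas_co2
China,2020,100.0,70.0,20.0,10.0
India,2020,40.0,28.0,10.0,2.0
"""

# read_csv works the same way on a path string or on an in-memory file-like object.
dataset = pd.read_csv(io.StringIO(csv_text))
print(dataset)
```

With the real data you would pass your own file path instead of io.StringIO(csv_text).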
View the data:
dataset.head(): shows the first 5 rows of the dataset. To see more rows, pass a number in the parentheses, for example dataset.head(10).
dataset.shape: shows the number of rows and columns in the dataset.
So, our dataset has 25204 rows and 58 columns.
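To see head() and shape side by side, here is a small sketch on a made-up 12-row frame (not the real dataset). shape is an attribute holding a (rows, columns) tuple, so it is written without parentheses.

```python
import pandas as pd

# Made-up toy data (12 rows) just to demonstrate the calls; not the real dataset.
dataset = pd.DataFrame({
    "country": ["A", "B", "C"] * 4,
    "year": [2000] * 3 + [2005] * 3 + [2010] * 3 + [2020] * 3,
    "co2": list(range(12)),
})

print(dataset.head())    # first 5 rows (the default)
print(dataset.head(10))  # first 10 rows
print(dataset.shape)     # (rows, columns)

# The tuple can be unpacked directly.
rows, cols = dataset.shape
print(f"{rows} rows and {cols} columns")
```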
Delete the columns:
Suppose you want to delete some columns. Here we will delete 3 of the 58 columns, which leaves 55.
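The exact columns deleted here are not listed, so the sketch below drops three hypothetical ones (iso_code, population, gdp) from a toy frame with made-up values. DataFrame.drop with the columns= keyword returns a new DataFrame and leaves the original unchanged.

```python
import pandas as pd

# Toy frame; the three columns dropped below are a hypothetical choice for illustration.
dataset = pd.DataFrame({
    "country": ["China", "India"],
    "year": [2020, 2020],
    "iso_code": ["CHN", "IND"],
    "co2": [1.0, 2.0],
    "population": [1, 2],
    "gdp": [3.0, 4.0],
})

# drop(columns=...) returns a new DataFrame; `dataset` itself still has all 6 columns.
trimmed = dataset.drop(columns=["iso_code", "population", "gdp"])
print(dataset.shape[1], "->", trimmed.shape[1])
```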
Now suppose you want to work on only a few of these 55 columns. Instead of dropping so many columns, just specify the columns you want to work on.
Specifying columns that we will work on:
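Selecting columns by name can be sketched as follows, assuming the column names below exist in your file. The data is made up, and the list of columns to keep is a hypothetical choice. Passing a list inside the brackets returns a DataFrame with just those columns.

```python
import pandas as pd

# Toy frame with assumed column names; values are made up.
dataset = pd.DataFrame({
    "country": ["China", "India"],
    "year": [2020, 2020],
    "co2": [10.0, 4.0],
    "coal_co2": [7.0, 3.0],
    "oil_co2": [2.0, 0.8],
    "gas_co2": [1.0, 0.2],
    "population": [1400, 1380],
})

# Hypothetical choice of columns to keep.
columns_to_keep = ["country", "year", "co2", "coal_co2", "oil_co2", "gas_co2"]

# A list inside the brackets selects those columns, in that order.
# .copy() makes an independent frame, so later edits do not raise SettingWithCopyWarning.
df = dataset[columns_to_keep].copy()
print(df.columns.tolist())
```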
If you want to work only on data after the year 1995, run the query below:
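One common way to write this kind of year filter is a boolean mask, sketched here on made-up rows. It assumes the year column is named year and that "after 1995" means strictly greater than 1995.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "country": ["China"] * 4,
    "year": [1990, 1995, 2000, 2020],
    "co2": [1.0, 2.0, 3.0, 4.0],
})

# Keep rows where year is strictly greater than 1995 (use >= to include 1995 itself).
recent = df[df["year"] > 1995]
print(recent)
```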
Next, if you want to clean the data further and keep only a few countries, run the query below:
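Keeping only a handful of countries can be sketched with Series.isin. The country list and values below are illustrative assumptions. Filtering this way is also one way to leave out aggregate rows such as World, if your copy of the file contains them.

```python
import pandas as pd

# Made-up rows; "World" stands in for an aggregate row you may want to exclude.
df = pd.DataFrame({
    "country": ["China", "United States", "India", "France", "World"],
    "year": [2020] * 5,
    "co2": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Illustrative selection of countries.
countries = ["China", "United States", "India"]

# isin() gives True for rows whose country is in the list.
subset = df[df["country"].isin(countries)]
print(subset)
```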
To check null values:
As we can see, there are a few null values, but we will be using only a few columns. So I am keeping the null values as they are, because I can't just fill those columns with arbitrary numbers.
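Counting nulls per column is usually done with isnull().sum(), sketched here on made-up data. Leaving the nulls in place works for the analysis that follows because pandas aggregations such as sum() and mean() skip NaN by default.

```python
import numpy as np
import pandas as pd

# Made-up data with a few missing values.
df = pd.DataFrame({
    "country": ["China", "India", "India"],
    "year": [2019, 2019, 2020],
    "co2": [1.0, np.nan, 3.0],
    "gas_co2": [np.nan, np.nan, 0.5],
})

# isnull() marks missing cells as True; sum() counts the Trues per column.
null_counts = df.isnull().sum()
print(null_counts)

# NaN is skipped by default (skipna=True), so this adds only 1.0 and 3.0.
print(df["co2"].sum())
```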
Now we will draw some graphs from our dataset and analyze the results. First, we will draw a pie chart from the country and co2 columns to see which country has the highest CO2 emissions.
As we can see, China is at the top, followed by the United States and then India.
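The numbers behind a pie chart like this are each country's total divided by the grand total. A minimal sketch of that aggregation on made-up values, assuming columns named country and co2:

```python
import pandas as pd

# Made-up values for illustration only.
df = pd.DataFrame({
    "country": ["China", "China", "United States", "United States", "India", "India"],
    "year": [2019, 2020, 2019, 2020, 2019, 2020],
    "co2": [10.0, 11.0, 5.0, 4.0, 2.0, 3.0],
})

# Total co2 per country across the selected years, largest first.
totals = df.groupby("country")["co2"].sum().sort_values(ascending=False)

# Percentage share of each country, which is what the pie slices represent.
shares = (totals / totals.sum() * 100).round(1)
print(shares)
```

With plotly.express imported as px, the totals could then be drawn with something like px.pie(totals.reset_index(), names="country", values="co2"). That plotting call is not part of the runnable sketch above.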
If we break this down further by source of CO2 and look only at the year 2020, we get the result below:
As we can see, China has the most CO2 emissions from coal. We can also see that the three main CO2 sources are coal_co2, oil_co2, and gas_co2. If we look at only these three main sources:
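Restricting the data to 2020 and the three main source columns can be sketched like this. The values are made up, and cement_co2 is included only as an example of a source column being left out.

```python
import pandas as pd

# Made-up values; column names assumed to match the dataset.
df = pd.DataFrame({
    "country": ["China", "India", "China", "India"],
    "year": [2019, 2019, 2020, 2020],
    "coal_co2": [70.0, 20.0, 72.0, 21.0],
    "oil_co2": [15.0, 8.0, 16.0, 7.0],
    "gas_co2": [6.0, 2.0, 7.0, 2.0],
    "cement_co2": [8.0, 1.0, 8.0, 1.0],
})

main_sources = ["coal_co2", "oil_co2", "gas_co2"]

# Keep only 2020 rows and the three main source columns, indexed by country.
year_2020 = df[df["year"] == 2020].set_index("country")[main_sources]
print(year_2020)

# Name of the largest source column for each country.
print(year_2020.idxmax(axis=1))
```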
If we narrow this down to the United States only:
This shows that in the United States, coal_co2 and oil_co2 have decreased over time, while gas_co2 has increased.
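A trend like this can be checked numerically as well as visually. The sketch below, on illustrative numbers that are not real US emissions, selects one country, indexes it by year and compares the last year with the first for each source.

```python
import pandas as pd

# Illustrative numbers only, not real emissions.
df = pd.DataFrame({
    "country": ["United States"] * 3 + ["China"] * 3,
    "year": [2000, 2010, 2020] * 2,
    "coal_co2": [9.0, 8.0, 4.0, 1.0, 2.0, 3.0],
    "oil_co2": [10.0, 9.0, 8.0, 1.0, 2.0, 3.0],
    "gas_co2": [5.0, 6.0, 7.0, 1.0, 2.0, 3.0],
})

# One row per year for the United States, three source columns, in year order.
us = (
    df[df["country"] == "United States"]
    .set_index("year")[["coal_co2", "oil_co2", "gas_co2"]]
    .sort_index()
)

# Last year minus first year: negative means the source decreased.
change = us.iloc[-1] - us.iloc[0]
print(change)
```

If matplotlib is installed, us.plot() draws one line per source over the years.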
Next, we look at CO2 emissions in 2000 and 2020 to see whether emissions decreased.
Comparing the top three countries, their shares of CO2 emissions changed as follows:
The United States improved, dropping from 34% in 2000 to 18.6% in 2020.
China's share increased from 19.4% in 2000 to 41.7% in 2020.
India's share also increased, from 5.54% in 2000 to 9.54% in 2020.
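Percentages like these depend on which total you divide by. This sketch assumes the denominator is the sum over whatever countries are in the frame for each year, which is an assumption. It uses made-up values, pivots to one column per year and divides each column by its own total.

```python
import pandas as pd

# Made-up values; "Other" stands in for the remaining countries in the frame.
df = pd.DataFrame({
    "country": ["United States", "China", "India", "Other"] * 2,
    "year": [2000] * 4 + [2020] * 4,
    "co2": [50.0, 30.0, 10.0, 10.0, 20.0, 50.0, 15.0, 15.0],
})

# Rows = country, columns = year, values = co2.
both_years = df[df["year"].isin([2000, 2020])].pivot(
    index="country", columns="year", values="co2"
)

# both_years.sum() is one total per year; dividing aligns it with the year columns.
shares = both_years / both_years.sum() * 100
print(shares.round(1))
```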
These are some of the visualizations you can build from the CO2 dataset. Global warming is a serious issue, and every country needs to work on it together.