
Natural resources play an essential role in sustaining life, but natural hazards such as tropical cyclones, earthquakes, tsunamis, landslides, and droughts hinder the sustainable development of humans and other living things. Global warming increases the probability of climate-related disasters such as heat waves, drought, and wildfire.

A wildfire is an unpredictable, unplanned, and uncontrolled fire. It can be caused by various human activities, such as burning coal seams, campfires (bonfires), smoking, debris burning, arson, unauthorized burning, fireworks, shooting, and exploding targets, or by natural phenomena such as lightning, meteors, volcanic activity, and high temperatures, and it can burn natural vegetation such as forest, savannah, and other natural ecosystems. In this study, a forest fire dataset from NASA is used for the experiment.

## Data Source

The data source used in this study is obtained from NASA’s Fire Information for Resource Management System (FIRMS): satellite data and observations of wildfire cases in India in 2021 from NASA’s MODIS (Moderate Resolution Imaging Spectroradiometer) instrument.

## Reading and Understanding the Data

Import NumPy and Pandas and read the MODIS forest fire dataset.

```python
# Suppress warnings
import warnings
warnings.filterwarnings('ignore')

import numpy as np
import pandas as pd

df = pd.read_csv("modis_2021_India.csv")

# Check the head of the dataset
df.head()
```

**Dataset Features and Description**

- latitude
- longitude
- brightness = brightness temperature of the fire pixel, measured in Kelvin
- scan = scan pixel size
- track = track pixel size
- acq_date = acquisition date
- acq_time = acquisition time
- satellite = Aqua or Terra satellite, coded as A or T
- instrument = MODIS
- confidence = detection confidence derived from intermediate algorithm quantities used in the detection process, ranging from 0% to 100%
- version = identifies the collection and data processing source
- bright_t31 = brightness temperature of the fire from Channel 31, measured in Kelvin
- frp = Fire Radiative Power, in megawatts
- daynight = D or N for Day or Night
- type = 0 for presumed vegetation fire, 1 for active volcano, 2 for static land source, 3 for offshore detection
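As a quick sanity check, the coded columns can be mapped to readable labels. A minimal sketch on a toy frame (the label strings are illustrative; the codes follow the feature list above):

```python
import pandas as pd

# Toy rows using the coded columns described above
sample = pd.DataFrame({
    'satellite': ['A', 'T', 'A'],
    'daynight':  ['D', 'N', 'D'],
    'type':      [0, 2, 3],
})

# Human-readable labels for the codes (label wording is illustrative)
sat_labels = {'A': 'Aqua', 'T': 'Terra'}
type_labels = {0: 'presumed vegetation fire', 1: 'active volcano',
               2: 'static land source', 3: 'offshore detection'}

sample['satellite_name'] = sample['satellite'].map(sat_labels)
sample['type_label'] = sample['type'].map(type_labels)
print(sample[['satellite_name', 'type_label']])
```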

```python
df.shape
# (111267, 15)

df.info()
```

```
RangeIndex: 111267 entries, 0 to 111266
Data columns (total 15 columns):
 #   Column      Non-Null Count   Dtype
---  ------      --------------   -----
 0   latitude    111267 non-null  float64
 1   longitude   111267 non-null  float64
 2   brightness  111267 non-null  float64
 3   scan        111267 non-null  float64
 4   track       111267 non-null  float64
 5   acq_date    111267 non-null  object
 6   acq_time    111267 non-null  int64
 7   satellite   111267 non-null  object
 8   instrument  111267 non-null  object
 9   confidence  111267 non-null  int64
 10  version     111267 non-null  float64
 11  bright_t31  111267 non-null  float64
 12  frp         111267 non-null  float64
 13  daynight    111267 non-null  object
 14  type        111267 non-null  int64
dtypes: float64(8), int64(3), object(4)
memory usage: 12.7+ MB
```

```python
df.columns
```

```
Index(['latitude', 'longitude', 'brightness', 'scan', 'track', 'acq_date',
       'acq_time', 'satellite', 'instrument', 'confidence', 'version',
       'bright_t31', 'frp', 'daynight', 'type'],
      dtype='object')
```

```python
df.isnull().sum()
```

```
latitude      0
longitude     0
brightness    0
scan          0
track         0
acq_date      0
acq_time      0
satellite     0
instrument    0
confidence    0
version       0
bright_t31    0
frp           0
daynight      0
type          0
dtype: int64
```

```python
df.describe()
```

Visualize the data using Matplotlib and Seaborn.

```python
import matplotlib.pyplot as plt
import seaborn as sns
```

**1. Visualizing Numeric Variables**

Let’s make a pairplot of all the numeric variables.

Seaborn integrates closely with Pandas data frames. It extends the Matplotlib library for creating attractive graphics in Python with a more straightforward set of methods. The show() function displays the result of a series of plotting operations, so we can gradually fine-tune many details of the figure before calling it.

```python
sns.pairplot(df)
plt.show()
```

**2. Visualizing Categorical Variables**

As you might have noticed, there are a few categorical variables as well. Let’s make a boxplot for some of these variables.

```python
plt.figure(figsize=(20, 12))
plt.subplot(2, 2, 1)
sns.boxplot(x='satellite', y='confidence', data=df)
plt.subplot(2, 2, 2)
sns.boxplot(x='daynight', y='confidence', data=df)
plt.show()
```

Let’s check the correlation coefficients to see which variables are highly correlated.

```python
plt.figure(figsize=(10, 10))
# numeric_only=True avoids errors on the non-numeric columns in newer pandas
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='viridis', linewidths=.5)
plt.show()
```

```python
sns.barplot(x='acq_date', y='frp', data=df)
plt.show()
```

```python
df_topaffected = df.sort_values(by='frp', ascending=False)
df_topaffected.head(10)
```

Data can come in the form of text, videos, images, and more, and may contain missing values, null values, and unnecessary entries. Such unstructured data may interrupt model training or throw errors, so before training it must be formatted and cleaned for the Machine Learning algorithms to work effectively.

Data cleaning involves filling missing values, removing irrelevant data, and transforming data using normalization and feature selection.
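The filling and normalization steps mentioned above can be sketched on a toy frame (the column names here are illustrative, not the MODIS schema; median filling and min-max scaling are just one common choice):

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy data with a missing value (illustrative columns)
toy = pd.DataFrame({'temp': [300.0, None, 320.0, 310.0],
                    'frp':  [5.0, 12.0, 7.0, 20.0]})

# 1. Fill the missing value with the column median
toy['temp'] = toy['temp'].fillna(toy['temp'].median())

# 2. Min-max normalization: rescale each column to [0, 1]
scaler = MinMaxScaler()
toy[['temp', 'frp']] = scaler.fit_transform(toy[['temp', 'frp']])
print(toy)
```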

```python
df = df.drop(['track'], axis=1)
df = df.drop(['instrument', 'version'], axis=1)

df['satellite'] = df['satellite'].map({'Terra': 0, 'Aqua': 1})
df['daynight'] = df['daynight'].map({'D': 0, 'N': 1})

# Extract the month from the acquisition date (YYYY-MM-DD)
df['month'] = df['acq_date'].apply(lambda x: int(x.split('-')[1]))

# Take a 20% random sample to reduce training time
df = df.sample(frac=0.2)
df = df.reset_index().drop("index", axis=1)
df.head()
```

```python
df.shape
# (22253, 13)

y = df['confidence']
firedf = df.drop(['confidence', 'acq_date'], axis=1)

plt.figure(figsize=(10, 10))
sns.heatmap(firedf.corr(), annot=True, cmap='viridis', linewidths=.5)
plt.show()

firedf.head()
```

Split the dataset into two parts: a training set (70% of the data) and a testing set (30% of the data).

```python
X = df[['latitude', 'longitude', 'month', 'brightness', 'scan',
        'acq_time', 'bright_t31', 'daynight']]
y = df['frp']

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
```

## Evaluation Metrics

To measure model performance, loss functions such as MSE and MAE between the actual values and the predicted values are used for evaluation. They can be calculated by importing the scikit-learn Python library, which provides predefined functions such as mean_squared_error() and mean_absolute_error(). It also provides the R squared score as r2_score().

**Mean Squared Error (MSE)**

MSE is the average squared difference between the actual values from the dataset and the predicted values:

MSE = (1/n) Σᵢ (yᵢ − ŷᵢ)²

yᵢ = actual y value

ŷᵢ = predicted y value

n = no. of observations

**2. Mean Absolute Error (MAE)**

Mean Absolute Error is the average absolute difference between the values predicted from the data and the actual values:

MAE = (1/n) Σᵢ |yᵢ − ŷᵢ|

yᵢ = actual y value

ŷᵢ = predicted y value

n = no. of observations

**3. Root Mean Square Error (RMSE)**

RMSE is the square root of the average squared difference between the predicted and actual values, i.e. the square root of MSE:

RMSE = √( (1/n) Σᵢ (yᵢ − ŷᵢ)² ) = √MSE

yᵢ = actual y value

ŷᵢ = predicted y value

n = no. of observations

**4. R Squared Score**

The R squared score can be calculated from the equation below:

R² = 1 − Σᵢ (yᵢ − ŷᵢ)² / Σᵢ (yᵢ − ӯ)²

yᵢ = actual y value

ŷᵢ = predicted y value

ӯ = mean of the y values

n = no. of observations
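The four metrics above can be checked with scikit-learn’s built-in functions on a small toy example (the values below are purely illustrative):

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y_true = [3.0, -0.5, 2.0, 7.0]   # actual values
y_pred = [2.5, 0.0, 2.0, 8.0]    # predicted values

mse_val = mean_squared_error(y_true, y_pred)   # average squared error
mae_val = mean_absolute_error(y_true, y_pred)  # average absolute error
rmse_val = np.sqrt(mse_val)                    # square root of MSE
r2_val = r2_score(y_true, y_pred)              # 1 - SS_res / SS_tot

print(mse_val, mae_val, rmse_val, r2_val)
```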

## Algorithms Used

## 1. Gradient Boosting Regressor

Gradient Boosting Regression iteratively fits new estimators to the difference between the current prediction and the known correct target value. To train the model, import the GradientBoostingRegressor class and use its fit() function to fit the model to X_train and y_train.

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error as mae
from sklearn.metrics import mean_squared_error as mse
from sklearn.metrics import r2_score

# loss='squared_error' replaces the deprecated loss='ls' in recent scikit-learn
model1 = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1,
                                   max_depth=10, random_state=0,
                                   loss='squared_error')
model1.fit(X_train, y_train)
y_pred = model1.predict(X_test)

print('MSE =', mse(y_pred, y_test))
print('RMSE =', np.sqrt(mse(y_pred, y_test)))
print('MAE =', mae(y_pred, y_test))
print('R2_score =', r2_score(y_pred, y_test))
print("Performance of GBR Model R^2 metric {:.5f}".format(model1.score(X_train, y_train)))
print("GBR Accuracy, {:.5f}%".format(model1.score(X_test, y_test) * 100))
```

```
MSE = 449.63478569897046
RMSE = 21.20459350468597
MAE = 3.2932809652939343
R2_score = 0.9227647882671393
Performance of GBR Model R^2 metric 0.99982
GBR Accuracy, 92.86853%
```

## 2. Decision Tree Regressor

Another supervised learning approach is the Decision Tree algorithm, which solves decision-related problems and can be used for both regression and classification. It is a tree-structured algorithm consisting of nodes connected by branches: a root node (like the roots of a tree), interior nodes, and leaf nodes (like the leaves of a tree).

To train the model, import the DecisionTreeRegressor class and use its fit() function to fit the model to X_train and y_train.

```python
from sklearn.tree import DecisionTreeRegressor as dtr

reg = dtr(random_state=42)
reg.fit(X_train, y_train)
Y_pred = reg.predict(X_test)

print("MSE =", mse(Y_pred, y_test))
print("RMSE =", np.sqrt(mse(Y_pred, y_test)))
print("MAE =", mae(Y_pred, y_test))
print("R2 score =", r2_score(Y_pred, y_test))
print("Performance of Decision Tree Regressor Model R^2 metric {:.5f}".format(reg.score(X_train, y_train)))
print("Decision Tree Regressor Accuracy, {:.5f}%".format(reg.score(X_test, y_test) * 100))
```

```
MSE = 362.66616536848414
RMSE = 19.043795981066488
MAE = 4.693229478729778
R2 score = 0.9316181035003857
Performance of Decision Tree Regressor Model R^2 metric 1.00000
Decision Tree Regressor Accuracy, 94.24790%
```

The efficiency of both ML algorithms used to build the predictive model for forest fire prediction is compared graphically with a radar chart. A radar chart is a graphical method of displaying multivariate data in the form of a two-dimensional chart of three or more quantitative variables represented on axes starting from the same point. (Source: Wikipedia)

```python
categories = ['Training score', 'Testing score', 'RMSE', 'MAE', 'r2', '_']
Decision_Tree_Regressor = [100, 94.24, 19.04, 4.69, 0.93, 0]
Gradient_Boosting_Regressor = [99.98, 92.86, 21.20, 3.29, 0.92, 0]

label_loc = np.linspace(start=0, stop=2 * np.pi, num=len(Decision_Tree_Regressor))

plt.figure(figsize=(8, 8))
plt.subplot(polar=True)
plt.plot(label_loc, Decision_Tree_Regressor, label='Decision Tree Regressor')
plt.plot(label_loc, Gradient_Boosting_Regressor, label='Gradient Boosting Regressor')
plt.title('Algorithms comparison for MODIS dataset', size=20)
lines, labels = plt.thetagrids(np.degrees(label_loc), labels=categories)
plt.legend()
plt.show()
```

**Conclusion**

Our motive is to estimate the susceptibility to and risk of fire occurrence due to natural factors or human activities such as road development and construction and human-forest interfaces.

The proposed approach achieved high prediction accuracy on the NASA FIRMS dataset: roughly 93% with the Gradient Boosting Regressor and 94% with the Decision Tree Regressor. In the future, the operational NASA FIRMS data used in this study and the proposed method could create a real-time learning environment to predict forest fire risk effectively.
