Machine Learning News Hubb
Advertisement Banner
  • Home
  • Machine Learning
  • Artificial Intelligence
  • Big Data
  • Deep Learning
  • Edge AI
  • Neural Network
  • Contact Us
  • Home
  • Machine Learning
  • Artificial Intelligence
  • Big Data
  • Deep Learning
  • Edge AI
  • Neural Network
  • Contact Us
Machine Learning News Hubb
No Result
View All Result
Home Machine Learning

Flight Forecast: An In-depth Analysis and Predictive Modeling of Small U.S. Airlines | by Linh Cao | Sep, 2023

admin by admin
September 15, 2023
in Machine Learning


Part 1: Data Collection and Defining the Analytical Scope

This project is aimed at analyzing the factors leading to commercial flight delays, specifically targeting small operating airlines. The goal is to analyze historical flight data from Carrier On-Time Performance Report (2018 — present) on Bureau of Transportation Statistics (BTS) [1] to draw insights and eventually use this knowledge to develop a machine learning model that predicts the probability of flight delays.

Data for this analysis was sourced from the Bureau of Transportation Statistics. This involved downloading datasets from January 2018 to April 2023. The datasets were consolidated into a singular comprehensive file to facilitate more streamlined analysis called combined_data.

Combined_data has 32 million records and 120 columns. Among these columns, two have been specifically marked as possible predicted labels:

  • ArrDel15: This column categorizes flights into two groups — delayed (1) and not delayed (0). All flights with delayed time more than fifteen minutes were considered as delayed and vice versa [2].
  • ArrDelayMinutes: This column details the exact duration of a flight’s delay in minute.

The remainder of the columns are factors that may influence or correlate with delays. They contain both internal and external factors, including: reason of delays, seasonal-related factors (months, days of week, years, etc.), operational-related aspects (wheel-on, wheel-off, flight schedules, tail numbers, etc.), and so on.

However, combined_data is a very big file for analysis. Therefore, the research was conducted to define which factors have the strong impact on delays. Seventy-four columns were dropped from the dataset and forty-six columns are kept for data analysis and modeling. The new combined_data file has 32 million rows and 46 columns after manual feature selection step.

In addition, an initial analysis from the new combined_data file shows that 2020 has the lowest number of delays. Covid-19 may be the factors that caused the number of flights in 2020 drop dramatically, leading to the small number of delays. According to TranStats [2], the number of arrival delays in 2020 was half that of 2018, 2019, 2022, and the predicted figures for 2023. Meanwhile, the number of canceled flights in 2020 was four times that of the other years in the period. This leads us to conclude that the anomalies observed in 2020 were likely due to the impact of COVID-19 on flight schedules and operations. Include 2020 in the trained model may impact on the accuracy of the model, thus, the final data for data analysis and modeling does not include data of 2020. The new combined_data file include 26 million rows and 46 columns after year selection step.

Tools and Techniques

Insufficient computation resources presented challenges in merging csv files after downloading from BTS. To address this problem, cloud computing and MySQL server were deployed. The 64 csv files were uploaded to MySQL with the aid of dBeaver. The connection between Tableau and MySQL was established for data visualization. The data selection step was processed on Google Colab. Other tools which were utilized throughout the project are:

  • Development Environment: Jupiter Notebook
  • Python library: Scikit-learn, Pandas, NumPy, Matplotlib, Plotly
  • BI tool: Tableau
  • Version Control: Git, GitHub
  • Databased: MySQL, dBeaver

Exploratory Data Analysis Approach

Table 1. Questions for exploratory data analysis.

Data Modeling Approach

In Section 1.3.1, two potential predictive labels are outlined: ArrDel15 and ArrDelMinutes. For ArrDel15, classification models aim to predict the probability of delays, while for ArrDelayMinutes, regression models are employed to forecast the duration in minutes. These two labels suggest differing modeling approaches; however, both utilize supervised learning technique.

Figure 1. Correlation heatmap between numeric features and label.

In this project, the predictive objective was ArrDel15, therefore, classification models were used, including: Gradient Boosting, Random Forest, Decision Tree, AdaBoost. To train the model, a set of features were selected based on their correlation to the label. Important features which have the highest correlation with ArrDel15 in the correlation heatmap (Figure 1) were chosen. In addition, two evaluation matrices are used, including Accuracy and Confusion Matrix (precision and recall are used).

An analysis was conducted to calculate average flight per month of each airline in the dataset in years 2018, 2019, 2021, 2022, and 2023. According to the information given by Chris MacNeel, airlines with more than 20,000 flights per month are big airlines and vice versa. Therefore, airlines in the dataset were categorized into three primary group:

  • Big Airlines (>20,000 monthly flights): Southwest Airlines (WN), Delta Air Lines (DL), American Airlines (AA), SkyWest Airlines (OO), United Airlines (UA), Republic Airways (YX), Envoy Air (MQ), JetBlue Airways (B6).
  • Small Airlines (<20,000 monthly flights):

Regional/cargo airlines: Endeavor Air (9E), PSA Airlines (OH), Mesa Airlines (YV), Horizon Air (QX), West Atlantic (PT), Air Wisconsin (ZW), GoJet Airlines (G7), Commute Air (C5), Empire Airlines (EM), PenAir (KS), Cape Air (9K).

Small operating airlines: Alaska Airlines (AS), Spirit Airlines (NK), Frontier Airlines (F9), Allegiant Air (G4), Hawaiian Airlines (HA).

  • Terminated Airlines: Trans States Airlines (AX — 2020), Virgin America (VX — 2018), Compass Airlines (CP — 2020), ExpressJet Airlines (EV — 2020).

The project only focuses on small airlines because they are the targeted customers of our client.

Part 2:

Part 3:

[1] “Marketing Carrier On-Time Performance (Beginning January 2018).” Bureau of Transportation Statistics, www.transtats.bts.gov/DL_SelectFields.aspx?gnoyr_VQ=FGK&QO_fu146_anzr=. Accessed 27 Aug. 2023.

[2] Schonland, Addison. The Value of Time for an Airline | AirInsight. 13 Feb. 2023, airinsight.com/the-value-of-time-for-an-airline/. Accessed 28 Aug. 2023.



Source link

Previous Post

What is a Signature Identifier?

Next Post

Rubik and Markov. Here we obtain the probability of… | by Eduardo Testé | Aug, 2023

Next Post

Rubik and Markov. Here we obtain the probability of… | by Eduardo Testé | Aug, 2023

MicroAI™ Launches New Product — "Machine Intelligence"

5 Amazing & Free LLMs Playgrounds You Need to Try in 2023

Related Post

Artificial Intelligence

Genius Cliques: Mapping out the Nobel Network | by Milan Janosov | Sep, 2023

by admin
October 1, 2023
Machine Learning

Detecting Anomalies with Z-Scores: A Practical Approach | by Akash Srivastava | Oct, 2023

by admin
October 1, 2023
Machine Learning

What are SWIFT Payments and How Does It Work?

by admin
October 1, 2023
Artificial Intelligence

Speed up your time series forecasting by up to 50 percent with Amazon SageMaker Canvas UI and AutoML APIs

by admin
October 1, 2023
Edge AI

Unleashing LiDAR’s Potential: A Conversation with Innovusion

by admin
October 1, 2023
Artificial Intelligence

16, 8, and 4-bit Floating Point Formats — How Does it Work? | by Dmitrii Eliuseev | Sep, 2023

by admin
September 30, 2023

© Machine Learning News Hubb All rights reserved.

Use of these names, logos, and brands does not imply endorsement unless specified. By using this site, you agree to the Privacy Policy and Terms & Conditions.

Navigate Site

  • Home
  • Machine Learning
  • Artificial Intelligence
  • Big Data
  • Deep Learning
  • Edge AI
  • Neural Network
  • Contact Us

Newsletter Sign Up.

No Result
View All Result
  • Home
  • Machine Learning
  • Artificial Intelligence
  • Big Data
  • Deep Learning
  • Edge AI
  • Neural Network
  • Contact Us

© 2023 JNews - Premium WordPress news & magazine theme by Jegtheme.