Machine Learning News Hubb

Do You Really Need Deep Learning? | by Luís Fernando Torres | May, 2023

May 6, 2023
in Machine Learning


When working with tabular data, do you even need deep learning? Researchers say you don’t!

https://unsplash.com/photos/xTA7sAAtG40

Artificial intelligence is undoubtedly a hot topic. The success of tools such as ChatGPT, Midjourney, Stable Diffusion, and many others has left many people interested in studying and understanding how A.I. works. As a result, many beginners have started their own machine learning journey.

With their impressive performance in text, audio, image, and video tasks, deep learning and neural networks have captured the attention of enthusiasts and beginners alike. Many jump straight onto the deep learning bandwagon as soon as they start studying machine learning, trying to apply neural networks to even the simplest regression and classification tasks.

But is that really necessary?

Two papers, Tabular Data: Deep Learning is Not All You Need (2021) and Why do tree-based models still outperform deep learning on tabular data? (2022), tested several deep learning models on tabular data and compared their performance against tree-ensemble models such as XGBoost and Random Forest.

Let’s see what they found out!

https://www.statology.org/tabular-data/

Before delving into the results of the papers, we first need to understand what exactly tabular data is.

Tabular data is any kind of data structured in a table with rows and columns. For instance, in a house price prediction dataset, each house is represented by a row (a sample), and its attributes are organized in columns, each holding one piece of information about that specific house.

A tabular dataset containing 9 samples and 5 attributes (https://www.statology.org/tabular-data/)

A lot of data in finance, healthcare, housing, and other fields is organized in tables of rows and columns. This is what we call tabular data.
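
As a minimal sketch of the idea, the snippet below builds a tiny house-price table in plain Python. The column names and values are made up for illustration and do not come from any dataset used in the papers.

```python
# Hypothetical house-price table: names and values are invented for illustration.
columns = ["area_m2", "bedrooms", "age_years", "price"]
rows = [
    [70, 2, 15, 210_000],
    [120, 3, 5, 395_000],
    [45, 1, 30, 130_000],
]

# Each row is one sample; pairing it with the column names gives a record.
records = [dict(zip(columns, row)) for row in rows]

# Selecting one attribute across all samples gives a column.
prices = [record["price"] for record in records]

n_samples, n_attributes = len(rows), len(columns)
print(n_samples, n_attributes)
print(records[0])
print(prices)
```

In a prediction task, a column such as `price` would typically be the target, and the remaining columns would be the features.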

To deal with this kind of data, we have many algorithms and methods, such as decision trees, ensemble learners, logistic regression, linear regression, and support vector machines. Deep learning, in contrast, is mostly used for data that is not tabular, such as images and audio.

Both papers mentioned above tested tree-ensemble models against deep learning models on different datasets. Tabular Data: Deep Learning is Not All You Need (2021) covers both classification and regression tasks, while the results discussed here from Why do tree-based models still outperform deep learning on tabular data? (2022) are for classification tasks.

In the Why do tree-based models still outperform deep learning on tabular data? (2022) paper, XGBoost, Random Forest, and Gradient Boosting Trees were compared against the MLP, ResNet, FT-Transformer, and SAINT deep learning models.

Medium-sized datasets, with only numerical features
Medium-sized datasets, with both numerical and categorical features

The images above show the accuracy of the models on the validation sets of different datasets as the number of random search iterations grows. For datasets with only numerical features, as well as those with both numerical and categorical features, the tree-ensemble models outperform the deep learning models.

The study also highlights that each random search iteration is generally slower for the neural networks than for the tree-ensemble models, which is another disadvantage of deep learning when dealing with tabular data.
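
To make the "score across iterations" curves more concrete, here is a rough sketch that assumes the iterations are random-search trials over hyperparameters. The `validation_score` function is a hypothetical stand-in for training a model and scoring it on a validation set, and the hyperparameter names and ranges are illustrative only.

```python
import random


def validation_score(params):
    """Hypothetical stand-in for 'train a model, then score it on the validation set'."""
    # Toy score that peaks at learning_rate = 0.1 and depth = 6.
    return 1.0 - abs(params["learning_rate"] - 0.1) - 0.02 * abs(params["depth"] - 6)


def random_search_curve(n_iterations, seed=0):
    """Return the best validation score found after each random-search iteration."""
    rng = random.Random(seed)
    best_so_far = float("-inf")
    curve = []
    for _ in range(n_iterations):
        # Sample one random hyperparameter configuration.
        params = {
            "learning_rate": rng.uniform(0.01, 0.3),
            "depth": rng.randint(2, 10),
        }
        best_so_far = max(best_so_far, validation_score(params))
        curve.append(best_so_far)
    return curve


curve = random_search_curve(20)
print([round(score, 3) for score in curve])
```

Plotting one such curve per model puts them on the same axis: a model whose curve rises quickly needs fewer tuning iterations to reach a good score.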

In the Tabular Data: Deep Learning is Not All You Need (2021) paper, the authors pitted XGBoost against the TabNet, NODE, DNF-Net, and 1D-CNN deep learning models, highlighting 1D-CNN as the model that achieved the best single-model performance in a Kaggle competition with tabular data.

The study also compares the performance of a simple ensemble model (SVM and CatBoost), an ensemble of the deep learning models alone, and an ensemble of the deep learning models with XGBoost.

The experiments were run on 11 tabular datasets covering both classification and regression tasks. These are the results.

Each column is a dataset. The best performance is highlighted in bold.

The models were evaluated on the cross-entropy loss for binary classification tasks, while the root-mean-square error was used to evaluate models on regression tasks.
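
For reference, both metrics can be written in a few lines of plain Python. This is a minimal sketch of the standard definitions, not the papers' evaluation code. The `eps` clipping is a common safeguard against `log(0)` and is an assumption here.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean cross-entropy between 0/1 labels and predicted probabilities of class 1."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def rmse(y_true, y_pred):
    """Root-mean-square error between true and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


print(binary_cross_entropy([1, 0, 1], [0.9, 0.2, 0.6]))
print(rmse([3.0, 5.0, 2.5], [2.5, 5.0, 4.0]))
```

Lower is better for both: a perfect regressor has an RMSE of 0, and a classifier that always predicts 0.5 has a cross-entropy of ln 2, about 0.693.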

Overall, the authors concluded that XGBoost outperforms the deep learning models on most datasets. Beyond that, no single deep learning model consistently outperformed the others, and each deep learning model performed better only on the datasets used in its own paper.

Even though the ensemble of deep learning models and XGBoost consistently outperformed the other models, including XGBoost alone, the authors concluded that XGBoost alone is the easiest to optimize and the fastest to converge, which is a relevant advantage under tight time constraints.
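
One simple way to combine several models is to average their predicted probabilities. The sketch below illustrates that general idea only; the exact weighting scheme used in the paper is not described here, and the prediction values are made up.

```python
def average_ensemble(model_probs, weights=None):
    """Combine per-model predicted probabilities with a (weighted) average.

    model_probs: one list of probabilities per model, all of the same length.
    weights: optional per-model weights that sum to 1; uniform if omitted.
    """
    n_models = len(model_probs)
    if weights is None:
        weights = [1.0 / n_models] * n_models
    n_samples = len(model_probs[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, model_probs))
        for i in range(n_samples)
    ]


# Hypothetical predictions for three samples from two models.
tree_probs = [0.9, 0.2, 0.6]
network_probs = [0.7, 0.4, 0.4]

ensemble_probs = average_ensemble([tree_probs, network_probs])
print(ensemble_probs)
```

Averaging tends to help when the models make different kinds of mistakes, which is one plausible reason a mix of tree-based and deep models can beat either alone.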

Without a doubt, neural networks are exciting, and deep learning has allowed us to perform tasks on non-tabular data that were unimaginable a few years ago. However, when dealing with tabular data, it turns out that the more “traditional” machine learning models may be faster and achieve better results than deep learning.

It’s also important to mention that there is no such thing as a “holy grail” in this industry. There is no single model that beats every other model on every task. Testing, fine-tuning, validating, and making changes to see what works best for the problem at hand, then repeating this process over and over again, is part of being a data scientist.

Even though the papers above suggest that, for now, there is no particular reason to jump right away into deep learning for solving tabular-data tasks, it’s indispensable to try different methodologies in your work to see which provides the best result.
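
A quick way to try several approaches on your own data is cross-validation. The sketch below assumes scikit-learn is installed and uses a synthetic dataset, so its scores say nothing about the papers' findings. scikit-learn's `GradientBoostingClassifier` stands in for XGBoost here, and all hyperparameters are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic tabular data; replace with your own feature matrix X and labels y.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=8, random_state=0
)

models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    # Neural networks usually need scaled inputs, hence the pipeline.
    "mlp": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0),
    ),
}

scores = {}
for name, model in models.items():
    # Mean accuracy over 5 cross-validation folds.
    fold_scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    scores[name] = fold_scores.mean()
    print(f"{name}: {scores[name]:.3f}")
```

Comparing models under the same folds and the same metric keeps the comparison fair. A fuller comparison would also tune each model's hyperparameters.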

It’s also relevant to note that studies on neural networks are advancing rapidly, and it may be a matter of time until we have a deep learning model that can consistently beat XGBoost on tabular data. The game is still on!

Thank you for reading,

Luís Fernando Torres

LinkedIn

Kaggle


