
Case Study: Practical Label Encoding with Rainbow Method | by Anna Arakelyan | Feb, 2023

February 24, 2023


A real-world test on MassMutual’s production model

Co-authored with Dmytro Karabash


In our previous article, “Hidden Data Science Gem: Rainbow Method for Label Encoding”, we discussed the advantages of using label encoding over one-hot encoding for categorical variables, especially when developing tree-based models. We introduced the Rainbow method, which helps identify the most appropriate ordinal encoding for different types of categorical variables.

In this article, we continue exploring the Rainbow method, this time from a practical perspective: we demonstrate its effectiveness on a real project developed by the Data Science team at MassMutual, a life insurance company that relies on data scientists, engineers, and technologists to inform its business decisions.

The goal is to predict one of five Mindset Segments for each prospective customer. Essentially, it is a multi-class classification problem.

The segmentation framework comprises five classes that reflect a person’s age, financial stability, and attitude toward financial decisions. The MassMutual marketing team then uses the predicted segments in various campaigns for targeting and customization.

Figure 1 (drawn by Anna Arakelyan)

For instance, customers exhibiting Mindset A tend to prioritize independence and autonomy in deciding whether to purchase a life insurance policy, whereas customers with Mindset B often prefer to receive guidance and detailed explanations of various financial products from a dedicated advisor.

The data consist of a small set of labeled individuals (17.5K persons), with labels provided by a MassMutual vendor who designed the segment assignment rules. We first joined columns from our main prospect database to this data. The objective is to learn the best model from these target labels and the available features, and then to predict segments for all other (unlabeled) prospective customers.

We use a consumer database that covers approximately 300 columns representing a diverse range of demographic characteristics, such as household composition, income and net worth propensities, financial behavior, and digital savviness.

In this article, we compare conventional One-hot encoding with ordinal encoding via the Rainbow method, using the consumer database and the Mindset Segmentation project. We report a few standard metrics — Macro Avg F1 score, Macro Avg AUC ROC, Cohen’s Kappa, and Accuracy — for interpretation and comparison in this 5-class classification problem.
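
For reference, all four metrics are available in scikit-learn. Here is a minimal sketch of how they could be computed on a validation fold; the function name and the y_true, y_pred, y_proba arrays are our own placeholders, not the project's code.

from sklearn.metrics import (accuracy_score, cohen_kappa_score,
                             f1_score, roc_auc_score)

def evaluate_fold(y_true, y_pred, y_proba):
    # y_proba: predicted probabilities, one column per class (n_samples x 5)
    return {
        'macro_f1': f1_score(y_true, y_pred, average='macro'),
        'macro_auc_roc': roc_auc_score(y_true, y_proba,
                                       multi_class='ovr', average='macro'),
        'cohens_kappa': cohen_kappa_score(y_true, y_pred),
        'accuracy': accuracy_score(y_true, y_pred),
    }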

We selected all categorical variables in the consumer database — including interval, ordinal, and nominal variables — but excluded quantitative and binary variables. The aim is to demonstrate the difference in model performance between One-hot and Rainbow encoding for the same set of categorical factors.

We performed a target-stratified 4-fold cross-validation split, and all data processing from this point was done inside the cross-validation loop. This included building One-hot features and Rainbow features from each fold’s train set and then applying them to that fold’s validation set.
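
A minimal sketch of that leakage-free setup, assuming X is a DataFrame of the categorical columns and y holds the segment labels (the Rainbow step is indicated only in a comment, since its orderings are project-specific):

from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import OneHotEncoder

skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=42)
for train_idx, val_idx in skf.split(X, y):
    X_train, X_val = X.iloc[train_idx], X.iloc[val_idx]
    # Fit the encoder on the training fold only, then apply it to the
    # validation fold, so no information leaks across the split.
    ohe = OneHotEncoder(handle_unknown='ignore')
    X_train_oh = ohe.fit_transform(X_train)
    X_val_oh = ohe.transform(X_val)
    # Rainbow orderings are likewise derived from the training fold alone
    # and then used to map the validation fold's categories.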

The total set of 111 variables was transformed into 121 Rainbow features and, alternatively, into 2,260 One-hot features.

Table 1. List of variables before and after encoding

The Rainbow transformation for interval and ordinal variables was straightforward, resulting in 64 Rainbows from 64 interval features and 14 Rainbows from 14 ordinal features.
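
For an ordinal variable, a natural Rainbow is simply the category order itself, mapped to integer codes. A minimal sketch (the column name and its levels are made up for illustration):

import pandas as pd

levels = ['low', 'medium', 'high', 'very_high']  # hypothetical ordered levels
df['income_band_rainbow'] = pd.Categorical(
    df['income_band'], categories=levels, ordered=True).codes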

The transformation of nominal variables was more involved: we created 23 natural attribute Rainbows, and for the remaining 10 variables we created 20 artificial Rainbow features. Since we were dealing with five classes, we applied correlation ordering and target percent ordering for a random class (see the section Automated Rainbow Selection of the original article). For example, the nominal variable “Financial_Cluster” was transformed into the features “Financial_Cluster_Mindset_B_correlation_rank” and “Financial_Cluster_Mindset_D_target_percent”. Overall, the 33 nominal variables were transformed into 43 Rainbows.
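
To illustrate the target percent variant, here is a minimal sketch, assuming cat and y are index-aligned pandas Series from a training fold; the function name is ours, not the project's.

import pandas as pd

def target_percent_order(cat: pd.Series, y: pd.Series, target_class) -> pd.Series:
    # Share of rows in each category belonging to the chosen class,
    # computed on the training fold only.
    pct = (y == target_class).groupby(cat).mean()
    ranks = pct.rank(method='dense')  # category -> ordinal position
    return cat.map(ranks)

# e.g. target_percent_order(X_train['Financial_Cluster'], y_train, 'Mindset D')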

Choosing the actual ordering — whether a natural attribute Rainbow or an artificial Rainbow — is highly project- and context-specific, and is more of an art than a science. It requires balancing model simplicity, performance, and interpretability.

Unlike ordinal encoding, One-hot transformation generates over two thousand features.

Why do we make One-hot features for interval and ordinal variables here? Because we want to compare Rainbow with One-hot on the full continuum of possible ordering — from perfect order, to fuzzy order, to no order (or wrong order).
Additionally, classifying a variable as ordinal or nominal is sometimes a subjective decision. A clear example is Color: as we discussed in our first article, Color is considered nominal by some modelers and ordinal by others.

Initially, we pooled all categorical variables, but later in the article, we analyzed interval, ordinal, and nominal variables separately.

We trained XGBoost models covering the hyperparameter space shown below:

params = {
    'objective': 'multi:softprob',
    'eval_metric': 'mlogloss',
    'num_class': 5,
    'subsample': 0.8,
    'max_depth': [2, 3, 5],
    'eta': [0.1, 0.3, 0.5],
    'n_estimators': [50, 100, 200],
}

We avoided setting max_depth higher than 5 due to the relatively small data size and the need to have at least 100 samples at the end of each branch. We also preferred a simple model, which helps prevent overfitting.
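
The fixed keys above stay constant while the three lists are crossed into 27 configurations. As a sketch of how such a grid could be expanded with the native XGBoost API (where n_estimators corresponds to num_boost_round; X_train and y_train stand for a fold's training data):

import xgboost as xgb
from sklearn.model_selection import ParameterGrid

fixed = {'objective': 'multi:softprob', 'eval_metric': 'mlogloss',
         'num_class': 5, 'subsample': 0.8}
search = ParameterGrid({'max_depth': [2, 3, 5], 'eta': [0.1, 0.3, 0.5],
                        'n_estimators': [50, 100, 200]})

dtrain = xgb.DMatrix(X_train, label=y_train)
for hp in search:
    n_rounds = hp.pop('n_estimators')  # the native API takes this separately
    booster = xgb.train({**fixed, **hp}, dtrain, num_boost_round=n_rounds)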

All of the results below represent cross-validation average metrics.

Let us start with the overall averages across all runs. Clearly, the average metrics across all models are higher for Rainbow encoding. The overall difference is a few percentage points.

Figure 2 (generated by author)

The plots below illustrate the changes in metrics for each hyperparameter while keeping all other hyperparameters constant. These plots also clearly demonstrate that Rainbow outcomes exceed One-hot outcomes for each hyperparameter and metric.

Figure 3a (generated by author)
Figure 3b (generated by author)
Figure 3c (generated by author)

Next, let’s compare the runtimes for each method.

One-hot: 65.059 s
Rainbow: 5.491 s

On average, a single Rainbow model runs nearly 12 times faster than a single One-hot model! So, in addition to significantly improving the model performance metrics, the Rainbow method can also save data scientists an enormous amount of time.

Next, we ran models on the bundles of interval, ordinal, and nominal features separately. The results are listed below.

Figure 4 (generated by author)

These outcomes once again show that Rainbow is preferable to One-hot. As expected, interval and ordinal features gain the most from Rainbow encoding; nominal variables gain less.

Clearly, the more defined the category order, the greater the benefit of Rainbow over One-hot. And while Rainbow’s performance on nominal variables is similar to or negligibly lower than One-hot’s, it reaches that level in considerably less time, uses substantially less space, and yields a significantly simpler model.

Finally, to ensure a fair comparison in terms of dimensionality, we selected the top 10, 50, and 100 features from each feature set (Rainbow and One-hot). We used XGBoost’s feature importance scores, aggregated across the four cross-validation folds, for the best hyperparameter set of each encoding type. The results are presented below.
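
A minimal sketch of that selection step, assuming fold_models holds the four trained fold models (scikit-learn-style XGBClassifier objects, which expose feature_importances_) and feature_names holds the column names; both are placeholders:

import numpy as np

# Average each feature's importance over the four cross-validation folds.
importances = np.mean([m.feature_importances_ for m in fold_models], axis=0)

for k in (10, 50, 100):
    top_idx = np.argsort(importances)[::-1][:k]  # indices of the top-k features
    top_features = [feature_names[i] for i in top_idx]
    # Retrain and evaluate using only top_features.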

Figure 5 (generated by author)

Rainbow encoding handily outperforms One-hot encoding, particularly with a smaller number of features: it reaches its peak performance sooner and with fewer features. Indeed, with only 10 features, Rainbow encoding is already near its peak, while One-hot encoding needs 50–100 features to reach a similar level!

Furthermore, Rainbow encoding with 50 features even beats One-hot encoding with 100. It is also worth noting that when the feature count drops from 50 to 10, the loss in Macro-F1 for One-hot encoding is six times larger than for the Rainbow method (three times larger for Kappa and Accuracy, and two times larger for Macro-AUC).

The example of the Mindset Segmentation model at MassMutual clearly illustrates that Rainbow label encoding is superior to One-hot encoding. Not only does it save modelers significant time, it also substantially reduces dimensionality and provides an organic framework for feature selection. Moreover, if the chosen Rainbow order agrees with the data-generating process, this encoding can also lead to significant improvements in model performance metrics.


