Semi-Supervised Approach for Transformers [Part 2] | by Divyanshu Raj | Sep, 2022
A Siamese Network structure to fine-tune the RoBERTa model and train a Natural Language Inference Classification task

Photo by Faye Cornish on Unsplash

This article uses the Siamese architecture to fine-tune the RoBERTa model. We will also observe how the dataset’s embedding distribution changes with PCA and TSNE as we fine-tune and train. The fine-tuned RoBERTa model is then trained for Natural Language Inference classification on a subset of SNLI.

Code for this article. To understand the intuition behind the semi-supervised approach for transformers, read Part 1.

Dataset

A subset of the SNLI dataset is used: 20,000 training samples, 10,000 validation samples, and 10,000 test samples. An NLI (Natural Language Inference) dataset pairs a premise with a hypothesis, and the model’s goal is to predict whether the hypothesis is a contradiction of, an entailment of, or neutral with respect to the premise.

The original dataset has ~550,000 training samples; running this experiment on the full set should yield better results.
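As a rough illustration of how such a subset could be drawn, here is a minimal sketch assuming the Hugging Face `datasets` library; the sampling and seed are illustrative, not the author’s exact code:

```python
from datasets import load_dataset

snli = load_dataset("snli")

# SNLI marks pairs without a gold label with -1; drop them.
snli = snli.filter(lambda ex: ex["label"] != -1)

train = snli["train"].shuffle(seed=42).select(range(20_000))
valid = snli["validation"]   # ~10,000 labeled pairs
test = snli["test"]          # ~10,000 labeled pairs

print(train[0])  # {'premise': ..., 'hypothesis': ..., 'label': 0/1/2}
```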

Let’s view the dataset analysis:

Dataset Analysis (Image by Author)

From the graph above, we can observe that the dataset is balanced. The maximum sentence length can be estimated at 64 words after concatenating the premise with the hypothesis.

Model

We will use two models. The first uses the Siamese architecture to fine-tune RoBERTa. The second has a classification head to predict whether a sentence pair is an entailment, a contradiction, or neutral.

Sentence Transformer Model:

Code snippet to create Sentence Transformer model for RoBERTa (by Author)

The output of this model is a 768-dimensional context vector. We compute embeddings for the premise and the hypothesis simultaneously and use cosine similarity loss to fine-tune the model, with the labels mapped to similarity targets as follows:

  • -1 for contradiction
  • 0 for neutral
  • 1 for entailment
Siamese Architecture (Photo from sbert.net)

The above diagram represents the architecture. After the fine-tuning is done, the weights of the RoBERTa model are saved for further downstream tasks.
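A minimal sketch of this Siamese fine-tuning stage, assuming the sentence-transformers library; the batch size, epoch count, and the save path roberta-nli-siamese are illustrative assumptions, not the author’s exact code:

```python
from datasets import load_dataset
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, models, losses, InputExample

# 20k-pair SNLI training subset (as in the dataset sketch above)
train = (load_dataset("snli", split="train")
         .filter(lambda ex: ex["label"] != -1)
         .shuffle(seed=42)
         .select(range(20_000)))

# RoBERTa encoder + mean pooling -> one 768-dimensional sentence embedding
word_embedding_model = models.Transformer("roberta-base", max_seq_length=64)
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

# SNLI labels: 0 = entailment, 1 = neutral, 2 = contradiction
# -> cosine-similarity targets 1, 0, -1 as listed above
label_to_score = {0: 1.0, 1: 0.0, 2: -1.0}
train_examples = [
    InputExample(texts=[ex["premise"], ex["hypothesis"]],
                 label=label_to_score[ex["label"]])
    for ex in train
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=32)
train_loss = losses.CosineSimilarityLoss(model)

# Fine-tune with the Siamese objective, then save the weights for downstream use
model.fit(train_objectives=[(train_dataloader, train_loss)],
          epochs=1, warmup_steps=100)
model.save("roberta-nli-siamese")
```

The cosine-similarity targets pull entailed pairs together and push contradictory pairs apart in embedding space, which is exactly the separation the PCA and TSNE plots later visualize.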

AutoModelForSequenceClassification:

Code snippet to load a fine-tuned RoBERTa for sequence classification (by Author)

After fine-tuning the model with the Siamese architecture, we use the same RoBERTa weights to define a sequence classification model. Only the final “classifier” layer receives newly initialized weights. This model is then trained on the training data with a cyclic learning rate and the Adam optimizer.

Cosine Scheduler for Learning Rate (Image from HuggingFace)

A learning rate of 2e-5 with a cosine scheduler running 2 cycles is used for training; this helps avoid overfitting.
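A minimal sketch of this classification stage, assuming the Hugging Face transformers API; the learning rate and 2-cycle cosine schedule follow the text, while the batch size, step counts, optimizer variant (AdamW), and save paths are illustrative assumptions:

```python
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          get_cosine_with_hard_restarts_schedule_with_warmup)

# Load the RoBERTa weights fine-tuned with the Siamese setup ("roberta-nli-siamese"
# is the assumed save path from the previous sketch); only the new "classifier"
# head is randomly initialized.
model = AutoModelForSequenceClassification.from_pretrained("roberta-nli-siamese",
                                                           num_labels=3)
tokenizer = AutoTokenizer.from_pretrained("roberta-base")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # Adam-style optimizer
num_training_steps = 3 * (20_000 // 32)                     # epochs * steps/epoch (illustrative)
scheduler = get_cosine_with_hard_restarts_schedule_with_warmup(
    optimizer, num_warmup_steps=100,
    num_training_steps=num_training_steps, num_cycles=2)

# One illustrative training step: premise and hypothesis are fed as a sentence pair.
model.train()
batch = tokenizer(["A man is eating."], ["Someone is having food."],
                  truncation=True, padding="max_length", max_length=64,
                  return_tensors="pt")
loss = model(**batch, labels=torch.tensor([0])).loss
loss.backward()
optimizer.step()
scheduler.step()
optimizer.zero_grad()

# After training, save the weights for the final Siamese pass described below.
model.save_pretrained("roberta-nli-classified")
tokenizer.save_pretrained("roberta-nli-classified")
```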

Graphs for Loss (left) and Accuracy (right) (Image by Author)

The graphs above display the loss and accuracy curves for sequence classification; a test accuracy of 87% is achieved.

After training, we save the weights. These can then be loaded back into the Sentence Transformer model and trained for a couple more epochs to obtain even better sentence embeddings.
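A rough sketch of this final pass, under the same assumed save paths as the previous sketches; loading the classification checkpoint through models.Transformer keeps the RoBERTa backbone and drops the classification head:

```python
from sentence_transformers import SentenceTransformer, models

# "roberta-nli-classified" is the assumed directory where the trained
# sequence-classification model and tokenizer were saved above.
# models.Transformer loads it via AutoModel, so the classifier head is discarded.
word_embedding_model = models.Transformer("roberta-nli-classified", max_seq_length=64)
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())
sbert_model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

# ...fine-tune with CosineSimilarityLoss for a couple of epochs, as in the
# earlier Siamese sketch, then save the final embedding model.
sbert_model.save("roberta-nli-siamese-final")
```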

Sentence Embeddings Analysis

We have trained the model 3 times:

  • Load the original RoBERTa weights into Sentence Transformers to train with Siamese architecture. We can save these fine-tuned weights.
  • Load the fine-tuned weights from the above training into a sequence classification model with RoBERTa. After training, we save these weights.
  • Load the weights from the above step into Sentence Transformers to train with Siamese architecture for a couple of epochs and then save the weights.

The sentence embeddings for all the sentence pairs in the training set are saved 3 times:

  • Original RoBERTa weights.
  • After fine-tuning with Siamese architecture.
  • After sequence classification and fine-tuning with Siamese architecture again.

On these 3 versions of sentence embeddings, we run PCA and TSNE to analyze the effect of fine-tuning on the dataset. The following graphs present the analysis:

Graph 1: We run PCA with n_components = 3 on all 3 instances of sentence embeddings and plot the first 2 dimensions.

PCA (n = 3), Graph for PCA-x, PCA-y : Original RoBERTa (left), Fine Tuning with Siamese (center), After NLI Classification (right). (Image by Author)

Graph 2: We run PCA with n_components = 3 on all 3 instances of sentence embeddings and plot a 3D graph of all the data points.

PCA (n = 3), Graph for PCA-x, PCA-y, PCA-z : Original RoBERTa (left), Fine Tuning with Siamese (center), After NLI Classification (right). (Image by Author)

Graph 3: We run TSNE with n_components = 2 on all 3 instances of sentence embeddings and plot the graph.

TSNE (n = 2), Graph for X, Y : Original RoBERTa (left), Fine Tuning with Siamese (center), After NLI Classification (right). (Image by Author)

Graph 4: We run PCA with n_components = 50 and then run TSNE with n_components = 2 on top of it, for all 3 instances of sentence embeddings. In these graphs, we compare the two previous plots with the current one.

Graph 4.1: For the original RoBERTa weights, the data points are scattered and there is no clear distinction between sentence pair embeddings based on their label.

Comparison of PCA[n = 3][X, Y] (left), TSNE[n = 2][X, Y] (center), TSNE[n = 2, PCA[n = 50]][X, Y] (right) for Original RoBERTa (Image by Author)

Graph 4.2: After fine-tuning with the Siamese architecture, we can observe some distinction between sentence pair embeddings based on their label.

Comparison of PCA[n = 3][X, Y] (left), TSNE[n = 2][X, Y] (center), TSNE[n = 2, PCA[n = 50]][X, Y] (right) for Siamese RoBERTa (Image by Author)

Graph 4.3: After sequence classification and fine-tuning with the Siamese architecture again, there is a clear distinction between sentence pair embeddings based on their label.

Comparison of PCA[n = 3][X, Y] (left), TSNE[n = 2][X, Y] (center), TSNE[n = 2, PCA[n = 50]][X, Y] (right) for RoBERTa after NLI Classification (Image by Author)
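The dimensionality reductions behind these graphs can be reproduced with scikit-learn. A minimal sketch, assuming the sentence-pair embeddings and labels have already been computed (for example with model.encode on the concatenated premise and hypothesis) and saved as NumPy arrays; the file names are illustrative:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

embeddings = np.load("train_embeddings.npy")   # shape (n_pairs, 768), illustrative file
labels = np.load("train_labels.npy")           # 0 / 1 / 2 NLI labels, illustrative file

# Graphs 1 & 2: PCA down to 3 components (x-y plotted in 2D, all three in 3D)
pca3 = PCA(n_components=3).fit_transform(embeddings)

# Graph 3: TSNE with 2 components directly on the 768-dim embeddings
tsne2 = TSNE(n_components=2, init="pca", learning_rate="auto").fit_transform(embeddings)

# Graph 4: PCA to 50 components first, then TSNE down to 2
pca50 = PCA(n_components=50).fit_transform(embeddings)
tsne_on_pca = TSNE(n_components=2, init="pca", learning_rate="auto").fit_transform(pca50)

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, points, title in zip(axes,
                             [pca3[:, :2], tsne2, tsne_on_pca],
                             ["PCA (n=3), x-y", "TSNE (n=2)", "TSNE on PCA (n=50)"]):
    ax.scatter(points[:, 0], points[:, 1], c=labels, s=2)
    ax.set_title(title)
plt.show()
```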

I have presented the impact of training a transformer model with a semi-supervised approach, which leads to a more robust model. This experiment was done on a small set of 20,000 data points; running it on the full dataset would have a bigger impact. There are already models fine-tuned with the Siamese architecture on HuggingFace that can be used directly.

Other techniques can be used to better cluster or visualize the sentence pair embeddings. These visualizations can also be done for any other NLP task.

References:

  1. Sentence BERT
  2. Attention is all you need
  3. Sentence Transformers for Semantic Textual Similarity
  4. Fine Tuning BERT for Natural Language Inference
  5. Visualizing high-dimensional datasets using PCA and t-SNE in Python



