Machine Learning News Hubb
Advertisement Banner
  • Home
  • Machine Learning
  • Artificial Intelligence
  • Big Data
  • Deep Learning
  • Edge AI
  • Neural Network
  • Contact Us
  • Home
  • Machine Learning
  • Artificial Intelligence
  • Big Data
  • Deep Learning
  • Edge AI
  • Neural Network
  • Contact Us
Machine Learning News Hubb
No Result
View All Result
Home Artificial Intelligence

How to Train the LILT Model on Invoices and Run Inference | by Walid Amamou | Jan, 2023

admin by admin
January 7, 2023
in Artificial Intelligence


A Step-by-Step Tutorial

Image by Zinkevych_D from Envanto

In the realm of document understanding, deep learning models have played a significant role. These models are able to accurately interpret the content and structure of documents, making them valuable tools for tasks such as invoice processing, resume parsing, and contract analysis. Another important benefit of deep learning models for document understanding is their ability to learn and adapt over time. As new types of documents are encountered, these models can continue to learn and improve their performance, making them highly scalable and efficient for tasks such as document classification and information extraction.

One of these models is the LILT model (Language-Independent Layout Transformer), a deep learning model developed for the task of document layout analysis. Unlike it’s layoutLM predecessor, LILT is originally designed to be language-independent, meaning it can analyze documents in any language while achieving superior performance compared to other existing models in many downstream tasks application. Furthermore, the model has the MIT license, which means it can be used commercially unlike the latest layoutLM v3 and layoutXLM. Therefore, it is worthwhile to create a tutorial on how to fine-tune this model as it has the potential to be widely used for a wide range of document understanding tasks.

In this tutorial, we will discuss this novel model architecture and show how to fine-tune it on invoice extraction. We will then use it to run inference on a new set of invoices.

One of the key advantages of using the LILT model is its ability to handle multi-language document understanding with state-of-the-art performance. The authors achieved this by separating the text and layout embedding into their corresponding transformer architecture and using a bi-directional attention complementation mechanism (BiACM) to enable cross-modality interaction between the two types of data. The encoded text and layout features are then concatenated and additional heads are added, allowing the model to be used for either self-supervised pre-training or downstream fine-tuning. This approach is different from the layoutXLM model, which involves collecting and pre-processing a large dataset of multilingual documents.

LILT Model Architecture. Source

The key novelty in this model is the use of the BiACM to capture the cross-interaction between the text and layout features during the encoding process. Simply concatenating the text and layout model output results in worse performance, suggesting that cross-interaction during the encoding pipeline is key to the success of this model. For more in-depth details, read the original article.

Similar to my previous articles on how to fine-tune the layoutLM model, we will use the same dataset to fine-tune the LILT model. The data was obtained by manually labeling 220 invoices using UBIAI text annotation tool. More details about the labeling process can be found in this link.

To train the model, we first pre-pre-process the data output from UBIAI to get it ready for model training. These steps are the same as in the previous notebook training the layoutLM model, here is the notebook:

We download the LILT model from Huggingface:

from transformers import LiltForTokenClassification
# huggingface hub model id
model_id = "SCUT-DLVCLab/lilt-roberta-en-base"

# load model with correct number of labels and mapping
model = LiltForTokenClassification.from_pretrained(
model_id, num_labels=len(label_list), label2id=label2id, id2label=id2label
)

For this model training, we use the following hyperparameters:

NUM_TRAIN_EPOCHS = 120
PER_DEVICE_TRAIN_BATCH_SIZE = 6
PER_DEVICE_EVAL_BATCH_SIZE = 6
LEARNING_RATE = 4e-5

To train the model, simply run trainer.train() command:

Image by Author: Model Training In Progress.

On GPU, training takes approximately 1h. After training, we evaluate the model by running trainer.evaluate():

{
'eval_precision': 0.6335952848722987,
'eval_recall': 0.7413793103448276,
'eval_f1': 0.6832627118644069,
}

We get a precision, recall and F-1 score of 0.63, 0.74 and 0.68 respectively. The LILT model evaluation F-1 score of 0.68 indicates that the model is performing well in terms of its ability to accurately classify and predict outcomes with a moderate to good accuracy. It is worth noting, however, that there is always room for improvement, and it is beneficial to continue labeling more data in order to further increase its performance. Overall, the LILT model evaluation F-1 score of 0.68 is a positive result and suggests that the model is performing well in its intended task.

In order to assess the model performance on unseen data, we run inference on a new invoice.

We make sure to save the model so we can use it for inference later on using this command:

 torch.save(model,'/content/drive/MyDrive/LILT_Model/lilt.pth')

To test the model on a new invoice, we run the inference script below:

Below is the result:

Image by Author: LILT output on invoice 1

The LILT model correctly identified a wide range of entities, including seller names, invoice numbers, and total amounts. Let’s try out a couple more invoices:

Image by Author: LILT output on invoice 2
Image by Author: LILT output on invoice 3

As we can see, the LILT model was able to handle a variety of different formats with different context with a relatively good accuracy although it made few mistakes. Overall, the LILT model performed well and its predictions were similar to those produced by layoutlm v3 highlighting its effectiveness for document understanding tasks.

In conclusion, the LILT model has proven to be effective for document understanding tasks. Unlike the layoutLM v3 model, the LILT model is MIT licensed which allows for widespread commercial adoption and use by researchers and developers, making it a desirable choice for many projects. As a next step, we can improve the model performance by labeling and improving the training dataset.

If you want to efficiently and easily create your own training dataset, checkout UBIAI’s OCR annotation feature for free.

Follow us on Twitter @UBIAI5 or subscribe here!





Source link

Previous Post

Federated Learning and data privacy on your keyboard. | by Juan 🪴

Next Post

Best practices for creating Amazon Lex interaction models

Next Post

Best practices for creating Amazon Lex interaction models

Graph Convolutional Neural Network | by Mohd Saqib | Dec, 2022

Overfitting (Aşırı Öğrenme), Underfitting (Eksik Öğrenme) ve Bias- Variance Tradeoff (Yanlılık/Varyans İkilemi) | by Hilal Gevrek | Jan, 2023

Related Post

Artificial Intelligence

Amazon SageMaker built-in LightGBM now offers distributed training using Dask

by admin
January 30, 2023
Artificial Intelligence

Don’t blame a Data Scientist on failed projects! | by Darya Petrashka | Dec, 2022

by admin
January 30, 2023
Edge AI

BrainChip Tapes Out AKD1500 Chip in GlobalFoundries 22nm FD SOI Process

by admin
January 30, 2023
Big Data

You Have More Data Quality Issues Than You Think: Here’s Why.

by admin
January 30, 2023
Artificial Intelligence

Using Generative Models for Creativity

by admin
January 30, 2023
Artificial Intelligence

CRPS : Scoring Function for Bayesian ML Models | by Itamar Faran

by admin
January 29, 2023

© 2023 Machine Learning News Hubb All rights reserved.

Use of these names, logos, and brands does not imply endorsement unless specified. By using this site, you agree to the Privacy Policy and Terms & Conditions.

Navigate Site

  • Home
  • Machine Learning
  • Artificial Intelligence
  • Big Data
  • Deep Learning
  • Edge AI
  • Neural Network
  • Contact Us

Newsletter Sign Up.

No Result
View All Result
  • Home
  • Machine Learning
  • Artificial Intelligence
  • Big Data
  • Deep Learning
  • Edge AI
  • Neural Network
  • Contact Us

© 2023 JNews - Premium WordPress news & magazine theme by Jegtheme.