How I fixed over 50 label issues in a popular semantic segmentation dataset | by Jamie Murdoch | Sep, 2022

September 8, 2022


Finding pixel-wise annotation errors in MIT ADE20K

TL;DR — Bad labels are a major problem in AI. I introduce the first technique for finding annotation errors in semantic segmentation datasets. On MIT ADE20K I find over 50 label issues, with confirmed errors on 7% of total pixels. For some rarer label classes, I triple the number of annotations. I’m building a company to improve computer vision datasets; if you’d like to find errors in your dataset, contact me. Click here for similar results on MS COCO object detection.

In the minds of many ML academics, the typical model development path looks something like this:

However, those who have spent time building models in industry know that the real challenge lies in dealing with scenarios like this:

Example labels from MIT ADE20K. Where trees overlap with the sky, the overlapping region is sometimes labeled “tree” and sometimes “sky”

In conversations with over 75 ML teams, this comes up all the time. It doesn’t matter how big your dataset is or how fancy your model: without clean, high-quality labels you’re not going to get good performance.

Unfortunately, while tweaking hyperparameters is easy, finding broken labels is not. Semantic segmentation is particularly hard because you need to search over every pixel in every image. As a consequence, the current state of the art for finding errors in semantic segmentation datasets is to look through every image manually, which is very expensive. To repeat myself: no techniques exist, in industry or academia, for finding errors in semantic segmentation datasets.

Until now, that is! In this post, we extend FIXER to find pixel-wise errors in semantic segmentation datasets. FIXER uses novel explainable AI techniques to flag arbitrary image patches for manual review. On the MIT ADE20K dataset, it identifies over 50 distinct issues, finding¹ confirmed errors in 7% of total labels. In some of the rarer classes, such as “pillow”, the number of annotated pixels is tripled by FIXER.

  • I am developing Breakpoint, a no-code UI for exploring and improving computer vision datasets using FIXER (without the need to share data). If you would like to be a design partner, or be placed on the waitlist, please sign up here.
  • If you would like to use FIXER on your computer vision dataset, please contact me. I provide a consulting service: send me your dataset and I’ll send you a cleaned version back.

We also handle object detection and image classification models, and are actively adding new model types.

About me: I am a UC Berkeley PhD in explainable AI who spent time at Facebook AI and Google Brain, has been cited 1,300+ times, and founded and sold Clientelligent, an AI startup.

Finding errors in MIT ADE20K

MIT ADE20K is one of the most widely used semantic segmentation datasets, with over 20,000 images. Each pixel in each image is labeled as one of 150 classes, ranging from “floor” to “radiator”.
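As background for the numbers below, per-class pixel shares can be tallied directly from the annotation masks. Here is a minimal sketch, assuming the SceneParsing release of ADE20K (ADEChallengeData2016), where annotations are single-channel PNGs whose pixel values index the 150 classes and 0 means “unlabeled”; the directory path is an assumption:

```python
import numpy as np
from PIL import Image
from pathlib import Path

# Assumed layout of the SceneParsing release (ADEChallengeData2016).
ANNOTATION_DIR = Path("ADEChallengeData2016/annotations/training")
NUM_CLASSES = 150  # pixel value 0 is reserved for "unlabeled"

pixel_counts = np.zeros(NUM_CLASSES + 1, dtype=np.int64)
for mask_path in sorted(ANNOTATION_DIR.glob("*.png")):
    mask = np.asarray(Image.open(mask_path))
    counts = np.bincount(mask.ravel(), minlength=NUM_CLASSES + 1)
    pixel_counts += counts[: NUM_CLASSES + 1]

# Share of labeled pixels per class; rare classes like "pillow" sit near the bottom.
class_share = pixel_counts[1:] / pixel_counts[1:].sum()
print("Ten rarest class ids:", (np.argsort(class_share)[:10] + 1).tolist())
```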

In total, FIXER finds confirmed errors in 7% of ADE20K’s labels, with 48% of the discovered errors falling under 52 specific issues and the remaining 52% being general errors. FIXER’s output both selects a patch of pixels with a particular label and suggests a corrected class for that patch. Some of the error types FIXER captures are listed below (more examples are provided in the appendix), followed by a sketch of what a flagged patch looks like as a data record.

1 — General errors (not part of a specific issue). These are clear mistakes made by the labeler that don’t follow any particular pattern.

“Ceiling” incorrectly labeled as “floor”

2 — Ambiguous labels, where labels are inconsistently applied and the ground truth is unclear (these often require a judgment call)

Pedestrian streets are labeled as “road” rather than “sidewalk”

3 — False negatives, where a rare class (e.g. “cushion”) is missed in favor of a more common class (e.g. “sofa”)

Pixels where a “cushion” was mislabeled as “sofa”
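FIXER itself is not public and its methodology is not covered in this post, so the following is only an illustrative sketch of what a flagged-patch record of the kind described above might look like, and how an accepted suggestion could be written back into a label map. All names and fields are hypothetical, not FIXER’s actual interface.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class FlaggedPatch:
    """Hypothetical record for one flagged region (not FIXER's actual API)."""
    image_id: str
    pixel_mask: np.ndarray   # boolean HxW mask selecting the flagged pixels
    original_class: int      # class id the patch currently carries
    suggested_class: int     # class id proposed as the correction

def apply_correction(label_map: np.ndarray, patch: FlaggedPatch) -> np.ndarray:
    """After a human reviewer accepts the suggestion, overwrite the flagged pixels."""
    corrected = label_map.copy()
    corrected[patch.pixel_mask] = patch.suggested_class
    return corrected
```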

Conclusion

In this post, I introduce FIXER — the first technique for finding pixel-level errors in semantic segmentation datasets. On MIT ADE20K, it finds over 50 different label issues, covering general errors, ambiguous labels, and false negatives. While I focused on the outputs of FIXER in this post, I intend to present the underlying methodology in future work.

In object detection, FIXER previously found nearly 300,000 errors in MS COCO, and an upcoming post will showcase some surprising results on image classification.

If you’d like to hear about future posts, consider following me on Medium, Twitter or Linkedin. If this problem intrigues you, I’d love to chat: hello@setbreakpoint.com. We’re also actively looking for design partners/building a waitlist for Breakpoint (our no-code UI for improving models), consulting clients (you share your data, we send back a cleaned version), and founding engineers.

[1]: We estimated our error numbers by randomly choosing 25 flagged image patches from each class, for a total of 25 × 150 = 3,750 patches, and manually checking each one by hand. We then used the estimated per-class accuracy to compute the expected number of errors.
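A rough sketch of that extrapolation step, assuming the flagged patches are grouped by class and each sampled patch gets an is_error flag during manual review (the data structures here are illustrative, not the ones actually used):

```python
import random

def estimate_confirmed_errors(flagged_by_class, patches_per_class=25, seed=0):
    """Sample flagged patches per class, review them by hand, and extrapolate.

    flagged_by_class: dict mapping class id -> list of patches, where each
    patch has .pixel_count and, after manual review, .is_error (bool).
    """
    rng = random.Random(seed)
    expected_error_pixels = 0.0
    for class_id, patches in flagged_by_class.items():
        sample = rng.sample(patches, min(patches_per_class, len(patches)))
        # In practice, each sampled patch is inspected by a human at this point.
        confirmed_rate = sum(p.is_error for p in sample) / len(sample)
        expected_error_pixels += confirmed_rate * sum(p.pixel_count for p in patches)
    return expected_error_pixels
```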

While “7% of labels are incorrect” is simple and easy to understand, it actually misses a lot. For instance, in this dataset the “blind, screen” class accounts for 0.1% of total labels, and FIXER’s corrections more than triple that. That is a very significant change, but it is effectively a rounding error when looking at the percentage of labels corrected. Collectively, there are 8 classes that account for only 0.7% of total labels whose combined size increases by almost 250% after FIXER. This is the type of highly material change that should be measured.

To fix this, I argue that we should evaluate labels the same way we evaluate model predictions. That is, we can treat the original labels as “predictions” of the corrected labels and use the same metrics we use to evaluate our models, in this case mean Intersection over Union (mIoU). These metrics are already designed to handle issues like the rare-class problem above.
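As a concrete illustration of that idea, here is a minimal mIoU computation that treats the original label map as a prediction of the corrected one. How the numbers below are aggregated across the whole dataset is not specified in this post, so this is only the basic per-image form:

```python
import numpy as np

def mean_iou(original: np.ndarray, corrected: np.ndarray, num_classes: int = 150) -> float:
    """Score the original labels against the corrected labels with mean IoU.

    Both arrays hold integer class ids per pixel; class 0 ("unlabeled" in
    ADE20K) is ignored, and classes absent from both maps are skipped.
    """
    ious = []
    for c in range(1, num_classes + 1):
        pred, gt = original == c, corrected == c
        union = np.logical_or(pred, gt).sum()
        if union == 0:
            continue
        ious.append(np.logical_and(pred, gt).sum() / union)
    return float(np.mean(ious))
```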

Under this metric, we estimate that FIXER achieves improvements equal to 82.7 mIoU. As a comparison point, the current SOTA is 62.8 mIoU.

While there are currently no methods to benchmark against in this space, this point is worth noting as we continue to improve on FIXER. It is also noteworthy that the error rates for SOTA models are not that much lower than the error rates of the original labels.

Additional examples of general label errors


Additional examples of ambiguous labels

Blinds on top of windows are often labeled as windows
The ceiling moldings joining ceiling and walls are labeled as both “wall” and “ceiling”
Drawers in kitchen cabinets are sometimes labeled as “chest of drawers” (which typically refers to bedroom dressers)

Additional examples of false negatives

Ceiling fans are incorrectly labeled as “ceiling”
Blankets on armchairs are incorrectly labeled as “armchair”
Lights recessed into the ceiling are incorrectly labeled as “ceiling”




