Machine Learning News Hubb
Machine Learning with Expert Models: A Primer | by Samuel Flender | Sep, 2023

by admin
September 6, 2023
in Artificial Intelligence


How a decades-old idea enables training outrageously large neural networks today

Towards Data Science

(Image credit: Pexels)

Expert models are one of the most useful inventions in Machine Learning, yet they hardly receive the attention they deserve. In fact, expert modeling not only allows us to train neural networks that are “outrageously large” (more on that later); it also lets us build models that learn more like the human brain, in which different regions specialize in different types of input.

In this article, we’ll take a tour of the key innovations in expert modeling that ultimately led to recent breakthroughs such as the Switch Transformer and the Expert Choice Routing algorithm. But first, let’s go back to the paper that started it all: “Mixtures of Experts”.

Mixtures of Experts (1991)

The original MoE model from 1991. Image credit: Jacobs et al. 1991, Adaptive Mixtures of Local Experts.

The idea of mixtures of experts (MoE) traces back more than three decades, to a 1991 paper co-authored by none other than the godfather of AI, Geoffrey Hinton. The key idea in MoE is to model an output y by combining a number of experts E_i, the weight of each being controlled by a gating network G:

y = Σᵢ G(x)ᵢ · Eᵢ(x)

An expert in this context can be any kind of model, but is usually chosen to be a multi-layered neural network, and the gating network is

G(x) = softmax(x · W),

where W is a learnable matrix that assigns training examples to experts. When training MoE models, the learning objective is therefore two-fold:

  1. the experts will learn to process the input they’re given into the best possible output (i.e., a prediction), and
  2. the gating network will learn to “route” the right training examples to the right experts, by jointly learning the routing matrix W.
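As a sketch, the forward pass of such a (dense) MoE layer can be written in a few lines of NumPy. All dimensions here are illustrative assumptions, and each expert is reduced to a single linear map for brevity; in practice each expert would be a small MLP:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Toy dimensions (illustrative, not from the paper):
d_in, d_out, n_experts = 4, 2, 3

# Each "expert" is a single linear layer here for brevity.
experts = [rng.normal(size=(d_in, d_out)) for _ in range(n_experts)]

# W is the learnable routing matrix of the gating network.
W = rng.normal(size=(d_in, n_experts))

def moe_forward(x):
    # Gating network: G(x) = softmax(x @ W), one weight per expert.
    g = softmax(x @ W)                                  # (batch, n_experts)
    # Every expert processes the full input (dense MoE).
    outs = np.stack([x @ E for E in experts], axis=1)   # (batch, n_experts, d_out)
    # Combine: y = sum_i G(x)_i * E_i(x)
    return (g[..., None] * outs).sum(axis=1)            # (batch, d_out)

x = rng.normal(size=(5, d_in))
y = moe_forward(x)
print(y.shape)
```

In a real training loop, both the expert parameters and W would be updated jointly by backpropagation, which is exactly how the two-fold objective above is realized.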

Why should one do this? And why does it work? At a high level, there are three main motivations for using such an approach:

First, MoE allows scaling neural networks to very large sizes due to the sparsity of the resulting model, that is, even though the overall model is large, only a small…
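The sparsity idea can be sketched as top-k routing: only the k experts with the highest gating scores are run for each example, so compute stays roughly constant even as the number of experts grows. The code below is a minimal illustration under that assumption (names and dimensions are made up; it is not the paper’s implementation):

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

d, n_experts, k = 4, 8, 2
W = rng.normal(size=(d, n_experts))            # gating / routing matrix
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]

def sparse_moe_forward(x):
    logits = x @ W                              # (batch, n_experts)
    # Keep only the top-k experts per example; mask the rest to -inf
    # so their softmax weight becomes exactly zero.
    topk = np.argsort(logits, axis=-1)[:, -k:]
    mask = np.full_like(logits, -np.inf)
    np.put_along_axis(mask, topk,
                      np.take_along_axis(logits, topk, axis=-1), axis=-1)
    g = softmax(mask)                           # zero weight off the top-k
    y = np.zeros_like(x)
    for i, E in enumerate(experts):
        sel = g[:, i] > 0                       # run expert i only where routed
        if sel.any():
            y[sel] += g[sel, i, None] * (x[sel] @ E)
    return y, g

x = rng.normal(size=(6, d))
y, g = sparse_moe_forward(x)
print((g > 0).sum(axis=-1))  # number of active experts per example
```

Note how each expert only ever sees the subset of examples routed to it: this is what decouples total parameter count from per-example compute.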




