Machine Learning News Hubb

Evaluating GPT-3 and GPT-4 on the Winograd Schema Challenge (Reasoning Test) | by Denis Kazakov | Mar, 2023

March 16, 2023


Just a little fun benchmarking the new ChatGPT on confusing sentences.

I found that GPT-4 significantly outperforms GPT-3 on the Winograd Schema Challenge. Specifically, on the WSC285 Winograd set:

  • GPT-4 got an accuracy of 94.4%.
  • GPT-3 got 68.8%. *
  • The random baseline is 50% (since there are always only 2 options).

All the code/answers are in https://github.com/d-kz/gpt_winograd.
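
As a rough illustration of how accuracy figures like the ones above can be computed, here is a minimal sketch. It is not the code from the repository; the function name `accuracy` and the toy answers are made up for the example, and it assumes an answer counts as correct when it matches the expected option after ignoring case and surrounding whitespace.

```python
def accuracy(predictions, gold):
    """Return the fraction of predictions that match the gold answers.

    Matching ignores case and surrounding whitespace (an assumption of
    this sketch, not necessarily how the article's answers were scored).
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty set")
    correct = sum(
        p.strip().lower() == g.strip().lower()
        for p, g in zip(predictions, gold)
    )
    return correct / len(gold)


# Toy data, invented for illustration only.
gold_answers = ["the man", "Dan"]
model_answers = ["The man", "Bill"]
print(accuracy(model_answers, gold_answers))  # 0.5
```

With only two options per question, a model that guesses at random should land near 0.5 on this measure, which is why 50% is the baseline to beat.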

The Winograd Schema Challenge is a task used to evaluate natural language processing models. Each item is an ambiguous sentence that is tricky to resolve without general knowledge of how the world works.

For example,

“The man couldn’t lift his son because he was so weak.” In ‘he was so weak’, does ‘he’ refer to ‘the man’ or ‘the son’?

  • We know it can be difficult to lift somebody up if you are weak. Since ‘the man’ is lifting his son, we can assume it’s ‘the man’ that’s weak and not the son.

“Dan took the rear seat while Bill claimed the front because his "Dibs!" was slow.” Whose ‘dibs’ was slow, Bill’s or Dan’s?

  • To know that Dan was too slow with his Dibs, we need to know that front seats are the desirable ones. Otherwise, we can’t know who was too slow.

To reason like this, the model needs to both:

  1. have a good understanding of the world and
  2. be able to relate that understanding to the context it is presented with.

  • The ChatGPT UI was used to feed the GPT models data in batches of 50 rows, using the prompt:
You will receive rows of data. Each column is separated by ';' symbol. Columns are as following:"text";"pronoun";"quote";"options". 

Your job is to answer each row of data with the following question. What does "pronoun" in "quote" refer to in the "text", given "options"? Choose your answer from "options".

Pay particular attention to ambiguities and try to infer the answer using your knowledge of how the world works to get the right answer. Think it over three times before giving your answer.

Make sure your output is only one of the "options". Provide your answers as a list. don't repeat the question or give your reasoning, only give answers.

  • The context had to be reset after each batch (by starting a new conversation) to avoid degradation.
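
The batching described above was done by hand in the ChatGPT UI, but the same layout can be prepared in code. The sketch below is only an assumption about how one might build the messages: the helper names (`chunk`, `rows_to_block`, `build_messages`) are hypothetical, and each row is assumed to be a tuple of text, pronoun, quote, and options. Every returned message is meant to be pasted into a fresh conversation.

```python
import csv
import io

BATCH_SIZE = 50  # batch size stated in the article


def chunk(rows, size=BATCH_SIZE):
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def rows_to_block(rows):
    """Format rows as ';'-separated lines with every field quoted."""
    buf = io.StringIO()
    writer = csv.writer(
        buf, delimiter=";", quoting=csv.QUOTE_ALL, lineterminator="\n"
    )
    writer.writerows(rows)
    return buf.getvalue().rstrip("\n")


def build_messages(instructions, rows, size=BATCH_SIZE):
    """Return one message per batch: the instructions, then that batch's rows."""
    return [
        instructions + "\n\n" + rows_to_block(batch)
        for batch in chunk(rows, size)
    ]


# Toy rows, invented for illustration only.
rows = [
    (
        "The man couldn't lift his son because he was so weak.",
        "he",
        "he was so weak",
        "the man,the son",
    ),
]
messages = build_messages("<instructions go here>", rows)
print(len(messages))  # 1
```

Note that `csv.QUOTE_ALL` doubles any quotation marks that appear inside a field, so sentences containing quoted speech will look slightly different in the batch than in the original text.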

GPT models are trained on a lot of data, so we can only assume they didn’t cheat by simply reciting memorized answers to the Winograd challenge. The reasoning the models gave when explaining themselves suggests they didn’t, though.

Feel free to replicate this evaluation with a completely new (i.e. unseen by the GPT models) Winograd challenge set, but for now, you can just take GPT’s word for it?!:)

Switching the gender on the question still gets the right answer.

Prompting the models to explain their logic made GPT-4 correct its mistake, while GPT-3 remained firm with its initial decision.

GPT-4 corrects itself (LEFT), GPT-3 fails to correct itself (RIGHT)
All of the mistakes made by GPT-4



© 2023 Machine Learning News Hubb All rights reserved.
