Machine Learning News Hubb

The Best Way to do Named Entity Recognition (NER) | by Yujian Tang | Sep, 2022

admin by admin
September 8, 2022
in Machine Learning


Natural Language Processing

2 ways to do Named Entity Recognition in Python

Originally published on Plain Simple Software as How to do Named Entity Recognition NER with Python

Named Entity Recognition (NER) is a common Natural Language Processing technique. It’s so often used that it comes in the basic pipeline for spaCy. NER can help us quickly parse out a document for all the named entities of many different types. For example, if we’re reading an article, we can use named entity recognition to immediately get an idea of the who/what/when/where of the article.

In this post we’re going to cover three different ways you can implement NER in Python. We’ll be going over what NER is, and then how to do it with spaCy, with NLTK, and with a web API.

Named Entity Recognition, or NER for short, is the Natural Language Processing (NLP) topic about recognizing entities in a text document or speech file. Of course, this is quite a circular definition. In order to understand what NER really is, we’ll have to define what an entity is. For the purposes of NLP, an entity is essentially a noun that defines an individual, group of individuals, or a recognizable object. While there is not a TOTAL consensus on what kinds of entities there are, I’ve compiled a rather complete list of the possible types of entities that popular NLP libraries such as spaCy or Natural Language Toolkit (NLTK) can recognize. You can find the GitHub repo here.

Entity Type    Description of the NER object
PERSON         A person, usually recognized as a first and last name
NORP           Nationalities or Religious/Political Groups
FAC            The name of a Facility
ORG            The name of an Organization
GPE            The name of a Geopolitical Entity
LOC            A location
PRODUCT        The name of a product
EVENT          The name of an event
WORK OF ART    The name of a work of art
LAW            A law that has been published (US only as far as I know)
LANGUAGE       The name of a language
DATE           A date, doesn’t have to be an exact date, could be a relative date like “a day ago”
TIME           A time, like date it doesn’t have to be exact, it could be like “middle of the day”
PERCENT        A percentage
MONEY          An amount of money, like “$100”
QUANTITY       Measurements of weight or distance
CARDINAL       A number, similar to quantity but not a measurement
ORDINAL        A number, but signifying a relative position such as “first” or “second”
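To tie the table back to the who/what/when/where idea from the introduction, here is a minimal sketch that buckets (text, label) pairs by question. The grouping in QUESTION_LABELS and the name bucket_entities are my own assumptions for illustration; neither spaCy nor NLTK ships such a mapping, and the sample pairs are written by hand rather than produced by a model.

```python
from collections import defaultdict

# Hypothetical grouping of the labels in the table above; adjust to taste.
# spaCy spells the "WORK OF ART" label with underscores.
QUESTION_LABELS = {
    "who": {"PERSON", "NORP", "ORG"},
    "where": {"GPE", "LOC", "FAC"},
    "when": {"DATE", "TIME"},
    "what": {"PRODUCT", "EVENT", "WORK_OF_ART", "LAW", "LANGUAGE"},
}


def bucket_entities(entities):
    """Group (text, label) pairs under who/where/when/what; the rest go to 'other'."""
    buckets = defaultdict(list)
    for text, label in entities:
        # Pick the first question whose label set contains this label.
        question = next(
            (q for q, labels in QUESTION_LABELS.items() if label in labels),
            "other",
        )
        buckets[question].append(text)
    return dict(buckets)


# Hand-written sample pairs, not real model output.
sample = [
    ("Molly Moon", "PERSON"),
    ("United Nations", "ORG"),
    ("a day ago", "DATE"),
    ("$100", "MONEY"),
]
print(bucket_entities(sample))
```

Numeric labels such as MONEY or CARDINAL land in "other" here; whether they belong under "what" is a judgment call for your own use case.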

Earlier, I mentioned that you can implement NER with both spaCy and NLTK. The difference between these libraries is that NLTK is built for academic/research purposes and spaCy is built for production purposes. Both are free, open source libraries, and NER is easy to implement with either. In this article I will show you how to get started implementing your own Named Entity Recognition programs.

We’ll start with spaCy. To get started, run the commands below in your terminal to install the library and download a starter model.

pip install spacy
python -m spacy download en_core_web_sm

We can implement NER in spaCy in just a few lines of code. All we need to do is import the spacy library, load a model, give it some text to process, and then read the named entities off the processed document. For this example we’ll be using the “en_core_web_sm” model we downloaded earlier; this is the “small” model trained on web text. The text we’ll use is just a sentence I made up. We should expect the NER to identify Molly Moon as a person (NER isn’t advanced enough to detect that she is a cow), the United Nations as an organization, and the Climate Action Committee as a second organization.

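import spacy

# load the small English model downloaded earlier
nlp = spacy.load("en_core_web_sm")

text = "Molly Moon is a cow. She is part of the United Nations Climate Action Committee."

# run the pipeline; the recognized entities end up on doc.ents
doc = nlp(text)

# print each entity's text next to its label
for ent in doc.ents:
    print(ent.text, ent.label_)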

After we run this we should see a result like the one below. We see that this spaCy model does not separate the United Nations and its Climate Action Committee into two organizations.
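If printing is not enough, the same loop can collect the entities into plain dictionaries. This is a minimal sketch: the helper name entities_to_records and the dictionary keys are my own choices, while text, label_, start_char and end_char are attributes that spaCy exposes on each entity span. The spaCy import sits inside run_example so the helper itself can be reused without loading a model.

```python
def entities_to_records(doc):
    """Turn doc.ents into plain dicts that are easy to store or serialize."""
    return [
        {
            "text": ent.text,
            "label": ent.label_,
            "start": ent.start_char,  # character offset where the entity begins
            "end": ent.end_char,      # character offset just past the entity
        }
        for ent in doc.ents
    ]


def run_example():
    # Imported here so the helper above works without spaCy installed.
    import spacy

    nlp = spacy.load("en_core_web_sm")
    text = "Molly Moon is a cow. She is part of the United Nations Climate Action Committee."
    for record in entities_to_records(nlp(text)):
        print(record)
```

Call run_example() once the model from the install step is available. The offsets make it possible to highlight each entity in the original text.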

Let’s take a look at how to implement NER with NLTK. As with spaCy, we’ll start by installing the NLTK library and also downloading the extensions we need.

pip install nltk

After we run our initial pip install, we’ll need to download four extensions to get our Named Entity Recognition program running. I recommend simply firing up Python in your terminal and running these commands there. The extensions only need to be downloaded once, so including the download calls in your NER program will only slow it down.

python
>>> import nltk
>>> nltk.download("punkt")
>>> nltk.download("averaged_perceptron_tagger")
>>> nltk.download("maxent_ne_chunker")
>>> nltk.download("words")

Punkt is a pre-trained tokenizer that NLTK uses to split text into sentences before breaking them into words. Averaged Perceptron Tagger is the default part of speech tagger for NLTK. Maxent NE Chunker is the Named Entity Chunker for NLTK. The Words library is an NLTK corpus of words. We can already see here that NLTK is far more customizable, and consequently also more complex to set up. Let’s dive into the program to see how we can extract our named entities.
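If you would rather keep the setup inside a script, one option is to check for each package before downloading it. The sketch below is an assumption on my part, not something from the NLTK docs: ensure_resources is a hypothetical helper, and the resource paths are where these four packages are usually stored after nltk.download. The package names are the ones used in this article; other NLTK versions may name them differently. nltk.data.find raises LookupError when a resource is missing, which is what the helper relies on.

```python
# Package name -> path NLTK looks the resource up under (assumed layout).
RESOURCES = {
    "punkt": "tokenizers/punkt",
    "averaged_perceptron_tagger": "taggers/averaged_perceptron_tagger",
    "maxent_ne_chunker": "chunkers/maxent_ne_chunker",
    "words": "corpora/words",
}


def ensure_resources(resources=RESOURCES, find=None, download=None):
    """Download only the packages that are not already on disk.

    `find` and `download` default to nltk.data.find and nltk.download;
    they are parameters so the logic can be exercised without NLTK.
    """
    if find is None or download is None:
        import nltk

        find = find or nltk.data.find
        download = download or nltk.download

    fetched = []
    for package, path in resources.items():
        try:
            find(path)  # raises LookupError if the resource is missing
        except LookupError:
            download(package)
            fetched.append(package)
    return fetched
```

Calling ensure_resources() at the top of a script costs a few file lookups on later runs instead of repeating the downloads.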

Once again we simply start by importing our library and declaring our text. Then we’ll tokenize the text, tag the parts of speech, and chunk it using the named entity chunker. Finally, we’ll loop through our chunks and display the ones that are labeled.

import nltk

text = "Molly Moon is a cow. She is part of the United Nations' Climate Action Committee."

# tokenize, tag parts of speech, then chunk with the named entity chunker
tokenized = nltk.word_tokenize(text)
pos_tagged = nltk.pos_tag(tokenized)
chunks = nltk.ne_chunk(pos_tagged)

# only labeled chunks are named entities; plain tokens have no label
for chunk in chunks:
    if hasattr(chunk, 'label'):
        print(chunk)

When you run this program in your terminal you should see an output like the one below.

Notice that NLTK has identified “Climate Action Committee” as a Person and Moon as a Person. That’s clearly incorrect, but this is all based on pre-trained data. Also, this time I let it print out the entire chunk, so it shows the parts of speech. NLTK has tagged all of these as “NNP”, which signals a proper noun.
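To get output closer to the spaCy example, each labeled chunk can be flattened into an entity string and its label. This is a minimal sketch with a hypothetical helper name, chunk_to_entity; it relies on the label() and leaves() methods of the tree objects that ne_chunk returns, where leaves() gives (token, part of speech) pairs.

```python
def chunk_to_entity(chunk):
    """Return (entity_text, label) for a labeled chunk, or None for a plain token."""
    if not hasattr(chunk, "label"):
        return None
    # leaves() yields (token, part_of_speech) pairs; keep only the tokens.
    text = " ".join(token for token, _pos in chunk.leaves())
    return text, chunk.label()


def run_example():
    # Imported here so the helper above works without NLTK installed.
    import nltk

    text = "Molly Moon is a cow. She is part of the United Nations' Climate Action Committee."
    chunks = nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(text)))
    for chunk in chunks:
        entity = chunk_to_entity(chunk)
        if entity:
            print(entity)
```

Call run_example() after the four downloads above. The labels will be whatever the pre-trained chunker assigns, including the mistakes discussed above.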

Alright, now that we’ve discussed how to implement NER with open source libraries, let’s take a look at how we can do it without ever having to download extra packages and machine learning models! We can simply ping a web API that already has a pre-trained model and pipeline for tons of text processing needs. We’ll be using the open beta of The Text API. Scroll down to the bottom of the page and get your API key.

The only library we need to install is the requests library, and we only need to be able to send an API request as outlined in How to Send a Web API Request. So, let’s take a look at the code.

All we need is to construct a request to send to the endpoint, send the request, and parse the response. The API key should be passed in the headers as “apikey”, and we should also specify that the content type is JSON. The body simply needs to pass the text in. The endpoint that we’ll hit is “https://app.thetextapi.com/text/ner”. Once we get our response back, we’ll use the json library (native to Python) to parse it.

import requests
import json
from config import apikey  # assumes a local config.py that defines apikey

text = "Molly Moon is a cow. She is part of the United Nations' Climate Action Committee."
headers = {
    "Content-Type": "application/json",
    "apikey": apikey
}
body = {
    "text": text
}
url = "https://app.thetextapi.com/text/ner"

# send the request and pull the "ner" field out of the JSON response
response = requests.post(url, headers=headers, json=body)
ner = json.loads(response.text)["ner"]
print(ner)

Once we send this request, we should see an output like the one below.

Woah! Our API actually recognizes all three of the named entities successfully! Not only is using The Text API simpler than downloading multiple models and libraries, but in this use case, we can see that it’s also more accurate.
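Since a web request can fail in ways a local library call cannot, it may be worth wrapping the call with a timeout and a status check. The sketch below is one possible way to do that, not the API’s official client: the function names, the 10 second timeout, and the error handling are my assumptions. The only things taken from the example above are the endpoint, the two headers, the “text” field, and the “ner” key; nothing is assumed about the shape of the value stored under “ner”.

```python
import json

NER_URL = "https://app.thetextapi.com/text/ner"


def build_request(text, apikey):
    """Assemble the URL, headers and JSON body described above."""
    headers = {"Content-Type": "application/json", "apikey": apikey}
    body = {"text": text}
    return NER_URL, headers, body


def parse_ner(raw_text):
    """Parse the response body and return whatever is stored under "ner"."""
    payload = json.loads(raw_text)
    if "ner" not in payload:
        raise ValueError(f"no 'ner' key in response: {payload!r}")
    return payload["ner"]


def fetch_entities(text, apikey, timeout=10):
    # Imported here so the two helpers above work without requests installed.
    import requests

    url, headers, body = build_request(text, apikey)
    response = requests.post(url, headers=headers, json=body, timeout=timeout)
    response.raise_for_status()  # surface 4xx/5xx errors instead of parsing them
    return parse_ner(response.text)
```

With an API key in hand, fetch_entities(text, apikey) returns the same value that the script above prints.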

