Getting started with nltk-wordnet in Python
WordNet is a lexical database of English that ships as a corpus with the Natural Language Toolkit (NLTK) for Python. NLTK is a comprehensive set of tools that makes many Natural Language Processing (NLP) tasks straightforward. This tutorial covers the basic operations you can perform with WordNet through NLTK.
Prerequisites
To have a better understanding of this article, the reader should:
- Have basic knowledge of the Python language.
- Have Python installed.
- Have NLTK and its WordNet corpus installed.
Table of contents
- A start with Synonyms and Synsets
- Understanding NLTK Hypernyms and Hyponyms
- A look into Meronyms and Holonyms
- Understanding NLTK Entailments
- Conclusion
A start with Synonyms and Synsets
WordNet groups English words into sets of synonyms called Synsets (short for synonym sets). Every Synset is identified by a name made of three parts: a word, a part of speech (noun, verb, adjective, or adverb), and a number, as in book.n.01.
Each Synset stores words that share one meaning, and every Synset has a definition associated with it. A word with a single meaning belongs to one Synset, while a word with several meanings belongs to several. Synsets make it easier to look up one specific sense of a word in the WordNet database.
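To make the three-part naming concrete, here is a minimal sketch that splits a Synset name such as book.n.01 into its word, part of speech, and number. The helper parse_synset_name and the SynsetName tuple are illustrative names, not part of NLTK, and the sketch assumes every name follows the word.pos.number pattern.

```python
from typing import NamedTuple


class SynsetName(NamedTuple):
    """Illustrative container for the parts of a name; not an NLTK type."""
    word: str
    pos: str      # part-of-speech tag, e.g. 'n' for noun or 'v' for verb
    number: int


def parse_synset_name(name: str) -> SynsetName:
    # Split from the right so that a dot inside the word itself is preserved.
    word, pos, number = name.rsplit(".", 2)
    return SynsetName(word, pos, int(number))


print(parse_synset_name("book.n.01"))
# SynsetName(word='book', pos='n', number=1)
```

Splitting from the right is a small design choice: the last two fields have a fixed shape, while the word part may be a multi-word entry joined with underscores.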
Getting the Synsets of a word
The Synsets of a word represent its different senses; each one groups the word with other words that share that meaning. To get the Synsets of a given word, we use the function wn.synsets('word'), where wn is the alias under which we import wordnet. The function returns a list containing all the Synsets related to the word passed as the argument.
import nltk
from nltk.corpus import wordnet as wn
wn.synsets('book')
Output:
[Synset('book.n.01'),Synset('book.n.02'),Synset('record.n.05'),Synset('script.n.01'),Synset('ledger.n.01'),Synset('book.n.06'),Synset('book.n.07'),Synset('koran.n.01'),Synset('bible.n.01'),Synset('book.n.10'),Synset('book.n.11'),Synset('book.v.01'),Synset('reserve.v.04'),Synset('book.v.03'),Synset('book.v.04')]
The function also lets you restrict the results to one part of speech through the optional pos argument. For example, to get only the Synsets of a word that are verbs:
import nltk
from nltk.corpus import wordnet as wn
wn.synsets('book', pos=wn.VERB)
Output:
[Synset('book.v.01'),
Synset('reserve.v.04'),
Synset('book.v.03'),
Synset('book.v.04')]
Similarly, to get only the noun Synsets of a word, we pass wn.NOUN as the pos argument.
import nltk
from nltk.corpus import wordnet as wn
wn.synsets('book', pos=wn.NOUN)
Output:
[Synset('book.n.01'),Synset('book.n.02'),Synset('record.n.05'),Synset('script.n.01'),Synset('ledger.n.01'),Synset('book.n.06'),Synset('book.n.07'),Synset('koran.n.01'),Synset('bible.n.01'),Synset('book.n.10'),Synset('book.n.11')]
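Instead of calling synsets() once per part of speech, the results of a single call can be grouped afterwards. The following is a minimal sketch under the assumption that each item offers the name() and pos() accessors that NLTK Synsets provide; group_by_pos is an illustrative helper, not an NLTK function.

```python
from collections import defaultdict


def group_by_pos(synsets):
    """Map each part-of-speech tag to the names of the Synsets carrying it.

    Illustrative helper (not part of NLTK). It only relies on the
    name() and pos() accessors of each Synset.
    """
    groups = defaultdict(list)
    for synset in synsets:
        groups[synset.pos()].append(synset.name())
    # Return a plain dict so missing tags raise KeyError instead of
    # silently creating empty lists.
    return dict(groups)
```

With the wn import from the examples above, group_by_pos(wn.synsets('book')) should place the eleven noun Synsets listed earlier under 'n' and the four verb Synsets under 'v'.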
Getting the definition of a Synset
To get the definition of a Synset, call its definition() method, which returns a string describing the meaning shared by all of the Synset's lemmas.
There are two ways of reaching the Synset whose definition we want.
The first way is to index into the list returned by wn.synsets('word').
Example one:
import nltk
from nltk.corpus import wordnet as wn
synset_array = wn.synsets('book')
synset_array[1].definition()
Output:
'physical objects consisting of a number of pages bound together'
Example two
import nltk
from nltk.corpus import wordnet as wn
synset_array = wn.synsets('book')
synset_array[3].definition()
Output:
'a written version of a play or other dramatic composition; used in preparing for a performance'
The second way is to look a Synset up directly by its name with the wn.synset() function. The name combines the word, its part of speech, and its number. For instance:
import nltk
from nltk.corpus import wordnet as wn
wn.synset('book.n.02').definition()
Output:
physical objects consisting of a number of pages bound together
Example two
import nltk
from nltk.corpus import wordnet as wn
wn.synset('script.n.01').definition()
Output:
a written version of a play or other dramatic composition; used in preparing for a performance
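When a word has many senses, it can help to see every name next to its definition rather than looking them up one at a time. Below is a minimal sketch of that idea; sense_table and format_senses are illustrative helpers, not NLTK functions, and they assume each item offers the name() and definition() methods used in this section.

```python
def sense_table(synsets):
    """Return (name, definition) pairs, one per Synset, in the given order.

    Illustrative helper (not part of NLTK).
    """
    return [(synset.name(), synset.definition()) for synset in synsets]


def format_senses(synsets):
    """Render the pairs as 'name: definition' lines, ready for print()."""
    return "\n".join(f"{name}: {text}" for name, text in sense_table(synsets))
```

With the wn import from the examples above, print(format_senses(wn.synsets('book'))) would list each sense of 'book' on its own line.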
How to get Lemmas of a Synset
Lemmas are the individual words that belong to a Synset. The lemma_names() method returns a list of all the lemma names of a Synset. As with definitions, there are two ways to reach the Synset before calling it:
First way
import nltk
from nltk.corpus import wordnet as wn
synset_array = wn.synsets('book')
print(synset_array[3].lemma_names())
Output:
['script', 'book', 'playscript']
Second way
import nltk
from nltk.corpus import wordnet as wn
print(wn.synset('book.n.07').lemma_names())
Output:
['book', 'rule_book']
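Since each Synset covers only one sense, collecting the synonyms of a word across all of its senses means merging several lemma lists. The following is a minimal sketch of that merge; collect_lemma_names is an illustrative helper, not an NLTK function, and it assumes each item offers the lemma_names() method shown above.

```python
def collect_lemma_names(synsets, exclude=None):
    """Collect the distinct lemma names found across several Synsets.

    Illustrative helper (not part of NLTK). Names are kept in first-seen
    order; pass `exclude` to leave out the word that was looked up.
    """
    names = []
    for synset in synsets:
        for lemma in synset.lemma_names():
            if lemma != exclude and lemma not in names:
                names.append(lemma)
    return names
```

Applied to the two Synsets from the examples above (script.n.01 and book.n.07) with exclude='book', this would yield 'script', 'playscript', and 'rule_book'.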
Understanding NLTK Hypernyms and Hyponyms
A hyponym is a more specific Synset that falls under a more general one. In terms of inheritance, it is similar to the concept of a "child class."
hyponyms() is a method that returns a list containing all the Synsets that are hyponyms of the Synset it is called on.
A hypernym is the opposite of a hyponym: a more general Synset that covers the more specific one, much like a "parent class." A Synset's hypernyms are returned by the hypernyms() method, also as a list of Synsets.
For example, the words 'banana' and 'mango' are hyponyms of the word 'fruit' because they name more specific kinds of fruit. Conversely, 'fruit' is a hypernym of 'banana' and 'mango' because it refers to the general idea of fruits.
Example:
import nltk
from nltk.corpus import wordnet as wn
wn.synsets('eclipse')
Output:
[Synset('eclipse.n.01'), Synset('overshadow.v.01'), Synset('eclipse.v.02')]
To get the hyponyms:
import nltk
from nltk.corpus import wordnet as wn
print(wn.synset('eclipse.n.01').hyponyms())
Output:
[Synset('lunar_eclipse.n.01'), Synset('partial_eclipse.n.01'), Synset('solar_eclipse.n.01'), Synset('total_eclipse.n.01')]
To get the hypernyms:
import nltk
from nltk.corpus import wordnet as wn
print(wn.synset('partial_eclipse.n.01').hypernyms())
Output:
[Synset('eclipse.n.01')]
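Because every hypernym is itself a Synset, hypernyms() can be called repeatedly to climb from a specific concept toward more general ones. Here is a minimal sketch of that walk; hypernym_chain is an illustrative helper, not an NLTK function, and following only the first hypernym at each step is a simplifying assumption, since a Synset may have several.

```python
def hypernym_chain(synset):
    """Follow the first hypernym at each step until a Synset has none.

    Illustrative helper (not part of NLTK). Returns the visited Synsets,
    starting with the one passed in.
    """
    chain = [synset]
    while True:
        parents = chain[-1].hypernyms()
        if not parents:
            break
        parent = parents[0]
        if parent in chain:
            # Guard against looping forever if the hierarchy has a cycle.
            break
        chain.append(parent)
    return chain
```

With the wn import from the examples above, hypernym_chain(wn.synset('partial_eclipse.n.01')) would start with partial_eclipse.n.01 followed by eclipse.n.01 and continue upward from there. NLTK Synsets also provide a built-in hypernym_paths() method that returns every path to the root rather than a single chain.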
A look into Meronyms and Holonyms
Meronyms and Holonyms describe a part-to-whole relationship.
The Meronym represents the part, whereas the Holonym represents the whole. Both describe the same relationship, seen from opposite ends.
For example, the word 'bedroom' is a Meronym of 'home' because a bedroom is a component of a home, which in turn makes 'home' a Holonym of 'bedroom'. Likewise, the nose, eyes, and mouth are Meronyms of the face.
Examples:
import nltk
from nltk.corpus import wordnet as wn
wn.synsets('face')
Output:
[Synset('face.n.01'),Synset('expression.n.01'),Synset('face.n.03'), Synset('face.n.04'), Synset('face.n.05'),Synset('side.n.04'),Synset('face.n.07'),Synset('face.n.08'),Synset('grimace.n.01'),Synset('font.n.01'),Synset('face.n.11'),Synset('boldness.n.02'),Synset('face.n.13'), Synset('confront.v.02'),Synset('confront.v.01'),Synset('front.v.01'),Synset('face.v.04'),Synset('face.v.05'),Synset('confront.v.03'),Synset('face.v.07'),Synset('face.v.08'),Synset('face.v.09')]
Example for Holonym:
import nltk
from nltk.corpus import wordnet as wn
wn.synset('face.n.01').part_holonyms()
Output:
[Synset('head.n.01'), Synset('homo.n.02')]
Example for Meronym
import nltk
from nltk.corpus import wordnet as wn
wn.synset('face.n.01').part_meronyms()
Output:
[Synset('beard.n.01'),Synset('brow.n.01'),Synset('cheek.n.01'),Synset('chin.n.01'),Synset('eye.n.01'),Synset('eyebrow.n.01'),Synset('facial.n.01'),Synset('facial_muscle.n.01'),Synset('facial_vein.n.01'),Synset('feature.n.02'),Synset('jaw.n.02'),Synset('jowl.n.02'),Synset('mouth.n.02'),Synset('nose.n.01')]
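The two methods above can be combined into a simple part-of check. This is a minimal sketch, assuming both arguments offer the part_holonyms() and part_meronyms() methods used in this section; is_part_of and part_names are illustrative helpers, not NLTK functions.

```python
def is_part_of(part, whole):
    """Return True if `whole` is listed among the part holonyms of `part`.

    Illustrative helper (not part of NLTK). Only the direct relation is
    checked, not parts of parts.
    """
    return whole in part.part_holonyms()


def part_names(whole):
    """Return the names of the direct part meronyms of `whole`, sorted."""
    return sorted(part.name() for part in whole.part_meronyms())
```

Given the holonym output above, is_part_of(wn.synset('face.n.01'), wn.synset('head.n.01')) should be True, because head.n.01 appears among the part holonyms of face.n.01.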
Understanding NLTK Entailments
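An entailment is an implication: something that necessarily follows from a statement even though it is not expressed explicitly. In WordNet, entailments relate verbs: one verb entails another when the first action cannot take place without the second.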
For example
import nltk
from nltk.corpus import wordnet as wn
wn.synsets('eat')
Output:
[Synset('eat.v.01'),
Synset('eat.v.02'),
Synset('feed.v.06'),
Synset('eat.v.04'),
Synset('consume.v.05'),
Synset('corrode.v.01')]
Example for Entailments
import nltk
from nltk.corpus import wordnet as wn
wn.synset('eat.v.01').entailments()
Output:
[Synset('chew.v.01'), Synset('swallow.v.01')]
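Not every sense of a verb has entailments, so it can be useful to check all senses at once. The following is a minimal sketch of that; entailment_map is an illustrative helper, not an NLTK function, and it assumes each item offers the name() and entailments() methods shown above.

```python
def entailment_map(synsets):
    """Map each Synset name to the names of the Synsets it entails.

    Illustrative helper (not part of NLTK). Senses without entailments
    map to an empty list.
    """
    return {
        synset.name(): [entailed.name() for entailed in synset.entailments()]
        for synset in synsets
    }
```

With the wn import from the examples above, entailment_map(wn.synsets('eat')) would contain the entry 'eat.v.01' mapped to chew.v.01 and swallow.v.01, matching the output shown earlier.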
Conclusion
In this article, we have looked at the main concepts exposed by NLTK's WordNet interface in Python.
We started with synonyms and Synsets, using different methods to get the Synsets of a word, the definition of a Synset, and all the lemmas of a Synset.
Then, we looked at Hypernyms and Hyponyms. Lastly, we went through Meronyms, Holonyms, and Entailments.
Peer Review Contributions by: Jerim Kaura