Natural language processing (NLP)

Natural language processing (NLP) is a branch of artificial intelligence that helps computers understand, interpret, and manipulate human language.

Indirakumar S
·Sep 21, 2022·

Table of contents

  • RECAP
  • What is NLP
  • Techniques:
  • Applications:
  • See you soon

RECAP

In the previous blog, we learned what reinforcement learning is and the types of approaches it can take. Here, we are going to look at a very interesting concept called "NLP".

What is NLP

Natural language processing (NLP) is a field of computer science, and more specifically of artificial intelligence (AI), that focuses on enabling computers to comprehend spoken and written language much as humans do.

NLP combines computational linguistics, the rule-based modelling of human language, with statistical, machine learning, and deep learning models. Together, these technologies let computers process human language in the form of text or audio data and "understand" much of what is being said or written, including the speaker's or writer's intent and sentiment.

NLP powers computer programmes that translate text between languages, respond to spoken commands, and quickly summarise large amounts of text, even in real time. You've probably used NLP through voice-activated GPS devices, digital assistants, speech-to-text dictation programmes, customer service chatbots, and other consumer conveniences. NLP is also increasingly used in corporate solutions to streamline company operations, boost worker productivity, and simplify mission-critical business procedures.

Techniques:

Segmentation

Before a text can be processed further, it must be segmented into its larger processing units: sentences, each made up of one or more words. To complete this task, you must recognise the boundaries between sentences. In most written languages, punctuation marks appear at the ends of sentences, but the same marks also appear elsewhere (for example, the full stop in an abbreviation), so each candidate boundary has to be checked. This is why sentence segmentation is also known as sentence boundary detection, sentence boundary disambiguation, or sentence boundary identification. All of these terms describe the same task: deciding how to split a text into sentences for further processing.
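
As a rough illustration, sentence boundary detection can be sketched with a single regular expression. The rule used here is our own simplifying assumption (a boundary is ".", "!" or "?" followed by whitespace and a capital letter), and the name split_sentences is hypothetical. Real segmenters are considerably more careful.

```python
import re

# Naive rule (an assumption for illustration): a sentence ends at ".", "!" or "?"
# when that mark is followed by whitespace and then an uppercase letter.
BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")

def split_sentences(text):
    """Split text into sentences at every position matching BOUNDARY."""
    return [part.strip() for part in BOUNDARY.split(text) if part.strip()]

print(split_sentences("NLP is fun. Is it hard? Not always!"))
# ['NLP is fun.', 'Is it hard?', 'Not always!']

# The same rule is fooled by the full stop in an abbreviation:
print(split_sentences("Dr. Smith arrived. He sat down."))
# ['Dr.', 'Smith arrived.', 'He sat down.']
```

The second call shows why punctuation alone is not enough: the full stop after "Dr" is wrongly treated as a sentence boundary.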

Tokenisation

Tokenisation breaks up the sequence of characters in a text by identifying word boundaries, the locations where one word ends and another begins. In computational linguistics, the resulting words are referred to as tokens. In written languages where the writing system does not explicitly mark word boundaries, tokenisation is also called word segmentation, and the two terms are sometimes used interchangeably.
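
A minimal sketch of this idea, assuming that a token is either a run of word characters or a single punctuation mark (the tokenize function and its pattern are illustrative choices, not a standard):

```python
import re

# A token is either a run of word characters or one punctuation mark.
TOKEN = re.compile(r"\w+|[^\w\s]")

def tokenize(text):
    """Return the list of tokens found in text, in order."""
    return TOKEN.findall(text)

print(tokenize("Computers read text, slowly."))
# ['Computers', 'read', 'text', ',', 'slowly', '.']
```

This approach relies on whitespace and punctuation, so it does not carry over to writing systems that do not mark word boundaries.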

Stop words

The terms "stop words", "stop word list", and "stop list" come up frequently when working with text mining software. Stop words are the most frequently used words of a language, and every language has them, not only English. Because they carry little meaning on their own, they are often filtered out before further processing. Common categories include:

Determiners: Determiners mark nouns; a determiner is ordinarily followed by a noun. Examples: the, a, an, another.

Coordinating conjunctions: Coordinating conjunctions join words, phrases, and clauses. Examples: for, and, nor, but, or, yet, so.

Prepositions: Prepositions express spatial or temporal relationships. Examples: in, beneath, toward, before.
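
The categories above can be turned into a small filter. The stop list below contains only the example words from this section, so it is far from complete, and remove_stop_words is a hypothetical helper name.

```python
# A tiny stop list built from the examples above (real lists are much longer).
STOP_WORDS = {
    "the", "a", "an", "another",                   # determiners
    "for", "and", "nor", "but", "or", "yet", "so",  # coordinating conjunctions
    "in", "beneath", "toward", "before",           # prepositions
}

def remove_stop_words(tokens):
    """Keep only the tokens that are not in STOP_WORDS (case-insensitive)."""
    return [token for token in tokens if token.lower() not in STOP_WORDS]

print(remove_stop_words(["The", "cat", "sat", "in", "a", "box"]))
# ['cat', 'sat', 'box']
```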

Stemming

Stemming is a method for removing affixes from words to reveal their base form. It is analogous to trimming a tree's branches down to the trunk. For instance, "eat" is the stem of "eating", "eats", and "eaten".

Search engines use stemming to index words. Instead of recording every variation of a word, a search engine can store only its stem. This reduces the size of the index, and a query for one form of a word also retrieves documents containing its other forms, which typically improves recall.
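
To make the indexing idea concrete, here is a toy suffix-stripping stemmer together with a stem-based index. The suffix list and minimum stem length are assumptions chosen for this example; this is not the Porter algorithm or any other published stemmer.

```python
# Suffixes to strip, checked in this order (an illustrative choice).
SUFFIXES = ("ing", "en", "ed", "s")
MIN_STEM_LENGTH = 3  # do not strip if the remaining stem would be shorter

def stem(word):
    """Strip the first matching suffix, if enough of the word remains."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[: -len(suffix)]
    return word

def build_index(documents):
    """Map each stem to the set of ids of documents containing a variant of it."""
    index = {}
    for doc_id, text in documents.items():
        for word in text.split():
            index.setdefault(stem(word), set()).add(doc_id)
    return index

index = build_index({1: "she eats apples", 2: "he was eating", 3: "they have eaten"})
print(index["eat"])      # all three documents share the stem "eat"
print(stem("studies"))   # 'studie': a stem does not have to be a real word
```

One index entry ("eat") now covers three different word forms, which is the saving described above.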

Lemmatization

Lemmatization is similar to stemming. Its output is called a "lemma", which is a root word, whereas the output of stemming is a root stem that may not be a real word. After lemmatization, we obtain a valid word with the same meaning.

The WordNetLemmatizer class from NLTK is a thin wrapper around the WordNet corpus. To find a lemma, it uses the morphy() method of the WordNetCorpusReader class.
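
NLTK's lemmatizer needs the WordNet data to be downloaded, so the sketch below uses a tiny hand-written lookup table instead. The table entries, the part-of-speech codes ("n", "v", "a"), and the lemmatize function are all assumptions made for illustration. With NLTK and its WordNet data installed, the corresponding call would be WordNetLemmatizer().lemmatize(word, pos).

```python
# Hypothetical mini-dictionary: (word form, part of speech) -> lemma.
LEMMAS = {
    ("eating", "v"): "eat",
    ("eaten", "v"): "eat",
    ("ate", "v"): "eat",
    ("studies", "n"): "study",
    ("mice", "n"): "mouse",
    ("better", "a"): "good",
}

def lemmatize(word, pos="n"):
    """Look the word up in LEMMAS; unknown words are returned unchanged."""
    word = word.lower()
    return LEMMAS.get((word, pos), word)

print(lemmatize("ate", "v"))   # 'eat'
print(lemmatize("mice"))       # 'mouse'
print(lemmatize("table"))      # 'table' (not in the table, so unchanged)
```

Unlike suffix stripping, a dictionary lookup can handle irregular forms such as "ate" and "mice", and it always returns a valid word.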

Part of Speech (PoS) Tagging

This method takes a sentence as a list of words and returns a list of tuples, each of the form (word, tag). The part-of-speech tag indicates whether the word is a noun, adjective, verb, etc.
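
The (word, tag) output format can be shown with a toy lexicon-based tagger. The lexicon, the tag names, and the default of tagging unknown words as nouns are assumptions for this sketch. Real taggers also use the surrounding context.

```python
# Hypothetical mini-lexicon mapping words to part-of-speech tags.
LEXICON = {"the": "DET", "loud": "ADJ", "dog": "NOUN", "barks": "VERB"}
DEFAULT_TAG = "NOUN"  # simple baseline: treat unknown words as nouns

def pos_tag(tokens):
    """Return a list of (word, tag) tuples, one per input token."""
    return [(token, LEXICON.get(token.lower(), DEFAULT_TAG)) for token in tokens]

print(pos_tag(["The", "loud", "dog", "barks"]))
# [('The', 'DET'), ('loud', 'ADJ'), ('dog', 'NOUN'), ('barks', 'VERB')]
```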

Named Entity Recognition

The term "Named Entity" was first introduced at the Sixth Message Understanding Conference (MUC-6) for the task of recognising names of organisations, people, and places in text, as well as expressions of money, time, and percentages. Since then, named entity recognition (NER) and information extraction (IE) from text have attracted growing interest at numerous scientific events.
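
A rule-based sketch of the idea, assuming regular expressions for money, percentages, and times, plus a small name list (a "gazetteer") for organisations and places. The patterns, labels, and names such as "Acme" are made up for this example. Modern NER systems are usually learned from data.

```python
import re

# Pattern-based entity types (illustrative patterns, not exhaustive).
PATTERNS = [
    ("MONEY", re.compile(r"\$\d+(?:\.\d+)?")),
    ("PERCENT", re.compile(r"\d+(?:\.\d+)?%")),
    ("TIME", re.compile(r"\b\d{1,2}:\d{2}\b")),
]

# Hypothetical gazetteer: known names and their entity types.
GAZETTEER = {"Acme": "ORGANIZATION", "London": "LOCATION"}

def find_entities(text):
    """Return (entity text, label) pairs in order of appearance."""
    found = []
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            found.append((match.start(), match.group(), label))
    for name, label in GAZETTEER.items():
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append((match.start(), match.group(), label))
    # Sort by start position, then drop the position from the result.
    return [(entity, label) for _, entity, label in sorted(found)]

print(find_entities("Acme paid $250 for a 5% stake in London at 10:30."))
# [('Acme', 'ORGANIZATION'), ('$250', 'MONEY'), ('5%', 'PERCENT'),
#  ('London', 'LOCATION'), ('10:30', 'TIME')]
```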

Applications:

Speech recognition: Speech recognition, also called speech-to-text, is the task of reliably converting voice data into text. Any programme that responds to voice commands or spoken questions needs it. It is particularly difficult because of the way people speak: quickly, slurring words together, with varying emphasis and intonation, in various dialects, and often with incorrect grammar.

Part of speech tagging: Part of speech tagging, also known as grammatical tagging, is the technique of identifying the part of speech of a word or piece of text based on its use and context. For example, "make" is tagged as a verb in "I can make a paper plane" and as a noun in "What make of automobile do you own?"

Word sense disambiguation: Word sense disambiguation selects the meaning of a word that has several possible meanings, using semantic analysis to determine which sense fits best in the given context. For example, it helps distinguish the meaning of the verb "make" in "make the grade" (achieve) from its meaning in "make a bet" (place).

Sentiment analysis: Sentiment analysis tries to extract subjective qualities from text, such as attitudes, feelings, sarcasm, bewilderment, and mistrust.

Natural language generation: Natural language generation is the task of turning structured data into human language. It is sometimes described as the reverse of speech recognition or speech-to-text.
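
Two of the applications above, word sense disambiguation and sentiment analysis, can be sketched in a few lines. Both sketches rest on assumptions made for this example: the sense descriptions for "make" and the positive and negative word lists are invented, and the disambiguation rule (pick the sense whose description shares the most words with the context) is a simplified form of the overlap idea behind the Lesk algorithm.

```python
# Hypothetical sense inventory: each sense has a short description (gloss).
SENSES = {
    "make": {
        "achieve": "reach or achieve a required standard such as a grade or a team",
        "place": "place or lay a bet or wager of money on an outcome",
    }
}

def disambiguate(word, context_tokens):
    """Pick the sense whose gloss shares the most words with the context."""
    context = {token.lower() for token in context_tokens}
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & set(gloss.split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

# Hypothetical sentiment word lists.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "terrible", "hate", "poor"}

def sentiment(tokens):
    """Count positive minus negative words and map the score to a label."""
    score = sum((t.lower() in POSITIVE) - (t.lower() in NEGATIVE) for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(disambiguate("make", "did she make the grade".split()))  # 'achieve'
print(disambiguate("make", "I will make a bet".split()))       # 'place'
print(sentiment("I love this great phone".split()))            # 'positive'
print(sentiment("terrible battery and poor screen".split()))   # 'negative'
```

Word counting of this kind cannot detect sarcasm or bewilderment, which is part of what makes sentiment analysis hard in practice.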

See you soon

Turn on your notifications and stay in touch with us to learn about new ideas. Share this blog with your friends and family. I will see you in the next blog. Thank you all.

 