In English grammar, words are grouped into classes, called parts of speech, according to their grammatical properties. At the highest level, the parts of speech are the noun, pronoun, adjective, verb, adverb, preposition, conjunction, interjection, numeral, article, and determiner.
In natural language processing, part-of-speech (PoS) tagging is the task of assigning a part of speech to each word in a sentence. PoS tagging supports applications such as information retrieval, text-to-speech conversion, and word sense disambiguation. Predefined tag sets, such as the one used in UPenn TreeBank II, help keep annotations standardized.
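A tagger's output is simply one tag per word. As a minimal illustration, here is a hand-tagged sentence that uses a few tags from the Penn Treebank set, printed in the word/TAG notation commonly used for tagged corpora. The sentence, its tags, and the `to_slash_format` helper are hypothetical and were chosen only for this example:

```python
# A hand-tagged example sentence (hypothetical) using a few Penn Treebank tags:
# DT = determiner, NN = singular noun, VBZ = 3rd-person singular present verb, RB = adverb.
tagged = [("The", "DT"), ("dog", "NN"), ("barks", "VBZ"), ("loudly", "RB")]


def to_slash_format(pairs):
    """Render (word, tag) pairs in the word/TAG notation (hypothetical helper)."""
    return " ".join(f"{word}/{tag}" for word, tag in pairs)


print(to_slash_format(tagged))  # The/DT dog/NN barks/VBZ loudly/RB
```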
PoS tagging is challenging primarily because language is ambiguous: the same word can belong to different parts of speech depending on context. Take the sentence “flies like a flower” as an example:
- “Flies” could be a noun or a verb.
- “Like” could be a preposition, adverb, conjunction, noun, or verb.
- “A” could be an article, noun, or preposition.
- “Flower” could be a noun or a verb.
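Because each word's options multiply, the number of candidate tag sequences grows quickly even for a short sentence. A minimal sketch that enumerates them, using only the options listed above (the `LEXICON` name is illustrative):

```python
from itertools import product

# Possible parts of speech for each word, as listed above.
LEXICON = {
    "flies": ["noun", "verb"],
    "like": ["preposition", "adverb", "conjunction", "noun", "verb"],
    "a": ["article", "noun", "preposition"],
    "flower": ["noun", "verb"],
}

sentence = ["flies", "like", "a", "flower"]

# Every combination of one tag per word is a candidate tagging.
candidates = list(product(*(LEXICON[word] for word in sentence)))
print(len(candidates))  # 2 * 5 * 3 * 2 = 60
```

Only a handful of these 60 sequences are plausible readings, for example (verb, preposition, article, noun) or (noun, verb, article, noun). A tagger has to use context to choose among them.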
The main approaches to tagging are rule-based, transformation-based, and stochastic (probabilistic); the stochastic approach is the most commonly used.
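One common stochastic technique is a hidden Markov model (HMM) decoded with the Viterbi algorithm, which finds the most probable tag sequence given start, transition, and emission probabilities. The sketch below is a toy example under stated assumptions: the four-tag set and every probability are hypothetical numbers chosen for illustration, not values estimated from a corpus, and the names `TAGS`, `START`, `TRANS`, `EMIT`, and `viterbi` are illustrative:

```python
# Toy HMM (hypothetical probabilities, chosen for illustration only).
TAGS = ["NOUN", "VERB", "PREP", "DET"]

# P(first tag)
START = {"NOUN": 0.5, "VERB": 0.3, "PREP": 0.1, "DET": 0.1}

# P(next tag | previous tag)
TRANS = {
    "NOUN": {"NOUN": 0.2, "VERB": 0.5, "PREP": 0.2, "DET": 0.1},
    "VERB": {"NOUN": 0.2, "VERB": 0.1, "PREP": 0.3, "DET": 0.4},
    "PREP": {"NOUN": 0.3, "VERB": 0.05, "PREP": 0.05, "DET": 0.6},
    "DET": {"NOUN": 0.9, "VERB": 0.05, "PREP": 0.02, "DET": 0.03},
}

# P(word | tag); words missing from a tag's table get probability 0.
EMIT = {
    "NOUN": {"flies": 0.1, "like": 0.01, "a": 0.001, "flower": 0.2},
    "VERB": {"flies": 0.2, "like": 0.3, "flower": 0.01},
    "PREP": {"like": 0.4},
    "DET": {"a": 0.9},
}


def viterbi(words):
    """Return (best_tag_sequence, probability) for `words` under the toy HMM."""
    # delta[t][tag]: probability of the best path that ends in `tag` at position t.
    delta = [{tag: START[tag] * EMIT[tag].get(words[0], 0.0) for tag in TAGS}]
    backptr = [{}]
    for t in range(1, len(words)):
        delta.append({})
        backptr.append({})
        for tag in TAGS:
            # Choose the previous tag that maximizes path probability * transition.
            prev = max(TAGS, key=lambda p: delta[t - 1][p] * TRANS[p][tag])
            delta[t][tag] = (delta[t - 1][prev] * TRANS[prev][tag]
                             * EMIT[tag].get(words[t], 0.0))
            backptr[t][tag] = prev
    # Start from the most probable final tag and follow the back-pointers.
    last = max(TAGS, key=lambda tag: delta[-1][tag])
    path = [last]
    for t in range(len(words) - 1, 0, -1):
        path.append(backptr[t][path[-1]])
    path.reverse()
    return path, delta[-1][last]


tags, prob = viterbi(["flies", "like", "a", "flower"])
print(tags, prob)
```

With these particular numbers the decoder picks VERB, PREP, DET, NOUN, reading "flies" as an imperative verb; different probabilities could favor the (NOUN, VERB, DET, NOUN) reading instead. The sketch multiplies raw probabilities for readability; on longer sentences, summing log probabilities avoids numerical underflow.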