Out-of-Vocabulary (OOV)
Out-of-vocabulary (OOV) terms are words that are not part of the lexicon known to a natural language processing system.
In speech recognition, it is the audio signal that contains these terms. Word vectors (embeddings) are a numerical representation of word meaning. Their limitation is that a word must have appeared in the training data in order to have a vector.
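The embedding limitation can be illustrated with a minimal sketch. The table, its vectors, the `<unk>` entry, and the `lookup` helper below are all hypothetical and invented for illustration; falling back to a shared "unknown" vector is one common convention, not something this entry prescribes.

```python
# Toy embedding table. The words, vectors, and "<unk>" entry are made up.
EMBEDDINGS = {
    "cat": [0.2, 0.7],
    "dog": [0.3, 0.6],
    "<unk>": [0.0, 0.0],  # shared placeholder vector for unseen words
}


def lookup(word):
    """Return (vector, is_oov), falling back to the <unk> vector for unseen words."""
    if word in EMBEDDINGS:
        return EMBEDDINGS[word], False
    return EMBEDDINGS["<unk>"], True


vector, is_oov = lookup("axolotl")  # "axolotl" was never seen, so is_oov is True
```

Every OOV word receives the same placeholder vector here, so whatever distinguishes one unseen word from another is lost.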
When a word that is not in the training set occurs in real data, a model that estimates probabilities from counts assigns it a probability of zero. There are various techniques to avoid this zero-probability problem, including smoothing and replacing the word with a synonym that is in the vocabulary.
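As one example of smoothing, here is a minimal sketch of add-one (Laplace) smoothing for a unigram model. The function name `laplace_unigram`, the `alpha` parameter, and the toy sentence are assumptions made for illustration, as is the choice to reserve a single vocabulary slot for all unseen words.

```python
from collections import Counter


def laplace_unigram(tokens, alpha=1.0):
    """Build a unigram probability function with add-alpha smoothing."""
    counts = Counter(tokens)
    vocab_size = len(counts) + 1  # +1 reserves one slot for any unseen word
    total = len(tokens)

    def prob(word):
        # Counter returns 0 for unseen words, so they still get alpha in the numerator.
        return (counts[word] + alpha) / (total + alpha * vocab_size)

    return prob


prob = laplace_unigram("the cat sat on the mat".split())
p_seen = prob("the")    # (2 + 1) / (6 + 6)
p_unseen = prob("dog")  # (0 + 1) / (6 + 6), small but not zero
```

The unseen word gets a small nonzero probability, and the seen words give up a little probability mass to make room for it.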