Word embedding

A word embedding (or word vector) refers to a dense vector representation of a word.

Before word embeddings, words were typically represented with sparse vectors, as in bag-of-words or n-gram models.
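
To make the contrast concrete, here is a minimal sketch comparing a sparse one-hot vector with a dense embedding vector. The toy vocabulary and the embedding values are made up for illustration, not trained:

```python
import numpy as np

# Toy vocabulary, purely illustrative.
vocab = ["king", "queen", "man", "woman", "apple"]

# Sparse one-hot representation: one dimension per vocabulary word,
# a single 1 and zeros elsewhere; its length grows with the vocabulary.
one_hot_king = np.zeros(len(vocab))
one_hot_king[vocab.index("king")] = 1.0
print(one_hot_king)  # [1. 0. 0. 0. 0.]

# Dense embedding: a short vector of real numbers that would normally
# be learned from data (these values are invented for the example).
embedding_king = np.array([0.21, -0.53, 0.77, 0.04])
print(embedding_king)
```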

Word embeddings are typically trained with an unsupervised model over a large corpus of text. During training, the vectors are updated to better predict elements of the corpus, such as the words surrounding a given target word.
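
As one possible illustration of this training setup, the sketch below uses the gensim library's Word2Vec implementation in skip-gram mode, where each target word is used to predict its surrounding words. The corpus here is a tiny made-up example; in practice the corpus would be far larger:

```python
from gensim.models import Word2Vec

# Tiny illustrative corpus (lists of tokens); real training uses a large corpus.
corpus = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["a", "man", "walks", "in", "the", "park"],
    ["a", "woman", "walks", "in", "the", "park"],
]

# Skip-gram (sg=1): vectors are updated to better predict the words
# that surround each target word within the given window.
model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1, epochs=50)

# Each word in the vocabulary now maps to a dense 50-dimensional vector.
print(model.wv["king"].shape)  # (50,)
```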

At the end of this process, word embeddings often have geometric relations to each other that encode semantic meaning. A common example is using vector addition and subtraction to find related words: the vector King_v − Man_v + Woman_v is most similar to the vector Queen_v. This example comes from the project page for word2vec.
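
One way to try this arithmetic is sketched below, assuming gensim and the pretrained "glove-wiki-gigaword-50" vectors distributed via gensim-data (downloaded on first use, so network access is assumed):

```python
import gensim.downloader as api

# Load pretrained 50-dimensional GloVe vectors from gensim-data.
wv = api.load("glove-wiki-gigaword-50")

# King - Man + Woman: add "king" and "woman", subtract "man",
# then find the words whose vectors are nearest to the result.
result = wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3)
print(result)  # "queen" typically ranks near the top
```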

Common implementations of word embeddings include: