Word embedding

A word embedding (or word vector) refers to a dense vector representation of a word.

Before word embeddings, words were typically represented with sparse vectors in bag-of-words models or with n-grams.
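
To make the contrast concrete, here is a minimal sketch (not from the original text) using a made-up three-document corpus and illustrative embedding values: scikit-learn's CountVectorizer builds the sparse bag-of-words vectors, while a dense embedding assigns each word a short vector of real numbers.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

corpus = ["the cat sat", "the dog sat", "the cat ran"]

# Bag-of-words: each document becomes a sparse count vector over the vocabulary,
# which is mostly zeros once the vocabulary grows large.
bow = CountVectorizer().fit_transform(corpus)
print(bow.toarray())

# Word embedding: each *word* maps to a short dense vector of real numbers
# (values here are invented purely for illustration).
embedding = {
    "cat": np.array([0.21, -0.53, 0.90]),
    "dog": np.array([0.18, -0.47, 0.85]),
}
```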

Word embeddings are typically trained with an unsupervised model over a large corpus of text. During training, the vectors are updated to better predict elements of the corpus, such as the words surrounding a given target word.
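
As an illustration of this training setup, the sketch below fits a skip-gram word2vec model with the gensim library (the 4.x API is assumed); the tokenized corpus and hyperparameters are placeholders, and realistic embeddings require far more text.

```python
from gensim.models import Word2Vec

# Toy tokenized corpus; a real model would be trained on millions of sentences.
sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
]

# Skip-gram (sg=1): each word's vector is updated to predict the words that
# surround it within a +/- `window` token context.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=20)

vector = model.wv["queen"]  # a 50-dimensional dense vector
```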

At the end of this process, word embeddings often have geometric relations to each other that encode semantic meaning. A common example is using vector addition and subtraction to find related words: the vector \(\mathrm{King}_v - \mathrm{Man}_v + \mathrm{Woman}_v\) is most similar to the vector \(\mathrm{Queen}_v\). This example comes from the project page for word2vec.
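
A sketch of how such an analogy query might be run with gensim, assuming the pretrained Google News word2vec vectors have been downloaded locally; the similarity score shown is approximate.

```python
from gensim.models import KeyedVectors

# Assumes the pretrained word2vec Google News file is available at this path.
wv = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

# King_v - Man_v + Woman_v: gensim combines the vectors and ranks the
# vocabulary by cosine similarity to the result.
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# -> [('queen', 0.71...)]  (approximate score)
```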

Common implementations of word embeddings include: