A word embedding (or word vector) refers to a dense vector representation of a word.
Before word embeddings, words were typically represented with sparse vectors in bag-of-words models or with n-grams.
Word embeddings are typically trained with an unsupervised model over a large corpus of text. In the training process, the vectors are updated to better predict elements of the corpus, such as the words surrounding a given target word.
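A minimal sketch of this training setup, assuming the gensim library is available; the tiny corpus and all hyperparameters here are illustrative only, not a recommended configuration.

```python
# Illustrative sketch: train skip-gram word embeddings on a toy corpus (assumes gensim).
from gensim.models import Word2Vec

corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["a", "cat", "chased", "a", "mouse"],
]

# sg=1 selects skip-gram: each word's vector is updated to better predict
# the words surrounding it within a +/- 2 word window.
model = Word2Vec(corpus, vector_size=50, window=2, sg=1, min_count=1, epochs=50)

print(model.wv["cat"])               # a dense 50-dimensional vector
print(model.wv.most_similar("cat"))  # nearest neighbours by cosine similarity
```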
At the end of this process, word embeddings often have geometric relations to each other that encode semantic meaning. A common example of this is using vector addition and subtraction to find related words: the vector for "king" − "man" + "woman" is most similar to the vector for "queen". This example comes from the project page for word2vec.
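The same analogy query can be run directly against a set of pretrained vectors. The sketch below assumes gensim's downloader and its pretrained "glove-wiki-gigaword-50" vectors; any pretrained embedding set would serve equally well.

```python
# Illustrative sketch: the king - man + woman analogy with pretrained vectors (assumes gensim).
import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-50")  # downloads the vectors on first use

# vector("king") - vector("man") + vector("woman") should land near "queen".
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```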
Common implementations of word embeddings include: