A bag-of-\(n\)-grams model is a way to represent a document, similar to a [bag-of-words](/terms/bag-of-words/) model.
A bag-of-\(n\)-grams model represents a text document as an unordered collection of its \(n\)-grams.
For example, take the following sentence and divide it into bigrams (\(n = 2\)):
James is the best person ever.
In a typical bag-of-\(n\)-grams model, the sentence's 6 bigrams (a count that matches treating the final period as its own token) would be a small sample of the many bigrams observed in a corpus. The sentence "James is the best person ever." would then be encoded in a representation showing which of the corpus's bigrams occur in it.
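The encoding described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the tokenizer (which splits off punctuation as separate tokens), the two-sentence toy corpus, and the helper names `tokenize` and `ngrams` are all assumptions made for the example.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumed tokenizer: runs of word characters, or single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)

def ngrams(tokens, n=2):
    # Slide a window of size n over the token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

sentence = "James is the best person ever."
bigrams = ngrams(tokenize(sentence), n=2)

# Hypothetical toy corpus; a real corpus would contain far more bigrams.
corpus = [
    "James is the best person ever.",
    "Ann is the best cook ever.",
]

# The vocabulary is every distinct bigram seen anywhere in the corpus.
vocabulary = sorted({g for doc in corpus for g in ngrams(tokenize(doc))})

# Encode the sentence as a count for each vocabulary bigram (order is discarded).
counts = Counter(bigrams)
vector = [counts[g] for g in vocabulary]

for gram, value in zip(vocabulary, vector):
    print(gram, value)
```

Each position in `vector` corresponds to one bigram from the corpus vocabulary, so bigrams that appear only in other documents get a count of zero. A binary presence/absence encoding would work the same way with `min(count, 1)`.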
A bag-of-\(n\)-grams model keeps the simplicity of the bag-of-words model while preserving more information about local word order.
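To make the word-order point concrete, the sketch below compares two sentences that use the same words in a different order. The example sentences and the helper name `bag_of_ngrams` are hypothetical, and whitespace splitting is assumed as the tokenizer.

```python
from collections import Counter

def bag_of_ngrams(text, n):
    # Assumed tokenizer: lowercase and split on whitespace.
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

a = "dog bites man"
b = "man bites dog"

# With n = 1 (bag-of-words) the two sentences are indistinguishable.
same_unigrams = bag_of_ngrams(a, 1) == bag_of_ngrams(b, 1)

# With n = 2 the bags differ, because each bigram records a local ordering.
same_bigrams = bag_of_ngrams(a, 2) == bag_of_ngrams(b, 2)

print(same_unigrams, same_bigrams)
```

The bag itself is still unordered in both cases; the extra information comes only from the ordering captured inside each \(n\)-gram.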