A bag-of-n-grams model is a way to represent a document, similar to a [bag-of-words](/terms/bag-of-words/) model.
A bag-of-n-grams model represents a text document as an unordered collection of its n-grams, where an n-gram is a contiguous sequence of n words (or other tokens).
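As a minimal sketch, assuming the text has already been split into word tokens, the "unordered collection" of n-grams can be modelled as a multiset: a Python `collections.Counter` whose keys are tuples of n consecutive tokens. The function name `bag_of_ngrams` is hypothetical and used only for illustration.

```python
from collections import Counter


def bag_of_ngrams(tokens, n):
    """Return the multiset (Counter) of n-grams in a token list (hypothetical helper).

    Each n-gram is a tuple of n consecutive tokens. The Counter forgets the
    order in which the n-grams occurred but keeps how often each one appears.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Slide a window of width n across the tokens; yields nothing if len(tokens) < n.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Assumption: plain whitespace tokenization.
tokens = "the cat sat on the mat".split()
print(bag_of_ngrams(tokens, 1)[("the",)])         # 2
print(bag_of_ngrams(tokens, 2)[("the", "cat")])   # 1
```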
For example, take the following sentence and divide it into bigrams (n = 2), marking the beginning and end of the sentence with `<start>` and `<end>` tokens:
> James is the best person ever.

becomes:

- `<start> James`
- `James is`
- `is the`
- `the best`
- `best person`
- `person ever.`
- `ever. <end>`
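The bigram list above can be reproduced with a short sketch, assuming simple whitespace tokenization (so the period stays attached to "ever.", as in the example). The function name `padded_bigrams` is hypothetical; each bigram is returned as a tuple and printed with a space between its two tokens.

```python
def padded_bigrams(sentence, start="<start>", end="<end>"):
    """Return the bigrams of a sentence, padded with boundary markers (hypothetical helper)."""
    # Assumption: whitespace tokenization, so punctuation stays attached ("ever.").
    tokens = [start] + sentence.split() + [end]
    # zip pairs each token with the token that follows it.
    return list(zip(tokens, tokens[1:]))


for left, right in padded_bigrams("James is the best person ever."):
    print(left, right)
```

The padding tokens let the model record which words tend to open or close a sentence; without them, this six-word sentence would yield only five bigrams.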
In a typical bag-of-n-grams model, these 7 bigrams would be only a small subset of the much larger set of bigrams observed across a corpus. The sentence "James is the best person ever." would then be encoded as a vector with one position per corpus bigram, indicating which of those bigrams occur in the sentence.
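A minimal sketch of that encoding follows. It assumes a tiny made-up three-sentence corpus, whitespace tokenization, the same `<start>`/`<end>` padding as above, and a binary (present/absent) vector; the helper names `bigrams`, `build_vocabulary`, and `encode` are hypothetical. Bigrams that never appeared in the corpus have no column and are simply ignored.

```python
def bigrams(sentence):
    """Padded bigrams of a whitespace-tokenized sentence (hypothetical helper)."""
    tokens = ["<start>"] + sentence.split() + ["<end>"]
    return list(zip(tokens, tokens[1:]))


def build_vocabulary(corpus):
    """Assign each distinct corpus bigram a fixed column index, in first-seen order."""
    vocab = {}
    for sentence in corpus:
        for bg in bigrams(sentence):
            vocab.setdefault(bg, len(vocab))
    return vocab


def encode(sentence, vocab):
    """Binary vector: 1 where the vocabulary bigram occurs in the sentence, else 0."""
    vector = [0] * len(vocab)
    for bg in bigrams(sentence):
        index = vocab.get(bg)
        if index is not None:  # bigrams unseen in the corpus are dropped
            vector[index] = 1
    return vector


# Illustrative corpus (made up for this sketch).
corpus = [
    "James is the best person ever.",
    "Mary is the best cook.",
    "the best is yet to come.",
]
vocab = build_vocabulary(corpus)
print(len(vocab))  # 17 distinct bigrams
print(encode("James is the best person ever.", vocab))
```

Many implementations store counts instead of 0/1 flags, and use a sparse representation, since almost every entry is zero once the corpus vocabulary grows large.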
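A bag-of-n-grams model keeps the simplicity of the bag-of-words model but preserves more information about word locality: each n-gram records which words appear next to each other, even though the order of the n-grams themselves is discarded.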
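To make the extra locality information concrete, the sketch below compares two sentences that contain the same words in a different order. It assumes whitespace tokens and no boundary markers, and the helper name `bag` is hypothetical. Their bags of words (n = 1) are identical, while their bags of bigrams (n = 2) differ.

```python
from collections import Counter


def bag(tokens, n):
    """Multiset of n-grams (hypothetical helper); n = 1 gives a bag of words."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


a = "dog bites man".split()
b = "man bites dog".split()

print(bag(a, 1) == bag(b, 1))  # True: same words, so bag-of-words cannot tell them apart
print(bag(a, 2) == bag(b, 2))  # False: ("dog", "bites") vs ("man", "bites") differ
```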