Cross-validation refers to a group of techniques for estimating how well a model generalizes by repeatedly dividing a dataset into training and validation pieces.
Many methods of cross-validation boil down to some variation of the following steps:
- Split the input dataset into a training dataset and a validation dataset.
- Train a model on the training dataset.
- Measure the trained model’s accuracy against the validation set.
- Repeat the above steps with different training/validation splits, allowing you to measure the variance in validation accuracy caused by different splits.
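The steps above can be sketched with a minimal, self-contained example. The "model" here is a deliberately trivial stand-in (predict the majority training label), and the helper name `holdout_accuracy` and the toy dataset are illustrative assumptions, not part of any particular library:

```python
import random
import statistics

def holdout_accuracy(data, train_frac=0.8, seed=0):
    """One train/validation split: 'train' a majority-class model on the
    training portion and measure its accuracy on the validation portion."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)  # a fresh split for each seed
    cut = int(len(shuffled) * train_frac)
    train, valid = shuffled[:cut], shuffled[cut:]
    # "Train": predict the most common label seen in the training set.
    labels = [y for _, y in train]
    prediction = max(set(labels), key=labels.count)
    # "Measure": fraction of validation examples matching the prediction.
    return sum(1 for _, y in valid if y == prediction) / len(valid)

# Repeat with different splits to see how accuracy varies across splits.
data = [(x, x % 2) for x in range(100)]  # toy dataset of (feature, label)
scores = [holdout_accuracy(data, seed=s) for s in range(5)]
print(statistics.mean(scores), statistics.pstdev(scores))
```

The spread of `scores` is exactly the split-induced variance the last step describes; a real workflow would substitute an actual learner for the majority-class stand-in.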
For example, \(k\)-fold cross-validation systematizes the above procedure:
- The input dataset is divided into \(k\) shards.
- Each shard in turn serves as the validation dataset, measuring the performance of a model trained on the other \(k-1\) shards.
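A minimal sketch of \(k\)-fold splitting, again with a majority-class stand-in for the model; the function name `k_fold_scores` and the toy dataset are assumptions for illustration:

```python
import random
import statistics

def k_fold_scores(data, k=5, seed=0):
    """Split data into k shards; each shard serves once as the validation
    set while the model is 'trained' on the remaining k-1 shards."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    shards = [shuffled[i::k] for i in range(k)]  # k roughly equal shards
    scores = []
    for i in range(k):
        valid = shards[i]
        train = [row for j, shard in enumerate(shards) if j != i
                 for row in shard]
        # Stand-in "model": predict the most common training label.
        labels = [y for _, y in train]
        prediction = max(set(labels), key=labels.count)
        scores.append(sum(1 for _, y in valid if y == prediction) / len(valid))
    return scores

data = [(x, 1 if x >= 30 else 0) for x in range(100)]  # 70% label 1
scores = k_fold_scores(data, k=5)
print(statistics.mean(scores))
```

Every example is used for validation exactly once, so the mean of the \(k\) scores gives a less split-dependent accuracy estimate than a single holdout split.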