Capture-recapture model

A capture-recapture model is a technique to estimate an unknown population by capturing, tagging, and re-capturing samples from the population.

In the article How many Mechanical Turk workers are there?, Panos Ipeirotis explains a simple version of a capture-recapture model as follows:

The simplest possible technique is the following:

• Capture/marking phase: Capture $$n_1$$ animals, mark them, and release them back.

• Recapture phase: A few days later, capture $$n_2$$ animals. Assuming there are $$N$$ animals overall, $$n_1/N$$ of them are marked. So, for each of the $$n_2$$ captured animals, the probability that the animal is marked is $$n_1/N$$ (from the capture/marking phase).

• Calculation: On expectation, we expect to see $$n_2 \cdot \frac{n_1}{N}$$ marked animals in the recapture phase. (Notice that we do not know $$N$$.) So, if we actually see $$m$$ marked animals during the recapture phase, we set $$m = n_2 \cdot \frac{n_1}{N}$$ and we get the estimate that:

$N=n_1 \cdot \frac{n_2}{m}$

He adds that this basic version of a capture-recapture model makes the following assumptions, and the estimate $$N$$ can be inaccurate when these assumptions are violated:

• Assumption of no arrivals / departures (“closed population”): The vanilla capture-recapture scheme assumes that there are no arrivals or departures of workers between the capture and recapture phase.

• Assumption of no selection bias (“equal catchability”): The vanilla capture-recapture scheme assumes that every worker in the population is equally likely to be captured.