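The softmax turns numbers in $\mathbb{R}$ into a probability distribution in which each probability is proportional to the exponential of the corresponding number, so larger numbers receive larger probabilities.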
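Given an $n$-dimensional vector $x$ with all component terms in $\mathbb{R}$, the softmax of $x$ is:

$$\operatorname{softmax}(x)_i = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}, \qquad i = 1, \dots, n.$$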
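
As a rough illustration of the definition, the following is a minimal sketch in plain Python: each component is exponentiated and then divided by the sum of all the exponentials. The function name `softmax` and the argument name `x` are chosen here for illustration. Subtracting the maximum before exponentiating is an implementation choice assumed for this sketch, not part of the definition; it leaves the result mathematically unchanged but avoids overflow for large inputs.

```python
import math


def softmax(x):
    """Return the softmax of a sequence of real numbers as a list of floats."""
    if not x:
        raise ValueError("softmax is undefined for an empty vector")
    # Assumption for this sketch: shift by the maximum so that math.exp
    # never sees a large positive argument. The shift cancels in the ratio.
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    # Each output is the component's exponential divided by the sum of all
    # exponentials, so the outputs are positive and sum to 1.
    return [e / total for e in exps]


probs = softmax([1.0, 2.0, 3.0])
print(probs)       # approximately [0.0900, 0.2447, 0.6652]
print(sum(probs))  # 1 up to floating-point rounding
```

Because only the differences between components matter, adding the same constant to every component of the input gives the same output distribution.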