Notes
Summary:
Key points:
We want to learn $p_\theta(x)$, a model that approximates the true data distribution $p_{\text{data}}(x)$.
A good model should assign high probability to the observed data.
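Assuming the samples $x_1, \dots, x_N$ are i.i.d., we choose $\theta$ to maximize the likelihood: $\theta^* = \arg\max_\theta \prod_{i=1}^{N} p_\theta(x_i)$.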
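As a minimal illustration, assume a hypothetical coin-flip dataset modeled as Bernoulli with heads-probability θ. The helpers `bernoulli_pmf` and `likelihood` are illustrative names, not from any library. The likelihood is just the product of per-sample probabilities, and the θ closest to the observed heads frequency scores highest:

```python
import math

def bernoulli_pmf(x: int, theta: float) -> float:
    """Probability of one coin flip x (1 = heads, 0 = tails) under bias theta."""
    return theta if x == 1 else 1.0 - theta

def likelihood(data: list[int], theta: float) -> float:
    """Product of per-sample probabilities (valid because samples are assumed i.i.d.)."""
    return math.prod(bernoulli_pmf(x, theta) for x in data)

data = [1, 1, 0, 1, 0, 1, 1, 1]  # hypothetical sample: 6 heads, 2 tails

for theta in (0.25, 0.5, 0.75):
    print(f"theta={theta:.2f}  L={likelihood(data, theta):.6f}")
# theta=0.25  L=0.000137
# theta=0.50  L=0.003906
# theta=0.75  L=0.011124
```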
However, this creates a numerical problem: we are multiplying together many numbers less than one, so the product quickly underflows to zero in floating-point arithmetic.
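To solve this, we take the logarithm, which turns the product into a sum: $\log \prod_{i=1}^{N} p_\theta(x_i) = \sum_{i=1}^{N} \log p_\theta(x_i)$. Because $\log$ is monotonically increasing, maximizing this log-likelihood yields the same $\theta^*$.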
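A quick sketch of the underflow and its fix, assuming a hypothetical dataset of 1000 samples to each of which the model assigns probability 0.1:

```python
import math

# Hypothetical: 1000 i.i.d. samples, each given probability 0.1 by the model.
probs = [0.1] * 1000

# Direct product: 0.1**1000 = 1e-1000 is far below the smallest positive
# float64 (about 5e-324), so the running product underflows to exactly 0.0.
direct = math.prod(probs)

# Log space: the log of a product is the sum of the logs,
# which stays in an ordinary numeric range.
log_likelihood = sum(math.log(p) for p in probs)

print(direct)          # 0.0
print(log_likelihood)  # about -2302.585
```

The raw product has collapsed to zero, so it can no longer distinguish a better model from a worse one, while the log-likelihood remains a usable finite number.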
Equivalently, we can minimize its negative:
$$\theta^* = \arg\min_\theta \left( -\sum_{i=1}^{N} \log p_\theta(x_i) \right)$$
The quantity being minimized is called the negative log-likelihood (NLL).
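The NLL can then be minimized numerically. The sketch below assumes a Bernoulli model of hypothetical coin flips with heads-probability θ; `nll` is an illustrative helper name. It uses a brute-force grid search as a simple stand-in for the gradient-based optimizers normally used in practice, and checks the result against the known closed-form Bernoulli MLE (the fraction of heads):

```python
import math

def nll(data: list[int], theta: float) -> float:
    """Negative log-likelihood of i.i.d. coin flips under a Bernoulli(theta) model."""
    return -sum(math.log(theta) if x == 1 else math.log(1.0 - theta) for x in data)

data = [1, 1, 0, 1, 0, 1, 1, 1]  # hypothetical sample: 6 heads, 2 tails

# Brute-force search over candidate thetas in (0, 1);
# the endpoints are excluded because log(0) is undefined.
grid = [i / 1000 for i in range(1, 1000)]
theta_hat = min(grid, key=lambda t: nll(data, t))

print(theta_hat)              # 0.75
print(sum(data) / len(data))  # closed-form Bernoulli MLE: 0.75
```

A grid search only works for a single low-dimensional parameter; for models with many parameters the same NLL objective is typically minimized with gradient descent instead.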