Maximum Likelihood Estimation (MLE)

References

Tags: concept

Sources:

Related notes:

Updates:

April 20th, 2021: created note.

Notes

Summary:

Key points:

We want to learn $p_\theta(y \mid x)$, a model that approximates the true $p(y \mid x)$.

A good model should make the data look probable.

We choose $\theta$ such that $p(\mathcal{D}) = \prod_{i} p(x_{i})\, p_{\theta}(y_{i} \mid x_{i})$ is maximized.
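As a toy sketch of "pick the $\theta$ that makes the data most probable", here is a grid search over an unconditional Bernoulli coin-flip model; the data and the Bernoulli choice are made up for illustration, while the note's actual setting is a conditional model $p_\theta(y \mid x)$.

```python
import numpy as np

# Made-up coin-flip data: y_i in {0, 1}; the model is p_theta(y) = theta^y * (1 - theta)^(1 - y).
y = np.array([1, 0, 1, 1, 0, 1, 1, 1])

# Evaluate the data likelihood p(D) = prod_i p_theta(y_i) on a grid of candidate theta values
# and keep the theta that makes the observed data most probable.
thetas = np.linspace(0.01, 0.99, 99)
likelihoods = [np.prod(theta**y * (1 - theta)**(1 - y)) for theta in thetas]
theta_mle = thetas[np.argmax(likelihoods)]

print(theta_mle)  # close to y.mean() == 0.75, the analytical MLE for a Bernoulli
```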

However, one numerical problem here is that we are multiplying together many numbers less than one, so the product quickly underflows to zero.

To solve the problem, we can use $\log$ to convert the multiplication into an addition.
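A small numerical sketch of the underflow and the log fix; the per-example probabilities below are made-up random values standing in for $p_\theta(y_i \mid x_i)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up per-example probabilities standing in for p_theta(y_i | x_i), each below one.
probs = rng.uniform(0.1, 0.9, size=2000)

print(np.prod(probs))       # 0.0 -- the product underflows float64 well before 2000 factors
print(np.log(probs).sum())  # a large negative but finite log-likelihood, easy to work with
```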

$$\log p(\mathcal{D}) = \sum_{i} \left[ \log p(x_{i}) + \log p_{\theta}(y_{i} \mid x_{i}) \right] = \sum_{i} \log p_{\theta}(y_{i} \mid x_{i}) + \text{const}$$

$$\theta^{\star} \leftarrow \arg\max_{\theta} \sum_{i} \log p_{\theta}(y_{i} \mid x_{i})$$
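A minimal sketch of the full procedure, assuming a logistic model for $p_\theta(y = 1 \mid x)$ and plain gradient ascent on the log-likelihood; the synthetic data, `true_w`, `true_b`, and the learning rate are illustrative choices, not from the note.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data drawn from a known logistic model p(y = 1 | x) = sigmoid(true_w * x + true_b).
true_w, true_b = 2.0, -1.0
x = rng.normal(size=500)
y = (rng.uniform(size=500) < 1 / (1 + np.exp(-(true_w * x + true_b)))).astype(float)

def log_likelihood(w, b):
    # sum_i log p_theta(y_i | x_i) for Bernoulli y with p_theta(y=1 | x) = sigmoid(w * x + b).
    logits = w * x + b
    return np.sum(y * logits - np.log1p(np.exp(logits)))

# Gradient ascent on the log-likelihood: theta* <- arg max_theta sum_i log p_theta(y_i | x_i).
w, b, lr = 0.0, 0.0, 0.5
for _ in range(2000):
    p = 1 / (1 + np.exp(-(w * x + b)))  # model's p_theta(y = 1 | x_i)
    grad_w = np.sum((y - p) * x)        # d/dw of the log-likelihood
    grad_b = np.sum(y - p)              # d/db of the log-likelihood
    w += lr * grad_w / len(x)
    b += lr * grad_b / len(x)

print(w, b, log_likelihood(w, b))  # w, b should land near true_w, true_b
```

Maximizing the log-likelihood is equivalent to minimizing the negative log-likelihood, which is the loss most libraries expose (e.g., cross-entropy for classification).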
