Machine Learning Garden


Maximum Likelihood Estimation (MLE)


Tags: concept


Related notes:


April 20th, 2021: created note.

Notes


Key points:

We want to learn a model $p_\theta(y \mid x)$ that approximates the true $p(y \mid x)$.

A good model should make the data look probable.

We choose $\theta$ such that $p(\mathcal{D})=\prod_{i} p\left(x_{i}\right) p_{\theta}\left(y_{i} \mid x_{i}\right)$ is maximized.

However, one numerical problem is that we are multiplying together many numbers less than one, so the product quickly underflows to zero.

To solve the problem, we can use $\log$ to convert multiplication into addition.

$\log p(\mathcal{D})=\sum_{i} \log p\left(x_{i}\right)+\log p_{\theta}\left(y_{i} \mid x_{i}\right)=\sum_{i} \log p_{\theta}\left(y_{i} \mid x_{i}\right)+\text{const}$
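A quick numerical sketch of why the log matters, using hypothetical per-example likelihoods drawn at random: the raw product underflows to exactly zero in float64, while the sum of logs stays finite and usable.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical per-example likelihoods p_theta(y_i | x_i), each in (0, 1).
probs = rng.uniform(0.1, 0.9, size=10_000)

product = np.prod(probs)         # underflows to exactly 0.0 in float64
log_sum = np.sum(np.log(probs))  # remains a finite, comparable score

print(product)   # 0.0
print(log_sum)
```

Since $\log$ is monotonic, maximizing the log-likelihood yields the same $\theta^{\star}$ as maximizing the likelihood itself.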

$\theta^{\star} \leftarrow \arg \max_{\theta} \sum_{i} \log p_{\theta}\left(y_{i} \mid x_{i}\right)$
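A minimal worked sketch of this arg max, assuming a Bernoulli model $p_\theta(y=1)=\theta$ and hypothetical coin-flip data: a grid search over $\theta$ stands in for the optimizer, and the maximizer matches the familiar closed-form MLE, the sample mean.

```python
import numpy as np

# Hypothetical observations y_i in {0, 1}; model: p_theta(y=1) = theta.
y = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 1])

def log_likelihood(theta, y):
    # sum_i log p_theta(y_i) for the Bernoulli model
    return np.sum(y * np.log(theta) + (1 - y) * np.log(1 - theta))

# Grid search over theta as a simple stand-in for arg max.
thetas = np.linspace(0.01, 0.99, 99)
lls = [log_likelihood(t, y) for t in thetas]
theta_star = thetas[np.argmax(lls)]

print(theta_star)  # matches the closed-form MLE y.mean() = 0.7
```

In practice $\theta$ parameterizes a neural network and the arg max is approximated by gradient ascent on the log-likelihood (equivalently, gradient descent on the negative log-likelihood loss).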
