**Notes**
Cross-entropy measures how well the model distribution $p_\theta$ matches the true distribution $p$: the closer the two distributions, the lower the cross-entropy.

$H\left(p, p_{\theta}\right)=-\sum_{y} p\left(y \mid x_{i}\right) \log p_{\theta}\left(y \mid x_{i}\right)$
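As a quick check, the sum above can be computed directly for two small discrete distributions (the probability values below are made up for illustration):

```python
import numpy as np

# Hypothetical true distribution p(y|x_i) and model distribution
# p_theta(y|x_i) over 3 classes; the numbers are illustrative only.
p = np.array([0.7, 0.2, 0.1])
p_theta = np.array([0.6, 0.3, 0.1])

# H(p, p_theta) = -sum_y p(y|x_i) * log p_theta(y|x_i)
cross_entropy = -np.sum(p * np.log(p_theta))
print(round(cross_entropy, 4))  # → 0.8286
```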

If we assume $y_{i} \sim p\left(y \mid x_{i}\right)$, i.e. the observed label is a sample from the true distribution, then the expectation over $y$ can be estimated with that single sample: $H\left(p, p_{\theta}\right) \approx-\log p_{\theta}\left(y_{i} \mid x_{i}\right)$.


This is why negative log-likelihood (NLL) is sometimes also called cross-entropy: when all of the probability mass of $p$ sits on the observed label, the two losses coincide.
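A minimal sketch of that equivalence, assuming a one-hot true distribution over 3 classes (the model probabilities are made up for illustration):

```python
import numpy as np

# Hypothetical model distribution over 3 classes and an observed label y_i = 0.
p_theta = np.array([0.6, 0.3, 0.1])
y_i = 0

# One-hot "true" distribution placing all mass on the observed label.
p = np.zeros(3)
p[y_i] = 1.0

cross_entropy = -np.sum(p * np.log(p_theta))  # full cross-entropy sum
nll = -np.log(p_theta[y_i])                   # negative log-likelihood of y_i

# With a one-hot p, the two quantities coincide.
print(np.isclose(cross_entropy, nll))  # → True
```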