We choose $\theta$ such that $p(\mathcal{D})=\prod_{i} p\left(x_{i}\right) p_{\theta}\left(y_{i} \mid x_{i}\right)$ is maximized.
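
A short derivation may help show what this maximization amounts to in practice. It assumes, as the notation suggests, that the input marginal $p(x_i)$ carries no dependence on $\theta$; the symbol $\hat{\theta}$ for the maximizer is introduced here for illustration only. Because the logarithm is monotonic, taking logs does not change the maximizer, and the terms that do not involve $\theta$ drop out:

$$
\begin{aligned}
\hat{\theta}
&= \arg\max_{\theta} \prod_{i} p\left(x_{i}\right) p_{\theta}\left(y_{i} \mid x_{i}\right) \\
&= \arg\max_{\theta} \left[ \sum_{i} \log p\left(x_{i}\right) + \sum_{i} \log p_{\theta}\left(y_{i} \mid x_{i}\right) \right] \\
&= \arg\max_{\theta} \sum_{i} \log p_{\theta}\left(y_{i} \mid x_{i}\right).
\end{aligned}
$$

Under that assumption, maximizing $p(\mathcal{D})$ is the same as maximizing the conditional log-likelihood of the labels given the inputs, so the distribution of the inputs never needs to be modeled.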