We want to learn pθ(y∣x)p_\theta (y \mid x)pθ​(y∣x), and it is a model which approximates the true p(y∣x)p(y \mid x)p(y∣x).