References
Notes {{word-count}}
Summary:
Key points:
How is a Dataset generated?
There exists an underlying data generating distribution .
The Conditional Probability distribution over labels is represented as .
By the Chain Rule (Probability), the joint distribution of is .
A training set, \mathcal{D}=\left{\left(x_{1}, y_{1}\right),\left(x_{2}, y_{2}\right), \ldots,\left(x_{n}, y_{n}\right)\right}, is generated by the joint distribution of .