Machine Learning Garden

Supervised Learning

Notes

In Supervised Learning, given $\mathcal{D}=\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, the objective is to learn $f_\theta(x) \approx y$.
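
As a concrete illustration (not from the original note), here is a minimal sketch of this setup in Python, assuming a 1-D linear model $f_\theta(x) = wx + b$ trained by gradient descent on squared error over a synthetic dataset:

```python
# A minimal sketch of the supervised-learning setup, assuming a 1-D
# linear model f_theta(x) = w*x + b and a squared-error loss. The
# dataset D = {(x_i, y_i)} is synthetic, invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)             # inputs x_i
y = 3.0 * x + 0.5 + rng.normal(0, 0.1, 100)  # targets y_i (a noisy line)

w, b = 0.0, 0.0                              # parameters theta = (w, b)
lr = 0.1
for _ in range(500):
    pred = w * x + b                         # f_theta(x)
    grad_w = np.mean(2 * (pred - y) * x)     # d(loss)/dw for mean squared error
    grad_b = np.mean(2 * (pred - y))         # d(loss)/db
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)  # approaches (3.0, 0.5), i.e. f_theta(x) ≈ y
```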

Generally, our goal is to predict $y$ given some $x$.

However, predicting a single hard label is difficult because the real world is full of ambiguous boundary cases.

Predicting probabilities

So instead we use probabilities to represent how likely an input is to fall into each category.

Predicting probabilities instead of labels can also make training easier, because of smoothness.

Intuitively, a discrete label cannot be changed by a little bit: it is either one class or another, all or nothing, whereas a probability can be nudged gradually toward the right answer.
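
A tiny numerical illustration of this point (the sigmoid model and the numbers are assumptions made for the sketch, not part of the note): nudging the parameter leaves the hard 0/1 label loss flat or makes it jump, while the cross-entropy on the predicted probability changes smoothly.

```python
# Illustration: small parameter changes move the cross-entropy on a
# predicted probability continuously, while the hard 0/1 label loss
# only stays flat or jumps.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, y = 0.5, 1.0                       # one example whose true label is y = 1
for w in (0.0, 0.01, 0.02):           # tiny parameter changes
    p = sigmoid(w * x)                # predicted probability p(y=1 | x)
    zero_one = float((p > 0.5) != y)  # hard-label loss: all or nothing
    xent = -np.log(p)                 # cross-entropy: moves a little with w
    print(f"w={w:.2f}  0/1 loss={zero_one:.0f}  cross-entropy={xent:.4f}")
```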

Given $\mathcal{D}=\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, the objective is to learn $p_\theta(y \mid x)$.
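
For example, a minimal model that learns $p_\theta(y \mid x)$ could be logistic regression; here is a sketch using scikit-learn, where the 1-D data-generating process is an assumption made for illustration:

```python
# Sketch of learning p_theta(y | x) with scikit-learn's
# LogisticRegression on made-up 1-D data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(0, 1, size=(200, 1))          # inputs x_i
y = (X[:, 0] + rng.normal(0, 0.5, 200) > 0)  # binary labels y_i

model = LogisticRegression().fit(X, y)       # fits theta by (regularized) maximum likelihood
probs = model.predict_proba([[0.0], [2.0]])  # p_theta(y | x) for two new inputs
print(probs)                                 # each row is [p(y=0|x), p(y=1|x)]
```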

$x$ is a Random Variable representing the input.

$x$ is a random variable because we do not know what $x$ we will get. There is some true underlying process in the real world that gives rise to different $x$'s.

$y$ is a Random Variable representing the output.

$p(x, y) = p(x)\,p(y \mid x)$ by the Chain Rule (Probability).

$\displaystyle p(y \mid x)=\frac{p(x, y)}{p(x)}$ by the definition of Conditional Probability.
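
As a tiny worked example (numbers invented for illustration): if $p(x{=}1)=0.4$ and $p(x{=}1, y{=}1)=0.1$, then $\displaystyle p(y{=}1 \mid x{=}1)=\frac{p(x{=}1, y{=}1)}{p(x{=}1)}=\frac{0.1}{0.4}=0.25$.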

A model that learns $p(y \mid x)$ is called a Discriminative Model because its goal is to discriminate between different $y$'s.

When predicting probabilities, instead of representing the output as a hard object label, we represent it as a probability: how likely is it that this object falls into this category?
