Machine Learning Garden

Softmax

Tags: concept

Notes

How to make a number $z$ positive?

$\exp(z)$ is especially convenient because it is one-to-one and onto: it maps the entire real line onto the set of positive real numbers.
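A quick sketch of this property (the specific input values are illustrative, not from the note):

```python
import math

# exp maps every real number, negative, zero, or positive,
# to a strictly positive one
values = [-5.0, 0.0, 3.2]
positives = [math.exp(z) for z in values]
```

Negative inputs land in $(0, 1)$, zero maps to exactly $1$, and positive inputs land in $(1, \infty)$.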

How to make a bunch of numbers sum to 1?

We do this using normalization by $\displaystyle \frac{z_1}{\sum^n_{i=1} z_i}$.
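A minimal sketch of that normalization step, assuming the inputs are already positive (the example numbers are made up):

```python
def normalize(zs):
    """Divide each entry by the total so the results sum to 1."""
    total = sum(zs)
    return [z / total for z in zs]

probs = normalize([2.0, 1.0, 1.0])  # [0.5, 0.25, 0.25]
```

Combining the two tricks, exponentiate first, then normalize, is exactly what softmax does.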

$$\operatorname{softmax}_{\operatorname{dog}}\left(f_{\operatorname{dog}}(x), f_{\operatorname{cat}}(x)\right)=\frac{\exp \left(f_{\operatorname{dog}}(x)\right)}{\exp \left(f_{\operatorname{dog}}(x)\right)+\exp \left(f_{\operatorname{cat}}(x)\right)}$$
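A standard identity worth noting here (not stated in the note itself): dividing numerator and denominator by $\exp(f_{\operatorname{dog}}(x))$ shows that two-class softmax is just the logistic sigmoid applied to the score difference:

$$\operatorname{softmax}_{\operatorname{dog}}\left(f_{\operatorname{dog}}(x), f_{\operatorname{cat}}(x)\right)=\frac{1}{1+\exp\left(-\left(f_{\operatorname{dog}}(x)-f_{\operatorname{cat}}(x)\right)\right)}$$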

For $N$ possible labels, $p(y \mid x)$ is a vector of $N$ elements, and $f_\theta(x)$ is a vector-valued function with $N$ outputs.

$$p(y=i \mid x)=\operatorname{softmax}\left(f_{\theta}(x)\right)[i]=\frac{\exp \left(f_{\theta, i}(x)\right)}{\sum_{j=1}^{N} \exp \left(f_{\theta, j}(x)\right)}$$
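The formula above can be sketched directly in code. One practical detail not in the formula: implementations usually subtract $\max_j f_{\theta,j}(x)$ before exponentiating, which leaves the output unchanged but avoids overflow for large logits (a common convention, assumed here):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of real-valued scores."""
    m = max(logits)                              # shift for stability
    exps = [math.exp(z - m) for z in logits]     # all positive
    total = sum(exps)
    return [e / total for e in exps]             # sums to 1

p = softmax([2.0, 1.0, 0.1])  # largest logit gets the largest probability
```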

Why is it called a Softmax?

Because it is a smooth, differentiable ("soft") approximation of the hard max: as the inputs are scaled up, the output approaches a one-hot vector that puts all probability on the largest input.
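This "soft" behavior can be seen by scaling the logits, which sharpens the distribution toward one-hot (a small illustration, with made-up logit values):

```python
import math

def softmax(logits):
    # stable softmax: subtract the max before exponentiating
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
soft = softmax(logits)                      # a smooth distribution
sharp = softmax([10 * z for z in logits])   # nearly one-hot on the largest logit
```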
