# Neural Nets

Neural nets give us a hypothesis that can approximate nonlinear functions!

Neural nets are an intuitive extension of the perceptron - think of the perceptron as a 1-layer neural net.

But in the perceptron, the output is just the sign of a linear combination of the inputs - in a neural net, each unit passes its linear combination through an *activation function*.

Activations should be nonlinear - if they're linear, the extra layers are redundant, since a composition of linear functions is itself linear and the whole net collapses to a single linear model.

## Activations

- sign: -1, 0, or 1
  - not differentiable
  - often used for the output layer
- tanh: \(\frac{e^{2x}-1}{e^{2x}+1}\)
- sigmoid: \(\frac{e^x}{1+e^x}\)
- ReLU: \(\max(0, x)\)
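
These are easy to sanity-check numerically - a minimal NumPy sketch:

```python
import numpy as np

def sign(x):
    # -1, 0, or 1; flat almost everywhere, so no useful gradient
    return np.sign(x)

def tanh(x):
    # (e^{2x} - 1) / (e^{2x} + 1), squashes to (-1, 1)
    return np.tanh(x)

def sigmoid(x):
    # e^x / (1 + e^x), squashes to (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # max(0, x), elementwise
    return np.maximum(0.0, x)

x = np.array([-2.0, 0.0, 2.0])
print(sign(x), tanh(x), sigmoid(x), relu(x))
```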

## Training

Let's consider the following loss objective on a 2-layer neural net with weights \(W\) and \(v\) (taking squared error as the example):

\[L(W, v) = \sum_i \big(y_i - v^\top g(W x_i)\big)^2\]

where \(g\) is the hidden-layer activation. To run gradient descent, we just need to find \(\nabla_W L\) and \(\nabla_v L\).

We do this using *backpropagation* - applying the chain rule from the loss back through the layers, reusing each layer's intermediate values.
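
A minimal NumPy sketch of backpropagation on such a 2-layer net, assuming the squared-error loss above with tanh hidden units (the data, shapes, and learning rate are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# toy data: 4 examples of dimension 3, scalar targets
X = rng.normal(size=(4, 3))
y = rng.normal(size=4)

# 2-layer net: hidden weights W, output weights v
W = 0.1 * rng.normal(size=(5, 3))
v = 0.1 * rng.normal(size=5)
lr = 0.01

for step in range(200):
    # forward pass
    h = np.tanh(X @ W.T)            # hidden activations g(W x)
    pred = h @ v                    # network outputs
    loss = np.sum((y - pred) ** 2)  # squared-error loss L(W, v)

    # backward pass: chain rule from the loss back to v, then W
    dpred = -2.0 * (y - pred)       # dL/dpred
    dv = h.T @ dpred                # dL/dv
    dh = np.outer(dpred, v)         # dL/dh
    dz = dh * (1.0 - h ** 2)        # through tanh: tanh'(z) = 1 - tanh(z)^2
    dW = dz.T @ X                   # dL/dW

    # gradient descent step
    v -= lr * dv
    W -= lr * dW

print(loss)  # loss should have decreased substantially
```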

## VAE

Whereas a normal AE turns an image into a latent vector, a VAE tries to learn the parameters of the Gaussian distribution that the latent vector is sampled from:

\[c_i = m_i + \exp(\sigma_i)\, e_i\]

where \(c_i\) is a component of the latent vector, \(e_i\) is random noise drawn from a standard normal, \(\exp(\sigma_i)\) is the exponential term, and \(m_i\) and \(\sigma_i\) are the learned Gaussian parameters (mean and log standard deviation).
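
A sketch of this sampling step (the reparameterization trick), assuming the encoder has already produced \(m\) and \(\sigma\) for a latent dimension of 8 (values here are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# pretend encoder outputs for one image, latent dim 8
m = rng.normal(size=8)        # means m_i
sigma = rng.normal(size=8)    # log standard deviations sigma_i

# reparameterization: sample noise, then scale and shift it
e = rng.standard_normal(8)    # e_i ~ N(0, 1)
c = m + np.exp(sigma) * e     # latent vector c

print(c)
```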

## KL Divergence

Roughly, a measure of how far apart two distributions are from each other (always \(\geq 0\), and \(0\) exactly when the distributions match):

\[D_{KL}(P \,\|\, Q) = \sum_x P(x) \log \frac{P(x)}{Q(x)}\]
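
A quick numeric sketch, using made-up discrete distributions \(p\) and \(q\):

```python
import numpy as np

def kl_divergence(p, q):
    # D_KL(P || Q) for discrete distributions; assumes q > 0 wherever p > 0
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

print(kl_divergence(p, q))  # small positive number
print(kl_divergence(p, p))  # 0.0 for identical distributions
```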