Generative Deep Learning: Variational Autoencoders (part I)

Last update: 16 February 2020

Generative models

Generative models are a class of statistical models that are able to generate new data points. They have a variety of applications and they are really fun to play with.
In this article, we're gonna explore one type of generative model called the variational autoencoder (VAE).
Before diving into VAEs, let's first understand what autoencoders are.



An autoencoder is an unsupervised learning approach that aims to learn a lower-dimensional feature representation of the data. This is achieved by training a neural network to reconstruct the original data while placing some constraints on the architecture.
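To make the idea a bit more concrete, here is a minimal sketch in plain Python of a single forward pass through a tiny linear autoencoder. The layer sizes, the random untrained weights and the helper names (`encode`, `decode`, `l2_loss`) are assumptions made for illustration only. Training would adjust the weights to minimize the reconstruction loss.

```python
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def encode(W_enc, x):
    # The bottleneck: the code h has fewer dimensions than the input x.
    return matvec(W_enc, x)

def decode(W_dec, h):
    # Map the low-dimensional code back to the input space.
    return matvec(W_dec, h)

def l2_loss(x, x_rec):
    """Squared error between the input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_rec))

rng = random.Random(0)
input_dim, code_dim = 4, 2  # assumed sizes, chosen only for the example

# Untrained, randomly initialised weights (hypothetical values).
W_enc = [[rng.uniform(-0.5, 0.5) for _ in range(input_dim)] for _ in range(code_dim)]
W_dec = [[rng.uniform(-0.5, 0.5) for _ in range(code_dim)] for _ in range(input_dim)]

x = [1.0, 0.5, -0.5, 2.0]
h = encode(W_enc, x)        # 2-dimensional code
x_rec = decode(W_dec, h)    # 4-dimensional reconstruction
loss = l2_loss(x, x_rec)
print(len(h), len(x_rec), loss)
```

The constraint here is the bottleneck itself: because `code_dim` is smaller than `input_dim`, the network cannot simply copy every input through unchanged.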

Autoencoder architecture

Given enough capacity, an autoencoder can simply learn the identity function and fail to capture the most important features of the data. To prevent this from happening, one solution is to introduce regularization.
Let's explore some commonly used techniques.

Regularized Autoencoder

Sparse Autoencoder

Sparsity can be achieved by different methods. The most commonly used ones are:

Using KL divergence as a regularization term
KL divergence measures how different two probability distributions are from each other. We're gonna use this property to add regularization to our network, and this is how we do it:

We define a sparsity parameter \( \pmb{\rho} \) that typically takes a small value, e.g. \( \pmb{\rho} = 0.01 \).
Then we calculate the average activation of the hidden unit \( j \) over the \( m \) training examples, which we call \( \pmb{\widehat{\rho}_{j}} \):

$$ \hat{\rho_j} = \frac{1}{m} \sum_{i=1}^m h_j(x^{(i)}) $$

The goal is to enforce:
$$ \hat{\rho_j} = \rho $$

To achieve this, we're gonna add an extra term to the total loss that penalizes any \( \widehat{\rho}_j \) that deviates significantly from \( \rho \):

$$ L = \mathcal{L}(x, {x'} ) + \lambda\sum_{j}KL(\rho \| \hat{\rho_j}) $$

Applying L1 or L2 regularization

$$ L = \mathcal{L}(x, {x'} ) +\lambda\sum_{j} h_j ^2 $$

$$ L = \mathcal{L}(x, {x'} ) +\lambda\sum_{j} |h_j| $$

Using the ReLU activation function for the hidden layer
Using dropout
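To make the first two options above concrete, here is a small sketch of the penalty terms in plain Python. It assumes the hidden activations lie strictly between 0 and 1 (for example sigmoid units), so that \( \rho \) and \( \widehat{\rho}_j \) can be treated as the means of two Bernoulli distributions; under that assumption the KL term is \( \rho \log\frac{\rho}{\widehat{\rho}_j} + (1-\rho)\log\frac{1-\rho}{1-\widehat{\rho}_j} \). The function names and the activation values are made up for the example.

```python
import math

def kl_bernoulli(rho, rho_hat):
    """KL divergence between Bernoulli(rho) and Bernoulli(rho_hat).
    Assumes both values are strictly between 0 and 1."""
    return (rho * math.log(rho / rho_hat)
            + (1 - rho) * math.log((1 - rho) / (1 - rho_hat)))

def average_activations(H):
    """rho_hat_j: mean activation of hidden unit j over the m examples (rows of H)."""
    m = len(H)
    return [sum(row[j] for row in H) / m for j in range(len(H[0]))]

def kl_sparsity_penalty(H, rho=0.01):
    """Sum over hidden units of KL(rho || rho_hat_j)."""
    return sum(kl_bernoulli(rho, rho_hat) for rho_hat in average_activations(H))

def l1_penalty(h):
    """Sum of absolute activations for one example."""
    return sum(abs(v) for v in h)

def l2_penalty(h):
    """Sum of squared activations for one example."""
    return sum(v ** 2 for v in h)

# Hypothetical sigmoid activations: 3 examples (rows), 2 hidden units (columns).
H = [[0.01, 0.9],
     [0.02, 0.8],
     [0.03, 0.7]]

lam = 0.1                   # assumed regularization weight (lambda)
reconstruction_loss = 0.5   # placeholder value for L(x, x')

total_kl = reconstruction_loss + lam * kl_sparsity_penalty(H, rho=0.01)
total_l1 = reconstruction_loss + lam * l1_penalty(H[0])
total_l2 = reconstruction_loss + lam * l2_penalty(H[0])
print(total_kl, total_l1, total_l2)
```

In this toy data the second hidden unit is active most of the time, so it contributes almost all of the KL penalty, while the first unit, whose average activation is close to \( \rho \), contributes very little.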

Denoising Autoencoder

With a denoising autoencoder, rather than adding a regularization term, the network is trained to recover the original undistorted input from a partially corrupted input. This forces the network to learn useful features.
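Here is a rough sketch of that training signal in plain Python. The corruption used is masking noise (randomly zeroing components), which is only one possible choice and an assumption of this example; `reconstruct` stands in for whatever autoencoder is being trained.

```python
import random

def corrupt(x, drop_prob=0.3, rng=random):
    """Masking noise: set each component to 0 with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else v for v in x]

def l2_loss(x, x_rec):
    return sum((a - b) ** 2 for a, b in zip(x, x_rec))

def denoising_loss(x, reconstruct, drop_prob=0.3, rng=random):
    x_noisy = corrupt(x, drop_prob, rng)  # the network only sees the corrupted input
    x_rec = reconstruct(x_noisy)
    return l2_loss(x, x_rec)              # but it is scored against the clean input

def identity(v):
    """Stand-in for an autoencoder that just copies its input."""
    return v

rng = random.Random(0)
x = [1.0, 2.0, 3.0, 4.0]
loss = denoising_loss(x, identity, drop_prob=0.5, rng=rng)
print(loss)
```

Notice that the identity function no longer gets a free pass: every component that was zeroed out shows up as reconstruction error, so copying the input is not a good solution anymore.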

Contractive Autoencoder

The contractive autoencoder adds a penalty on the squared Frobenius norm of the Jacobian matrix of the encoder with respect to the input:

$$ L = \mathcal{L}(x, {x'} ) +\lambda\sum_{j} || \nabla_xh_j ||^2 $$

\( \mathcal{L}(x, x') \) is the reconstruction loss between the input \( x \) and its reconstruction \( x' \), often the L2 loss.
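As an illustration, the contractive penalty has a simple closed form if we assume a one-layer sigmoid encoder \( h = \sigma(Wx + b) \). This encoder is an assumption of the sketch, not something fixed by the method. In that case \( \nabla_x h_j = h_j(1-h_j)\,W_j \), where \( W_j \) is the \( j \)-th row of \( W \), so the penalty becomes \( \sum_j \big(h_j(1-h_j)\big)^2 \sum_i W_{ji}^2 \). The weights below are made-up values.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def encode(W, b, x):
    """One-layer sigmoid encoder: h_j = sigmoid(W_j . x + b_j)."""
    return [sigmoid(sum(w * v for w, v in zip(row, x)) + bj)
            for row, bj in zip(W, b)]

def contractive_penalty(W, b, x):
    """Squared Frobenius norm of the encoder Jacobian dh/dx at x.
    For a sigmoid encoder, dh_j/dx_i = h_j * (1 - h_j) * W[j][i]."""
    h = encode(W, b, x)
    return sum((hj * (1 - hj)) ** 2 * sum(w ** 2 for w in row)
               for hj, row in zip(h, W))

# Hypothetical encoder weights: 3 inputs, 2 hidden units.
W = [[0.5, -1.0, 0.25],
     [1.5, 0.0, -0.5]]
b = [0.1, -0.2]
x = [1.0, 2.0, -1.0]

penalty = contractive_penalty(W, b, x)
print(penalty)
```

The penalty is small wherever the hidden units are saturated or the weights are small, which is exactly where the code \( h \) barely changes when the input is perturbed.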

Applications of Autoencoders

Autoencoders have a variety of applications, among those we can find:

Variational Autoencoders


VAEs are based on two important assumptions:

The goal now is to estimate the parameters of the distribution without having access to the latent state \( z \).
For simplicity we're gonna assume that:

To estimate the latent state \( z \) from the input data \( x \), we could use Bayes' rule:
$$ p_\theta(z|x) = \frac{p_\theta(x|z)p_\theta(z)}{p_\theta(x)} $$

The problem here is that \( p_\theta(x) = \int p_\theta(x|z)\,p_\theta(z)\,dz \) is an intractable integral. To solve this, we're gonna use a neural network to estimate a distribution over latent states, which we call \( q_\phi(z|x) \), and then sample from it to get the latent state \( z \).
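A minimal sketch of that sampling step, in plain Python, could look like the following. It assumes \( q_\phi(z|x) \) is a Gaussian with diagonal covariance whose mean and log-variance are produced by the encoder, which is a common choice but an assumption of this example. The one-layer `encoder` and its weights `W_mu` and `W_logvar` are hypothetical stand-ins for a real network.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def encoder(x, W_mu, W_logvar):
    """Hypothetical one-layer encoder returning the parameters of q(z|x):
    a mean and a log-variance for each latent dimension."""
    return matvec(W_mu, x), matvec(W_logvar, x)

def sample_latent(mu, log_var, rng):
    """Draw z ~ N(mu, sigma^2), written as mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

rng = random.Random(0)
x = [0.5, -1.0, 2.0]

# Made-up weights: 3 inputs, 2 latent dimensions.
W_mu = [[0.2, 0.0, 0.1],
        [0.0, -0.3, 0.0]]
W_logvar = [[0.0, 0.1, 0.0],
            [-0.2, 0.0, 0.0]]

mu, log_var = encoder(x, W_mu, W_logvar)
z = sample_latent(mu, log_var, rng)
print(mu, log_var, z)
```

Predicting the log-variance rather than the variance itself is a design choice made here so that the standard deviation `exp(0.5 * log_var)` is always positive, whatever the encoder outputs.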

Generating new data with VAE

After training, we throw away the encoder and generate new samples by sampling \( z \) from the latent space and passing it through the decoder.
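As a rough sketch of what generation looks like in code, assume the latent prior is a standard normal \( \mathcal{N}(0, I) \) and that a trained decoder is available. Both are assumptions here: the one-layer `decoder` below and its weights are made-up placeholders rather than a trained model.

```python
import random

def matvec(W, z):
    """Multiply matrix W (a list of rows) by vector z."""
    return [sum(w * v for w, v in zip(row, z)) for row in W]

def decoder(z, W_dec, b_dec):
    """Hypothetical one-layer decoder mapping a latent vector to data space."""
    return [a + b for a, b in zip(matvec(W_dec, z), b_dec)]

def generate(n, latent_dim, W_dec, b_dec, rng):
    """Sample n latent vectors from N(0, I) and decode each of them."""
    samples = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        samples.append(decoder(z, W_dec, b_dec))
    return samples

rng = random.Random(0)

# Placeholder decoder weights: 2 latent dimensions, 3 output dimensions.
W_dec = [[0.5, -0.2],
         [0.1, 0.3],
         [-0.4, 0.0]]
b_dec = [0.0, 1.0, -1.0]

new_samples = generate(5, latent_dim=2, W_dec=W_dec, b_dec=b_dec, rng=rng)
for s in new_samples:
    print(s)
```

Note that no input data and no encoder are involved at this stage: everything starts from random draws in the latent space.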

In the next part, we'll see how to train VAEs to generate new data. Stay tuned!

Recommended reading