Lossy image autoencoders with convolution and deconvolution networks in Tensorflow

07/29/2017 | Convnet, Deep Learning, Neural networks, Python, Tensorflow
Autoencoders are a very interesting deep learning application because they allow a consistent dimensionality reduction of an entire dataset with a controllable loss level. The Jupyter notebook for this small project is available in the GitHub repository: https://github.com/giuseppebonaccorso/lossy_image_autoencoder.

The structure of a generic autoencoder is represented in the following figure:

The encoder is a function that processes an input matrix (image) and outputs a fixed-length code:
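
$c = e(X), \quad c \in \mathbb{R}^{n}$, where $n$ is the code length (128 in this experiment).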

In this model, the encoding function is implemented using a convolutional layer followed by flattening and dense layers. The code is then fed into the decoder, which reconstructs a lossy version of the original image:
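
$\tilde{X} = d(c) = d(e(X))$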

The decoder is implemented using a deconvolutional (transposed convolution) layer with 3 filters (one per channel). The model is trained by minimizing the L2 loss:
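
$L = \frac{1}{2}\,\lVert X - \tilde{X} \rVert_{2}^{2}$ (this is exactly the quantity computed by tf.nn.l2_loss).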

For the experiment, I’ve used the CIFAR-10 dataset (https://www.cs.toronto.edu/~kriz/cifar.html), using only the training samples (50,000 32 × 32 RGB images) and the Keras wrapper:

from keras.datasets import cifar10

(X_train, Y_train), (X_test, Y_test) = cifar10.load_data()

The model is implemented using TensorFlow with an RMSProp optimizer:

import tensorflow as tf

width = 32
height = 32
batch_size = 10
nb_epochs = 15
code_length = 128

graph = tf.Graph()

with graph.as_default():
    # Global step
    global_step = tf.Variable(0, trainable=False)
    
    # Input batch
    input_images = tf.placeholder(tf.float32, shape=(batch_size, height, width, 3))

    # Convolutional layer 1
    conv1 = tf.layers.conv2d(inputs=input_images,
                             filters=32,
                             kernel_size=(3, 3),
                             kernel_initializer=tf.contrib.layers.xavier_initializer(),
                             activation=tf.nn.tanh)

    # Convolutional output (flattened)
    conv_output = tf.contrib.layers.flatten(conv1)

    # Code layer
    code_layer = tf.layers.dense(inputs=conv_output,
                                 units=code_length,
                                 activation=tf.nn.tanh)
    
    # Code output layer
    code_output = tf.layers.dense(inputs=code_layer,
                                  units=(height - 2) * (width - 2) * 3,
                                  activation=tf.nn.tanh)

    # Deconvolution input
    deconv_input = tf.reshape(code_output, (batch_size, height - 2, width - 2, 3))

    # Deconvolution layer 1
    deconv1 = tf.layers.conv2d_transpose(inputs=deconv_input,
                                         filters=3,
                                         kernel_size=(3, 3),
                                         kernel_initializer=tf.contrib.layers.xavier_initializer(),
                                         activation=tf.sigmoid)
    
    # Output batch
    output_images = tf.cast(tf.reshape(deconv1, 
                                       (batch_size, height, width, 3)) * 255.0, tf.uint8)

    # Reconstruction L2 loss
    loss = tf.nn.l2_loss(input_images - deconv1)

    # Training operations
    learning_rate = tf.train.exponential_decay(learning_rate=0.0005, 
                                               global_step=global_step, 
                                               decay_steps=int(X_train.shape[0] / (2 * batch_size)), 
                                               decay_rate=0.95, 
                                               staircase=True)
    
    trainer = tf.train.RMSPropOptimizer(learning_rate)
    # Pass global_step so that the exponential learning rate decay is actually applied
    training_step = trainer.minimize(loss, global_step=global_step)
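
Before starting the training, a session bound to the graph has to be created and all the variables initialized. A minimal sketch (the session name is the one used in the training loop below) is:

session = tf.InteractiveSession(graph=graph)
tf.global_variables_initializer().run()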

The code length is 128 float32 values, much smaller than the original image size (32 × 32 × 3 = 3072 bytes); therefore, the reconstruction error will be medium-high (it’s useful to test different values in order to find the best trade-off). Moreover, it’s also possible to add an L1 regularization term to the code in order to increase its sparsity; a minimal sketch (the penalty weight is an arbitrary assumption) could be:
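
sparsity_weight = 0.01

# L2 reconstruction loss plus an L1 penalty on the code to encourage sparsity
loss = tf.nn.l2_loss(input_images - deconv1) + \
       sparsity_weight * tf.reduce_sum(tf.abs(code_layer))

The training code is shown in the following snippet: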

import numpy as np

def create_batch(t, gray=False):
    # Allocate a normalized batch; 1 channel when gray=True, 3 (RGB) otherwise
    X = np.zeros((batch_size, height, width, 3 if not gray else 1), dtype=np.float32)

    for k, image in enumerate(X_train[t:t+batch_size]):
        if gray:
            # rgb2gray() is an external helper (e.g. skimage.color.rgb2gray),
            # which already returns values in [0, 1]
            X[k, :, :, 0] = rgb2gray(image)
        else:
            X[k, :, :, :] = image / 255.0

    return X


def train():
    for e in range(nb_epochs):
        total_loss = 0.0

        for t in range(0, X_train.shape[0], batch_size):
            feed_dict = {
                input_images: create_batch(t)
            }

            _, v_loss = session.run([training_step, loss], feed_dict=feed_dict)
            total_loss += v_loss

        print('Epoch {} - Total loss: {}'.format(e+1, total_loss))
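
Assuming the session sketched above, the whole procedure is started with:

train()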

After 15 epochs (in a production implementation, it’s useful to increase this value until the loss function stops decreasing), the reconstruction of some random images is shown in the following figure (first row: original images; second row: reconstructed ones):

As expected, the quality is not very high, but the “semantics” of each image is almost preserved. Possible improvements include:

  • Adding a flag (using a placeholder) to use the model for both training and prediction: in the former mode the input is an image batch, while in the latter it is a code batch (see the sketch after this list)
  • Using L1 (and/or L2) code regularization (as sketched above)
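
A minimal sketch of the first improvement (the placeholder names are assumptions, not part of the original notebook):

is_training = tf.placeholder(tf.bool)
external_code = tf.placeholder(tf.float32, shape=(batch_size, code_length))

# During training the decoder receives the encoder's code;
# during prediction it receives an externally fed code batch
decoder_code = tf.cond(is_training, lambda: code_layer, lambda: external_code)

code_output = tf.layers.dense(inputs=decoder_code,
                              units=(height - 2) * (width - 2) * 3,
                              activation=tf.nn.tanh)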

See also:

CIFAR-10 image classification with Keras ConvNet – Giuseppe Bonaccorso

CIFAR-10 is a small-image (32 × 32) dataset made up of 60,000 images subdivided into 10 main categories. Check the web page in the reference list for further information and to download the whole set.



6 thoughts on “Lossy image autoencoders with convolution and deconvolution networks in Tensorflow”

  1. randompost27
    10/04/2017 at 22:57

    Hi Giuseppe.

    I once read a blog post of yours where you trained a convolutional autoencoder on the CIFAR-10 dataset. I have tried to replicate the code with a different architecture (number of layers, activation functions, etc.), and the reconstructions of images from the test set seem quite decent.

    Now I am obtaining the intermediate representation from the encoder and trying to find the k-nearest neighbours of a query image. However, the retrieved images are not consistent with the query.

    Could you please share some insight into why the reconstruction is good while the retrieval is rather poor? My reasoning is that a good reconstruction translates into good learned intermediate features, but they fail during kNN retrieval.

    Your comments are highly appreciated. Thanks

    • Giuseppe Bonaccorso
      10/05/2017 at 9:16

      Hi,

      thanks for the comment.

      I think there’s a basic problem: autoencoders don’t preserve distances. When you use kNN, I suppose you’re adopting a Euclidean metric; however, the internal representations can be projected on a manifold where the distances are completely distorted. You’re not using, for example, Isomap, which tries to preserve the inner products, but a method that encodes part of the information in the weights of both networks.

      What you can try to do is to impose an L1 penalty on the feature vector, so as to encourage sparsity, but I think the result won’t be excellent either. Another possible strategy is to add a term proportional to the distance between original images and feature vectors. This can slow down the process, but it can assure a better isometric mapping.

      Otherwise, if you need a non-linear dimensionality reduction with preserved distances, look at methods like Isomap or t-SNE.

      Keep me posted about your experiments!

  2. wine lover
    01/15/2018 at 23:57

    Hi, thanks for the post. If both the input and the output are one-dimensional, how should your code be modified to handle that kind of scenario?

    Thanks,

    • Giuseppe Bonaccorso
      01/16/2018 at 11:19

      Hi,
      if you need to work with tensors like (batch_size, dim), you can simply use dense layers. A simple structure like Input -> Hidden 1 -> Code -> Hidden 2 -> Reconstruction can be enough; a minimal sketch (the layer sizes and the dim variable are placeholder assumptions) follows.
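
      input_data = tf.placeholder(tf.float32, shape=(batch_size, dim))

      # Encoder
      hidden_1 = tf.layers.dense(inputs=input_data, units=256, activation=tf.nn.tanh)
      code_layer = tf.layers.dense(inputs=hidden_1, units=code_length, activation=tf.nn.tanh)

      # Decoder
      hidden_2 = tf.layers.dense(inputs=code_layer, units=256, activation=tf.nn.tanh)
      reconstruction = tf.layers.dense(inputs=hidden_2, units=dim, activation=tf.sigmoid)

      # Reconstruction L2 loss
      loss = tf.nn.l2_loss(input_data - reconstruction)

      If the dataset has a temporal structure (batch_size, time steps, dim), you can use 1D convolutions; however, it depends on the specific problem.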

  3. Jean-Francois Ducher
    10/29/2018 at 12:23

    Hi
    I am new to convolutional auto-encoders, so my questions will probably sound very basic; I apologize for this.

    The decoder and encoder in your implementation are very asymmetrical.

    conv1 consists of 32 30×30 maps, which translates into a flattened 28,800-dimension structure (a huge increase from the original 3072-dimension image).

    Then there is a HUGE dimensionality reduction to the 128-dimension code layer – that’s 225-to-1!

    But then, the next fully connected layer (code_output) expands to 30 × 30 × 3 = 2700 dimensions (that is to say, roughly 10 times smaller than on the encoder side).

    That layer then deconvolves to the original image size of 32 × 32 × 3 = 3072 dimensions.

    Would it make sense to have the 32 30×30 maps on the decoder side too, and to tie the weights of the deconvolution layer to those of the convolution layer? (following http://people.idsia.ch/~ciresan/data/icann2011.pdf)

    Wouldn’t it improve the quality of the learning to use 2-3 convolution layers with pooling in between, instead of fully connected layers in the middle, to avoid the dramatic dimensionality reduction mentioned above?

    Looking forward to reading you,
    best

    • Giuseppe Bonaccorso
      11/04/2018 at 11:41

      Hi Jean-Francois,

      the general goal of an autoencoder is to reduce the dimensionality of a dataset by transforming each sample into a very compact code. There are no specific rules in modeling the architecture, and your comment is quite interesting. Symmetry is not a necessary condition when, for example, the goal is to force the network to extract all the main features and to generate very compact (and sometimes also sparse) codes (generally represented as dense layers) that can be reconstructed using a smaller deconvolutional network.

      The quality is impacted by all dimensions, so if you increase the number of weights (adding convolutions or deconvolutions), the results are likely to be more accurate. However, an autoencoder stores part of the “knowledge” in the network itself, so there should always be a trade-off between precision and complexity.

      Mine was a very basic example that everybody can improve by increasing the complexity. The goal was to show the “dynamics” of a simple autoencoder without analyzing all the influencing factors (e.g., you can add an L1 loss on the code layer to force sparsity). If you’re interested, you can find some examples in the repository https://github.com/PacktPublishing/Mastering-Machine-Learning-Algorithms (Chapter 11), where I show more complex standard autoencoders, denoising autoencoders, sparse autoencoders, and variational autoencoders.

