4 Comments

  1. Hi Giuseppe.

    I once read a post of yours where you trained a convolutional autoencoder on the CIFAR-10 dataset. I have tried to replicate the code with a different architecture (number of layers, activation functions, etc.), and the reconstructions of the test-set images are quite decent.

    Now I am extracting the intermediate representation from the encoder and trying to find the k-nearest neighbours of a query image. However, the retrieved images are not consistent with the query.
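
    For reference, my retrieval step looks roughly like this (a minimal sketch; `encoder`, `x_test`, and `query_image` stand for my trained encoder sub-model, the test images, and the query):

    ```python
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    # Encode the test set and the query, flattening codes to vectors
    codes = encoder.predict(x_test).reshape(len(x_test), -1)
    query_code = encoder.predict(query_image[np.newaxis]).reshape(1, -1)

    # k-nearest neighbours under the Euclidean metric in code space
    knn = NearestNeighbors(n_neighbors=5).fit(codes)
    _, indices = knn.kneighbors(query_code)
    retrieved = x_test[indices[0]]
    ```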

    Could you please share some insight into why the reconstruction is good while the retrieval is much worse? My reasoning is that a good reconstruction translates into good learned intermediate features, yet they fail during kNN retrieval.

    Your comments are highly appreciated. Thanks

    • Hi,

      thanks for the comment.

      I think there’s a basic problem: autoencoders don’t preserve distances. When you use kNN, I suppose you’re adopting a Euclidean metric. However, the internal representations can lie on a manifold where the distances are completely distorted. You’re not using something like Isomap, which tries to preserve the inner products, but a method that encodes part of the information in the weights of both networks.

      What you can try is to impose an L1 penalty on the feature vector, so as to encourage sparsity, but I don’t think the result will be excellent either. Another possible strategy is to add a loss term proportional to the mismatch between the pairwise distances of the original images and those of the feature vectors. This can slow down training, but it can ensure a more isometric mapping.
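
      As a sketch of both ideas, assuming tf.keras (TF 2.x) — the architecture and the 1e-5 and 0.01 weights are placeholders, not the code from the original post:

      ```python
      import tensorflow as tf
      from tensorflow.keras import layers, regularizers, Model

      inputs = layers.Input(shape=(32, 32, 3))
      x = layers.Conv2D(32, 3, activation='relu', padding='same')(inputs)
      x = layers.Flatten()(x)

      # L1 activity penalty on the code layer to encourage sparsity
      code = layers.Dense(128, activity_regularizer=regularizers.l1(1e-5))(x)

      x = layers.Dense(32 * 32 * 32, activation='relu')(code)
      x = layers.Reshape((32, 32, 32))(x)
      outputs = layers.Conv2D(3, 3, activation='sigmoid', padding='same')(x)

      autoencoder = Model(inputs, outputs)

      # Extra loss term: penalize the mismatch between (squared) pairwise
      # distances in input space and in code space, within each batch
      flat_in = layers.Flatten()(inputs)
      d_in = tf.reduce_sum(tf.square(flat_in[:, None, :] - flat_in[None, :, :]), axis=-1)
      d_code = tf.reduce_sum(tf.square(code[:, None, :] - code[None, :, :]), axis=-1)
      autoencoder.add_loss(0.01 * tf.reduce_mean(tf.square(d_in - d_code)))

      autoencoder.compile(optimizer='adam', loss='mse')
      ```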

      Otherwise, if you need a non-linear dimensionality reduction that preserves distances, look at methods like Isomap or t-SNE.
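
      With scikit-learn, for example (a sketch on flattened images; the component and neighbour counts are placeholders):

      ```python
      from sklearn.manifold import Isomap, TSNE

      X = x_test.reshape(len(x_test), -1)  # flatten images to vectors

      # Isomap approximately preserves geodesic distances and can embed
      # new points via transform(), so it also works for query images
      iso = Isomap(n_components=32, n_neighbors=10).fit(X)
      codes = iso.transform(X)

      # t-SNE preserves local neighbourhoods, but has no transform() for
      # unseen queries, so it's less convenient for retrieval
      codes_2d = TSNE(n_components=2).fit_transform(X)
      ```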

      Keep me posted about your experiments!

  2. Hi, thanks for the post. If both input and output are one-dimensional, how should I modify your code to handle that kind of scenario?

    Thanks,

    • Hi,
      if you need to work with tensors shaped (batch_size, dim), you can simply use dense layers. A simple structure like Input -> Hidden 1 -> Code -> Hidden 2 -> Reconstruction can be enough (see the sketch below). If the dataset has a temporal structure (batch_size, time_steps, dim), you can use 1D convolutions instead; however, it depends on the specific problem.
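
      A minimal sketch of that structure with tf.keras (layer sizes are placeholders):

      ```python
      from tensorflow.keras import layers, Model

      dim = 100  # input dimensionality (placeholder)

      inputs = layers.Input(shape=(dim,))
      hidden_1 = layers.Dense(64, activation='relu')(inputs)      # Hidden 1
      code = layers.Dense(16, activation='relu')(hidden_1)        # Code
      hidden_2 = layers.Dense(64, activation='relu')(code)        # Hidden 2
      outputs = layers.Dense(dim)(hidden_2)                       # Reconstruction

      autoencoder = Model(inputs, outputs)
      autoencoder.compile(optimizer='adam', loss='mse')
      ```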
