Using Autoencoders for Image Classification

Author: Charter Global
Published: May 15, 2019
Categories: Machine Learning

Machine learning tasks are usually described in terms of how the machine learning model should process given data. For example, in a classification task, given the image of an elephant as an input, the AI model is required to specify the elephant subspecies it belongs to (ex: Indian Elephant, African Elephant, Tusker etc.). The state-of-the-art pre-trained models for classification  (NASNet, ResNet, Inception etc.) can only be used for similar tasks they are trained to handle.

Further, if they require base-level training for various tasks, the models require over 500 images per category. Autoencoders serve as a solution to the lack of pre-trained models for the use of building autoencoders for image classification works with fewer training images, eliminating the need to appoint over 500 per category. Also, autoencoders use a semi-supervised learning algorithm, that combines the strengths of both supervised and unsupervised learning algorithms.

Supervised Learning (Classification Algorithm):

Supervised learning consists of a set of distorted images along with corresponding labels as input and predicts desired label as output.

Unsupervised Learning (Clustering Algorithm):

Unsupervised learning contains images with no corresponding labels as an input. This model can categorize the data according to their patterns, similarities and differences, and returns them as an output.

The below figure is an example of how two different kinds of bottles, a tea cup and a trophy cup images that lack corresponding labels are classified as an input. Unsupervised learning model autonomously categorize the images as water bottles, teacup, trophy cup according to their similarities.

Semi-supervised Learning ( Autoencoders)

Autoencoders are neural networks that are comprised of an encoder and a decoder.  The encoder tries to create a fingerprint (encoding) of an input image. The decoder then tries to reconstruct the original input image from the encoded image. An autoencoders for image classification can take as input a distorted/transformed input image and can reconstruct the original good image.

In the below example, the autoencoders for image classification will learn during training that the 3 distorted images on the LHS are same as the good image on the RHS. The AI model is not explicitly taught about the similarities between distorted and good images, it learns the underlying patters by itself. So sometimes, semi-supervised learning is also called self-supervised learning.

When a new tea cup image is provided by the customer, the AI predictive model will retrieve matching product(s) from the catalogue based on the cosine similarity score.