Introduction:

Multilabel classification is an important computer vision task that involves predicting multiple labels or categories for a single image.

Multilabel classification is useful on e-commerce platforms such as Bagisto that offer product search by image. Google Lens, for example, lets users search for products with a photo.

Traditionally, this has been done by training a separate model for each label and then merging their predictions. However, this approach is slow and inefficient, because every prediction requires running several models.

Slow response times hurt the user experience. To reduce response time, we can use a single multilabel classification model instead. In this blog post, I describe how I developed an efficient multilabel classification architecture and how it can be used in e-commerce.

1. Single Multiclass Classification Model:

I started with single multiclass classification models, where each model assigns an image exactly one label from its set of categories.

However, this approach required training a separate model for each label and then combining their predictions to obtain the final multilabel output.

Example:

Load Different Models:

# Import necessary libraries
import numpy as np
from keras.models import load_model

# Load the pre-trained models
color_model = load_model('color_detection_model.h5')
master_category_model = load_model('master_category_model.h5')
category_model = load_model('category_model.h5')
sub_category_model = load_model('sub_category_model.h5')

Here, we load the trained models with the Keras library.

Define Prediction Function:

# Function to predict the label using each model
def predict_label(image):
    color_label = np.argmax(color_model.predict(image))
    master_category_label = np.argmax(master_category_model.predict(image))
    category_label = np.argmax(category_model.predict(image))
    sub_category_label = np.argmax(sub_category_model.predict(image))
    
    return color_label, master_category_label, category_label, sub_category_label

Here, each model predicts its own label with model.predict. model.predict returns a probability for each class, and np.argmax picks the index of the highest probability, which is the predicted label.
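To see what np.argmax does with the output of model.predict, here is a small NumPy-only sketch. The probability values are made up for illustration. For a single image, model.predict returns an array of shape (1, num_classes); when you predict a whole batch, passing axis=-1 keeps one label per image instead of searching the flattened array.

```python
import numpy as np

# Made-up softmax output for one image and four classes,
# shaped like model.predict on a single image: (1, 4)
probs_one = np.array([[0.05, 0.70, 0.20, 0.05]])

# Without an axis, argmax searches the flattened array;
# for a batch of one this still gives the class index
label_one = np.argmax(probs_one)  # 1

# Made-up output for a batch of two images
probs_batch = np.array([[0.05, 0.70, 0.20, 0.05],
                        [0.90, 0.02, 0.03, 0.05]])

# axis=-1 picks the most probable class for each image separately
labels_batch = np.argmax(probs_batch, axis=-1)  # array([1, 0])

# Without axis=-1 the batch is flattened, and the result (4) is not even
# a valid class index for a 4-class problem
flat_index = np.argmax(probs_batch)  # 4
```

With a single uploaded image, as in predict_label, both forms agree; the axis only matters once you batch several images.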

Predictions:

# Sample image (you would load your own image here)
image = np.random.rand(1, 224, 224, 3)

# Get the predicted labels
color_label, master_category_label, category_label, sub_category_label = predict_label(image)

# Combine the predictions to get the final multilabel output
multilabel_output = {
    "color": color_label,
    "master_category": master_category_label,
    "category": category_label,
    "sub_category": sub_category_label,
}

After mapping each predicted index back to its class name, we get a result such as color: Red, master_category: Clothes, category: T-shirt, and sub_category: Casual.
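np.argmax returns an integer index, not a name such as "Red". Below is a minimal sketch of turning the indices back into readable labels, assuming you kept an ordered list of class names for each label from training. The class lists and the decode_labels helper are hypothetical; the order of each list must match the encoding used when that model was trained.

```python
# Hypothetical class names for each label, in the same order used to
# encode the training targets (index 0 is the first entry, and so on)
CLASS_NAMES = {
    "color": ["Black", "Blue", "Red", "White"],
    "master_category": ["Accessories", "Clothes", "Footwear"],
    "category": ["Jeans", "Shirt", "T-shirt"],
    "sub_category": ["Casual", "Formal", "Sports"],
}


def decode_labels(multilabel_output):
    """Replace each predicted class index with its class name (hypothetical helper)."""
    return {
        label: CLASS_NAMES[label][int(index)]  # int() also accepts NumPy integers
        for label, index in multilabel_output.items()
    }


# Indices as they might come out of predict_label
example_output = {"color": 2, "master_category": 1, "category": 2, "sub_category": 0}
print(decode_labels(example_output))
# {'color': 'Red', 'master_category': 'Clothes', 'category': 'T-shirt', 'sub_category': 'Casual'}
```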

I trained separate models for color detection, the master category, the category, and the sub-category of a product. Each model predicts a different label, and we merge their outputs into one result.

Suppose these models power visual shopping on an e-commerce website: customers upload a photo, and the models predict the product's attributes so the site can show similar products.

Although accurate, this method suffered from slow response times because every request had to run through several models.

2. Researching Efficient Multilabel Classification:

To overcome the slow response times, I researched further and found an alternative approach that is more efficient.

The key insight was to design a single model architecture that predicts all labels simultaneously.

By combining the individual models into one network, I could eliminate the need to merge predictions from separate models and achieve faster inference.

3. Building the Multilabel Classification Model:

To implement this approach, I organized the model architecture as a class with static methods that encapsulates the architecture design and functionality.

Within this class, I defined separate functions, each responsible for building the branch for one label.

Architecture:

Import the necessary libraries:

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Lambda
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout, Activation, BatchNormalization, Input
from tensorflow.keras.models import Model

We import several layers from tensorflow.keras.layers:

Conv2D and MaxPooling2D: extract features from images.
Flatten: flattens the feature maps into a single vector.
Dense: a fully connected neural network layer.
Dropout, Activation, and BatchNormalization: Activation applies a nonlinearity such as ReLU or softmax, Dropout randomly drops units during training to reduce overfitting, and BatchNormalization normalizes layer inputs to stabilize and speed up training.

Model:

class MultiLableClf:
    @staticmethod
    def build_category1(inputs, num_category1, finalAct="softmax"):
        # CONV => BN => RELU => POOL
        x = Conv2D(32, (3, 3), padding="same")(inputs)
        x = BatchNormalization()(x)
        x = Activation("relu")(x)
        x = MaxPooling2D(pool_size=(2, 2))(x)
        x = Dropout(0.25)(x)
        x = Flatten()(x)
        # Fully connected layers
        x = Dense(256)(x)
        x = Activation("relu")(x)
        x = Dense(num_category1)(x)
        # Every output layer needs a unique name
        x = Activation(finalAct, name="category")(x)

        # Return the category prediction sub-network
        return x

    @staticmethod
    def build_category2(inputs, num_category2, finalAct="softmax"):
        # CONV => BN => RELU => POOL
        x = Conv2D(32, (3, 3), padding="same")(inputs)
        x = BatchNormalization()(x)
        x = Activation("relu")(x)
        x = MaxPooling2D(pool_size=(2, 2))(x)
        x = Dropout(0.25)(x)
        x = Flatten()(x)
        # Fully connected layers
        x = Dense(256)(x)
        x = Activation("relu")(x)
        x = Dense(num_category2)(x)
        x = Activation(finalAct, name="subcategory")(x)
        return x

    # build_category3 ... build_categoryN follow the same pattern,
    # each with its own layers and a unique output name

    @staticmethod
    def build_model(width, height, num_category1, num_category2, num_category3):  # ..., num_categoryN
        # Initialize the input shape: (height, width, RGB channels)
        inputShape = (height, width, 3)
        inputs = Input(shape=inputShape)
        category1 = MultiLableClf.build_category1(inputs, num_category1)
        category2 = MultiLableClf.build_category2(inputs, num_category2)
        category3 = MultiLableClf.build_category3(inputs, num_category3)
        # ... one call per remaining label

        # Create the model using the input and multiple outputs
        model = Model(inputs=inputs, outputs=[category1, category2, category3], name="productnet")  # ..., categoryN

        # Return the constructed network architecture
        return model

Each of these functions builds the layers specific to one label. By calling all of them from the build_model function, I created a single architecture that handles all labels simultaneously.

The width and height passed to build_model set the input image size in pixels. Once the model is trained with these values, every image must be resized to the same dimensions at prediction time.
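The size is fixed because of the Flatten layer: its output length, and therefore the weight matrix of the Dense(256) layer that follows it, depends on the image height and width. The plain-Python sketch below works through that arithmetic for one branch of the model above (Conv2D with 32 filters and padding="same", then 2x2 max pooling). The helper name branch_sizes is hypothetical.

```python
def branch_sizes(height, width, filters=32, pool=2, dense_units=256):
    """Return (flattened length, Dense weight count) for one label branch.

    Conv2D with padding="same" keeps height and width, MaxPooling2D with a
    2x2 pool (stride defaults to the pool size) halves them, and Flatten
    multiplies the remaining dimensions together.
    """
    pooled_h, pooled_w = height // pool, width // pool
    flat = pooled_h * pooled_w * filters
    # Dense layer: one weight per (input, unit) pair plus one bias per unit
    dense_params = flat * dense_units + dense_units
    return flat, dense_params


print(branch_sizes(200, 200))  # (320000, 81920256)
print(branch_sizes(224, 224))  # (401408, 102760704)
```

A model built for 200x200 images has a Dense kernel that expects 320,000 inputs, so a 224x224 image would produce a 401,408-long vector that does not fit. The numbers also show that this Dense layer holds most of each branch's weights.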

The build_model function creates the input layer and feeds that same input into each label-specific branch. In the code above, only the input is shared; each branch has its own convolutional layers.
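If you want the convolutional layers themselves to be shared, so that feature extraction runs once per image and only the output heads differ, one option is a shared trunk with a small head per label. This is a different design from the per-branch code above, shown as a hedged sketch; the function name build_shared_model, the layer sizes, and the label names are assumptions for illustration.

```python
import tensorflow as tf
from tensorflow.keras.layers import (Activation, BatchNormalization, Conv2D,
                                     Dense, Dropout, Flatten, Input, MaxPooling2D)
from tensorflow.keras.models import Model


def build_shared_model(width, height, num_classes_per_label):
    """Hypothetical variant: one shared CNN trunk, one softmax head per label.

    num_classes_per_label maps an output name (e.g. "color") to its class count.
    """
    inputs = Input(shape=(height, width, 3))

    # Shared trunk: computed once per image and reused by every head
    x = Conv2D(32, (3, 3), padding="same")(inputs)
    x = BatchNormalization()(x)
    x = Activation("relu")(x)
    x = MaxPooling2D(pool_size=(2, 2))(x)
    x = Dropout(0.25)(x)
    x = Flatten()(x)
    shared = Dense(256, activation="relu")(x)

    # One classification head per label, named after the label
    outputs = [
        Dense(num_classes, activation="softmax", name=label)(shared)
        for label, num_classes in num_classes_per_label.items()
    ]
    return Model(inputs=inputs, outputs=outputs, name="productnet_shared")


model = build_shared_model(64, 64, {"color": 4, "category": 3})
preds = model.predict(tf.zeros((1, 64, 64, 3)), verbose=0)
print([p.shape for p in preds])  # [(1, 4), (1, 3)]
```

The trade-off: a shared trunk saves computation and memory, while separate branches let each label learn its own features.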

Train & Save Model

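model = MultiLableClf.build_model(200, 200, num_category1, num_category2, num_category3, num_category4, num_category5)  # width, height: change to your image size
model.compile(optimizer='adam', loss=tf.keras.losses.CategoricalCrossentropy(), metrics=['accuracy'])

# Model training: images_array must be preprocessed like the inference images
# (resized to 200x200, scaled to [0, 1]); one encoded label array per output
model.fit(np.array(images_array),
          [encoded_labels1, encoded_labels2, encoded_labels3,
           encoded_labels4, encoded_labels5],
          epochs=10)

# Save the trained model (save the label encoders separately)
# On Keras 3 (TF 2.16+), the path must end in .keras, e.g. 'Models/imageClassifier.keras'
tf.keras.models.save_model(model, 'Models/imageClassifier')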

Here is what this code does:

First, we pass the image width and height and the num_category values. Each num_category is the number of unique labels in that category.
In model.compile, we use Adam as the optimizer and CategoricalCrossentropy() as the loss, because each category has more than two classes and its labels are one-hot encoded. metrics=['accuracy'] reports the accuracy of each output.
In model.fit, the first argument is all images as a NumPy array (X), and the second is the list of encoded label arrays (y), one per output and in the same order as the model's outputs. encoded_labels holds the numerical (one-hot) encoding of the categorical labels for each image, and epochs is the number of passes over the training data.
Finally, we save the trained model with save_model().
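The post does not show how the encoded_labels arrays are built. Because the loss is CategoricalCrossentropy, each one has to be a one-hot matrix with one row per image and one column per class. Here is a minimal NumPy sketch, assuming the raw labels are strings; the helper one_hot_encode, the sample labels, and the file name color_classes.pkl are hypothetical. The class list is pickled so predicted indices can be mapped back to names at inference time.

```python
import pickle

import numpy as np


def one_hot_encode(raw_labels):
    """Return (sorted class list, one-hot matrix) for a list of string labels."""
    classes = sorted(set(raw_labels))
    index = {name: i for i, name in enumerate(classes)}
    # Selecting rows of the identity matrix gives one row per image,
    # with a single 1.0 in that image's class column
    encoded = np.eye(len(classes), dtype="float32")[[index[name] for name in raw_labels]]
    return classes, encoded


# Hypothetical color labels for four training images
color_classes, encoded_labels1 = one_hot_encode(["Red", "Blue", "Red", "White"])
print(color_classes)          # ['Blue', 'Red', 'White']
print(encoded_labels1.shape)  # (4, 3)

# Save the class order so inference can turn indices back into names
with open("color_classes.pkl", "wb") as f:
    pickle.dump(color_classes, f)
```

len(color_classes) is then the matching num_category value for that output.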

4. Inference and Response Time:

Load and inference:

import numpy as np
import pickle
from PIL import Image
import tensorflow as tf

# Load the trained CNN model from the path it was saved to
model = tf.keras.models.load_model('Models/imageClassifier')

def Prediction(image_path):
    
    image = Image.open(image_path).convert('RGB')
    img = image.resize((200, 200))  # Resize the image to match the input size of the model
    image_array = np.array(img)  # Convert the image to a numpy array
    image_array = image_array / 255.0  # Normalize the image pixels to the range [0, 1]
    image_array = np.expand_dims(image_array, axis=0)  # Add a batch dimension

    # Make predictions
    
    predictions = model.predict(image_array)

    # Extract the predicted labels
    predicted_1  = np.argmax(predictions[0])
    predicted_2  = np.argmax(predictions[1])
    predicted_3  = np.argmax(predictions[2])
    predicted_4  = np.argmax(predictions[3])
    predicted_5  = np.argmax(predictions[4])
    
    return predicted_1, predicted_2, predicted_3, predicted_4 ,predicted_5

In the Prediction function, we first open the image with the PIL library and convert it to RGB, resize it to 200x200 (the size used for training), convert it to a NumPy array, normalize the pixel values by dividing by 255, and add a batch dimension.

Next, we predict the labels. Because the model has several outputs, model.predict returns a list with one probability array per category, so we select each one with predictions[index_number] and use np.argmax to get the label for that category.
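The predicted_1 to predicted_5 lines can also be written as a loop over the model's outputs. A small sketch, assuming you supply a name for each output in the same order as the model's outputs; the helper decode_predictions and the names used here are hypothetical, and fake arrays stand in for the model.predict result so the example runs without a trained model.

```python
import numpy as np


def decode_predictions(predictions, output_names):
    """Map each output's probabilities to a class index per image.

    predictions: list of arrays, one per model output, each (batch, num_classes)
    output_names: hypothetical names, in the same order as the model outputs
    """
    if len(predictions) != len(output_names):
        raise ValueError("need exactly one name per model output")
    return {
        name: np.argmax(probs, axis=-1)  # one index per image in the batch
        for name, probs in zip(output_names, predictions)
    }


# Fake output for one image from a two-output model, in place of model.predict
fake_predictions = [np.array([[0.1, 0.8, 0.1]]), np.array([[0.3, 0.2, 0.4, 0.1]])]
result = decode_predictions(fake_predictions, ["color", "category"])
print({name: idx.tolist() for name, idx in result.items()})
# {'color': [1], 'category': [2]}
```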

Our Prediction looks like this:

[Figure: inference and response time]

Because one model produces all labels in a single forward pass, inference is faster, which makes the approach more practical for real-time applications. For e-commerce, this is an improvement over the multi-model approach, since each request runs only one multilabel classification model.
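To check the response-time gain on your own hardware, a simple timing harness is enough. The sketch below times any prediction callable with time.perf_counter and reports the median over several runs after a warm-up call, since the first Keras predict call is usually slower. The helper median_latency_ms is hypothetical; in practice you might pass lambda: predict_label(image) for the multi-model setup and lambda: model.predict(image_array) for the single model.

```python
import statistics
import time


def median_latency_ms(predict_fn, runs=20, warmup=1):
    """Return the median wall-clock time of predict_fn() in milliseconds."""
    for _ in range(warmup):
        predict_fn()  # warm-up calls are not timed
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict_fn()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


# Stand-in workload so the sketch runs without any model
print(f"{median_latency_ms(lambda: sum(range(100_000))):.2f} ms")
```

The median is less sensitive than the mean to occasional slow runs caused by other processes.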

With this approach, the model predicts multiple categories, such as color detection, master category, category, and sub-category of products, in a single inference pass. As a result, customers on an e-commerce website can upload a photo and receive quick and accurate predictions.

5. Maintenance:

The efficient multilabel classification architecture not only enhances the user experience but also reduces deployment maintenance. With a single model, the system is easier to manage, since there is no need to update and manage multiple models separately.

Conclusion:

In this blog post, I have shared an efficient approach to multilabel classification in e-commerce using a single CNN model architecture.

By adopting this technique, developers and researchers can enhance the efficiency and performance of their multilabel classification models.

Overall, this approach to multilabel classification in e-commerce is beneficial, offering faster response times, improved user satisfaction, and simplified model maintenance.