Introduction:
Multilabel classification is an important task in computer vision that involves predicting multiple labels or categories from an image.
We can use multilabel classification on e-commerce platforms such as Bagisto to let customers search for products by image, much as Google Lens does.
Traditionally, this has been achieved by training a separate model for each label and then merging their predictions. However, this approach is slow and inefficient because every prediction has to run several models.
Slow response times hurt the user experience. To reduce them, we use a single model that predicts all labels at once. In this blog post, I describe how to build an efficient multilabel classification architecture and how to use it in e-commerce.
1. Single Multiclass Classification Model:
To start, I used single-label multiclass models, where each model assigns an image exactly one label from one group of categories (for example, its color).
However, this approach required training a separate model for each label and then combining their predictions to obtain the final multilabel output.
Example:
Load Different Models:
```python
# Import necessary libraries
import numpy as np
from keras.models import load_model

# Load the pre-trained models
color_model = load_model('color_detection_model.h5')
master_category_model = load_model('master_category_model.h5')
category_model = load_model('category_model.h5')
sub_category_model = load_model('sub_category_model.h5')
```
Here we load the four trained models using the Keras library.
Define Prediction Function:
```python
# Function to predict the label using each model
def predict_label(image):
    color_label = np.argmax(color_model.predict(image))
    master_category_label = np.argmax(master_category_model.predict(image))
    category_label = np.argmax(category_model.predict(image))
    sub_category_label = np.argmax(sub_category_model.predict(image))
    return color_label, master_category_label, category_label, sub_category_label
```
Here each model predicts its own label with model.predict. model.predict returns a probability for every class, and np.argmax picks the index of the highest probability, which is the predicted label.
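One detail worth spelling out is that np.argmax without an axis flattens its input. That is fine for a single image, as in predict_label above, but for a batch it returns one index for the whole array. The sketch below uses made-up probabilities (not real model output) to show the difference.

```python
import numpy as np

# Hypothetical softmax outputs for a batch of 2 images and 3 classes.
probs = np.array([
    [0.1, 0.6, 0.3],
    [0.8, 0.1, 0.1],
])

# Without an axis, argmax flattens the array and returns a single index.
flat_index = int(np.argmax(probs))

# With axis=-1, argmax returns one class index per image.
per_image = np.argmax(probs, axis=-1)

print(flat_index)          # 3
print(per_image.tolist())  # [1, 0]
```

If predict_label is ever called with more than one image at a time, passing axis=-1 keeps the result to one label per image.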
Predictions:
```python
# Sample image (you would load your own image here)
image = np.random.rand(1, 224, 224, 3)

# Get the predicted labels
color_label, master_category_label, category_label, sub_category_label = predict_label(image)

# Combine the predictions to get the final multilabel output
multilabel_output = {
    "color": color_label,
    "master_category": master_category_label,
    "category": category_label,
    "sub_category": sub_category_label,
}
```
Each value in this dictionary is an integer class index. Mapped back to class names, the result looks like color: Red, master_category: Clothes, category: T-shirt, and sub_category: Casual.
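The step from class indices to readable names is not shown in the post, so here is a minimal sketch of how it could look. The class lists and the decode_labels helper are hypothetical. In practice the names and their order come from whatever label encoding was used during training.

```python
# Hypothetical class lists; the real ones come from the training label encoders.
CLASS_NAMES = {
    "color": ["Blue", "Green", "Red"],
    "master_category": ["Accessories", "Clothes", "Footwear"],
    "category": ["Jeans", "Shirt", "T-shirt"],
    "sub_category": ["Casual", "Formal", "Sports"],
}


def decode_labels(index_output, class_names=CLASS_NAMES):
    """Map each predicted class index to its human-readable name."""
    return {key: class_names[key][int(idx)] for key, idx in index_output.items()}


# Example indices standing in for the output of predict_label.
multilabel_output = {"color": 2, "master_category": 1, "category": 2, "sub_category": 0}
decoded = decode_labels(multilabel_output)
print(decoded)
```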
I trained four separate models: color detection, master category, category, and sub-category of the product. Each model predicts one label, and we combine their outputs into the final result.
Suppose these models power visual shopping on an e-commerce website: a customer uploads a photo, the models identify the product, and the site shows similar products.
Although accurate, this method suffered from slow response times due to the need for multiple models.
2. Researching Efficient Multilabel Classification:
To overcome the slow response time, I did further research and found an alternative approach that is more efficient.
The key insight was to design a single model architecture capable of predicting all labels simultaneously.
By combining the architectures of the individual models into one, I could eliminate the need to merge predictions and achieve faster inference times.
3. Building the Multilabel Classification Model:
To implement this approach, I organized the model architecture as a class of static methods that encapsulates the architecture design and functionality.
Within this class, I defined separate functions, each responsible for constructing the architecture of one label.
Architecture:
Import Necessary libraries
```python
import tensorflow as tf
from tensorflow.keras import Input, Model  # needed by build_model below
from tensorflow.keras.layers import Lambda
from tensorflow.keras.layers import (Conv2D, MaxPooling2D, Flatten, Dense,
                                     Dropout, Activation, BatchNormalization)
```
We import several layers from tensorflow.keras.layers:
Conv2D, MaxPooling2D: for extracting features from images.
Flatten: for flattening the feature maps into a single vector.
Dense: a fully connected neural network layer.
Dropout, Activation, BatchNormalization: Dropout helps prevent overfitting, Activation applies a non-linearity such as ReLU or softmax, and BatchNormalization helps training converge faster.
Model:
```python
class MultiLableClf:
    @staticmethod
    def build_category1(inputs, num_category1, finalAct="softmax"):
        # CONV => BN => RELU => POOL
        x = Conv2D(32, (3, 3), padding="same")(inputs)
        x = BatchNormalization()(x)
        x = Activation("relu")(x)
        x = MaxPooling2D(pool_size=(2, 2))(x)
        x = Dropout(0.25)(x)
        x = Flatten()(x)

        # Fully connected layers
        x = Dense(256)(x)
        x = Activation("relu")(x)
        x = Dense(num_category1)(x)
        x = Activation(finalAct, name="category")(x)

        # Return the category prediction sub-network
        return x

    @staticmethod
    def build_category2(inputs, num_category2, finalAct="softmax"):
        # CONV => BN => RELU => POOL
        x = Conv2D(32, (3, 3), padding="same")(inputs)
        x = BatchNormalization()(x)
        x = Activation("relu")(x)
        x = MaxPooling2D(pool_size=(2, 2))(x)
        x = Dropout(0.25)(x)
        x = Flatten()(x)

        # Fully connected layers
        x = Dense(256)(x)
        x = Activation("relu")(x)
        x = Dense(num_category2)(x)
        x = Activation(finalAct, name="subcategory")(x)

        # Return the sub-category prediction sub-network
        return x

    # ... build_category3 through build_categoryN follow the same pattern ...

    @staticmethod
    def build_model(width, height, num_category1, num_category2, num_category3):
        # Add further num_categoryN arguments here, one per additional label

        # Initialize the input shape and channel dimension
        inputShape = (height, width, 3)
        inputs = Input(shape=inputShape)

        # Build one branch per label on top of the shared input
        category1 = MultiLableClf.build_category1(inputs, num_category1)
        category2 = MultiLableClf.build_category2(inputs, num_category2)
        category3 = MultiLableClf.build_category3(inputs, num_category3)

        # Create the model using the input and one output per label
        model = Model(inputs=inputs,
                      outputs=[category1, category2, category3],
                      name="productnet")

        # Return the constructed network architecture
        return model
```
Each of these functions builds the layers and connections for one label. By linking them together within the “build_model” function, I created a single architecture capable of handling all labels simultaneously.
The width and height passed to “build_model” are the pixel dimensions of the input image. Once the model has been trained with these values, every image must be resized to the same dimensions at prediction time.
The “build_model” function sets up the input layer, which is shared by all labels, and connects it to each label-specific function. In the code shown here, each label branch has its own convolutional layers, and the outputs of all branches become the outputs of one model.
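To see why the input size is fixed after training, it helps to trace the tensor shapes through one branch. The sketch below is plain arithmetic, not Keras code. The branch_shapes helper is hypothetical and assumes the layer settings shown above: a 3x3 convolution with "same" padding and 32 filters, 2x2 max pooling, and a 256-unit Dense layer.

```python
def branch_shapes(height, width, filters=32, pool=2, dense_units=256):
    """Trace output shapes through CONV => POOL => FLATTEN => DENSE for one branch."""
    # padding="same" with stride 1 keeps height and width unchanged.
    conv_out = (height, width, filters)
    # 2x2 max pooling (default "valid" padding) halves height and width, rounding down.
    pool_out = (height // pool, width // pool, filters)
    # Flatten turns the feature maps into one long vector.
    flat = pool_out[0] * pool_out[1] * pool_out[2]
    # Dense weights: one per (input, unit) pair, plus one bias per unit.
    dense_params = flat * dense_units + dense_units
    return conv_out, pool_out, flat, dense_params


print(branch_shapes(200, 200))  # ((200, 200, 32), (100, 100, 32), 320000, 81920256)
print(branch_shapes(224, 224))  # ((224, 224, 32), (112, 112, 32), 401408, 102760704)
```

The length of the flattened vector depends on the image size, and the first Dense layer's weight matrix is sized to match it. A model trained on 200x200 images therefore cannot accept 224x224 images without resizing them first.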
Train & Save Model
```python
# build_model is assumed here to have been extended to five labels
model = MultiLableClf.build_model(200, 200, num_category1, num_category2, num_category3,
                                  num_category4, num_category5)  # Change dimensions as needed

# CategoricalCrossentropy expects one-hot encoded labels
model.compile(optimizer='adam',
              loss=tf.keras.losses.CategoricalCrossentropy(),
              metrics=['accuracy'])

# Model training: one array of encoded labels per model output
model.fit(np.array(images_array),
          [encoded_labels1, encoded_labels2, encoded_labels3, encoded_labels4, encoded_labels5],
          epochs=10)

# Save the trained model
tf.keras.models.save_model(model, 'Models/imageClassifier')
```
Here we build, train, and save the model:
- First, we pass the image dimensions and the num_category values. Each num_category is the number of unique labels in that category.
- In model.compile we use adam as the optimizer and CategoricalCrossentropy() as the loss, because each category has more than two possible labels. metrics=['accuracy'] tracks the model's accuracy during training.
- In model.fit the first argument is all images as a NumPy array (X), the second is the list of encoded labels (y), and epochs sets the number of training passes. encoded_labels holds the numerical encoding of the categorical labels for each image, and one epoch is one full pass over the training data.
- Finally, we save the trained model using save_model().
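The post does not show how encoded_labels are produced. Since CategoricalCrossentropy expects one-hot targets, one possible encoding is sketched below with NumPy only. The one_hot_encode helper and the sample color labels are illustrative assumptions, not the author's preprocessing code.

```python
import numpy as np


def one_hot_encode(labels):
    """Turn a list of string labels into a one-hot array plus the class order used."""
    classes = sorted(set(labels))                     # fixed, reproducible class order
    index = {name: i for i, name in enumerate(classes)}
    ids = np.array([index[name] for name in labels])  # integer id per sample
    one_hot = np.eye(len(classes), dtype="float32")[ids]
    return one_hot, classes


# Hypothetical color labels for four training images.
colors = ["Red", "Blue", "Red", "Green"]
encoded_colors, color_classes = one_hot_encode(colors)

print(color_classes)         # ['Blue', 'Green', 'Red']
print(encoded_colors.shape)  # (4, 3)
```

Here len(color_classes) plays the role of num_category for this label, and color_classes is the list to keep around for turning predicted indices back into names. If the labels are kept as plain integer ids instead, tf.keras.losses.SparseCategoricalCrossentropy is the matching loss.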
4. Inference and Response Time:
Load and inference:
```python
import numpy as np
import pickle
from PIL import Image
import tensorflow as tf

# Load the trained CNN model (same path it was saved to)
model = tf.keras.models.load_model('Models/imageClassifier')


def Prediction(image_path):
    image = Image.open(image_path).convert('RGB')
    img = image.resize((200, 200))  # Resize the image to match the input size of the model
    image_array = np.array(img)  # Convert the image to a numpy array
    image_array = image_array / 255.0  # Normalize the image pixels to the range [0, 1]
    image_array = np.expand_dims(image_array, axis=0)  # Add a batch dimension

    # Make predictions: one probability array per label
    predictions = model.predict(image_array)

    # Extract the predicted class index for each label
    predicted_1 = np.argmax(predictions[0])
    predicted_2 = np.argmax(predictions[1])
    predicted_3 = np.argmax(predictions[2])
    predicted_4 = np.argmax(predictions[3])
    predicted_5 = np.argmax(predictions[4])

    return predicted_1, predicted_2, predicted_3, predicted_4, predicted_5
```
In the prediction function, we first open the image with the PIL image library and resize it to the model's input size. We then convert it to an array, normalize the pixel values by dividing by 255, and add a batch dimension so the model can accept it.
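The effect of these preprocessing steps on the array's shape and value range can be checked without a real photo. The sketch below uses a random array as a stand-in for an already resized 200x200 RGB image. The to_model_input helper is a hypothetical name for the normalize-and-batch part of the function above.

```python
import numpy as np


def to_model_input(pixels):
    """Scale uint8 pixels of shape (H, W, 3) to [0, 1] and add a batch dimension."""
    scaled = pixels.astype("float32") / 255.0
    return np.expand_dims(scaled, axis=0)


# Random stand-in for a resized 200x200 RGB image.
fake_image = np.random.randint(0, 256, size=(200, 200, 3), dtype=np.uint8)
batch = to_model_input(fake_image)

print(batch.shape)  # (1, 200, 200, 3)
```

The leading 1 is the batch dimension: Keras models expect a batch of images even when predicting for a single one.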
Next we predict the labels. model.predict returns a list with one prediction array per category, so we select each one with predictions[index_number] and use np.argmax to get the label for that category.
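The five near-identical argmax lines can also be written as a loop that keeps the winning probability alongside each index, which is handy for ignoring low-confidence predictions. This is only a sketch: summarize_predictions and the head names are hypothetical, and the arrays below are made-up stand-ins for model output.

```python
import numpy as np


def summarize_predictions(predictions, head_names):
    """Return {head name: (class index, probability)} for a single-image batch."""
    summary = {}
    for name, probs in zip(head_names, predictions):
        probs = np.asarray(probs)[0]  # first (and only) image in the batch
        idx = int(np.argmax(probs))
        summary[name] = (idx, float(probs[idx]))
    return summary


# Stand-in for model.predict output: one (1, num_classes) array per label.
predictions = [np.array([[0.1, 0.9]]), np.array([[0.2, 0.5, 0.3]])]
result = summarize_predictions(predictions, ["color", "category"])
print(result)
```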
Our prediction looks like this:
[Figure: sample prediction output]
Because all labels come from one forward pass through one model, inference is faster than with the multi-model approach, which makes the model more practical for real-time e-commerce applications.
With this approach, the model predicts multiple categories, such as the color, master category, category, and sub-category of a product, in a single inference pass. As a result, customers on an e-commerce website can upload a photo and receive quick and accurate predictions.
5. Maintenance:
The efficient multilabel classification architecture not only enhances the user experience but also reduces deployment and maintenance effort. With a single model to manage, the system avoids the complexity of deploying and updating multiple models separately.
Conclusion:
In this blog post, I have shared an efficient approach to multilabel classification in e-commerce using a single CNN model architecture.
By adopting this technique, developers and researchers can enhance the efficiency and performance of their multilabel classification models.
Overall, this approach to multilabel classification in e-commerce is beneficial, offering faster response times, improved user satisfaction, and simplified model maintenance.