This blog offers a guide on designing a system that classifies product images into categories and suggests similar items in the inventory.
Suppose you run an e-commerce platform with many different categories of products in your inventory. Every time a new product arrives, you would like it to be automatically classified into one of the predefined categories. You would also like to know which similar products already exist in your inventory, so you can keep track of your goods. Such a system can be incredibly useful for e-commerce platforms: it streamlines the product categorization process, saves time and manual effort, and enables efficient inventory management.
Moreover, by identifying similar products, you can better understand your product assortment, identify potential gaps or redundancies, and make informed decisions about product procurement, pricing, and promotions. This way you can optimize your inventory levels to minimize stockouts and overstocking.
At Superteams.ai, we decided to take on this problem statement and created an AI-powered solution to address it. We designed a system that classifies product images into their respective categories and then suggests similar products already available in the inventory.
To achieve this, we leveraged two cutting-edge technologies: OpenAI's CLIP model and the Qdrant Vector Database.
OpenAI's CLIP (Contrastive Language-Image Pre-training) model is a game-changer in the AI domain. This powerful model generates both text and image embeddings in a shared vector space, enabling seamless mapping between textual concepts and visual representations. The magic lies in the model's ability to place semantically similar texts and images in close proximity within the vector space. This means that images and their corresponding textual descriptions are naturally aligned, making it incredibly easy to match images to their respective categories.
A demo of how CLIP would place embeddings of different images and texts
While the CLIP model lays the foundation for our solution, the Qdrant Vector Database takes it to the next level. Qdrant is an open-source database specifically designed for storing and retrieving vector embeddings and their associated metadata with exceptional speed and efficiency. Its advanced similarity search capabilities allow us to quickly explore the vast vector space and retrieve the most relevant information from the database corpus.
By leveraging Qdrant, we can perform lightning-fast lookups of user queries against the stored vector embeddings. This means that when a new product image is uploaded, our system can swiftly identify the most similar products already present in your inventory, empowering you to make informed decisions about product categorization and inventory management.
We used a Fashion Product Dataset from Kaggle. This dataset consists of around 44,000 images of various fashion products along with their respective product categories. There are about 143 distinct categories, and they look something like this:
Shorts
Trousers
Rain Trousers
Sweaters
Sarees
Shrug
Sports Sandals
…….
Bracelet
Body Wash and Scrub
Compact
Trunk
Mens Grooming Kit
Boxers
Rompers
Concealer
Deodorant
Hats
Heels
Headband
Wallets
Free Gifts
Workflow Diagram
Install the required packages.
Download the dataset:
Extract the product categories.
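The dataset's `styles.csv` file holds one metadata row per product, with the category in its `articleType` column. The sketch below uses a tiny stand-in frame so it runs without the download; with the real file you would call `pd.read_csv("fashion-dataset/styles.csv", on_bad_lines="skip")` instead (the path is an assumption):

```python
import pandas as pd

# Stand-in for styles.csv: one row per product, category in articleType.
styles = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "articleType": ["Shorts", "Sarees", "Shorts", "Heels"],
    }
)

# Deduplicate and sort the labels to get the category list.
categories = sorted(styles["articleType"].dropna().unique())
print(categories)  # ['Heels', 'Sarees', 'Shorts']
```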
Load the CLIP model:
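One common way to load CLIP is through Hugging Face Transformers; the original post may have used OpenAI's `clip` package instead, so treat the loading code as an assumption:

```python
import torch
from transformers import CLIPModel, CLIPProcessor

# ViT-B/32 is the standard CLIP checkpoint; its shared text/image
# embedding space is 512-dimensional.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()
print(model.config.projection_dim)  # 512
```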
Compute the embeddings of the category texts:
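Embedding the category labels with CLIP's text tower might look like the sketch below (three stand-in labels; the full pipeline would pass all ~143 categories). Normalizing the vectors makes dot product equal to cosine similarity:

```python
import torch
from transformers import CLIPModel, CLIPProcessor

categories = ["Shorts", "Sarees", "Heels"]  # stand-in labels

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

inputs = processor(text=categories, return_tensors="pt", padding=True)
with torch.no_grad():
    text_embeddings = model.get_text_features(**inputs)
# Normalize so that dot product equals cosine similarity.
text_embeddings = text_embeddings / text_embeddings.norm(dim=-1, keepdim=True)
print(text_embeddings.shape)  # torch.Size([3, 512])
```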
Launch an instance of Qdrant on localhost:
Upsert the embeddings into a collection called “text_embeddings”. We also add payload metadata along with the embeddings; the payload contains the text labels for the product categories, which we’ll use later to retrieve the product category.
Next, let’s create another collection for the image embeddings.
Next, we create a Pandas dataframe to organize the image embeddings and their associated metadata (the file paths of the images).
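A sketch of that dataframe, with random stand-in vectors and hypothetical file paths; in the full pipeline each row would hold the CLIP embedding of one of the ~44,000 product images:

```python
import numpy as np
import pandas as pd

filepaths = ["images/10000.jpg", "images/10001.jpg", "images/10002.jpg"]
embeddings = [np.random.rand(512) for _ in filepaths]

# One row per product image: its file path plus its embedding vector.
df = pd.DataFrame({"filepath": filepaths, "embedding": embeddings})
print(df.shape)  # (3, 2)
```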
Since we have a large number of image embeddings (one for each of the ~44,000 product images), it’s a good idea to upsert them into the vector database in batches of 1,000 for speed and efficiency.
We’ll now start writing the functions for our Gradio UI. The first function takes a PIL image as input, performs a similarity search against the text-embeddings collection, and returns the top result, which is the category of the product.
The second function takes the PIL image as input and, by performing a similarity search on the image-embeddings collection, returns the top ten most similar images (from the inventory) as a list of file paths.
Code for the Gradio UI:
Here's a brief explanation of the code:
1. The `with gr.Blocks() as demo:` statement creates a Gradio interface named `demo`.
2. Inside the `demo` block, there are three `gr.Row()` components, each representing a row in the user interface.
3. In the first row:
- `upload_image` is a `gr.Image` component that allows the user to upload an image. It has the label "Upload Your Image" and accepts PIL (Python Imaging Library) images.
- `classifier_text` is a `gr.Textbox` component that displays the type of item based on the image classification.
4. In the second row:
- `image_gallery` is a `gr.Gallery` component that displays similar items from the inventory. It has a label "Similar items in the inventory" and is configured to show 5 columns and 2 rows of images.
5. In the third row:
- `clr_btn` is a `gr.Button` component with the label "Clear".
6. The `first_step` variable is assigned the result of `upload_image.upload()`, which triggers the `image_classifier` function when an image is uploaded. The uploaded image is passed as input to the function, and the output is displayed in the `classifier_text` textbox.
7. The `first_step.then()` statement chains another function call to `image_path_list` after the `image_classifier` function completes. It takes the uploaded image as input and updates the `image_gallery` with the resulting list of recommended image paths.
8. The `clr_btn.click()` statement defines the behavior when the "Clear" button is clicked. It sets the `upload_image`, `classifier_text`, and `image_gallery` components to their default values (None, None, and an empty list, respectively).
9. Finally, `demo.launch(share=True)` launches the Gradio interface and makes it shareable, allowing others to access it via a generated URL.
Here are some screenshots from our UI:
You can access the code in this GitHub repository: https://github.com/vardhanam/Product_Classifier_Recommendation/tree/main