Academy
Updated on
Feb 25, 2026

From RAG to Voice: Building a Shopify Assistant for Indian Customers

This article describes how you can build a voice-enabled assistant for product discovery and support.

From RAG to Voice: Building a Shopify Assistant for Indian Customers
Ready to ship your own agentic-AI solution in 30 days? Book a free strategy call now.

Why This Matters for D2C and Shopify Sellers 

If you talk to any D2C founder, one problem comes up again and again: customers add items to cart, then they disappear.

Most Shopify sellers try to fix this using:

  • Automated emails
  • WhatsApp reminders
  • Discount nudges
  • Chat widgets

All of these approaches assume the customer is willing to read, scroll, and process information, usually in English.

But, in India, that assumption often breaks.

India is a voice-native market. People are comfortable asking questions verbally, especially when shopping:

  • “Is this kurta cotton?”
  • “What sizes are available?”

When answers are buried inside long descriptions written in English, customers don’t always abandon because they dislike the product. Often, they leave because finding the right information takes effort.

This project started from a simple observation:

If customers can speak their questions, why should stores only reply in text?




What I Set Out to Build

The goal was not to build a fancy chatbot.

The goal was to build a store-aware voice assistant that:

  • Understands Shopify product and policy data
  • Answers questions accurately using retrieval (not guessing)
  • Speaks answers out loud
  • Supports natural follow-up questions

At the end, the assistant behaves like a helpful store staff member, not an FAQ page.




High-Level Flow of the System

At a conceptual level, the system works like this:

  1. The customer speaks a question
  2. Speech is converted to text (STT)
  3. Question is answered using RAG over Shopify data
  4. Answer is converted back to speech (TTS)
  5. Assistant waits for the next question

This loop continues until the customer says “exit” or “stop”.




The Foundation: RAG Over Shopify Data

The engagement depends on the quality of the responses. And we need to remember that voice only amplifies wrong answers—it does not fix them.

Data the Assistant Knows

The assistant is built using real Shopify store information:

  • Product names
  • Prices
  • Available sizes
  • Fabric and material details
  • Shipping timelines
  • Return and refund policies

This data is fetched from Shopify, cleaned to remove HTML, and converted into plain text.




Chunking and Preparation

Raw product pages are long and messy. Reading them aloud would be useless.

So the content is:

  • Broken into small, meaningful chunks
  • Grouped logically (product-level, policy-level)
  • Stored in JSON

Chunking ensures:

  • Product questions don’t retrieve policy text
  • Policy questions don’t retrieve product descriptions
  • Answers stay relevant and short
def chunk_text(text, size=CHUNK_SIZE):
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size
    return chunks

with open(os.path.join(SCRIPT_DIR, "shopify_data.json"), "r", encoding="utf-8") as f:
    data = json.load(f)

chunks = []

for item in data:
    text_chunks = chunk_text(item["content"])
    for chunk in text_chunks:
        chunks.append({
            "title": item["title"],
            "type": item["type"],
            "content": chunk
        })

with open(os.path.join(SCRIPT_DIR, "rag_chunks.json"), "w", encoding="utf-8") as f:
    json.dump(chunks, f, indent=2)

print("Chunking complete. Total chunks:", len(chunks))




Chunking the data




Embeddings and Vector Search

Each chunk is converted into a vector embedding using a sentence-transformer model.

These embeddings are stored in a FAISS index, which allows semantic search.

When a customer asks:

“What sizes do you have in cotton kurta?”

The system retrieves:

  • The cotton kurta chunk
  • Not unrelated products
  • Not shipping policies

This is the core of the RAG system.




Handling Real Customer Questions (Not Textbook English)

Real customers don’t speak like documentation.

They say things like:

  • “cotton kurta size?”
  • “fabric of kurta”

To handle this, the assistant uses:

  • Keyword-based intent detection (price, size, fabric)
  • Product name extraction from the spoken text
  • Attribute-level answer extraction

Instead of reading the entire product description aloud, it extracts only what is relevant.

Example:

  • Question: “What is the fabric of kurta?”
  • Answer: “The cotton kurta is made from 100% pure cotton.”

This keeps voice responses short, focused, and natural.




Speech-to-Text (STT): How Voice Becomes Text

The Speech-to-Text module converts user voice input into text using Hugging Face transformer-based speech recognition models.

model = WhisperModel("base", compute_type="int8")
def listen(fs=16000, silence_threshold=0.015, silence_duration=1.5, max_duration=12):
    """
    Record audio dynamically until silence is detected after speech starts.
    - silence_threshold: RMS amplitude below which audio is considered silent
    - silence_duration:  seconds of silence needed to stop recording
    - max_duration:      hard cap on recording length in seconds
    """
    print("Listening...")

    chunk_size = int(fs * 0.1)          # process in 100ms chunks
    max_chunks = int(max_duration / 0.1)
    silence_chunks_needed = int(silence_duration / 0.1)
    audio_chunks = []
    silent_count = 0
    speech_started = False

    with sd.InputStream(samplerate=fs, channels=1, dtype='float32') as stream:
        for _ in range(max_chunks):
            chunk, _ = stream.read(chunk_size)
            audio_chunks.append(chunk.copy())

            rms = np.sqrt(np.mean(chunk ** 2))

            if rms > silence_threshold:
                speech_started = True
                silent_count = 0
            elif speech_started:
                silent_count += 1
                if silent_count >= silence_chunks_needed:
                    break

    if not audio_chunks or not speech_started:
        return ""

    recording = np.concatenate(audio_chunks, axis=0)
    # Convert float32 [-1, 1] → int16 for a standard WAV that Whisper handles reliably
    recording_int16 = (recording * 32767).astype(np.int16)

How It Works

  1. The user speaks into the microphone.
  2. The audio input is captured and processed.
  3. A pretrained Hugging Face ASR (Automatic Speech Recognition) model transcribes the speech into text.
  4. The transcribed text is forwarded to the RAG pipeline for query processing.

Technologies Used

  • faster-whisper (WhisperModel – base, int8 optimized)
  • sounddevice (microphone input streaming)
  • numpy (audio signal processing & RMS calculation)
  • tempfile (temporary audio storage)

File

scripts/hf_stt.py

Purpose

This module enables natural voice interaction with the AI assistant, eliminating the need for manual typing.




Text-to-Speech (TTS): Turning Answers into Voice

The Text-to-Speech module converts the AI-generated text response back into natural-sounding speech.

pygame.mixer.pre_init(frequency=44100, size=-16, channels=2, buffer=512)
pygame.mixer.init()

# Sweet, natural neural voice — options you can swap:
#   "en-US-JennyNeural"  — warm and friendly (default)
#   "en-US-AriaNeural"   — calm and clear
#   "en-US-SaraNeural"   — bright and polite
VOICE = "en-US-JennyNeural"
async def _generate(text: str, path: str):
    communicate = edge_tts.Communicate(text, voice=VOICE, rate="-5%", pitch="+0Hz")
    await communicate.save(path)

 How It Works

  1. The RAG module generates a text response.
  2. The text is passed to a Hugging Face TTS model.
  3. The model synthesizes human-like speech.
  4. The audio is played back to the user.

Technologies Used

  • edge-tts (Neural voice synthesis – Microsoft Edge voices)
  • asyncio (asynchronous speech generation)
  • pygame (audio playback)
  • os (file cleanup)

 File

scripts/hf_tts.py

 Purpose

This module provides a fully voice-enabled AI assistant experience by converting responses into speech.

SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__))

# Load chunked data
with open(os.path.join(SCRIPT_DIR, "rag_chunks.json"), "r", encoding="utf-8") as f:
    data = json.load(f)

texts = [item["content"] for item in data]

# Load FAISS
index = faiss.read_index(os.path.join(SCRIPT_DIR, "faiss.index"))

# Load embedding model
model = SentenceTransformer("all-MiniLM-L6-v2")
welcome = "Shopify Voice Assistant started. Ask your question."
print(welcome)
speak(welcome)




Why Voice Matters Specifically in India

Voice changes how customers interact with a store:

  • It reduces effort
  • It feels conversational
  • It works well even when grammar isn’t perfect
  • It builds trust faster than long text

For Indian Shopify sellers, this can improve:

  • Product clarity
  • Customer confidence
  • Overall shopping experience

Sometimes, a clear spoken answer is all a customer needs to move forward.

Architecture Summary

Data Layer

  • Shopify products and policies

RAG Layer

  • Chunking
  • Embeddings
  • FAISS vector search

Logic Layer

  • Intent detection (price, size, fabric)
  • Product name extraction
  • Attribute-based answers

Voice Layer

  • STT
  • TTS

Each layer is independent, making the system:

  • Easier to debug
  • Easier to extend
  • Practical for real use

                                                Shopify voice assistant 




Possible Extensions Beyond This Project

Although this project focuses on voice-based support, the same architecture can be extended to:

  • Cart-drop voice follow-ups
  • WhatsApp voice notes
  • Multilingual STT/TTS engines
  • Human handoff for complex cases

Once the store knowledge is solid, voice becomes just another interface.

At Superteams, we help you build voice agents customized to your business needs. To learn more, speak to us.




Project Link

You can explore the project here:

🔗 Project Repository:
https://github.com/AbhinayaPinreddy/Shopify-Voice-AI-Assistant

Authors

Want to Scale Your Business with AI Deployed on your Cloud?

Talk to our team and get a complementary agentic AI advisory session.

We use cookies to ensure the best experience on our website. Learn more