Enterprise

Unify — Omnichannel AI Sales & Support Assistant

How we built Kala — an AI-powered sales assistant that speaks Hinglish, analyses customer photos, and closes saree sales 24/7 across WhatsApp, Instagram, and Telegram.

Parmeet Singh Talwar

•Sep 2025•

8 min read

•Live Demo

Unify — Omnichannel AI Sales & Support Assistant

Project Overview

A leading D2C saree retailer was losing sales every day because customer queries on WhatsApp and Instagram went unanswered after business hours. Agents were overwhelmed managing hundreds of chat threads manually. We built **Unify** — a multi-tenant SaaS platform with an AI persona called **Kala** — that acts like a knowledgeable in-store salesperson: she understands Telugu, Hinglish, and English; she can look at a photo a customer sends and find the matching saree in a 5,000-product catalog; and she hands off to a human agent the moment sentiment dips or a refund is requested.

Channels Supported

4 (EN / HI / TE / Hinglish)

Languages Handled

5,000+

Products Indexed in RAG

< 2s

Avg. Response Latency

~15%

Human Escalation Rate

99.9%

Uptime Target

System Architecture

Unify is built around an **Omnichannel Gateway** that normalises payloads from WhatsApp, Instagram, and Telegram into a single internal Message schema. Every inbound message is routed through a JWT-authenticated multi-tenant context resolver that scopes all data and AI config to the correct organisation. The AI pipeline runs a hybrid RAG retrieval (BM25 lexical + sentence-transformer semantic search fused with Reciprocal Rank Fusion) before sending the context to Gemini 2.0 Flash for generation. A real-time WebSocket layer pushes escalations and new messages to the React dashboard instantly.

Figure 1: System Architecture Diagram

Omnichannel Gateway

Unified webhook endpoints that normalise WhatsApp, Instagram, and Telegram payloads into a standard internal Message schema, with signature verification and multi-tenant context injection.

Hybrid RAG Engine

BM25 (rank-bm25) lexical retrieval fused with Sentence-Transformer semantic search via Reciprocal Rank Fusion (RRF). NLU entities (color, fabric, SKU, price range) are injected into the search query. A greedy diversity filter ensures results span different colors and fabrics.

LLM Orchestrator

Gemini 2.0 Flash via OpenRouter. Manages system prompt versioning, conversation history compression, language mirroring (Hinglish ↔ Telugu ↔ English), and brand guardrails (2–4 sentence replies, CTA in every message).

Vision Service

Multimodal image analysis: customer photos are base64-encoded and sent to the LLM with a specialised prompt that extracts colour, fabric, and pattern tags, which are then used to query the RAG catalog.

Escalation Engine

Rule-based scoring that triggers human handoff on keyword triggers ("refund", "talk to human"), negative sentiment threshold breaches, or consecutive AI failure turns. Sets human_controlled flag and pushes a WebSocket event to the Agent Inbox.

Unified Agent Inbox

React + Vite dashboard with a 3-column inbox (list | thread | context panel). Real-time red badge alerts via WebSocket. Agents can take over, reply via WhatsApp API, attach products, and resolve / hand back to AI.

Implementation Details

Code Example

python

class RAGService:
    """Hybrid retrieval over products: BM25 + semantic + RRF + diversity filter."""

    def retrieve(
        self,
        query: str,
        query_type: str = "descriptive",
        entities: Optional[dict] = None,
        top_k: Optional[int] = None,
    ) -> str:
        self._build_index()
        k = top_k or self.top_k

        # Expand query with NLU entities (color, fabric, sku) and synonyms
        search_query = _expand_query_with_entities(query, entities)
        search_query = _expand_query_synonyms(search_query) or search_query

        # Boost lexical for exact SKU; boost semantic for descriptive queries
        lexical_k = k * 2 if query_type == "exact" else k
        semantic_k = k * 2 if query_type == "descriptive" else k

        lex_idx = self._lexical_search(search_query, lexical_k)
        sem_idx = self._semantic_search(search_query, semantic_k)

        # Reciprocal Rank Fusion: score(d) = Σ 1/(rrf_k + rank_d)
        merged = self._merge_and_dedup(lex_idx, sem_idx, k)

        # Greedy diversity: prefer products with unseen color/fabric combination
        if self.diversity_enabled:
            merged = self._apply_diversity(merged, k)

        lines = [
            _format_product_for_context(self._products[i], self._site_base)
            for i in merged if i < len(self._products)
        ]
        return "CONTEXT FROM PRODUCT CATALOG:\n" + "\n".join(lines)

Agent Memory

Before passing the user's message to BM25/embeddings we extract NLU entities (color, fabric, price range) and append them to the search query. This means "show me something in green silk under ₹5,000" resolves to {color: 'green', fabric: 'silk', max_price: 5000} and retrieves from a much tighter candidate set — dramatically reducing hallucinated product mentions.

Workflow

A customer sends a WhatsApp message or image. The Omnichannel Gateway normalises the payload and resolves the org context. If the thread is human-controlled the message goes straight to the Agent Inbox; otherwise the AI pipeline runs: image → Vision AI → tags, or text → RAG retrieval → LLM reply. The Escalation Engine scores every turn; above threshold it flips the human_controlled flag, pauses AI replies, and pushes a WebSocket alert to the agent dashboard.

Figure 2: Workflow Diagram

Results & Impact

"Before Unify, our team was manually copy-pasting product links at 11 PM. Now Kala handles 85% of queries end-to-end — she even matches customers to sarees when they send us a blurry reference photo from Pinterest."

85% Queries Resolved by AI

Only ~15% of conversations required human intervention, down from 100% manual handling before deployment.

24/7 Sales Coverage

Kala operates across all time zones on WhatsApp, Instagram, and Telegram with sub-2-second response times.

Multilingual by Design

Language detection and mirroring means customers can type in Telugu or Hinglish and receive a perfectly matched reply in kind.

Visual Product Matching

Vision AI enables customers to send a reference photo and get matched sarees from a 5,000+ product catalog — a feature impossible with keyword-only search.

Zero-Setup RAG Updates

The catalog can be refreshed from Supabase or a CSV drop. The RAG index rebuilds automatically on the next query with no server restart required.

Multi-Tenant Isolation

Every query is scoped by org_id, ensuring Brand A's catalog and conversations are never visible to Brand B.

About the Author

Parmeet Singh Talwar

AI Context Engineer

15+

Projects Delivered

1.5+

Industry Experience

Parmeet Singh Talwar

AI Context Engineer

Apex Neural

Parmeet engineers context-driven AI that combines LLMs with structured backend architecture and multi-platform integrations. He builds AI-powered systems with secure OAuth, fine-tunes open-source LLMs, and integrates image and video generation into production pipelines. Focused on clean design and system reliability.

Contributors

Parmeet Singh Talwar

Ready to Build Your AI Solution?

Get a free consultation and see how we can help transform your business.