This blog explains how to build a RAG pipeline for a company’s internal business documents.
A Retrieval Augmented Generation (RAG) pipeline can be a valuable tool for companies looking to improve access and utilization of their internal business documents. Many companies struggle with information silos, where valuable knowledge gets trapped within specific departments or documents. A RAG pipeline acts like a powerful search engine for internal documents. Employees can ask questions in natural language, and the RAG pipeline retrieves relevant passages from contracts, employee handbooks, or knowledge bases.
RAG pipelines can go beyond simple retrieval. They can also use the retrieved information to generate summaries or answer specific questions. This allows employees to grasp complex topics in internal documents more easily, leading to better-informed decisions.
New hires often spend significant time searching for company policies, procedures, or benefits information. A RAG pipeline can act as a virtual assistant for new employees, providing them with quick access to relevant information and streamlining the onboarding process.
Internal documents often contain valuable knowledge about products, services, or troubleshooting procedures. A RAG pipeline can be integrated with customer support systems, allowing agents to find answers to customer queries quickly and efficiently from internal resources.
Depending on the industry, companies might need to navigate complex legal and regulatory frameworks. RAG pipelines can be trained on industry-specific documents and regulations. Employees can then use the pipeline to get tailored summaries or insights relevant to their specific situation.
Now if you’re convinced you need a RAG pipeline, we’ll show you the steps for it.
Below are the steps involved in designing a RAG workflow:
1. Data Preparation: Split your documents into smaller chunks so that only relevant information is fed into the Large Language Model. This can be achieved easily by using one of many text splitting modules offered by popular frameworks like LangChain, LlamaIndex, etc.
2. Document Indexing: Now these chunks of data need to be converted into high-dimensional embeddings and then upserted into a vector database. There are many AI models that convert textual data into their semantic embeddings. Similarly, there are many open-source vector databases in the market. Some of the popular ones are FAISS, Pinecone, Weaviate and Qdrant.
3. Generation Model: Choose a pre-trained large language model for text-generation. Popular choices are Llama2, Mistral 7B.
4. Context Retrieval and Answer generation: The relevant context is retrieved from the vector database by performing a similarity search on the user input query. Then this context is fed to the LLM along with the user query for generating an answer.
5. Deployment: Deploy your RAG pipeline in a production environment, ensuring scalability, reliability, and efficiency.
Below is an end-to-end production-ready code for a RAG pipeline that uses Mistral 7B as the large language model, and Weaviate as the vector database. The entire pipeline is wrapped around a Gradio UI for user interactivity.
First create a vector cluster on Weaviate and retain the url of your cluster. Then, in a file called app.py, paste the following code:
Here's a step-by-step description of the code:
1. The code imports necessary libraries and modules, including transformers for loading the language model, LangChain for document processing and vector store, and Gradio for building the user interface.
2. The `load_llm()` function is defined to load the Mistral 7B language model using the AutoTokenizer and AutoModelForCausalLM classes from the Transformers library. It sets up the tokenizer and model with specific configurations and returns a text generation pipeline.
3. The `embeddings_model()` function is defined to load the sentence-transformers/all-mpnet-base-v2 embeddings model using the HuggingFaceEmbeddings class from LangChain.
4. The `initialize_vectorstore()` function is defined to initialize an empty Weaviate vector store using the provided Weaviate URL and the loaded embeddings model.
5. The `text_splitter()` function is defined to create a RecursiveCharacterTextSplitter object for splitting text into chunks of a specified size and overlap.
6. The `add_pdfs_to_vectorstore()` function is defined to process uploaded PDF files. It iterates over each file, extracts the filename, and checks if it is a PDF. If it is a PDF, it loads the file using PyPDFLoader, splits the text into chunks using the text splitter, and adds the chunks to the vectorstore. It returns a success message with the count of added PDF files.
7. The `weaviate_client()` function is defined to create a Weaviate client using the provided URL.
8. The `answer_query()` function is defined to handle user queries. It takes the user's message and the chat history as input. It performs a similarity search on the vector store to find the most relevant context documents based on the query. It then constructs a prompt template with the context and the question, passes it to the language model for generating an answer, and appends the question-answer pair to the chat history. It returns the updated chat history.
9. The `clear_vectordb()` function is defined to clear the vector store and chat history when the "Clear VectorDB and Chat" button is clicked.
10. The code initializes the language model, embeddings model, Weaviate client, vector store, and text splitter using the respective functions.
11. The code sets up the Gradio user interface using the `gr.Blocks()` context manager. It defines the layout and components of the interface, including:
12. The code sets up event handlers for the interface components:
13. Finally, the code launches the Gradio interface, making it accessible via a web browser. It specifies the server name as '0.0.0.0' and enables sharing of the interface.
Overall, this code sets up a conversational AI system that allows users to upload PDF files, stores the text content in a Weaviate vector store, and enables users to ask questions based on the uploaded documents.
I used AI Claude to generate a standard document of a company’s leave policy. It looks something like this:
Then I saved this in PDF format.
By running the following command in the terminal, I launched my RAG application.
I then uploaded the pdf file onto my chatbot
After that, I could query the chatbot and receive my answers as desired.
I also implemented a button to clear the chat, and the associated information in the vector DB, in case the user wants to upload fresh new information.
The code for the article is publicly available here: https://github.com/vardhanam/RAG_Mistral_Weaviate_Gradio
Superteams.ai connects top AI talent with companies seeking accelerated product and content development. Superteamers offer individual or team-based solutions for projects involving cutting-edge technologies like fine-tuning combined with RAG pipelines, text-to-image synthesis, object detection and more. To explore partnership opportunities, please write to founders@superteams.ai or visit this link.