Learn how to build a RAG pipeline on code using StarCoder 2 and LangChain
In the rapidly evolving field of Large Language Models, AI coding copilots have emerged as powerful productivity tools for programmers. The most popular proprietary one is GitHub Copilot. However, proprietary copilots pose a challenge for enterprises: deploying them risks sharing internal code and IP with an external platform. A powerful alternative is to use open-source AI coding copilots like Code Llama or StarCoder, which ensure that an enterprise’s IP remains within its own infrastructure and is never shared.
To deploy open-source AI coding copilots, enterprises must ensure that the AI can query their codebase. This is where Retrieval-Augmented Generation (RAG) comes in: it combines the strengths of pre-trained language models with information retrieval systems to generate responses grounded in a large corpus of documents.
The BigCode Project, which describes itself as “an open scientific collaboration working on the responsible development and use of large language models for code”, is a collaboration between ServiceNow and Hugging Face, and has an open governance model.
BigCode recently released a set of next-generation open-source large language models designed for code: StarCoder 2. Three models have been launched:
- StarCoder2-3B
- StarCoder2-7B
- StarCoder2-15B
In this blog post, we’ll see how StarCoder 2 15B can be harnessed to build a RAG pipeline on code. The process can be broken down into these steps:
- Load a dataset of Python questions and code, and convert it into LangChain documents
- Split the documents into chunks and embed them with FastEmbed
- Store the embeddings in the Qdrant vector database and create a retriever
- Quantize and load StarCoder 2 15B, and wrap it in a text-generation pipeline
- Tie the retriever and the LLM together with a RetrievalQA chain
- Deploy the chatbot with Gradio
Let's embark on this exciting journey of harnessing the power of RAG with StarCoder 2, using the LangChain framework to bring the entire pipeline together!
Before we get started, install the required dependencies.
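A likely set of packages for this walkthrough (exact package names and versions may vary with your LangChain version; pin versions as needed):

```bash
pip install langchain langchain-community datasets pandas transformers accelerate bitsandbytes fastembed qdrant-client gradio
```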
For this experiment, I used this dataset, which consists of questions and code snippets related to Python. We’ll load it using the “datasets” library.
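A minimal sketch of the loading step; the dataset identifier below is a placeholder for whichever Python Q&A dataset you use:

```python
from datasets import load_dataset

# Placeholder dataset ID: substitute the Hugging Face dataset of
# Python questions and code referenced above.
dataset = load_dataset("your-username/python-qa-codes")
```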
Then, we’ll convert it into a pandas dataframe.
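Assuming the dataset exposes a standard “train” split, the conversion is a one-liner:

```python
# Convert the training split to a pandas DataFrame for easy inspection.
df = dataset["train"].to_pandas()
df.head()
```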
The preview will show pairs of Python questions and their corresponding code.
After that, we’ll load the DataFrame with LangChain and turn its rows into documents.
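A sketch using LangChain’s DataFrameLoader; the column name is an assumption, so point page_content_column at whichever column holds the code:

```python
from langchain_community.document_loaders import DataFrameLoader

# "answer" is a placeholder column name; use the column that contains
# the code/answer text in your DataFrame.
loader = DataFrameLoader(df, page_content_column="answer")
documents = loader.load()
```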
Then, we will split the documents into chunks using a Recursive Character Text Splitter.
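The chunk sizes below are illustrative defaults rather than tuned values:

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split each document into overlapping chunks so the retrieved context
# stays within the embedding model's input limits.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = text_splitter.split_documents(documents)
```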
We’ll generate embeddings using FastEmbed.
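LangChain ships a FastEmbed wrapper; with no arguments it falls back to FastEmbed’s default embedding model:

```python
from langchain_community.embeddings.fastembed import FastEmbedEmbeddings

# Uses FastEmbed's default model unless you pass model_name explicitly.
embeddings = FastEmbedEmbeddings()
```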
After the embeddings are generated, we’ll store them in the Qdrant vector database with the collection name “Python_codes”. We will also create a retriever from it.
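A minimal sketch using an in-memory Qdrant instance; in production you would point this at a running Qdrant server instead:

```python
from langchain_community.vectorstores import Qdrant

# ":memory:" keeps the example self-contained; swap in url="http://..."
# for a persistent Qdrant deployment.
qdrant = Qdrant.from_documents(
    chunks,
    embeddings,
    location=":memory:",
    collection_name="Python_codes",
)
retriever = qdrant.as_retriever()
```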
The StarCoder 2 family comprises models of varying sizes, each tailored to different scales of tasks and computational resources. Ranging from 3 billion to 15 billion parameters, these models are trained on trillions of tokens, resulting in a sophisticated understanding of code syntax, semantics, and structure. StarCoder 2 incorporates advanced techniques such as Grouped Query Attention and Fill-in-the-Middle training, which enhance its ability to comprehend and generate code with rich contextual understanding.
In comprehensive evaluations across a spectrum of Code LLM benchmarks, StarCoder 2 demonstrates exceptional performance. Even the smallest variant, StarCoder2-3B, surpasses other models of similar size on many metrics, while StarCoder2-15B emerges as a powerhouse, outperforming larger counterparts such as CodeLlama-34B. This strength extends across diverse coding tasks, including math and code reasoning challenges as well as low-resource languages, underscoring the versatility and efficacy of StarCoder 2. For more details, see this paper and this blog.
Using transformers and bitsandbytes, we will quantize the StarCoder 2 model, in this case the 15B variant. We’ll set up the tokenizer too.
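A sketch of 4-bit quantization with bitsandbytes; the exact settings (NF4, bfloat16 compute) are assumptions you can tune to your hardware:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "bigcode/starcoder2-15b"

# 4-bit NF4 quantization lets the 15B model fit on a single modern GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```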
Now, we’ll set up the text-generation pipeline with the help of the tokenizer and the model.
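The generation parameters below (max_new_tokens, temperature) are illustrative, not the definitive settings:

```python
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,      # cap the length of generated answers
    temperature=0.2,         # keep code generation fairly deterministic
    do_sample=True,
    return_full_text=False,  # return only the newly generated text
)
```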
Using LangChain’s Hugging Face Pipeline wrapper, we will initialize the large language model.
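Wrapping the transformers pipeline in HuggingFacePipeline makes it usable anywhere LangChain expects an LLM:

```python
from langchain_community.llms import HuggingFacePipeline

# Expose the transformers pipeline as a LangChain LLM.
llm = HuggingFacePipeline(pipeline=pipe)
```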
Then comes retrieval. We’ll set the chain type to “stuff” and pass the retriever and the LLM to RetrievalQA.
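The “stuff” chain type simply concatenates all retrieved chunks into a single prompt:

```python
from langchain.chains import RetrievalQA

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",   # stuff retrieved chunks into one prompt
    retriever=retriever,
)
```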
You will now be able to query StarCoder 2 in the following way:
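For example (the question itself is just a placeholder; ask anything your dataset covers):

```python
question = "Write a Python function to check whether a string is a palindrome."
result = qa.invoke({"query": question})
print(result["result"])
```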
The model will respond with an answer grounded in the code snippets retrieved from the “Python_codes” collection. In our experiments, we asked five such questions, and the pipeline returned a relevant, well-formed answer each time.
Now, we’ll deploy the chatbot with Gradio.
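A minimal sketch of a Gradio chat UI wired to the QA chain; the interface in the original setup may differ:

```python
import gradio as gr

# Forward each chat message to the RetrievalQA chain and return the answer.
def respond(message, history):
    result = qa.invoke({"query": message})
    return result["result"]

demo = gr.ChatInterface(fn=respond, title="StarCoder 2 Coding Copilot")
demo.launch()
```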
We experimented with StarCoder 2 15B, and its performance was impressive. With the help of FastEmbed and Qdrant, building this RAG pipeline was straightforward. Thanks for reading!
Superteams.ai connects top AI talent with companies seeking accelerated product and content development. Superteamers offer individual or team-based solutions for projects involving cutting-edge open-source AI technologies like LLMs, image synthesis, and audio or voice synthesis. With over 500 AI researchers and developers, Superteams has facilitated diverse projects like 3D e-commerce model generation, advertising creative generation, enterprise-grade RAG pipelines, geospatial applications, and more. Focusing on talent from India and the global South, Superteams offers competitive solutions for companies worldwide. To explore partnership opportunities, please write to founders@superteams.ai or visit this link.