GenAI
Mar 27, 2024

A Guide to Building AI Applications Using Large Language Models (LLMs) for Leaders

This article serves as a friendly introduction to Large Language Models (LLMs) for leaders looking to understand and deploy AI applications within their organizations.


The discussion around Generative AI exploded last year after the massively viral growth of ChatGPT. Along with it, a number of new terms entered regular parlance in the technology world: LLMs, Transformers, Mistral, Hugging Face, RAG, Knowledge Graphs, Stable Diffusion, Vectors, LoRA, PEFT, and so on. In fact, the list of jargon that emerged over the last year is possibly longer than what we witnessed over the previous decade in the world of technology.

It also came with a question (thanks to blockchain, crypto, and their associated scams): is this another passing fad, or something real?

Now, the reports are in. Over 70% of business executives surveyed by Gartner reported a top-down push for generative AI implementation. Another McKinsey survey showed that 75% of participants anticipated generative AI to significantly impact their industries within three years. 

In this article, I will cut through the jargon and explain Large Language Models and the typical workflow that LLM AI applications are built upon. By the end of this, you will hopefully be able to explain this latest ongoing iteration of AI to your colleagues, and decide whether to give it a spin or not. 

Let’s start at the beginning: the technology behind LLMs or Large Language Models. 

What Are Large Language Models? 

Large Language Models (LLMs) are AI models that excel at processing and generating text (or code, as we will see later). Technically, they are deep learning models built using an AI architecture that became popular after the 2017 paper: ‘Attention Is All You Need’. This architecture is known as the Transformer architecture. 

So what are Transformers, then? 

Transformers are a specific kind of neural network adept at handling sequential data, like text. This neural network model relies solely on an attention mechanism, a technique that focuses on important parts of data (like key words in a sentence). This approach is unlike the previous models which used recurrent neural networks or convolutions. 

This focus on attention allows Transformers to process sequential data, such as text, more effectively. Researchers found that not only are Transformers better at machine translation tasks where sequential data is at play, they are also faster to train.
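If you are curious about what the attention computation actually looks like, here is a minimal, illustrative sketch of scaled dot-product attention (the core operation described in the paper) using NumPy. The matrices are random toy values, not real model weights.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute attention weights and the weighted sum of values."""
    d_k = Q.shape[-1]
    # Similarity between each query and every key, scaled to keep values stable
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax turns scores into weights that sum to 1 for each query
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted mix of the value vectors
    return weights @ V, weights

# Toy example: 3 tokens, 4-dimensional representations (numbers are arbitrary)
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
output, weights = scaled_dot_product_attention(Q, K, V)
print(weights.round(2))  # how strongly each token "attends" to the others
```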

By the way, this paper took the AI world by storm, and it is a must-read for anyone looking to understand the shift in the approach the AI community has taken towards building AI models. It's dense and complicated, but interesting.

Large Language Models Are Built through Training. But Why?

So, LLMs are built on the Transformer architecture and they are trained on massive amounts of text data. However, simply putting together a neural network won’t make it an LLM. It needs to be trained for it to actually work.  

The training process is a multi-stage process that involves feeding the model massive amounts of text data and fine-tuning its abilities. It typically involves the use of GPUs and GPU clusters, and may take days, weeks, or even months (especially the pre-training step).

Here's how it happens:

  • Pre-training (Self-supervised Learning): This is the initial stage where the LLM is exposed to a vast amount of unlabeled text data like books, articles, and code. The model isn't given specific tasks but learns by predicting the next word in a sequence or filling in missing pieces of text. This helps the LLM grasp the overall structure and patterns of language. 
  • Fine-tuning (Supervised Learning): After pre-training, the LLM is focused on specific tasks through supervised learning. Here, labeled data with clear inputs and desired outputs is used. The model learns by comparing its generated responses to the correct outputs, refining its ability to perform tasks like question answering or writing different kinds of creative content.

Why do we need these two stages? After the pre-training stage, an LLM would have a good grasp of language mechanics but wouldn't necessarily understand the meaning or real-world applications of language. Think of it as a child who's learned the alphabet and can sound out words but doesn't understand the stories those words create.

Here's an example:

  • You ask the LLM to complete the sentence "The cat sat on the..."
  • After pre-training, the LLM is good at predicting the next word based on the patterns it has seen. It could respond with "...mat," a common word following "the cat sat on the."

However, it wouldn't necessarily understand the concept of a cat or a mat, nor the physical possibility of a cat sitting on it. It would simply be using its statistical knowledge of what word typically follows that sequence.

This is a vital thing to understand. LLMs are not magical. They are simply built on a neural network model that’s great at predicting the next word (rather, token) after it has been trained. 
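To make this concrete, here is a small sketch that asks a pre-trained model for its most likely next tokens after "The cat sat on the". I am using GPT-2 purely because it is small and freely downloadable from Hugging Face; the same idea applies to any LLM.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# GPT-2 is used only because it is small and freely available
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The cat sat on the", return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # one score per vocabulary token, per position

# The distribution over the *next* token, i.e. what the model predicts comes next
probs = logits[0, -1].softmax(dim=-1)
top = torch.topk(probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id)!r}: {prob:.3f}")
```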

At this point, it's worthwhile understanding the meaning of a ‘token’, as you would come across this in AI literature quite often. 

Tokens are the smallest units of text that a language model processes. In Natural Language Processing (NLP), tokens are typically created by dividing a sentence (or a document) into words, sub-words, or other meaningful units, depending on the tokenization algorithm used. The tokenization process, therefore, converts text into a series of tokens.
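Here is a quick sketch of tokenization in practice, using the GPT-2 tokenizer from Hugging Face as an example (every model family has its own tokenizer and will split text slightly differently).

```python
from transformers import AutoTokenizer

# Any tokenizer from the Hugging Face Hub will do; GPT-2's is used as an example
tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "The cat sat on the mat."
token_ids = tokenizer.encode(text)

print(token_ids)                                   # a list of integer token ids
print(tokenizer.convert_ids_to_tokens(token_ids))  # the text split into tokens
print(tokenizer.decode(token_ids))                 # back to the original string
```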

Now, think about it. Language in our world works in a similar way. When constructing a sentence, we start with a character or a word. Once we have the first word, we choose the next word (or character, or phrase) from a finite set of possibilities, and so on. This is how a sentence eventually comes together.

This approach has come to be known as ‘next token prediction’, and it is incredibly powerful. Researchers are applying it to a range of other domains with impressive results. For instance, researchers at Meta recently used this technique to train an AI model to understand a 3D scene.

A 3D scene, if you think about it, can also be sequential. You have a wall, next to which you have a door, above which you have a ceiling, and below which you have a floor. So, if an AI model is able to predict the next word, why can’t it predict the next object in a scene? 

A similar example is code. Code is entirely sequential. When you train LLMs on code, they learn to predict the next ‘token’ in much the same way they learned to predict the next word in text.

Now, if pre-training has already trained the LLM to predict the next word (or, token), why do we need to ‘fine-tune’ it? 

Why Are LLMs Fine-Tuned? 

This is where things get interesting. Once you have pre-trained an LLM, it has the ability to continue a sentence (or a paragraph, say). However, that is roughly where its abilities stop. The text it produces can be in any domain, on any subject, and not necessarily on the topic you care about.

For an LLM to follow instructions, it needs to be fine-tuned. Think of asking ChatGPT to draft an email for you, or the content of a deck to pitch your company, or a content writer asking the LLM to write the story of the dotcom crash. Fine-tuning is what enables an LLM to follow such instructions. For many models, therefore, researchers release both a base model and an instruct-tuned model (which has been fine-tuned to follow instructions).

During fine-tuning, a labeled dataset is used, which has clear inputs and desired outputs that the LLM should generate. For example, to train a question-answering LLM, the data would include questions and their corresponding correct answers.

Once the labeled data has been prepared, the LLM is presented with it. Since it has already been pre-trained, it generates outputs based on its pre-trained knowledge. These outputs are programmatically compared to the desired outputs in the labeled data. Any discrepancy between the LLM’s output and the desired output is calculated as an error, which is then used to adjust its internal parameters (essentially, the weights and biases within the neural network) through a process called backpropagation.

So, during fine-tuning, the LLM parameters get adjusted to create outputs that are closer to the desired output. 
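For those who want to see what this looks like in code, here is a minimal sketch of a single supervised fine-tuning step, using a small Hugging Face model as a stand-in. Real fine-tuning runs over thousands of labeled examples, in batches, for multiple epochs, usually on GPUs; the toy question-answer pair below is purely illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# A small model stands in for whichever LLM you are actually fine-tuning
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# One labeled example: an input (the question) paired with the desired output
example = "Question: What is the capital of France?\nAnswer: Paris."
enc = tokenizer(example, return_tensors="pt")

# Passing labels makes the model compute the discrepancy (loss) between
# its predicted next tokens and the desired text
loss = model(**enc, labels=enc["input_ids"]).loss

loss.backward()       # backpropagation: work out how each weight contributed to the error
optimizer.step()      # nudge the weights to make the desired output more likely
optimizer.zero_grad()
```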

Let’s look at a fine-tuning dataset to understand this a bit better. One of the most popular ones is databricks-dolly-15k. It has four fields for every row of data (and there are roughly 15,000 rows):

  • Instruction: This refers to the input or prompt given to the language model, guiding it on what type of response is expected. It sets the context for the model to generate an appropriate response based on the provided instruction.
  • Context: The context in the dataset provides additional information or background that helps the model better understand the instruction and generate a relevant response. It offers a more comprehensive setting for the language model to process the given instruction effectively.
  • Response: The response is the output generated by the language model in reaction to the given instruction and context. It represents the language model's answer or completion based on the input it receives.
  • Category: The category in the dataset categorizes the type of question or instruction provided. It helps organize the dataset into different groups based on the nature of the tasks, such as open-ended questions, multiple-choice questions, summarization tasks, brainstorming prompts, classification queries, creative writing prompts, or information extraction requests. 

The fine-tuning process teaches the LLM to adjust its weights so that, given the instruction and context provided, it generates a response similar to the responses in the fine-tuning dataset.
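If you want to look at the dataset yourself, here is a sketch that loads it with the Hugging Face datasets library and turns one row into an input/output training pair. The prompt template below is just one common convention, not a requirement.

```python
from datasets import load_dataset

# The instruction-tuning dataset discussed above, hosted on the Hugging Face Hub
dolly = load_dataset("databricks/databricks-dolly-15k", split="train")

row = dolly[0]
print(row.keys())  # instruction, context, response, category

def format_example(row):
    """Turn one row into (input prompt, desired output) for supervised fine-tuning."""
    prompt = f"### Instruction:\n{row['instruction']}\n"
    if row["context"]:
        prompt += f"### Context:\n{row['context']}\n"
    prompt += "### Response:\n"
    return prompt, row["response"]

prompt, target = format_example(row)
print(prompt + target)
```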

Which Domains Can You Fine-Tune LLMs for?

This is the thing about LLMs - as long as you can create nicely labeled data, you can fine-tune LLMs for any domain. Healthcare, legal, education, banking, you name it. 

The real challenge is the dataset creation. If you can figure out a way to create the dataset, fine-tuning an LLM is actually a fairly simple process.

Why is it so challenging to create a dataset? Let’s take the example of the finance domain. To fine-tune an LLM for that domain, you need to create a dataset similar to the dolly-15k example above, covering specific terms and tasks in finance. Where will you get that data from?

If you want to use internet data, you would have to scrape websites, and perhaps end up violating a number of laws. If you use data from books, once again, you are breaching copyright laws. If you put human annotators on the task, you would have to spend considerable time and effort.

One of the ways that has become popular recently is to use an AI platform like ChatGPT to generate a synthetic dataset. You prompt ChatGPT in the right way, and it can generate nicely labeled data for you. This may suffice for simple fine-tuning. 

However, if you want to fine-tune an LLM on your company’s internal data and terminology, which even ChatGPT is not aware of, then this dataset wouldn’t suffice. You can also use a combination of a synthetically generated dataset and human effort. 
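As an illustration of the synthetic-data approach, here is a rough sketch using the OpenAI Python client. The model name is a placeholder for whichever model you have access to, and in practice you would validate the generated JSON and review the rows before using them for fine-tuning.

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

# Ask the model for labeled rows in the same shape as the dolly-15k dataset.
# "gpt-4o-mini" is a placeholder; substitute the model you actually use.
prompt = (
    "Generate 3 instruction-tuning examples about retail banking as a JSON array. "
    "Each item must have the fields: instruction, context, response, category."
)

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)

# In practice, validate this output (models sometimes return malformed JSON)
rows = json.loads(completion.choices[0].message.content)
for row in rows:
    print(row["instruction"], "->", row["response"][:60])
```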

This is why companies with access to vast amounts of data can have significant leverage in the AI era. They can use this data to train AI models that effectively work for their use-case and get a leg-up over their competitors, who may not have access to similar amounts of data. The key consideration here for you is: does your organization have access to data that you can put together quickly? 

In any case, the key challenge in fine-tuning is the dataset itself. Once the LLM is fine-tuned, it performs much better than it would with pre-training alone.

Deterministic vs Probabilistic Machines

LLMs are trained to generate desired outputs, but you cannot fully control what they will eventually generate. Yes, this is where things get interesting.

Historically, our programming models have been mostly deterministic. You have an SQL database with tables of data, you query it, it gives you the same result every time you run the query. Probabilistic models have mostly existed in domains like predictive analytics, where you are trying to predict the future and, therefore, you expect it to be probabilistic to begin with. 

But with LLMs, there is always a chance that they will not produce the accurate result you are expecting, even for historical knowledge. This is because the entire model is based on “next token prediction”, as I explained earlier. Note the word “prediction” here.
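You can see this for yourself: with sampling enabled, the same prompt produces a different completion on every run. A small sketch, again using GPT-2 as a stand-in for a larger model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # small stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Our refund policy states that", return_tensors="pt")

# Sampling (instead of always picking the single most likely token) makes
# each run different; temperature controls how adventurous the choices are
for _ in range(3):
    output = model.generate(
        **inputs,
        do_sample=True,
        temperature=0.9,
        max_new_tokens=20,
        pad_token_id=tokenizer.eos_token_id,
    )
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```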

This is why LLMs, for now at least, are best suited for tasks where you don’t need 100% accuracy. For instance, if you are an insurance company, do not put LLM-powered chatbots directly in front of your customers. If they give wrong answers, you may end up getting into legal trouble. On the other hand, you could definitely consider using LLM-powered solutions as productivity enhancers for your customer support team.

LLMs, therefore, make great assistants or copilots. They reduce your effort and drive your productivity up, but you may need to edit or double-check what they generate.

A World of Models and Datasets - Diving into Hugging Face

Someone recently asked me about this “strange company named Hugging Face”. Indeed, the name comes from the "hugging face" emoji, which also happens to be the company’s logo. The founders chose the name to reflect the company's mission of making AI models more accessible and friendly to humans (like a comforting hug).

Before we proceed further, it is important to understand Hugging Face, one of the most important companies in the AI domain today. 

The company is ‘most notable for its transformers library built for natural language processing applications and its platform that allows users to share machine learning models and datasets and showcase their work.’ (from Wikipedia).

Here’s what Hugging Face does: 

  • Storing and sharing of machine learning models and datasets: Hugging Face is like a GitHub for machine learning. Developers can share and access all sorts of models, from large language models to computer vision models.
  • Tools for working with models: Hugging Face also has a set of powerful libraries, including the Transformers library, which makes it easier to use AI models by simplifying downloading and training.
  • Open-source collaboration: A big part of their mission is open collaboration. Many of the models, datasets and tools are open source, allowing anyone to see the code and contribute.
  • AI Community: They are building a community of machine learning developers by hosting public leaderboards that compare models and datasets on benchmarks, and by showcasing demos.

Anyone involved in AI development, in any form or fashion, eventually relies on Hugging Face for discovering new AI models, downloading or hosting models or datasets, for writing code that simplifies AI workflows, or for connecting with AI communities. 
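As a small taste of what working with Hugging Face looks like, here is a sketch using the huggingface_hub library to browse the Hub and pull down a file. (The specific repository and filename are just examples.)

```python
from huggingface_hub import HfApi, hf_hub_download

api = HfApi()

# List some of the most-downloaded text-generation models on the Hub
for model in api.list_models(filter="text-generation", sort="downloads", direction=-1, limit=5):
    print(model.id)

# Download a single file from a model repository into the local cache
path = hf_hub_download(repo_id="gpt2", filename="config.json")
print(path)
```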


Building AI Applications - Retrieval Augmented Generation (RAG)

Now that we have taken a bit of a deep dive into LLMs and how they work, let’s look at how AI applications are built.

You might be wondering — if we have fine-tuned an LLM for our own use-case, why do we need anything else? Can’t we simply use it as it is? 

It turns out that most real-world applications of AI involve real-time data in some way. LLM fine-tuning is a time-consuming process, and you cannot realistically fine-tune the model every time there is new information.

Let’s see some examples: 

  • A customer service chatbot would need access to the latest product information, pricing, help docs, and recent customer interactions.
  • An AI that analyzes stock market data would need updated stock prices to inform its responses.
  • An AI that offers the latest news and event updates would need to base its responses on live news.

The architecture of these AI applications, which are backed by some sort of data store, has come to be known as RAG, or Retrieval Augmented Generation.

RAG-powered AI applications have 3 key components: 

  1. The LLM: A language model like ChatGPT, Gemini, Mistral-7B, Falcon-40B, etc., or a fine-tuned version of one of these.
  2. Data Store(s): Where you store the latest context / knowledge that will be used to generate LLM responses. These stores come in many kinds:
    1. Vector Stores: which store information as vectors and retrieve it based on semantic similarity.
    2. Knowledge Graphs: which store information as a graph of entities and the relationships between them.
    3. Relational Databases: classic SQL databases, where information is stored in structured tables.
    4. Other sources: such as files.
  3. An Orchestration Framework: Such as LangChain or LlamaIndex, which helps pull it all together. 

(Figure: Retrieval Augmented Generation architecture. Source: AWS)
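Here is a deliberately minimal sketch of the RAG flow: embed a handful of documents, retrieve the most relevant ones for a question using vector similarity, and build an augmented prompt for the LLM. The embedding model name is just an example, the documents are made up, and call_llm() is a placeholder for whichever LLM or API your application uses.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# 1. A tiny "vector store": documents and their embedding vectors
documents = [
    "Our premium plan costs $49/month and includes 24/7 support.",
    "Refunds are processed within 5 business days.",
    "The mobile app supports iOS 16 and Android 13 or later.",
]
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # an example embedding model
doc_vectors = embedder.encode(documents, normalize_embeddings=True)

def retrieve(question, k=2):
    """Return the k documents most semantically similar to the question."""
    q = embedder.encode([question], normalize_embeddings=True)[0]
    scores = doc_vectors @ q                 # cosine similarity (vectors are normalized)
    best = np.argsort(scores)[::-1][:k]
    return [documents[i] for i in best]

# 2. Augment the prompt with retrieved context, then 3. generate with the LLM
question = "How long do refunds take?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# call_llm() is a placeholder for your LLM of choice (hosted API or local model)
# answer = call_llm(prompt)
print(prompt)
```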

Here, you might ask – if we already have relational databases, like MySQL or Postgres, why do we need Vector Stores or Knowledge Graphs? 

Here's why Vector Stores and Knowledge Graphs are becoming important tools for AI:

  1. Unstructured Information: SQL excels at storing structured data with predefined relationships. AI applications, however, often deal with unstructured or semi-structured data, where relationships are complex or implicit. Knowledge Graphs capture these connections explicitly, allowing AI models to reason and infer new knowledge.
  2. Understanding Similarities: Finding similar items is critical for AI tasks like recommendation systems or image recognition. SQL relies on exact keyword matching, which isn't ideal for semantic similarity. Vector Stores come in here - they store data as vectors, enabling efficient searches based on similarity rather than exact matches.
  3. Scalability for High-Dimensional Data: AI models often use high-dimensional data representations. SQL struggles with managing and querying this efficiently. Vector Stores are designed for this type of data, offering fast retrieval and scalability for complex AI models.
  4. Knowledge Representation: While SQL stores data, it doesn't represent the real world's richness of entities and connections. Knowledge Graphs address this by modeling the world with entities, attributes, and relationships, providing a deeper understanding for AI models.

Stitching It All Together

Phew, that was a lot. Let’s try and summarize all of the above. 

First, we explained the statistical “next token prediction” model that LLMs are built upon. We then discussed how they are trained: via pre-training and fine-tuning.

  • Pre-training involves exposing the LLM to massive amounts of text data to help it grasp language patterns.
  • Fine-tuning refines the LLM's abilities for specific tasks using labeled data with clear inputs and desired outputs.

We also spoke about how LLMs are inherently probabilistic and, therefore, are ideally deployed in scenarios where 100% accuracy is not mandatory. They make great copilots or assistants, and strong productivity tools.

We also explored how LLMs are used in real-world applications like chatbots and news aggregators through a method called Retrieval Augmented Generation (RAG). RAG combines LLMs with data stores like Vector Stores and Knowledge Graphs to provide AI applications with the latest information and context.

Open Source AI vs Proprietary AI Platforms: A Note for Enterprises

We will end this article by touching upon a point of discussion that often comes up amongst enterprise leaders: should we simply use OpenAI or Gemini APIs, or should we deploy open source AI models in our infrastructure? 

The answer to this question isn’t exactly simple, and has a lot to do with the legal requirements of your organization or specific use-case.

Proprietary AI platforms offered by companies such as OpenAI, Cohere, Anthropic, and Google are incredibly powerful and suffice for a large number of use cases that organizations may have. Many startups are utilizing their APIs to build their own AI solutions. Marketers often use these platforms to streamline the content pipeline, while analysts and researchers find them effective as research tools. They also offer APIs for fine-tuning to meet your specific use case.

However, there are several scenarios where deploying open-source AI models in your infrastructure might be the right approach, particularly when compliance, data control, and predictable spending are important considerations. In recent months, several powerful open-source AI models have emerged, such as Mistral-7B, Mixtral 8x7B, Llama2, Falcon, and others, which have demonstrated performance on par with proprietary models but offer the added advantage of deep customization and control.

Here are some specific scenarios where you may need to leverage open source AI models:

1. Highly Regulated Industries

  • Financial Services and Healthcare: In industries like finance and healthcare, regulations dictate strict data privacy and security measures. Open-source AI models offer greater transparency into the inner workings of the model, allowing organizations to ensure compliance with regulations like HIPAA (healthcare data) or GDPR (EU data privacy). This transparency allows for audits and verification of how the model handles sensitive data.
  • Government Agencies: Government agencies dealing with sensitive citizen data might prefer open-source models for the same reason. The ability to inspect the code and understand how the model operates builds trust and ensures adherence to data privacy laws.

2. Data Privacy and Control Concerns

  • Limited Data Availability: Organizations with limited datasets for training AI models might benefit from open-source models. These models can be fine-tuned on their specific data while leveraging the pre-trained knowledge of the open-source model. This reduces the reliance on potentially sensitive customer data for training purposes.
  • Data Residency Requirements: Some regulations mandate that data remains within specific geographical boundaries. Open-source models, if hosted on-premise or within a compliant cloud environment, can address these residency concerns.

3. Cost Considerations and Predictable Spending

  • Startups and Resource-Constrained Businesses: For startups or businesses with limited budgets, open-source AI models offer a cost-effective way to leverage AI capabilities without hefty licensing fees. The predictable costs associated with open-source models can be factored into budgets more easily.
  • Research and Development: In research and development settings, open-source models provide a foundation for experimentation and customization. Researchers can build upon existing open-source models, adapt them to specific needs, and contribute their modifications back to the community, fostering collaboration and innovation.

A Note about Superteams.ai

When organizations attempt to build AI applications or deploy their own private AI models, they often realize the need for an in-house AI team, which can be prohibitively expensive.

Deploying and maintaining open-source models necessitate in-house AI expertise, requiring a team of data scientists, engineers, and project managers skilled in AI.

This is where Superteams.ai comes into play. We provide fully managed, fractional AI teams capable of building proof-of-concept solutions in partnership with organizations for a variety of use-cases, all at a fraction of the cost of hiring a full in-house AI team. This approach allows organizations to test the waters, gain understanding, and then decide whether to invest in building a full team in-house. The best part is that we have partnerships with some of the top AI startups worldwide, allowing us to bring their expertise and insights into the solutions we build for your organization. 

Reach out to us to learn more.