GenAI
Apr 23, 2024

How to Build a RAG-Powered Interactive Chatbot with Llama3, LlamaIndex, and Gradio


Llama3 is, by far, Meta's most sophisticated Large Language Model to date. It currently comes in two sizes, 8B and 70B parameters. Llama3 is much better at responding to queries, generating code, and reasoning than its predecessor, Llama2. In Meta's human evaluations, Llama3 70B ranked well ahead of models like Claude Sonnet, Mistral Medium, and GPT-3.5.

(Screenshot: Emilia David / The Verge)

Llama3 is openly released, meaning developers can start using it in their AI applications free of charge, and can also benefit from the tooling and resources built by the open-source community.

Building the Chatbot

Our chatbot takes in user queries about hotels and returns well-researched responses based on thousands of reviews of those hotels. Users can simply ask a question like, “Which hotels have the best views?”, and get answers grounded in factual reviews left by guests who have stayed in these hotels and used their services. The chatbot also displays a map of the hotels, which helps users with navigation and with discovering the amenities and attractions near their stay.

For travel companies interested in incorporating our chatbot into their workflows, these are the benefits:

1. Convenient Research: Quick and easy hotel research.

2. Personalized Recommendations: Tailored hotel suggestions based on user preferences.

3. Visual Location Awareness: Hotel locations displayed on a map.

4. Time-Saving: Fast hotel search and filtering.

5. Improved Decision-Making: Informed hotel choices with reviews and ratings.

6. Enhanced User Experience: Engaging and interactive hotel research.

7. Increased Conversion Rates: Higher booking likelihood with personalized recommendations.

8. Competitive Advantage: Unique selling point for travel websites and apps.

9. Scalability: Efficient customer support and question answering.

10. Data Collection and Analysis: Valuable user insights for hoteliers and travel companies.

Workflow for the Chatbot

  • We used a Hugging Face dataset that contains about 93.8K reviews of hotels in London. 
  • The reviews were chunked into pieces of 512 tokens each and then converted into embeddings.
  • The embeddings were inserted into the default Vector Store provided by LlamaIndex. 
  • We then used the tools provided by LlamaIndex to connect the Vector Store with the LLM Llama3 for RAG-based query retrieval. 
  • The chatbot was designed using Gradio. It also features an interactive map, designed using Folium, that shows the locations of hotels mentioned in the response.

The Code

First, install all the dependencies. You can create a requirements.txt file as follows:


llama-index-embeddings-instructor
llama-index-embeddings-huggingface
llama-index-llms-ollama
gradio
folium
geopy
llama-index
datasets
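
With the file in place, install everything in one go:


pip install -r requirements.txt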

Load the dataset, and convert it into Document format.


from datasets import load_dataset
from llama_index.core import Document

# load the hotel reviews dataset and convert it to a pandas dataframe
hotel_review_d = load_dataset('ashraq/hotel-reviews', split='train')
hotel_review_df = hotel_review_d.to_pandas()

# wrap each review in a Document, keeping the hotel name as metadata for later
documents = [Document(text=row['review'], metadata={'hotel': row['hotel_name']}) for _, row in hotel_review_df.iterrows()]

Next, load the embedding model. We’ll use BAAI’s bge-small model.


from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
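
As a quick sanity check, you can embed a sample string and confirm the vector size; bge-small-en-v1.5 produces 384-dimensional embeddings (the sample text below is just an example):


# embed a sample string and confirm the output dimension
sample_vec = embed_model.get_text_embedding("A hotel with a great view of the Thames")
print(len(sample_vec))  # 384 for BAAI/bge-small-en-v1.5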

We’ll use Ollama to deploy our Llama3 model locally. First install Ollama, run the server, then pull the model onto the server.


curl -fsSL https://ollama.com/install.sh | sh
ollama serve
ollama pull llama3:instruct

This serves the Llama3 model on localhost:11434. If you get an error saying port 11434 is already in use, an Ollama server is likely already running; restart it with:


sudo systemctl restart ollama
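
To confirm the server is reachable, you can hit its root endpoint, which replies with a plain-text liveness message (this check assumes the requests package is available):


import requests

# the Ollama server answers its root endpoint with a liveness message
print(requests.get("http://localhost:11434").text)  # expect: "Ollama is running"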

Next, create an LLM object that points LlamaIndex at the deployed Ollama instance, and run a quick test completion:


from llama_index.llms.ollama import Ollama

llm = Ollama(model="llama3:instruct", request_timeout=60.0)

response = llm.complete("What is the capital of France?")
print(response)
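
For chat-style UIs you may prefer streaming tokens as they arrive; the Ollama LLM class also supports this. A minimal sketch:


# stream the completion token by token instead of waiting for the full answer
for chunk in llm.stream_complete("What is the capital of France?"):
    print(chunk.delta, end="", flush=True)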

In LlamaIndex, we can use the Settings object to set the LLM, the document chunk size, and the embedding model as global defaults.


from llama_index.core import Settings

Settings.llm = llm
Settings.chunk_size = 512
Settings.embed_model = embed_model

Now when we create a Vector Store Index, it automatically picks up the embedding model and chunk size defined in Settings.


from llama_index.core import VectorStoreIndex

vector_index = VectorStoreIndex.from_documents(
    documents
)

We can persist the Vector Store to disk so that we don’t have to rebuild the index every time; we can simply load it back from disk.


vector_index.storage_context.persist(persist_dir="hotel")

Code to reload the vector index from disk (no need to run it now, since the index is already in memory):


from llama_index.core import StorageContext, load_index_from_storage

# rebuild storage context
storage_context = StorageContext.from_defaults(persist_dir="hotel")

# load index
vector_index = load_index_from_storage(storage_context)

Next, we define the query engine, an LLM-powered interface over the Vector Store. Again, the engine uses the LLM stored in Settings. We set similarity_top_k=10, which means it retrieves the 10 chunks most relevant to the query in order to prepare the context.


query_engine = vector_index.as_query_engine(similarity_top_k=10)
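
As a quick test (the query below is just an example), we can inspect both the generated answer and the reviews the engine retrieved; each source node carries the hotel name we stored as metadata earlier:


response = query_engine.query("Which hotels have the best views?")
print(response.response)  # the generated answer

# the retrieved review chunks, with their hotel names and similarity scores
for node in response.source_nodes:
    print(node.metadata['hotel'], node.score)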

Next, we use a Python library called geopy to look up the latitude and longitude of each hotel in the dataset. These coordinates will be used later to plot the hotels on the map.


import pandas as pd
import geopy
from geopy.geocoders import Nominatim
import random

# hotel_review_df is the dataframe created from the dataset earlier

# extract unique hotel names into a new dataframe
hotel_names_df = pd.DataFrame(hotel_review_df['hotel_name'].unique(), columns=['hotel_name'])

# create a geolocator object
geolocator = Nominatim(user_agent="my_app")

# create two new columns for longitude and latitude
hotel_names_df['longitude'] = None
hotel_names_df['latitude'] = None

# define a function to generate a random location in London
def random_london_location():
    # London's approximate bounding box: lat 51.2868–51.6913, lon 0.1743° W to 0.0053° E
    lat = random.uniform(51.2868, 51.6913)
    lon = random.uniform(-0.1743, 0.0053)
    return lat, lon

# loop through each hotel name and get its lat/long info
for index, row in hotel_names_df.iterrows():
    hotel_name = row['hotel_name'] + ', London'
    location = geolocator.geocode(hotel_name)
    if location:
        hotel_names_df.at[index, 'longitude'] = location.longitude
        hotel_names_df.at[index, 'latitude'] = location.latitude
    else:
        lat, lon = random_london_location()
        hotel_names_df.at[index, 'longitude'] = lon
        hotel_names_df.at[index, 'latitude'] = lat
        print(f"Could not find location for {hotel_name}, using random location instead")

# print the resulting dataframe
print(hotel_names_df)
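
One caveat: Nominatim’s usage policy limits clients to roughly one request per second, so the plain loop above may get throttled when geocoding many hotels. geopy ships a RateLimiter wrapper that spaces out the calls; a sketch (the hotel name is just an example):


from geopy.extra.rate_limiter import RateLimiter

# wrap the geocoder so successive calls are at least 1 second apart
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)

# drop-in replacement for geolocator.geocode inside the loop above
location = geocode('The Savoy, London')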

Now let’s define the functions used by the Gradio UI. The function query takes text as input and returns the response from the Llama3-powered query engine. The function generate_map takes a list of hotel names and creates an HTML map with Folium, plotting each hotel’s location. The function interface combines the two: when the user submits a query, they receive the response text and the map together.


import gradio as gr
import folium

# assume hotel_names_df is the dataframe with hotel names and lat/long info

def query(text):
    z = query_engine.query(text)
    return z

def generate_map(hotel_names):
    # generate map using Folium
    m = folium.Map(location=[51.5074, -0.1278], zoom_start=12)
    for hotel_name in hotel_names:
        lat = hotel_names_df[hotel_names_df['hotel_name'] == hotel_name]['latitude'].values[0]
        lon = hotel_names_df[hotel_names_df['hotel_name'] == hotel_name]['longitude'].values[0]
        folium.Marker([lat, lon], popup=hotel_name).add_to(m)
    return m._repr_html_()

def interface(text):
    z = query(text)
    response = z.response
    hotel_names = list(set([z.source_nodes[i].metadata['hotel'] for i in range(len(z.source_nodes))]))
    map_html = generate_map(hotel_names)
    return response, map_html
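
Before wiring up the UI, you can smoke-test the plumbing directly (the query text is just an example):


answer, map_html = interface("Which hotels are closest to the city center?")
print(answer)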

Below is the code for the Gradio UI. We’ll use Gradio’s built-in Glass theme, overriding a few colors for a dark look. The map_area shows a map of London by default, and share=True gives the demo a temporary public link.


import gradio as gr

with gr.Blocks(theme=gr.themes.Glass().set(
        block_title_text_color="black",
        body_background_fill="black",
        input_background_fill="black",
        body_text_color="white")) as demo:

    gr.Markdown("# Hotel Reviews Chatbot")

    with gr.Row():
        output_text = gr.Textbox(lines=20)
        map_area = gr.HTML(value=folium.Map(location=[51.5074, -0.1278], zoom_start=12)._repr_html_())

    with gr.Row():
        input_text = gr.Textbox(label='Enter your query here')

    input_text.submit(fn=interface, inputs=input_text, outputs=[output_text, map_area])

demo.launch(share=True)


GitHub

https://github.com/vardhanam/llama3_rag_chatbot/tree/main